InfluxDB v3 storage engine architecture

The InfluxDB v3 storage engine is a real-time, columnar database optimized for time series data. It is built in Rust on top of Apache Arrow and DataFusion. It supports infinite tag cardinality (the number of unique tag values) and real-time queries, and it is optimized to reduce storage cost.

Storage engine diagram

[Diagram. Components: Ingester, with a WAL (short-term persistence); Querier; Catalog (relational metadata service); Object storage (time series data stored in Apache Parquet format); Compactor. Flows: write requests, query requests, and the query for yet-to-be-persisted data.]

Storage engine components

Ingester

The Ingester processes line protocol submitted in write requests and persists time series data to the Object store. In this process, the Ingester does the following:

  • Queries the Catalog to identify where data should be persisted and to ensure the schema of the line protocol is compatible with the schema of persisted data.
  • Accepts or rejects points in the write request and generates a response.
  • Processes line protocol and persists time series data to the Object store in Apache Parquet format. Each Parquet file represents a partition (a logical grouping of data).
  • Makes yet-to-be-persisted data available to Queriers so that leading-edge data is included in query results.
  • Maintains a short-term write-ahead log (WAL) to prevent data loss in case of a service interruption.
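
The schema check in the first two steps above can be sketched as follows. This is a minimal illustration, not the Ingester's actual implementation: the `check_schema` function, the column type names, and the rule that a new column is accepted while a type conflict is rejected are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical schema representation: column name -> column type.
Schema = Dict[str, str]


def check_schema(persisted: Schema, incoming: Schema) -> Schema:
    """Return the merged schema, or raise ValueError on a type conflict.

    Assumption: a column that is not yet known is added to the schema,
    and a column whose type differs from the persisted type is rejected.
    """
    merged = dict(persisted)
    for column, col_type in incoming.items():
        existing = merged.get(column)
        if existing is not None and existing != col_type:
            raise ValueError(
                f"column {column!r}: expected {existing}, got {col_type}"
            )
        merged[column] = col_type
    return merged
```

In this sketch, a write that introduces a new field extends the schema, and a write that sends a string for an existing float field raises an error, which corresponds to rejecting the point.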

Ingester scaling strategies

The Ingester can be scaled both vertically and horizontally. Horizontal scaling increases write throughput and is typically the most effective scaling strategy for the Ingester.

Querier

The Querier handles query requests and returns query results. It supports both SQL and InfluxQL through Apache Arrow DataFusion.

Query life cycle

At query time, the Querier:

  1. Receives the query request and builds a query plan.

  2. Queries the Ingesters to:

    • ensure the schema assumed by the query plan matches the schema of written data
    • include recently written, yet-to-be-persisted data in query results

  3. Queries the Catalog to find partitions in the Object store that contain the queried data.

  4. Reads the partition Parquet files that contain the queried data and scans each row, keeping only the rows that match the predicates in the query plan.

  5. Performs any additional operations (for example: deduplicating, merging, and sorting) specified in the query plan.

  6. Returns the query result to the client.
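
Steps 2 through 5 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Querier's actual code: `run_query`, the row layout, and the key columns are hypothetical, and the sketch assumes that when two rows share the same key, the more recently written (yet-to-be-persisted) row wins.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout: a mapping of column name to value.
Row = Dict[str, object]


def run_query(
    persisted: Iterable[Row],
    unpersisted: Iterable[Row],
    predicate: Callable[[Row], bool],
    key_columns: Tuple[str, ...] = ("time", "room"),
) -> List[Row]:
    """Filter, deduplicate, merge, and sort rows from both sources."""
    latest: Dict[tuple, Row] = {}
    # Persisted rows are read first so that buffered rows from the
    # Ingester overwrite them when the key is the same (an assumption).
    for source in (persisted, unpersisted):
        for row in source:
            if predicate(row):
                latest[tuple(row[c] for c in key_columns)] = row
    return sorted(latest.values(), key=lambda r: r["time"])
```

Passing rows read from Parquet files as `persisted` and rows returned by the Ingesters as `unpersisted` yields one result set sorted by time, with the leading-edge data included.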

Querier scaling strategies

The Querier can be scaled both vertically and horizontally. Horizontal scaling increases query throughput to handle more concurrent queries. Vertical scaling improves the Querier’s ability to process computationally intensive queries.

Catalog

The Catalog is a PostgreSQL-compatible relational database that stores metadata related to your time series data, including schema information and the physical locations of partitions in the Object store. It fulfills the following roles:

  • Provides information about the schema of written data.
  • Tells the Ingester which partitions to persist data to.
  • Tells the Querier which partitions contain the queried data.
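
The last role, telling the Querier which partitions contain the queried data, amounts to a metadata lookup by table and time range. The sketch below models that lookup in memory. The `PartitionRecord` fields, the `find_partitions` function, and the object-store paths are hypothetical and do not reflect the real Catalog schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PartitionRecord:
    """Hypothetical catalog entry for one Parquet file."""
    table: str
    min_time: int  # earliest timestamp in the file
    max_time: int  # latest timestamp in the file
    path: str      # hypothetical object-store key


def find_partitions(
    catalog: List[PartitionRecord], table: str, start: int, stop: int
) -> List[PartitionRecord]:
    """Return records for `table` that overlap the range [start, stop)."""
    return [
        p for p in catalog
        if p.table == table and p.min_time < stop and p.max_time >= start
    ]
```

In a PostgreSQL-compatible database, the same lookup would be a query with an equality filter on the table and a range filter on the two time columns.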

Catalog scaling strategies

Scaling strategies available for the Catalog depend on the PostgreSQL-compatible database used to run the Catalog. All support vertical scaling. Most support horizontal scaling for redundancy and failover.

Object store

The Object store contains time series data in Apache Parquet format. Each Parquet file represents a partition. By default, InfluxDB partitions tables by day, but you can customize the partitioning strategy. Data in each Parquet file is sorted, encoded, and compressed.
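
To make the default by-day partitioning concrete, the following sketch derives a day key from a nanosecond timestamp and groups points by that key. It assumes UTC days and a `YYYY-MM-DD` key format; the function names and the point layout are hypothetical, not part of any InfluxDB API.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical point layout: (timestamp in nanoseconds, value).
Point = Tuple[int, float]


def day_partition_key(timestamp_ns: int) -> str:
    """Return the UTC day of a nanosecond timestamp as YYYY-MM-DD."""
    seconds = timestamp_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def group_by_day(points: List[Point]) -> Dict[str, List[Point]]:
    """Group points by the day partition their timestamp falls in."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for point in points:
        groups[day_partition_key(point[0])].append(point)
    return dict(groups)
```

With this scheme, two points written a day apart land in different partitions, so a query restricted to one day only needs to read that day's files.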

Object store scaling strategies

Scaling strategies available for the Object store depend on the underlying object storage service used to run the Object store. Most support horizontal scaling for redundancy, failover, and increased capacity.

Compactor

The Compactor processes and compresses partitions in the Object store to continually optimize storage. It then updates the Catalog with locations of compacted data.
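
One way to picture compaction is as a merge of several small sorted files into one larger sorted file. The sketch below shows that idea only. It is not the Compactor's actual algorithm: the `compact` function and the row layout are hypothetical, and the rule that a later file wins when two rows share the same time and series is an assumption.

```python
import heapq
from typing import List, Tuple

# Hypothetical row layout: (time, series key, value).
Row = Tuple[int, str, float]


def compact(files: List[List[Row]]) -> List[Row]:
    """Merge files that are each sorted by (time, series) into one sorted file.

    Assumption: when rows in different files share (time, series),
    the row from the file that appears later in `files` wins.
    """
    # Tag each row with its file index so duplicates come out of the
    # merge in file order and the last one seen is the newest.
    tagged = ([(t, s, i, v) for (t, s, v) in f] for i, f in enumerate(files))
    out: List[Row] = []
    for t, s, _index, v in heapq.merge(*tagged):
        if out and out[-1][0] == t and out[-1][1] == s:
            out[-1] = (t, s, v)  # replace the older duplicate
        else:
            out.append((t, s, v))
    return out
```

After a merge like this, the Catalog would be updated to point at the new file instead of the files it replaces.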

Compactor scaling strategies

The Compactor can be scaled both vertically and horizontally. Because compaction is a compute-heavy process, vertical scaling (especially increasing the available CPU) is the most effective scaling strategy for the Compactor. Horizontal scaling increases compaction throughput, but not as efficiently as vertical scaling.


Scaling strategies

The following scaling strategies can be applied to components of the InfluxDB v3 storage architecture.

For information about scaling your InfluxDB Cloud Dedicated infrastructure, contact InfluxData support.

Vertical scaling

Vertical scaling (also known as “scaling up”) involves increasing the resources (such as RAM or CPU) available to a process or system. Vertical scaling is typically used to handle resource-intensive tasks that require more processing power.

Horizontal scaling

Horizontal scaling (also known as “scaling out”) involves increasing the number of nodes or processes available to perform a given task. Horizontal scaling is typically used to increase the workload or throughput a system can handle, and it also provides additional redundancy and failover.

