Understanding the following concepts will help you get the most out of InfluxDB.
The InfluxDB storage engine and the Time-Structured Merge Tree (TSM)
The InfluxDB storage engine looks very similar to an LSM Tree. It has a write-ahead log and a collection of read-only data files, which are similar in concept to SSTables in an LSM Tree. TSM files contain sorted, compressed series data.
InfluxDB creates a shard for each block of time. For example, if a retention policy has an unlimited duration, InfluxDB creates a shard for each 7-day block of time.
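As a rough illustration of the 7-day blocks described above, the following Python sketch maps a timestamp to the block that would contain it. The function name `shard_window` and the choice to align blocks to the Unix epoch are assumptions made for this example; the text above does not say how InfluxDB aligns its blocks.

```python
from datetime import datetime, timedelta, timezone

SHARD_DURATION = timedelta(days=7)  # the 7-day block from the example above
# Assumption for illustration only: blocks are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_window(ts):
    """Return the half-open [start, end) 7-day block that contains ts."""
    blocks = (ts - EPOCH) // SHARD_DURATION  # whole blocks since the origin
    start = EPOCH + blocks * SHARD_DURATION
    return start, start + SHARD_DURATION


a = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
b = datetime(2024, 3, 6, 8, 30, tzinfo=timezone.utc)
c = datetime(2024, 3, 20, 0, 0, tzinfo=timezone.utc)

# Points written a day apart usually share a block; points two weeks apart never do.
same_block = shard_window(a) == shard_window(b)
different_block = shard_window(a) != shard_window(c)
```

Under this assumed alignment, `a` and `b` land in the same block while `c` falls in a later one, so the first two points would be stored in the same shard.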
What’s in a database?
This page gives SQL users an overview of how InfluxDB resembles an SQL database and how it differs. It highlights the major distinctions between the two and provides a loose crosswalk between their terminologies and query languages.
In general…
InfluxDB is designed to work with time series data. SQL databases can handle time series data but weren’t created strictly for that purpose. In short, InfluxDB is built to store large volumes of time series data and to analyze that data quickly, in real time.
InfluxDB is a time series database. Optimizing for this use case entails some tradeoffs, primarily to increase performance at the cost of functionality. Below is a list of some of those design insights that lead to tradeoffs:
For the time series use case, we assume that if the same data is sent multiple times, it is the exact same data that a client just sent several times.
Pro: Simplified conflict resolution increases write performance.
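The duplicate-data assumption above can be pictured with a small Python sketch. It is a toy model, not InfluxDB's storage code: the `store` dictionary and the `write_point` helper are hypothetical names, and the sketch assumes a point is identified by its measurement, tag set, and timestamp, so a repeated write lands on the same key instead of creating a conflict to resolve.

```python
# Toy in-memory store: (measurement, sorted tag set, timestamp) -> field values.
store = {}


def write_point(measurement, tags, timestamp, fields):
    """Write a point; a repeated write with the same identity reuses the same entry."""
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # No version comparison or conflict detection: the incoming fields are
    # simply merged into whatever is already stored under this key.
    store.setdefault(key, {}).update(fields)
    return key


# A client retries the same write three times.
for _ in range(3):
    write_point("cpu", {"host": "server01"}, 1700000000, {"usage": 0.64})

point_count = len(store)
```

After the three identical writes, the store still holds a single point, which is why retries are cheap for the writer.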
aggregation
An InfluxQL function that returns an aggregated value across a set of points. For a complete list of the available and upcoming aggregations, see InfluxQL functions.
Related entries: function, selector, transformation
batch
A collection of data points in InfluxDB line protocol format, separated by newlines (0x0A). A batch of points may be submitted to the database using a single HTTP request to the write endpoint. Batching makes writes through the InfluxDB API much faster because many points share the overhead of one HTTP request.
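To make the batch definition concrete, here is a minimal Python sketch that joins several points into one newline-separated body and prepares a single HTTP request for it. The helper `to_line`, the host `localhost:8086`, and the database name `mydb` are assumptions for this example. The sketch handles only numeric field values and skips line protocol escaping, and it builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as a line: measurement,tags fields timestamp."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


points = [
    ("cpu", {"host": "server01"}, {"usage": 0.64}, 1700000000000000000),
    ("cpu", {"host": "server02"}, {"usage": 0.31}, 1700000000000000000),
    ("mem", {"host": "server01"}, {"used": 0.52}, 1700000000000000000),
]

# One batch: the lines are separated by newline characters (0x0A).
body = "\n".join(to_line(*p) for p in points).encode("utf-8")

# Hypothetical local instance and database name; adjust both for a real setup.
url = "http://localhost:8086/write?" + urlencode({"db": "mydb"})
request = Request(url, data=body, method="POST")
# urllib.request.urlopen(request) would send all three points in one request.
```

Sending these three points individually would cost three HTTP round trips; the batch costs one.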
Covers key concepts to learn about InfluxDB.
Covers general guidelines for InfluxDB schema design and data layout.
InfluxDB stores measurement and tag information in an index so data can be queried quickly.
In earlier versions, the index was stored in memory, which required a lot of RAM and limited the number of series a machine could hold (typically 1-4 million series, depending on the machine).
Time Series Index (TSI) stores index data both in memory and on disk, removing RAM restrictions. This lets you store more series on a machine.
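The idea of indexing measurement and tag information can be sketched as a small inverted index in Python. This is a conceptual toy, not the TSI file format: the names `index`, `add_series`, and `find_series` are hypothetical, and the whole structure lives in memory, which is exactly the limitation that TSI removes by also keeping index data on disk.

```python
from collections import defaultdict

# Inverted index: (tag key, tag value) -> set of series keys that carry it.
index = defaultdict(set)


def add_series(measurement, tags):
    """Register a series under its measurement and each of its tag pairs."""
    series_key = measurement + "".join(
        f",{k}={v}" for k, v in sorted(tags.items())
    )
    index[("_measurement", measurement)].add(series_key)
    for pair in tags.items():
        index[pair].add(series_key)
    return series_key


def find_series(measurement, **tag_filters):
    """Intersect index entries to find matching series without scanning data."""
    result = set(index.get(("_measurement", measurement), set()))
    for pair in tag_filters.items():
        result &= index.get(pair, set())
    return result


add_series("cpu", {"host": "server01", "region": "us-west"})
add_series("cpu", {"host": "server02", "region": "us-east"})
add_series("mem", {"host": "server01", "region": "us-west"})

matches = find_series("cpu", region="us-west")
```

Each new unique combination of measurement and tags adds entries to the index, so memory use in this toy grows with series cardinality.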
Find overview and background information on Time Series Index (TSI) in this topic. For details, including how to enable and configure TSI, see Time Series Index (TSI) details.
Overview
To support a large number of time series, that is, a very high cardinality in the number of unique time series that the database stores, InfluxData added the Time Series Index (TSI). InfluxData supports customers using InfluxDB with tens of millions of time series.