Understanding the following concepts will help you get the most out of InfluxDB.
The InfluxDB storage engine and the Time-Structured Merge Tree (TSM)
The InfluxDB storage engine looks very similar to an LSM tree. It has a write-ahead log (WAL) and a collection of read-only data files, which are similar in concept to SSTables in an LSM tree. TSM files contain sorted, compressed series data.
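To make the "write-ahead log plus read-only files" idea concrete, here is a toy Python model of an LSM-style write path. It is a sketch under simplifying assumptions, not InfluxDB's implementation. The class name `ToyLSM`, the `flush_threshold` parameter, and the in-memory lists standing in for TSM files are all hypothetical. Writes are appended to the log and buffered in a cache. The cache is periodically flushed as a sorted, immutable segment. Reads check the cache first and then the newest segments.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Toy model of an LSM-style write path: WAL + cache + immutable sorted files.
# Hypothetical names throughout; this is NOT InfluxDB's TSM implementation.
Key = Tuple[str, int]  # (series key, timestamp)


class ToyLSM:
    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[Key, float]] = []          # append-only log
        self.cache: Dict[Key, float] = {}               # mutable in-memory buffer
        self.files: List[List[Tuple[Key, float]]] = []  # read-only sorted segments
        self.flush_threshold = flush_threshold

    def write(self, series: str, ts: int, value: float) -> None:
        key = (series, ts)
        self.wal.append((key, value))  # log first (stands in for durability)
        self.cache[key] = value        # then make the write visible to reads
        if len(self.cache) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the cache as a sorted, immutable segment; the WAL entries
        # it covers are no longer needed.
        self.files.append(sorted(self.cache.items()))
        self.cache.clear()
        self.wal.clear()

    def read(self, series: str, ts: int) -> Optional[float]:
        key = (series, ts)
        if key in self.cache:
            return self.cache[key]
        # Search the newest segment first so later writes shadow earlier ones.
        for segment in reversed(self.files):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)  # binary search works because segments are sorted
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None
```

With `flush_threshold=2`, two writes produce one sorted segment. A third write to an existing key stays in the cache and shadows the flushed value. Real TSM files also compress each series' data and are later compacted, and this sketch leaves both out.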
InfluxDB creates a shard for each block of time. For example, if a retention policy has an unlimited duration, InfluxDB creates shards for each 7-day block of time.
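The following sketch shows how points can be bucketed into 7-day blocks of time. It makes a simplifying assumption: blocks are aligned to whole multiples of the duration since the Unix epoch, and InfluxDB's exact alignment rule may differ. The function names `shard_group_bounds` and `group_by_shard` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Simplified bucketing: blocks start at whole multiples of the duration since
# the Unix epoch. InfluxDB's exact alignment rule may differ.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DEFAULT_DURATION = timedelta(days=7)  # 7-day blocks, as for an unlimited retention policy


def shard_group_bounds(ts: datetime, duration: timedelta = DEFAULT_DURATION) -> Tuple[datetime, datetime]:
    """Return the [start, end) block of time that `ts` falls into."""
    n = (ts - EPOCH) // duration  # number of whole blocks since the epoch
    start = EPOCH + n * duration
    return start, start + duration


def group_by_shard(timestamps: Iterable[datetime],
                   duration: timedelta = DEFAULT_DURATION) -> Dict[datetime, List[datetime]]:
    """Bucket timestamps by the start of the block (shard) they belong to."""
    shards: Dict[datetime, List[datetime]] = {}
    for ts in timestamps:
        start, _ = shard_group_bounds(ts, duration)
        shards.setdefault(start, []).append(ts)
    return shards
```

Every point whose timestamp falls in the same block lands in the same bucket. This is why dropping expired data can be done by deleting whole shards rather than individual points.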
What’s in a database?
This page gives SQL users an overview of how InfluxDB is like an SQL database and how it’s not. It highlights some of the major distinctions between the two and provides a loose crosswalk between the different database terminologies and query languages.
In general, InfluxDB is designed to work with time series data. SQL databases can handle time series data but weren’t created strictly for that purpose. In short, InfluxDB is made to store a large volume of time series data and to perform real-time analysis on that data quickly.
InfluxDB is a time series database. Optimizing for this use case entails some tradeoffs, primarily increasing performance at the cost of functionality. The following design insights lead to some of those tradeoffs:
For the time series use case, we assume that if the same data is sent multiple times, it is the exact same data that a client sent repeatedly, not a conflicting update.
Pro: Simplified conflict resolution increases write performance.
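One way to picture this assumption is a store keyed by measurement, tag set, and timestamp, where a repeated write simply merges in without any conflict check. The sketch assumes last-write-wins per field key: new field values replace old ones, and other fields are kept. That merge rule should be checked against InfluxDB's own documentation on duplicate points. `write_point` and the dict-based store are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

FieldValue = Union[float, int, str, bool]
# Hypothetical model: a point is identified by measurement + tag set + timestamp.
PointKey = Tuple[str, FrozenSet[Tuple[str, str]], int]
Store = Dict[PointKey, Dict[str, FieldValue]]


def write_point(store: Store, measurement: str, tags: Dict[str, str],
                ts: int, fields: Dict[str, FieldValue]) -> None:
    # frozenset makes the tag set hashable and independent of insertion order.
    key = (measurement, frozenset(tags.items()), ts)
    # No conflict detection: a repeated write merges into the existing point,
    # and for any field key present in both writes the newer value wins.
    store.setdefault(key, {}).update(fields)
```

Because the write path never has to compare or reconcile versions, each write is a single keyed update. That is the source of the write-performance benefit noted above.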
aggregation
An InfluxQL function that returns an aggregated value across a set of points. See InfluxQL Functions for a complete list of the available and upcoming aggregations.
Related entries: function, selector, transformation
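As an illustration, the Python function below computes roughly what an InfluxQL aggregation such as `SELECT MEAN("value") FROM "cpu" GROUP BY time(10s)` returns. It collapses the points in each fixed time interval into a single averaged value. This is a hypothetical stand-in, not InfluxDB code. `mean_by_interval` is an invented name, and timestamps are plain integers in the same unit as `interval`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_by_interval(points: Iterable[Tuple[int, float]], interval: int) -> Dict[int, float]:
    """Average (timestamp, value) points per fixed time interval.

    Roughly analogous to: SELECT MEAN("value") FROM "cpu" GROUP BY time(<interval>)
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Bucket start = timestamp floored to a multiple of the interval.
        buckets[ts - ts % interval].append(value)
    # One aggregated value per bucket, ordered by bucket start time.
    return {start: mean(values) for start, values in sorted(buckets.items())}
```

For example, the points `(0, 1.0)`, `(5, 3.0)`, `(10, 10.0)`, and `(19, 20.0)` with an interval of 10 produce `{0: 2.0, 10: 15.0}`. A selector such as `MAX()` would instead return one of the original points from each bucket.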
batch
A collection of points in line protocol format, separated by newlines (0x0A). A batch of points may be submitted to the database using a single HTTP request to the write endpoint. Batching makes writes through the HTTP API much more efficient because it drastically reduces per-request HTTP overhead.
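The sketch below builds a newline-separated batch of line protocol points and wraps it in a single POST request to an InfluxDB 1.x-style `/write?db=...` endpoint. The helper names, the `http://localhost:8086` address, and the `mydb` database name are hypothetical. Only the common escaping and field-type rules are covered. Timestamps are passed through as-is and are interpreted as nanoseconds unless a different precision is requested.

```python
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple, Union

FieldValue = Union[float, int, str, bool]
Point = Tuple[str, Dict[str, str], Dict[str, FieldValue], int]  # measurement, tags, fields, timestamp


def _escape_measurement(s: str) -> str:
    # Measurement names escape commas and spaces.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(s: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(v: FieldValue) -> str:
    if isinstance(v, bool):   # check bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"        # integer fields carry an "i" suffix
    if isinstance(v, float):
        return repr(v)
    escaped = v.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'     # string fields are double-quoted


def to_line(point: Point) -> str:
    measurement, tags, fields, ts = point
    tag_str = "".join(f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{_escape_measurement(measurement)}{tag_str} {field_str} {ts}"


def build_batch(points: List[Point]) -> str:
    # One point per line, separated by newline characters (0x0A).
    return "\n".join(to_line(p) for p in points)


def make_write_request(batch: str, base_url: str = "http://localhost:8086",
                       db: str = "mydb") -> urllib.request.Request:
    # A single POST carries the whole batch, amortizing HTTP overhead across points.
    url = f"{base_url}/write?{urllib.parse.urlencode({'db': db})}"
    return urllib.request.Request(url, data=batch.encode("utf-8"), method="POST")
```

The sketch only constructs the request. Sending it would be `urllib.request.urlopen(req)` against a running server. Tags are sorted by key when each line is built, which keeps series keys canonical.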
Time Series Index (TSI) description
When InfluxDB ingests data, we store not only the value but also index the measurement and tag information so that the data can be queried quickly. In earlier versions, index data could only be stored in memory. However, that requires a lot of RAM and places an upper bound on the number of series a machine can hold. This upper bound is usually between 1 and 4 million series, depending on the machine used.
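A minimal way to picture "indexing the measurement and tag information" is an inverted index from each measurement and each (measurement, tag key, tag value) triple to the set of series IDs that contain it. A tag filter is then a set intersection. The class `ToySeriesIndex` and its methods are hypothetical, and this is not the TSI file format. The toy keeps everything in memory, so its size grows with every new series. That is exactly the RAM limitation described above.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple


class ToySeriesIndex:
    """Hypothetical in-memory inverted index from measurement/tag pairs to series IDs."""

    def __init__(self) -> None:
        self.series_ids: Dict[str, int] = {}  # canonical series key -> series ID
        self.by_measurement: DefaultDict[str, Set[int]] = defaultdict(set)
        self.by_tag: DefaultDict[Tuple[str, str, str], Set[int]] = defaultdict(set)

    def add(self, measurement: str, tags: Dict[str, str]) -> int:
        # Canonical series key: measurement plus tags sorted by key.
        key = measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
        if key not in self.series_ids:
            sid = len(self.series_ids)
            self.series_ids[key] = sid
            self.by_measurement[measurement].add(sid)
            for k, v in tags.items():
                self.by_tag[(measurement, k, v)].add(sid)
        return self.series_ids[key]

    def lookup(self, measurement: str, **tag_filters: str) -> Set[int]:
        # Start from every series in the measurement, then intersect per tag filter.
        result = set(self.by_measurement.get(measurement, set()))
        for k, v in tag_filters.items():
            result &= self.by_tag.get((measurement, k, v), set())
        return result

    def cardinality(self) -> int:
        # Number of distinct series seen so far.
        return len(self.series_ids)
```

For example, after adding `cpu` series for `host=a` and `host=b` (both `region=us`), `lookup("cpu", region="us")` returns both series IDs, while `lookup("cpu", host="a")` returns only the first.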
Time Series Index (TSI)
To support a large number of time series, that is, a very high cardinality in the number of unique time series that the database stores, InfluxData has added the new Time Series Index (TSI). InfluxData supports customers using InfluxDB with tens of millions of time series. InfluxData’s goal, however, is to expand to hundreds of millions, and eventually billions. Using InfluxData’s TSI storage engine, users should be able to have millions of unique time series.
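Series cardinality grows multiplicatively with the number of values each tag can take, which is why it can outgrow an in-memory index quickly. The arithmetic below uses hypothetical tag names and counts. A measurement with 1,000 hosts, 10 regions, and 50 services can have up to 1,000 × 10 × 50 = 500,000 series. Adding one more tag with 20 values raises the worst case to 10,000,000, well past the 1 to 4 million range mentioned earlier. The function names are hypothetical.

```python
from math import prod
from typing import Dict, Iterable


def max_series_cardinality(tag_value_counts: Dict[str, int]) -> int:
    """Worst case for one measurement: every combination of tag values occurs."""
    return prod(tag_value_counts.values())


def observed_series_cardinality(tag_sets: Iterable[Dict[str, str]]) -> int:
    """Actual count: the number of distinct tag sets seen (tag order does not matter)."""
    return len({frozenset(tags.items()) for tags in tag_sets})
```

The worst case is an upper bound. Real data usually contains only a subset of the possible tag combinations, which is what `observed_series_cardinality` counts.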