InfluxDB Cloud data durability
InfluxDB Cloud replicates all data in the storage tier across two availability zones in a cloud region, automatically creates backups, and verifies that replicated data is consistent and readable.
Data replication
InfluxDB Cloud replicates data in both the write tier and the storage tier:
- Write tier: all data written to InfluxDB is processed by a durable message queue. The message queue partitions each batch of points based on series keys and then replicates each partition to other physical nodes in the message queue.
- Storage tier: all data in the underlying storage tier is replicated across two availability zones in a cloud region.
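The write-tier behavior above (partition a batch by series key, then replicate each partition to other nodes) can be sketched as follows. This is a minimal illustration under stated assumptions, not InfluxDB Cloud's implementation: the partition count, node names, replication factor, SHA-256 hashing, and round-robin replica placement are all hypothetical, and the line protocol parsing ignores escaped characters.

```python
import hashlib
from collections import defaultdict

# Hypothetical parameters for illustration only; not InfluxDB Cloud's real configuration.
NUM_PARTITIONS = 8
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3


def series_key(line: str) -> str:
    """Return a canonical series key (measurement plus sorted tag set) for a
    line protocol point. Simplified: assumes no escaped spaces or commas."""
    measurement_and_tags = line.split(" ", 1)[0]
    measurement, *tags = measurement_and_tags.split(",")
    return ",".join([measurement, *sorted(tags)])


def partition_for(key: str) -> int:
    # Stable hash so the same series key always maps to the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def replicas_for(partition: int) -> list[str]:
    # Place each replica of a partition on a different physical node
    # (hypothetical round-robin placement).
    return [NODES[(partition + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def route_batch(batch: list[str]) -> dict[int, list[str]]:
    # Group the points of one write batch by partition.
    partitions: dict[int, list[str]] = defaultdict(list)
    for line in batch:
        partitions[partition_for(series_key(line))].append(line)
    return dict(partitions)


if __name__ == "__main__":
    batch = [
        "cpu,host=server01 usage=0.64 1700000000000000000",
        "cpu,host=server02 usage=0.12 1700000000000000000",
        "cpu,host=server01 usage=0.66 1700000010000000000",
    ]
    for partition, points in route_batch(batch).items():
        print(partition, replicas_for(partition), len(points))
```

Hashing on the series key keeps every point of a series in the same partition, so the partition is the natural unit to replicate.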
Data backup
InfluxDB Cloud backs up all data in the following ways:
Backup on write
All inbound write requests to InfluxDB Cloud are added to a durable message queue. The message queue does the following:
- Caches the line protocol of each write request.
- Writes data to the storage tier.
- Routinely persists cached line protocol to object storage as an out-of-band backup.
Message queue backups provide raw line protocol that can be used to recover from a catastrophic failure in the storage tier or from an accidental deletion. The message queue retains data durably for 96 hours, so InfluxDB Cloud can sustain a failure of its underlying storage tier or object storage services for up to 96 hours without data loss.
To reduce the risk of data loss from defects introduced into the InfluxDB Cloud service, we keep the code path between data ingest and backup as small as possible.
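The backup-on-write flow described in this section (cache the line protocol, write to the storage tier, routinely persist the cache to object storage) can be modeled with a toy class. This is a hedged sketch: `MessageQueueSketch`, its in-memory lists, and the count-based `flush_threshold` are hypothetical stand-ins (the real service persists "routinely", on a schedule not described here). Note that the backup path copies the raw request text, independent of what the storage write does.

```python
from dataclasses import dataclass, field


@dataclass
class MessageQueueSketch:
    """Toy model of backup on write. All names here are hypothetical."""

    storage: list[str] = field(default_factory=list)               # stands in for the storage tier
    object_storage: list[list[str]] = field(default_factory=list)  # persisted backup segments
    cache: list[str] = field(default_factory=list)                 # cached, not-yet-persisted line protocol
    flush_threshold: int = 2                                        # persist after this many cached requests

    def accept(self, line_protocol: str) -> None:
        # 1. Cache the raw line protocol of the write request.
        self.cache.append(line_protocol)
        # 2. Write the data to the storage tier.
        self.storage.append(line_protocol)
        # 3. Routinely persist cached line protocol to object storage.
        if len(self.cache) >= self.flush_threshold:
            self.persist()

    def persist(self) -> None:
        # Copy the cache into a new backup segment, then clear the cache.
        if self.cache:
            self.object_storage.append(list(self.cache))
            self.cache.clear()


if __name__ == "__main__":
    mq = MessageQueueSketch()
    for request in ["cpu,host=a usage=1 1", "cpu,host=b usage=2 1", "cpu,host=a usage=3 2"]:
        mq.accept(request)
    print(mq.object_storage)  # one persisted segment holding the first two requests
    print(mq.cache)           # the third request is still cached
```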
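Backup after compaction
After the storage engine compacts data into TSM files, InfluxDB Cloud uploads the compacted TSM files to object storage as an out-of-band backup.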
Periodic TSM snapshots
To provide multiple data recovery points, InfluxDB Cloud takes weekly snapshots of the TSM files uploaded to object storage. Each snapshot contains a copy of all non-deleted data at the time the snapshot is created. Snapshots are retained for 100 days.
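Weekly snapshots kept for 100 days mean roughly 14 to 15 recovery points exist at any time (100 ÷ 7 ≈ 14.3). The sketch below lists the retained snapshots and picks the latest one at or before a target date. It is illustrative only: the function names are hypothetical, and treating the 100-day boundary as inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

SNAPSHOT_INTERVAL = timedelta(weeks=1)
RETENTION = timedelta(days=100)  # assumption: a snapshot exactly 100 days old is still kept


def retained_snapshots(first_snapshot: date, today: date) -> list[date]:
    """Dates of weekly snapshots that are still inside the retention window."""
    snapshots = []
    current = first_snapshot
    while current <= today:
        if today - current <= RETENTION:
            snapshots.append(current)
        current += SNAPSHOT_INTERVAL
    return snapshots


def recovery_point(snapshots: list[date], target: date) -> Optional[date]:
    """Most recent retained snapshot taken on or before the target date, if any."""
    candidates = [s for s in snapshots if s <= target]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    snaps = retained_snapshots(date(2024, 1, 1), date(2024, 12, 31))
    print(len(snaps), snaps[0], snaps[-1])
    print(recovery_point(snaps, date(2024, 11, 20)))
```

With weekly snapshots starting on 2024-01-01, the retained set on 2024-12-31 runs from 2024-09-23 to 2024-12-30 (15 snapshots), and a recovery target of 2024-11-20 resolves to the 2024-11-18 snapshot.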
Recovery
InfluxDB Cloud uses the following out-of-band backups stored in object storage to recover data:
- Message queue backup: line protocol from inbound write requests within the last 96 hours
- Compaction backup: TSM files
- TSM snapshots: weekly snapshots of TSM files in object storage
The Recovery Point Objective (RPO) is any accepted write: no write that InfluxDB Cloud has accepted should be lost. The Recovery Time Objective (RTO) is harder to predict definitively because potential failure modes vary. Most common failure modes can be resolved within minutes or hours, but critical failure modes may take longer. For example, if we need to rebuild all data from the TSM snapshots and the message queue backup, it could take 24 hours or longer.
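Rebuilding from backups amounts to restoring a snapshot and then replaying backed-up line protocol on top of it. The sketch below shows that idea using InfluxDB's duplicate-point behavior, where a point with the same series and timestamp merges its field set into the existing point and newer values win. It is a simplified, hypothetical model: field values stay as raw strings, tags are assumed to be in a consistent order, the parser handles only unescaped lines with timestamps, and it does not model how InfluxDB Cloud actually sequences or combines its backup sources.

```python
# Hypothetical data model: (series key, timestamp) -> {field name: raw field value}
PointKey = tuple[str, int]
Fields = dict[str, str]


def parse_line(line: str) -> tuple[PointKey, Fields]:
    """Parse simplified line protocol: no escaping, no quoted spaces, timestamp required."""
    series, field_set, timestamp = line.split(" ")
    fields = dict(pair.split("=", 1) for pair in field_set.split(","))
    return (series, int(timestamp)), fields


def rebuild(snapshot: dict[PointKey, Fields], replay: list[str]) -> dict[PointKey, Fields]:
    """Start from a restored snapshot, then replay backed-up writes in order."""
    data = {key: dict(fields) for key, fields in snapshot.items()}  # leave the snapshot untouched
    for line in replay:
        key, fields = parse_line(line)
        # Duplicate point (same series and timestamp): merge field sets, later values win.
        data.setdefault(key, {}).update(fields)
    return data


if __name__ == "__main__":
    snapshot = {("cpu,host=a", 1): {"usage": "0.5"}}
    replay = ["cpu,host=a usage=0.7,idle=0.3 1", "cpu,host=b usage=0.1 2"]
    print(rebuild(snapshot, replay))
```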
Data verification
InfluxDB Cloud has two data verification services running at all times:
- Entropy detection: ensures that replicated data is consistent
- Data verification: verifies that data written to InfluxDB is readable
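Entropy detection can be pictured as an anti-entropy comparison: compute a digest of each series on two replicas and flag the series whose digests differ. The following is a minimal sketch of that general technique, not the actual InfluxDB Cloud service; the replica model, the per-series SHA-256 digest, and the function names are assumptions.

```python
import hashlib

# Hypothetical replica model: series key -> list of line protocol points.
Replica = dict[str, list[str]]


def series_digest(points: list[str]) -> str:
    # Order-independent digest: sort points so replicas that store the same
    # points in a different order still match.
    h = hashlib.sha256()
    for point in sorted(points):
        h.update(point.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def detect_entropy(replica_a: Replica, replica_b: Replica) -> set[str]:
    """Return the series keys whose data differs between two replicas."""
    mismatched = set()
    for key in replica_a.keys() | replica_b.keys():
        if series_digest(replica_a.get(key, [])) != series_digest(replica_b.get(key, [])):
            mismatched.add(key)
    return mismatched


if __name__ == "__main__":
    a = {"cpu,host=a": ["cpu,host=a usage=1 1", "cpu,host=a usage=2 2"],
         "cpu,host=b": ["cpu,host=b usage=5 1"]}
    b = {"cpu,host=a": ["cpu,host=a usage=2 2", "cpu,host=a usage=1 1"],
         "cpu,host=b": []}
    print(detect_entropy(a, b))  # only cpu,host=b differs
```

Comparing digests rather than raw points keeps the comparison cheap; only series that mismatch need a full point-by-point repair.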
InfluxDB Cloud status
InfluxDB Cloud regions and underlying services are monitored at all times. For information about the current status of InfluxDB Cloud, see the InfluxDB Cloud status page.