Data ingest lifecycle best practices

Data ingested into InfluxDB must conform to the retention period of the database in which it is stored. Points with timestamps outside of the retention period are no longer queryable, but references to them may remain in Object storage or the Catalog, which increases operational overhead and cost. To limit both, manage the lifecycle of ingested data.

Use the following best practices to manage the lifecycle of data in your InfluxDB cluster:

Use appropriate retention periods
Tune garbage collection

Use appropriate retention periods

When creating or updating a database, use a retention period that is appropriate for your requirements. Storing data longer than is required adds unnecessary operational cost to your InfluxDB cluster.
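As a rough illustration of what a retention period means for individual points, the following Python sketch checks whether a timestamp still falls inside a retention window. The function name `is_within_retention` and the inclusive boundary are assumptions made for this example; they are not part of InfluxDB.

```python
from datetime import datetime, timedelta, timezone


def is_within_retention(point_time, retention, now=None):
    """Return True if point_time falls inside the window [now - retention, now].

    Hypothetical helper: the inclusive lower bound is an assumption for
    illustration, not a statement about InfluxDB's exact boundary handling.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return point_time >= now - retention


# Fixed "now" so the example is reproducible.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
retention = timedelta(days=30)

# A point from 16 days ago is inside a 30-day retention period.
print(is_within_retention(datetime(2024, 1, 15, tzinfo=timezone.utc), retention, now))
# A point from roughly two months ago is outside it.
print(is_within_retention(datetime(2023, 12, 1, tzinfo=timezone.utc), retention, now))
```

Running the same check with a longer `retention` value shows how many more points stay queryable, and therefore stored, which is the cost trade-off described above.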

Tune garbage collection

Once data falls outside of a database’s retention period, the garbage collection service can remove all artifacts associated with the data from the Catalog and Object storage. Tune the garbage collector cutoff periods to ensure that data is removed in a timely manner.

Use the following environment variables to tune the garbage collector:

INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: the age at which Parquet files not referenced in the Catalog become eligible for deletion from Object storage. The default is 30d.
INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: how long to retain rows in the Catalog that reference Parquet files marked for deletion. The default is 30d.

These values tune how aggressive the garbage collector can be. A shorter duration value means that files can be removed at a faster pace.
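To make the effect of a shorter duration concrete, the sketch below computes the earliest time at which a Catalog row that references a Parquet file marked for deletion stops being retained, for the default 30d and for a 6h setting. The function name `catalog_row_eligible_at` is hypothetical, and the model is a simplification: it assumes eligibility is the mark time plus the cutoff and says nothing about when the garbage collector actually runs.

```python
from datetime import datetime, timedelta, timezone


def catalog_row_eligible_at(marked_for_deletion_at, parquetfile_cutoff):
    """Earliest time the Catalog row is no longer retained.

    Simplified model (assumption): mark time plus the Parquet file cutoff.
    Actual removal happens some time after this, when the collector runs.
    """
    return marked_for_deletion_at + parquetfile_cutoff


# Hypothetical time at which a Parquet file was marked for deletion.
marked = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

for label, cutoff in [("30d", timedelta(days=30)), ("6h", timedelta(hours=6))]:
    eligible = catalog_row_eligible_at(marked, cutoff)
    print(f"cutoff={label}: eligible at {eligible.isoformat()}")
```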

To ensure a grace period before files and references are removed, the minimum value for both the garbage collector (GC) object store cutoff and the Parquet file cutoff is three hours (3h).
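Before applying new values, it can help to check them against the three-hour minimum. The following Python sketch does that for the two environment variables named above. The helpers `parse_duration`, `validate_cutoff`, and `check_gc_settings` are hypothetical, and the parser is an assumption that only accepts simple values such as `3h`, `6h`, or `30d`; InfluxDB itself may accept other duration forms.

```python
import re
from datetime import timedelta

# Simple "<integer><unit>" durations only (an assumption for this sketch).
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

# Minimum cutoff stated in this document.
MIN_CUTOFF = timedelta(hours=3)

# Variable names and defaults as listed in this document.
GC_DEFAULTS = {
    "INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF": "30d",
    "INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF": "30d",
}


def parse_duration(text):
    """Parse a value such as '6h' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def validate_cutoff(text):
    """Return the parsed cutoff, or raise if it is below the 3h minimum."""
    cutoff = parse_duration(text)
    if cutoff < MIN_CUTOFF:
        raise ValueError(f"cutoff {text!r} is below the 3h minimum")
    return cutoff


def check_gc_settings(env):
    """Validate both GC cutoffs from a mapping, falling back to the defaults."""
    return {
        name: validate_cutoff(env.get(name, default))
        for name, default in GC_DEFAULTS.items()
    }


# Example: override one variable and leave the other at its default.
settings = check_gc_settings({"INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF": "6h"})
for name, cutoff in settings.items():
    print(name, cutoff)
```

Passing `os.environ` as the mapping would check the values of the current process; a value such as `1h` raises `ValueError` in this sketch.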

We recommend setting these options to a value aligned to your organization’s backup and recovery strategy. For example, a value of 6h (6 hours) would be appropriate for running a lean Catalog that only maintains references to recent data and does not require backups.

Use case examples

Use the following scenarios as a guide for different use cases:

Leading edge data with no backups

Custom backup window with object storage versioning

Custom backup window without object storage versioning
