Use pandas to analyze data

Use pandas, the Python data analysis library, to process, analyze, and visualize data stored in an InfluxDB Clustered database.

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas documentation

Install prerequisites
Install pandas
Use PyArrow to convert query results to pandas
Use pandas to analyze data
- View data information and statistics
- Downsample time series

Install prerequisites

The examples in this guide assume that you use a Python virtual environment and influxdb3-python, the InfluxDB v3 Python client library. For more information, see how to get started using Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library that provides Python bindings for Apache Arrow.

Install pandas

To use pandas, you need to install and import the pandas library.

In your terminal, use pip to install pandas in your active Python virtual environment:

pip install pandas
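To confirm that the installation succeeded, a quick check like the following can be run with the same interpreter. It is only a sanity check, not part of the query workflow: it imports pandas and prints the installed version.

```python
# Confirm that pandas is importable in the active virtual environment.
import pandas

# Print the installed version; the exact value depends on your environment.
print(pandas.__version__)
```

If the import fails with ModuleNotFoundError, the virtual environment where pandas was installed is probably not the active one.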

Use PyArrow to convert query results to pandas

The following steps use Python, influxdb3-python, and pyarrow to query InfluxDB and stream Arrow data to a pandas DataFrame.

In your editor, copy and paste the following code to a new file, for example, pandas-example.py:

# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

print(dataframe)

Replace the following configuration values:
- DATABASE_NAME: the name of the database to query
- DATABASE_TOKEN: a database token with read permission on the specified database

In your terminal, use the Python interpreter to run the file:
```
python pandas-example.py
```

The example calls the following methods:

- InfluxDBClient3.query(): sends the query request and returns a pyarrow.Table that contains all the Arrow record batches from the response stream.
- pyarrow.Table.to_pandas(): creates a pandas.DataFrame from the data in the PyArrow Table.


Next, use pandas to analyze data.