A command-line interface (CLI) for managing Home Assistant sensor data, filling gaps in the Recorder's built-in management of statistics data. This tool helps with data exploration, analysis, modification, and migration tasks that aren't available in the Home Assistant UI.
- Storage Analysis: Find which sensors are using the most database space
- Data Cleanup: Remove or modify incorrect sensor readings
- Data Migration: Export data from one sensor and import to another or merge data between instances
- Value Filtering: Find readings outside normal ranges
- Offline Modifications: Copy database to a PC, modify data safely, and apply changes without typing SQL
- Missing Data Recovery: Fix gaps in energy or sensor data by adding missing records
- External Visualization and Analysis: Extract data for use with external tools (e.g. Excel)
- Status: Examine database structure, size, and record counts
- List: List all entities with storage metrics and time ranges
- Export: Export entity data with flexible filtering options
- Import: Import and modify sensor data with safety features
ALWAYS BACK UP YOUR HOME ASSISTANT DATABASE BEFORE MAKING MODIFICATIONS.
This tool can make permanent changes to your Home Assistant database. While it includes safety features like --dry-run, unintended modifications could affect your Home Assistant installation or cause data loss. It's strongly recommended to:
- Create a complete backup of your Home Assistant instance
- Work on a copy of your database when possible
- Use the `--dry-run` option to preview changes before applying them
```
# Clone the repository
git clone https://github.com/naevtamarkus/homeassistant-statistics-cli.git
cd homeassistant-statistics-cli

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install click sqlalchemy tabulate
```

By default, the tool looks for a Home Assistant database at `./home-assistant_v2.db`. You can specify a different database location using the `--db-url` option or by setting the `HA_DB_URL` environment variable:
```
# Using --db-url option
python cli_ha_statistics.py --db-url="sqlite:///path/to/home-assistant_v2.db" status

# Using environment variable
export HA_DB_URL="sqlite:///path/to/home-assistant_v2.db"
python cli_ha_statistics.py status
```

Get an overview of your database tables, including row counts and size estimates:

```
python cli_ha_statistics.py status
```

Example output:
```
Database type: sqlite, schema 50
Time: 2025-05-17 12:34:56 UTC
----------------------------------------------------------------------
Table                    Rows  Cols  Records  % total  ~ MB
events                  52394     6   314364    34.5%   2.4
statistics             120840     9  1087560    25.7%   8.3
states                  45983     8   367864    12.2%   2.8
statistics_short_term   28475     9   256275     7.4%   2.0
...
----------------------------------------------------------------------
TOTAL RECORDS: 3,125,489
TOTAL SIZE: 23.80 MB
```
List entities in the database with stats about record counts and storage:
```
python cli_ha_statistics.py list [--sort COLUMN] [--reverse] [--csv] [--after DATE] [--before DATE]
```

Options:

- `--sort`: Sort by column (`count`, `first`, `last`, or `kb`)
- `--reverse`: Reverse sort order
- `--csv`: Output in CSV format
- `--after`: Only include records after this date
- `--before`: Only include records before this date
Example output:
```
ID  Entity                          Count  First                Last                 ~ KB   Unit
2   sensor.temperature_living_room  86400  2025-01-01 00:00:00  2025-05-17 12:30:00  775.2  °C
3   sensor.humidity_bathroom        84960  2025-01-01 00:00:00  2025-05-17 12:30:00  764.6  %
7   sensor.power_consumption        43200  2025-03-01 00:00:00  2025-05-17 12:30:00  388.8  W
...
```
Export records for specific entities with filtering options:
```
python cli_ha_statistics.py export ENTITY [ENTITY...] [--above VALUE] [--below VALUE] [--after DATE] [--before DATE] > export.csv
```

Options:

- `--above VALUE`: Only include records with values strictly above threshold
- `--below VALUE`: Only include records with values strictly below threshold
- `--after DATE`: Only include records after this date
- `--before DATE`: Only include records before this date
Example:
```
# Export temperature readings above 25°C from April 2025
python cli_ha_statistics.py export sensor.temperature_living_room --above 25 --after "2025-04-01" > high_temps.csv
```

Import (and modify) records from a CSV file:

```
python cli_ha_statistics.py import file.csv [--dry-run]
```

Options:

- `--dry-run`: Preview changes without modifying the database
The import function performs an "upsert" operation - it updates existing records or inserts new ones, and can delete records with matching IDs when appropriate fields are provided.
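The decision between update, insert, and delete can be sketched as follows. This is only an illustration of the upsert rules described above, not the tool's actual code; `classify_row` and the sample data are hypothetical.

```python
import csv
import io

def classify_row(row):
    """Hypothetical helper illustrating upsert semantics: id + values -> update,
    no id -> insert, id with no value fields -> delete."""
    value_fields = ("mean", "min", "max", "last_reset", "last_reset_ts", "state", "sum")
    has_values = any(row.get(f) for f in value_fields)
    if row.get("id") and not has_values:
        return "delete"   # id given, no values: remove the record
    if row.get("id"):
        return "update"   # id given with values: modify the existing record
    return "insert"       # no id: create a new record

sample = """table,id,metadata_id,start_ts,mean,min,max,state,sum
statistics,1240825,342,1747234800.0,21.5,21.0,22.0,,
statistics,,342,1747238400.0,22.0,21.5,22.5,,
statistics,1240900,342,1747242000.0,,,,,
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print([classify_row(r) for r in rows])  # ['update', 'insert', 'delete']
```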
Find which sensors use the most storage:

```
python cli_ha_statistics.py list --sort kb
```

Find readings outside the expected range:

```
python cli_ha_statistics.py export sensor.temperature_living_room --above 30 --below 10 > abnormal_temps.csv
```

To clean up bad data:

1. Export the data you need to modify:

   ```
   python cli_ha_statistics.py export sensor.temperature_living_room --after "2025-05-01" > temp.csv
   ```

2. Edit the CSV file to correct values

3. Preview changes without modifying the database:

   ```
   python cli_ha_statistics.py import temp.csv --dry-run
   ```

4. Apply the changes:

   ```
   python cli_ha_statistics.py import temp.csv
   ```
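The "edit the CSV to correct values" step can also be scripted instead of done by hand. A minimal sketch, assuming a temperature sensor whose plausible range is -40 to 50; `clamp_readings` and the thresholds are examples, not part of the tool:

```python
import csv
import io

def clamp_readings(rows, lo=-40.0, hi=50.0):
    """Clamp mean/min/max into a plausible range (hypothetical helper)."""
    for row in rows:
        for col in ("mean", "min", "max"):
            if row.get(col):
                v = float(row[col])
                row[col] = str(min(max(v, lo), hi))
        yield row

sample = """table,id,metadata_id,start_ts,mean,min,max
statistics,1,342,1747234800.0,21.5,21.0,22.0
statistics,2,342,1747238400.0,85.0,84.0,86.0
"""
fixed = list(clamp_readings(csv.DictReader(io.StringIO(sample))))
print(fixed[1]["mean"])  # 50.0 (the 85.0 outlier was clamped)
```

Importing the rewritten CSV then updates only the rows whose values changed, since each row keeps its original `id`.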
To fill gaps in energy or sensor data:

1. Export a time range containing the gap:

   ```
   python cli_ha_statistics.py export sensor.energy_consumption --after "2025-04-01" --before "2025-04-30" > energy.csv
   ```

2. Beware that, depending on the time range, your CSV file may contain entries for both the `statistics` and `statistics_short_term` tables. You might want to ignore or remove the `statistics_short_term` rows if you're only interested in long-term statistics.

3. In Excel/spreadsheet software:
   - Identify missing time periods
   - Add new rows without IDs (leave the ID field empty)
   - Set appropriate timestamps and values
   - Save the CSV

4. Preview and import the fixed data:

   ```
   python cli_ha_statistics.py import energy.csv --dry-run
   python cli_ha_statistics.py import energy.csv
   ```
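The spreadsheet work of identifying missing periods and adding rows can be scripted. A sketch that detects missing hourly rows and generates insert rows (empty `id`), assuming hourly `statistics` data and carrying the previous `sum` forward; `fill_hourly_gaps` and the sample values are hypothetical:

```python
import csv
import io

HOUR = 3600.0

def fill_hourly_gaps(rows):
    """Generate insert rows (empty id) for missing hours (hypothetical helper)."""
    rows = sorted(rows, key=lambda r: float(r["start_ts"]))
    out = [rows[0]]
    for prev, cur in zip(rows, rows[1:]):
        t = float(prev["start_ts"]) + HOUR
        while t < float(cur["start_ts"]):
            out.append({"table": "statistics",
                        "id": "",                         # empty id -> treated as insert
                        "metadata_id": prev["metadata_id"],
                        "start_ts": str(t),
                        "created_ts": str(t),
                        "sum": prev["sum"]})              # carry the last known sum forward
            t += HOUR
        out.append(cur)
    return out

sample = """table,id,metadata_id,start_ts,created_ts,sum
statistics,1,500,3600.0,3600.0,10.0
statistics,2,500,14400.0,14400.0,13.0
"""
filled = fill_hourly_gaps(list(csv.DictReader(io.StringIO(sample))))
print(len(filled))  # 4: two original rows plus two generated hours
```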
For example, when you replace a sensor with a new one and want to move the old sensor's history into the new one:
1. Export data from the old sensor:

   ```
   python cli_ha_statistics.py export sensor.old_temperature > old_sensor.csv
   ```

2. In a text editor or spreadsheet, edit the `old_sensor.csv` file:
   - Remove the rows you don't want to migrate to the new sensor. This might include all data from the `statistics_short_term` table.
   - Change the `metadata_id` of all entries of the old sensor to match that of the new sensor. You can see the ID of the new sensor with the `list` command.

3. Preview and import the modified data:

   ```
   python cli_ha_statistics.py import old_sensor.csv --dry-run
   python cli_ha_statistics.py import old_sensor.csv
   ```
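The editing step can be scripted as well. A sketch, assuming the new sensor's `metadata_id` is 342 (look up the real value with the `list` command); `migrate` is a hypothetical helper:

```python
import csv
import io

NEW_METADATA_ID = "342"  # example value; find the real one with the `list` command

def migrate(rows, new_id=NEW_METADATA_ID):
    """Drop short-term rows and re-point the rest at the new sensor (sketch)."""
    for row in rows:
        if row["table"] == "statistics_short_term":
            continue                  # keep long-term statistics only
        row["metadata_id"] = new_id   # re-point the record at the new sensor
        yield row

sample = """table,id,metadata_id,start_ts
statistics,1,7,3600.0
statistics_short_term,2,7,3600.0
statistics,3,7,7200.0
"""
migrated = list(migrate(csv.DictReader(io.StringIO(sample))))
print(len(migrated), migrated[0]["metadata_id"])  # 2 342
```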
For safer modifications on a production system:
1. Copy your Home Assistant database to a separate computer:

   ```
   scp homeassistant@homeassistant:/config/home-assistant_v2.db ./
   ```

2. Work locally and eventually run the import with dry-run to generate SQL statements:

   ```
   python cli_ha_statistics.py import fixes.csv --dry-run > sql_fixes.sql
   ```

3. Review the SQL statements for correctness

4. Apply the SQL directly to your production database:

   ```
   # For SQLite, if the file has fewer than 1000 statements:
   sqlite3 /path/to/home-assistant_v2.db < sql_fixes.sql

   # If the file has more than 1000 statements, avoid DB locking issues by
   # splitting it first (split names the chunks sql_aa, sql_ab, ...):
   split -l 1000 sql_fixes.sql sql_
   sqlite3 /path/to/home-assistant_v2.db < sql_aa; sleep 3
   sqlite3 /path/to/home-assistant_v2.db < sql_ab; sleep 3
   ...

   # Or through Home Assistant's database shell
   ha database execute < sql_fixes.sql
   ```
For more details on the Home Assistant database schema, refer to the official Home Assistant documentation.
Home Assistant's database structure is key to effectively using this tool. The statistics-related tables store all sensor measurements for historical data and long-term trends.
Home Assistant might change the DB format from time to time. This documentation and the CLI itself are aware of the DB schema up to a certain version; if the schema has changed in the meantime, a warning message will be displayed.
The statistics data is primarily stored in three tables:
- `statistics_meta`: Contains metadata about each entity
  - `id`: The primary key (used as `metadata_id` in other tables)
  - `statistic_id`: The entity ID (e.g., `sensor.temperature_living_room`)
  - `unit_of_measurement`: The unit used for readings (e.g., °C, kWh)
  - `source`: The source of the statistics (usually "recorder")
- `statistics`: Contains hourly aggregated data for long-term storage
  - `id`: The primary key for each record
  - `metadata_id`: References the entity in `statistics_meta`
  - `start_ts`: Unix timestamp for the start of the measurement period
  - `created_ts`: When the record was created
  - `mean`, `min`, `max`: Statistical values for the period
  - `sum`: Cumulative value (commonly used for energy, water consumption)
  - `state`: Last known state value
  - Other fields related to measurement characteristics
- `statistics_short_term`: Contains 5-minute aggregated data for recent history
  - Same structure as the `statistics` table
  - Data typically gets purged after a configurable retention period
The database uses Unix timestamps (seconds since January 1, 1970):
- `start_ts`: Beginning of the measurement period
- `created_ts`: When the record was created (often slightly later than `start_ts`)
- `last_reset_ts`: For accumulating sensors (like energy), when the counter was last reset
When exporting data to CSV format, these timestamps are converted to human-readable dates in the format YYYY-MM-DD HH:MM:SS and stored in the third column, `date`. This field is provided only for human convenience and is ignored during import.
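For reference, the conversion can be reproduced with Python's standard library. This is a sketch of the equivalent logic; the tool's own implementation may differ.

```python
from datetime import datetime, timezone

def ts_to_date(ts):
    """Unix timestamp -> human-readable UTC date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

def date_to_ts(s):
    """Human-readable UTC date string -> Unix timestamp."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc).timestamp()

print(ts_to_date(date_to_ts("2025-05-17 12:00:00")))  # round-trips unchanged
```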
When you export data, each row contains:
```
table,entity,date,id,metadata_id,created_ts,start_ts,mean,min,max,last_reset,last_reset_ts,state,sum
statistics,sensor.temperature_living_room,2025-05-17 12:00:00,1240825,342,1747238410.2970626,1747234800.0,21.5,21.0,22.0,,,,
[...]
```
Field definitions:
- `table`: Which table the record belongs to (`statistics` or `statistics_short_term`)
- `entity`: The entity ID (for information only, not used during import)
- `date`: Human-readable version of `start_ts` (for information only, not used during import)
- `id`: Database primary key (required for updates and deletes)
- `metadata_id`: References the entity definition in `statistics_meta`. This field is always required
- `created_ts`: When the record was created (Unix timestamp)
- `start_ts`: Start of the measurement period (Unix timestamp)
- `mean`: Average value during the period
- `min`: Minimum value during the period
- `max`: Maximum value during the period
- `last_reset`: Value at the last counter reset (for accumulating sensors)
- `last_reset_ts`: When the counter was last reset
- `state`: Last known state
- `sum`: Cumulative value (for accumulating sensors)
When importing data:
1. For updates (modifying existing records):
   - `table`: Required to identify which table to update
   - `id`: Required to identify which record to update
   - Value fields you want to modify (`mean`, `min`, `max`, `state`, `sum`, etc.)

2. For inserts (adding new records):
   - `table`: Required
   - `metadata_id`: Required to identify which entity the data belongs to
   - `start_ts`: Required to specify when the measurement occurred
   - `created_ts`: Typically set to the same value as `start_ts` if not specified
   - Value fields for the new record

3. For deletes (removing records):
   - `table`: Required
   - `id`: Required to identify which record to delete
   - No value fields should be included
Different types of sensors typically use different fields:
- Regular sensors (temperature, humidity, etc.):
  - Use the `mean`, `min`, and `max` fields
  - `state` is sometimes populated
  - `sum` is usually empty
- Accumulating sensors (energy, water, etc.):
  - Use `sum` as the primary value
  - May have `last_reset` and `last_reset_ts` populated
  - `mean`, `min`, and `max` might all be the same value
- State sensors (switches, modes):
  - Use the `state` field
  - Other fields might be empty or populated with the same value
Understanding these fields helps you correctly modify data without introducing inconsistencies that might affect Home Assistant's functionality.
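One common inconsistency when merging accumulating-sensor data is a `sum` column that drops at the splice point, which Home Assistant can misread as a counter reset. A sketch of offsetting appended rows so the series stays monotonic; `splice_sums` is a hypothetical helper, not part of the tool.

```python
def splice_sums(existing_last_sum, new_rows):
    """Shift the `sum` of appended rows so the series continues seamlessly.
    `new_rows` is a list of (start_ts, sum) pairs starting from their own zero."""
    if not new_rows:
        return []
    offset = existing_last_sum - new_rows[0][1]
    return [(ts, s + offset) for ts, s in new_rows]

# Example: the DB ends at sum=120.0, but the imported CSV restarts its counter at 0.5
print(splice_sums(120.0, [(7200.0, 0.5), (10800.0, 1.5)]))  # [(7200.0, 120.0), (10800.0, 121.0)]
```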
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the project
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Home Assistant for the amazing home automation platform
- SQLAlchemy for the database toolkit
- Click for the command-line interface
- Tabulate for pretty-printing tabular data