This project continuously collects real-time vehicle position data from the City of Albuquerque's public transit API. The data is stored in a PostgreSQL database with the PostGIS extension for powerful geospatial querying. The entire application is containerized using Docker and Docker Compose for easy setup and deployment.
- Continuous Data Collection: A Python script runs in a loop to fetch data every 30 seconds.
- Geospatial Database: Uses PostgreSQL + PostGIS to store location data efficiently.
- Dockerized Environment: Fully containerized for one-command setup and consistent runs.
- Data Export: Includes a utility script to export the collected data to a Parquet file with custom SQL queries.
- Robust and Resilient: Designed to handle network errors and shut down gracefully.
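The resilient collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual `collect_and_load.py`: the feed URL, the retry policy, and the parsing are placeholders, and the database insert is elided.

```python
# Minimal sketch of a resilient 30-second collection loop (illustrative only;
# the endpoint URL, parsing, and DB insert are placeholders, not the real script).
import json
import signal
import time
import urllib.request

POLL_INTERVAL_S = 30  # fetch cadence stated in the README
FEED_URL = "https://example.com/abq/vehicle_positions"  # placeholder URL

_running = True

def _stop(signum, frame):
    """Let SIGTERM (e.g. from `docker compose down`) end the loop cleanly."""
    global _running
    _running = False

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff between retries after a network error."""
    return min(cap, base * (2 ** attempt))

def fetch_snapshots(url: str) -> list:
    """Fetch and decode one batch of vehicle positions."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def main() -> None:
    signal.signal(signal.SIGTERM, _stop)
    attempt = 0
    while _running:
        try:
            rows = fetch_snapshots(FEED_URL)
            # ... insert `rows` into the PostGIS table here ...
            attempt = 0
            time.sleep(POLL_INTERVAL_S)
        except OSError:
            # Network hiccup: wait with backoff, then retry instead of crashing.
            time.sleep(backoff_delay(attempt))
            attempt += 1

if __name__ == "__main__":
    main()
```

The graceful-shutdown handler matters in Docker: without it, the container would be killed mid-insert when stopped.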
```
abq-transit-project/
├── docker-compose.yml        # Orchestrates all Docker containers
├── db_init/
│   └── init.sql              # Initializes the database table on first run
└── python_app/
    ├── Dockerfile            # Defines the Python application container
    ├── requirements.txt      # Python dependencies
    ├── collect_and_load.py   # Main script for continuous data collection
    └── export_to_parquet.py  # Utility script to export data
```
- Docker
- Docker Compose
1. Clone or set up the project files according to the structure above.

2. Open a terminal in the root `abq-transit-project` directory.

3. Build and run the services using Docker Compose. The `-d` flag runs the containers in the background (detached mode).

   ```bash
   docker compose up --build -d
   ```
The first time you run this, Docker will build the Python image, create a persistent volume for the database, and run the `init.sql` script to create the `vehicle_snapshots` table. The `collect_and_load.py` script will then start running and collecting data.
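One row of `vehicle_snapshots` can be pictured as the record below. The four named columns are taken from the example query in this README; the latitude/longitude fields are assumptions, since the actual `init.sql` schema isn't reproduced here.

```python
# Illustrative shape of one vehicle_snapshots row. The first four fields appear
# in this README's example query; latitude/longitude are assumed columns that
# would back the PostGIS geometry.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class VehicleSnapshot:
    vehicle_id: str
    route_short_name: str
    speed_mph: float
    timestamp_collected: datetime
    latitude: float   # assumption, not confirmed by the README
    longitude: float  # assumption, not confirmed by the README

# Example record (coordinates are downtown Albuquerque, for illustration)
snap = VehicleSnapshot(
    vehicle_id="1234",
    route_short_name="66",
    speed_mph=28.5,
    timestamp_collected=datetime.now(timezone.utc),
    latitude=35.0844,
    longitude=-106.6504,
)
```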
You can connect directly to the PostgreSQL database to run custom queries.
1. Open the `psql` shell:

   ```bash
   docker exec -it postgis_db psql -U myuser -d abq_transit
   ```

2. Run a query. For example, to find the 5 most recently tracked vehicles:

   ```sql
   SELECT vehicle_id, route_short_name, speed_mph, timestamp_collected
   FROM vehicle_snapshots
   ORDER BY timestamp_collected DESC
   LIMIT 5;
   ```

3. Exit by typing `\q`.
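If you run the "most recent vehicles" query from your own scripts, a small helper keeps the limit out of hand-built strings. This is a hypothetical convenience function, not part of the project's code.

```python
# Hypothetical helper that builds the "N most recent vehicles" query shown
# above, so callers can vary the limit without string-concatenation mistakes.
def recent_vehicles_query(limit: int) -> str:
    if limit < 1:
        raise ValueError("limit must be >= 1")
    return (
        "SELECT vehicle_id, route_short_name, speed_mph, timestamp_collected "
        "FROM vehicle_snapshots "
        "ORDER BY timestamp_collected DESC "
        f"LIMIT {int(limit)};"
    )
```

Because `limit` is validated and cast to `int`, the interpolation cannot inject arbitrary SQL; for user-supplied filter values you would still use parameterized queries instead.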
Use the `export_to_parquet.py` script to save data from the database to a `.parquet` file. The output file will appear in the `python_app` directory.
1. Run the export script. This command executes the script inside the running `python_app` container.

   - To export the entire database (default query):

     ```bash
     docker exec python_app python3 export_to_parquet.py
     ```

   - To export using a custom query, use the `--query` flag and wrap your SQL in quotes:

     ```bash
     docker exec python_app python3 export_to_parquet.py --query "SELECT * FROM vehicle_snapshots WHERE speed_mph > 60;"
     ```
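The script's command-line interface likely resembles the skeleton below. This is an assumption based only on the `--query` flag shown above; the default query string is a guess, and the actual Parquet writing (e.g. via pandas + pyarrow) is elided.

```python
# Hypothetical CLI skeleton mirroring export_to_parquet.py's --query flag.
# The default query and the Parquet-writing step are assumptions/elisions.
import argparse

DEFAULT_QUERY = "SELECT * FROM vehicle_snapshots;"  # assumed default

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Export database rows to a Parquet file"
    )
    parser.add_argument(
        "--query",
        default=DEFAULT_QUERY,
        help="SQL query selecting the rows to export",
    )
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # ... run args.query against PostgreSQL and write the .parquet file ...
```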
This is the most common error; it occurs when the database starts without running the initialization script. The database logs will contain a line that says `Skipping initialization`. It happens when a database volume from a previous failed run already exists. To fix it, you must perform a full reset.
1. Stop the containers:

   ```bash
   docker compose down
   ```

2. Delete the database volume. This is the critical step; it will not delete your code.

   ```bash
   docker volume rm abq-transit-project_postgres_data
   ```

3. Start fresh. This will force Docker to create a new database and run the `init.sql` script.

   ```bash
   docker compose up --build -d
   ```