This ensures complete reproducibility across Windows, macOS, and Linux.
- Create a folder called `data` and add `gps_cleaned.csv` inside it.
- Build and start the container: `docker-compose up --build`
- Open your browser and go to http://localhost:8888. You may need to enter a token, which is printed in the Docker logs.
- Select the `Python (gps-analytics)` kernel in Jupyter Lab to create a new notebook.
Your notebooks and source code are automatically synced thanks to mounted volumes.
Jupyter Lab is running inside the container, with PySpark and all dependencies pre-installed.
- First, run `spark_streaming.ipynb` to simulate streaming data processing.
- Open Grafana at http://localhost:3001 (user: `admin`, password: `admin`) to visualize the streaming data.
- Run `spark_batching.ipynb` to process batch data and generate the best route for a random set of GPS coordinates.
- In Grafana, you will then be able to visualize the route on the map.
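
To make the idea of a "best route" over a set of GPS coordinates more concrete, here is a minimal sketch of one simple approach: a greedy nearest-neighbour ordering using haversine distances. This is an illustration only and is not necessarily what `spark_batching.ipynb` does. The function names (`haversine_km`, `nearest_neighbour_route`, `route_length_km`) and the sample bounding box are hypothetical and do not come from the project or from `gps_cleaned.csv`.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_neighbour_route(points):
    """Greedy ordering: start at the first point, then always go to the closest unvisited one."""
    if not points:
        return []
    route = [points[0]]
    remaining = list(points[1:])
    while remaining:
        nxt = min(remaining, key=lambda p: haversine_km(route[-1], p))
        remaining.remove(nxt)
        route.append(nxt)
    return route


def route_length_km(route):
    """Total length of the route, summing consecutive legs."""
    return sum(haversine_km(p, q) for p, q in zip(route, route[1:]))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sample is repeatable
    # Hypothetical bounding box, NOT taken from gps_cleaned.csv.
    sample = [(rng.uniform(48.80, 48.90), rng.uniform(2.25, 2.42)) for _ in range(8)]
    route = nearest_neighbour_route(sample)
    print(f"{len(route)} stops, {route_length_km(route):.2f} km")
```

The greedy heuristic is fast and easy to follow, but it does not guarantee the shortest possible route; it is shown here only because it fits in a few lines.
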
Run `docker-compose up --build` in a terminal to rebuild the whole Docker image.
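
If something does not come up as expected, a quick preflight check can narrow down the cause. The sketch below assumes it is run from the project root on the host machine; it checks that `data/gps_cleaned.csv` exists and that something is listening on the two ports mentioned above (8888 for Jupyter Lab, 3001 for Grafana). The helper names `data_file_ready` and `port_open` are hypothetical and are not part of the project.

```python
import socket
from pathlib import Path

# Path taken from the setup steps; assumes the script runs from the project root.
DATA_FILE = Path("data") / "gps_cleaned.csv"

# Ports taken from the URLs in the setup steps.
SERVICES = {"Jupyter Lab": 8888, "Grafana": 3001}


def data_file_ready(path: Path = DATA_FILE) -> bool:
    """Return True if the input CSV exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(f"{DATA_FILE}: {'found' if data_file_ready() else 'missing or empty'}")
    for name, port in SERVICES.items():
        state = "reachable" if port_open(port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that a process is listening there; it does not confirm that Jupyter Lab or Grafana has finished starting.
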