Big Data GPS coordinates processing

The project runs entirely in Docker containers, which keeps the setup reproducible across Windows, macOS, and Linux.

Create a folder called data and put gps_cleaned.csv inside it.
Build and start the containers: docker-compose up --build
Open http://localhost:8888 in your browser. If you are asked for a token, copy it from the Docker logs, where it is printed at startup.
In Jupyter Lab, select the Python (gps-analytics) kernel when you create a new notebook.
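
The first and third steps above can be checked with a small helper script. This is only a sketch under stated assumptions: the README does not say where the data folder lives, so the script takes the project root as a parameter, and the token pattern assumes the usual Jupyter log format in which the URL ends in ?token=<hex string>. The names dataset_ready and extract_token are hypothetical and not part of this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed layout: <project_root>/data/gps_cleaned.csv (the README only names the folder and file).
DATA_FILE = Path("data") / "gps_cleaned.csv"


def dataset_ready(project_root: Path) -> bool:
    """Return True if data/gps_cleaned.csv exists under project_root and is not empty."""
    csv_path = project_root / DATA_FILE
    return csv_path.is_file() and csv_path.stat().st_size > 0


def extract_token(log_text: str) -> Optional[str]:
    """Return the first Jupyter token found in the given log text, or None.

    Assumes the token appears as a hexadecimal query parameter, e.g. ...?token=<hex>.
    """
    match = re.search(r"[?&]token=([0-9a-f]+)", log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Run from the folder that contains the data directory.
    print("dataset ready:", dataset_ready(Path.cwd()))
```

To use extract_token, pass it the text of the container logs (for example, the output of docker-compose logs saved to a file and read back in).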

Your notebooks and source code are automatically synced between the host and the container through mounted volumes.

Jupyter Lab is running inside the container, with PySpark and all dependencies pre-installed.

First, run "spark_streaming.ipynb" to simulate streaming data processing.
Open Grafana at http://localhost:3001 (user: admin, password: admin) to visualize the streaming data.
Run "spark_batching.ipynb" to process the batch data and generate the best route for a random set of GPS coordinates.
In Grafana, you can then view the route on the map.
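
The README does not describe how the batch notebook computes the best route. To illustrate the general idea, here is a minimal sketch that orders a set of GPS points with a greedy nearest-neighbour heuristic over haversine (great-circle) distances. This is an assumption for illustration only, not the method used in "spark_batching.ipynb", and the function names are hypothetical. A nearest-neighbour tour is generally not the shortest possible route.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_neighbour_route(points: List[Point]) -> List[Point]:
    """Order points greedily: start at the first point, then always visit the closest unvisited one."""
    if not points:
        return []
    route = [points[0]]
    remaining = list(points[1:])
    while remaining:
        current = route[-1]
        closest = min(remaining, key=lambda p: haversine_km(current, p))
        remaining.remove(closest)
        route.append(closest)
    return route


def route_length_km(route: List[Point]) -> float:
    """Total length of the route, summing consecutive leg distances."""
    return sum(haversine_km(p, q) for p, q in zip(route, route[1:]))
```

Calling nearest_neighbour_route on a list of (latitude, longitude) pairs returns the same points in visiting order, and route_length_km gives the total distance of that order.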

To rebuild the Docker image, run docker-compose up --build again in a terminal.
