Building a Data Pipeline with Apache Airflow to Load Data from Multiple Sources into a Data Warehouse
- Install Docker Desktop (see the download link on the Docker website).
- First, get the project source code by cloning the ETL-PIP repository:
git clone https://github.com/ETL-PIP/Pipline.git
- Navigate to the project directory:
cd Pipline
- Add execute permission to the entrypoint.sh script:
chmod +x ./script/entrypoint.sh
- Pull the images and start the containers with Docker Compose:
docker-compose up
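Once the containers are up, you can verify that everything started correctly (these are standard Docker Compose commands, not project-specific ones):
docker-compose ps
docker-compose logs -f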
- Create the .env file with the following variables:
MONGO_URI="mongodb+srv://admin:[email protected]/?retryWrites=true&w=majority"
MYSQL_HOST="mysql_container"          # the MySQL container name (ensure the Docker setup aligns with this)
MYSQL_USER="admin"                    # MySQL username
MYSQL_PASSWORD="your_mysql_password"  # MySQL password
MYSQL_DATABASE="mydatabase"           # target database name
DATA_DIR="/opt/airflow/dags/data/"
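The pipeline code can pick these values up at runtime. Below is a minimal sketch, assuming the python-dotenv package is installed; the project may read the variables differently (for example through Airflow connections).

```python
# Minimal sketch of reading the .env values; assumes python-dotenv is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

MONGO_URI = os.getenv("MONGO_URI")
MYSQL_HOST = os.getenv("MYSQL_HOST", "mysql_container")
MYSQL_USER = os.getenv("MYSQL_USER")
MYSQL_PASSWORD = os.getenv("MYSQL_PASSWORD")
MYSQL_DATABASE = os.getenv("MYSQL_DATABASE")
DATA_DIR = os.getenv("DATA_DIR", "/opt/airflow/dags/data/")
```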
Requirements:
- Docker
- Docker Compose
- Python 3.8+
- Apache Airflow
- MongoDB Atlas
- MySQL database
The pipeline logic is defined in the Airflow DAG file dags/pipeline_dag.py.
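The repository's DAG file is the source of truth; the following is only a minimal sketch of how such a DAG is typically structured. The task names, callables, and schedule are assumptions, not the project's actual code.

```python
# Sketch of a DAG that extracts from MongoDB and MySQL, then loads the warehouse.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_mongo(**context):
    # Placeholder: read documents from MongoDB Atlas using MONGO_URI.
    pass


def extract_from_mysql(**context):
    # Placeholder: read rows from the MySQL source using the MYSQL_* variables.
    pass


def load_to_warehouse(**context):
    # Placeholder: write the combined data into the warehouse tables.
    pass


with DAG(
    dag_id="pipeline_dag",          # assumed to match the file name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_mongo = PythonOperator(task_id="extract_mongo", python_callable=extract_from_mongo)
    extract_mysql = PythonOperator(task_id="extract_mysql", python_callable=extract_from_mysql)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    # Both extract tasks must finish before the load step runs.
    [extract_mongo, extract_mysql] >> load
```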
Use the Airflow dashboard (the web UI, typically served at http://localhost:8080) to monitor the pipeline runs.
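A DAG run can also be triggered from the command line inside the Airflow container; the service name webserver and the DAG id pipeline_dag below are assumptions based on the file name above:
docker-compose exec webserver airflow dags trigger pipeline_dag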
Once the DAG has run, the data from all sources is available in the data warehouse.
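To confirm that the load actually happened, a quick check against MySQL can help. This is a sketch only, assuming the mysql-connector-python package is installed and the MYSQL_* variables from .env are set in the environment.

```python
# Sketch: list the tables the pipeline created in the warehouse database.
import os

import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv("MYSQL_HOST"),
    user=os.getenv("MYSQL_USER"),
    password=os.getenv("MYSQL_PASSWORD"),
    database=os.getenv("MYSQL_DATABASE"),
)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")   # no table names are assumed here
print(cursor.fetchall())
cursor.close()
conn.close()
```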