
Pipline

Building a data pipeline with Apache Airflow to load data from multiple sources into a data warehouse.

[image: architecture]

You must have Docker installed to run the containers.

  • Install Docker Desktop.

Getting started

  • First, get the project's source code by cloning the ETL-PIP/Pipline repository:
git clone https://github.com/ETL-PIP/Pipline.git
  • Navigate to the project directory:
cd Pipline
  • Add execute permission to the entrypoint.sh script:
chmod +x script/entrypoint.sh
  • Pull the images and start the containers with Docker Compose:
docker-compose up
  • Create a .env file in the project root with the variables below.

# MongoDB connection URI
MONGO_URI="mongodb+srv://admin:[email protected]/?retryWrites=true&w=majority"

# MySQL configuration
MYSQL_HOST="mysql_container"         # The MySQL container name (ensure the Docker setup aligns with this)
MYSQL_USER="admin"                   # MySQL username
MYSQL_PASSWORD="your_mysql_password"
MYSQL_DATABASE="mydatabase"

# Directory for JSON/SQLite files
DATA_DIR="/opt/airflow/dags/data/"
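
For reference, the pipeline code can read these variables with python-dotenv and open the two connections roughly as sketched below. This is a minimal illustration, assuming pymongo and mysql-connector-python are the drivers in use; the helper names are not taken from the repository.

import os

from dotenv import load_dotenv      # python-dotenv
from pymongo import MongoClient     # MongoDB driver
import mysql.connector              # MySQL driver

load_dotenv()  # reads MONGO_URI, MYSQL_*, and DATA_DIR from the .env file

def get_mongo_client() -> MongoClient:
    # Connect to MongoDB Atlas with the URI from the .env file (illustrative helper).
    return MongoClient(os.getenv("MONGO_URI"))

def get_mysql_connection():
    # Connect to the MySQL container named in MYSQL_HOST (illustrative helper).
    return mysql.connector.connect(
        host=os.getenv("MYSQL_HOST", "mysql_container"),
        user=os.getenv("MYSQL_USER", "admin"),
        password=os.getenv("MYSQL_PASSWORD"),
        database=os.getenv("MYSQL_DATABASE", "mydatabase"),
    )

# Directory where the JSON/SQLite source files are mounted inside the Airflow container.
DATA_DIR = os.getenv("DATA_DIR", "/opt/airflow/dags/data/")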

Prerequisites

  • Docker
  • Docker Compose
  • Python 3.8+
  • Apache Airflow
  • MongoDB Atlas
  • MySQL database

Airflow DAG (dags/pipeline_dag.py)

[image: architecture]
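
The DAG file wires the extract and load steps together. As a rough sketch only (the task ids, callables, and schedule below are assumptions, not taken from the repository), a pipeline_dag.py of this shape would extract from MongoDB and the local JSON/SQLite files, then load the results into MySQL:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_mongo(**context):
    # Pull source documents from MongoDB Atlas (see MONGO_URI in .env).
    ...

def extract_from_files(**context):
    # Read the JSON/SQLite sources under DATA_DIR.
    ...

def load_to_warehouse(**context):
    # Insert the combined records into the MySQL data warehouse.
    ...

with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    mongo_task = PythonOperator(task_id="extract_from_mongo", python_callable=extract_from_mongo)
    files_task = PythonOperator(task_id="extract_from_files", python_callable=extract_from_files)
    load_task = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    # Both extract tasks must finish before the load task runs.
    [mongo_task, files_task] >> load_task

Once the containers are up, the DAG appears in the Airflow UI and can be triggered manually or left to its schedule.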

Airflow dashboard for monitoring the pipeline

[image: architecture]

Data from all sources loaded into the data warehouse

[image: architecture]
