The marketing team of a sales company manually updates a Google Sheet with important data every day. However, this data is siloed in Drive and not easily accessible to the ML and analytics teams. Manual handling introduces risks like inconsistency, delays, and human error, all of which slow down data-driven decision-making.
To address this, I built an automated ETL pipeline that bridges the gap between manual spreadsheet entry and cloud-based data platforms.
- A Python script that connects to the Google Drive API, leveraging the `gspread` library (see the sketch below)
- Scans for the target spreadsheet, opens it, and extracts data from the target sheet
- Formats the extracted data as a `pandas` DataFrame
- DataFrame columns are further extracted and formatted as `snake_case`
- Loads the transformed DataFrame to an Amazon S3 bucket via `awswrangler`
- Tasks run daily, fully automated by Airflow
- Timestamps: each ingestion run stores a uniquely timestamped copy of the data, enabling version control, historical tracking, and easy integration with time-based analytics
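For orientation, the snippet below is a minimal sketch of that extract-transform-load flow, assuming a service-account key file and placeholder sheet, bucket, and key names; it is not the repository's actual module.

```python
# Minimal ETL sketch: Google Sheet -> pandas -> Amazon S3 (placeholder names throughout)
from datetime import datetime, timezone

import awswrangler as wr
import gspread
import pandas as pd


def sheet_to_s3(sheet_name: str, bucket: str, creds_path: str) -> str:
    # Authenticate to the Google Drive/Sheets APIs with a service-account key file
    client = gspread.service_account(filename=creds_path)

    # Open the target spreadsheet and read every record from its first worksheet
    records = client.open(sheet_name).sheet1.get_all_records()
    df = pd.DataFrame(records)

    # Normalise column names to snake_case
    df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]

    # Write to S3 with a unique timestamp in the object key for version control
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
    path = f"s3://{bucket}/{sheet_name}_{stamp}.csv"
    wr.s3.to_csv(df=df, path=path, index=False)
    return path
```

Here `awswrangler` resolves AWS credentials from the environment or an explicit `boto3` session, which is where the Airflow variables configured later come in.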

Tech Stack: Python 3.12 | Apache Airflow | Docker | Boto3 | Google Drive API
Prerequisites:
- Running `docker desktop` or `docker engine` to containerize Airflow
- Python 3.12
- Google service account credentials
- Amazon S3 bucket and IAM user access key
- Clone this repository to your local machine:
```bash
git clone https://github.com/Taofeecoh/google_drive_API.git
```
- Change to the project's orchestration directory:
```bash
cd google_drive_API/orchestration
```
- Create a virtual environment:
```bash
python3 -m venv .venv
```
- Activate the environment:
```bash
source .venv/bin/activate   # on Linux/macOS
.venv\Scripts\activate      # on Windows
```
- Install requirements:
```bash
pip install apache-airflow==2.11.0
pip install -r requirements.txt
```
- Configure Airflow:
```bash
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env    # if you don't have an .env file
echo -e "AIRFLOW_UID=$(id -u)" >> .env   # append if you already have an .env file
```
- Spin up the Airflow Docker containers:
```bash
docker compose up airflow-init
docker compose up -d
```
- Log into the Airflow UI at `localhost:8080` and define the Airflow variables `AWS_KEY_ID`, `AWS_SECRET_KEY`, and `CREDENTIALS_AIRFLOW_GSERVICE` (see the DAG sketch after these steps).
- Trigger the DAG on the Airflow UI and confirm the object is stored in the destination: AWS S3.
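For reference, a daily DAG wired to those variables could look roughly like the sketch below. The DAG id, task name, and the assumption that `CREDENTIALS_AIRFLOW_GSERVICE` holds the service-account key JSON are illustrative; the actual DAG shipped in `dags/` may be organised differently.

```python
# Illustrative daily DAG; ids and helper names are assumptions, not the repo's actual code
from datetime import datetime

import boto3
from airflow.decorators import dag, task
from airflow.models import Variable


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def google_drive_to_s3():

    @task
    def extract_transform_load():
        # AWS credentials come from the Airflow variables defined in the UI
        session = boto3.Session(
            aws_access_key_id=Variable.get("AWS_KEY_ID"),
            aws_secret_access_key=Variable.get("AWS_SECRET_KEY"),
        )
        # Google service-account credentials, here assumed to be stored as JSON
        gservice_creds = Variable.get("CREDENTIALS_AIRFLOW_GSERVICE", deserialize_json=True)
        # ...build the DataFrame as in the earlier sketch, then write it with
        # awswrangler, e.g. wr.s3.to_csv(df=df, path=..., boto3_session=session)

    extract_transform_load()


google_drive_to_s3()
```

Once a run succeeds, a new timestamped object should appear in the destination S3 bucket.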
Contributions are welcome! Please fork the repository and submit a pull request with your enhancements.
