This repository contains Bash scripts that automate data ETL and file management tasks. Each script demonstrates shell scripting techniques for automation, data organization, and reproducible workflows.
.
├── scripts/              # Bash scripts
├── data/
│   ├── raw/              # Downloaded raw files
│   ├── transformed/      # After field transformation
│   ├── gold/             # After filtering
│   └── json_and_CSV/     # JSON and CSV files ready for import
├── logs/
├── .env                  # Environment file
└── README.md
A lightweight ETL (Extract, Transform, Load) process implemented entirely in Bash.
Workflow
Extract - Downloads a dataset from a given web URL.
Transform - Performs transformations (e.g., column renaming, filtering, cleanup).
Load - Saves intermediate and final data into dedicated directories representing transformation layers.
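The three stages above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names (`extract`, `transform`, `filter_rows`), the example URL, the header-normalization rule, and the "first column must be non-empty" filter are all assumptions chosen for the example. Only the directory layout (`data/raw`, `data/transformed`, `data/gold`) comes from the tree above.

```bash
#!/usr/bin/env bash
# Minimal ETL sketch. Function names, URL, and transformation rules are
# illustrative assumptions; only the directory names come from this README.

# Extract: download the dataset to the given path (e.g. under data/raw/).
extract() {
    local url="$1" out="$2"
    mkdir -p "$(dirname "$out")"
    curl -fsSL "$url" -o "$out"
}

# Transform: normalize the header row (spaces to underscores, lowercase).
transform() {
    local in="$1" out="$2"
    mkdir -p "$(dirname "$out")"
    awk 'NR == 1 { gsub(/ /, "_"); print tolower($0); next } { print }' "$in" > "$out"
}

# Filter: keep the header plus rows whose first column is non-empty.
filter_rows() {
    local in="$1" out="$2"
    mkdir -p "$(dirname "$out")"
    awk -F',' 'NR == 1 || $1 != ""' "$in" > "$out"
}

# Example run (hypothetical URL and file names):
# extract "https://example.com/data.csv" data/raw/data.csv
# transform data/raw/data.csv data/transformed/data.csv
# filter_rows data/transformed/data.csv data/gold/data.csv
```

Writing each stage to its own directory keeps every intermediate result on disk, so a failed run can be resumed or inspected from the last completed layer.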
A utility script that scans a parent directory for data files and organizes them by type.
Features
Detects .csv and .json files.
Moves each file into a dedicated folder for its type (CSV or JSON).
Skips a file type if no matching files are found.
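One possible shape for such an organizer is sketched below. The function name `organize_files` and the target folder names (`csv_files`, `json_files`) are hypothetical, and the sketch only looks at the top level of the parent directory; the actual script may name folders differently or recurse.

```bash
#!/usr/bin/env bash
# Sketch of the organizer. Folder names (csv_files, json_files) are
# assumptions, not names taken from the repository.

organize_files() {
    local parent="$1" ext target f
    for ext in csv json; do
        target="$parent/${ext}_files"
        # Collect regular files with this extension; an unmatched glob
        # stays literal and fails the -f test, so the list stays empty.
        local files=()
        for f in "$parent"/*."$ext"; do
            [ -f "$f" ] && files+=("$f")
        done
        if [ "${#files[@]}" -eq 0 ]; then
            echo "No .$ext files found in $parent, skipping."
            continue
        fi
        mkdir -p "$target"
        mv -- "${files[@]}" "$target"/
    done
}

# Example: organize_files ./data
```

Creating the target folder only after at least one match is found avoids leaving empty directories behind when a type is skipped.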
This script ingests multiple CSV files into a PostgreSQL database.
Workflow
Scan Directory - Iterates through all .csv files in a given folder.
Database Connection - Connects to a PostgreSQL instance using provided credentials.
Table Management - For each file:
  Creates a corresponding table (if it doesn't exist).
  Loads the CSV content into the table.
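A rough sketch of this workflow using the `psql` client is shown below. Several details are assumptions made for illustration: the function names, the `DB_HOST`, `DB_PORT`, `DB_USER`, and `DB_NAME` variables (for example, sourced from `.env`), naming each table after its file, and creating every column as `TEXT`. It also assumes simple headers without quoted commas, and relies on `PGPASSWORD` or a `.pgpass` file for authentication.

```bash
#!/usr/bin/env bash
# Sketch of the loader. Variable names, table naming, and the all-TEXT
# schema are simplifying assumptions, not the repository's actual logic.

# Build a CREATE TABLE IF NOT EXISTS statement from a CSV header row.
build_create_sql() {
    local csv="$1" table="$2" header cols
    header="$(head -n 1 "$csv" | tr -d '\r')"
    # Turn a header like id,name into: "id" TEXT, "name" TEXT
    cols="$(printf '%s\n' "$header" | awk -F',' '{
        for (i = 1; i <= NF; i++) printf "%s\"%s\" TEXT", (i > 1 ? ", " : ""), $i
    }')"
    printf 'CREATE TABLE IF NOT EXISTS "%s" (%s);\n' "$table" "$cols"
}

# Create a table per CSV file in the folder and load its rows.
load_csv_dir() {
    local dir="$1" csv table
    for csv in "$dir"/*.csv; do
        [ -f "$csv" ] || continue
        table="$(basename "$csv" .csv)"
        psql -h "${DB_HOST:-localhost}" -p "${DB_PORT:-5432}" \
             -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 \
             -c "$(build_create_sql "$csv" "$table")" \
             -c "\\copy \"$table\" FROM '$csv' WITH (FORMAT csv, HEADER true)"
    done
}

# Example: load_csv_dir data/json_and_CSV
```

The sketch uses `\copy` rather than server-side `COPY` so that the file is read by the client, which works even when the database runs on another host.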
