An ETL pipeline for extracting data from the Federal Election Commission (FEC) and integrating it into a MariaDB database.
FECDataConnect is designed to fetch, transform, and load data from the FEC into a structured database, so that election-related data is easy to query and analyze.
- Data Extraction: Automated scraping of FEC data.
- Transformation: Pre-processing and cleaning of raw FEC data to ensure database readiness.
- Loading: Streamlined insertion of the transformed data into a MariaDB database.
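The three stages above can be sketched as plain functions. This is a minimal illustration, not the project's actual code: the sample CSV, the `committees` table, its columns, and the function names are all hypothetical, and extraction is stubbed with an in-memory string instead of real scraping. The `load` step only assumes a DB-API 2.0 connection whose driver accepts `?` placeholders; adjust the placeholder style if your MariaDB driver differs.

```python
import csv
import io

# Hypothetical sample standing in for a raw FEC download; the real column
# layout is not specified in this README.
RAW_CSV = """committee_id,name,total_receipts
C001, Example Committee ,1500.50
C002,Another Committee,
"""


def extract(raw_text):
    """Parse raw CSV text into a list of dict rows (stand-in for scraping)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Trim whitespace and convert amounts to float, using None for blanks."""
    cleaned = []
    for row in rows:
        amount = row["total_receipts"].strip()
        cleaned.append((
            row["committee_id"].strip(),
            row["name"].strip(),
            float(amount) if amount else None,
        ))
    return cleaned


def load(conn, records):
    """Insert cleaned records through a DB-API 2.0 connection."""
    cur = conn.cursor()
    # Hypothetical table definition, created on first use.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS committees ("
        "committee_id VARCHAR(9) PRIMARY KEY, "
        "name VARCHAR(200), "
        "total_receipts DOUBLE)"
    )
    cur.executemany("INSERT INTO committees VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)


def run_pipeline(conn, raw_text=RAW_CSV):
    """Run extract, transform, and load in order; return the row count."""
    return load(conn, transform(extract(raw_text)))
```

To try the sketch without a MariaDB server, pass it an in-memory SQLite connection (`sqlite3.connect(":memory:")`) as a stand-in. Keeping each stage a separate function mirrors the `extract.py`, `transform.py`, and `load.py` split and lets each stage be tested on its own.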
- Clone the Repository:
git clone https://github.com/your_username/FECDataConnect.git
- Install Required Libraries:
pip install -r requirements.txt
- Database Configuration: [Instructions for setting up your MariaDB database, configuring user privileges, etc.]
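One possible way to keep database settings out of the source tree is to read them from environment variables in `src/config.py`. The sketch below is an assumption about how that could look, not the project's actual configuration: the `FEC_DB_*` variable names, the default user and database names, and the `DatabaseConfig` class are all hypothetical. Only the default port, 3306, is the standard MariaDB port.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    """Connection settings for the target MariaDB instance."""
    host: str
    port: int
    user: str
    password: str
    database: str


def load_db_config(env=None):
    """Build a DatabaseConfig from environment variables.

    The FEC_DB_* names are hypothetical; pass a dict as `env` to override
    os.environ, which is convenient in tests.
    """
    env = os.environ if env is None else env
    return DatabaseConfig(
        host=env.get("FEC_DB_HOST", "localhost"),
        port=int(env.get("FEC_DB_PORT", "3306")),  # 3306 is MariaDB's default port
        user=env.get("FEC_DB_USER", "fec_etl"),
        password=env.get("FEC_DB_PASSWORD", ""),
        database=env.get("FEC_DB_NAME", "fec"),
    )
```

Reading the password from the environment rather than hard-coding it keeps credentials out of version control.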
Run the main script to start the ETL process:
python main.py
Contributions are welcome!
- Fork the repository.
- Create your feature branch ('git checkout -b feature/AmazingFeature').
- Commit your changes ('git commit -m "Add some AmazingFeature"').
- Push the branch ('git push origin feature/AmazingFeature').
- Open a pull request.
For major changes, please open an issue first to discuss what you'd like to change.
This project is licensed under the MIT License. For more details, see the LICENSE file in the repository.
The repository is organized as follows:
FECDataConnect/
│
├── data/
│ ├── raw/ # For storing raw scraped data
│ ├── processed/ # For data that's been cleaned/transformed
│ └── archive/ # For archival purposes (optional)
│
├── src/
│ ├── etl/
│ │ ├── extract.py # Code to extract data
│ │ ├── transform.py # Code to transform data
│ │ └── load.py # Code to load data into MariaDB
│ │
│ ├── utils/ # Helper scripts, utilities, etc.
│ └── config.py # Configuration variables/settings
│
├── logs/ # Directory for logs (if you're logging events/errors)
│
├── tests/ # For unit tests
│
├── .gitignore # Specifies intentionally untracked files to ignore
├── LICENSE
├── README.md
└── requirements.txt # Lists all project dependencies