Local Data Engineering Toolkit in Python

Note

Hi followers, thank you for taking the time to read this. The items below should make the project's scope and progress easier to follow:

Milestones
README.md (This document)
Enhancements
Documentation (Coming Soon)

Updated At Thu 3 Oct 2024


Plan

Milestone	Epic	Target Date	Delivery Date	Release Owner	Comment
0.1.0	HelloWorld	1st Oct 24	1st Oct 24	@tusharchou	Good Start
0.1.1	Ingestion	3rd Oct 24	9th Oct 24	@tusharchou	First Sprint
0.1.2	Warehousing	18th Oct 24	TBD	@tusharchou	Coming Soon
1.0.0	Ready for Production	1st Nov 24	TBD	TBD	End Game

Milestone

Local Data Platform

Business information systems require fresh data every day, organised so that retrieval is cost-effective. Building a local data platform requires a setup where you can recreate production use cases and develop new pipelines.

Problem Statement

Question	Answer
What?	A local data platform that can scale up to the cloud
Why?	Save costs on cloud infrastructure and development time
When?	At the start of the product development life cycle
Where?	Local first
Who?	Businesses that want a product data platform that runs locally and scales up when the time comes

A Python library that uses open-source tools to orchestrate data platform operations locally for development and testing.

Components

Orchestrator
- cron
- Airflow
Source
- APIs
- Files
Target
- Iceberg
- DuckDB
- Space and Time
Catalog
- REST
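
One way to picture how these components fit together is sketched below. It is a hypothetical illustration, not this library's API: the `Source` and `Target` protocols, the `ListSource` and `ListTarget` classes, and `run_pipeline` are names invented here, and in-memory lists stand in for real sources (APIs, files) and targets (Iceberg, DuckDB). The catalog is left out for brevity.

```python
from typing import Any, Dict, Iterable, List, Protocol

Row = Dict[str, Any]


class Source(Protocol):
    """Anything that can produce rows (hypothetical interface)."""

    def read(self) -> Iterable[Row]: ...


class Target(Protocol):
    """Anything that can store rows (hypothetical interface)."""

    def write(self, rows: Iterable[Row]) -> int: ...


class ListSource:
    """In-memory stand-in for an API or file source."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def read(self) -> Iterable[Row]:
        return iter(self._rows)


class ListTarget:
    """In-memory stand-in for a table in Iceberg or DuckDB."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: Iterable[Row]) -> int:
        # Return how many rows this call appended.
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


def run_pipeline(source: Source, target: Target) -> int:
    """Copy every row from source to target and return the row count."""
    return target.write(source.read())


source = ListSource([{"trip_id": 1, "fare": 12.5}, {"trip_id": 2, "fare": 7.0}])
target = ListTarget()
written = run_pipeline(source, target)
print(written)
```

In this picture, an orchestrator such as cron or Airflow would be the piece that calls `run_pipeline` on a schedule, and swapping a source or target means providing another class with the same `read` or `write` method.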

Source

Parquet

Data can be available as a single file in the source format. For example, New York yellow taxi trip data can be pulled with the following command:

curl https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet -o /tmp/yellow_tripdata_2023-01.parquet
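
The same download can be scripted in Python. The sketch below is a minimal illustration using only the standard library. The function names `trip_data_filename`, `trip_data_url` and `download_trip_data` are hypothetical and not part of this project, and the file naming pattern is inferred from the single example above, so other months are assumed to follow it.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the curl example above.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def trip_data_filename(year: int, month: int) -> str:
    """Return the file name for one month of yellow taxi data.

    Assumption: every month follows the yellow_tripdata_YYYY-MM.parquet
    pattern seen in the curl example.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"yellow_tripdata_{year:04d}-{month:02d}.parquet"


def trip_data_url(year: int, month: int) -> str:
    """Build the download URL for one month of data."""
    return f"{BASE_URL}/{trip_data_filename(year, month)}"


def download_trip_data(year: int, month: int, target_dir: str = "/tmp") -> Path:
    """Download one month of data into target_dir and return the local path.

    Requires network access.
    """
    destination = Path(target_dir) / trip_data_filename(year, month)
    urlretrieve(trip_data_url(year, month), str(destination))
    return destination
```

Calling `download_trip_data(2023, 1)` would fetch the same file as the curl command and save it under `/tmp`.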

local-data-platform/

Target

CSV
Google Sheet
Iceberg

References

iceberg-python
near-data-lake
duckdb

Self Promotion

Reliable Change Data Capture using Iceberg
Introduction to pyiceberg

