Note
Hi followers, thank you for taking the time to read this. The sections below should make the scope and progress of the project easier to follow:
- Milestones
- README.md (This document)
- Enhancements
- Documentation (Coming Soon)
Updated on Thu, 3 Oct 2024
| Milestone | Epic | Target Date | Delivery Date | Release Owner | Comment |
|---|---|---|---|---|---|
| 0.1.0 | HelloWorld | 1st Oct 24 | 1st Oct 24 | @tusharchou | Good Start |
| 0.1.1 | Ingestion | 3rd Oct 24 | 9th Oct 24 | @tusharchou | First Sprint |
| 0.1.2 | Warehousing | 18th Oct 24 | TBD | @tusharchou | Coming Soon |
| 1.0.0 | Ready for Production | 1st Nov 24 | TBD | TBD | End Game |
- 0.1.0 : Done - Published Library on PyPI
- 0.1.1 : In Progress - Demo BigQuery compatibility
  - 0.1.1 : Done - Documentation: Updated README to clearly explain the problem and plan of execution
  - PR : In Progress - Feature: Simply query NEAR Coin GCP Data Lake through BigQuery
  - PR : In Progress - Feature: Privately store NYC Yellow Taxi Rides Data in Local Data Platform
  - FR : In Progress - Change: Easily solve for User's Local Data Need
  - IS : In Progress - Documentation: Align on Product Framework
  - IS : In Progress - Request: Source Parquet Table
  - IS : In Progress - Request: Source Iceberg Table
  - IS : In Progress - Request: Target Iceberg Table
  - IS : In Progress - Request: Target.put() Iceberg Table
  - IS : In Progress - Request: NYCYellowTaxi.rides.put()
  - IS : In Progress - Request: NYCYellowTaxi.rides.get() (see the interface sketch after this roadmap)
  - IS : In Progress - Request: test.iceberg.exception()
  - IS : In Progress - Documentation: NEAR Trader - How to use the NEAR Data Lake
  - IS : In Progress - Request: Source.get() BigQuery
  - IS : To-do - Request: Iceberg Partitioning and Version Control
  - IS : To-do - Request: Align on Product Framework
  - IS : In Progress - Align on Product Framework
- 0.1.2 : To-do - Continuous Integration
- 0.1.9 : To-do - Launch Documentation
- 0.2.0 : To-do - Cloud Integration
- 1.0.0 : To-do - Demo BigQuery compatibility
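Several roadmap items above refer to a `Source.get()` / `Target.put()` style interface (for example `NYCYellowTaxi.rides.get()`). The snippet below is only a minimal sketch of how such an interface could look, assuming plain Parquet files and pyarrow; the class names, signatures, and paths are illustrative and are not the library's published API.

```python
# Hypothetical illustration only: the class and method names mirror the roadmap
# items (Source.get(), Target.put()) but are assumptions about the intended
# interface, not the actual local-data-platform API.
from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq


class Source:
    """Reads a local Parquet file into an Arrow table."""

    def __init__(self, path: str):
        self.path = path

    def get(self) -> pa.Table:
        return pq.read_table(self.path)


class Target:
    """Writes an Arrow table to a local Parquet file."""

    def __init__(self, path: str):
        self.path = path

    def put(self, table: pa.Table) -> None:
        Path(self.path).parent.mkdir(parents=True, exist_ok=True)
        pq.write_table(table, self.path)


if __name__ == "__main__":
    # Move the NYC Yellow Taxi rides file (downloaded below) into a local
    # "warehouse" location.
    rides = Source("/tmp/yellow_tripdata_2023-01.parquet").get()
    Target("/tmp/warehouse/yellow_tripdata_2023-01.parquet").put(rides)
```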
Business information systems require fresh data every day, organised so that retrieval is cost-effective. Building a local data platform requires a setup where you can recreate production use cases and develop new pipelines.
| Question | Answer |
|---|---|
| What? | A local data platform that can scale up to the cloud |
| Why? | To save on cloud infrastructure costs and development time |
| When? | At the start of the product development life cycle |
| Where? | Local first |
| Who? | Businesses that want a data platform that runs locally and scales up when the time comes |
A Python library that uses open-source tools to orchestrate data platform operations locally for development and testing.
- Orchestrator
  - cron
  - Airflow
- Source
  - APIs
  - Files
- Target
  - Iceberg
  - DuckDB
  - Space and Time
- Catalog
  - REST
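To show how a Source, Target, and Catalog could come together locally, here is a minimal sketch that appends an Arrow table to an Iceberg table through pyiceberg and a REST catalog. The catalog URI, namespace, and table name are placeholders; it assumes a local Iceberg REST catalog is already running and a recent pyiceberg release is installed.

```python
# Minimal sketch, assuming a local Iceberg REST catalog at http://localhost:8181
# and a recent pyiceberg release; the namespace and table name are placeholders.
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError

# Catalog: connect to the (assumed) local REST catalog.
catalog = load_catalog("local", type="rest", uri="http://localhost:8181")

# Source: a small in-memory Arrow table standing in for an API or file pull.
rides = pa.table({"ride_id": [1, 2, 3], "fare": [12.5, 8.0, 23.1]})

# Target: create the Iceberg table on first run, then append the new rows.
try:
    catalog.create_namespace("nyc")
except NamespaceAlreadyExistsError:
    pass
try:
    table = catalog.create_table("nyc.yellow_taxi_rides", schema=rides.schema)
except TableAlreadyExistsError:
    table = catalog.load_table("nyc.yellow_taxi_rides")

table.append(rides)

# Read back what was written.
print(table.scan().to_arrow())
```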
Data can be available as a single file in the source format. For example, the New York Yellow Taxi trip data can be pulled like this:

```shell
curl https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet -o /tmp/yellow_tripdata_2023-01.parquet
```
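Once the file is downloaded, it can be inspected locally with DuckDB (one of the targets listed above). This is a minimal sketch; the queries are only illustrative.

```python
# Minimal sketch: inspect the downloaded Parquet file locally with DuckDB.
import duckdb

con = duckdb.connect()  # in-memory database, nothing is persisted

# Row count and a quick preview of the trip data.
print(con.sql("SELECT count(*) FROM '/tmp/yellow_tripdata_2023-01.parquet'"))
print(con.sql("SELECT * FROM '/tmp/yellow_tripdata_2023-01.parquet' LIMIT 5"))
```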
local-data-platform/
- CSV
- Google Sheet
- Iceberg
Related projects and reading:

- iceberg-python
- near-data-lake
- duckdb
- Reliable Change Data Capture using Iceberg
- Introduction to pyiceberg