This project is my learning record of Apache Iceberg. It builds a demo data lakehouse on Iceberg and queries the Iceberg tables with several Iceberg-supported clients.
The picture below shows the service architecture of the data lakehouse project. It uses MinIO as the object storage system and Iceberg as the open table format. Nessie, an open-source Iceberg catalog server, serves as the single access point for multiple Iceberg clients.
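Because every client reaches the tables through Nessie, each one only needs the catalog endpoint plus the MinIO connection settings. A minimal sketch of that configuration for PyIceberg follows. It assumes Nessie's Iceberg REST endpoint at http://localhost:19120/iceberg and MinIO's S3 API at http://localhost:9000; the environment variable names and fallback credentials are hypothetical placeholders, not values taken from this project's docker-compose file.

```python
import os


def nessie_catalog_properties(
    nessie_uri: str = "http://localhost:19120/iceberg",
    s3_endpoint: str = "http://localhost:9000",
) -> dict:
    """Return PyIceberg REST catalog properties for Nessie backed by MinIO.

    The default URLs are assumptions for a local docker-compose setup.
    """
    return {
        "type": "rest",
        "uri": nessie_uri,
        # PyIceberg FileIO settings for the S3-compatible MinIO endpoint.
        "s3.endpoint": s3_endpoint,
        # Hypothetical variable names and fallback credentials.
        "s3.access-key-id": os.environ.get("MINIO_ACCESS_KEY", "admin"),
        "s3.secret-access-key": os.environ.get("MINIO_SECRET_KEY", "password"),
        "s3.region": "us-east-1",
    }


def connect_catalog(name: str = "nessie"):
    """Load the catalog; pyiceberg is imported lazily so the helper above runs without it."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **nessie_catalog_properties())


if __name__ == "__main__":
    # Requires the containers to be running.
    print(connect_catalog().list_namespaces())
```

Other clients, such as Trino, would point at the same Nessie service through their own catalog configuration, which is what lets them all see the same tables.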
This project uses three Iceberg clients, shown below.
- Start the related Docker containers:
$ docker-compose up
- Prepare the Python environment:
# Poetry must be installed first as the Python package manager
$ poetry install
- Download the sample CSV files from the Kaggle dataset below and place them in the data/ folder:
  https://www.kaggle.com/datasets/mayurgadekar5555/industrial-equipment-maintenance-data/data
- Run the Jupyter notebooks:
  - load_data_iceberg.ipynb
  - trino_iceberg.ipynb
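The notebooks themselves are not reproduced here. As a rough sketch of the same workflow, the code below loads one CSV file into an Iceberg table through PyIceberg and then previews it through Trino's Python DB-API client. It assumes a PyIceberg catalog object (for example one returned by pyiceberg.catalog.load_catalog against Nessie), the pyarrow and trino packages, a Trino catalog named iceberg, Trino listening on localhost:8080, and a hypothetical namespace maintenance. All helper names are illustrative, not taken from the notebooks.

```python
import re
from pathlib import Path


def table_identifier_for(csv_path: str, namespace: str = "maintenance") -> tuple[str, str]:
    """Derive a (namespace, table) Iceberg identifier from a CSV file name."""
    stem = Path(csv_path).stem.lower()
    # Collapse every run of characters outside [0-9a-z_] into a single "_".
    table = re.sub(r"[^0-9a-z_]+", "_", stem).strip("_")
    return (namespace, table)


def load_csv_to_iceberg(catalog, csv_path: str, namespace: str = "maintenance"):
    """Create the namespace and table if needed, then append the CSV rows."""
    # Third-party imports are local so the pure helpers work without them.
    import pyarrow.csv as pa_csv
    from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError

    arrow_table = pa_csv.read_csv(csv_path)
    identifier = table_identifier_for(csv_path, namespace)
    try:
        catalog.create_namespace(namespace)
    except NamespaceAlreadyExistsError:
        pass
    try:
        # PyIceberg derives the Iceberg schema from the Arrow schema.
        table = catalog.create_table(identifier, schema=arrow_table.schema)
    except TableAlreadyExistsError:
        table = catalog.load_table(identifier)
    table.append(arrow_table)
    return table


def preview_sql(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a Trino query that reads the first `limit` rows of a table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Double-quoted identifiers; embedded quotes are escaped by doubling them.
    quoted = ".".join('"' + part.replace('"', '""') + '"' for part in (catalog, schema, table))
    return f"SELECT * FROM {quoted} LIMIT {limit}"


def run_trino_query(sql: str, host: str = "localhost", port: int = 8080, user: str = "admin"):
    """Execute a query through Trino's DB-API client and return all rows."""
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

With the containers running, calling load_csv_to_iceberg(catalog, path) and then run_trino_query(preview_sql("iceberg", *table_identifier_for(path))) would write one file into the lakehouse and read its first rows back through a second client.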
References:
- https://www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/#h-setting-up-dremionessieminio
- https://github.com/wirelessr/trino-iceberg-playground
- https://py.iceberg.apache.org/api/
- https://projectnessie.org/guides/iceberg-rest/
- https://www.dremio.com/blog/intro-to-pyiceberg/
- https://projectnessie.org/iceberg/trino/