
LakeOps


A modern data lake operations toolkit that works with multiple table formats (Delta, Iceberg, Parquet) and engines (Apache Spark, Polars) through the same API.

Features

  • Multi-format support: Delta, Iceberg, Parquet
  • Multiple engine backends: Apache Spark, Polars (default)
  • Storage operations: read, write

To learn more, read the user guide.

Quick Start

Installation

pip install lakeops

Sample Usage

from pyspark.sql import SparkSession
from lakeops import LakeOps
from lakeops.core.engine import SparkEngine

# Initialize a Spark session and wrap it in a LakeOps instance
spark = SparkSession.builder.getOrCreate()
engine = SparkEngine(spark)
ops = LakeOps(engine)

# Read a Parquet table from the given path
df = ops.read("s3://local/test/table", format="parquet")

# Write the data back to the same path in Parquet format
ops.write(df, "s3://local/test/table", format="parquet")
