278 changes: 278 additions & 0 deletions bindings/python/API_REFERENCE.md
@@ -0,0 +1,278 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Python API Reference

Complete API reference for the Fluss Python client. For a usage guide with examples, see the [Python Client Guide](README.md).

## `Config`

| Method / Property | Description |
|---|---|
| `Config(properties: dict = None)` | Create config from a dict of key-value pairs |
| `.bootstrap_server` | Get/set coordinator server address |
| `.request_max_size` | Get/set max request size in bytes |
| `.writer_batch_size` | Get/set write batch size in bytes |
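
A minimal sketch of building a configuration, assuming `Config` is exported from the top-level `fluss` module; the address and sizes are placeholders, not defaults:

```python
import fluss

# A config can be created empty (or from a dict) and tuned via its properties.
config = fluss.Config()
config.bootstrap_server = "127.0.0.1:9123"    # placeholder coordinator address
config.writer_batch_size = 2 * 1024 * 1024    # 2 MiB write batches
config.request_max_size = 10 * 1024 * 1024    # 10 MiB max request size
```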

## `FlussConnection`

| Method | Description |
|---|---|
| `await FlussConnection.connect(config) -> FlussConnection` | Connect to a Fluss cluster |
| `await conn.get_admin() -> FlussAdmin` | Get admin interface |
| `await conn.get_table(table_path) -> FlussTable` | Get a table for read/write operations |
| `conn.close()` | Close the connection |

Supports `with` statement (context manager).
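
A sketch of the connect/close lifecycle, assuming the classes are exported from the top-level `fluss` module and that the address is a placeholder for your coordinator server:

```python
import asyncio
import fluss

async def main():
    config = fluss.Config()
    config.bootstrap_server = "127.0.0.1:9123"  # placeholder address

    conn = await fluss.FlussConnection.connect(config)
    try:
        admin = await conn.get_admin()
        print(await admin.list_databases())
    finally:
        conn.close()

asyncio.run(main())
```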

## `FlussAdmin`

| Method | Description |
|---|---|
| `await create_database(name, ignore_if_exists=False, database_descriptor=None)` | Create a database |
| `await drop_database(name, ignore_if_not_exists=False, cascade=True)` | Drop a database |
| `await list_databases() -> list[str]` | List all databases |
| `await database_exists(name) -> bool` | Check if a database exists |
| `await get_database_info(name) -> DatabaseInfo` | Get database metadata |
| `await create_table(table_path, table_descriptor, ignore_if_exists=False)` | Create a table |
| `await drop_table(table_path, ignore_if_not_exists=False)` | Drop a table |
| `await get_table(table_path) -> TableInfo` | Get table metadata |
| `await list_tables(database_name) -> list[str]` | List tables in a database |
| `await table_exists(table_path) -> bool` | Check if a table exists |
| `await list_offsets(table_path, bucket_ids, offset_type, timestamp=None) -> dict[int, int]` | Get offsets for buckets |
| `await list_partition_offsets(table_path, partition_name, bucket_ids, offset_type, timestamp=None) -> dict[int, int]` | Get offsets for a partition's buckets |
| `await create_partition(table_path, partition_spec, ignore_if_exists=False)` | Create a partition |
| `await drop_partition(table_path, partition_spec, ignore_if_not_exists=False)` | Drop a partition |
| `await list_partition_infos(table_path) -> list[PartitionInfo]` | List partitions |
| `await get_latest_lake_snapshot(table_path) -> LakeSnapshot` | Get latest lake snapshot |
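
A sketch of common admin calls, meant to run inside the `main()` coroutine from the connection example above; the database name, table name, and columns are illustrative only:

```python
import pyarrow as pa

admin = await conn.get_admin()

# Create a database if it does not already exist.
await admin.create_database("analytics", ignore_if_exists=True)

# Create a primary-key table (see `Schema` / `TableDescriptor` below).
schema = fluss.Schema(
    pa.schema([("user_id", pa.int64()), ("amount", pa.float64())]),
    primary_keys=["user_id"],
)
table_path = fluss.TablePath("analytics", "events")
await admin.create_table(table_path, fluss.TableDescriptor(schema), ignore_if_exists=True)

# Inspect metadata.
print(await admin.list_tables("analytics"))
print(await admin.table_exists(table_path))
```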

## `FlussTable`

| Method | Description |
|---|---|
| `new_scan() -> TableScan` | Create a scan builder |
| `await new_append_writer() -> AppendWriter` | Create writer for log tables |
| `new_upsert(columns=None, column_indices=None) -> UpsertWriter` | Create writer for PK tables (optionally limited to a column subset for partial updates) |
| `new_lookup() -> Lookuper` | Create lookuper for PK tables |
| `get_table_info() -> TableInfo` | Get table metadata |
| `get_table_path() -> TablePath` | Get table path |
| `has_primary_key() -> bool` | Check if table has a primary key |

## `TableScan`

| Method | Description |
|---|---|
| `.project(indices) -> TableScan` | Project columns by index |
| `.project_by_name(names) -> TableScan` | Project columns by name |
| `await .create_log_scanner() -> LogScanner` | Create record-based scanner (for `poll()`) |
| `await .create_batch_scanner() -> LogScanner` | Create batch-based scanner (for `poll_arrow()`, `to_arrow()`, etc.) |
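
A sketch of building scanners with a projection, assuming the `table` from the previous example; the column names and indices are illustrative:

```python
# Project by name (or by index with .project([0, 1])) before creating a scanner.
scan = table.new_scan().project_by_name(["user_id", "amount"])

log_scanner = await scan.create_log_scanner()                   # for poll()
batch_scanner = await table.new_scan().create_batch_scanner()   # for poll_arrow() / to_arrow()
```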

## `AppendWriter`

| Method | Description |
|---|---|
| `.append(row) -> WriteResultHandle` | Append a row (dict, list, or tuple) |
| `.write_arrow(table)` | Write a PyArrow Table |
| `.write_arrow_batch(batch) -> WriteResultHandle` | Write a PyArrow RecordBatch |
| `.write_pandas(df)` | Write a Pandas DataFrame |
| `await .flush()` | Flush all pending writes |
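
A sketch of appending rows to a log table, assuming the `table` from above; the row contents are illustrative:

```python
writer = await table.new_append_writer()

# Rows may be dicts, lists, or tuples matching the table schema.
writer.append({"user_id": 1, "amount": 9.99})
writer.append([2, 19.99])

# Bulk ingestion is also possible, e.g.:
# writer.write_arrow(arrow_table)
# writer.write_pandas(df)

await writer.flush()  # flush all pending writes
```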

## `UpsertWriter`

| Method | Description |
|---|---|
| `.upsert(row) -> WriteResultHandle` | Upsert a row (insert or update by PK) |
| `.delete(pk) -> WriteResultHandle` | Delete a row by primary key |
| `await .flush()` | Flush all pending operations |
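
A sketch of upserting and deleting on a primary-key table, assuming the `table` from above; the key shape is assumed to mirror the row formats (dict/list/tuple):

```python
upserter = table.new_upsert()

upserter.upsert({"user_id": 1, "amount": 9.99})    # insert
upserter.upsert({"user_id": 1, "amount": 12.50})   # update the same PK
upserter.delete({"user_id": 1})                    # delete by primary key

await upserter.flush()
```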

## `WriteResultHandle`

| Method | Description |
|---|---|
| `await .wait()` | Wait for server acknowledgment of this write |
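
A sketch of waiting on a single write instead of flushing everything, assuming the `writer` from the `AppendWriter` example:

```python
handle = writer.append({"user_id": 3, "amount": 4.20})
await handle.wait()  # returns once this particular write is acknowledged
```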

## `Lookuper`

| Method | Description |
|---|---|
| `await .lookup(pk) -> dict \| None` | Lookup a row by primary key |

## `LogScanner`

| Method | Description |
|---|---|
| `.subscribe(bucket_id, start_offset)` | Subscribe to a bucket |
| `.subscribe_buckets(bucket_offsets)` | Subscribe to multiple buckets (`{bucket_id: offset}`) |
| `.subscribe_partition(partition_id, bucket_id, start_offset)` | Subscribe to a partition bucket |
| `.subscribe_partition_buckets(partition_bucket_offsets)` | Subscribe to multiple partition+bucket combos (`{(part_id, bucket_id): offset}`) |
| `.unsubscribe_partition(partition_id, bucket_id)` | Unsubscribe from a partition bucket |
| `.poll(timeout_ms) -> list[ScanRecord]` | Poll individual records (record scanner only) |
| `.poll_arrow(timeout_ms) -> pa.Table` | Poll as Arrow Table (batch scanner only) |
| `.poll_batches(timeout_ms) -> list[RecordBatch]` | Poll batches with metadata (batch scanner only) |
| `.to_arrow() -> pa.Table` | Read all subscribed data as Arrow Table (batch scanner only) |
| `.to_pandas() -> pd.DataFrame` | Read all subscribed data as DataFrame (batch scanner only) |

## `ScanRecord`

| Property | Description |
|---|---|
| `.bucket -> TableBucket` | Bucket this record belongs to |
| `.offset -> int` | Record offset in the log |
| `.timestamp -> int` | Record timestamp |
| `.change_type -> ChangeType` | Change type (AppendOnly, Insert, UpdateBefore, UpdateAfter, Delete) |
| `.row -> dict` | Row data as `{column_name: value}` |

## `RecordBatch`

| Property | Description |
|---|---|
| `.batch -> pa.RecordBatch` | Arrow RecordBatch data |
| `.bucket -> TableBucket` | Bucket this batch belongs to |
| `.base_offset -> int` | First record offset |
| `.last_offset -> int` | Last record offset |

## `Schema`

| Method | Description |
|---|---|
| `Schema(schema: pa.Schema, primary_keys=None)` | Create from PyArrow schema |
| `.get_column_names() -> list[str]` | Get column names |
| `.get_column_types() -> list[str]` | Get column type names |

## `TableDescriptor`

| Method | Description |
|---|---|
| `TableDescriptor(schema, *, partition_keys=None, bucket_count=None, bucket_keys=None, comment=None, log_format=None, kv_format=None, properties=None, custom_properties=None)` | Create table descriptor |
| `.get_schema() -> Schema` | Get the schema |

## `TablePath`

| Method / Property | Description |
|---|---|
| `TablePath(database, table)` | Create a table path |
| `.database_name -> str` | Database name |
| `.table_name -> str` | Table name |

## `TableInfo`

| Property / Method | Description |
|---|---|
| `.table_id -> int` | Table ID |
| `.table_path -> TablePath` | Table path |
| `.num_buckets -> int` | Number of buckets |
| `.schema_id -> int` | Schema ID |
| `.comment -> str \| None` | Table comment |
| `.created_time -> int` | Creation timestamp |
| `.modified_time -> int` | Last modification timestamp |
| `.get_primary_keys() -> list[str]` | Primary key columns |
| `.get_partition_keys() -> list[str]` | Partition columns |
| `.get_bucket_keys() -> list[str]` | Bucket key columns |
| `.has_primary_key() -> bool` | Has primary key? |
| `.is_partitioned() -> bool` | Is partitioned? |
| `.get_schema() -> Schema` | Get table schema |
| `.get_column_names() -> list[str]` | Column names |
| `.get_column_count() -> int` | Number of columns |
| `.get_properties() -> dict` | All table properties |
| `.get_custom_properties() -> dict` | Custom properties only |

## `PartitionInfo`

| Property | Description |
|---|---|
| `.partition_id -> int` | Partition ID |
| `.partition_name -> str` | Partition name |

## `DatabaseDescriptor`

| Method / Property | Description |
|---|---|
| `DatabaseDescriptor(comment=None, custom_properties=None)` | Create descriptor |
| `.comment -> str \| None` | Database comment |
| `.get_custom_properties() -> dict` | Custom properties |

## `DatabaseInfo`

| Property / Method | Description |
|---|---|
| `.database_name -> str` | Database name |
| `.created_time -> int` | Creation timestamp |
| `.modified_time -> int` | Last modification timestamp |
| `.get_database_descriptor() -> DatabaseDescriptor` | Get descriptor |

## `LakeSnapshot`

| Property / Method | Description |
|---|---|
| `.snapshot_id -> int` | Snapshot ID |
| `.table_buckets_offset -> dict[TableBucket, int]` | All bucket offsets |
| `.get_bucket_offset(bucket) -> int \| None` | Get offset for a bucket |
| `.get_table_buckets() -> list[TableBucket]` | Get all buckets |

## `TableBucket`

| Method / Property | Description |
|---|---|
| `TableBucket(table_id, bucket)` | Create non-partitioned bucket |
| `TableBucket.with_partition(table_id, partition_id, bucket)` | Create partitioned bucket |
| `.table_id -> int` | Table ID |
| `.bucket_id -> int` | Bucket ID |
| `.partition_id -> int \| None` | Partition ID (None if non-partitioned) |

## `FlussError`

| Property | Description |
|---|---|
| `.message -> str` | Error message |

Raised for all Fluss-specific errors (connection failures, table not found, schema mismatches, etc.). Inherits from `Exception`.
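
A sketch of handling a Fluss error, assuming the `admin` from the examples above; the table name is deliberately nonexistent:

```python
try:
    await admin.get_table(fluss.TablePath("analytics", "does_not_exist"))
except fluss.FlussError as e:
    print(f"Fluss error: {e.message}")
```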

## Constants

| Constant | Value | Description |
|---|---|---|
| `fluss.EARLIEST_OFFSET` | `-2` | Start reading from earliest available offset |
| `fluss.LATEST_OFFSET` | `-1` | Start reading from latest offset (only new records) |
| `fluss.OffsetType.EARLIEST` | `"earliest"` | For `list_offsets()` |
| `fluss.OffsetType.LATEST` | `"latest"` | For `list_offsets()` |
| `fluss.OffsetType.TIMESTAMP` | `"timestamp"` | For `list_offsets()` with timestamp |
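
A sketch of using the constants, assuming the `scanner`, `admin`, and `table_path` from the examples above:

```python
# Start reading bucket 0 from the earliest available offset.
scanner.subscribe(0, fluss.EARLIEST_OFFSET)

# Ask the server for the latest offsets of buckets 0 and 1.
latest = await admin.list_offsets(table_path, [0, 1], fluss.OffsetType.LATEST)
print(latest)  # {bucket_id: offset}
```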

## `ChangeType`

| Value | Short String | Description |
|---|---|---|
| `ChangeType.AppendOnly` (0) | `+A` | Append-only |
| `ChangeType.Insert` (1) | `+I` | Insert |
| `ChangeType.UpdateBefore` (2) | `-U` | Previous value of updated row |
| `ChangeType.UpdateAfter` (3) | `+U` | New value of updated row |
| `ChangeType.Delete` (4) | `-D` | Delete |

## Data Types

| PyArrow Type | Fluss Type | Python Type |
|---|---|---|
| `pa.boolean()` | Boolean | `bool` |
| `pa.int8()` / `int16()` / `int32()` / `int64()` | TinyInt / SmallInt / Int / BigInt | `int` |
| `pa.float32()` / `float64()` | Float / Double | `float` |
| `pa.string()` | String | `str` |
| `pa.binary()` | Bytes | `bytes` |
| `pa.date32()` | Date | `datetime.date` |
| `pa.time32("ms")` | Time | `datetime.time` |
| `pa.timestamp("us")` | Timestamp (NTZ) | `datetime.datetime` |
| `pa.timestamp("us", tz="UTC")` | TimestampLTZ | `datetime.datetime` |
| `pa.decimal128(precision, scale)` | Decimal | `decimal.Decimal` |
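
A sketch of a PyArrow schema exercising several of the mappings above; the column names are illustrative:

```python
import pyarrow as pa

pa_schema = pa.schema([
    ("id", pa.int64()),                           # BigInt
    ("active", pa.boolean()),                     # Boolean
    ("name", pa.string()),                        # String
    ("payload", pa.binary()),                     # Bytes
    ("price", pa.decimal128(10, 2)),              # Decimal(10, 2)
    ("created", pa.timestamp("us")),              # Timestamp (no time zone)
    ("ingested", pa.timestamp("us", tz="UTC")),   # TimestampLTZ
])
```
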
114 changes: 114 additions & 0 deletions bindings/python/DEVELOPMENT.md
@@ -0,0 +1,114 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

# Development

## Requirements

- Python 3.9+
- Rust 1.70+
- [uv](https://docs.astral.sh/uv/) package manager
- Linux or macOS

> **Before you start:**
> Make sure you can successfully build and run the [Fluss Rust client](../../crates/fluss/README.md) on your machine.
> The Python bindings are built on top of the Rust client, so they require the same toolchain and a compatible environment.

## Install Development Dependencies

```bash
cd bindings/python
uv sync --all-extras
```

## Build Development Version

```bash
source .venv/bin/activate
uv run maturin develop
```

## Build Release Version

```bash
uv run maturin build --release
```

## Code Formatting and Linting

```bash
uv run ruff format python/
uv run ruff check python/
```

## Type Checking

```bash
uv run mypy python/
```

## Run Examples

```bash
uv run python example/example.py
```

## Build API Docs

```bash
uv run pdoc fluss
```

## Release

```bash
# Build wheel
uv run maturin build --release

# Publish to PyPI
uv run maturin publish
```

## Project Structure

```
bindings/python/
├── Cargo.toml # Rust dependency configuration
├── pyproject.toml # Python project configuration
├── README.md # User guide
├── DEVELOPMENT.md # This file
├── API_REFERENCE.md # API reference
├── src/ # Rust source code (PyO3 bindings)
│ ├── lib.rs
│ ├── config.rs
│ ├── connection.rs
│ ├── admin.rs
│ ├── table.rs
│ └── error.rs
├── fluss/ # Python package
│ ├── __init__.py
│ ├── __init__.pyi # Type stubs
│ └── py.typed
└── example/
└── example.py
```

## License

Apache 2.0 License