fix(python): use resolving storage for python binding #2246

Open
CTTY wants to merge 3 commits into apache:main from CTTY:ctty/python-storage

Conversation

@CTTY
Collaborator

@CTTY CTTY commented Mar 17, 2026

Which issue does this PR close?

What changes are included in this PR?

  • Use OpenDalResolvingStorage rather than resolving schemes directly in the python binding
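As a rough illustration of the idea (not the actual `OpenDalResolvingStorage` implementation), a resolving storage picks a backend from the scheme of each path when the path is accessed, instead of the caller choosing a backend up front. The `Backend` enum, `scheme_of`, `resolve_backend`, and the scheme strings below are all hypothetical names chosen for this sketch.

```rust
/// Hypothetical stand-in for a concrete storage backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    S3,
    Fs,
    Memory,
}

/// Return the scheme of a path such as "s3://bucket/key".
/// Assumption for this sketch: a path without "://" is a local file.
fn scheme_of(path: &str) -> &str {
    match path.split_once("://") {
        Some((scheme, _rest)) => scheme,
        None => "file",
    }
}

/// Pick a backend from the path's scheme at access time.
fn resolve_backend(path: &str) -> Result<Backend, String> {
    match scheme_of(path) {
        "s3" | "s3a" => Ok(Backend::S3),
        "file" => Ok(Backend::Fs),
        "memory" => Ok(Backend::Memory),
        other => Err(format!("unsupported scheme: {other}")),
    }
}
```

With this shape, the caller can hand any metadata location to one shared object and does not need scheme-specific setup code of its own.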

Are these changes tested?

Member

@geruh geruh left a comment

This looks like a great addition for the python side! Looks like the cargo file only brings in s3, fs, and memory here but the resolving factory lists more storage backends from opendal.

iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
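The concern above can be sketched as a toy model: if the resolver recognizes a scheme but the matching Cargo feature was not enabled at build time, the lookup presumably fails at runtime. Cargo features cannot be toggled inside a standalone snippet, so this sketch passes the enabled feature list as a plain slice. The function names and the scheme-to-feature mapping (`gs`, `oss`, `abfss`, and so on) are assumptions for illustration; only the feature names come from this PR.

```rust
/// Map a URI scheme to the Cargo feature assumed to provide it.
/// The scheme strings are illustrative assumptions.
fn feature_for_scheme(scheme: &str) -> Option<&'static str> {
    match scheme {
        "memory" => Some("opendal-memory"),
        "file" => Some("opendal-fs"),
        "s3" | "s3a" => Some("opendal-s3"),
        "gs" | "gcs" => Some("opendal-gcs"),
        "oss" => Some("opendal-oss"),
        "abfs" | "abfss" => Some("opendal-azdls"),
        _ => None,
    }
}

/// Check whether the backend for `path` is covered by `enabled_features`.
fn backend_available(path: &str, enabled_features: &[&str]) -> Result<(), String> {
    // Assumption for this sketch: a path without "://" is a local file.
    let scheme = path.split_once("://").map(|(s, _)| s).unwrap_or("file");
    let feature = feature_for_scheme(scheme)
        .ok_or_else(|| format!("unknown scheme: {scheme}"))?;
    if enabled_features.contains(&feature) {
        Ok(())
    } else {
        Err(format!(
            "scheme `{scheme}` needs feature `{feature}`, which is not enabled"
        ))
    }
}
```

Under this model, calling `backend_available("gs://bucket/key", &["opendal-s3", "opendal-fs", "opendal-memory"])` returns an error, while the same call with `"opendal-gcs"` in the list succeeds.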

@CTTY
Collaborator Author

CTTY commented Mar 17, 2026

This looks like a great addition for the python side! Looks like the cargo file only brings in s3, fs, and memory here but the resolving factory lists more storage backends from opendal.

Good point! will fix it

Member

@geruh geruh left a comment

LGTM! All tests pass. I built the binding and verified that it works e2e with PyIceberg and the file scheme.

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM!

.map_err(|e| PyRuntimeError::new_err(format!("Invalid table identifier: {e}")))?;

- let factory = storage_factory_from_path(&metadata_location)?;
+ let factory = Arc::new(OpenDalResolvingStorageFactory::new());

Contributor

Love it! It's like Java's ResolvingFileIO.

Contributor

Maybe we can start exporting this as another FileIO for PyIceberg.

arrow = { version = "57.1", features = ["pyarrow", "chrono-tz"] }
iceberg = { path = "../../crates/iceberg" }
- iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
+ iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-memory", "opendal-fs", "opendal-s3", "opendal-gcs", "opendal-oss", "opendal-azdls"] }

Contributor

👍

opendal-all = ["opendal-memory", "opendal-fs", "opendal-s3", "opendal-gcs", "opendal-oss", "opendal-azdls"]

Contributor

(actually can we just use opendal-all here?)

Development

Successfully merging this pull request may close these issues.

Add GCS (Google Cloud Storage) support to Python bindings

3 participants