@CTTY
Contributor

@CTTY CTTY commented Oct 17, 2025

Which issue does this PR close?

What changes are included in this PR?

Are these changes tested?

Added some tests for the new storage builder and registry; otherwise this mostly relies on the existing tests.

}

#[async_trait]
impl Storage for OpenDALGcsStorage {
Member

It can be pretty annoying to implement nearly the same thing for every storage service. Can we avoid that?

Contributor Author

I think this thread is relevant: https://docs.google.com/document/d/1-CEvRvb52vPTDLnzwJRBx5KLpej7oSlTu_rg0qKEGZ8/edit?disco=AAABrRO9Prk

The benefit of doing this is that users can implement Storage for only the schemes they need. The annoyance of duplicated code across multiple schemes mostly applies to a versatile storage implementation like OpenDAL, which already has a convenient operator layer. I don't expect custom storage implementations to cover every scheme anyway (though I may be wrong about this assumption).
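To illustrate a custom storage that covers only some schemes, here is a minimal sketch. It assumes a simplified, synchronous stand-in for the `Storage` trait (the trait in this PR is async via `#[async_trait]`, and its methods are not shown in this thread); the `schemes`/`read`/`write` methods, the `mem` scheme, and `InMemoryStorage` are all hypothetical.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Mutex;

/// Hypothetical, synchronous stand-in for the PR's async `Storage` trait.
pub trait Storage: Debug + Send + Sync {
    /// URL schemes this implementation can serve.
    fn schemes(&self) -> &[&'static str];
    fn read(&self, location: &str) -> Result<Vec<u8>, String>;
    fn write(&self, location: &str, data: &[u8]) -> Result<(), String>;
}

/// Returns the scheme part of `scheme://rest`, if present.
pub fn scheme_of(location: &str) -> Option<&str> {
    location.split_once("://").map(|(scheme, _)| scheme)
}

/// A custom storage that only supports a single (hypothetical) `mem` scheme.
/// It makes no attempt to cover s3, gs, or any other scheme.
#[derive(Debug, Default)]
pub struct InMemoryStorage {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl InMemoryStorage {
    /// Rejects locations whose scheme this storage does not handle.
    fn ensure_supported(&self, location: &str) -> Result<(), String> {
        match scheme_of(location) {
            Some(s) if self.schemes().iter().any(|ok| *ok == s) => Ok(()),
            _ => Err(format!("unsupported location: {location}")),
        }
    }
}

impl Storage for InMemoryStorage {
    fn schemes(&self) -> &[&'static str] {
        &["mem"]
    }

    fn read(&self, location: &str) -> Result<Vec<u8>, String> {
        self.ensure_supported(location)?;
        self.files
            .lock()
            .unwrap()
            .get(location)
            .cloned()
            .ok_or_else(|| format!("not found: {location}"))
    }

    fn write(&self, location: &str, data: &[u8]) -> Result<(), String> {
        self.ensure_supported(location)?;
        self.files
            .lock()
            .unwrap()
            .insert(location.to_string(), data.to_vec());
        Ok(())
    }
}
```

A `s3://...` location passed to this storage is simply rejected, which is the kind of narrow implementation the per-scheme design allows.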

As for code duplication, I consider the OpenDAL Storage to be the "managed" default storage that lives in this repo, where we have more control over the implementation. Once we have a separate crate for each storage implementation (iceberg-storage-opendal), we can add some helpers to reduce the duplication.
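One possible shape for such a helper, sketched under assumptions: a declarative macro that stamps out one storage type per scheme, with every generated impl delegating to a shared operator layer. `SharedOperator` is a hypothetical in-memory stand-in, not OpenDAL's `Operator` API; `GcsStorage`, `S3Storage`, and the synchronous trait methods are illustrative names only.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::{Arc, Mutex};

/// Hypothetical, synchronous stand-in for the PR's async `Storage` trait.
pub trait Storage: Debug + Send + Sync {
    fn scheme(&self) -> &'static str;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
    fn write(&self, path: &str, data: &[u8]) -> Result<(), String>;
}

/// Hypothetical stand-in for a shared operator layer (NOT OpenDAL's API).
#[derive(Debug, Default)]
pub struct SharedOperator {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl SharedOperator {
    pub fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .lock()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {path}"))
    }

    pub fn write(&self, path: &str, data: &[u8]) -> Result<(), String> {
        self.objects.lock().unwrap().insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

/// Generates one storage type per scheme. The delegation code is written
/// once here instead of being copied into every `impl Storage`.
macro_rules! operator_storage {
    ($name:ident, $scheme:literal) => {
        #[derive(Debug)]
        pub struct $name {
            op: Arc<SharedOperator>,
        }

        impl $name {
            pub fn new(op: Arc<SharedOperator>) -> Self {
                Self { op }
            }
        }

        impl Storage for $name {
            fn scheme(&self) -> &'static str {
                $scheme
            }
            fn read(&self, path: &str) -> Result<Vec<u8>, String> {
                self.op.read(path)
            }
            fn write(&self, path: &str, data: &[u8]) -> Result<(), String> {
                self.op.write(path, data)
            }
        }
    };
}

// Hypothetical per-scheme types.
operator_storage!(GcsStorage, "gs");
operator_storage!(S3Storage, "s3");
```

The per-scheme types stay distinct (so they can still be registered per scheme), while the near-identical bodies exist only once in the macro.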

/// }
/// ```
#[async_trait]
pub trait Storage: Debug + Send + Sync {
Contributor

@liurenjie1024 liurenjie1024 Oct 31, 2025

Following our discussion in the last community sync, let's create a new crate called iceberg-storage and put the newly added things into that crate. With this approach, I think we don't need to change the existing code path, and once we reach consensus on the API, we could replace FileIO in the core crate with the FileIO in iceberg-storage.

Contributor Author

Sounds good! I'll start working on this next week

I think I've forgotten some details from the meeting, but I assume the Storage trait will eventually still live in the core crate, just like the existing Catalog trait, right? iceberg-storage is more of a temporary crate to make review and collaboration easier.

Contributor

Do you mean only the Storage trait, or the whole storage implementation? I'm thinking about keeping them in the iceberg-storage crate permanently, since this aligns with our small-crate pattern. iceberg-storage doesn't have many dependencies and could be used standalone. After we split out iceberg-storage, we could move even more out of the core crate, such as the Puffin format.

Contributor

+1 on putting the Storage trait in iceberg-storage forever.

@Xuanwo
Member

Xuanwo commented Nov 3, 2025

Hi, @liurenjie1024 and @CTTY, I have thought about this again. I think we can split the concept into fileio and fileio-provider (or, using the terms in this PR, Storage and FileIO).

As discussed in #1797, sometimes people don't care about FileIO (the concept from the Java implementation) at all. They initialize, build, and manage their own IO abstraction and only want it to be used as the Storage in iceberg-rust.
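A minimal sketch of that usage, assuming a simplified, synchronous, read-only stand-in for the `Storage` trait: the user keeps their own client (`MyObjectStoreClient`, hypothetical), wraps it in a thin adapter (`ClientStorage`, hypothetical), and hands it to code that depends only on `Storage` (`read_metadata`, hypothetical), with no FileIO layer in between.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical, synchronous stand-in for the `Storage` trait (read-only here).
pub trait Storage: Debug + Send + Sync {
    fn read(&self, location: &str) -> Result<Vec<u8>, String>;
}

/// The user's own IO client, built and managed outside iceberg-rust.
#[derive(Debug, Default)]
pub struct MyObjectStoreClient {
    // In-memory map standing in for a remote object store.
    objects: HashMap<String, Vec<u8>>,
}

impl MyObjectStoreClient {
    pub fn put_object(&mut self, key: &str, body: &[u8]) {
        self.objects.insert(key.to_string(), body.to_vec());
    }

    pub fn get_object(&self, key: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(key)
            .cloned()
            .ok_or_else(|| format!("no such key: {key}"))
    }
}

/// Thin adapter that exposes the user's client as a `Storage`.
#[derive(Debug)]
pub struct ClientStorage {
    client: Arc<MyObjectStoreClient>,
}

impl ClientStorage {
    pub fn new(client: Arc<MyObjectStoreClient>) -> Self {
        Self { client }
    }
}

impl Storage for ClientStorage {
    fn read(&self, location: &str) -> Result<Vec<u8>, String> {
        // Delegate straight to the user's client; no FileIO wrapper involved.
        self.client.get_object(location)
    }
}

/// Hypothetical consumer that depends only on `Storage`.
pub fn read_metadata(storage: &dyn Storage, location: &str) -> Result<String, String> {
    let bytes = storage.read(location)?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```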

So I think we should make FileIO optional in the future and depend only on Storage. However, this can be tricky, since we still need the ability to build a storage from a location like s3://bucket/name. We can discuss this face to face and come up with a proposal.
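Building a storage from a location like s3://bucket/name could look roughly like the following sketch of a scheme-keyed registry. `StorageRegistry`, `StorageBuilder`, `BucketStorage`, and `build_bucket_storage` are hypothetical names, and the `Storage` trait is reduced to a single `describe` method purely for illustration; the builder and registry added in this PR may look different.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical stand-in for the `Storage` trait, reduced to one method.
pub trait Storage: Debug + Send + Sync {
    fn describe(&self) -> String;
}

/// Hypothetical storage bound to a single bucket.
#[derive(Debug)]
pub struct BucketStorage {
    scheme: String,
    bucket: String,
}

impl Storage for BucketStorage {
    fn describe(&self) -> String {
        format!("{}:{}", self.scheme, self.bucket)
    }
}

/// Hypothetical builder: receives the scheme and bucket parsed from a location.
pub type StorageBuilder = fn(&str, &str) -> Result<Arc<dyn Storage>, String>;

/// Maps URL schemes (e.g. "s3") to the builder responsible for them.
#[derive(Default)]
pub struct StorageRegistry {
    builders: HashMap<String, StorageBuilder>,
}

impl StorageRegistry {
    pub fn register(&mut self, scheme: &str, builder: StorageBuilder) {
        self.builders.insert(scheme.to_string(), builder);
    }

    /// Builds a storage from a location such as `s3://bucket/name`.
    pub fn build(&self, location: &str) -> Result<Arc<dyn Storage>, String> {
        let (scheme, rest) = location
            .split_once("://")
            .ok_or_else(|| format!("location has no scheme: {location}"))?;
        // Treat the first path segment after the scheme as the bucket.
        let bucket = rest.split('/').next().unwrap_or("");
        let builder = self
            .builders
            .get(scheme)
            .ok_or_else(|| format!("no storage registered for scheme `{scheme}`"))?;
        builder(scheme, bucket)
    }
}

/// Example builder for bucket-style schemes.
pub fn build_bucket_storage(scheme: &str, bucket: &str) -> Result<Arc<dyn Storage>, String> {
    if bucket.is_empty() {
        return Err(format!("missing bucket for scheme `{scheme}`"));
    }
    let storage: Arc<dyn Storage> = Arc::new(BucketStorage {
        scheme: scheme.to_string(),
        bucket: bucket.to_string(),
    });
    Ok(storage)
}
```

Usage: register `build_bucket_storage` under `"s3"`, then `registry.build("s3://bucket/name")` returns a storage for `bucket`, while an unregistered scheme or a location without `://` yields an error. With a registry like this, FileIO would not be needed just to resolve a location to a Storage.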

@liurenjie1024
Contributor

I'm a little confused about your point. Would you mind writing a more detailed proposal?

@Xuanwo
Member

Xuanwo commented Nov 3, 2025

Will do

Development

Successfully merging this pull request may close these issues.

Make FileIO a Trait

4 participants