pageserver: adopt opendal? #7548

Open
skyzh opened this issue Apr 29, 2024 · 9 comments
Labels
t/question Issue type: question

Comments

@skyzh
Member

skyzh commented Apr 29, 2024

Discussed in #7545: keeping track of vendor SDKs and working around them is hard.

It should work as a drop-in replacement for the S3/Azure SDKs, but we need to investigate whether it supports time travel queries to facilitate our disaster recovery tools.
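For context, a "time travel" query over a versioned bucket boils down to asking, per key, which version was current at a given timestamp; presumably that is the capability (S3's ListObjectVersions or an equivalent) a replacement would need to expose. A minimal std-only sketch, assuming a simplified, hypothetical `ObjectVersion` record rather than any SDK's real type:

```rust
/// Hypothetical, simplified view of one entry in a ListObjectVersions-style
/// listing. Field names are illustrative, not an SDK's types.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectVersion {
    pub key: String,
    pub version_id: String,
    /// Last-modified time as seconds since the Unix epoch.
    pub last_modified: u64,
    /// True if this entry is a delete marker rather than object data.
    pub is_delete_marker: bool,
}

/// Returns the version of `key` that was current at time `at` (inclusive),
/// or `None` if the object did not exist then: either nothing had been
/// written yet, or the newest entry at or before `at` is a delete marker.
/// Ties on `last_modified` are resolved by list order.
pub fn version_at<'a>(
    versions: &'a [ObjectVersion],
    key: &str,
    at: u64,
) -> Option<&'a ObjectVersion> {
    versions
        .iter()
        .filter(|v| v.key == key && v.last_modified <= at)
        .max_by_key(|v| v.last_modified)
        .filter(|v| !v.is_delete_marker)
}
```

A real recovery tool would also need pagination and per-prefix listing, which the sketch leaves out.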

@skyzh skyzh added the t/question Issue type: question label Apr 29, 2024
@arpad-m
Member

arpad-m commented Apr 29, 2024

As one data point, they seem to require account keys for Azure blob storage, or maybe it's just not in their docs for Azure blob storage. Ideally we wouldn't need account keys and would authenticate from the environment instead.

For S3, more authentication methods are supported, link. Still, I don't think they support authentication via profiles, which would make life easier. And also (quoting the page linked just before):

OpenDAL will not refresh the temporary security credentials, please keep in mind to refresh those credentials in time.

@arpad-m
Member

arpad-m commented Apr 29, 2024

I think @koivunej's proposal was to ditch SDK crates entirely and invoke the HTTP calls directly, i.e. to go in the opposite direction from the one proposed in this issue.

@jcsp
Contributor

jcsp commented Apr 30, 2024

Looks like S3 versioning support is still a work in progress: apache/opendal#3943

In general I'm a bit cautious about adopting third-party crates rather than vendor SDKs: like Arpad says, authentication can be an awkward area -- ideally we should be using libraries that support all of the cloud vendors' native auth methods, so that we have flexibility in how we deploy.

@koivunej
Member

koivunej commented May 7, 2024

To update my stance, I think the S3 SDK is looking very good, as we've "recently" observed with its handling of "slow down" responses; apart from the defaults, it has been working without surprises. I wouldn't want to ditch it.
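S3 signals request-rate throttling with HTTP 503 and the error code `SlowDown`, and SDKs typically handle it by retrying with exponential backoff. A minimal sketch of that idea, assuming hypothetical `backoff_delay` and `is_retryable` helpers; it is not the AWS SDK's actual retry strategy, which among other things adds random jitter:

```rust
use std::time::Duration;

/// Capped exponential backoff: `base * 2^attempt`, never more than `max`.
/// Jitter is deliberately omitted to keep this illustration deterministic.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for large attempt counts.
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Treats S3's "SlowDown" error code, and any 5xx status, as retryable.
/// Which errors to retry is an assumption of this sketch.
pub fn is_retryable(status: u16, error_code: &str) -> bool {
    error_code == "SlowDown" || (500..600).contains(&status)
}
```

Capping the delay keeps a long throttling episode from producing unbounded waits between attempts.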

@Xuanwo

Xuanwo commented Jun 3, 2024

For S3, more authentication methods are supported, link. Still, I don't think they support authentication via profiles, which makes life easier. And also (quoting page linked just before):

OpenDAL will not refresh the temporary security credentials, please keep in mind to refresh those credentials in time.

Hi, OpenDAL maintainer here. There seems to be some confusion about this comment.

OpenDAL does support most authentication methods that AWS provides, including:

  • Env
  • Profile
  • IMDSv2
  • Web Identity (a.k.a. Assume Role With Web Identity)
  • Assume Role

Our goal is to support native authentication methods from various storage services, enabling users to seamlessly access all available storage options.
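One common way to support several methods like these is a provider chain that tries each source in order and uses the first one that yields credentials. A minimal sketch of that pattern, assuming a hypothetical `CredentialSource` trait and a `FixedSource` stand-in; it does not reflect OpenDAL's actual loader or its ordering:

```rust
/// Illustrative credential fields only.
#[derive(Debug, Clone, PartialEq)]
pub struct Credentials {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub session_token: Option<String>,
}

/// One credential source (env, profile, IMDSv2, ...). Hypothetical trait.
pub trait CredentialSource {
    fn name(&self) -> &str;
    /// `Ok(None)` means "not configured here, try the next source".
    fn load(&self) -> Result<Option<Credentials>, String>;
}

/// Stand-in source that returns a fixed result, for demonstration.
pub struct FixedSource {
    pub name: String,
    pub result: Result<Option<Credentials>, String>,
}

impl CredentialSource for FixedSource {
    fn name(&self) -> &str {
        &self.name
    }
    fn load(&self) -> Result<Option<Credentials>, String> {
        self.result.clone()
    }
}

/// Tries each source in order; returns the first credentials found together
/// with the name of the source that provided them.
pub fn load_from_chain(
    sources: &[Box<dyn CredentialSource>],
) -> Result<(String, Credentials), String> {
    for source in sources {
        match source.load() {
            Ok(Some(creds)) => return Ok((source.name().to_string(), creds)),
            Ok(None) => continue,
            // Design choice: a hard failure (e.g. a malformed profile file)
            // stops the chain instead of silently falling through.
            Err(e) => return Err(format!("{}: {}", source.name(), e)),
        }
    }
    Err("no credential source produced credentials".to_string())
}
```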

Regarding this comment:

OpenDAL will not refresh the temporary security credentials, please keep in mind to refresh those credentials in time.

That note only applies when users statically set security_credentials through our service builder: OpenDAL cannot and will not refresh credentials supplied that way.
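The distinction matters for temporary credentials (for example an STS session token): when they are supplied statically, whoever supplies them has to re-fetch them before they expire. A minimal refresh-ahead sketch, assuming a hypothetical `fetch` closure standing in for a real STS or metadata-service call; it is not part of OpenDAL's API:

```rust
use std::time::{Duration, SystemTime};

/// Temporary credentials with an expiry; illustrative fields only.
#[derive(Debug, Clone, PartialEq)]
pub struct TempCredentials {
    pub token: String,
    pub expires_at: SystemTime,
}

/// Caches credentials and calls `fetch` again once they are within
/// `margin` of expiring. Hypothetical helper for illustration.
pub struct RefreshingCredentials<F: FnMut() -> TempCredentials> {
    fetch: F,
    margin: Duration,
    cached: Option<TempCredentials>,
}

impl<F: FnMut() -> TempCredentials> RefreshingCredentials<F> {
    pub fn new(fetch: F, margin: Duration) -> Self {
        Self { fetch, margin, cached: None }
    }

    /// Returns credentials valid as of `now`, refreshing them if needed.
    pub fn get(&mut self, now: SystemTime) -> &TempCredentials {
        let stale = match &self.cached {
            None => true,
            // Refresh once `now + margin` has reached the expiry time.
            Some(c) => now + self.margin >= c.expires_at,
        };
        if stale {
            self.cached = Some((self.fetch)());
        }
        self.cached.as_ref().expect("populated above")
    }
}
```

Refreshing a margin ahead of expiry avoids signing a request with credentials that expire while it is in flight.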

@Xuanwo

Xuanwo commented Jun 3, 2024

As one data point, they seem to require account keys for Azure blob storage, or maybe it's just not in their docs for Azure blob storage.

Apologies for the confusion.

For azblob services, neither the account name nor the account key is necessary. OpenDAL will automatically attempt to load these from the environment, configuration, and web identity if they are not provided by the user. I will take some time to clarify this in our documentation.
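The fallback described here can be pictured as: use explicitly configured values if present, otherwise look in the environment, and if no key turns up, defer to an ambient method such as web identity. A minimal sketch, assuming the Azure CLI's `AZURE_STORAGE_ACCOUNT` / `AZURE_STORAGE_KEY` variable names purely for illustration (OpenDAL may read different ones) and a hypothetical `AzblobAuth` enum; `lookup` is injected so the logic can be exercised without touching the real environment:

```rust
/// Hypothetical outcome of credential resolution for Azure Blob Storage.
#[derive(Debug, PartialEq)]
pub enum AzblobAuth {
    /// Both account name and key were found (explicitly or in the environment).
    SharedKey { account: String, key: String },
    /// No key available: defer to an ambient method such as web identity.
    Ambient { account: Option<String> },
}

/// Explicit values win; otherwise fall back to `lookup` (e.g. the process
/// environment). The variable names are assumptions for illustration.
pub fn resolve_auth<F>(account: Option<&str>, key: Option<&str>, lookup: F) -> AzblobAuth
where
    F: Fn(&str) -> Option<String>,
{
    let account = account.map(str::to_string).or_else(|| lookup("AZURE_STORAGE_ACCOUNT"));
    let key = key.map(str::to_string).or_else(|| lookup("AZURE_STORAGE_KEY"));
    match (account, key) {
        (Some(account), Some(key)) => AzblobAuth::SharedKey { account, key },
        (account, _) => AzblobAuth::Ambient { account },
    }
}
```

In real use, `lookup` could simply be `|k: &str| std::env::var(k).ok()`.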

@Xuanwo

Xuanwo commented Nov 16, 2024

Hi, I'm going to start working on this to demonstrate what it will look like if there are no strong objections.

I plan to start by either adding GCS support or replacing the Azure Storage integration (either of which I believe should have less impact than S3). Which option do you prefer?

cc @jcsp @koivunej @skyzh @arpad-m

@jcsp
Contributor

jcsp commented Nov 18, 2024

@Xuanwo I'm happy to see a prototype, but please be aware that we haven't committed to this as a direction yet -- we'd need to see what the prototype looks like & if there are any caveats.

@Xuanwo

Xuanwo commented Dec 18, 2024

Hi, @jcsp @koivunej @skyzh @arpad-m,

I have created a PoC for your evaluation: #10181

It's not a fully functional PR, but I believe it serves as a good starting point.


5 participants