Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract persistent files through Apache OpenDAL #5626

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

txase
Copy link

@txase txase commented Feb 21, 2025

This PR represents the first set of attempting to incorporate the changes from the AWS Serverless POC in #5591 and contains two commits:

  1. Refactor the codebase to access persistent files through Apache OpenDAL
  2. Add support for an optional AWS S3 OpenDAL backend behind the new s3 feature flag

I decided to put these two commits into one PR to make it easier for me to get the new abstractions right, and to demonstrate what it takes to add an additional OpenDAL backend. I'm happy to split these into separate sequential PRs if that's preferred.

The first commit looks quite large at first glance, but many changes simply make all file accesses asynchronous and fallible by returning Result types. I think the key changes worth reviewing are:

  • src/auth.rs: Slightly refactored initialize_keys() as needed for OpenDAL access of private key file
  • src/config.rs:
    • CONFIG is still a synchronous Lazy type, but we have to calculate it from async methods. I shoehorned a tiny tokio async thread that runs to completion to calculate the value. The alternative of patching every use of CONFIG to be async, for no runtime benefit, seemed not worth doing.
    • See the new opendal_operator_for_path() and the abstracted CONFIG.opendal_operator_for_path_type() methods for the core of how operators are managed for various paths.
  • src/util.rs: Has a new function save_temp_file() that abstracts the saving of TempFiles that Rocket creates when files are uploaded
  • Wherever persistent data files are used, we now generate an OpenDAL Operator to access them. This includes config.json, sends, attachments, and icons.

The second commit is much smaller and more straightforward. The only thing worth pointing out is that OpenDAL uses reqsign under the covers to configure AWS credentials. However, AWS SDK configs have repeatedly been extended for better credential generation. For example, I use AWS Identity Center (aka AWS SSO) to generate temporary access tokens in my dev environment. reqsign doesn't support AWS SSO configs, but it has an escape hatch I utilized to load credentials. In the escape hatch I load the official AWS SDK config and credential generation crates to generate credentials. The one annoying part of the escape hatch is that reqsign's AwsCredentialLoad trait uses anyhow::Result, so we have to pull in anyhow just for this escape hatch :(.

Trying it out

These changes should be a behavioral no-op for existing use cases. The one minor change is the attachments, icon_cache, and sends folders aren't created at startup as OpenDAL FS service creates them when the first Operator is instantiated for each.

To try out the new S3 changes:

  1. Build with the s3 feature turned on
  2. Configure an AWS profile (this implementation honors all standard env vars like AWS_PROFILE and AWS_REGION along with standard AWS configs like ~/.aws/config)
  3. Set the following config values:
    • DATA_FOLDER -> s3://<bucket in the matching AWS region>[/<optional path prefix>]
    • TMP_FOLDER -> data/tmp (or your preference, but must be set to a local path)
    • TEMPLATES_FOLDER -> data/templates (or your preference, but must be set to a local path)
    • DATABASE_URL -> data/db.sqlite3 (or your preference, but must be set to a valid value)

@williamdes
Copy link
Contributor

On the same topic, I recommend https://github.com/lusingander/stu as a simple CLI tool to visualize the uploaded files

@williamdes
Copy link
Contributor

Set the following config values:

can they all be S3 ?
even DATABASE_URL ?

@txase
Copy link
Author

txase commented Feb 21, 2025

No, they can’t all be abstracted through opendal, nor should they be. You don’t want your entire sqlite db to be streamed back and forth to s3 on every record change, for example. One could also imagine tmp files could go through opendal, but it doesn’t make sense to do so even if it’s possible.

@williamdes
Copy link
Contributor

No, they can’t all be abstracted through opendal, nor should they be. You don’t want your entire sqlite db to be streamed back and forth to s3 on every record change, for example. One could also imagine tmp files could go through opendal, but it doesn’t make sense to do so even if it’s possible.

Okay, but TEMPLATES_FOLDER can be in s3 ?
What are they ?

@txase
Copy link
Author

txase commented Feb 21, 2025

I believe they are for customizing email and/or admin page html using handlebars. I had two thoughts on them:

  • They could be considered data, but they are probably more like assets one would include in an installation. It doesn’t seem like they should be considered user data to me that should be propagated through opendal.
  • The folder location is passed to a handlebars library function. Maybe there’s a way to customize the handlebars library behavior to operate through opendal rather than direct file access, but due to the previous point it doesn’t seem worthwhile.

@txase txase force-pushed the opendal branch 2 times, most recently from 6b18a84 to e6204d9 Compare February 21, 2025 17:20
@txase
Copy link
Author

txase commented Feb 21, 2025

I fixed all the GHA check issues so they pass. No material changes were made.

@BlackDex
Copy link
Collaborator

I fixed all the GHA check issues so they pass. No material changes were made.

Might want to install pre-commit, it will help on catching those before you push 😀.

@@ -82,6 +82,12 @@ impl Fairing for AppHeaders {
// 2FA/MFA Site check: api.2fa.directory
// # Mail Relay: https://bitwarden.com/blog/add-privacy-and-security-using-email-aliases-with-bitwarden/
// app.simplelogin.io, app.addy.io, api.fastmail.com, quack.duckduckgo.com

#[cfg(s3)]
let s3_connect_src = "https://*.amazonaws.com";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be configurable, not all S3 use AWS
there is MinIO or Garage as examples

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove this and instruct users to add their service's specific url via ALLOWED_CONNECT_SRC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, less specific code could be better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants