feat: implement global namespace with caching and smart S3 routing #6

Merged: zcourts merged 1 commit into main from feature/caching on Dec 3, 2025
Conversation

zcourts (Contributor) commented Dec 3, 2025

This commit overhauls the bucket lookup and region handling logic to support a true global namespace, while significantly improving performance and maintaining data locality constraints.

Key Changes:

  • Global Bucket Lookup: Modified Persistence::get_bucket_by_name to remove the region constraint. Nodes can now look up any bucket in the global database regardless of its home region.
  • Metadata Caching: Introduced MetadataCache using moka. This caches bucket, tenant, and policy metadata locally on each node to reduce load on the global database.
  • Cache Invalidation via P2P: Implemented a pub/sub mechanism over the existing libp2p cluster mesh. Nodes broadcast MetadataEvent (e.g., BucketUpdated) when modifying metadata, allowing peers to
    invalidate their local caches near-instantly.
  • S3 Gateway Smart Routing: Updated the S3 Gateway to support 301 Moved Permanently / 307 Temporary Redirect responses when a client accesses a bucket located in a different region. This enables AWS
    SDKs to automatically retry against the correct endpoint.
  • S3 Location Constraint: Updated CreateBucket to correctly parse and respect the LocationConstraint XML body, allowing S3 clients to specify the target region for new buckets.
  • Internal Logic Updates: Refactored ObjectManager to verify bucket region after lookup, ensuring requests are still processed by the correct regional node (or redirected at the gateway layer).
  • Admin CLI & Tests: Updated the admin CLI to accommodate the new configuration structure. Fixed integration tests to account for eventual consistency when using the direct-to-DB admin tool by
    implementing retry logic and configurable cache TTLs.
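
To illustrate the gateway routing and the post-lookup region check described above, here is a minimal sketch of the decision the gateway has to make once the bucket's home region is known. It is not the code from this PR: `Route`, `route`, `endpoint_for`, and the `s3.<region>.example.com` endpoint scheme are hypothetical names. The PR does not say how it chooses between 301 and 307, so the rule below (307 for `PUT`/`POST` so the client keeps the method and body, 301 otherwise) is an assumption.

```rust
/// Outcome of routing a bucket request at the gateway (hypothetical type).
#[derive(Debug, PartialEq)]
enum Route {
    /// The bucket lives in this node's region: handle the request here.
    Local,
    /// The bucket lives elsewhere: answer with an HTTP redirect.
    Redirect { status: u16, location: String },
}

/// Hypothetical endpoint scheme; a real deployment would read this from config.
fn endpoint_for(region: &str) -> String {
    format!("https://s3.{}.example.com", region)
}

/// Decide whether to serve a request locally or redirect it.
/// `bucket_region` is the region returned by the (global) bucket lookup.
fn route(local_region: &str, bucket_region: &str, method: &str, path: &str) -> Route {
    if bucket_region == local_region {
        return Route::Local;
    }
    // Assumption: use 307 for requests that carry a body so the client
    // repeats the same method and body; use 301 for everything else.
    let status = match method {
        "PUT" | "POST" => 307,
        _ => 301,
    };
    Route::Redirect {
        status,
        location: format!("{}{}", endpoint_for(bucket_region), path),
    }
}
```

With this shape, a `GET /photos/cat.jpg` arriving in `eu-west-1` for a bucket homed in `us-east-1` yields a 301 whose `Location` points at the `us-east-1` endpoint, while a request for a local bucket is passed through unchanged.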

Components Touched:

  • anvil-core: Persistence layer (SQL updates), Caching logic, P2P Event definitions, ObjectManager refactoring.
  • anvil: S3 Gateway routing/redirection, Service startup/wiring.
  • anvil-cli / admin: Configuration updates.
  • tests: Enhanced robustness for async cache updates.
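
The test changes rely on retrying until a cached value catches up with the database. A minimal sketch of such a polling helper follows, assuming a blocking test context. `retry_until` is a hypothetical name, not a helper from this PR, and the actual tests may well use an async equivalent.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `check` until it returns `Some(value)` or `timeout` elapses.
/// Returns `None` if the condition was never met in time.
fn retry_until<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Always try at least once, even with a zero timeout.
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

A test that writes through the direct-to-DB admin tool could wrap its assertion in `retry_until`, with a timeout slightly longer than the configured cache TTL, instead of asserting immediately.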

Performance Impact:
Substantially reduces read load on the global Postgres instance for hot paths (e.g., checking bucket existence on every object request), since repeat lookups are served from each node's local cache.
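
To make the effect on database reads concrete, here is a small read-through cache sketch with TTL expiry and event-driven invalidation. It uses a plain `HashMap` as a stand-in for `moka` so that it stays self-contained. The names `MetadataCache` and `MetadataEvent::BucketUpdated` come from the PR description, but their fields and methods here (`get_bucket`, `apply`, `BucketMeta`) are assumptions and not the real definitions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical bucket metadata row.
#[derive(Clone, Debug, PartialEq)]
struct BucketMeta {
    name: String,
    region: String,
}

/// Event a peer would broadcast after changing metadata (fields assumed).
enum MetadataEvent {
    BucketUpdated { name: String },
}

/// Std-only stand-in for the moka-backed cache described in the PR.
struct MetadataCache {
    ttl: Duration,
    buckets: HashMap<String, (BucketMeta, Instant)>,
}

impl MetadataCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, buckets: HashMap::new() }
    }

    /// Read-through lookup: serve a fresh cached entry if present,
    /// otherwise call `load` (standing in for the global DB query).
    fn get_bucket(
        &mut self,
        name: &str,
        load: impl FnOnce(&str) -> Option<BucketMeta>,
    ) -> Option<BucketMeta> {
        if let Some((meta, cached_at)) = self.buckets.get(name) {
            if cached_at.elapsed() < self.ttl {
                return Some(meta.clone());
            }
        }
        let meta = load(name)?;
        self.buckets.insert(name.to_string(), (meta.clone(), Instant::now()));
        Some(meta)
    }

    /// Drop the local entry named by a peer's event so the next
    /// lookup goes back to the database.
    fn apply(&mut self, event: &MetadataEvent) {
        match event {
            MetadataEvent::BucketUpdated { name } => {
                self.buckets.remove(name);
            }
        }
    }
}
```

In this sketch, repeated `get_bucket` calls for the same name within the TTL invoke `load` only once, and a `BucketUpdated` event forces exactly one further load. The TTL bounds staleness if an invalidation event is missed, which is also why tests that bypass the mesh need retries or a short TTL.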

@zcourts zcourts merged commit 2e7cc5d into main Dec 3, 2025
1 check failed
@zcourts zcourts deleted the feature/caching branch December 3, 2025 19:10