Huggingface based ingestion by zcourts · Pull Request #4 · worka-ai/anvil

zcourts · 2025-10-30T17:28:30Z

No description provided.

This commit resolves a series of cascading failures in the Hugging Face ingestion integration test. The root cause was a design flaw where the background worker lacked the security context (tenant_id, region) of the original requester, forcing it to incorrectly assume the target bucket was public. The fix involved re-architecting the ingestion flow to securely propagate the necessary context from the initial gRPC request to the worker.

Key Changes:

- **Schema:** The `hf_ingestions` table has been updated to store the `tenant_id`, `requester_app_id`, and `target_region`, providing the worker with the information it needs to act on the user's behalf.
- **Services:**
  - The `start_ingestion` service now correctly captures the `tenant_id` and `app_id` from the caller's JWT claims and persists them to the database.
  - Fixed a bug where the JWT `sub` claim (the app ID) was being incorrectly used as an app name. The service now correctly looks up the app by its ID.
- **Worker:**
  - The `handle_hf_ingestion` worker has been refactored to query and use the `tenant_id` and `target_region` when looking up the target bucket, removing the flawed "public bucket" assumption.
  - All `println!` macros have been replaced with structured `tracing` logs (`info!`, `debug!`, `error!`).
  - A debugging `panic!` has been removed in favor of proper error logging and returning a `Result`.
- **Tests:**
  - The `hf_ingestion_integration_test` has been fixed and made more robust. It no longer fails with a `403 Forbidden` during verification.
  - The test now correctly verifies that the private object is inaccessible to anonymous requests first, then uses a gRPC call to make the bucket public, and finally confirms that the object is accessible.
  - Corrected a bug where the initial `create_bucket` gRPC call was missing its authorization token.
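The context propagation described above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: the `IngestionRecord` struct, the `BucketIndex` map, the integer ID types, and both function signatures are assumptions chosen for the example. Only the stored field names (`tenant_id`, `requester_app_id`, `target_region`) come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory mirror of the context columns added to `hf_ingestions`.
#[derive(Debug, Clone)]
struct IngestionRecord {
    tenant_id: i64,
    requester_app_id: i64,
    target_region: String,
    target_bucket: String,
}

/// Hypothetical bucket index keyed by (tenant, region, bucket name) -> bucket id.
type BucketIndex = HashMap<(i64, String, String), u64>;

/// Request side: copy the caller's identity (taken from verified claims in the
/// real service) into the record the worker will read later.
fn start_ingestion(tenant_id: i64, app_id: i64, region: &str, bucket: &str) -> IngestionRecord {
    IngestionRecord {
        tenant_id,
        requester_app_id: app_id,
        target_region: region.to_string(),
        target_bucket: bucket.to_string(),
    }
}

/// Worker side: resolve the bucket using the stored tenant and region, and
/// return an error instead of panicking when no such bucket exists.
fn resolve_target_bucket(record: &IngestionRecord, buckets: &BucketIndex) -> Result<u64, String> {
    let key = (
        record.tenant_id,
        record.target_region.clone(),
        record.target_bucket.clone(),
    );
    buckets.get(&key).copied().ok_or_else(|| {
        format!(
            "bucket '{}' not found for tenant {} in region '{}' (requested by app {})",
            record.target_bucket, record.tenant_id, record.target_region, record.requester_app_id
        )
    })
}
```

Because the lookup key includes the tenant and region, a worker acting for one tenant cannot resolve a bucket with the same name that belongs to another tenant.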

This commit completes a major architectural refactoring to prepare the Anvil workspace for an open-core model, separating the foundational components from future enterprise extensions.

The key changes include:

- **Crate Separation:** The original `anvil` crate has been split into `anvil-core` (a pure library containing the fundamental structs, traits, and managers) and `anvil` (the main binary application that depends on `anvil-core`).
- **Enterprise Feature Flag:** An `enterprise` feature flag has been added to the `anvil` crate. When enabled, it activates an optional dependency on the `anvil-enterprise` crate, allowing for the seamless addition of enterprise-specific services and logic.
- **Test Harness Migration:** The test utilities have been extracted into a dedicated `anvil-test-utils` crate, which is now used by all integration tests across the workspace.
- **Build Fixes:** Resolved numerous compilation, dependency, and routing issues that arose during the refactoring, resulting in a stable build where all original OSS tests now pass successfully on the new architecture.

This new structure provides a clean and maintainable foundation for building and releasing both open-source and commercial versions of Anvil from a unified codebase.
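As a rough illustration of how a Cargo feature flag can gate enterprise code, the sketch below uses `#[cfg(feature = "enterprise")]` to pick between two definitions of the same function at compile time. The function names and the service names in the vectors are hypothetical and are not taken from the Anvil crates. Only the feature name `enterprise` comes from the commit message.

```rust
/// Services that are always compiled in (names are illustrative only).
fn core_services() -> Vec<&'static str> {
    vec!["object", "bucket", "auth"]
}

/// Compiled only when the crate is built with `--features enterprise`.
#[cfg(feature = "enterprise")]
fn enterprise_services() -> Vec<&'static str> {
    vec!["tensor-listing"]
}

/// Fallback used by the default (open-source) build: no extra services.
#[cfg(not(feature = "enterprise"))]
fn enterprise_services() -> Vec<&'static str> {
    Vec::new()
}

/// The binary registers the core services plus whatever the feature provides.
fn enabled_services() -> Vec<&'static str> {
    let mut services = core_services();
    services.extend(enterprise_services());
    services
}
```

In a real workspace the feature would also be declared in `Cargo.toml` and tied to the optional dependency. That part is configuration and is not shown here.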

…enterprise in tests

- anvil-core: export cloneable AuthInterceptorFn and return it from create_grpc_router
- anvil: pass core-provided interceptor into enterprise extender and serve merged Routes
- test-utils: enable anvil crate’s enterprise feature so TestCluster includes enterprise services
- Rationale: stable, scalable extension point for enterprise gRPC services and consistent middleware
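One common way to make an interceptor cloneable is to store the closure behind an `Arc`, so the core router and an extension can share the same logic. The sketch below uses only the standard library. `RequestMeta`, the `String` error type, and `create_auth_interceptor` are stand-ins invented for this example, and the real `AuthInterceptorFn` in `anvil-core` may be defined differently.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the request metadata an interceptor inspects.
struct RequestMeta {
    bearer_token: Option<String>,
}

/// A cheaply cloneable, thread-safe interceptor: cloning copies the `Arc`
/// pointer, not the closure itself.
type AuthInterceptorFn = Arc<dyn Fn(&RequestMeta) -> Result<(), String> + Send + Sync>;

/// Build one interceptor that can be handed to several routers.
/// The token comparison is a placeholder for real credential verification.
fn create_auth_interceptor(expected_token: String) -> AuthInterceptorFn {
    Arc::new(move |meta: &RequestMeta| match meta.bearer_token.as_deref() {
        Some(token) if token == expected_token => Ok(()),
        Some(_) => Err("invalid token".to_string()),
        None => Err("missing token".to_string()),
    })
}
```

With this shape, the core side can keep one handle and pass `Arc::clone(&interceptor)` to the extension, so both sets of services apply identical checks.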

This commit introduces a significant enhancement to the Anvil streaming core by implementing a new FFI (Foreign Function Interface) layer with caching and metrics. This FFI is consumed by a new Python SDK, `anvil-torch`, which provides a lazy-loading mechanism for PyTorch tensors, enabling more efficient memory usage and faster model loading for large models.

Key changes include:

- **FFI Layer (`anvil-ffi`):**
  - Implemented a new FFI with an `AnvilTensor` struct for binary-safe data transfer.
  - Added an LRU cache for tensors to reduce redundant data fetching.
  - Introduced metrics for cache hits, misses, and bytes fetched.
  - Improved error handling with `last_error_message`.
- **Python SDK (`anvil-sdk-py`):**
  - Created the `anvil-torch` package with an `AnvilLoaderWrapper` to interface with the new FFI.
  - Implemented `enable`, `metrics`, and `load_from_anvil` functions for seamless PyTorch integration.
  - Added end-to-end tests for streaming inference with PyTorch models.
- **Build and Test Infrastructure (`Justfile`):**
  - Added a comprehensive set of `just` commands for end-to-end testing using Docker Compose.
  - New commands streamline the process of bootstrapping Anvil, managing Hugging Face model ingestion, and running integration tests.
- **Enterprise Features:**
  - Implemented pagination for the `list_tensors` service in `anvil-enterprise`.
- **Bug Fixes and Refinements:**
  - Updated `ObjectRef` to use `Option<String>` for `version_id` to avoid empty strings.
  - Corrected linker arguments for macOS in `anvil-sdk-py-bindings`.
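To make the cache-plus-metrics idea concrete, here is a small LRU cache that counts hits, misses, and bytes fetched. It is a sketch under assumptions, not the `anvil-ffi` implementation: `TensorCache`, `CacheMetrics`, and `get_or_fetch` are names invented for this example, and the recency list uses a simple `VecDeque` scan (linear time per access) rather than a production-grade structure.

```rust
use std::collections::{HashMap, VecDeque};

/// Counters matching the three metrics named in the commit message.
#[derive(Debug, Default, Clone, PartialEq)]
struct CacheMetrics {
    hits: u64,
    misses: u64,
    bytes_fetched: u64,
}

/// Hypothetical tensor cache: raw bytes keyed by tensor name.
struct TensorCache {
    capacity: usize,
    entries: HashMap<String, Vec<u8>>,
    recency: VecDeque<String>, // front = least recently used
    metrics: CacheMetrics,
}

impl TensorCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        TensorCache {
            capacity,
            entries: HashMap::new(),
            recency: VecDeque::new(),
            metrics: CacheMetrics::default(),
        }
    }

    /// Return the cached bytes, calling `fetch` only on a miss.
    fn get_or_fetch<F>(&mut self, name: &str, fetch: F) -> &[u8]
    where
        F: FnOnce() -> Vec<u8>,
    {
        if self.entries.contains_key(name) {
            self.metrics.hits += 1;
            self.touch(name);
        } else {
            self.metrics.misses += 1;
            let bytes = fetch();
            self.metrics.bytes_fetched += bytes.len() as u64;
            // Evict the least recently used entry when the cache is full.
            if self.entries.len() == self.capacity {
                if let Some(oldest) = self.recency.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            self.entries.insert(name.to_string(), bytes);
            self.recency.push_back(name.to_string());
        }
        &self.entries[name]
    }

    /// Move `name` to the most-recently-used end of the recency list.
    fn touch(&mut self, name: &str) {
        if let Some(pos) = self.recency.iter().position(|k| k == name) {
            if let Some(key) = self.recency.remove(pos) {
                self.recency.push_back(key);
            }
        }
    }
}
```

A caller would pass a closure that performs the actual remote read, so repeated requests for the same tensor are served from memory and show up as hits in `metrics`.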

zcourts added 30 commits October 30, 2025 17:26

Start implementing Huggingface based ingestion

63b32a4

Start doing some testing of the hugging face ingestion features

f8ebc60

Fully implemented fetching from Hugging Face but our earlier naive assumption breaks when needing to push to a bucket we incorrectly assumed to be public in the worker

9e6a78b

Clean up all the warnings

13e43c9

The Rust tooling is wrong about all of these: they are being used, but only in the tests, so it emits warnings for them. The only legitimate one is the deprecated functions being used in crypto, which we should really upgrade but will do in a future commit.

Flesh out the cli

96aa939

Implement CLI and test suite to go with it

16e2c19

Add missing target_region...how did it run locally?????

8624846

Add restrictions to region names making sure they're DNS friendly

4354aad
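The commit above does not state the exact rules it enforces. As an assumption, the sketch below treats "DNS friendly" as "valid as a single DNS label": 1 to 63 characters, lowercase ASCII letters, digits, and hyphens, with no leading or trailing hyphen. The function name `validate_region_name` is hypothetical.

```rust
/// Check that `region` could be used as a single DNS label.
/// The rule set is an assumption for illustration, not Anvil's actual policy.
fn validate_region_name(region: &str) -> Result<(), String> {
    if region.is_empty() || region.len() > 63 {
        return Err(format!(
            "region name must be 1 to 63 characters, got {}",
            region.len()
        ));
    }
    let allowed = |b: u8| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-';
    if !region.bytes().all(allowed) {
        return Err("region name may only contain a-z, 0-9 and '-'".to_string());
    }
    if region.starts_with('-') || region.ends_with('-') {
        return Err("region name must not start or end with '-'".to_string());
    }
    Ok(())
}
```

Rejecting uppercase letters and underscores up front avoids surprises when the region later becomes part of a hostname.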

Use unique bucket names for tests so parallel runs don't conflict in CI

7f53864
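One way to give each test its own bucket name is to combine the process ID, a timestamp, and a process-wide counter. This is a hedged sketch and not the helper used in the Anvil test suite. `unique_bucket_name` and `NEXT_ID` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide counter so two calls in the same process never collide,
/// even if they happen within the same clock tick.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Build a bucket name that differs per call, per process, and per run.
fn unique_bucket_name(prefix: &str) -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{}-{}-{:x}-{}", prefix, std::process::id(), nanos, seq)
}
```

The counter separates tests running in parallel threads of one test binary, while the process ID and timestamp separate concurrent binaries or CI jobs that share a backing store.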

Start restructuring for features being enabled/disabled

8127014

Enterprise feature not needed

6469da6

:(

81456b4

proxy enterprise feature down to anvil dep

db2573a

Tweak feature activation

8abfc4f

Try an unsafe Rust approach, as feature flagging isn't cutting it

2ac6c98

Trying to use an extern based approach to enterprise extensions

1e16cfc

Try using a registration based approach to enterprise extensions

a7ef4b7

add some logs

dc32a3e

Add basic OSS modifications needed to support admin console

a9a31ca

Additional admin related APIs

61f9d26

Introduce admin users as a concept

e5ff065

Run on custom larger runner

360c98e

Restore object listing query

e0fcc71

Add a bunch of sleeps to test a hypothesis about the last CI failures

be17257

Try using an objective wait between the cli calls

0457ea9

Real fix is the hard-coded app - we were correctly getting permission denied masked as a 404

286c1e0

zcourts added 16 commits November 11, 2025 19:17

Try limiting parallel tests

a7a48d5

Try using grpc based checks to wait for bucket availability in tests

60d5636

Drop this nonsense

6179dff

Looking increasingly like a race condition in the CI, so wait for gRPC to be ready as well

137d250

sigh

1a9bd61

get logs only for failing tests

2a4bd2b

and now?

6a0b4bc

Old school println

c582023

Use tracing to log, reduce noise from gossip in tests

b9e7f2e

Force early failure for logs

bb0b03b

Ensure the test logger is only initialised once

bb5addc
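A standard way to guarantee one-time initialisation across many tests in one process is `std::sync::Once`. The sketch below shows the pattern only. `init_test_logger`, `INIT_LOGGER`, and `INIT_COUNT` are hypothetical names, and the counter stands in for installing a real logging subscriber.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT_LOGGER: Once = Once::new();

/// Counts how many times the initialiser body actually ran (for demonstration).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Safe to call from every test: the closure runs at most once per process,
/// and concurrent callers block until the first call has finished.
fn init_test_logger() {
    INIT_LOGGER.call_once(|| {
        // A real harness would install its logging or tracing subscriber here.
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    });
}
```

Each test can call `init_test_logger()` as its first line without worrying about whether another test already did so.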

The mystery deepens

76f78aa

Use a per-test URL for the global DB so the server and CLI connect to the same DB in the test...the current theory why the created bucket is not found

e193658

Rolling back wait_for_bucket based nonsense

722a181

Pass config in and stop depending on HOME

c3e4902

Put back all CI setup - the issue was a race condition or other bug caused by depending on HOME instead of using --config explicitly

ac4b352

zcourts merged commit c8ab349 into main Nov 13, 2025
1 check passed

zcourts deleted the feature/hf branch November 13, 2025 23:58

zcourts merged 46 commits into main from feature/hf
