feat: Media HTTP fetching and b64 decoding #3967

milesial · 2025-10-29T20:51:35Z

Overview:

See #3630 for context, this MR is a subset of it, focusing on HTTP fetch and b64 decode.
E2E plumbing will be done in a followup MR, the current behavior of URL passthrough stays the default.

506888486-0ff8a654-e710-4554-a9c3-22ee04d93fe5

Summary by CodeRabbit

New Features
- Added multimodal media loading support enabling images and videos to be fetched from data URLs and HTTP/HTTPS sources.
- Media processing now runs asynchronously for improved responsiveness.
Chores
- Refactored media handling to support asynchronous operations.

coderabbitai · 2025-10-29T20:56:15Z

Walkthrough

Dependencies were added to support media fetching (reqwest, base64). The preprocessor's core functions were converted to async to enable multimodal media loading. New MediaLoader and EncodedMediaData types were introduced to handle media retrieval from URLs. All call sites were updated to use await syntax.

Changes

Cohort / File(s)	Change Summary
Dependency Updates `lib/llm/Cargo.toml`	Added reqwest (workspace) and base64 (0.22) for media operations; added mockito (1.7.0) as dev dependency
Preprocessor Core Async Conversion `lib/llm/src/preprocessor.rs`	Converted `preprocess_request` and `gather_multi_modal_data` to async functions; added `media_loader: Option<MediaLoader>` field; integrated async media fetching workflow with conditional loader execution
Media Module Implementation `lib/llm/src/preprocessor/media/mod.rs`, `lib/llm/src/preprocessor/media/common.rs`, `lib/llm/src/preprocessor/media/loader.rs`	Created new media submodule with EncodedMediaData struct for handling data/http(s) URLs, including base64 decoding; introduced MediaLoader struct with async `fetch_media_part` method for fetching image/video content
Test Updates `lib/llm/tests/preprocessor.rs`	Updated preprocessor test to await async `preprocess_request` call

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Async/await conversion affects multiple call sites and function signatures
New media handling module introduces network I/O and URL parsing logic
Error handling chains (anyhow results) and encoding/decoding operations require careful verification
Particular attention needed on:
- Correctness of async propagation through preprocessor call chain
- EncodedMediaData URL scheme handling (data:, http, https) and base64 decoding logic
- MediaLoader's content part type matching and error propagation
- Test assertions align with async behavior

Poem

🐰 Hops through URLs, both high and low,
Base64 decoded, media flows,
Async await, the loader's song,
Multimodal dreams, oh so strong! ✨

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The provided description is incomplete compared to the repository's PR description template. While an Overview section is present and provides context (referencing issue #3630 and noting that E2E plumbing will follow), the description is missing the Details section, the "Where should the reviewer start?" section, and a properly formatted Related Issues section using action keywords. The Overview alone is insufficient to meet the template's requirements, as it lacks substantive information about the specific changes made and guidance for reviewers.	The author should add the missing sections: expand the Details section to describe what was changed and why (async function signatures, new media module structure, dependency additions), add a "Where should the reviewer start?" section highlighting key files like preprocessor.rs and the new media module files, and include a Related Issues section using a proper action keyword (e.g., "Relates to #3630") to link to the related issue.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "feat: Media HTTP fetching and b64 decoding" clearly and concisely summarizes the main changes introduced in this PR. The changes focus on adding HTTP media fetching capabilities and base64 decoding support, which aligns directly with the implementation of the MediaLoader, EncodedMediaData, and related async infrastructure for media handling. The title is specific enough to convey the primary purpose without being overly verbose.
Docstring Coverage	✅ Passed	Docstring coverage is 82.35% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fe5653 and 0860585.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (6)

lib/llm/Cargo.toml (2 hunks)
lib/llm/src/preprocessor.rs (10 hunks)
lib/llm/src/preprocessor/media/common.rs (1 hunks)
lib/llm/src/preprocessor/media/loader.rs (1 hunks)
lib/llm/src/preprocessor/media/mod.rs (1 hunks)
lib/llm/tests/preprocessor.rs (1 hunks)

🧰 Additional context used

🧠 Learnings (4)

📚 Learning: 2025-06-24T20:59:35.725Z

Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.

Applied to files:

lib/llm/tests/preprocessor.rs
lib/llm/src/preprocessor.rs

📚 Learning: 2025-09-16T19:47:30.312Z

Learnt from: KrishnanPrash
PR: ai-dynamo/dynamo#3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

lib/llm/tests/preprocessor.rs
lib/llm/src/preprocessor.rs

📚 Learning: 2025-08-18T20:51:51.324Z

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.

Applied to files:

lib/llm/Cargo.toml

📚 Learning: 2025-08-25T22:04:45.205Z

Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

lib/llm/src/preprocessor.rs

🧬 Code graph analysis (3)

lib/llm/src/preprocessor/media/loader.rs (1)

lib/llm/src/preprocessor/media/common.rs (1)

from_url (17-43)

lib/llm/src/preprocessor/media/common.rs (1)

lib/llm/src/preprocessor/media/loader.rs (1)

new (19-25)

lib/llm/src/preprocessor.rs (2)

lib/llm/src/protocols/common/preprocessor.rs (2)

builder (95-97)

builder (132-134)

lib/llm/src/preprocessor/prompt.rs (1)

messages (51-51)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: vllm (amd64)
GitHub Check: operator (arm64)
GitHub Check: trtllm (amd64)
GitHub Check: sglang (arm64)
GitHub Check: operator (amd64)
GitHub Check: sglang (amd64)
GitHub Check: clippy (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (.)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: clippy (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (15)

lib/llm/Cargo.toml (2)

174-174: LGTM! Appropriate test dependency for HTTP mocking.

The mockito 1.7.0 dev dependency enables HTTP endpoint mocking in tests, as demonstrated in lib/llm/src/preprocessor/media/common.rs.

143-145: Dependencies correctly configured for media fetching.

Verification confirms reqwest (v0.12.22) is properly defined in workspace.dependencies with appropriate features, and lib/llm/Cargo.toml correctly references it via { workspace = true }. The base64 v0.22 dependency is appropriately specified for encoding support.

lib/llm/tests/preprocessor.rs (1)

554-554: LGTM! Correct async usage.

The test properly awaits the now-async preprocess_request method, consistent with the preprocessor's async conversion for media loading support.

lib/llm/src/preprocessor/media/mod.rs (1)

1-8: LGTM! Clean module structure.

The module properly declares submodules and exposes a minimal public API surface through EncodedMediaData and MediaLoader.

lib/llm/src/preprocessor/media/loader.rs (2)

19-25: LGTM! Simple and correct HTTP client initialization.

The constructor properly initializes the reqwest client with a user agent and propagates errors via Result<Self>.

27-50: LGTM! Clear media type handling with appropriate TODOs.

The method correctly:

Delegates URL fetching to EncodedMediaData::from_url

Supports ImageUrl and VideoUrl

Explicitly rejects AudioUrl with a clear error message

Documents future work (decode, NIXL registration) via TODOs

The TODOs align with the PR objectives stating that E2E plumbing will be added in a follow-up MR.

lib/llm/src/preprocessor/media/common.rs (4)

8-12: LGTM! Simple and appropriate data structure.

The struct cleanly separates raw bytes from the encoding state, enabling deferred base64 decoding.

17-43: LGTM! Correct URL handling with appropriate scheme support.

The implementation properly:

Handles data: URLs by extracting base64 content after the comma

Fetches http/https URLs via reqwest with error propagation

Validates non-empty responses

Rejects unsupported schemes with clear error messages

The deferred base64 decoding (line 26 stores as-is) aligns with the comment on line 16 about avoiding expensive operations in the tokio runtime.

46-52: LGTM! Correct conditional base64 decoding.

The method properly decodes base64-encoded bytes when the flag is set, using the standard base64 engine.

55-147: LGTM! Comprehensive test coverage.

The tests thoroughly cover:

Valid base64 data URLs (decode verification)

Empty base64 data (error case)

Invalid data URL format (error case)

HTTP 200 success with mockito

HTTP 404 error handling

Unsupported URL schemes (FTP)

This provides strong confidence in the implementation's correctness.

lib/llm/src/preprocessor.rs (5)

14-14: LGTM! Clean module addition.

The media module is properly exposed and MediaLoader is imported for use in the preprocessor.

Also applies to: 30-30

117-117: LGTM! Infrastructure prepared for future media loading.

The media_loader field is properly initialized to None, preserving current behavior (URL passthrough). The TODO on line 146 correctly notes that enabling this requires decoder configuration from the MDC, planned for a follow-up MR per the PR objectives.

Also applies to: 146-146, 154-154

274-345: LGTM! Correct async multimodal data gathering with appropriate TODOs.

The implementation correctly:

Converts the method to async (line 274)

Queues fetch tasks when media_loader is present (lines 315-324)

Falls back to URL passthrough when no loader is configured (lines 318-323)

Executes fetch tasks concurrently via join_all (lines 330-338)

Note: Fetched results (line 330 _results) are intentionally unused, as indicated by the TODO on line 337. This aligns with the PR objectives stating that E2E plumbing will be added in a follow-up MR. Since media_loader is always None (line 146), the fetch path is currently never executed, preserving existing behavior.

862-862: LGTM! Call sites properly updated to await async methods.

Both preprocess_request (line 862) and gather_multi_modal_data (line 989) are correctly awaited in their respective generate implementations.

Also applies to: 989-989

168-168: Verification confirmed: All call sites properly await the async function.

The script output shows both discovered call sites of preprocess_request are correctly using .await:

lib/llm/tests/preprocessor.rs:554 uses .await.unwrap()

lib/llm/src/preprocessor.rs:862 uses .await?

No changes are needed.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

lib/llm/src/preprocessor/media.rs

lib/llm/src/preprocessor/media/common.rs

lib/llm/src/preprocessor/media/loader.rs

Signed-off-by: Alexandre Milesi <[email protected]>

milesial requested a review from a team as a code owner October 29, 2025 20:51

pull-request-size bot added the size/L label Oct 29, 2025

github-actions bot added the feat label Oct 29, 2025

milesial mentioned this pull request Oct 29, 2025

feat: Image decoder in the frontend #3971

Open

rmccorm4 requested review from KrishnanPrash, indrajit96 and krishung5 October 29, 2025 21:42

grahamking reviewed Oct 29, 2025

View reviewed changes

lib/llm/src/preprocessor/media.rs Show resolved Hide resolved

grahamking reviewed Oct 29, 2025

View reviewed changes

lib/llm/src/preprocessor/media/common.rs Outdated Show resolved Hide resolved

grahamking reviewed Oct 29, 2025

View reviewed changes

lib/llm/src/preprocessor/media/loader.rs Outdated Show resolved Hide resolved

grahamking approved these changes Oct 29, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB October 29, 2025 22:26 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 30, 2025 18:19 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 30, 2025 18:22 Inactive

krishung5 approved these changes Oct 30, 2025

View reviewed changes

milesial added 3 commits October 30, 2025 16:28

feat: Media HTTP fetching and b64 decoding

7a9d6e0

Signed-off-by: Alexandre Milesi <[email protected]>

chore: address reviews

f5dfc0e

Signed-off-by: Alexandre Milesi <[email protected]>

feat: Add some security measures

8a17e40

Signed-off-by: Alexandre Milesi <[email protected]>

milesial force-pushed the alexandrem/frontend-media-fetching branch from 11fe8e2 to 8a17e40 Compare October 30, 2025 23:28

copy-pr-bot bot temporarily deployed to GITLAB October 30, 2025 23:28 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 30, 2025 23:38 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Media HTTP fetching and b64 decoding #3967

feat: Media HTTP fetching and b64 decoding #3967

milesial commented Oct 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: Media HTTP fetching and b64 decoding #3967

Are you sure you want to change the base?

feat: Media HTTP fetching and b64 decoding #3967

Conversation

milesial commented Oct 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 29, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

milesial commented Oct 29, 2025 •

edited by coderabbitai bot

Loading