Conversation

@milesial milesial commented Oct 20, 2025

Overview:

Offshoot from #3630.

Details:

Any media URL in OAI requests is now included in the preprocessed common object, in a multi_modal_data field (a map from media-type str to a list of URLs).
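As an illustration of the shape (assumed from the description above, not copied from the implementation), the preprocessed object would carry something like:

```json
"multi_modal_data": {
  "image_url": ["https://example.com/img1.jpg", "https://example.com/img2.jpg"],
  "video_url": ["https://example.com/vid1.mp4"]
}
```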

running 12 tests
test test_media_url_passthrough::case_12_mixed_multiple ... ok
test test_media_url_passthrough::case_02_single_image ... ok
test test_media_url_passthrough::case_01_no_media ... ok
test test_media_url_passthrough::case_08_image_and_video ... ok
test test_media_url_passthrough::case_06_two_videos ... ok
test test_media_url_passthrough::case_04_single_audio ... ok
test test_media_url_passthrough::case_11_all_three_types ... ok
test test_media_url_passthrough::case_07_two_audios ... ok
test test_media_url_passthrough::case_05_three_images ... ok
test test_media_url_passthrough::case_09_images_and_audio ... ok
test test_media_url_passthrough::case_03_single_video ... ok
test test_media_url_passthrough::case_10_video_and_audios ... ok

Summary by CodeRabbit

  • New Features

    • Enabled processing and handling of multimodal content including images, videos, and audio URLs within requests.
  • Tests

    • Added comprehensive test coverage for multimodal data scenarios, including single media, multiple media types, and mixed content.

@milesial milesial requested a review from a team as a code owner October 20, 2025 15:49

copy-pr-bot bot commented Oct 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Oct 20, 2025

Walkthrough

The changes convert the preprocess_request method to asynchronous, introduce a new gather_multi_modal_data method for extracting media URLs from request messages, and extend the PreprocessedRequest structure with an optional multi_modal_data field to store collected multimodal content.

Changes

  • Protocol & Type Definitions (lib/llm/src/protocols/common/preprocessor.rs): Added public enum MultimodalData with a Url variant; added public type alias MultimodalDataMap; extended the PreprocessedRequest struct with an optional multi_modal_data field and a builder annotation.
  • Preprocessor Implementation (lib/llm/src/preprocessor.rs): Converted preprocess_request from a synchronous to an asynchronous method; introduced the new public async method gather_multi_modal_data to build a multimodal data map from request messages; updated internal call sites to await preprocessing and incorporate multimodal data gathering; enhanced error context with with_context messages.
  • Tests (lib/llm/tests/preprocessor.rs): Added a helper function build_message to construct messages with multimodal content (image/video/audio URLs); added the parameterized test test_media_url_passthrough, using rstest, to validate multimodal data collection across scenarios (no media, single, multiple, and mixed types).
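The extraction pass described above can be sketched as follows. This is a minimal, self-contained illustration only: the ContentPart enum, the string keys, and the function signature are stand-ins for the real async-openai/dynamo types, and the actual method is async and writes into a request builder rather than returning a map.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the OAI message content parts; the real types
// live in the async-openai/dynamo crates.
enum ContentPart {
    Text(String),
    ImageUrl(String),
    VideoUrl(String),
    AudioUrl(String),
}

// Shape described in the PR: media type -> list of URLs (here plain Strings).
type MultimodalDataMap = HashMap<String, Vec<String>>;

// Walk every content part of every message, bucket URLs by media kind, and
// return None when the request contains no media at all.
fn gather_multi_modal_data(messages: &[Vec<ContentPart>]) -> Option<MultimodalDataMap> {
    let mut map = MultimodalDataMap::new();
    for msg in messages {
        for part in msg {
            let (key, url) = match part {
                ContentPart::ImageUrl(u) => ("image_url", u),
                ContentPart::VideoUrl(u) => ("video_url", u),
                ContentPart::AudioUrl(u) => ("audio_url", u),
                ContentPart::Text(_) => continue,
            };
            map.entry(key.to_string()).or_default().push(url.clone());
        }
    }
    if map.is_empty() { None } else { Some(map) }
}

fn main() {
    let messages = vec![vec![
        ContentPart::Text("describe these".into()),
        ContentPart::ImageUrl("https://example.com/img1.jpg".into()),
        ContentPart::ImageUrl("https://example.com/img2.jpg".into()),
        ContentPart::AudioUrl("https://example.com/audio1.mp3".into()),
    ]];
    let map = gather_multi_modal_data(&messages).expect("media present");
    assert_eq!(map["image_url"].len(), 2);
    assert_eq!(map["audio_url"].len(), 1);
    assert!(gather_multi_modal_data(&[vec![ContentPart::Text("hi".into())]]).is_none());
    println!("ok");
}
```

Returning None for media-free requests mirrors the optional multi_modal_data field on PreprocessedRequest, so text-only traffic is unaffected.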

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Preprocessor
    participant MessageParser
    participant Builder
    participant PreprocessedRequest

    Client->>Preprocessor: preprocess_request (async)
    activate Preprocessor
    Preprocessor->>Preprocessor: Apply template
    Preprocessor->>Preprocessor: Gather tokens
    Preprocessor->>Builder: Create PreprocessedRequestBuilder
    Preprocessor->>MessageParser: Extract media URLs from messages
    MessageParser-->>Preprocessor: MultimodalDataMap
    alt Media found
        Preprocessor->>Builder: gather_multi_modal_data
        Builder->>Builder: Attach multi_modal_data field
    end
    Preprocessor->>PreprocessedRequest: Build final request
    PreprocessedRequest-->>Client: Return preprocessed request
    deactivate Preprocessor

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes involve converting a core method to async (requiring verification of await correctness and error propagation), introducing new data types and extraction logic for multimodal content, and adding comprehensive test coverage across multiple scenarios. The modifications span three distinct files with interconnected concerns, but the changes follow a coherent pattern and are logically cohesive.
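As context for reviewing the optional-field change, here is a minimal sketch of how an optional multi_modal_data builder field behaves. The names are illustrative only; the actual PreprocessedRequestBuilder is generated by the builder annotation mentioned in the walkthrough, and URLs are simplified to Strings.

```rust
use std::collections::HashMap;

type MultimodalDataMap = HashMap<String, Vec<String>>;

#[derive(Debug, Default)]
struct PreprocessedRequest {
    token_ids: Vec<u32>,
    // Stays None for text-only requests; set only when media URLs were found.
    multi_modal_data: Option<MultimodalDataMap>,
}

#[derive(Default)]
struct Builder {
    token_ids: Vec<u32>,
    multi_modal_data: Option<MultimodalDataMap>,
}

impl Builder {
    fn token_ids(mut self, ids: Vec<u32>) -> Self {
        self.token_ids = ids;
        self
    }
    fn multi_modal_data(mut self, m: MultimodalDataMap) -> Self {
        self.multi_modal_data = Some(m);
        self
    }
    fn build(self) -> PreprocessedRequest {
        PreprocessedRequest {
            token_ids: self.token_ids,
            multi_modal_data: self.multi_modal_data,
        }
    }
}

fn main() {
    // Text-only request: the optional field is simply never set.
    let plain = Builder::default().token_ids(vec![1, 2, 3]).build();
    assert!(plain.multi_modal_data.is_none());

    // Request with media: the gathered map is attached via the builder.
    let mut m = MultimodalDataMap::new();
    m.insert("image_url".into(), vec!["https://example.com/img1.jpg".into()]);
    let rich = Builder::default().token_ids(vec![1, 2, 3]).multi_modal_data(m).build();
    assert_eq!(rich.multi_modal_data.unwrap()["image_url"].len(), 1);
    println!("ok");
}
```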

Poem

🐰 Hark! The preprocessor now dreams in async await,
Gathering images, videos, sounds—a multimodal feast so great,
From messages they leap, captured in maps with care,
Building richer requests, floating through the air!
Tests dance across scenarios, ensuring all flows right,
A smooth async journey—onward to the light! 🚀

Pre-merge checks

✅ Passed checks (3 passed)

  • Title Check: ✅ Passed. The title "feat: Media URL passthrough in OAI preprocessor" directly and clearly reflects the main purpose of the PR, which is to add support for including media URLs in the preprocessed request object. It is concise, specific enough for a teammate to understand the primary change when scanning history, and follows conventional commit format with the "feat:" prefix. The title accurately represents the core objective of the changeset without being vague or overly broad.
  • Description Check: ✅ Passed. The PR description provides an Overview section referencing the related PR (#3630) and a Details section that clearly explains the change with concrete test output demonstrating that all 12 test cases pass. The description adequately communicates what was implemented (media URL inclusion in the multi_modal_data field) and provides evidence that it works correctly. However, the description is missing the "Where should the reviewer start?" section that identifies specific files for focused review, and the Related Issues section is incomplete—it mentions #3630 but doesn't use the recommended action keywords (Closes/Fixes/Resolves/Relates to) for proper issue linking.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
lib/llm/tests/preprocessor.rs (2)

498-526: Build JSON safely; avoid manual string concatenation.

Current helper will break if text contains quotes/newlines. Use serde_json to construct content parts.

-fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
-    let mut content_parts = vec![format!(r#"{{"type": "text", "text": "{}"}}"#, text)];
-
-    for (chunk_type, count) in chunks {
-        for i in 1..=*count {
-            let chunk = match *chunk_type {
-                "image_url" => format!(
-                    r#"{{"type": "image_url", "image_url": {{"url": "https://example.com/img{}.jpg"}}}}"#,
-                    i
-                ),
-                "video_url" => format!(
-                    r#"{{"type": "video_url", "video_url": {{"url": "https://example.com/vid{}.mp4"}}}}"#,
-                    i
-                ),
-                "audio_url" => format!(
-                    r#"{{"type": "audio_url", "audio_url": {{"url": "https://example.com/audio{}.mp3"}}}}"#,
-                    i
-                ),
-                _ => panic!("Unknown chunk type: {}", chunk_type),
-            };
-            content_parts.push(chunk);
-        }
-    }
-
-    format!(
-        r#"[{{"role": "user", "content": [{}]}}]"#,
-        content_parts.join(", ")
-    )
-}
+fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
+    use serde_json::json;
+    let mut parts = vec![json!({"type": "text", "text": text})];
+    for (chunk_type, count) in chunks {
+        for i in 1..=*count {
+            match *chunk_type {
+                "image_url" => parts.push(json!({"type":"image_url","image_url":{"url": format!("https://example.com/img{}.jpg", i)}})),
+                "video_url" => parts.push(json!({"type":"video_url","video_url":{"url": format!("https://example.com/vid{}.mp4", i)}})),
+                "audio_url" => parts.push(json!({"type":"audio_url","audio_url":{"url": format!("https://example.com/audio{}.mp3", i)}})),
+                _ => panic!("Unknown chunk type: {}", chunk_type),
+            }
+        }
+    }
+    serde_json::to_string(&vec![json!({"role": "user", "content": parts})]).unwrap()
+}

547-597: Nice coverage; consider removing HF gating for this path.

The passthrough logic doesn’t need model downloads; the HF token gate slows CI. A unit test targeting gather_multi_modal_data with a minimal stub request would decouple from HF. I can sketch a test helper if desired.

lib/llm/src/preprocessor.rs (1)

271-320: gather_multi_modal_data: robustness and efficiency improvements.

  • needless async (no await): either make sync or add an allow to silence clippy.
  • messages.len().unwrap_or(0) silently eats errors; prefer propagating len() errors (or early-return with context).
  • avoid JSON round‑trip (to_value/from_value); parse directly if messages exposes typed items.
  • replace stringly keys with shared constants or an enum to prevent typos.
  • optionally cap collected items per request to avoid unbounded growth/DoS.

Example targeted tweaks (illustrative):

// constants near the top-level
const MEDIA_IMAGE: &str = "image_url";
const MEDIA_VIDEO: &str = "video_url";
const MEDIA_AUDIO: &str = "audio_url";

And consider making this function sync for now; if you want to keep it async for future Decoded fetches, add:

#[allow(clippy::unused_async)]
pub async fn gather_multi_modal_data(...) -> Result<()>

If messages.len() can error, do we want to skip or fail the request? Please confirm desired behavior; I can propose an exact diff accordingly.

lib/llm/src/protocols/common/preprocessor.rs (1)

11-15: Serde feature already enabled; focus on forward-compat with #[non_exhaustive] if versioning stability is a priority.

After verification:

  1. Serde feature: Already configured in root Cargo.toml (url = { version = "2.5", features = ["serde"] }). No action needed.

  2. URL logging: No evidence found of PreprocessedRequest or MultimodalData being logged with Debug output that would expose URLs. The codebase does not currently leak full media URLs via Debug implementations.

  3. Forward-compat with #[non_exhaustive]: Valid suggestion. This pattern is already used throughout the codebase (e.g., in lib/async-openai/src/types/responses.rs). Apply to MultimodalData if you plan to add Decoded variant without breaking downstream code.

Suggested change remains relevant:

+#[non_exhaustive]
 #[derive(Serialize, Deserialize, Debug, Clone)]
 pub enum MultimodalData {
     Url(url::Url),
     // TODO: Decoded(DecodedMediaData),
 }

Also applies to lines 17-19 (PreprocessedRequest) if forward-compat is a priority.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b7a806 and 41d0120.

📒 Files selected for processing (3)
  • lib/llm/src/preprocessor.rs (7 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (2 hunks)
  • lib/llm/tests/preprocessor.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
lib/llm/tests/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
  • new (119-125)
lib/llm/src/protocols/common/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
  • builder (190-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (2)
lib/llm/src/preprocessor.rs (2)

962-965: No action needed — NvCreateCompletionRequest correctly implements OAIChatLikeRequest.

The trait impl is present at lib/llm/src/preprocessor/prompt/template/oai.rs:189, confirming the call to gather_multi_modal_data(&request, &mut builder) satisfies its trait bounds. The code is correct as-is; no guard or refactoring is required.


165-186: All call sites properly awaited; async conversion verified.

Verification confirms both discovered call sites correctly use .await:

  • lib/llm/tests/preprocessor.rs:562 — awaited with .await.unwrap()
  • lib/llm/src/preprocessor.rs:837 — awaited with .await?

The async signature change is correctly wired throughout the codebase.

@rmccorm4

@milesial please address the coderabbit comments and failing check

milesial and others added 2 commits October 20, 2025 23:29
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: milesial <[email protected]>
Signed-off-by: milesial <[email protected]>
@rmccorm4

/ok to test d68458b

@rmccorm4 rmccorm4 left a comment

Kicked off the backend tests so we can check whether there is any impact on the existing multimodal tests for vllm, sglang, etc.

Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
@rmccorm4

/ok to test e9e2479

@rmccorm4 rmccorm4 enabled auto-merge (squash) October 27, 2025 22:29
@rmccorm4 rmccorm4 merged commit a79122c into main Oct 27, 2025
28 of 29 checks passed
@rmccorm4 rmccorm4 deleted the alexandrem/frontend-media-url-passthrough branch October 27, 2025 23:33
csabakecskemeti pushed a commit to csabakecskemeti/dynamo that referenced this pull request Oct 31, 2025