Conversation

@milesial milesial commented Oct 20, 2025

Overview:

Offshoot from #3630.

Details:

Any media URL in OAI requests is now included in the preprocessed common object, in a multi_modal_data field (a map from media-type str to a list of URLs).
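As an illustration of the shape (assumed from the description above, not copied from the implementation), the preprocessed object would carry something like:

```json
"multi_modal_data": {
  "image_url": ["https://example.com/img1.jpg", "https://example.com/img2.jpg"],
  "video_url": ["https://example.com/vid1.mp4"]
}
```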

running 12 tests
test test_media_url_passthrough::case_12_mixed_multiple ... ok
test test_media_url_passthrough::case_02_single_image ... ok
test test_media_url_passthrough::case_01_no_media ... ok
test test_media_url_passthrough::case_08_image_and_video ... ok
test test_media_url_passthrough::case_06_two_videos ... ok
test test_media_url_passthrough::case_04_single_audio ... ok
test test_media_url_passthrough::case_11_all_three_types ... ok
test test_media_url_passthrough::case_07_two_audios ... ok
test test_media_url_passthrough::case_05_three_images ... ok
test test_media_url_passthrough::case_09_images_and_audio ... ok
test test_media_url_passthrough::case_03_single_video ... ok
test test_media_url_passthrough::case_10_video_and_audios ... ok

Summary by CodeRabbit

  • New Features

    • Enabled processing and handling of multimodal content including images, videos, and audio URLs within requests.
  • Tests

    • Added comprehensive test coverage for multimodal data scenarios, including single media, multiple media types, and mixed content.

@milesial milesial requested a review from a team as a code owner October 20, 2025 15:49

copy-pr-bot bot commented Oct 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Oct 20, 2025

Walkthrough

The changes convert the preprocess_request method to asynchronous, introduce a new gather_multi_modal_data method for extracting media URLs from request messages, and extend the PreprocessedRequest structure with an optional multi_modal_data field to store collected multimodal content.

Changes

  • Protocol & Type Definitions (lib/llm/src/protocols/common/preprocessor.rs): Added public enum MultimodalData with a Url variant; added public type alias MultimodalDataMap; extended the PreprocessedRequest struct with an optional multi_modal_data field and a builder annotation.
  • Preprocessor Implementation (lib/llm/src/preprocessor.rs): Converted preprocess_request from a synchronous to an asynchronous method; introduced the new public async method gather_multi_modal_data to build a multimodal data map from request messages; updated internal call sites to await preprocessing and incorporate multimodal data gathering; enhanced error context with with_context messages.
  • Tests (lib/llm/tests/preprocessor.rs): Added a helper function build_message to construct messages with multimodal content (image/video/audio URLs); added the parameterized test test_media_url_passthrough, using rstest, to validate multimodal data collection across scenarios (no media, single, multiple, and mixed types).
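The extraction pass described above can be sketched as follows. This is a minimal, self-contained illustration only: the ContentPart enum, the string keys, and the function signature are stand-ins for the real async-openai/dynamo types, and the actual method is async and writes into a request builder rather than returning a map.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the OAI message content parts; the real types
// live in the async-openai/dynamo crates.
enum ContentPart {
    Text(String),
    ImageUrl(String),
    VideoUrl(String),
    AudioUrl(String),
}

// Shape described in the PR: media type -> list of URLs (here plain Strings).
type MultimodalDataMap = HashMap<String, Vec<String>>;

// Walk every content part of every message, bucket URLs by media kind, and
// return None when the request contains no media at all.
fn gather_multi_modal_data(messages: &[Vec<ContentPart>]) -> Option<MultimodalDataMap> {
    let mut map = MultimodalDataMap::new();
    for msg in messages {
        for part in msg {
            let (key, url) = match part {
                ContentPart::ImageUrl(u) => ("image_url", u),
                ContentPart::VideoUrl(u) => ("video_url", u),
                ContentPart::AudioUrl(u) => ("audio_url", u),
                ContentPart::Text(_) => continue,
            };
            map.entry(key.to_string()).or_default().push(url.clone());
        }
    }
    if map.is_empty() { None } else { Some(map) }
}

fn main() {
    let messages = vec![vec![
        ContentPart::Text("describe these".into()),
        ContentPart::ImageUrl("https://example.com/img1.jpg".into()),
        ContentPart::ImageUrl("https://example.com/img2.jpg".into()),
        ContentPart::AudioUrl("https://example.com/audio1.mp3".into()),
    ]];
    let map = gather_multi_modal_data(&messages).expect("media present");
    assert_eq!(map["image_url"].len(), 2);
    assert_eq!(map["audio_url"].len(), 1);
    assert!(gather_multi_modal_data(&[vec![ContentPart::Text("hi".into())]]).is_none());
    println!("ok");
}
```

Returning None for media-free requests mirrors the optional multi_modal_data field on PreprocessedRequest, so text-only traffic is unaffected.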

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Preprocessor
    participant MessageParser
    participant Builder
    participant PreprocessedRequest

    Client->>Preprocessor: preprocess_request (async)
    activate Preprocessor
    Preprocessor->>Preprocessor: Apply template
    Preprocessor->>Preprocessor: Gather tokens
    Preprocessor->>Builder: Create PreprocessedRequestBuilder
    Preprocessor->>MessageParser: Extract media URLs from messages
    MessageParser-->>Preprocessor: MultimodalDataMap
    alt Media found
        Preprocessor->>Builder: gather_multi_modal_data
        Builder->>Builder: Attach multi_modal_data field
    end
    Preprocessor->>PreprocessedRequest: Build final request
    PreprocessedRequest-->>Client: Return preprocessed request
    deactivate Preprocessor

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes involve converting a core method to async (requiring verification of await correctness and error propagation), introducing new data types and extraction logic for multimodal content, and adding comprehensive test coverage across multiple scenarios. The modifications span three distinct files with interconnected concerns, but the changes follow a coherent pattern and are logically cohesive.
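As context for reviewing the optional-field change, here is a minimal sketch of how an optional multi_modal_data builder field behaves. The names are illustrative only; the actual PreprocessedRequestBuilder is generated by the builder annotation mentioned in the walkthrough, and URLs are simplified to Strings.

```rust
use std::collections::HashMap;

type MultimodalDataMap = HashMap<String, Vec<String>>;

#[derive(Debug, Default)]
struct PreprocessedRequest {
    token_ids: Vec<u32>,
    // Stays None for text-only requests; set only when media URLs were found.
    multi_modal_data: Option<MultimodalDataMap>,
}

#[derive(Default)]
struct Builder {
    token_ids: Vec<u32>,
    multi_modal_data: Option<MultimodalDataMap>,
}

impl Builder {
    fn token_ids(mut self, ids: Vec<u32>) -> Self {
        self.token_ids = ids;
        self
    }
    fn multi_modal_data(mut self, m: MultimodalDataMap) -> Self {
        self.multi_modal_data = Some(m);
        self
    }
    fn build(self) -> PreprocessedRequest {
        PreprocessedRequest {
            token_ids: self.token_ids,
            multi_modal_data: self.multi_modal_data,
        }
    }
}

fn main() {
    // Text-only request: the optional field is simply never set.
    let plain = Builder::default().token_ids(vec![1, 2, 3]).build();
    assert!(plain.multi_modal_data.is_none());

    // Request with media: the gathered map is attached via the builder.
    let mut m = MultimodalDataMap::new();
    m.insert("image_url".into(), vec!["https://example.com/img1.jpg".into()]);
    let rich = Builder::default().token_ids(vec![1, 2, 3]).multi_modal_data(m).build();
    assert_eq!(rich.multi_modal_data.unwrap()["image_url"].len(), 1);
    println!("ok");
}
```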

Poem

🐰 Hark! The preprocessor now dreams in async await,
Gathering images, videos, sounds—a multimodal feast so great,
From messages they leap, captured in maps with care,
Building richer requests, floating through the air!
Tests dance across scenarios, ensuring all flows right,
A smooth async journey—onward to the light! 🚀

Pre-merge checks

✅ Passed checks (3 passed)

  • Title Check: ✅ Passed. The title "feat: Media URL passthrough in OAI preprocessor" directly and clearly reflects the main purpose of the PR, which is to add support for including media URLs in the preprocessed request object. It is concise, specific enough for a teammate to understand the primary change when scanning history, and follows conventional commit format with the "feat:" prefix. The title accurately represents the core objective of the changeset without being vague or overly broad.
  • Description Check: ✅ Passed. The PR description provides an Overview section referencing the related PR (#3630) and a Details section that clearly explains the change with concrete test output demonstrating that all 12 test cases pass. The description adequately communicates what was implemented (media URL inclusion in the multi_modal_data field) and provides evidence that it works correctly. However, the description is missing the "Where should the reviewer start?" section that identifies specific files for focused review, and the Related Issues section is incomplete—it mentions #3630 but doesn't use the recommended action keywords (Closes/Fixes/Resolves/Relates to) for proper issue linking.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
lib/llm/tests/preprocessor.rs (2)

498-526: Build JSON safely; avoid manual string concatenation.

Current helper will break if text contains quotes/newlines. Use serde_json to construct content parts.

-fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
-    let mut content_parts = vec![format!(r#"{{"type": "text", "text": "{}"}}"#, text)];
-
-    for (chunk_type, count) in chunks {
-        for i in 1..=*count {
-            let chunk = match *chunk_type {
-                "image_url" => format!(
-                    r#"{{"type": "image_url", "image_url": {{"url": "https://example.com/img{}.jpg"}}}}"#,
-                    i
-                ),
-                "video_url" => format!(
-                    r#"{{"type": "video_url", "video_url": {{"url": "https://example.com/vid{}.mp4"}}}}"#,
-                    i
-                ),
-                "audio_url" => format!(
-                    r#"{{"type": "audio_url", "audio_url": {{"url": "https://example.com/audio{}.mp3"}}}}"#,
-                    i
-                ),
-                _ => panic!("Unknown chunk type: {}", chunk_type),
-            };
-            content_parts.push(chunk);
-        }
-    }
-
-    format!(
-        r#"[{{"role": "user", "content": [{}]}}]"#,
-        content_parts.join(", ")
-    )
-}
+fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
+    use serde_json::json;
+    let mut parts = vec![json!({"type": "text", "text": text})];
+    for (chunk_type, count) in chunks {
+        for i in 1..=*count {
+            match *chunk_type {
+                "image_url" => parts.push(json!({"type":"image_url","image_url":{"url": format!("https://example.com/img{}.jpg", i)}})),
+                "video_url" => parts.push(json!({"type":"video_url","video_url":{"url": format!("https://example.com/vid{}.mp4", i)}})),
+                "audio_url" => parts.push(json!({"type":"audio_url","audio_url":{"url": format!("https://example.com/audio{}.mp3", i)}})),
+                _ => panic!("Unknown chunk type: {}", chunk_type),
+            }
+        }
+    }
+    serde_json::to_string(&vec![json!({"role": "user", "content": parts})]).unwrap()
+}

547-597: Nice coverage; consider removing HF gating for this path.

The passthrough logic doesn’t need model downloads; the HF token gate slows CI. A unit test targeting gather_multi_modal_data with a minimal stub request would decouple from HF. I can sketch a test helper if desired.

lib/llm/src/preprocessor.rs (1)

271-320: gather_multi_modal_data: robustness and efficiency improvements.

  • needless async (no await): either make sync or add an allow to silence clippy.
  • messages.len().unwrap_or(0) silently eats errors; prefer propagating len() errors (or early-return with context).
  • avoid JSON round‑trip (to_value/from_value); parse directly if messages exposes typed items.
  • replace stringly keys with shared constants or an enum to prevent typos.
  • optionally cap collected items per request to avoid unbounded growth/DoS.

Example targeted tweaks (illustrative):

// constants near the top-level
const MEDIA_IMAGE: &str = "image_url";
const MEDIA_VIDEO: &str = "video_url";
const MEDIA_AUDIO: &str = "audio_url";

And consider making this function sync for now; if you want to keep it async for future Decoded fetches, add:

#[allow(clippy::unused_async)]
pub async fn gather_multi_modal_data(...) -> Result<()>

If messages.len() can error, do we want to skip or fail the request? Please confirm desired behavior; I can propose an exact diff accordingly.

lib/llm/src/protocols/common/preprocessor.rs (1)

11-15: Serde feature already enabled; focus on forward-compat with #[non_exhaustive] if versioning stability is a priority.

After verification:

  1. Serde feature: Already configured in root Cargo.toml (url = { version = "2.5", features = ["serde"] }). No action needed.

  2. URL logging: No evidence found of PreprocessedRequest or MultimodalData being logged with Debug output that would expose URLs. The codebase does not currently leak full media URLs via Debug implementations.

  3. Forward-compat with #[non_exhaustive]: Valid suggestion. This pattern is already used throughout the codebase (e.g., in lib/async-openai/src/types/responses.rs). Apply to MultimodalData if you plan to add Decoded variant without breaking downstream code.

Suggested change remains relevant:

+#[non_exhaustive]
 #[derive(Serialize, Deserialize, Debug, Clone)]
 pub enum MultimodalData {
     Url(url::Url),
     // TODO: Decoded(DecodedMediaData),
 }

Also applies to lines 17-19 (PreprocessedRequest) if forward-compat is a priority.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b7a806 and 41d0120.

📒 Files selected for processing (3)
  • lib/llm/src/preprocessor.rs (7 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (2 hunks)
  • lib/llm/tests/preprocessor.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
lib/llm/tests/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
  • new (119-125)
lib/llm/src/protocols/common/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
  • builder (190-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (2)
lib/llm/src/preprocessor.rs (2)

962-965: No action needed — NvCreateCompletionRequest correctly implements OAIChatLikeRequest.

The trait impl is present at lib/llm/src/preprocessor/prompt/template/oai.rs:189, confirming the call to gather_multi_modal_data(&request, &mut builder) satisfies its trait bounds. The code is correct as-is; no guard or refactoring is required.


165-186: All call sites properly awaited; async conversion verified.

Verification confirms both discovered call sites correctly use .await:

  • lib/llm/tests/preprocessor.rs:562 — awaited with .await.unwrap()
  • lib/llm/src/preprocessor.rs:837 — awaited with .await?

The async signature change is correctly wired throughout the codebase.

@rmccorm4

@milesial please address the coderabbit comments and failing check

milesial and others added 2 commits October 20, 2025 23:29
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: milesial <[email protected]>
Signed-off-by: milesial <[email protected]>
@rmccorm4

/ok to test d68458b

@rmccorm4 rmccorm4 left a comment

Kicked off the backend tests so we can check whether there is any impact on the existing multimodal tests for vllm, sglang, etc.

Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
@rmccorm4

/ok to test e9e2479

@rmccorm4 rmccorm4 enabled auto-merge (squash) October 27, 2025 22:29
@rmccorm4 rmccorm4 merged commit a79122c into main Oct 27, 2025
28 of 29 checks passed
@rmccorm4 rmccorm4 deleted the alexandrem/frontend-media-url-passthrough branch October 27, 2025 23:33
csabakecskemeti pushed a commit to csabakecskemeti/dynamo that referenced this pull request Oct 31, 2025