feat: Media URL passthrough in OAI preprocessor #3733
Conversation
Signed-off-by: Alexandre Milesi <[email protected]>
Walkthrough
The changes convert the OAI preprocessor's request preprocessing to async, add extraction of media URLs from chat messages into a new `multi_modal_data` field on the preprocessed request, and extend the test suite to cover the passthrough.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Client
participant Preprocessor
participant MessageParser
participant Builder
participant PreprocessedRequest
Client->>Preprocessor: preprocess_request (async)
activate Preprocessor
Preprocessor->>Preprocessor: Apply template
Preprocessor->>Preprocessor: Gather tokens
Preprocessor->>Builder: Create PreprocessedRequestBuilder
Preprocessor->>MessageParser: Extract media URLs from messages
MessageParser-->>Preprocessor: MultimodalDataMap
alt Media found
Preprocessor->>Builder: gather_multi_modal_data
Builder->>Builder: Attach multi_modal_data field
end
Preprocessor->>PreprocessedRequest: Build final request
PreprocessedRequest-->>Client: Return preprocessed request
deactivate Preprocessor
```
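For orientation, a minimal sketch of the data shape the diagram implies, using the `MultimodalData` and `MultimodalDataMap` names that appear in this PR; the exact map alias and derives are assumptions here, and the `Decoded` variant is only a TODO in the actual change:

```rust
use std::collections::HashMap;

// One media item attached to a request; this PR passes URLs through as-is.
#[derive(Debug, Clone)]
pub enum MultimodalData {
    Url(url::Url),
    // TODO (future work): Decoded(DecodedMediaData),
}

// Media kind ("image_url", "video_url", "audio_url") -> collected items.
pub type MultimodalDataMap = HashMap<String, Vec<MultimodalData>>;
```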
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
The changes involve converting a core method to async (requiring verification of await correctness and error propagation), introducing new data types and extraction logic for multimodal content, and adding comprehensive test coverage across multiple scenarios. The modifications span three distinct files with interconnected concerns, but the changes follow a coherent pattern and are logically cohesive.
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (4)
lib/llm/tests/preprocessor.rs (2)
498-526: Build JSON safely; avoid manual string concatenation.
The current helper will break if text contains quotes or newlines. Use serde_json to construct content parts.
```diff
-fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
-    let mut content_parts = vec![format!(r#"{{"type": "text", "text": "{}"}}"#, text)];
-
-    for (chunk_type, count) in chunks {
-        for i in 1..=*count {
-            let chunk = match *chunk_type {
-                "image_url" => format!(
-                    r#"{{"type": "image_url", "image_url": {{"url": "https://example.com/img{}.jpg"}}}}"#,
-                    i
-                ),
-                "video_url" => format!(
-                    r#"{{"type": "video_url", "video_url": {{"url": "https://example.com/vid{}.mp4"}}}}"#,
-                    i
-                ),
-                "audio_url" => format!(
-                    r#"{{"type": "audio_url", "audio_url": {{"url": "https://example.com/audio{}.mp3"}}}}"#,
-                    i
-                ),
-                _ => panic!("Unknown chunk type: {}", chunk_type),
-            };
-            content_parts.push(chunk);
-        }
-    }
-
-    format!(
-        r#"[{{"role": "user", "content": [{}]}}]"#,
-        content_parts.join(", ")
-    )
-}
+fn build_message(text: &str, chunks: &[(&str, usize)]) -> String {
+    use serde_json::json;
+    let mut parts = vec![json!({"type": "text", "text": text})];
+    for (chunk_type, count) in chunks {
+        for i in 1..=*count {
+            match *chunk_type {
+                "image_url" => parts.push(json!({"type": "image_url", "image_url": {"url": format!("https://example.com/img{}.jpg", i)}})),
+                "video_url" => parts.push(json!({"type": "video_url", "video_url": {"url": format!("https://example.com/vid{}.mp4", i)}})),
+                "audio_url" => parts.push(json!({"type": "audio_url", "audio_url": {"url": format!("https://example.com/audio{}.mp3", i)}})),
+                _ => panic!("Unknown chunk type: {}", chunk_type),
+            }
+        }
+    }
+    serde_json::to_string(&vec![json!({"role": "user", "content": parts})]).unwrap()
+}
```
547-597: Nice coverage; consider removing HF gating for this path.
The passthrough logic doesn't need model downloads; the HF token gate slows CI. A unit test targeting `gather_multi_modal_data` with a minimal stub request would decouple from HF. I can sketch a test helper if desired; see the sketch below.
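A minimal sketch of such a helper, assuming the `gather_multi_modal_data(&request, &mut builder)` call shape quoted later in this review; the request and builder constructors below are hypothetical stand-ins, since neither is shown in this diff:

```rust
// Sketch only: assumes NvCreateChatCompletionRequest deserializes from plain
// OAI-style JSON and that a preprocessed-request builder can be created
// without downloading any model assets.
#[tokio::test]
async fn gather_multi_modal_data_without_hf() {
    let request: NvCreateChatCompletionRequest = serde_json::from_str(
        r#"{
            "model": "stub",
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": "what is in this image?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/img1.jpg"}}
                ]
            }]
        }"#,
    )
    .expect("stub request should deserialize");

    let mut builder = PreprocessedRequest::builder(); // hypothetical standalone builder
    gather_multi_modal_data(&request, &mut builder)
        .await
        .expect("URL extraction should not need model assets");
    // Finally, assert the builder collected one "image_url" entry.
}
```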
lib/llm/src/preprocessor.rs (1)
271-320: gather_multi_modal_data: robustness and efficiency improvements.
- needless async (no await): either make sync or add an allow to silence clippy.
- messages.len().unwrap_or(0) silently eats errors; prefer propagating len() errors (or early-return with context).
- avoid JSON round‑trip (to_value/from_value); parse directly if messages exposes typed items.
- replace stringly keys with shared constants or an enum to prevent typos.
- optionally cap collected items per request to avoid unbounded growth/DoS.
Example targeted tweaks (illustrative):
```rust
// constants near the top level
const MEDIA_IMAGE: &str = "image_url";
const MEDIA_VIDEO: &str = "video_url";
const MEDIA_AUDIO: &str = "audio_url";
```

And consider making this function sync for now; if you want to keep it async for future `Decoded` fetches, add:
```rust
#[allow(clippy::unused_async)]
pub async fn gather_multi_modal_data(...) -> Result<()>
```

If `messages.len()` can error, do we want to skip or fail the request? Please confirm desired behavior; I can propose an exact diff accordingly.
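If failing the request is preferred, propagation is a one-liner; this assumes the surrounding error type is anyhow-compatible, which the diff does not show:

```rust
use anyhow::Context;

// Fail the request with context rather than treating errors as zero messages.
let message_count = messages
    .len()
    .context("failed to count messages during multimodal extraction")?;
```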
lib/llm/src/protocols/common/preprocessor.rs (1)
11-15: Serde feature already enabled; focus on forward-compat with `#[non_exhaustive]` if versioning stability is a priority.
After verification:
- Serde feature: already configured in the root Cargo.toml (`url = { version = "2.5", features = ["serde"] }`). No action needed.
- URL logging: no evidence found of PreprocessedRequest or MultimodalData being logged with Debug output that would expose URLs. The codebase does not currently leak full media URLs via Debug implementations.
- Forward-compat with `#[non_exhaustive]`: valid suggestion. This pattern is already used throughout the codebase (e.g., in `lib/async-openai/src/types/responses.rs`). Apply it to MultimodalData if you plan to add a `Decoded` variant without breaking downstream code.

Suggested change remains relevant:
```diff
+#[non_exhaustive]
 #[derive(Serialize, Deserialize, Debug, Clone)]
 pub enum MultimodalData {
     Url(url::Url),
     // TODO: Decoded(DecodedMediaData),
 }
```

Also applies to lines 17-19 (PreprocessedRequest) if forward-compat is a priority.
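As a usage note on why `#[non_exhaustive]` buys forward-compat: matches outside the defining crate must keep a wildcard arm, so a future `Decoded` variant compiles without breaking downstream code. A minimal illustration:

```rust
fn kind(data: &MultimodalData) -> &'static str {
    match data {
        MultimodalData::Url(_) => "url",
        // Required outside the defining crate for a #[non_exhaustive] enum;
        // any future variant (e.g. Decoded) falls through here instead of
        // turning into a compile error for downstream users.
        _ => "other",
    }
}
```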
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- lib/llm/src/preprocessor.rs (7 hunks)
- lib/llm/src/protocols/common/preprocessor.rs (2 hunks)
- lib/llm/tests/preprocessor.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
lib/llm/tests/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
`new` (119-125)
lib/llm/src/protocols/common/preprocessor.rs (1)
lib/llm/src/preprocessor.rs (1)
`builder` (190-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (2)
lib/llm/src/preprocessor.rs (2)
962-965: No action needed; NvCreateCompletionRequest correctly implements OAIChatLikeRequest.
The trait impl is present at `lib/llm/src/preprocessor/prompt/template/oai.rs:189`, confirming the call to `gather_multi_modal_data(&request, &mut builder)` satisfies its trait bounds. The code is correct as-is; no guard or refactoring is required.
165-186: All call sites properly awaited; async conversion verified.
Verification confirms both discovered call sites correctly use `.await`:

- `lib/llm/tests/preprocessor.rs:562`: awaited with `.await.unwrap()`
- `lib/llm/src/preprocessor.rs:837`: awaited with `.await?`

The async signature change is correctly wired throughout the codebase.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: milesial <[email protected]>
Signed-off-by: milesial <[email protected]>
/ok to test d68458b
Kicked off the backend tests so we can check whether there is any impact on the existing multimodal tests for vllm, sglang, etc.
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
/ok to test e9e2479
Signed-off-by: Alexandre Milesi <[email protected]>
Overview:
Offshoot from #3630.
Details:
Any media URL in the OAI requests will be included in the preprocessed common object, in a `multi_modal_data` field (a map from media type string to a list of URLs); see the sketch below.
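For illustration, a hedged sketch of the intended mapping, reusing the `MultimodalData` shape discussed in the review above; the construction code itself is an assumption:

```rust
use std::collections::HashMap;

// A content part like
//   {"type": "image_url", "image_url": {"url": "https://example.com/img1.jpg"}}
// should land in the preprocessed request roughly as:
let mut multi_modal_data: HashMap<String, Vec<MultimodalData>> = HashMap::new();
multi_modal_data
    .entry("image_url".to_string())
    .or_default()
    .push(MultimodalData::Url(
        url::Url::parse("https://example.com/img1.jpg").unwrap(),
    ));
```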
Summary by CodeRabbit
New Features
Tests