Conversation

@indrajit96 indrajit96 commented Nov 24, 2025

Overview:

Adds a required --enable-multimodal security flag to explicitly enable multimodal processing in vLLM workers, preventing unintended processing of multimodal data from untrusted sources.
Similar to https://github.com/vllm-project/vllm/pull/27204/files

Details:

  • New flag: Added --enable-multimodal argument that must be explicitly set when using any multimodal worker type
  • Documentation: Updated docs/backends/vllm/multimodal.md with security requirement notice
  • Examples: Updated all launch scripts to include the new flag

Where should the reviewer start?

  • components/src/dynamo/vllm/args.py - Core validation logic
  • docs/backends/vllm/multimodal.md - Security documentation

Summary by CodeRabbit

  • New Features

    • Added --enable-multimodal flag for explicit multimodal processing control. This flag is now required when using multimodal components.
  • Documentation

    • Added security requirements and guidance for multimodal processing configuration.
  • Chores

    • Updated example launch scripts to include the new multimodal flag across all supported configurations.


@indrajit96 indrajit96 changed the title from "Add security flag to MM flow in vllm" to "feat: Add security flag to MM flow in vllm" Nov 24, 2025
@indrajit96 indrajit96 marked this pull request as ready for review November 24, 2025 20:40
@indrajit96 indrajit96 requested review from a team as code owners November 24, 2025 20:40
@github-actions github-actions bot added the feat label Nov 24, 2025
coderabbitai bot commented Nov 24, 2025

Walkthrough

A new --enable-multimodal flag is introduced to the vLLM configuration system. The flag is required whenever any multimodal component flag is set, enforced through validation logic. The flag is added to all multimodal example launch scripts, and the documentation is updated to reflect the security requirement.
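The opt-in validation described above can be sketched with argparse. This is a hypothetical illustration, not the actual code in components/src/dynamo/vllm/args.py; the component flag names (--multimodal-processor, --multimodal-encode-worker, --multimodal-worker) and the error message are assumptions for the sketch.

```python
import argparse

# Hypothetical names for the multimodal component flags; the real
# args.py in the PR may use different flags and attribute names.
MULTIMODAL_FLAGS = (
    "multimodal_processor",
    "multimodal_encode_worker",
    "multimodal_worker",
)

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--enable-multimodal",
        action="store_true",
        default=False,  # security-by-default: multimodal is off unless opted in
        help="Explicitly enable multimodal processing (required for multimodal workers).",
    )
    parser.add_argument("--multimodal-processor", action="store_true")
    parser.add_argument("--multimodal-encode-worker", action="store_true")
    parser.add_argument("--multimodal-worker", action="store_true")
    args = parser.parse_args(argv)

    # Fail fast: any multimodal component flag requires the explicit opt-in.
    if any(getattr(args, name) for name in MULTIMODAL_FLAGS) and not args.enable_multimodal:
        parser.error("multimodal component flags require --enable-multimodal to be set")
    return args
```

With this shape, launching a multimodal component without the opt-in flag aborts at argument parsing, before any untrusted multimodal input can be touched.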

Changes

  • Core multimodal configuration (components/src/dynamo/vllm/args.py):
    Added an enable_multimodal boolean field to the Config dataclass (default False), a new CLI argument --enable-multimodal, and validation logic requiring the flag when any multimodal component flag is set.
  • Documentation (docs/backends/vllm/multimodal.md):
    Added a Security Requirement admonition describing --enable-multimodal as a required startup flag for all multimodal workers, noting that startup fails if multimodal flags are used without it.
  • Aggregated multimodal launch scripts (examples/backends/vllm/launch/agg_multimodal_epd.sh, agg_multimodal_llama.sh):
    Added the --enable-multimodal flag to the multimodal processor and encode worker invocations.
  • Disaggregated multimodal launch scripts (examples/backends/vllm/launch/disagg_multimodal_epd.sh, disagg_multimodal_llama.sh):
    Added the --enable-multimodal flag to all multimodal component invocations (processor, encode/prefill/decode workers).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Validation logic in args.py: Verify the multimodal-flag check is correctly implemented and edge cases (multiple flags set, no flags set) are handled
  • Consistency across scripts: Ensure all four launch scripts uniformly apply the flag to the correct components
  • Documentation accuracy: Confirm the security requirements admonition accurately describes the feature behavior

Poem

🐰 A flag hops in, multimodal and bright,
Security wrapped in validation tight,
From args to scripts, the changes align,
Four launch scripts dance, all in line,
Docs updated—the feature now signed! ✨

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title accurately summarizes the main change: adding a security flag (--enable-multimodal) to the multimodal flow in vLLM. It is concise, clear, and directly relates to the primary objective of the PR.
  • Description check ✅ Passed: The PR description covers all required template sections with clear explanations of the feature, validation logic, documentation, and example updates.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e75bcf6 and 89cdc59.

📒 Files selected for processing (6)
  • components/src/dynamo/vllm/args.py (4 hunks)
  • docs/backends/vllm/multimodal.md (1 hunks)
  • examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
  • examples/backends/vllm/launch/agg_multimodal_llama.sh (1 hunks)
  • examples/backends/vllm/launch/disagg_multimodal_epd.sh (2 hunks)
  • examples/backends/vllm/launch/disagg_multimodal_llama.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.

Applied to files:

  • examples/backends/vllm/launch/agg_multimodal_llama.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • examples/backends/vllm/launch/agg_multimodal_llama.sh
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • examples/backends/vllm/launch/disagg_multimodal_llama.sh
🪛 LanguageTool
docs/backends/vllm/multimodal.md

[grammar] ~27-~27: Ensure spelling is correct
Context: ...out --enable-multimodal. This flag is analogus to --enable-mm-embeds in vllm serve. ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 Ruff (0.14.5)
components/src/dynamo/vllm/args.py

234-234: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: operator (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
examples/backends/vllm/launch/agg_multimodal_llama.sh (1)

14-17: LGTM! Security flag correctly added.

The --enable-multimodal flag is properly added to both the processor and encode-prefill-worker invocations, aligning with the security requirement to prevent unintended multimodal data processing.

examples/backends/vllm/launch/disagg_multimodal_llama.sh (1)

51-59: LGTM! Consistent security flag application.

The --enable-multimodal flag is correctly added to all three multimodal components (processor, encode-prefill-worker, and decode-worker) in the disaggregated setup, ensuring comprehensive security coverage.

examples/backends/vllm/launch/agg_multimodal_epd.sh (1)

78-82: LGTM! Security flag properly applied.

The --enable-multimodal flag is correctly added to all three multimodal components (processor, encode-worker, and multimodal-worker), ensuring the security requirement is enforced across the aggregated EPD setup.

components/src/dynamo/vllm/args.py (4)

63-63: LGTM! Good security-by-default design.

The enable_multimodal flag defaults to False, following the security principle of requiring explicit opt-in for multimodal processing.


163-167: LGTM! Clear CLI argument definition.

The --enable-multimodal CLI argument is properly defined with store_true action and clear help text explaining its purpose.


233-234: LGTM! Effective validation logic.

The validation correctly enforces that any multimodal component flag requires --enable-multimodal to be set, preventing unintended multimodal data processing. The error message is clear and actionable.

Note: The Ruff TRY003 hint is a style preference suggesting shorter messages or custom exception classes, but the current implementation is perfectly acceptable for this use case.


274-274: LGTM! Proper flag propagation.

The enable_multimodal flag is correctly propagated from CLI args to the config object, consistent with the pattern used for other configuration flags.

examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

79-99: LGTM! Comprehensive security flag coverage.

The --enable-multimodal flag is correctly added to all four multimodal components (processor, encode-worker, prefill-worker, and decode-worker) in the disaggregated EPD setup, ensuring complete security coverage across the entire multimodal pipeline.


@KrishnanPrash KrishnanPrash left a comment


New changes look good to me 👍. Great Work 💯

I believe the ARM container builds in CI are experiencing network problems, so you may have to re-run failed jobs a few times before you get a successful run.

@dagil-nvidia

/ok to test 6144eb1

@KrishnanPrash

Also, with the most recent changes, I believe agg_multimodal.sh will also need the --enable-multimodal flag.

The script runs for test cases on our Gitlab vllm_gpu_2 job.

@indrajit96

Also, with the most recent changes, I believe agg_multimodal.sh will also need the --enable-multimodal flag.

The script runs for test cases on our Gitlab vllm_gpu_2 job.

Thanks for spotting this!! :)


@dmitry-tokarev-nv dmitry-tokarev-nv left a comment


LGTM. Let's test it well.

@KrishnanPrash
Copy link
Contributor

Updating branch to pick up CI fixes from #4561.

@nv-tusharma nv-tusharma enabled auto-merge (squash) November 25, 2025 01:41
@nv-tusharma nv-tusharma merged commit 550bf98 into main Nov 25, 2025
32 of 34 checks passed
@nv-tusharma nv-tusharma deleted the ibhosale_vllm_mm_flag branch November 25, 2025 02:01
nv-tusharma pushed a commit that referenced this pull request Nov 25, 2025
nv-tusharma added a commit that referenced this pull request Nov 25, 2025
Co-authored-by: Indrajit Bhosale <[email protected]>
Co-authored-by: KrishnanPrash <[email protected]>
nv-tusharma added a commit that referenced this pull request Nov 25, 2025
nv-tusharma pushed a commit that referenced this pull request Nov 25, 2025