         help="Run as unified encode+prefill+decode worker for models requiring integrated image encoding (e.g., Llama 4)",
     )
+    parser.add_argument(
+        "--enable-multimodal",
+        action="store_true",
+        help="Enable multimodal processing. If not set, none of the multimodal components can be used.",
+    )
     parser.add_argument(
         "--mm-prompt-template",
         type=str,
@@ -218,6 +224,9 @@ def parse_args() -> Config:
             "Use only one of --multimodal-processor, --multimodal-encode-worker, --multimodal-worker, --multimodal-decode-worker, or --multimodal-encode-prefill-worker"
         )
 
+    if mm_flags == 1 and not args.enable_multimodal:
+        raise ValueError("Use --enable-multimodal to enable multimodal processing")
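The two checks above can be exercised in isolation with a small standalone parser. This is a minimal sketch, not Dynamo's actual `parse_args()`. It makes three assumptions:

- The five role flags are `store_true` booleans.
- `mm_flags` is the count of role flags that are set.
- The pre-existing guard is `mm_flags > 1`, whose condition is outside the visible hunk.

`build_parser`, `validate`, `parse`, and `MULTIMODAL_ROLE_FLAGS` are hypothetical names used only for illustration.

```python
import argparse

# Role flags named in the "Use only one of ..." error message (assumed store_true booleans).
MULTIMODAL_ROLE_FLAGS = (
    "multimodal_processor",
    "multimodal_encode_worker",
    "multimodal_worker",
    "multimodal_decode_worker",
    "multimodal_encode_prefill_worker",
)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the flags relevant to this change."""
    parser = argparse.ArgumentParser()
    for dest in MULTIMODAL_ROLE_FLAGS:
        # e.g. "multimodal_worker" -> "--multimodal-worker"
        parser.add_argument("--" + dest.replace("_", "-"), action="store_true")
    parser.add_argument(
        "--enable-multimodal",
        action="store_true",
        help="Enable multimodal processing. If not set, none of the multimodal components can be used.",
    )
    return parser


def validate(args: argparse.Namespace) -> None:
    # Assumption: mm_flags counts how many role flags were set on the command line.
    mm_flags = sum(bool(getattr(args, dest)) for dest in MULTIMODAL_ROLE_FLAGS)
    # Assumed pre-existing guard: at most one multimodal role per worker.
    if mm_flags > 1:
        raise ValueError(
            "Use only one of --multimodal-processor, --multimodal-encode-worker, "
            "--multimodal-worker, --multimodal-decode-worker, or --multimodal-encode-prefill-worker"
        )
    # Guard added by this change: a multimodal role requires the explicit opt-in.
    if mm_flags == 1 and not args.enable_multimodal:
        raise ValueError("Use --enable-multimodal to enable multimodal processing")


def parse(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    validate(args)
    return args


if __name__ == "__main__":
    for argv in (["--multimodal-worker", "--enable-multimodal"], ["--multimodal-worker"]):
        try:
            parse(argv)
            print(argv, "-> accepted")
        except ValueError as err:
            print(argv, "-> rejected:", err)
```

Because `store_true` defaults to `False`, a worker started with no role flag is unaffected. Passing `--enable-multimodal` alone is also accepted, since the gate only fires when exactly one role flag is set without the opt-in.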
docs/backends/vllm/multimodal.md
4 lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ Dynamo supports multimodal models with vLLM v1. In general, multimodal models ca
 > [!WARNING]
 > **LLaVA Model Limitation**: Do not use LLaVA models (e.g., `llava-hf/llava-1.5-7b-hf`) with the standard aggregated serving setup, as they contain keywords that Dynamo cannot yet parse. LLaVA models can still be used with the EPD (Encode-Prefill-Decode) setup described below.
 
+> [!IMPORTANT]
+> **Security Requirement**: All multimodal workers require the `--enable-multimodal` flag to be explicitly set at startup. This is a security feature to prevent unintended processing of multimodal data from untrusted sources. Workers will fail at startup if multimodal flags (e.g., `--multimodal-worker`, `--multimodal-processor`) are used without `--enable-multimodal`.
+This flag is analogous to `--enable-mm-embeds` in `vllm serve`, but extends the opt-in to all multimodal content (URLs, embeddings, and base64-encoded data).
+
 # Multimodal EPD Deployment Examples
 
 This section provides example workflows and reference implementations for deploying a multimodal model using Dynamo and vLLM v1 with an EPD (Encode-Prefill-Decode) pipeline.