feat: Media processing in the frontend - 1st pass #3630

milesial · 2025-10-15T00:31:32Z

(WIP)

Overview:

Media decoding in the frontend for VLMs (images, videos).

Details:

Decodes multimodal data from the OAI chat request (image_url, video_url) in the frontend processor into decoded tensors (pixel values).
Passes the decoded data to the next step in the graph (backend) via NIXL readable descriptors (can be used with python nixl_connect).

Decoding data involves:

Potentially fetching the data from the web
Potentially decoding base64
Running the actual media decoding (JPEG, H264, ...)

The last two steps can be CPU-heavy and run on the rayon thread pool.
Decoding is optional: if dynamo was not built with this feature, or if no decoding configuration is passed, the unprocessed URLs are passed through instead.
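
As a rough illustration of the first two steps listed above, the sketch below classifies a URL from the request as remote (needs an HTTP fetch), inline base64 (needs base64 decoding), or something to forward untouched. The `MediaSource` enum and `classify` function are hypothetical names chosen for this example and are not taken from the PR; the actual fetch, base64 decode and media decode are left out.

```rust
/// Hypothetical classification of a media URL taken from a chat request.
/// These names are illustrative only and do not come from the PR.
#[derive(Debug, PartialEq)]
enum MediaSource<'a> {
    /// Remote resource that has to be fetched over HTTP(S) first.
    Remote(&'a str),
    /// Inline `data:` URL carrying a base64 payload.
    InlineBase64 { mime: &'a str, payload: &'a str },
    /// Anything else is forwarded to the backend untouched.
    Passthrough(&'a str),
}

/// Decide which preparation step a URL needs before media decoding.
fn classify(url: &str) -> MediaSource<'_> {
    if url.starts_with("http://") || url.starts_with("https://") {
        return MediaSource::Remote(url);
    }
    // A base64 data URL looks like `data:<mime>;base64,<payload>`.
    if let Some(rest) = url.strip_prefix("data:") {
        if let Some((header, payload)) = rest.split_once(',') {
            if let Some(mime) = header.strip_suffix(";base64") {
                return MediaSource::InlineBase64 { mime, payload };
            }
        }
    }
    MediaSource::Passthrough(url)
}
```

Under these assumptions, `classify("https://example.com/cat.jpg")` yields `MediaSource::Remote(..)`, while a `data:image/png;base64,...` URL yields `MediaSource::InlineBase64` with the MIME type and payload split out.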

The preprocessor holds a MediaLoader, which owns an HTTP client and one media decoder per modality. Decoder configuration is passed via the MDC. In the future, per-request or even per-item options could override this default configuration. MediaLoader also holds a NIXL agent that handles registration of the storage backing the decoded data. The underlying data is freed only when the request object is dropped on the frontend, which happens at the end of generate().
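
The lifetime rule described above (decoded data lives until the request object is dropped) can be pictured with Rust's `Drop` trait. The sketch below is a simplified, single-threaded stand-in: `Registry`, `DecodedMedia` and `Request` are hypothetical types invented for illustration, and the `Registry` only imitates the bookkeeping a NIXL agent might do; it does not use the real NIXL API.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical stand-in for the NIXL agent: it only tracks registered buffer ids.
#[derive(Default)]
struct Registry {
    registered: RefCell<HashSet<u64>>,
}

/// Decoded media owned by a request; it deregisters itself when dropped.
struct DecodedMedia {
    id: u64,
    pixels: Vec<u8>,
    registry: Rc<Registry>,
}

impl DecodedMedia {
    /// Register the buffer so that a remote reader could access it.
    fn register(id: u64, pixels: Vec<u8>, registry: Rc<Registry>) -> Self {
        registry.registered.borrow_mut().insert(id);
        DecodedMedia { id, pixels, registry }
    }
}

impl Drop for DecodedMedia {
    fn drop(&mut self) {
        // Deregister first; the pixel buffer itself is freed right after this runs.
        self.registry.registered.borrow_mut().remove(&self.id);
    }
}

/// Hypothetical request object holding its decoded media items.
struct Request {
    media: Vec<DecodedMedia>,
}

/// Number of buffers currently registered.
fn registered_count(registry: &Registry) -> usize {
    registry.registered.borrow().len()
}
```

With this shape, dropping a `Request` drops every `DecodedMedia` it owns, so the registration and the memory go away together. A real frontend would presumably need thread-safe types (`Arc`, `Mutex`) instead of `Rc` and `RefCell`.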

TODOs:

This MR:

Have media decoding code under a feature flag

Future work:

Microbench tests
Per-request decoder options
HW decoding
Seek-based video decoding for sparse sampling
Parallel HTTP fetch and decoding
Early-free decoded memory data once read
Pre-allocate RAM slab to share a unique NIXL metadata

Where should the reviewer start?

Flow starting from gather_multi_modal_data in preprocessor.rs

copy-pr-bot · 2025-10-15T00:31:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


rmccorm4 · 2025-10-28T20:03:43Z

container/Dockerfile

This decoding is optional, if dynamo was not built with this feature, or if no decoding configuration is passed, unprocessed URLs will be passed.

If the feature is gated behind a compile-time feature flag, I think it will be difficult for most users to consume, since they will need to build from source one way or the other. Can this be controlled by a frontend flag, an environment variable, or something similar instead? What do you think about how to control the frontend-side media decoding feature, @grahamking @krishung5 @indrajit96?

vanshilshah97 · 2025-10-28

Hi @rmccorm4,
I have a draft PR (#3929), still WIP, that adds this compile-time flag to alexandre's branch.
I modeled the workflow on enable_kvbm and the block-manager feature group:

dynamo/container/Dockerfile, line 358 in b1732a5:

if [ "$ENABLE_KVBM" = "true" ]; then \

Do you think that workflow is too tedious on the user side for a frontend change? To use KVBM, the user also needs to compile or rebuild the wheel.

milesial · 2025-10-28

At the end of the day, what we want is to not prevent people from running the regular frontend when they lack the system dependencies required for media loading at runtime (mostly ffmpeg) and don't need media decoding.

So the solution we are working on is a build-time flag. That way, it is even possible to build without ffmpeg. But yes, this means having different wheels for different features.

If we are in charge of the build and don't mind having ffmpeg on our side at build time, another solution could be to require ffmpeg during the build but, at runtime, disable video decoding if dynamic linking fails to find ffmpeg. I need to check how doable this is with Rust.
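
For readers unfamiliar with Cargo features, the build-time gating discussed in this thread could look roughly like the sketch below. The feature name `media-decoding`, the `Prepared` enum, and the `prepare` and `decode_stub` functions are assumptions made for illustration; the PR may name and structure things differently. The runtime-detection alternative (probing for ffmpeg through dynamic loading) is not shown.

```rust
/// True only when the crate is compiled with the hypothetical `media-decoding` feature.
fn decoding_compiled_in() -> bool {
    cfg!(feature = "media-decoding")
}

/// What the frontend forwards to the backend for one media item.
#[derive(Debug, PartialEq)]
enum Prepared {
    /// Decoded pixel data (only possible when decoding is compiled in and configured).
    Decoded(Vec<u8>),
    /// The unprocessed URL, forwarded as-is.
    Url(String),
}

/// Placeholder for the real decoder; it returns an empty buffer in this sketch.
fn decode_stub(_url: &str) -> Vec<u8> {
    Vec::new()
}

/// Decode only if the feature was built in AND a decoder configuration was passed.
fn prepare(url: &str, decoder_configured: bool) -> Prepared {
    if decoding_compiled_in() && decoder_configured {
        Prepared::Decoded(decode_stub(url))
    } else {
        Prepared::Url(url.to_string())
    }
}
```

With this layout, a build without the feature still accepts multimodal requests and simply forwards the URL, which matches the passthrough behavior described in the PR description.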


pull-request-size bot added the size/XL label Oct 15, 2025

github-actions bot added the feat label Oct 15, 2025

milesial mentioned this pull request Oct 20, 2025

feat: Media URL passthrough in OAI preprocessor #3733

Merged

milesial force-pushed the alexandrem/frontend-media-decoding branch from 6df1e40 to 80594ff on October 23, 2025 00:10

pull-request-size bot added size/XXL and removed size/XL labels Oct 27, 2025

milesial force-pushed the alexandrem/frontend-media-decoding branch 2 times, most recently from 6a44d3d to f4edee8, on October 28, 2025 16:10

copy-pr-bot bot temporarily deployed to GITLAB October 28, 2025 16:58 Inactive

copy-pr-bot bot had a problem deploying to GITLAB October 28, 2025 16:59 Failure

milesial added 6 commits October 28, 2025 12:00

feat: Media processing in the frontend - 1st pass

e915bce

Signed-off-by: Alexandre Milesi <[email protected]>

NIXL data passing, install ffmpeg

0860419

Signed-off-by: Alexandre Milesi <[email protected]>

tests: Add decoding unit tests

ab0a31f

Signed-off-by: Alexandre Milesi <[email protected]>

tests: Some more tests

73a5b7a

Signed-off-by: Alexandre Milesi <[email protected]>

chore: fix rebase

72ca692

Signed-off-by: Alexandre Milesi <[email protected]>

chore: Cleanup restructure

7ca3075

Signed-off-by: Alexandre Milesi <[email protected]>

milesial force-pushed the alexandrem/frontend-media-decoding branch from d29d284 to 7ca3075 on October 28, 2025 19:01

copy-pr-bot bot temporarily deployed to GITLAB October 28, 2025 19:01 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 28, 2025 19:06 Inactive

rmccorm4 reviewed Oct 28, 2025


feat: cache nixl metadata, zlib encoding, tests

01316d7

copy-pr-bot bot temporarily deployed to GITLAB October 28, 2025 21:26 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 28, 2025 21:27 Inactive

chore: parallel processing, more tests

6a7a9bf

Signed-off-by: Alexandre Milesi <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 29, 2025 20:10 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 29, 2025 20:12 Inactive

This was referenced Oct 29, 2025

feat: Media HTTP fetching and b64 decoding #3967

Merged

feat: Image decoder in the frontend #3971

Open

feat: decoded media via NIXL #3988

Open
