@rattus128 (Contributor) commented Nov 1, 2025

Draft of a generic module prefetcher. This implements the core feature and gives one example of how to use it with QWEN.

This gets very close to compute saturation, whereas --async-offload as-is still has a few compute stalls.

Leaving as a draft for now, as I am still trying to find a better way.

Start ComfyUI with QWEN to try it out. You need the following startup args:

--async-offload --fast pinned_memory --reserve-vram 3 

It consumes a bit of extra VRAM, so you need --reserve-vram to avoid OOMing.

The async offload stream's reason for existence is to transfer from RAM to GPU. The post-processing compute steps are a bonus on the side stream, but if the compute stream is running a long kernel, it can stall the side stream as it waits to type-cast the bias before transferring the weight. So do a pure transfer of the weight straight up, then do everything bias, then go back to fix the weight type and apply the weight patches.
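
As a rough illustration of that reordering (a minimal sketch assuming an nn.Linear-style module with pinned-memory CPU parameters and a dedicated transfer stream, not the PR's actual code):

```python
import torch

def prefetch_module(module, device, transfer_stream, compute_dtype=torch.bfloat16):
    # Issued on the side (transfer) stream so the compute stream stays busy.
    with torch.cuda.stream(transfer_stream):
        # 1. Pure transfer of the weight straight up: no casting, so the
        #    copy is issued immediately and cannot wait on a cast kernel.
        weight = module.weight.to(device, non_blocking=True)
        # 2. Then do everything bias: transfer and type-cast together.
        bias = None
        if module.bias is not None:
            bias = module.bias.to(device, non_blocking=True).to(compute_dtype)
        # 3. Go back and fix the weight type last (weight patches would
        #    also be applied at this point).
        weight = weight.to(compute_dtype)
    return weight, bias
```

The point of the ordering is that step 1 only uses the copy engine, so the bulk of the data is already in flight before any cast kernel has to fight the compute stream for SM time.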
Implement an API that allows instrumenting a model with a prefetch queue. Units of work are at the nn.Module level.
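
For illustration only, here is a hypothetical sketch of what such a prefetch queue over nn.Module units could look like; the class and method names are made up for this example and are not the PR's real interface:

```python
import collections
import torch

class ModulePrefetcher:
    """Hypothetical prefetch queue; units of work are nn.Modules."""

    def __init__(self, modules, device, depth=2):
        self.todo = collections.deque(modules)   # modules still to transfer
        self.device = device
        self.stream = torch.cuda.Stream()        # side/transfer stream
        self.inflight = collections.deque()
        for _ in range(depth):                   # warm the pipeline
            self._issue_next()

    def _issue_next(self):
        if not self.todo:
            return
        module = self.todo.popleft()
        with torch.cuda.stream(self.stream):
            # Assumes the CPU-side parameters live in pinned memory
            # (--fast pinned_memory), so the copy is truly asynchronous.
            module.to(self.device, non_blocking=True)
        done = torch.cuda.Event()
        done.record(self.stream)
        self.inflight.append((module, done))

    def acquire(self):
        # Make the compute stream wait only for this module's transfer,
        # then immediately queue the next prefetch behind it.
        module, done = self.inflight.popleft()
        torch.cuda.current_stream().wait_event(done)
        self._issue_next()
        return module
```

The depth parameter controls how many modules are kept in flight ahead of the compute stream, which is also why the feature needs --reserve-vram: the prefetched modules occupy VRAM before they are used.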
