Fix GGUF reload race causing heap corruption on Apple Silicon #2021
Open
ArFeRR wants to merge 1 commit into
On MPS, WanVideoSampler.process() re-reads the multi-GB GGUF tensor blob on every prompt. Async device frees race against CPU allocations in unified memory and corrupt a libmalloc free-block header; the 2nd render onward dies with BUG IN CLIENT OF LIBMALLOC / EXC_BREAKPOINT.

Guard the GGUF branch with a one-shot transformer._gguf_weights_loaded flag and reuse the already-resident weights on subsequent calls. Set WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to restore the original behaviour for LoRA hot-swap scenarios.

Tested on M4 Pro / macOS 25.5 / torch 2.7.1 (MPS) with Wan2.2 GGUF Q5_K_M: 11 sequential renders, zero crashes, output bit-identical to single-scene baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem
On Apple Silicon (MPS), rendering 2+ scenes back-to-back in a single
ComfyUI process with a GGUF-quantised WanVideo model crashes the whole
process with
BUG IN CLIENT OF LIBMALLOC: memory corruption of free block / EXC_BREAKPOINT (SIGTRAP).
The first render always succeeds; the second fails reliably. The only
known workaround was a full ComfyUI restart between scenes (30–60 s
overhead per transition).
Root cause
WanVideoSampler.process() calls load_weights() unconditionally on
every prompt (nodes_sampler.py:128-131). For GGUF this re-reads the
full 14.8 GB tensor blob and rebinds module parameters. On MPS, async
device frees race against CPU-side allocations in unified memory and
corrupt a libmalloc free-block header.
Fix
Guard the GGUF reload with a one-shot flag on the transformer object,
matching the existing per-transformer state pattern
(patched_linear, blocks_to_swap, …). Set
WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to restore the original behaviour
for LoRA hot-swap scenarios.
The second load_weights() call site at line ~2367 is already gated by
if offloaded: and is untouched. CUDA is unaffected (separate VRAM
arena, no race).
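
For reviewers who want the shape of the guard without opening the diff, here is a minimal standalone sketch. It is not the patched code: FakeTransformer, maybe_load_gguf_weights, and the env parameter are hypothetical names made up for illustration, and the loader is passed in as a callable so the snippet runs without torch or ComfyUI. Only the _gguf_weights_loaded attribute and the WANVIDEO_DISABLE_GGUF_RELOAD_GUARD variable come from this PR.

```python
import os

# Environment variable named in the PR; "1" restores reload-on-every-call.
GUARD_DISABLE_ENV = "WANVIDEO_DISABLE_GGUF_RELOAD_GUARD"


class FakeTransformer:
    """Hypothetical stand-in for the real transformer module."""


def maybe_load_gguf_weights(transformer, load_weights, env=None):
    """Run load_weights(transformer) unless the one-shot flag is already set.

    Returns True if a (re)load happened, False if it was skipped.
    `env` defaults to os.environ; it is a parameter only to ease testing.
    """
    env = os.environ if env is None else env
    guard_disabled = env.get(GUARD_DISABLE_ENV) == "1"
    already_loaded = getattr(transformer, "_gguf_weights_loaded", False)

    if already_loaded and not guard_disabled:
        return False  # reuse the weights that are already resident

    load_weights(transformer)
    transformer._gguf_weights_loaded = True  # flag lives on the transformer object
    return True


# Three "prompts" against the same transformer trigger a single load.
calls = []
t = FakeTransformer()
for _ in range(3):
    maybe_load_gguf_weights(t, calls.append, env={})
print(len(calls))  # 1
```

Storing the flag on the transformer rather than in a module-level global means a freshly constructed transformer (for example after switching models) starts without the flag and is loaded normally.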
Testing
load_weights(), reproduces every time

Happy to share the crash report .ips, full ComfyUI log of the
successful run, or a reproducer workflow on request.