Fix GGUF reload race causing heap corruption on Apple Silicon #2021

Open
ArFeRR wants to merge 1 commit into kijai:main from ArFeRR:fix-gguf-reload-mps-heap-corruption

@ArFeRR ArFeRR commented May 25, 2026

Problem

On Apple Silicon (MPS), rendering two or more scenes back-to-back in a
single ComfyUI process with a GGUF-quantised WanVideo model crashes the
whole process with "BUG IN CLIENT OF LIBMALLOC: memory corruption of
free block" / EXC_BREAKPOINT (SIGTRAP). The first render always succeeds;
the second fails reliably. The only known workaround was a full ComfyUI
restart between scenes (30–60 s overhead per transition).

Root cause

WanVideoSampler.process() calls load_weights() unconditionally on
every prompt (nodes_sampler.py:128-131). For GGUF this re-reads the
full 14.8 GB tensor blob and rebinds module parameters. On MPS, async
device frees race against CPU-side allocations in unified memory and
corrupt a libmalloc free-block header.

Fix

Guard the GGUF reload with a one-shot flag on the transformer object,
matching the existing per-transformer state pattern (patched_linear,
blocks_to_swap, …). Set WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to
restore the original behaviour for LoRA hot-swap scenarios.

if gguf_reader is not None: #handle GGUF
    if not getattr(transformer, "_gguf_weights_loaded", False) or \
       os.environ.get("WANVIDEO_DISABLE_GGUF_RELOAD_GUARD") == "1":
        load_weights(transformer, ...)
        transformer._gguf_weights_loaded = True
    set_lora_params_gguf(transformer, patcher.patches)
    transformer.patched_linear = True

The second load_weights() call site at line ~2367 is already gated by
if offloaded: and is untouched. CUDA is unaffected (separate VRAM
arena, no race).
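
The guard logic can be exercised in isolation, without a model or MPS
device. The following is a minimal sketch, not the code in
nodes_sampler.py: FakeTransformer, fake_load_weights and
ensure_gguf_weights are hypothetical stand-ins introduced for
illustration. Only the _gguf_weights_loaded attribute and the
WANVIDEO_DISABLE_GGUF_RELOAD_GUARD variable come from this PR.

```python
import os

GUARD_ENV = "WANVIDEO_DISABLE_GGUF_RELOAD_GUARD"  # env var name from this PR


class FakeTransformer:
    """Hypothetical stand-in for the real transformer module."""


def ensure_gguf_weights(transformer, load_fn, environ=None):
    """Call load_fn(transformer) unless the weights are already resident.

    Returns True if a load was performed, False if it was skipped.
    """
    environ = os.environ if environ is None else environ
    guard_disabled = environ.get(GUARD_ENV) == "1"
    already_loaded = getattr(transformer, "_gguf_weights_loaded", False)
    if already_loaded and not guard_disabled:
        return False  # reuse resident weights, skip the reload
    load_fn(transformer)
    transformer._gguf_weights_loaded = True
    return True


load_calls = []


def fake_load_weights(transformer):
    # Hypothetical loader: records the call instead of reading a GGUF file.
    load_calls.append(id(transformer))


t = FakeTransformer()
first = ensure_gguf_weights(t, fake_load_weights, environ={})   # loads
second = ensure_gguf_weights(t, fake_load_weights, environ={})  # skipped
forced = ensure_gguf_weights(
    t, fake_load_weights, environ={GUARD_ENV: "1"}
)  # guard disabled, loads again
```

Because the flag lives on the transformer object, a freshly constructed
transformer always gets one load, which matches the per-transformer
state pattern described above.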

Testing

Device: Apple M4 Pro, 48 GB unified memory
OS / Python / torch: macOS 25.5 / 3.12.13 / 2.7.1 (MPS)
Model: Wan2.2-I2V-A14B GGUF Q5_K_M, high + low noise
Before: scene 2 crashes during load_weights(), reproduces every time
After: 11 sequential scenes in one process, zero crashes, output bit-identical to single-scene baseline
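
For anyone who wants to repeat the bit-identical comparison, one
possible approach is to hash the output files. This is only a sketch
under the assumption that each render is written to a file; it is not
necessarily how the result above was obtained, and sha256_of and
bit_identical are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Container formats that embed timestamps or other per-run metadata would
differ even when the frames match, so comparing decoded frames or saved
tensors may be more appropriate in that case.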

Happy to share the crash report .ips, full ComfyUI log of the
successful run, or a reproducer workflow on request.

On MPS, WanVideoSampler.process() re-reads the multi-GB GGUF tensor
blob on every prompt. Async device frees race against CPU allocations
in unified memory and corrupt a libmalloc free-block header — the 2nd
render onward dies with BUG IN CLIENT OF LIBMALLOC / EXC_BREAKPOINT.

Guard the GGUF branch with a one-shot transformer._gguf_weights_loaded
flag and reuse the already-resident weights on subsequent calls. Set
WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to restore the original behaviour
for LoRA hot-swap scenarios.

Tested on M4 Pro / macOS 25.5 / torch 2.7.1 (MPS) with Wan2.2 GGUF
Q5_K_M: 11 sequential renders, zero crashes, output bit-identical
to single-scene baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>