Fix GGUF reload race causing heap corruption on Apple Silicon by ArFeRR · Pull Request #2021 · kijai/ComfyUI-WanVideoWrapper

ArFeRR · 2026-05-25T08:11:44Z

Problem

On Apple Silicon (MPS), rendering 2+ scenes back-to-back in a single
ComfyUI process with a GGUF-quantised WanVideo model crashes the whole
process with BUG IN CLIENT OF LIBMALLOC: memory corruption of free block / EXC_BREAKPOINT (SIGTRAP). The first render always succeeds;
the second fails reliably. The only known workaround was a full ComfyUI
restart between scenes (30–60 s overhead per transition).

Root cause

WanVideoSampler.process() calls load_weights() unconditionally on
every prompt (nodes_sampler.py:128-131). For GGUF this re-reads the
full 14.8 GB tensor blob and rebinds module parameters. On MPS, async
device frees race against CPU-side allocations in unified memory and
corrupt a libmalloc free-block header.

Fix

Guard the GGUF reload with a one-shot flag on the transformer object,
matching the existing per-transformer state pattern (patched_linear,
blocks_to_swap, …). Set WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to
restore the original behaviour for LoRA hot-swap scenarios.

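if gguf_reader is not None:  # handle GGUF
    # Reload only on the first call for this transformer, or when the guard
    # is explicitly disabled via the environment (needs `import os`).
    if (not getattr(transformer, "_gguf_weights_loaded", False)
            or os.environ.get("WANVIDEO_DISABLE_GGUF_RELOAD_GUARD") == "1"):
        load_weights(transformer, ...)
        transformer._gguf_weights_loaded = True
    # LoRA parameters are still re-applied on every prompt.
    set_lora_params_gguf(transformer, patcher.patches)
    transformer.patched_linear = True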

The second load_weights() call site at line ~2367 is already gated by
if offloaded: and is untouched. CUDA is unaffected (separate VRAM
arena, no race).
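
The guard logic can be exercised in isolation. The following is a minimal sketch, not the code in nodes_sampler.py: DummyTransformer, fake_load_weights and maybe_load_gguf are hypothetical stand-ins, and only the flag name and the environment variable name are taken from this PR. It shows that the expensive load runs once per transformer object while the per-prompt step still runs every time.

```python
import os

GUARD_ENV = "WANVIDEO_DISABLE_GGUF_RELOAD_GUARD"  # name taken from the PR text


class DummyTransformer:
    """Hypothetical stand-in for the real transformer module."""


def fake_load_weights(transformer, calls):
    # Placeholder for the expensive GGUF re-read; it only records that it ran.
    calls.append("load")


def maybe_load_gguf(transformer, calls, environ=os.environ):
    """Run the load once per transformer unless the guard is disabled."""
    already_loaded = getattr(transformer, "_gguf_weights_loaded", False)
    guard_disabled = environ.get(GUARD_ENV) == "1"
    if not already_loaded or guard_disabled:
        fake_load_weights(transformer, calls)
        transformer._gguf_weights_loaded = True
    # Per-prompt work (LoRA parameters in the PR) still runs on every call.
    calls.append("lora")


calls = []
t = DummyTransformer()
for _ in range(3):  # three back-to-back "prompts" with the guard active
    maybe_load_gguf(t, calls, environ={})
print(calls.count("load"), calls.count("lora"))  # 1 3
```

Passing environ={GUARD_ENV: "1"} instead makes the load run on every call, which mirrors the opt-out described above. Because the flag lives on the transformer object, a freshly constructed transformer starts without it and loads its weights once.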

Testing


Device	Apple M4 Pro, 48 GB unified memory
OS / Python / torch	macOS 25.5 / 3.12.13 / 2.7.1 (MPS)
Model	Wan2.2-I2V-A14B GGUF Q5_K_M, high + low noise
Before	scene 2 crashes during `load_weights()`, reproduces every time
After	11 sequential scenes in one process, zero crashes, output bit-identical to single-scene baseline
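
For anyone wanting to re-check the "bit-identical" result in the table above, one possible approach is to hash the raw output bytes of each run and compare digests. This is a sketch of that idea only, not the procedure used for this PR; the file names are placeholders, and the helper names file_digest and bit_identical are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bit_identical(path_a, path_b):
    """True when both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


# Self-contained demonstration with placeholder data instead of real renders.
with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "baseline.bin"
    rerun = Path(tmp) / "rerun.bin"
    changed = Path(tmp) / "changed.bin"
    baseline.write_bytes(b"\x00\x01\x02" * 1000)
    rerun.write_bytes(b"\x00\x01\x02" * 1000)
    changed.write_bytes(b"\x00\x01\x03" * 1000)
    same = bit_identical(baseline, rerun)
    differs = not bit_identical(baseline, changed)
print(same, differs)
```

If the outputs are encoded video files, container metadata may differ between runs even when the frames match, so comparing dumped frames or raw tensors is likely the safer target for this kind of check.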

Happy to share the crash report .ips, full ComfyUI log of the
successful run, or a reproducer workflow on request.

On MPS, WanVideoSampler.process() re-reads the multi-GB GGUF tensor blob on every prompt. Async device frees race against CPU allocations in unified memory and corrupt a libmalloc free-block header, so the 2nd render onward dies with BUG IN CLIENT OF LIBMALLOC / EXC_BREAKPOINT.

Guard the GGUF branch with a one-shot transformer._gguf_weights_loaded flag and reuse the already-resident weights on subsequent calls. Set WANVIDEO_DISABLE_GGUF_RELOAD_GUARD=1 to restore the original behaviour for LoRA hot-swap scenarios.

Tested on M4 Pro / macOS 25.5 / torch 2.7.1 (MPS) with Wan2.2 GGUF Q5_K_M: 11 sequential renders, zero crashes, output bit-identical to single-scene baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ArFeRR wants to merge 1 commit into kijai:main from ArFeRR:fix-gguf-reload-mps-heap-corruption
