Add support for NVFP4/FP8 mixed quantized checkpoints in ComfyUI #2029
Open
mattneel wants to merge 2 commits into
ComfyUI core (>=0.23) ships native NVFP4 + mixed-precision quantization via comfy.quant_ops, with the FP4/FP8 GEMM kernels provided by comfy_kitchen. Such checkpoints store, per quantized linear, the packed weight (uint8 for NVFP4 / float8 for FP8) plus scale tensors and a per-layer `comfy_quant` JSON marker, and a top-level `_quantization_metadata` header. WanVideoWrapper's loader only handled GGUF and fp8-scaled, so these files failed to load.

This adds auto-detected support: when the state dict contains `*.comfy_quant` keys, the affected nn.Linear weights are reconstructed as comfy QuantizedTensor objects (the same way ComfyUI core's `_lazy_load_from_state_dict` does), so the linear dispatches to comfy_kitchen's scaled_mm_nvfp4 / FP8 GEMM via `__torch_dispatch__`. No new kernels are introduced; it reuses what ComfyUI ships.

- comfy_quant_linear.py: detection + QuantizedTensor reconstruction (NVFP4/FP8)
- nodes_model_loading.py: detect in load_weights, reconstruct, and skip the already-loaded quantized params in the main assignment loop

Notes/limitations (open to maintainer guidance):
- weights load on the main transformer device; block-swap-aware placement and LoRA-merge for quantized layers follow the existing "no merge for quantized weights" rule and are left as follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
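The auto-detection step described in this commit can be illustrated with a minimal sketch. It is not the code from comfy_quant_linear.py: the function name `find_comfy_quant_layers` is hypothetical, the state dict is a plain dict, and the marker is assumed to be UTF-8 JSON bytes with a `format` field (real checkpoints store the marker as a tensor).

```python
import json

MARKER_SUFFIX = ".comfy_quant"


def find_comfy_quant_layers(state_dict):
    """Map each layer prefix to its parsed marker for every '<prefix>.comfy_quant' key.

    Hypothetical helper for illustration only.
    """
    layers = {}
    for key, value in state_dict.items():
        if not key.endswith(MARKER_SUFFIX):
            continue
        prefix = key[: -len(MARKER_SUFFIX)]
        # Assumption: the marker is UTF-8 JSON bytes; a real loader would
        # first convert the stored tensor to bytes.
        layers[prefix] = json.loads(bytes(value).decode("utf-8"))
    return layers


# Toy state dict: one quantized linear and one ordinary parameter.
sd = {
    "blocks.0.ffn.0.weight": b"packed-weight-placeholder",
    "blocks.0.ffn.0.comfy_quant": json.dumps({"format": "nvfp4"}).encode("utf-8"),
    "blocks.0.norm.weight": b"plain-weight-placeholder",
}
quant_layers = find_comfy_quant_layers(sd)
print(quant_layers)
```

An empty result means the checkpoint has no `*.comfy_quant` keys, in which case a loader would presumably fall through to its existing GGUF / fp8-scaled paths.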
The initial loader bound QuantizedTensor weights but the forward pass failed for NVFP4 layers with `mat1 and mat2 shapes cannot be multiplied (Mx5120 and 2560x5120)`. Three NVFP4-specific issues (FP8 worked because its packed width equals its logical width):

- comfy_quant_linear: derive the logical shape from the packed qdata + packing factor (in = qdata.shape[1]*2 for NVFP4, *1 for FP8) instead of trusting module.in_features. The model's Linear is instantiated from the checkpoint's stored weight width; for NVFP4 that's the packed half-width (2560), so using it for Params.orig_shape made dequantize() return a half-width tensor and the GEMM failed. This is the root cause.
- comfy_quant_linear: route quantized CustomLinear through _linear_forward_direct so plain F.linear -> aten.linear.default dispatches to comfy_kitchen's NVFP4/FP8 GEMM, instead of the wanvideo.linear_forward custom op (a custom-op boundary can strip the tensor subclass). Also clear scale_weight / is_gguf on those modules.
- custom_linear: _prepare_weight returns the QuantizedTensor intact for comfy_quant layers; a `.to(input)` cast there is unnecessary and risks collapsing it.
- nodes_model_loading: skip the `.weight_scale`->`.scale_weight` rename for comfy_quant checkpoints, which keep their own scale tensor names.

Validated end-to-end: a real Wan2.2-Animate generation (ViTPose pose+face conditioning, 20-step) on the NVFP4-mixed checkpoint matches the FP8 baseline at 27 dB PSNR / 0.94 correlation, visually identical.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
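The root-cause fix, deriving the logical weight shape from the packed data instead of from module.in_features, comes down to a small piece of arithmetic. The sketch below shows only that arithmetic; `logical_shape`, `PACKING_FACTOR`, and the format labels `"nvfp4"` / `"fp8"` are hypothetical names chosen for illustration, not identifiers from the PR.

```python
# Logical elements stored per packed element along the input dimension.
# NVFP4 packs two 4-bit values into each uint8 byte; FP8 stores one value
# per element. The dictionary keys are illustrative labels.
PACKING_FACTOR = {"nvfp4": 2, "fp8": 1}


def logical_shape(packed_shape, fmt):
    """Return (out_features, in_features) for a packed 2D weight of shape packed_shape."""
    out_features, packed_in = packed_shape
    return (out_features, packed_in * PACKING_FACTOR[fmt])


# The failing case from the commit message: a stored width of 2560 for a
# layer whose logical input width is 5120.
nvfp4_shape = logical_shape((5120, 2560), "nvfp4")
fp8_shape = logical_shape((5120, 5120), "fp8")
print(nvfp4_shape, fp8_shape)
```

With the packed width of 2560 used directly, the dequantized weight is 5120x2560 and its transpose (2560x5120) cannot be multiplied with an Mx5120 input, which is the reported error. Doubling the packed width restores the 5120x5120 weight.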

This pull request adds support for loading ComfyUI-native quantized checkpoints (NVFP4/FP8 mixed precision) in WanVideoWrapper. It introduces a new loader that reconstructs quantized weights as `QuantizedTensor` objects, so the layers stay compatible with ComfyUI's inference kernels. The model loading pipeline is also updated to detect these quantized checkpoints and handle them properly, avoiding unnecessary conversions and ensuring correct dispatch to the optimized GEMM kernels.

ComfyUI-native quantized checkpoint (NVFP4/FP8) support:
- `comfy_quant_linear.py`: detects ComfyUI-native quantized checkpoints and reconstructs quantized weights as `QuantizedTensor` objects, enabling direct use of ComfyUI's NVFP4/FP8 GEMM kernels.
- `nodes_model_loading.py`: detects ComfyUI quantized checkpoints, invokes the new loader, and skips redundant weight assignments for quantized layers.

Integration with existing model code:
- `custom_linear.py`: keeps quantized weights intact and dispatches them correctly, bypassing any conversion that would break quantized inference.
- `nodes_model_loading.py`: the new loader is made available for use in the model loading pipeline.
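The two loading-pipeline behaviors in this PR (skipping parameters that the quantized loader already bound, and leaving scale tensor names untouched for comfy_quant checkpoints) might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in nodes_model_loading.py: `plan_assignments`, `consumed_keys`, and `is_comfy_quant` are hypothetical names, and tensors are replaced by placeholder strings.

```python
OLD_SUFFIX = ".weight_scale"
NEW_SUFFIX = ".scale_weight"


def plan_assignments(state_dict, consumed_keys, is_comfy_quant):
    """Return {target_name: tensor} for the main assignment loop.

    consumed_keys: keys already bound by the quantized loader (skipped here).
    is_comfy_quant: True when the checkpoint carries '*.comfy_quant' markers,
    in which case scale tensors keep their original names.
    """
    plan = {}
    for name, tensor in state_dict.items():
        if name in consumed_keys:
            continue  # already loaded as part of a quantized layer
        if not is_comfy_quant and name.endswith(OLD_SUFFIX):
            # Rename used only on the non-comfy_quant path.
            name = name[: -len(OLD_SUFFIX)] + NEW_SUFFIX
        plan[name] = tensor
    return plan


sd = {"a.weight": "w", "a.weight_scale": "s", "b.weight": "v"}

# comfy_quant checkpoint: layer 'a' was handled by the quantized loader.
quant_plan = plan_assignments(sd, {"a.weight", "a.weight_scale"}, True)
# Other checkpoint: nothing consumed, the scale key is renamed.
plain_plan = plan_assignments(sd, set(), False)
print(quant_plan, plain_plan)
```

Tracking consumed keys explicitly, rather than skipping everything under a layer prefix, leaves room for unquantized tensors on the same layer (a bias, for example) to go through the normal loop.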