Skip to content

Bug: Scales do not persist in fp32 when LoRA is applied. #2

@silveroxides

Description

@silveroxides

I am on ops-changes branch of ComfyUI.
There was no issues when no LoRA was being Loaded.
Model loaded: https://huggingface.co/silveroxides/Chroma1-HD-fp8-scaled/blob/main/Chroma1-HD-fp8matmulmixed_large_rev2.safetensors
LoRA used: https://huggingface.co/silveroxides/Chroma-LoRAs/blob/main/flash-heun-pruned/chroma-flash-heun_r01-fp32-pruned.safetensors
workflow: comfy-kitchen-test.json

FP8 _scaled_mm failed: Invalid scaling configuration.
- For TensorWise scaling, a and b should be float8, scales should be float and singletons.
- For RowWise scaling, a and b should be float8, scales should be float, scale_a should be (7056, 1) and scale_b should be (1, 9216), and both should be contiguous.
- For BlockWise 1x128 scaling, a and b should be float8, scales should be float, scale_a should be (7056, 24) and scale_b should be (24, 9216), and both should be outer-dim-major.
- For BlockWise 128x128 scaling, a and b should be float8, scales should be float, scale_a should be (56, 24) and scale_b should be (24, 72), and both should be near-inner-dim-major (with 16-byte aligned strides).
- For Blockwise 1x32 scaling, a and b should be float8, scales should be float8_e8m0fnu, scale_a should have 688128 elements and scale_b should have 884736 elements, and both should be contiguous.
- For Blockwise 1x16 scaling, a and b should be float4 (packed 2x), scales should be float8_e4m3fn, scale_a should have 2752512 elements and scale_b should have 3538944 elements, and both should be contiguous.
Got a.dtype()=Float8_e4m3fn, scale_a.dtype()=Float, scale_a.size()=[1], scale_a.stride()=[1], b.dtype()=Float8_e4m3fn, scale_b.dtype()=Half, scale_b.size()=[] and scale_b.stride()=[], falling back to dequantization
  0%|                                                                                           | 0/26 [00:00<?, ?it/s]
!!! Exception during processing !!! No backend can handle 'dequantize_per_tensor_fp8': triton: scale: dtype torch.float16 not in {torch.float32}; eager: scale: dtype torch.float16 not in {torch.float32}
Traceback (most recent call last):
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\fp8.py", line 158, in _handle_fp8_linear
    output = _fp8_scaled_mm(input_qdata, weight_t, scale_a, scale_b, bias, out_dtype)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\fp8.py", line 93, in _fp8_scaled_mm
    output = torch._scaled_mm(
             ^^^^^^^^^^^^^^^^^
RuntimeError: Invalid scaling configuration.
- For TensorWise scaling, a and b should be float8, scales should be float and singletons.
- For RowWise scaling, a and b should be float8, scales should be float, scale_a should be (7056, 1) and scale_b should be (1, 9216), and both should be contiguous.
- For BlockWise 1x128 scaling, a and b should be float8, scales should be float, scale_a should be (7056, 24) and scale_b should be (24, 9216), and both should be outer-dim-major.
- For BlockWise 128x128 scaling, a and b should be float8, scales should be float, scale_a should be (56, 24) and scale_b should be (24, 72), and both should be near-inner-dim-major (with 16-byte aligned strides).
- For Blockwise 1x32 scaling, a and b should be float8, scales should be float8_e8m0fnu, scale_a should have 688128 elements and scale_b should have 884736 elements, and both should be contiguous.
- For Blockwise 1x16 scaling, a and b should be float4 (packed 2x), scales should be float8_e4m3fn, scale_a should have 2752512 elements and scale_b should have 3538944 elements, and both should be contiguous.
Got a.dtype()=Float8_e4m3fn, scale_a.dtype()=Float, scale_a.size()=[1], scale_a.stride()=[1], b.dtype()=Float8_e4m3fn, scale_b.dtype()=Half, scale_b.size()=[] and scale_b.stride()=[]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ishim\Tools\ComfyUI\execution.py", line 518, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\execution.py", line 329, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\execution.py", line 303, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\ishim\Tools\ComfyUI\execution.py", line 291, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy_api\internal\__init__.py", line 149, in wrapped_func
    return method(locked_class, **inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy_api\latest\_io.py", line 1570, in EXECUTE_NORMALIZED
    to_return = cls.execute(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 950, in execute
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 1050, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 994, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\custom_nodes\ComfyUI-TiledDiffusion\utils.py", line 34, in KSAMPLER_sample
    return orig_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\k_diffusion\sampling.py", line 202, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 401, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 953, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 960, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 963, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 381, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\samplers.py", line 326, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\model_base.py", line 163, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\model_base.py", line 205, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ldm\chroma\model.py", line 269, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ldm\chroma\model.py", line 292, in _forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ldm\chroma\model.py", line 213, in forward_orig
    img, txt = block(img=img,
               ^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ldm\flux\layers.py", line 212, in forward
    img_qkv = self.img_attn.qkv(img_modulated)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ops.py", line 675, in forward
    output = self._forward(input, self.weight, self.bias)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\comfy\ops.py", line 643, in _forward
    return torch.nn.functional.linear(input, weight, bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 327, in __torch_dispatch__
    return op_handlers[parent_cls](qt, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\fp8.py", line 172, in _handle_fp8_linear
    return torch.nn.functional.linear(*dequantize_args((input_tensor, weight, bias)))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 351, in dequantize_args
    return type(args)(dequantize_args(a) for a in args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 351, in <genexpr>
    return type(args)(dequantize_args(a) for a in args)
                      ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 347, in dequantize_args
    return args.dequantize()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 265, in dequantize
    full = self._layout_cls.dequantize(qdata, self._params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\fp8.py", line 67, in dequantize
    return ck.dequantize_per_tensor_fp8(qdata, params.scale, params.orig_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\__init__.py", line 84, in dequantize_per_tensor_fp8
    return torch.ops.comfy_kitchen.dequantize_fp8(x, scale, dtype_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\_library\custom_ops.py", line 343, in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\_compile.py", line 53, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\_dynamo\eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\torch\_library\custom_ops.py", line 376, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\backends\eager\quantization.py", line 241, in _op_dequantize_fp8
    impl = registry.get_implementation("dequantize_per_tensor_fp8", kwargs=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\registry.py", line 269, in get_implementation
    selected_backend = self.get_capable_backend(func_name, kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ishim\Tools\ComfyUI\venv\Lib\site-packages\comfy_kitchen\registry.py", line 233, in get_capable_backend
    raise NoCapableBackendError(func_name, failures)
comfy_kitchen.exceptions.NoCapableBackendError: No backend can handle 'dequantize_per_tensor_fp8': triton: scale: dtype torch.float16 not in {torch.float32}; eager: scale: dtype torch.float16 not in {torch.float32}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions