This new version of diffusers causes problems when doing inference with Flux 1.D and Wan2.2 Text To Image (explanation below) #12878

@ukaprch

Describe the bug

A few days ago I upgraded Torch (CUDA) from v2.7.1 to v2.8.0 and also updated diffusers from v0.35.0 to v0.36.0. Because of the Torch upgrade, and for compatibility with the new diffusers release, I updated these additional components as well:
llama-stack
transformers
huggingface-hub
tokenizers
xformers (for torch 2.8)
triton-windows-3.5 (for torch 2.8)
sageattention-2.2.0 (for torch 2.8)
flash_attn-2.8.2 (for torch 2.8)

Other packages may have been pulled in or updated as dependencies, as is usually the case when upgrading; the version-check sketch below is one way to enumerate what is actually installed.
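
For completeness, here is a minimal sketch (not part of my original notes) that dumps the installed versions of the packages I touched, so the before/after environments can be compared exactly. The distribution names listed are my best guess at the PyPI names and may need adjusting:

```python
# Print installed versions of the packages involved in the upgrade.
# The distribution names below are assumptions (PyPI names) and may differ.
from importlib.metadata import version, PackageNotFoundError

packages = [
    "torch", "diffusers", "transformers", "huggingface_hub", "tokenizers",
    "xformers", "triton-windows", "sageattention", "flash-attn", "optimum-quanto",
]

for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```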

I didn't change any of my inference code. I encountered two major problems with the new diffusers library, which may or may not be attributable to diffusers alone:

1. When running Flux 1.D text-to-image, the model/transformer seemed to ignore parts of the prompt. When running Wan 2.2 T2I it was even worse: the model/transformer ignored the entire prompt. So something in the environment changed after the updates.

2. When running Wan 2.2 T2I with Sage attention, I get a black image. I traced this back to the transformer's attention processing. After a few iterations, Sage attention starts producing NaNs in the query tensor, which eventually cause the black image. The K and V tensors don't seem affected, only Q. Strange. (See the NaN-tracing sketch after this list.)
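
To trace where the NaNs first show up, I used something along these lines (a rough sketch; the `.to_q` / `.to_k` / `.to_v` module names are based on the diffusers Wan transformer layout and may differ between versions):

```python
# Register forward hooks on the Q/K/V projection layers of the transformer and
# report any NaNs they produce. The module name suffixes are assumptions.
import torch

def register_nan_hooks(transformer):
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # to_q / to_k / to_v are plain nn.Linear layers, so `output` is a tensor.
            if torch.isnan(output).any():
                print(f"NaN in {name}: {int(torch.isnan(output).sum())} elements")
        return hook

    for name, module in transformer.named_modules():
        # Only watch the Q/K/V projections inside each attention block.
        if name.endswith((".to_q", ".to_k", ".to_v")):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Usage: handles = register_nan_hooks(pipe.transformer), run inference, then
# remove the hooks with: for h in handles: h.remove()
```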

As I said, I didn't change any of my inference code, only the environment as described above. Has anyone else experienced this or a similar problem?

Posting code or logs doesn't help here, because they show nothing related to the problem: no error is emitted, I just get a bad or wrong image.

Reproduction

N/A. Note that I use the diffusers library locally on my PC, not Fooocus, A1111, or any other alternative front end.
I should add that, due to VRAM limitations, I'm using Quanto-quantized (INT8) versions of the transformers and T5 encoders for both Flux 1.D and Wan 2.2, and I have had no previous problems doing so with the diffusers library. A sketch of that setup is below.
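
Roughly, the setup looks like the following (illustrative only, not my exact script; the model id, prompt, and parameters are placeholders, and it quantizes/freezes with optimum-quanto in place rather than loading pre-quantized weights, just to keep the example short):

```python
# Illustrative sketch: INT8-quantize the Flux transformer and T5 text encoder with
# optimum-quanto to fit within 24 GB of VRAM, then run a text-to-image generation.
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qint8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # assumed model id
    torch_dtype=torch.bfloat16,
)

# Weight-only INT8 quantization of the two largest components.
quantize(pipe.transformer, weights=qint8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qint8)   # the T5 encoder
freeze(pipe.text_encoder_2)

pipe.enable_model_cpu_offload()

image = pipe(
    "a red bicycle leaning against a yellow wall, overcast light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_test.png")
```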

In the interim, I have reverted to the previous releases, diffusers 0.35.0 and transformers 4.55.4 (keeping the upgraded Torch 2.8), and am not experiencing the problems described here.
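
If anyone wants to reproduce the rollback, pinning the two packages back is what I did (Torch 2.8.0 stays in place):

```
pip install "diffusers==0.35.0" "transformers==4.55.4"
```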

Logs

System Info

System: Windows 10; Python 3.12.5
GPU: RTX 4090
Diffusers: v0.36.0 (current)
Transformers: v5.0.0 (current)
Torch: v2.8.0 (CUDA)
CUDA Toolkit: v12.8

Who can help?

No response
