[modular] add Modular flux for text-to-image #11995


Merged: 13 commits into main, Jul 29, 2025
Conversation

@sayakpaul (Member) commented Jul 26, 2025

What does this PR do?

Plan to add the other tasks in a follow-up! I hope that's okay. Code to test this PR:

import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.flux.modular_blocks import TEXT2IMAGE_BLOCKS
from diffusers.utils.logging import set_verbosity_debug

set_verbosity_debug()

model_id = "black-forest-labs/FLUX.1-dev"

# Compose the predefined text-to-image blocks into one sequential pipeline.
blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)

# Initialize the pipeline, then load each model component from the FLUX.1-dev repo.
pipeline = blocks.init_pipeline()
pipeline.load_components(["text_encoder"], repo=model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer"], repo=model_id, subfolder="tokenizer")
pipeline.load_components(["text_encoder_2"], repo=model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer_2"], repo=model_id, subfolder="tokenizer_2")
pipeline.load_components(["scheduler"], repo=model_id, subfolder="scheduler")
pipeline.load_components(["transformer"], repo=model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipeline.load_components(["vae"], repo=model_id, subfolder="vae", torch_dtype=torch.bfloat16)
pipeline.to("cuda")


prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
output = pipeline(
    prompt=prompt, num_inference_steps=28, guidance_scale=3.5, generator=torch.manual_seed(0)
)
# The pipeline returns its run state; fetch the decoded images from the intermediates.
output.get_intermediate("images")[0].save("modular_flux.png")

Output:

[image: generated sample saved as modular_flux.png]

Also, I have decided not to implement any guidance in this PR, as the original Flux pipeline doesn't have any. LMK if that is okay.

@@ -11,12 +11,14 @@
 @dataclass
 class FluxPipelineOutput(BaseOutput):
     """
-    Output class for Stable Diffusion pipelines.
+    Output class for Flux image generation pipelines.
@sayakpaul (Member Author):

Hope this change is okay.

return mu


def _pack_latents(latents, batch_size, num_channels_latents, height, width):
@sayakpaul (Member Author):

Didn't use "Copied from ..." here because:

make fix-copies enforces a weird indentation for this, which then fails the repo consistency check.

So, say you have the following as a standalone function in a module:

# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

    return latents

The moment you run make fix-copies after this, you will have the following diff:

+# Copied from diffusers.pipelines.flux.pipeline_flux.FluxPipeline._pack_latents
 def _pack_latents(latents, batch_size, num_channels_latents, height, width):
-    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
+        latents = latents.permute(0, 2, 4, 1, 3, 5)
+        latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
+
+        return latents
     latents = latents.permute(0, 2, 4, 1, 3, 5)
     latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

Note the messed-up indentation. We should fix this in a separate PR. Cc: @DN6

Collaborator:

nice actually
I think we should move a lot more methods out of the pipelines and into functions.
# Copied from does not work well for people who aren't maintainers; with the modular system, all the methods are refactored to not depend on state anyway.
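To make that direction concrete, here is a toy before/after sketch (the names are hypothetical, not from this PR): a helper that silently reads pipeline state becomes a pure function with explicit inputs, which any pipeline can import without the # Copied from machinery.

# Before: the helper is a method and silently depends on self.vae_scale_factor.
class SomePipeline:
    vae_scale_factor = 8

    def latent_shape(self, height, width, num_channels=16):
        return (num_channels, height // self.vae_scale_factor, width // self.vae_scale_factor)


# After: a standalone, stateless function with explicit inputs that
# flux/ltx/sd3 blocks could all import and share directly.
def latent_shape(height, width, vae_scale_factor, num_channels=16):
    return (num_channels, height // vae_scale_factor, width // vae_scale_factor)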

@sayakpaul (Member Author):

Indeed. Could be cool to consider in the set of refactors @DN6 is doing 👀

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment:

thanks @sayakpaul
can you manually create a modular repo for flux too? (see #11913 (comment))
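For reference, a repo like that can be created and populated by hand with the standard huggingface_hub API. A minimal sketch (the local folder path is a placeholder):

from huggingface_hub import create_repo, upload_folder

# Create the (empty) model repo on the Hub if it doesn't exist yet.
create_repo("diffusers-internal-dev/modular-flux.1-dev", repo_type="model", exist_ok=True)

# Upload a locally prepared folder containing the modular pipeline files.
upload_folder(
    repo_id="diffusers-internal-dev/modular-flux.1-dev",
    folder_path="./modular-flux.1-dev",  # hypothetical local path
)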

raise ValueError(f"`prompt` or `prompt_2` has to be of type `str` or `list` but is {type(prompt)}")

@staticmethod
def _get_t5_prompt_embeds(
Collaborator:

I think we can turn these two methods into functions and use them across different models: flux/ltx/sd3 ...
I will put up a prototype in one of my PRs, just FYI here
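As a rough illustration of what such a shared helper could look like (the signature here is a guess, not from this PR), the method becomes a stateless function that takes the encoder and tokenizer explicitly, so flux/ltx/sd3 blocks could all call it:

import torch

def get_t5_prompt_embeds(text_encoder, tokenizer, prompt, max_sequence_length=512, device=None):
    # Accept a single prompt or a batch of prompts.
    prompt = [prompt] if isinstance(prompt, str) else prompt
    text_inputs = tokenizer(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    )
    # No pipeline state: everything the function needs is passed in explicitly.
    with torch.no_grad():
        prompt_embeds = text_encoder(text_inputs.input_ids.to(device))[0]
    return prompt_embeds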

@sayakpaul (Member Author):

Indeed. Would be very curious to learn more.

@sayakpaul (Member Author):

@yiyixuxu here is the repo: https://huggingface.co/diffusers-internal-dev/modular-flux.1-dev/.

Do we have to manually populate the repo, as was done for Wan?

I will merge this PR once the above point is clarified.

@yiyixuxu (Collaborator):

@sayakpaul

> Do we have to manually populate the repo?

yes, manually
but we will make it work with a standard repo directly in #11944

@sayakpaul (Member Author):

Alright. I manually populated the repo. Looking forward to that PR. I will open PRs for the other tasks for Flux. This is getting very infectious to work on ❤️

@sayakpaul (Member Author):

Failing tests are unrelated.

@sayakpaul merged commit 203dc52 into main on Jul 29, 2025
13 of 15 checks passed