Conversation

@sayakpaul sayakpaul commented Oct 9, 2025

What does this PR do?

Supersedes #12269.

Test code:
import torch 
from diffusers import ModularPipeline
from diffusers.utils import load_image

repo_id = "black-forest-labs/FLUX.1-Kontext-dev"

pipe = ModularPipeline.from_pretrained(repo_id)
pipe.load_components(torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png" 
).convert("RGB")
prompt = "Make Pikachu hold a sign that says 'Black Forest Labs is awesome', yarn art style, detailed, vibrant colors"

output = pipe(
    image=image,
    prompt=prompt,
    guidance_scale=2.5,
    num_inference_steps=28,
    max_sequence_length=512,
    generator=torch.manual_seed(0)
)
output.values["images"][0].save("modular_flux_kontext_image.png")

prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
output = pipe(
    prompt=prompt, 
    num_inference_steps=28, 
    guidance_scale=3.5, 
    generator=torch.manual_seed(0),
    max_sequence_length=512,
)
output.values["images"][0].save("modular_flux.png")
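The test script above exercises both paths through the same call signature: passing `image` runs image-to-image (I2I), omitting it runs text-to-image (T2I). A minimal stand-in sketch of that dispatch pattern (hypothetical `run_pipeline` helper, no diffusers dependency, not the real API):

```python
# Stand-in sketch: the same call signature serves both tasks; the presence
# of `image` selects the I2I path, its absence selects T2I, mirroring the
# two calls in the test script above.
def run_pipeline(prompt, image=None, **kwargs):
    mode = "i2i" if image is not None else "t2i"
    # A real ModularPipeline would encode the prompt (and the image, if
    # given), denoise, and decode; here we only report the selected path.
    return {"mode": mode, "prompt": prompt}

print(run_pipeline("a cat", image=object())["mode"])  # i2i
print(run_pipeline("a cat")["mode"])                  # t2i
```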

Results:

T2I and I2I result images (attached in the PR).


asomoza commented Oct 9, 2025

In case you need it, here's the code I'm using to test:

import torch

from diffusers.modular_pipelines import ComponentsManager, ModularPipeline
from diffusers.utils import load_image

# CONFIG
repo_id = "black-forest-labs/FLUX.1-Kontext-dev"
device = "cuda"
prompt = "make it sknow"
guidance_scale = 2.5
num_inference_steps = 28
seed = 0
image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/dog_source.png").convert("RGB")

# COMPONENTS MANAGER
components = ComponentsManager()
components.enable_auto_cpu_offload(device=device)

# BLOCKS
blocks = ModularPipeline.from_pretrained(repo_id, components_manager=components).blocks

# ENCODE PROMPT
text_blocks = blocks.sub_blocks.pop("text_encoder")
text_encoder_node = text_blocks.init_pipeline(repo_id, components_manager=components)
text_encoder_node.load_components(torch_dtype=torch.bfloat16)

text_state = text_encoder_node(prompt=prompt, max_sequence_length=512)
text_embeddings = text_state.get_by_kwargs("denoiser_input_fields")

# ENCODE IMAGE
encoder_block_name = next((name for name in blocks.block_names if "encode" in name.lower() and "text" not in name.lower()), None)
encoder_blocks = blocks.sub_blocks.pop(encoder_block_name)
encoder_node = encoder_blocks.init_pipeline(repo_id, components_manager=components)
encoder_node.load_components(torch_dtype=torch.bfloat16)
state = encoder_node(image=image)
image_latents = state.get("image_latents")

# DENOISE
denoise_blocks = blocks.sub_blocks.pop("denoise")
denoise_node = denoise_blocks.init_pipeline(repo_id, components_manager=components)
denoise_node.load_components(torch_dtype=torch.bfloat16)

generator = torch.Generator(device=device).manual_seed(seed)

denoise_state = denoise_node(
    **text_embeddings,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
    generator=generator,
    image_latents=image_latents,
)
latents = denoise_state.get("latents")

# VAE DECODE
decoder_blocks = blocks.sub_blocks.pop("decode")
decoder_node = decoder_blocks.init_pipeline(repo_id, components_manager=components)
decoder_node.load_components(torch_dtype=torch.bfloat16)
image = decoder_node(latents=latents, output="images")[0]
image.save("modular_result.png")
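The node-based flow above can be sketched abstractly with stand-in functions (names mirror the script's nodes; no diffusers or GPU required, and this is not the real API): each node consumes the state produced by the previous one, which is what lets the blocks be popped out and run independently.

```python
# Stand-in nodes for the text-encode -> image-encode -> denoise -> decode
# flow; each returns a small state dict, as the real nodes return states.
def text_encoder_node(prompt):
    return {"prompt_embeds": f"embeds({prompt})"}

def image_encoder_node(image):
    return {"image_latents": f"latents({image})"}

def denoise_node(prompt_embeds, image_latents, steps):
    return {"latents": f"denoised({prompt_embeds}|{image_latents}|steps={steps})"}

def decoder_node(latents):
    return f"image<{latents}>"

# Wire the nodes together by passing state outputs forward, as in the script.
text_state = text_encoder_node("make it snow")
enc_state = image_encoder_node("dog_source.png")
den_state = denoise_node(text_state["prompt_embeds"], enc_state["image_latents"], steps=28)
result = decoder_node(den_state["latents"])
print(result)
```

The point of the sketch is the data flow, not the computation: swapping any one node (e.g. a different encoder) only requires that it produce the state keys the next node reads.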

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul marked this pull request as draft October 9, 2025 13:11
@sayakpaul sayakpaul marked this pull request as ready for review October 10, 2025 05:04
@sayakpaul
Member Author

@asomoza you should be good to go now :)


@asomoza asomoza left a comment


LGTM! Both T2I and I2I now work with nodes, and I don't see any other issues.

@sayakpaul sayakpaul merged commit 693d8a3 into main Oct 10, 2025
15 of 17 checks passed
@sayakpaul sayakpaul deleted the flux-kontext-modular-new branch October 10, 2025 12:40