
@hlky hlky commented Dec 18, 2025

What does this PR do?

In DIFFUSERS_DEFAULT_PIPELINE_PATHS, z-image-turbo-controlnet-2.x is split into separate z-image-turbo-controlnet-2.0 and z-image-turbo-controlnet-2.1 entries. z-image-turbo-controlnet-2.x is kept for CHECKPOINT_KEY_NAMES, since there is no key difference between the two versions. The version is instead detected from another key, by checking whether its weight is all zeros: torch.all(checkpoint["control_noise_refiner.0.before_proj.weight"] == 0.0). This works because before_proj is initialized with zero_module, and these layers remained untrained in 2.0. We also account for the possibility of control_noise_refiner having been removed from the checkpoint, as is done in the Diffusers version.

self.before_proj = zero_module(nn.Linear(self.dim, self.dim))
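
The detection described above can be sketched as follows. This is a minimal sketch, not the PR's actual implementation: infer_controlnet_version and the toy state dicts are hypothetical, and flat Python lists stand in for the torch tensors a real checkpoint would contain (is_all_zero plays the role of torch.all(weight == 0.0)).

```python
def is_all_zero(weight) -> bool:
    # stand-in for torch.all(weight == 0.0); `weight` is a flat list of floats
    return all(v == 0.0 for v in weight)


def infer_controlnet_version(checkpoint: dict) -> str:
    # hypothetical helper illustrating the detection logic in this PR
    key = "control_noise_refiner.0.before_proj.weight"
    if key not in checkpoint:
        # control_noise_refiner stripped from the checkpoint, as in the
        # converted Diffusers version: it carried no trained weights in 2.0
        return "z-image-turbo-controlnet-2.0"
    if is_all_zero(checkpoint[key]):
        # before_proj was zero-initialized (zero_module) and never trained in 2.0
        return "z-image-turbo-controlnet-2.0"
    return "z-image-turbo-controlnet-2.1"


# toy state dicts standing in for real single-file checkpoints
v20 = {"control_noise_refiner.0.before_proj.weight": [0.0, 0.0, 0.0, 0.0]}
v21 = {"control_noise_refiner.0.before_proj.weight": [0.1, -0.2, 0.3, 0.4]}
```

In a 2.1 checkpoint the refiner layers carry trained (non-zero) weights, so the all-zeros test cleanly separates the two versions without any key-name difference.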

With this PR:

controlnet = ZImageControlNetModel.from_single_file(
    hf_hub_download(
        "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0",
        "Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors",
    ),
    torch_dtype=torch.bfloat16,
-   config="hlky/Z-Image-Turbo-Fun-Controlnet-Union-2.0",
)

Passing config is no longer required.

import torch
from diffusers import ZImageControlNetModel
from huggingface_hub import hf_hub_download


controlnet = ZImageControlNetModel.from_single_file(
    hf_hub_download(
        "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0",
        "Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors",
    ),
    torch_dtype=torch.bfloat16,
)
assert controlnet.control_noise_refiner is None

controlnet = ZImageControlNetModel.from_single_file(
    hf_hub_download(
        "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
        "Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors",
    ),
    torch_dtype=torch.bfloat16,
)
assert controlnet.control_noise_refiner is not None

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@iwr-redmond

Will this also work for the new 8step and Tile-8step 2.1 variants?


hlky commented Dec 23, 2025

@iwr-redmond Yes, the same detection should work, as the control_noise_refiner layers will be trained in those new variants. This PR is simply a QoL improvement for the 2.0 checkpoint, so we don't need to pass config to use it.

Regarding new variants:

  • As the 8-step variant is based on 2.1, it should already work on main: load it with from_single_file and pass a lower num_inference_steps to the pipeline.
  • The Tile variant should work with ZImageControlNetPipeline: pass the low-resolution image as control_image, and pass height and width as the desired high resolution.

I will report back with test results later today; if you experience any issues in the meantime, let me know.


hlky commented Dec 23, 2025

Both new variants are working on main.

Tile

Code

Note: num_inference_steps=8 is used in the official examples; the 1.0 version and Turbo itself used 9. I don't know the rationale for using 9 given it's an 8-step distilled model.

Note: controlnet_conditioning_scale=0.85 is used in the official Tile example.

Note: 1536 height/width is used here vs 1728/992 in the official example because the low-res input is square; a non-square output is also included below.

import torch
from diffusers import ZImageControlNetModel, ZImageControlNetPipeline
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download

controlnet = ZImageControlNetModel.from_single_file(
    hf_hub_download(
        "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
        filename="Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors",
    ),
    torch_dtype=torch.bfloat16,
)

pipe = ZImageControlNetPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.to("cuda")
control_image = load_image(
    "https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1/resolve/main/asset/low_res.jpg?download=true"
)
prompt = "这是一张充满都市气息的户外人物肖像照片。画面中是一位年轻男性,他展现出时尚而自信的形象。人物拥有精心打理的短发发型,两侧修剪得较短,顶部保留一定长度,呈现出流行的Undercut造型。他佩戴着一副时尚的浅色墨镜或透明镜框眼镜,为整体造型增添了潮流感。脸上洋溢着温和友善的笑容,神情放松自然,给人以阳光开朗的印象。他身穿一件经典的牛仔外套,这件单品永不过时,展现出休闲又有型的穿衣风格。牛仔外套的蓝色调与整体氛围十分协调,领口处隐约可见内搭的衣物。照片的背景是典型的城市街景,可以看到模糊的建筑物、街道和行人,营造出繁华都市的氛围。背景经过了恰当的虚化处理,使人物主体更加突出。光线明亮而柔和,可能是白天的自然光,为照片带来清新通透的视觉效果。整张照片构图专业,景深控制得当,完美捕捉了一个现代都市年轻人充满活力和自信的瞬间,展现出积极向上的生活态度。"
image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=0.85,
    height=1536,
    width=1536,
    num_inference_steps=8,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(43),
).images[0]
image.save("zimage-tile.png")

[Images: low-res input; 1536×1536 output; 1728×992 output]

8 step

Code

import torch
from diffusers import ZImageControlNetModel, ZImageControlNetPipeline
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download

controlnet = ZImageControlNetModel.from_single_file(
    hf_hub_download(
        "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
        filename="Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors",
    ),
    torch_dtype=torch.bfloat16,
)

pipe = ZImageControlNetPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.to("cuda")
control_image = load_image(
    "https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union/resolve/main/asset/pose.jpg?download=true"
)
prompt = "一位年轻女子站在阳光明媚的海岸线上,白裙在轻拂的海风中微微飘动。她拥有一头鲜艳的紫色长发,在风中轻盈舞动,发间系着一个精致的黑色蝴蝶结,与身后柔和的蔚蓝天空形成鲜明对比。她面容清秀,眉目精致,透着一股甜美的青春气息;神情柔和,略带羞涩,目光静静地凝望着远方的地平线,双手自然交叠于身前,仿佛沉浸在思绪之中。在她身后,是辽阔无垠、波光粼粼的大海,阳光洒在海面上,映出温暖的金色光晕。"
image = pipe(
    prompt,
    control_image=control_image,
    controlnet_conditioning_scale=0.75,
    height=1728,
    width=992,
    num_inference_steps=8,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(43),
).images[0]
image.save("zimage-8step.png")

[Image: 1728×992 8-step output]
