Conversation

@zhangjiewu

add ChronoEdit

This PR adds ChronoEdit, a state-of-the-art image editing model that reframes image editing as a video generation task to achieve physically consistent edits.

HF Model: https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
Gradio Demo: https://huggingface.co/spaces/nvidia/ChronoEdit
Paper: https://arxiv.org/abs/2510.04290
Code: https://github.com/nv-tlabs/ChronoEdit
Website: https://research.nvidia.com/labs/toronto-ai/chronoedit/

cc: @sayakpaul @yiyixuxu @asomoza

Usage

Full model

import torch
import numpy as np
from diffusers import AutoencoderKLWan, ChronoEditTransformer3DModel, ChronoEditPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel
from PIL import Image

model_id = "nvidia/ChronoEdit-14B-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
transformer = ChronoEditTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipe = ChronoEditPipeline.from_pretrained(model_id, image_encoder=image_encoder, transformer=transformer, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/spaces/nvidia/ChronoEdit/resolve/main/examples/3.png"
)
max_area = 720 * 1280
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
print("width", width, "height", height)
image = image.resize((width, height))
prompt = (
    "The user wants to transform the image by adding a small, cute mouse sitting inside the floral teacup, enjoying a spa bath. The mouse should appear relaxed and cheerful, with a tiny white bath towel draped over its head like a turban. It should be positioned comfortably in the cup’s liquid, with gentle steam rising around it to blend with the cozy atmosphere. "
    "The mouse’s pose should be natural—perhaps sitting upright with paws resting lightly on the rim or submerged in the tea. The teacup’s floral design, gold trim, and warm lighting must remain unchanged to preserve the original aesthetic. The steam should softly swirl around the mouse, enhancing the spa-like, whimsical mood."
)

output = pipe(
    image=image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=5,
    num_inference_steps=50,
    guidance_scale=5.0,
    enable_temporal_reasoning=False,
    num_temporal_reasoning_steps=0,
).frames[0]
export_to_video(output, "output.mp4", fps=4)
Image.fromarray((output[-1] * 255).clip(0, 255).astype("uint8")).save("output.png")

Full model with temporal reasoning

output = pipe(
    image=image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=29,
    num_inference_steps=50,
    guidance_scale=5.0,
    enable_temporal_reasoning=True,
    num_temporal_reasoning_steps=50,
).frames[0]
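
As in the previous example, the reasoning trajectory can be exported as a video and the final frame saved as the edited image (the filenames and fps below are illustrative, not prescribed by the model card):

export_to_video(output, "output_temporal_reasoning.mp4", fps=8)
Image.fromarray((output[-1] * 255).clip(0, 255).astype("uint8")).save("output_temporal_reasoning.png")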

With 8-step distillation LoRA

import torch
import numpy as np
from diffusers import AutoencoderKLWan, ChronoEditTransformer3DModel, ChronoEditPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image
from huggingface_hub import hf_hub_download
from transformers import CLIPVisionModel
from PIL import Image

model_id = "nvidia/ChronoEdit-14B-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
transformer = ChronoEditTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipe = ChronoEditPipeline.from_pretrained(model_id, image_encoder=image_encoder, transformer=transformer, vae=vae, torch_dtype=torch.bfloat16)
lora_path = hf_hub_download(repo_id=model_id, filename="lora/chronoedit_distill_lora.safetensors")
pipe.load_lora_weights(lora_path)
pipe.fuse_lora(lora_scale=1.0)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=2.0)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/spaces/nvidia/ChronoEdit/resolve/main/examples/3.png"
)
max_area = 720 * 1280
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
print("width", width, "height", height)
image = image.resize((width, height))
prompt = (
    "The user wants to transform the image by adding a small, cute mouse sitting inside the floral teacup, enjoying a spa bath. The mouse should appear relaxed and cheerful, with a tiny white bath towel draped over its head like a turban. It should be positioned comfortably in the cup’s liquid, with gentle steam rising around it to blend with the cozy atmosphere. "
    "The mouse’s pose should be natural—perhaps sitting upright with paws resting lightly on the rim or submerged in the tea. The teacup’s floral design, gold trim, and warm lighting must remain unchanged to preserve the original aesthetic. The steam should softly swirl around the mouse, enhancing the spa-like, whimsical mood."
)

output = pipe(
    image=image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=5,
    num_inference_steps=8,
    guidance_scale=1.0,
    enable_temporal_reasoning=False,
    num_temporal_reasoning_steps=0,
).frames[0]
export_to_video(output, "output.mp4", fps=4)
Image.fromarray((output[-1] * 255).clip(0, 255).astype("uint8")).save("output.png")

@sayakpaul requested review from DN6 and dg845 on November 5, 2025.
from ..modeling_outputs import Transformer2DModelOutput
from ..modeling_utils import ModelMixin
from ..normalization import FP32LayerNorm
from .transformer_wan import WanTimeTextImageEmbedding, WanTransformerBlock
Collaborator

can we copy over these two classes and add a # Copied from, instead of importing from wan?
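
For illustration, the convention would look roughly like this (module path and target class name assumed; the actual body is duplicated from the Wan class, not subclassed):

import torch.nn as nn


# Copied from diffusers.models.transformers.transformer_wan.WanTransformerBlock with WanTransformerBlock->ChronoEditTransformerBlock
class ChronoEditTransformerBlock(nn.Module):
    # Body duplicated verbatim from WanTransformerBlock; `make fix-copies`
    # then keeps this local copy in sync with the Wan implementation.
    ...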

@zhangjiewu (Author) commented Nov 5, 2025

yep, that makes sense. so we’ll need to copy all the modules from transformer_wan here.

@yiyixuxu (Collaborator) left a comment

thanks for the PR! I left one question about whether we support any value of num_frames.
other than that, I think we should remove the parts that exist in Wan but aren't needed here for ChronoEdit, to simplify the code a bit. but if you want to keep it consistent and may support these features in the future, that's ok too.

self.video_processor = VideoProcessor(vae_scale_factor=self.vae_scale_factor_spatial)
self.image_processor = image_processor

def _get_t5_prompt_embeds(
Collaborator

let's add a # Copied from if it's the same one as in Wan


return prompt_embeds

def encode_image(
Collaborator

same here

image_encoder: CLIPVisionModel = None,
transformer: ChronoEditTransformer3DModel = None,
transformer_2: ChronoEditTransformer3DModel = None,
boundary_ratio: Optional[float] = None,
Collaborator

Suggested change (remove):
boundary_ratio: Optional[float] = None,

if we don't support the two-stage denoising loop, let's remove this parameter and all of its related logic to simplify the pipeline a bit

num_frames: int = 81,
num_inference_steps: int = 50,
guidance_scale: float = 5.0,
guidance_scale_2: Optional[float] = None,
Collaborator

Suggested change (remove):
guidance_scale_2: Optional[float] = None,

Author

done.

prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.Tensor] = None,
image_embeds: Optional[torch.Tensor] = None,
last_image: Optional[torch.Tensor] = None,
Collaborator

it's an image editing task and can output a video to show the reasoning process, no? what would be a meaningful use case for also passing a last_image parameter here?

Author

removed.

if self.config.boundary_ratio is not None and image_embeds is not None:
raise ValueError("Cannot forward `image_embeds` when the pipeline's `boundary_ratio` is not configured.")

def prepare_latents(
Collaborator

i think this is the same as in Wan I2V too?
if you want to just add a # Copied from and keep this method as it is, that's fine! we can also just remove all the logic we don't need here related to last_frame and expand_timesteps

Author

yes, it's the same as in Wan I2V. I added a reference to the original function and removed all the Wan 2.2 logic.

freqs_cos = self.freqs_cos.split(split_sizes, dim=1)
freqs_sin = self.freqs_sin.split(split_sizes, dim=1)

assert num_frames == 2 or num_frames == self.temporal_skip_len, (
Collaborator

i don't understand this check here. I think after the temporal reasoning step, num_frames is 2, but other than that, e.g. if temporal reasoning is not enabled, this dimension will have various lengths based on the num_frames value the user passed to the pipeline, no?
if our model can only work with a fixed num_frames, maybe we can throw an error from the pipeline when we check the inputs?
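
For example, something along these lines in the pipeline's input validation (a minimal sketch; the helper name is hypothetical and not part of this PR):

def _validate_num_frames(num_frames: int) -> None:
    # ChronoEdit needs at least the conditioning frame plus one edited frame,
    # so reject smaller values with a readable error from the pipeline instead
    # of an assertion deep inside the transformer forward pass.
    if num_frames < 2:
        raise ValueError(f"`num_frames` must be at least 2 for ChronoEdit, got {num_frames}.")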

Author

yes, it works for num_frames >= 2. I've removed this check in the latest commit.

@zhangjiewu (Author)

Hi @yiyixuxu, thanks for your review and suggestions! I’ve updated the code accordingly in the latest commit. Please feel free to make any further changes if needed.

@yiyixuxu (Collaborator) left a comment

looking great! did you add a doc page in this PR?
also tests, but we can help with tests if you need

@yiyixuxu (Collaborator) commented Nov 6, 2025

tests can just follow what Wan did:
https://github.com/huggingface/diffusers/blob/main/tests/pipelines/wan/test_wan.py#L39
only fast tests are needed for ChronoEdit for now, I think! we don't need slow tests for now

@zhangjiewu (Author) commented Nov 7, 2025

looking great! did you add a doc page in this PR? also tests, but we can help with tests if you need

test added. will work on the doc now :)

@zhangjiewu (Author)

@yiyixuxu docs have been added: 104e886

@yiyixuxu (Collaborator) commented Nov 7, 2025

@bot /style

@github-actions bot commented Nov 7, 2025

Style fix runs successfully without any file modified.

@dg845 (Collaborator) commented Nov 8, 2025

Hi @zhangjiewu, could you perform the following?

  1. Can you run make fix-copies so that the CI repository consistency check succeeds?
  2. Can you add the docs at api/models/chronoedit_transformer_3d to the _toctree as well? Otherwise, the docs will not build successfully. For reference, here is how WanTransformer3DModel is added (a matching ChronoEdit entry is sketched below):
     - local: api/models/wan_transformer_3d
       title: WanTransformer3DModel
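
The ChronoEdit entry would then look something like this (the title is assumed to mirror the Wan naming, using the class name added in this PR):

     - local: api/models/chronoedit_transformer_3d
       title: ChronoEditTransformer3DModel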

Thanks!

@zhangjiewu (Author)

Hey @dg845, I’ve completed the two tasks you commented on. Thank you!

@dg845 (Collaborator) commented Nov 8, 2025

I see that tests/pipelines/chronoedit/test_chronoedit.py::ChronoEditPipelineFastTests::test_inference fails both on the CI and when I tried it locally because the generated_slice is not close enough to the expected_slice. Is this failure expected?

@zhangjiewu (Author) commented Nov 9, 2025

Hi @dg845, I get these errors when running pytest tests/pipelines/chronoedit/test_chronoedit.py, and also for tests/pipelines/wan/test_wan_image_to_video.py. Any thoughts?

============================ short test summary info =============================
FAILED tests/pipelines/chronoedit/test_chronoedit.py::ChronoEditPipelineFastTests::test_inference - RuntimeError: Expected all tensors to be on the same device, but found at lea...
FAILED tests/pipelines/chronoedit/test_chronoedit.py::ChronoEditPipelineFastTests::test_save_load_float16 - RuntimeError: expected scalar type Float but found Half
============== 2 failed, 29 passed, 3 skipped, 3 warnings in 38.25s =============

Could you check whether the following inputs work?

inputs = {
    "image": image,
    "prompt": "dance monkey",
    "negative_prompt": "negative",  # TODO
    "height": image_height,
    "width": image_width,
    "generator": generator,
    "num_inference_steps": 2,
    "guidance_scale": 6.0,
    "num_frames": 5,
    "max_sequence_length": 16,
    "output_type": "pt",
}
...
self.assertEqual(generated_video.shape, (5, 3, 16, 16))

@sayakpaul (Member)

For test_save_load_float16, #12500 might be relevant.
