lawrence-cj (Contributor) commented Nov 4, 2025

What does this PR do?

This PR adds SANA-Video, a new text/image-to-video model from NVIDIA.
Paper
Project
HF weight

Cc: @yiyixuxu @asomoza @sayakpaul

import torch
from diffusers import SanaPipeline, SanaVideoPipeline, UniPCMultistepScheduler, DPMSolverMultistepScheduler
from diffusers import AutoencoderKLWan
from diffusers.utils import export_to_video


model_id = "Efficient-Large-Model/SANA-Video_2B_480p_diffusers"
pipe = SanaVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# Optional: swap in a different scheduler by uncommenting one of the lines below.
# pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)
# Run the VAE in float32 and the text encoder in bfloat16.
pipe.vae.to(torch.float32)
pipe.text_encoder.to(torch.bfloat16)
pipe.to("cuda")
# Motion score that is appended to the prompt below.
model_score = 30

prompt = "Evening, backlight, side lighting, soft light, high contrast, mid-shot, centered composition, clean solo shot, warm color. A young Caucasian man stands in a forest, golden light glimmers on his hair as sunlight filters through the leaves. He wears a light shirt, wind gently blowing his hair and collar, light dances across his face with his movements. The background is blurred, with dappled light and soft tree shadows in the distance. The camera focuses on his lifted gaze, clear and emotional."
negative_prompt = "A chaotic sequence with misshapen, deformed limbs in heavy motion blur, sudden disappearance, jump cuts, jerky movements, rapid shot changes, frames out of sync, inconsistent character shapes, temporal artifacts, jitter, and ghosting effects, creating a disorienting visual experience."
motion_prompt = f" motion score: {model_score}."
prompt = prompt + motion_prompt

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    frames=81,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "sana_video.mp4", fps=16)

Results:

sana_v2.mp4

@sayakpaul sayakpaul requested a review from dg845 November 4, 2025 03:16
return int(default_hw[0]), int(default_hw[1])

@staticmethod
def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
Collaborator

I think exposing an interface like VaeImageProcessor.resize:

def resize(
    self,
    image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
    height: int,
    width: int,
    resize_mode: str = "default",  # "default", "fill", "crop"
) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:

would be more robust, since different video preprocessing pipelines will probably make different choices here. Not blocking; on the diffusers side we can follow up to support more video pipelines here.
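
To illustrate why a `resize_mode` argument matters, here is a minimal sketch of the geometry behind the three modes named in the signature above. `plan_resize` is a hypothetical helper written for this illustration, not part of diffusers, and the behavior assigned to each mode (stretch, cover then center-crop, fit then center on a padded canvas) is an assumption about how such an interface would typically work.

```python
from typing import Tuple


def plan_resize(
    src_h: int,
    src_w: int,
    height: int,
    width: int,
    resize_mode: str = "default",
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((resized_h, resized_w), (offset_top, offset_left)).

    Hypothetical helper, assumed semantics:
    - "default": stretch to (height, width), ignoring the aspect ratio.
    - "crop": scale until the frame covers the target, then center-crop;
      the offsets say where the crop window starts in the resized frame.
    - "fill": scale until the frame fits inside the target, then center it;
      the offsets say where the resized frame sits on the target canvas.
    """
    if resize_mode == "default":
        return (height, width), (0, 0)
    if resize_mode not in ("crop", "fill"):
        raise ValueError(f"Unknown resize_mode: {resize_mode!r}")

    scale_h = height / src_h
    scale_w = width / src_w
    if resize_mode == "crop":
        scale = max(scale_h, scale_w)
        # Guard against rounding leaving the frame one pixel short of the target.
        new_h = max(round(src_h * scale), height)
        new_w = max(round(src_w * scale), width)
    else:
        scale = min(scale_h, scale_w)
        # Guard against rounding pushing the frame one pixel past the target.
        new_h = min(round(src_h * scale), height)
        new_w = min(round(src_w * scale), width)

    offset_top = abs(new_h - height) // 2
    offset_left = abs(new_w - width) // 2
    return (new_h, new_w), (offset_top, offset_left)
```

For a 720x1280 source and the 480x832 target used in the PR description, the three modes produce different intermediate sizes, which is the kind of choice that different video pipelines may want to make for themselves.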

Contributor Author

OK, I'll let you guys help finish this part. Thanks!

dg845 (Collaborator) left a comment

Thanks for the PR! Would you be able to add tests and docs? We can help with both, especially the tests, but for the docs it may be harder for us as we are not as familiar with the intricacies of the model.

dg845 (Collaborator) left a comment

Thanks for the follow-up changes! I have made some suggestions that should help the Sana Video pipeline tests pass.

Sorry for all the small change requests, but could you also do the following?

  1. Can you run the following to make sure that the CI code quality check is green?
make style
make quality
make fix-copies
  2. Can you add the new Sana Video markdown docs to docs/source/en/_toctree.yml? For reference, here is how the Sana pipeline docs were added:
    - local: api/pipelines/sana
      title: Sana
    This change will help the docs build correctly.

lawrence-cj (Contributor Author) commented Nov 5, 2025

Done! Let's test it.

Comment on lines +376 to +377
- local: api/models/sana_video_transformer3d
  title: SanaVideoTransformer3DModel
Collaborator

This will cause an error when building the docs since the api/models/sana_video_transformer3d file doesn't currently exist. Could you add a markdown doc for the transformer as well? For reference, here is the documentation for SanaTransformer2DModel: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/models/sana_transformer2d.md
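
A check like the following could catch this kind of mismatch before the docs build runs. It is a minimal sketch, not part of the diffusers tooling: `missing_toctree_docs` is a hypothetical helper, and it assumes each `local:` entry maps to a `<docs_root>/<local>.md` file, as the linked `sana_transformer2d.md` path suggests.

```python
import re
from pathlib import Path
from typing import List


def missing_toctree_docs(toctree_path: Path, docs_root: Path) -> List[str]:
    """Return every `local:` entry in the toctree that has no matching .md file.

    Hypothetical helper: the entries are read with a regular expression rather
    than a YAML parser, so only simple `local: some/path` lines are recognized.
    """
    missing = []
    for line in toctree_path.read_text(encoding="utf-8").splitlines():
        match = re.match(r"\s*(?:-\s*)?local:\s*(\S+)\s*$", line)
        if match is None:
            continue  # not a `local:` line (for example, a `title:` line)
        local = match.group(1)
        # Assumption: the doc for `local: a/b` lives at <docs_root>/a/b.md.
        if not (docs_root / f"{local}.md").is_file():
            missing.append(local)
    return missing
```

Run from the repository root, it could be called as `missing_toctree_docs(Path("docs/source/en/_toctree.yml"), Path("docs/source/en"))`; an empty list would mean every listed entry has a markdown file.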

Contributor Author

Done.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dg845 (Collaborator) commented Nov 6, 2025

@bot /style

github-actions bot (Contributor) commented Nov 6, 2025

Style bot fixed some files and pushed the changes.

dg845 (Collaborator) commented Nov 6, 2025

@lawrence-cj, thanks again for the PR! The CI errors are unrelated to the PR so merging.

@dg845 dg845 merged commit b3e9dfc into huggingface:main Nov 6, 2025
9 of 11 checks passed
lawrence-cj (Contributor Author)

Thank you so much for your support! ❤️

Cc @dg845 @sayakpaul @yiyixuxu

sayakpaul (Member)

Congratulations on the release!
