add hunyuanvideo performance #387

Merged: 2 commits, Dec 9, 2024
19 changes: 12 additions & 7 deletions README.md
@@ -93,6 +93,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

<h2 id="updates">📢 Updates</h2>

* 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), reducing 5-sec video generation latency from 31 minutes to 5 minutes!
* 🎉**November 28, 2024**: xDiT achieves 1.6 sec end-to-end latency for 28-step [Flux.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) inference on 4xH100!
* 🎉**November 20, 2024**: xDiT supports [CogVideoX-1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) and achieves a 6.12x speedup compared to the implementation in diffusers!
* 🎉**November 11, 2024**: xDiT has been applied to [mochi-1](https://github.com/xdit-project/mochi-xdit) and achieves a 3.54x speedup compared to the official open source implementation!
@@ -158,31 +159,35 @@ Currently, if you need the parallel version of ComfyUI, please fill in this [app

<h3 id="perf_mochi1">Mochi1</h3>

1. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compared to the Official Open Source Implementation!](https://github.com/xdit-project/mochi-xdit)
1. [HunyuanVideo Performance Report](./docs/performance/hunyuanvideo.md)

<h3 id="perf_cogvideox">CogVideo</h3>

2. [CogVideo Performance Report](./docs/performance/cogvideo.md)
2. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compared to the Official Open Source Implementation!](https://github.com/xdit-project/mochi-xdit)

<h3 id="perf_cogvideox">CogVideo</h3>

3. [CogVideo Performance Report](./docs/performance/cogvideo.md)

<h3 id="perf_flux">Flux.1</h3>

3. [Flux Performance Report](./docs/performance/flux.md)
4. [Flux Performance Report](./docs/performance/flux.md)

<h3 id="perf_latte">Latte</h3>

4. [Latte Performance Report](./docs/performance/latte.md)
5. [Latte Performance Report](./docs/performance/latte.md)

<h3 id="perf_hunyuandit">HunyuanDiT</h3>

5. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)
6. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)

<h3 id="perf_sd3">SD3</h3>

6. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)
7. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)

<h3 id="perf_pixart">Pixart</h3>

7. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)
8. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)


<h2 id="QuickStart">🚀 QuickStart</h2>
25 changes: 25 additions & 0 deletions docs/performance/hunyuanvideo.md
@@ -0,0 +1,25 @@
## HunyuanVideo Performance Report

xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo?tab=readme-ov-file#-parallel-inference-on-multiple-gpus-by-xdit). On H100 GPUs, xDiT reduces the generation time of 1280x720 videos from roughly 31 minutes to roughly 5 minutes (1 GPU vs. 8 GPUs), and of 960x960 videos from roughly 28 minutes to roughly 6 minutes (1 GPU vs. 6 GPUs). The tables below list the measured latencies on H100 and H20 GPUs.

### 1280x720 Resolution (129 frames, 50 steps) - Ulysses Latency (seconds)

<center>

| GPU Type | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
|----------|--------|---------|---------|---------|
| H100 | 1,904.08 | 925.04 | 514.08 | 337.58 |
| H20 | 6,639.17 | 3,400.55 | 1,762.86 | 940.97 |

</center>

### 960x960 Resolution (129 frames, 50 steps) - Ulysses Latency (seconds)

<center>

| GPU Type | 1 GPU | 2 GPUs | 3 GPUs | 6 GPUs |
|----------|--------|---------|---------|---------|
| H100 | 1,735.01 | 934.09 | 645.45 | 367.02 |
| H20 | 6,621.46 | 3,400.55 | 2,310.48 | 1,214.67 |

</center>
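
As a rough way to read the tables above, the following sketch computes the speedup relative to the 1-GPU run and the parallel efficiency (speedup divided by GPU count). The latencies are copied from the tables; the names `LATENCY` and `scaling` are illustrative only and are not part of xDiT.

```python
# Latencies in seconds, copied from the tables above.
# Keys: (resolution, GPU type) -> {number of GPUs: latency in seconds}.
LATENCY = {
    ("1280x720", "H100"): {1: 1904.08, 2: 925.04, 4: 514.08, 8: 337.58},
    ("1280x720", "H20"): {1: 6639.17, 2: 3400.55, 4: 1762.86, 8: 940.97},
    ("960x960", "H100"): {1: 1735.01, 2: 934.09, 3: 645.45, 6: 367.02},
    ("960x960", "H20"): {1: 6621.46, 2: 3400.55, 3: 2310.48, 6: 1214.67},
}


def scaling(latencies):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU latency."""
    base = latencies[1]
    result = {}
    for gpus, seconds in sorted(latencies.items()):
        speedup = base / seconds
        # Efficiency of 1.0 would mean perfectly linear scaling.
        result[gpus] = (speedup, speedup / gpus)
    return result


if __name__ == "__main__":
    for (resolution, gpu_type), latencies in LATENCY.items():
        for gpus, (speedup, efficiency) in scaling(latencies).items():
            print(
                f"{resolution} {gpu_type} x{gpus}: "
                f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}"
            )
```

For instance, the 8-GPU H100 run at 1280x720 works out to about a 5.6x speedup over a single GPU, or roughly 70% parallel efficiency.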