add hunyuanvideo performance

xdit-project · Dec 9, 2024 · 5a0f5ad
1 parent 0a30212

Showing 2 changed files with 37 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -93,6 +93,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:
 
 <h2 id="updates">📢 Updates</h2>
 
+* 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), reducing 5-sec video generation latency from 31 minutes to 5 minutes!
 * 🎉**November 28, 2024**: xDiT achieves 1.6 sec end-to-end latency for 28-step [Flux.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) inference on 4xH100!
 * 🎉**November 20, 2024**: xDiT supports [CogVideoX-1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) and achieved 6.12x speedup compare to the implementation in diffusers!
 * 🎉**November 11, 2024**: xDiT has been applied to [mochi-1](https://github.com/xdit-project/mochi-xdit) and achieved 3.54x speedup compare to the official open source implementation!
@@ -158,31 +159,35 @@ Currently, if you need the parallel version of ComfyUI, please fill in this [app
 
 <h3 id="perf_mochi1">Mochi1</h3>
 
-1. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)
+1. [HunyuanVideo Performance Report](./docs/performance/hunyuanvideo.md)
 
 <h3 id="perf_cogvideox">CogVideo</h3>
 
-2. [CogVideo Performance Report](./docs/performance/cogvideo.md)
+2. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)
+
+<h3 id="perf_cogvideox">CogVideo</h3>
+
+3. [CogVideo Performance Report](./docs/performance/cogvideo.md)
 
 <h3 id="perf_flux">Flux.1</h3>
 
-3. [Flux Performance Report](./docs/performance/flux.md)
+4. [Flux Performance Report](./docs/performance/flux.md)
 
 <h3 id="perf_latte">Latte</h3>
 
-4. [Latte Performance Report](./docs/performance/latte.md)
+5. [Latte Performance Report](./docs/performance/latte.md)
 
 <h3 id="perf_hunyuandit">HunyuanDiT</h3>
 
-5. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)
+6. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)
 
 <h3 id="perf_sd3">SD3</h3>
 
-6. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)
+7. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)
 
 <h3 id="perf_pixart">Pixart</h3>
 
-7. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)
+8. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)
 
 
 <h2 id="QuickStart">🚀 QuickStart</h2>

diff --git a/docs/performance/hunyuanvideo.md b/docs/performance/hunyuanvideo.md
@@ -0,0 +1,25 @@
+## HunyuanVideo Performance Report
+
+xDiT is [HunyuanVideo](https://github.com/Tencent/HunyuanVideo)'s official parallel inference engine. This report covers H100 and H20 GPUs. On H100, xDiT reduces the generation time of 1280x720 videos from 31 minutes on 1 GPU to 5 minutes on 8 GPUs, and of 960x960 videos from 28 minutes on 1 GPU to 6 minutes on 6 GPUs.
+
+### 1280x720 Resolution (129 frames, 50 steps) - Ulysses Latency (seconds)
+
+<center>
+
+| GPU Type | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
+|----------|--------|---------|---------|---------|
+| H100 | 1,904.08 | 925.04 | 514.08 | 337.58 |
+| H20 | 6,639.17 | 3,400.55 | 1,762.86 | 940.97 |
+
+</center>
+
+### 960x960 Resolution (129 frames, 50 steps) - Ulysses Latency (seconds)
+
+<center>
+
+| GPU Type | 1 GPU | 2 GPUs | 3 GPUs | 6 GPUs |
+|----------|--------|---------|---------|---------|
+| H100 | 1,735.01 | 934.09 | 645.45 | 367.02 |
+| H20 | 6,621.46 | 3,400.55 | 2,310.48 | 1,214.67 |
+
+</center>
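
The minute figures in the report's summary can be checked against the 1280x720 table with a few lines of arithmetic. The sketch below is not part of the commit: it copies the latencies from the table, and the `scaling` helper and the "efficiency" metric (speedup divided by GPU count) are illustrative assumptions rather than anything xDiT provides.

```python
# Latencies in seconds, copied from the 1280x720 table
# (129 frames, 50 steps, Ulysses) added in this commit.
latency_1280x720 = {
    "H100": {1: 1904.08, 2: 925.04, 4: 514.08, 8: 337.58},
    "H20": {1: 6639.17, 2: 3400.55, 4: 1762.86, 8: 940.97},
}


def scaling(latencies):
    """Hypothetical helper: map GPU count to (speedup, efficiency).

    Speedup is measured against the 1-GPU latency; efficiency is the
    speedup divided by the number of GPUs.
    """
    base = latencies[1]
    return {n: (base / t, base / t / n) for n, t in sorted(latencies.items())}


for gpu, row in latency_1280x720.items():
    for n, (speedup, efficiency) in scaling(row).items():
        print(
            f"{gpu} x{n}: {row[n] / 60:.1f} min, "
            f"{speedup:.2f}x speedup, {efficiency:.0%} efficiency"
        )
```

The same helper applies to the 960x960 table by swapping in that table's rows and its GPU counts (1, 2, 3, 6).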