
Commit

update README & USP example
SHYuanBest committed Dec 23, 2024
1 parent 89953a8 commit 53efbac
Showing 3 changed files with 23 additions and 16 deletions.
33 changes: 20 additions & 13 deletions README.md
@@ -1,7 +1,7 @@
<div align="center">
<!-- <h1>KTransformers</h1> -->
<p align="center">

<picture>
<img alt="xDiT" src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/XDiTlogo.png" width="50%">

@@ -22,6 +22,7 @@
- [📈 Performance](#perf)
- [HunyuanVideo](#perf_hunyuanvideo)
- [Mochi-1](#perf_mochi1)
+- [ConsisID](#perf_consisid)
- [CogVideoX](#perf_cogvideox)
- [Flux.1](#perf_flux)
- [HunyuanDiT](#perf_hunyuandit)
@@ -94,6 +95,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

<h2 id="updates">📢 Updates</h2>

+* 🎉**December 24, 2024**: xDiT supports [ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) and achieved 3.21x speedup compare to the official implementation! The inference scripts are [examples/consisid_example.py](examples/consisid_example.py) and [examples/consisid_usp_example.py](examples/consisid_usp_example.py).
* 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), reducing the 5-sec video generation latency from 31 minutes to 5 minutes on 8xH100!
* 🎉**November 28, 2024**: xDiT achieves 1.6 sec end-to-end latency for 28-step [Flux.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) inference on 4xH100!
* 🎉**November 20, 2024**: xDiT supports [CogVideoX-1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) and achieved 6.12x speedup compare to the implementation in diffusers!
@@ -117,11 +119,12 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

| Model Name | CFG | SP | PipeFusion |
| --- | --- | --- | --- |
-| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ ||
-| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ ||
-| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ ||
-| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ ||
-| [🎬 Latte](https://huggingface.co/maxin-cn/Latte-1) || ✔️ ||
+| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ ||
+| [🎬 ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) | ✔️ | ✔️ ||
+| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ ||
+| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ ||
+| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ ||
+| [🎬 Latte](https://huggingface.co/maxin-cn/Latte-1) || ✔️ ||
| [🔵 HunyuanDiT-v1.2-Diffusers](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers) | ✔️ | ✔️ | ✔️ |
| [🟠 Flux](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | NA | ✔️ | ✔️ |
| [🔴 PixArt-Sigma](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS) | ✔️ | ✔️ | ✔️ |
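The support matrix in this hunk can be read as a lookup table when choosing launch flags. The sketch below encodes only the rows visible here, treats both an empty cell and `NA` as "not available", and reports which requested parallel methods the table does not mark with ✔️. The `SUPPORT` dictionary and `check_parallel_request` are hypothetical illustrations, not part of xDiT, and the full README table may contain more rows than this hunk shows.

```python
# Hypothetical encoding of the support-matrix rows shown in this hunk.
# Each model maps to the methods marked ✔️; empty cells and "NA" are both
# treated as unavailable. This is NOT an xDiT API.
SUPPORT = {
    "HunyuanVideo": {"sp"},
    "ConsisID-Preview": {"cfg", "sp"},
    "CogVideoX1.5": {"cfg", "sp"},
    "Mochi-1": {"cfg", "sp"},
    "CogVideoX": {"cfg", "sp"},
    "Latte": {"sp"},
    "HunyuanDiT-v1.2-Diffusers": {"cfg", "sp", "pipefusion"},
    "Flux": {"sp", "pipefusion"},
    "PixArt-Sigma": {"cfg", "sp", "pipefusion"},
}


def check_parallel_request(model, use_cfg=False, use_sp=False, use_pipefusion=False):
    """Return the requested methods that the table does not mark as supported."""
    requested = {"cfg": use_cfg, "sp": use_sp, "pipefusion": use_pipefusion}
    supported = SUPPORT[model]
    return sorted(name for name, wanted in requested.items()
                  if wanted and name not in supported)


if __name__ == "__main__":
    # ConsisID-Preview: CFG parallel plus sequence parallel are both marked ✔️.
    print(check_parallel_request("ConsisID-Preview", use_cfg=True, use_sp=True))  # []
    # HunyuanVideo: CFG is "NA", so a CFG-parallel request is flagged.
    print(check_parallel_request("HunyuanVideo", use_cfg=True, use_sp=True))      # ['cfg']
```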
@@ -163,33 +166,37 @@ Currently, if you need the parallel version of ComfyUI, please fill in this [app

1. [HunyuanVideo Performance Report](./docs/performance/hunyuanvideo.md)

+<h3 id="perf_consisid">ConsisID-Preview</h3>
+
+2. [ConsisID Performance Report](./docs/performance/consisid.md)
+

<h3 id="perf_cogvideox">Mochi1</h3>

-2. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)
+3. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)

<h3 id="perf_cogvideox">CogVideo</h3>

-3. [CogVideo Performance Report](./docs/performance/cogvideo.md)
+4. [CogVideo Performance Report](./docs/performance/cogvideo.md)

<h3 id="perf_flux">Flux.1</h3>

-4. [Flux Performance Report](./docs/performance/flux.md)
+5. [Flux Performance Report](./docs/performance/flux.md)

<h3 id="perf_latte">Latte</h3>

-5. [Latte Performance Report](./docs/performance/latte.md)
+6. [Latte Performance Report](./docs/performance/latte.md)

<h3 id="perf_hunyuandit">HunyuanDiT</h3>

-6. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)
+7. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)

<h3 id="perf_sd3">SD3</h3>

-7. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)
+8. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)

<h3 id="perf_pixart">Pixart</h3>

-8. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)
+9. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)


<h2 id="QuickStart">🚀 QuickStart</h2>
2 changes: 1 addition & 1 deletion examples/consisid_usp_example.py
@@ -108,7 +108,7 @@ def new_patch_embed(
image_embeds = get_sp_group().all_gather(image_embeds.contiguous(), dim=-2)
batch, num_frames, channels, height, width = image_embeds.shape
text_len = text_embeds.shape[-2]

output = original_patch_embed_forward(text_embeds, image_embeds)

text_embeds = output[:,:text_len,:]
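The `new_patch_embed` hunk follows a gather-then-split pattern. Each sequence-parallel rank holds a slice of `image_embeds` along `dim=-2` (the height axis of the 5-D `(batch, num_frames, channels, height, width)` tensor), `all_gather` reassembles the full tensor, the original patch embedding runs once, and the first `text_len` tokens of the output are kept as the text embeddings, which implies the embedding places text tokens before image tokens. The pure-Python sketch below mimics only that data flow, with lists standing in for tensors. It assumes equal, contiguous shards in rank order; `shard_rows`, `all_gather_rows`, `toy_patch_embed`, and `gather_then_embed` are hypothetical stand-ins, not xDiT or PyTorch functions.

```python
from typing import List, Tuple

Row = List[float]


def shard_rows(rows: List[Row], world_size: int, rank: int) -> List[Row]:
    """Contiguous equal split along the height axis (assumes divisibility)."""
    per_rank = len(rows) // world_size
    return rows[rank * per_rank:(rank + 1) * per_rank]


def all_gather_rows(shards: List[List[Row]]) -> List[Row]:
    """Simulated all_gather along height: concatenate shards in rank order."""
    gathered: List[Row] = []
    for shard in shards:
        gathered.extend(shard)
    return gathered


def toy_patch_embed(text_tokens: List[float], image_rows: List[Row]) -> List[float]:
    """Stand-in patch embed: text tokens first, then one 'patch token' per image row."""
    return text_tokens + [sum(row) for row in image_rows]


def gather_then_embed(text_tokens: List[float],
                      shards: List[List[Row]]) -> Tuple[List[float], List[float]]:
    image_rows = all_gather_rows(shards)          # full height restored on every rank
    output = toy_patch_embed(text_tokens, image_rows)
    text_len = len(text_tokens)
    return output[:text_len], output[text_len:]   # split text tokens from image tokens


if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # height=4, width=2
    text = [0.5, 0.25]
    shards = [shard_rows(image, world_size=2, rank=r) for r in range(2)]
    text_out, image_out = gather_then_embed(text, shards)
    print(text_out)   # [0.5, 0.25]
    print(image_out)  # [3.0, 7.0, 11.0, 15.0]
```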
4 changes: 2 additions & 2 deletions examples/run_consisid_usp.sh
@@ -5,7 +5,7 @@ export PYTHONPATH=$PWD:$PYTHONPATH

# ConsisID configuration
SCRIPT="consisid_usp_example.py"
-MODEL_ID="/cfs/dit/CogVideoX1.5-5B"
+MODEL_ID="/cfs/dit/ConsisID-preview"
INFERENCE_STEP=50

mkdir -p ./results
@@ -16,7 +16,7 @@ TASK_ARGS="--height 480 --width 720 --num_frames 49"
# ConsisID parallel configuration
N_GPUS=4
PARALLEL_ARGS="--ulysses_degree 1 --ring_degree 2"
-# CFG_ARGS="--use_cfg_parallel"
+CFG_ARGS="--use_cfg_parallel"

# Uncomment and modify these as needed
# PIPEFUSION_ARGS="--num_pipeline_patch 8"
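This hunk turns on `--use_cfg_parallel` while keeping `N_GPUS=4`, `--ulysses_degree 1`, and `--ring_degree 2`. A quick sanity check for such a launch line is to multiply the parallel degrees and compare the product with `N_GPUS`. The sketch assumes that CFG parallelism places the conditional and unconditional branches on two separate ranks (a CFG degree of 2) and that the GPUs are partitioned exactly by the product of the data-parallel, CFG, Ulysses, ring, and PipeFusion degrees. Under that assumption the product is 2 with CFG parallel commented out and 1 × 2 × 2 = 4 with it enabled. `world_size_for` is a hypothetical helper; xDiT's own argument validation may differ.

```python
def world_size_for(ulysses_degree: int = 1, ring_degree: int = 1,
                   use_cfg_parallel: bool = False, pipefusion_degree: int = 1,
                   data_parallel_degree: int = 1) -> int:
    """Hypothetical helper: GPUs implied by a set of parallel degrees.

    Assumes CFG parallel splits the conditional/unconditional branches across
    two ranks and that all degrees multiply together.
    """
    cfg_degree = 2 if use_cfg_parallel else 1
    sp_degree = ulysses_degree * ring_degree  # USP = Ulysses x ring attention
    return data_parallel_degree * cfg_degree * sp_degree * pipefusion_degree


N_GPUS = 4
before = world_size_for(ulysses_degree=1, ring_degree=2)                        # CFG_ARGS commented out
after = world_size_for(ulysses_degree=1, ring_degree=2, use_cfg_parallel=True)  # CFG_ARGS enabled
print(before, after, after == N_GPUS)  # 2 4 True
```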