diff --git a/README.md b/README.md
index ec758a48..eb0182b3 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@

- 
+ 
 xDiT
@@ -22,6 +22,7 @@
 - [📈 Performance](#perf)
   - [HunyuanVideo](#perf_hunyuanvideo)
   - [Mochi-1](#perf_mochi1)
+  - [ConsisID](#perf_consisid)
   - [CogVideoX](#perf_cogvideox)
   - [Flux.1](#perf_flux)
   - [HunyuanDiT](#perf_hunyuandit)
@@ -94,6 +95,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

📢 Updates

+* 🎉**December 24, 2024**: xDiT supports [ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) and achieves a 3.21x speedup compared to the official implementation! The inference scripts are [examples/consisid_example.py](examples/consisid_example.py) and [examples/consisid_usp_example.py](examples/consisid_usp_example.py).
 * 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), reducing the 5-sec video generation latency from 31 minutes to 5 minutes on 8xH100!
 * 🎉**November 28, 2024**: xDiT achieves 1.6 sec end-to-end latency for 28-step [Flux.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) inference on 4xH100!
 * 🎉**November 20, 2024**: xDiT supports [CogVideoX-1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) and achieved 6.12x speedup compare to the implementation in diffusers!
@@ -117,11 +119,12 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:
 | Model Name | CFG | SP | PipeFusion |
 | --- | --- | --- | --- |
-| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ | ❎ |
-| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ | ❎ |
-| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ | ❎ |
-| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ | ❎ |
-| [🎬 Latte](https://huggingface.co/maxin-cn/Latte-1) | ❎ | ✔️ | ❎ |
+| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ | ❎ |
+| [🎬 ConsisID-Preview](https://github.com/PKU-YuanGroup/ConsisID) | ✔️ | ✔️ | ❎ |
+| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ | ❎ |
+| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ | ❎ |
+| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ | ❎ |
+| [🎬 Latte](https://huggingface.co/maxin-cn/Latte-1) | ❎ | ✔️ | ❎ |
 | [🔵 HunyuanDiT-v1.2-Diffusers](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers) | ✔️ | ✔️ | ✔️ |
 | [🟠 Flux](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | NA | ✔️ | ✔️ |
 | [🔴 PixArt-Sigma](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS) | ✔️ | ✔️ | ✔️ |
@@ -163,33 +166,37 @@ Currently, if you need the parallel version of ComfyUI, please fill in this [app
 1. [HunyuanVideo Performance Report](./docs/performance/hunyuanvideo.md)
+

ConsisID-Preview

+
+2. [ConsisID Performance Report](./docs/performance/consisid.md)
+

Mochi1

-2. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)
+3. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compared to the Official Open Source Implementation!](https://github.com/xdit-project/mochi-xdit)

CogVideo

-3. [CogVideo Performance Report](./docs/performance/cogvideo.md)
+4. [CogVideo Performance Report](./docs/performance/cogvideo.md)

Flux.1

-4. [Flux Performance Report](./docs/performance/flux.md)
+5. [Flux Performance Report](./docs/performance/flux.md)

Latte

-5. [Latte Performance Report](./docs/performance/latte.md)
+6. [Latte Performance Report](./docs/performance/latte.md)

HunyuanDiT

-6. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)
+7. [HunyuanDiT Performance Report](./docs/performance/hunyuandit.md)

SD3

-7. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)
+8. [Stable Diffusion 3 Performance Report](./docs/performance/sd3.md)

Pixart

-8. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)
+9. [Pixart-Alpha Performance Report (legacy)](./docs/performance/pixart_alpha_legacy.md)

🚀 QuickStart

diff --git a/examples/consisid_usp_example.py b/examples/consisid_usp_example.py
index b02a26b5..778dd72f 100644
--- a/examples/consisid_usp_example.py
+++ b/examples/consisid_usp_example.py
@@ -108,7 +108,7 @@ def new_patch_embed(
     image_embeds = get_sp_group().all_gather(image_embeds.contiguous(), dim=-2)
     batch, num_frames, channels, height, width = image_embeds.shape
     text_len = text_embeds.shape[-2]
-    
+
     output = original_patch_embed_forward(text_embeds, image_embeds)
     text_embeds = output[:,:text_len,:]
diff --git a/examples/run_consisid_usp.sh b/examples/run_consisid_usp.sh
index 24ccd84b..7937c074 100644
--- a/examples/run_consisid_usp.sh
+++ b/examples/run_consisid_usp.sh
@@ -5,7 +5,7 @@ export PYTHONPATH=$PWD:$PYTHONPATH
 
 # ConsisID configuration
 SCRIPT="consisid_usp_example.py"
-MODEL_ID="/cfs/dit/CogVideoX1.5-5B"
+MODEL_ID="/cfs/dit/ConsisID-preview"
 INFERENCE_STEP=50
 
 mkdir -p ./results
 
@@ -16,7 +16,7 @@ TASK_ARGS="--height 480 --width 720 --num_frames 49"
 
 # ConsisID parallel configuration
 N_GPUS=4
 PARALLEL_ARGS="--ulysses_degree 1 --ring_degree 2"
-# CFG_ARGS="--use_cfg_parallel"
+CFG_ARGS="--use_cfg_parallel"
 
 # Uncomment and modify these as needed
 # PIPEFUSION_ARGS="--num_pipeline_patch 8"
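A note on the `run_consisid_usp.sh` change above: the parallel degrees have to multiply out to the GPU count. Enabling `--use_cfg_parallel` splits the classifier-free-guidance branches across 2 groups on top of the Ulysses and ring sequence-parallel degrees, which is why `N_GPUS=4` pairs with `--ulysses_degree 1 --ring_degree 2`. A minimal sketch of that arithmetic (the `required_gpus` helper is hypothetical, not part of xDiT):

```python
# Hypothetical helper (not part of xDiT): the world size implied by the
# parallel flags in run_consisid_usp.sh. --use_cfg_parallel contributes a
# CFG-parallel degree of 2; Ulysses and ring degrees split the sequence.
def required_gpus(ulysses_degree: int, ring_degree: int, use_cfg_parallel: bool) -> int:
    cfg_degree = 2 if use_cfg_parallel else 1
    return cfg_degree * ulysses_degree * ring_degree

# Values from the script: ulysses 1, ring 2, CFG parallel now enabled.
print(required_gpus(1, 2, True))  # -> 4, matching N_GPUS=4
```

With `CFG_ARGS` commented out (as it was before this change), the same flags would only occupy 2 GPUs, so uncommenting it is what makes the 4-GPU launch fully utilized.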