Commit
add ditfastattn in readme, and separate cogvideo and ditfastattn run scripts. (#298)
1 parent ae504d6 · commit 8affe0d
Showing 4 changed files with 116 additions and 63 deletions.
@@ -27,14 +27,17 @@ | |
- [Pixart](#perf_pixart) | ||
- [Latte](#perf_latte) | ||
- [🚀 QuickStart](#QuickStart) | ||
- [🖼️ ComfyUI with xDiT](#comfyui) | ||
- [✨ xDiT's Arsenal](#secrets) | ||
- [Parallel Methods](#parallel) | ||
- [1. PipeFusion](#PipeFusion) | ||
- [2. Unified Sequence Parallel](#USP) | ||
- [3. Hybrid Parallel](#hybrid_parallel) | ||
- [4. CFG Parallel](#cfg_parallel) | ||
- [5. Parallel VAE](#parallel_vae) | ||
- [Compilation Acceleration](#compilation) | ||
- [Single GPU Acceleration](#1gpuacc) | ||
- [Compilation Acceleration](#compilation) | ||
- [DiTFastAttn](#dittfastattn) | ||
- [📚 Develop Guide](#dev-guide) | ||
- [🚧 History and Looking for Contributions](#history) | ||
- [📝 Cite Us](#cite-us) | ||
|
@@ -46,14 +49,23 @@ Diffusion Transformers (DiTs) are driving advancements in high-quality image and | |
With the escalating input context length in DiTs, the computational demand of the Attention mechanism grows **quadratically**! | ||
Consequently, multi-GPU and multi-machine deployments are essential to meet the **real-time** requirements in online services. | ||
|
||
|
||
<h3 id="meet-xdit-parallel">Parallel Inference</h3> | ||
|
||
To meet the real-time demands of DiT applications, parallel inference is a must.
xDiT is an inference engine designed for the large-scale parallel deployment of DiTs.
xDiT provides a suite of efficient parallel approaches for Diffusion Models, as well as GPU kernel accelerations. | ||
xDiT provides a suite of efficient parallel approaches for Diffusion Models, as well as computation accelerations. | ||
|
||
The overview of xDiT is shown as follows. | ||
|
||
<picture> | ||
<img alt="xDiT" src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_overview.png"> | ||
</picture> | ||
|
||
|
||
1. Sequence Parallelism, [USP](https://arxiv.org/abs/2405.07719) is a unified sequence parallel approach combining DeepSpeed-Ulysses, Ring-Attention. | ||
1. Sequence Parallelism, [USP](https://arxiv.org/abs/2405.07719) is a unified sequence parallel approach combining DeepSpeed-Ulysses and Ring-Attention, proposed by us.
|
||
2. [PipeFusion](https://arxiv.org/abs/2405.14430), a patch level pipeline parallelism using displaced patch by taking advantage of the diffusion model characteristics. | ||
2. [PipeFusion](https://arxiv.org/abs/2405.14430), a sequence-level pipeline parallelism similar to [TeraPipe](https://arxiv.org/abs/2102.07988) that takes advantage of the input temporal redundancy characteristic of diffusion models.
|
||
3. Data Parallel: Processes multiple prompts, or generates multiple images from a single prompt, in parallel (a combined launch sketch follows this list).
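As a rough illustration of how these methods compose, the sketch below splits 8 GPUs across Ulysses, Ring, and PipeFusion parallelism. The script path and flag names (`--ulysses_degree`, `--ring_degree`, `--pipefusion_parallel_degree`) are assumptions modeled on the examples referenced later in this README, not a confirmed CLI; check `examples/run.sh` for the actual options.

```
# Hypothetical hybrid-parallel launch on 8 GPUs:
# 2-way Ulysses x 2-way Ring (sequence parallel) x 2-way PipeFusion = 8 GPUs total.
torchrun --nproc_per_node=8 examples/pixartalpha_example.py \
    --model PixArt-alpha/PixArt-XL-2-1024-MS \
    --ulysses_degree 2 \
    --ring_degree 2 \
    --pipefusion_parallel_degree 2 \
    --num_inference_steps 20 \
    --prompt "a photo of an astronaut riding a horse on mars"
```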
|
||
|
@@ -70,15 +82,13 @@ We also have implemented the following parallel strategies for reference:
2. [DistriFusion](https://arxiv.org/abs/2402.19481) | ||
|
||
|
||
Optimization orthogonal to parallelization focuses on accelerating single GPU performance. | ||
In addition to utilizing well-known Attention optimization libraries, we leverage compilation acceleration technologies such as `torch.compile` and `onediff`. | ||
<h3 id="meet-xdit-perf">Computing Acceleration</h3> | ||
|
||
The overview of xDiT is shown as follows. | ||
Optimization orthogonal to parallelization focuses on accelerating single-GPU performance.
|
||
<picture> | ||
<img alt="xDiT" src="https://raw.githubusercontent.com/xdit-project/xdit_assets/main/methods/xdit_overview.png"> | ||
</picture> | ||
First, xDiT employs a series of kernel acceleration methods. In addition to utilizing well-known Attention optimization libraries, we leverage compilation acceleration technologies such as `torch.compile` and `onediff`. | ||
|
||
Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn), which exploits computational redundancies between different steps of the Diffusion Model to accelerate inference on a single GPU. | ||
|
||
<h2 id="updates">📢 Updates</h2> | ||
|
||
|
@@ -262,14 +272,25 @@ We observed that a warmup of 0 had no effect on the PixArt model. | |
Users can tune this value according to their specific tasks. | ||
<h2 id="comfyui">🖼️ ComfyUI with xDiT</h2> | ||
### 4. Launch a Http Service | ||
### 1. Launch ComfyUI | ||
[Launching a Text-to-Image Http Service](./docs/developer/Http_Service.md) | ||
ComfyUI is currently the most popular way to use Diffusion Models. | ||
It provides users with a platform for image generation, supporting plugins such as LoRA, ControlNet, and IP-Adapter.
However, since ComfyUI was initially designed for personal computers with single-node, single-GPU capabilities, implementing native parallel acceleration still faces significant compatibility issues. To address this, we've used xDiT with the Ray framework to achieve seamless multi-GPU parallel adaptation on ComfyUI, significantly improving the generation speed of ComfyUI workflows. | ||
Below is an example of using xDiT to accelerate a Flux workflow with LoRA: | ||
 | ||
### 5. Launch ComfyUI | ||
Currently, if you need the xDiT parallel version for ComfyUI, please contact us via this [email]([email protected]). | ||
[Launching ComfyUI](./docs/developer/ComfyUI_xdit.md) | ||
### 2. Launch a Http Service | ||
You can also launch an HTTP service to generate images with xDiT.
[Launching a Text-to-Image Http Service](./docs/developer/Http_Service.md) | ||
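As a purely hypothetical sketch of what a request to such a service might look like (the port, endpoint, and JSON fields below are assumptions, not the documented API; see the linked doc for the real interface):

```
# Hypothetical request; the /generate endpoint and field names are assumed.
curl -X POST http://localhost:6000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "a cat holding a sign that says hello", "num_inference_steps": 20}'
```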
<h2 id="secrets">✨ The xDiT's Arsenal</h2> | ||
|
@@ -333,7 +354,10 @@ As we can see, PipeFusion and Sequence Parallel achieve lowest communication cos | |
[Patch Parallel VAE](./docs/methods/parallel_vae.md) | ||
<h3 id="compilation">Compilation Acceleration</h3> | ||
<h3 id="1gpuacc">Single GPU Acceleration</h3> | ||
<h4 id="compilation">Compilation Acceleration</h4> | ||
We utilize two compilation acceleration techniques, [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) and [onediff](https://github.com/siliconflow/onediff), to enhance runtime speed on GPUs. These compilation accelerations are used in conjunction with parallelization methods. | ||
|
@@ -347,6 +371,12 @@ pip install -U nexfort | |
For usage instructions, refer to the [example/run.sh](./examples/run.sh). Simply append `--use_torch_compile` or `--use_onediff` to your command. Note that these options are mutually exclusive, and their performance varies across different scenarios. | ||
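For instance, a command following that pattern might look like the sketch below; the script name and parallel flags are placeholders, and only the `--use_torch_compile` / `--use_onediff` switches are the point here.

```
# Append --use_torch_compile (or --use_onediff, but not both) to an existing example command.
torchrun --nproc_per_node=8 examples/pixartalpha_example.py \
    --ulysses_degree 2 \
    --pipefusion_parallel_degree 4 \
    --use_torch_compile
```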
<h4 id="dittfastattn">DiTFastAttn</h4> | ||
xDiT also provides DiTFastAttn for single-GPU acceleration. It reduces the computation cost of the attention layers by leveraging redundancies between different steps of the Diffusion Model.
[DiTFastAttn](./docs/methods/dittfastattn.md) | ||
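As a sketch only, enabling it on a single GPU might look like the command below; the `--use_fast_attn` flag name is an assumption rather than a confirmed option, and the method doc linked above (together with the `examples/run_fastditattn.sh` script added in this commit) shows the actual usage.

```
# Hypothetical single-GPU run with DiTFastAttn enabled; the flag name is assumed.
torchrun --nproc_per_node=1 examples/pixartalpha_example.py \
    --num_inference_steps 20 \
    --use_fast_attn
```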
<h2 id="dev-guide">📚 Develop Guide</h2> | ||
[The implementation and design of the xDiT framework](./docs/developer/The_implement_design_of_xdit_framework.md)
|
@@ -0,0 +1,35 @@
### DiTFastAttn

[DiTFastAttn](https://github.com/thu-nics/DiTFastAttn) is an acceleration solution for single-GPU DiT inference, utilizing Input Temporal Reduction to reduce computational complexity through the following three methods:

1. Window Attention with Residual Caching to reduce spatial redundancy.
2. Temporal Similarity Reduction to exploit the similarity between steps.
3. Conditional Redundancy Elimination to skip redundant computations during conditional generation.

Currently, DiTFastAttn can only be used with data parallelism or on a single GPU. It does not support other parallel methods such as USP and PipeFusion. We plan to implement a parallel version of DiTFastAttn in the future.

## Download COCO Dataset

```
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
```

## Running

Modify the dataset path in the script, then run

```
bash examples/run_fastditattn.sh
```
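The exact variable to edit depends on the script's contents; the line below is only a hypothetical example of the kind of assignment to look for, pointed at one of the unzipped annotation files.

```
# Hypothetical line inside examples/run_fastditattn.sh; the variable name is assumed.
COCO_ANNOTATION_PATH="./annotations/captions_val2014.json"
```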
## Reference

```
@misc{yuan2024ditfastattn,
      title={DiTFastAttn: Attention Compression for Diffusion Transformer Models},
      author={Zhihang Yuan and Pu Lu and Hanling Zhang and Xuefei Ning and Linfeng Zhang and Tianchen Zhao and Shengen Yan and Guohao Dai and Yu Wang},
      year={2024},
      eprint={2406.08552},
      archivePrefix={arXiv},
}
```
@@ -0,0 +1,35 @@
### DiTFastAttn

[DiTFastAttn](https://github.com/thu-nics/DiTFastAttn) is an acceleration solution for single-GPU DiT inference that uses Input Temporal Reduction to reduce the amount of computation through the following three methods:

1. Window Attention with Residual Caching to reduce spatial redundancy.
2. Temporal Similarity Reduction to exploit the similarity between steps.
3. Conditional Redundancy Elimination to skip redundant computations during conditional generation.

Currently, DiTFastAttn can only run with data parallelism or on a single GPU. Other parallel methods such as USP and PipeFusion are not supported. We plan to implement a parallel version of DiTFastAttn in the future.

## Download the COCO Dataset

```
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
```

## Running

Modify the dataset path in the script, then run

```
bash examples/run_fastditattn.sh
```

## Reference

```
@misc{yuan2024ditfastattn,
      title={DiTFastAttn: Attention Compression for Diffusion Transformer Models},
      author={Zhihang Yuan and Pu Lu and Hanling Zhang and Xuefei Ning and Linfeng Zhang and Tianchen Zhao and Shengen Yan and Guohao Dai and Yu Wang},
      year={2024},
      eprint={2406.08552},
      archivePrefix={arXiv},
}
```