Add sd3 quant #976

Open · wants to merge 22 commits into base: main
Changes from 12 commits
Binary file modified imgs/nexfort_sd3_demo.png
63 changes: 50 additions & 13 deletions onediff_diffusers_extensions/examples/sd3/README.md
@@ -3,14 +3,15 @@
1. [Environment Setup](#environment-setup)
- [Set Up OneDiff](#set-up-onediff)
- [Set Up NexFort Backend](#set-up-nexfort-backend)
-  - [Set Up Diffusers Library](#set-up-diffusers-library)
+  - [Set Up Diffusers](#set-up-diffusers)
- [Download SD3 Model for Diffusers](#download-sd3-model-for-diffusers)
2. [Execution Instructions](#execution-instructions)
- [Run Without Compilation (Baseline)](#run-without-compilation-baseline)
- [Run With Compilation](#run-with-compilation)
3. [Performance Comparison](#performance-comparison)
4. [Dynamic Shape for SD3](#dynamic-shape-for-sd3)
-5. [Quality](#quality)
+5. [Quantization](#quantization)
+6. [Quality](#quality)

## Environment setup
### Set up onediff
@@ -25,12 +26,12 @@ https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/back
# Ensure diffusers includes the SD3 pipeline.
pip3 install --upgrade diffusers[torch]
```
-### Set up SD3
+### Download SD3 model for diffusers
Model version for diffusers: https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers

HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

-## Run
+## Execution instructions

### Run 1024*1024 without compilation (the original PyTorch HF diffusers baseline)
```
@@ -42,22 +43,22 @@ python3 onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py \

```
python3 onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py \
-    --compiler-config '{"mode": "max-optimize:max-autotune:low-precision:cache-all:freezing:benchmark", "memory_format": "channels_last"}' \
+    --compiler-config '{"mode": "max-optimize:max-autotune:low-precision:cudagraphs:cache-all:freezing:benchmark", "memory_format": "channels_last"}' \
--saved-image sd3_compile.png
```

## Performance comparison

-Testing on H800-NVL-80GB, with image size of 1024*1024, iterating 28 steps:
+Testing on H800-NVL-80GB with torch 2.3.0, with image size of 1024*1024, iterating 28 steps:
| Metric | |
| ------------------------------------------------ | ----------------------------------- |
-| Data update date(yyyy-mm-dd) | 2024-06-24 |
-| PyTorch iteration speed | 15.56 it/s |
-| OneDiff iteration speed | 25.91 it/s (+66.5%) |
-| PyTorch E2E time | 1.96 s |
-| OneDiff E2E time | 1.15 s (-41.3%) |
-| PyTorch Max Mem Used | 18.784 GiB |
-| OneDiff Max Mem Used | 18.324 GiB |
+| Data update date(yyyy-mm-dd) | 2024-06-25 |
+| PyTorch iteration speed | 15.11 it/s |
+| OneDiff iteration speed | 25.14 it/s (+66.4%) |
+| PyTorch E2E time | 2.03 s |
+| OneDiff E2E time | 1.21 s (-40.1%) |
+| PyTorch Max Mem Used | 18.788 GiB |
+| OneDiff Max Mem Used | 17.926 GiB |
| PyTorch Warmup with Run time | 2.86 s |
| OneDiff Warmup with Compilation time<sup>1</sup> | 889.25 s |
| OneDiff Warmup with Cache time | 44.38 s |
@@ -95,6 +96,42 @@ python3 onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py \
--run_multiple_resolutions 1 \
--saved-image sd3_compile.png
```
## Quantization

Note: Quantization is a feature of OneDiff Enterprise.

### Run

Quantization can be applied selectively to the model's layers based on precision. Download `fp8_e4m3.json` or `fp8_e4m3_per_tensor.json` from https://huggingface.co/siliconflow/stable-diffusion-3-onediff-nexfort-fp8.

The `--quant-submodules-config-path` argument is optional. If left as `None`, all linear layers are quantized.

```
# Applies dynamic symmetric per-tensor activation and per-tensor weight quantization to all linear layers. Both activations and weights are quantized to e4m3 format.
python3 onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py \
--compiler-config '{"mode": "quant:max-optimize:max-autotune:low-precision:cudagraphs:freezing:benchmark", "memory_format": "channels_last"}' \
--quantize-config '{"quant_type": "fp8_e4m3_e4m3_dynamic_per_tensor"}' \
--quant-submodules-config-path /path/to/fp8_e4m3_per_tensor.json \
--saved-image sd3_fp8.png
```
or
```
# Applies dynamic symmetric per-token activation and per-channel weight quantization to all linear layers.
python3 onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py \
--compiler-config '{"mode": "quant:max-optimize:max-autotune:low-precision:cudagraphs:freezing:benchmark", "memory_format": "channels_last"}' \
--quantize-config '{"quant_type": "fp8_e4m3_e4m3_dynamic"}' \
--quant-submodules-config-path /path/to/fp8_e4m3.json \
--saved-image sd3_fp8.png
```
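
To see why per-channel/per-token scales (`fp8_e4m3_e4m3_dynamic`) usually preserve accuracy better than a single per-tensor scale (`fp8_e4m3_e4m3_dynamic_per_tensor`), here is a rough NumPy sketch. It is illustrative only: a uniform rounding grid stands in for the real fp8 e4m3 cast (whose largest finite value is 448), so it models the scaling granularity rather than fp8's non-uniform spacing.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value of the fp8 e4m3 format

def fake_quant_per_tensor(w):
    # One shared scale for the whole tensor.
    scale = np.abs(w).max() / E4M3_MAX
    q = np.clip(np.round(w / scale), -E4M3_MAX, E4M3_MAX)  # stand-in for the fp8 cast
    return q * scale  # dequantize back to float for comparison

def fake_quant_per_channel(w):
    # One scale per output channel (row): finer granularity.
    scale = np.abs(w).max(axis=1, keepdims=True) / E4M3_MAX
    q = np.clip(np.round(w / scale), -E4M3_MAX, E4M3_MAX)
    return q * scale

# A weight with one tiny-magnitude row and one large-magnitude row:
w = np.array([[0.001, 0.002, -0.001],
              [100.0, -200.0, 50.0]])
err_tensor = np.mean((w - fake_quant_per_tensor(w)) ** 2)
err_channel = np.mean((w - fake_quant_per_channel(w)) ** 2)
```

With the shared scale (set by the large row), the small row rounds to zero, while per-channel scales keep it intact; this is why per-channel weight and per-token activation quantization is typically more accurate, at the cost of storing more scales.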

### Metric

The performance of the above quantization types on the H800-NVL-80GB is as follows:

| quant_type | E2E Inference Time | Iteration speed | Max Used CUDA Memory |
|----------------------------------|--------------------|--------------------|----------------------|
| fp8_e4m3_e4m3_dynamic | 1.15 s (-43.4%) | 26.30 it/s (+74.1%)| 16.933 GiB |
| fp8_e4m3_e4m3_dynamic_per_tensor | 1.09 s (-46.3%) | 27.75 it/s (+83.7%)| 17.098 GiB |

## Quality
When using nexfort as the backend for onediff compilation acceleration, the generated images are lossless.
17 changes: 14 additions & 3 deletions onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py
@@ -26,7 +26,7 @@ def parse_args():
parser.add_argument(
"--prompt",
type=str,
-        default="photo of a dog and a cat both standing on a red box, with a blue ball in the middle with a parrot standing on top of the ball. The box has the text 'onediff'",
+        default="photo of a dog and a cat both standing on a red box, with a blue ball in the middle with a parrot standing on top of the ball. The box has the text 'nexfort'",
help="Prompt for the image generation.",
)
parser.add_argument(
@@ -54,7 +54,7 @@ def parse_args():
help="Path to save the generated image.",
)
parser.add_argument(
-        "--seed", type=int, default=1, help="Seed for random number generation."
+        "--seed", type=int, default=2, help="Seed for random number generation."
)
parser.add_argument(
"--run_multiple_resolutions",
@@ -66,6 +66,7 @@
type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
default=False,
)
parser.add_argument("--quant-submodules-config-path", type=str, default=None)
return parser.parse_args()


@@ -155,7 +156,17 @@ def compile_pipe(self, pipe, compiler_config):
return pipe

def quantize_pipe(self, pipe, quantize_config):
-        pipe = quantize_pipe(pipe, ignores=[], **quantize_config)
+        if args.quant_submodules_config_path:
+            # Quantization submodules configuration file download: https://huggingface.co/siliconflow/stable-diffusion-3-onediff-nexfort-fp8
+            pipe = quantize_pipe(
+                pipe,
+                quant_submodules_config_path=args.quant_submodules_config_path,
+                top_percentage=75,
+                ignores=[],
+                **quantize_config,
+            )
+        else:
+            pipe = quantize_pipe(pipe, ignores=[], **quantize_config)
return pipe
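
The call above passes `top_percentage=75`, which implies that only a portion of the candidate submodules listed in the config file actually gets quantized. The real selection logic lives inside OneDiff Enterprise and is not shown in this PR; the helper below is only a hypothetical sketch of a sensitivity-based cutoff (the function name, the `scores` mapping, and the module names are all invented for illustration):

```python
def select_quant_modules(scores, top_percentage=75):
    """Hypothetical selection: quantize the top_percentage of candidate
    modules with the LOWEST sensitivity score and keep the rest in high
    precision. `scores` maps module name -> sensitivity to quantization."""
    ranked = sorted(scores, key=scores.get)  # least sensitive first
    keep = max(1, round(len(ranked) * top_percentage / 100))
    return ranked[:keep]

# Example: with 4 candidate layers and top_percentage=75, 3 are quantized.
scores = {"blocks.0.ff": 0.10, "blocks.0.attn": 0.90,
          "blocks.1.ff": 0.20, "blocks.1.attn": 0.50}
quantized = select_quant_modules(scores, top_percentage=75)
```

The idea sketched here is that the most quantization-sensitive layers stay in high precision while most of the memory and speed benefit is preserved.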

