Performance: accuracy, speed, and GPU memory usage

This document describes inference time cost, GPU memory usage during inference and training, and how to tune parameters to fit your hardware.

Inference Time Cost and GPU Memory Usage

| `output_size` | Peak Memory | VAE Time + DM Time (`maisi3d-ddpm`) | VAE Time + DM Time (`maisi3d-rflow`) | latent size | `autoencoder_sliding_window_infer_size` | `autoencoder_tp_num_splits` | VAE Time | DM Time (`maisi3d-ddpm`) | DM Time (`maisi3d-rflow`) |
|---|---|---|---|---|---|---|---|---|---|
| 256x256x128 | 15.0G | 58s | 3s | 4x64x64x32 | >=[64,64,32], not used | 2 | 1s | 57s | 2s |
| 256x256x256 | 15.4G | 86s | 8s | 4x64x64x64 | [48,48,64], 4 patches | 4 | 5s | 81s | 3s |
| 512x512x128 | 15.7G | 146s | 13s | 4x128x128x32 | [64,64,32], 9 patches | 2 | 8s | 138s | 5s |
| 256x256x256 | 22.7G | 83s | 5s | 4x64x64x64 | >=[64,64,64], not used | 4 | 2s | 81s | 3s |
| 512x512x128 | 21.0G | 144s | 11s | 4x128x128x32 | [80,80,32], 4 patches | 2 | 6s | 138s | 5s |
| 512x512x512 | 22.8G | 598s | 48s | 4x128x128x128 | [64,64,48], 36 patches | 2 | 29s | 569s | 19s |
| 512x512x512 | 28.4G | 599s | 49s | 4x128x128x128 | [80,80,48], 16 patches | 4 | 30s | 569s | 19s |
| 512x512x512 | 45.3G | 601s | 51s | 4x128x128x128 | [80,80,80], 8 patches | 2 | 32s | 569s | 19s |
| 512x512x768 | 49.7G | 961s | 87s | 4x128x128x192 | [80,80,96], 12 patches | 4 | 57s | 904s | 30s |

Table: Inference Time Cost and GPU Memory Usage. DM Time refers to the time required for diffusion model inference. VAE Time refers to the time required for VAE decoder inference. The total inference time is the sum of DM Time and VAE Time. The experiment was conducted on an A100 80G GPU.

During inference, peak GPU memory usage occurs while the VAE decodes the latent features. You can lower it in two ways: increase `autoencoder_tp_num_splits`, or reduce `autoencoder_sliding_window_infer_size`. Increasing `autoencoder_tp_num_splits` has a smaller impact on generated image quality. Reducing `autoencoder_sliding_window_infer_size` may introduce stitching artifacts and has a larger impact on image quality.
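
To illustrate tuning for a given GPU, the sketch below encodes the measured rows of the table above. It then picks a configuration for a requested `output_size` under a memory budget. The selection rule is one reading of the guidance above, not an official MAISI recommendation. Among rows that fit the budget, it prefers the fewest sliding-window patches (larger windows mean fewer stitching seams) and breaks ties on lower peak memory. The names `MeasuredConfig` and `pick_config` are hypothetical and are not part of MAISI.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MeasuredConfig:
    """One row of the inference table (measured on an A100 80G GPU)."""
    output_size: str
    peak_gb: float
    sw_size: Optional[Tuple[int, int, int]]  # None: sliding window not used
    num_patches: int                         # 1 when sliding window not used
    tp_num_splits: int
    vae_s: int
    dm_ddpm_s: int
    dm_rflow_s: int

    def total_s(self, model: str) -> int:
        # Total inference time is VAE Time + DM Time.
        if model == "maisi3d-ddpm":
            dm = self.dm_ddpm_s
        elif model == "maisi3d-rflow":
            dm = self.dm_rflow_s
        else:
            raise ValueError(f"unknown model: {model}")
        return self.vae_s + dm


MEASURED = [
    MeasuredConfig("256x256x128", 15.0, None, 1, 2, 1, 57, 2),
    MeasuredConfig("256x256x256", 15.4, (48, 48, 64), 4, 4, 5, 81, 3),
    MeasuredConfig("512x512x128", 15.7, (64, 64, 32), 9, 2, 8, 138, 5),
    MeasuredConfig("256x256x256", 22.7, None, 1, 4, 2, 81, 3),
    MeasuredConfig("512x512x128", 21.0, (80, 80, 32), 4, 2, 6, 138, 5),
    MeasuredConfig("512x512x512", 22.8, (64, 64, 48), 36, 2, 29, 569, 19),
    MeasuredConfig("512x512x512", 28.4, (80, 80, 48), 16, 4, 30, 569, 19),
    MeasuredConfig("512x512x512", 45.3, (80, 80, 80), 8, 2, 32, 569, 19),
    MeasuredConfig("512x512x768", 49.7, (80, 80, 96), 12, 4, 57, 904, 30),
]


def pick_config(output_size: str, gpu_budget_gb: float) -> Optional[MeasuredConfig]:
    """Hypothetical helper: among measured rows that fit the budget, prefer the
    fewest sliding-window patches, then the lowest peak memory."""
    fits = [c for c in MEASURED
            if c.output_size == output_size and c.peak_gb <= gpu_budget_gb]
    if not fits:
        return None
    return min(fits, key=lambda c: (c.num_patches, c.peak_gb))


cfg = pick_config("512x512x512", 32.0)
print(cfg.sw_size, cfg.tp_num_splits, cfg.total_s("maisi3d-rflow"))  # (80, 80, 48) 4 49
```

The helper only chooses among measured rows. A budget below every measured peak for that size returns `None` rather than an extrapolated guess.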

The sliding window is not used when `autoencoder_sliding_window_infer_size` is equal to or larger than the latent feature size. In that case, the time and memory costs are the same for any such value.
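
The latent sizes in the table have 4 channels and 1/4 of each spatial dimension of `output_size`. The sketch below derives the latent size and counts sliding-window patches. It assumes a scan grid like the one in MONAI's `sliding_window_inference`: the input is padded up to the window size, and the step is `int(roi * (1 - overlap))`. It uses an overlap of 0.25, which is MONAI's default; the overlap MAISI actually uses is not stated here. Under that assumption, the sketch reproduces the patch counts in the table. A window at least as large as the latent yields a single patch, so the sliding window is effectively unused. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def latent_shape(output_size: Sequence[int]) -> Tuple[int, ...]:
    # 4 latent channels; each spatial dimension downsampled by 4 (as in the table).
    return (4,) + tuple(s // 4 for s in output_size)


def windows_along_dim(size: int, roi: int, overlap: float) -> int:
    # Assumed grid logic: pad the input up to the window size, then step by
    # int(roi * (1 - overlap)) until the last window reaches the end.
    size = max(size, roi)
    if roi == size:
        return 1
    step = max(int(roi * (1 - overlap)), 1)
    return math.ceil((size - roi) / step) + 1


def num_patches(latent_spatial: Sequence[int], window: Sequence[int],
                overlap: float = 0.25) -> int:
    # Total patches is the product of the per-dimension window counts.
    return math.prod(windows_along_dim(s, r, overlap)
                     for s, r in zip(latent_spatial, window))


lat = latent_shape((512, 512, 512))            # (4, 128, 128, 128)
print(num_patches(lat[1:], (64, 64, 48)))      # 36
print(num_patches(lat[1:], (80, 80, 80)))      # 8
print(num_patches(latent_shape((256, 256, 128))[1:], (64, 64, 32)))  # 1
```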

Training GPU Memory Usage

The VAE is trained on patches and can be trained using a 16G GPU if the patch size is set to a small value, such as [64, 64, 64]. Users can adjust the patch size to fit the available GPU memory. For the released model, we initially trained the autoencoder on 16G V100 GPUs with a small patch size of [64, 64, 64], and then continued training on 32G V100 GPUs with a larger patch size of [128, 128, 128].

The DM and ControlNet are trained on whole images rather than patches, so their GPU memory usage during training depends on the input image size. Memory usage is similar for `maisi3d-ddpm` and `maisi3d-rflow`.

| image size | latent size | Peak Memory |
|---|---|---|
| 256x256x128 | 4x64x64x32 | 5G |
| 256x256x256 | 4x64x64x64 | 8G |
| 512x512x128 | 4x128x128x32 | 12G |
| 512x512x256 | 4x128x128x64 | 21G |
| 512x512x512 | 4x128x128x128 | 39G |
| 512x512x768 | 4x128x128x192 | 58G |
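
As a rough planning aid, the sketch below looks up the training memory table above. It returns the largest measured image size, by voxel count, whose peak DM/ControlNet training memory fits a given GPU budget. It uses only the measured rows and does not extrapolate. `largest_trainable` is a hypothetical helper, not part of MAISI.

```python
import math
from typing import Optional, Tuple

# Measured DM/ControlNet training peaks from the table: (image size, peak GB).
TRAINING_PEAK_GB = [
    ((256, 256, 128), 5),
    ((256, 256, 256), 8),
    ((512, 512, 128), 12),
    ((512, 512, 256), 21),
    ((512, 512, 512), 39),
    ((512, 512, 768), 58),
]


def largest_trainable(gpu_budget_gb: float) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: largest measured image size (by voxel count) whose
    peak training memory fits the budget; None if no measured row fits."""
    fits = [size for size, peak in TRAINING_PEAK_GB if peak <= gpu_budget_gb]
    if not fits:
        return None
    return max(fits, key=math.prod)


print(largest_trainable(16))  # (512, 512, 128)
print(largest_trainable(24))  # (512, 512, 256)
```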
