- YouTube - https://youtu.be/ZjTBcC8PYMo
- Bilibili - https://www.bilibili.com/video/BV1szevzVEEm/
I recommend opening this repo in an AI code IDE and asking it questions about the codebase. Gemini CLI or Client + Cerebras is free.
A fast implementation of diffusion models for CIFAR-10 image generation using the DiT (Diffusion Transformer) architecture.
Installation:
pip install -r requirements.txt

Single GPU Training:
python train.py

Distributed Training (8x RTX 4090):
# Option 1: Using torchrun (recommended)
torchrun --nproc_per_node=8 train_distributed.py
# Option 2: Using launch script
bash launch_distributed.sh

From Text Prompts:
python text_to_image.py --prompt "bird" --num_images 4 --steps 50

Available Prompts:
- airplane, plane, jet
- car, automobile, vehicle
- bird, eagle, duck
- cat, feline, lion
- deer, stag, moose
- dog, canine, puppy
- frog, toad
- horse, mare, pony
- ship, boat, yacht
- truck, lorry, van
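Each prompt group above corresponds to one CIFAR-10 class, so the text-to-image script presumably resolves a prompt to a class index before conditioning the model. A minimal sketch of that lookup (function and table names are assumptions, not the repo's actual code):

```python
# Hypothetical prompt-to-class resolution; text_to_image.py may do this differently.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Synonyms from the README, keyed by canonical class name.
SYNONYMS = {
    "airplane": ["plane", "jet"],
    "automobile": ["car", "vehicle"],
    "bird": ["eagle", "duck"],
    "cat": ["feline", "lion"],
    "deer": ["stag", "moose"],
    "dog": ["canine", "puppy"],
    "frog": ["toad"],
    "horse": ["mare", "pony"],
    "ship": ["boat", "yacht"],
    "truck": ["lorry", "van"],
}

def prompt_to_class_index(prompt: str) -> int:
    """Resolve a free-text prompt to a CIFAR-10 class index (0-9)."""
    word = prompt.strip().lower()
    for idx, name in enumerate(CIFAR10_CLASSES):
        if word == name or word in SYNONYMS[name]:
            return idx
    raise ValueError(f"unknown prompt: {prompt!r}")
```

For example, `prompt_to_class_index("bird")` and `prompt_to_class_index("duck")` both map to class 2.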
- Architecture: DiT (Diffusion Transformer)
- Dataset: CIFAR-10 (32x32 images)
- Model Size: 1024 hidden dim, 16 layers, 16 heads
- Training: 200 epochs with mixed precision
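The hyperparameters listed above can be gathered into a single config object; the sketch below uses the README's values, but the field names and structure are assumptions rather than the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class DiTConfig:
    # Values from the README; names are hypothetical, not the repo's own.
    image_size: int = 32       # CIFAR-10 resolution
    hidden_dim: int = 1024     # transformer width
    num_layers: int = 16       # transformer depth
    num_heads: int = 16        # attention heads
    epochs: int = 200          # training length
    mixed_precision: bool = True

    @property
    def head_dim(self) -> int:
        # Width must split evenly across heads: 1024 / 16 = 64 dims per head.
        return self.hidden_dim // self.num_heads
```

One sanity check this makes explicit: with 1024 hidden dims and 16 heads, each attention head operates on 64 dimensions.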
Checkpoints:
- Single GPU: cifar10_diffusion_ckpt/
- Distributed: cifar10_diffusion_distributed_ckpt/
- Single GPU: 24GB+ VRAM (RTX 3090/4090)
- Distributed: 8x RTX 4090 (24GB each)
Monitor GPU usage during training:
python gpu_monitor.py
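A monitor like this typically polls `nvidia-smi` in CSV mode; here is a minimal standalone sketch of that approach (this is an assumed implementation, not the repo's gpu_monitor.py):

```python
import shutil
import subprocess

def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line from `nvidia-smi --query-gpu ... --format=csv,noheader,nounits`."""
    idx, util, mem_used, mem_total = (field.strip() for field in line.split(","))
    return {
        "index": int(idx),
        "util_pct": int(util),
        "mem_used_mib": int(mem_used),
        "mem_total_mib": int(mem_total),
    }

def snapshot() -> list[dict]:
    """Return one utilization/memory sample per GPU, or [] if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [parse_gpu_line(line) for line in out.strip().splitlines() if line]
```

Calling `snapshot()` in a loop with a short sleep gives a simple per-GPU utilization trace during training.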