Repository of lessons exploring image diffusion models, focused on understanding and education.
This series is heavily inspired by Andrej Karpathy's Zero to Hero series of videos. Well, actually, we are straight up copying that series, because those videos are so good. Seriously, if you haven't followed his videos, go do that now - lots of great stuff in there!
Each lesson contains an explanatory video that walks you through the lesson and the code, a Colab notebook that corresponds to the video material, and a pointer to the runnable code on GitHub. All of the code is designed to run on a minimal GPU. We test everything on T4 instances, since that is what Colab provides at the free tier, and they are cheap to run on AWS as standalone instances. In theory, each lesson should run on any GPU with 8GB or more of memory, since every lesson is designed to be trained in real time on minimal hardware so that we can really dive into the code.
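If you want to confirm what PyTorch sees on your machine before diving in, a quick optional check like the one below prints the detected GPU and its memory. This snippet is not part of any lesson; it is just a convenience.

```python
import torch

# Optional sanity check: report the GPU PyTorch detects and its memory.
# The lessons target roughly 8 GB of GPU memory; a Colab T4 reports ~16 GB.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device found - the lessons will be very slow on CPU.")
```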
Each lesson is in its own subdirectory, and we have ordered the lessons chronologically (from oldest to newest) so that it's easy to trace the development of the research and see the historical progress of the field.
Since every lesson is meant to be trained in real time at minimal cost, most of the lessons are restricted to training on the MNIST dataset, simply because it is quick to train on and easy to visualize.
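For context, the sketch below shows the kind of MNIST pipeline the lessons build on, using torchvision. The exact transforms, normalization, and batch sizes vary from lesson to lesson, so treat this as illustrative rather than the canonical loader.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative MNIST pipeline; individual lessons define their own variants.
transform = transforms.Compose([
    transforms.ToTensor(),                 # floats in [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.5,), (0.5,)),  # rescale to roughly [-1, 1]
])
dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
loader = DataLoader(dataset, batch_size=128, shuffle=True)

images, _ = next(iter(loader))
print(images.shape)  # torch.Size([128, 1, 28, 28])
```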
For even more diffusion models, including audio and video diffusion models, check out the xdiffusion repository, which is a unified modeling framework for image, audio, and video diffusion modeling.
All lessons are built using PyTorch and written in Python 3. To set up an environment to run all of the lessons, we suggest using conda or venv:
> python3 -m venv mindiffusion_env
> source mindiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt
All lessons are designed to be run in the lesson directory, not the root of the repository.
- Emu (abstract)
- CogView (abstract)
- CogView 2 (abstract)
- CogView 3 (abstract)
- Consistency Models (abstract)
- Latent Consistency Models (abstract)
- Scalable Diffusion Models with State Space Backbone (abstract)
- Palette: Image-to-Image Diffusion Models (abstract)
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (abstract)
- Matryoshka Diffusion Models (abstract)
- On the Importance of Noise Scheduling for Diffusion Models (abstract)
- Analyzing and Improving the Training Dynamics of Diffusion Models (abstract)
- Elucidating the Design Space of Diffusion-Based Generative Models (abstract)
- Flow Matching for Generative Modeling (abstract)
- U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models (abstract)
- MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer (abstract)
- DiffiT: Diffusion Vision Transformers for Image Generation (abstract)
- Scaling Vision Transformers to 22 Billion Parameters (abstract)
- DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention (abstract)
- DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis (abstract)
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models (abstract)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation (abstract)
- Adversarial Diffusion Distillation (abstract)
- Discrete Predictor-Corrector Diffusion Models for Image Synthesis (abstract)
- One-step Diffusion with Distribution Matching Distillation (abstract)
- Salient Object-Aware Background Generation using Text-Guided Diffusion Models (abstract)
- Versatile Diffusion (abstract)
- D3PM: Structured Denoising Diffusion Models in Discrete State-Spaces (abstract)
Most of the implementations have been consolidated into a single image and video diffusion repository, which is configurable through YAML files.
If you are interested in video diffusion models, take a look at the video diffusion models repository, where we are adding implementations of the latest video diffusion model papers, trained on an MNIST-equivalent video dataset.