Zero-to-Hero - Diffusion Models

Repository of lessons exploring image diffusion models, focused on understanding and education.

Introduction

This series is heavily inspired by Andrej Karpathy's Zero to Hero series of videos. Well, actually, we are straight-up copying that series, because it is so good. Seriously, if you haven't followed his videos, go do that now - lots of great stuff in there!

Each lesson contains an explanatory video that walks you through the lesson and the code, a Colab notebook that corresponds to the video material, and a pointer to the runnable code on GitHub. All of the code is designed to run on a minimal GPU. We test everything on T4 instances, since that is what Colab provides at the free tier, and they are cheap to run on AWS as standalone instances. Each lesson should be runnable on any GPU with 8 GB or more of memory, as all of the lessons are designed to train in real time on minimal hardware, so that we can really dive into the code.

Each lesson is in its own subdirectory, and the lessons are ordered historically (oldest to newest), so that it's easy to trace the development of the research and see the historical progress of this space.

Since every lesson is meant to be trained in real time at minimal cost, most of the lessons are restricted to training on the MNIST dataset, simply because it is quick to train on and easy to visualize.
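To give a taste of what the lessons build up to, the closed-form forward (noising) process from DDPM (lesson 5) takes only a few lines. The lessons themselves implement this in PyTorch; the sketch below uses NumPy purely for illustration, with the standard linear beta schedule from the DDPM paper:

```python
import numpy as np

# Linear beta schedule from DDPM (Ho et al., 2020): 1000 steps from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction at each step

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: scaled image plus Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = np.ones((28, 28))             # stand-in for a 28x28 MNIST image
x_mid = q_sample(x0, T // 2, rng)  # partially noised
x_end = q_sample(x0, T - 1, rng)   # nearly pure Gaussian noise
```

By the final step the signal coefficient has decayed to almost zero, which is why sampling can start from pure noise.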

For even more diffusion models, including audio and video diffusion models, check out the xdiffusion repository, which is a unified modeling framework for image, audio, and video diffusion modeling.

Requirements for All Lessons

All lessons are built using PyTorch and written in Python 3. To set up an environment to run all of the lessons, we suggest using conda or venv; the commands below use venv:

```shell
> python3 -m venv mindiffusion_env
> source mindiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt
```

All lessons are designed to be run from the lesson's own directory, not from the root of the repository.
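After activating the environment, a quick way to confirm that the installed packages resolved correctly is to probe for them without importing them. The package names below are an assumed subset of what `requirements.txt` installs; adjust them to match the actual file:

```python
import importlib.util

# Hypothetical subset of requirements.txt; adjust to match the real file.
packages = ["torch", "torchvision", "numpy"]

for name in packages:
    # find_spec returns None when a top-level package is not installed.
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'ok' if found else 'MISSING'}")
```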

Table of Lessons

| Lesson | Date | Name | Title | Video | Colab | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | | | Introduction to Diffusion Models | | colab | |
| 2 | March 2015 | DPM | Deep Unsupervised Learning using Nonequilibrium Thermodynamics | | colab | code |
| 3 | July 2019 | NCSN | Generative Modeling by Estimating Gradients of the Data Distribution | | colab | code |
| 4 | June 2020 | NCSNv2 | Improved Techniques for Training Score-Based Generative Models | | colab | code |
| 5 | June 2020 | DDPM | Denoising Diffusion Probabilistic Models | | | code |
| 5a | | | DDPM with Dropout | | | code |
| 5b | | | Interpolation in Latent Space | | | code |
| 5c | | | Adding Control - Basic Class Conditioning with Cross-Attention | | | code |
| 5d | | | Adding Control - Extended Class Conditioning | | | code |
| 5e | | | Adding Control - Text-to-Image | | | code |
| 6 | October 2020 | DDIM | Denoising Diffusion Implicit Models | | | code |
| 7 | November 2020 | Score SDE | Score-Based Generative Modeling through Stochastic Differential Equations | | | code |
| 8 | February 2021 | DALL-E | Zero-Shot Text-to-Image Generation | | | code |
| 9 | February 2021 | IDDPM | Improved Denoising Diffusion Probabilistic Models | | | code |
| 10 | April 2021 | SR3 | Image Super-Resolution via Iterative Refinement | | | code |
| 11 | May 2021 | Guided Diffusion (ADM) | Diffusion Models Beat GANs on Image Synthesis | | | code |
| 12 | May 2021 | CDM | Cascaded Diffusion Models for High Fidelity Image Generation | | | code |
| 13 | July 2021 | VDM | Variational Diffusion Models | | | code |
| 14 | December 2021 | Latent Diffusion | High-Resolution Image Synthesis with Latent Diffusion Models | | | code |
| 14a | | | Stable Diffusion v1 | | | |
| 14b | | | Stable Diffusion v2 | | | |
| 15 | December 2021 | CFG | Classifier-Free Diffusion Guidance | | | code |
| 16 | December 2021 | GLIDE | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | | | code |
| 17 | February 2022 | | Progressive Distillation for Fast Sampling of Diffusion Models | | | code |
| 18 | April 2022 | DALL-E 2 | Hierarchical Text-Conditional Image Generation with CLIP Latents | | | code |
| 19 | May 2022 | Imagen | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | | | code |
| 20 | June 2022 | EDM | Elucidating the Design Space of Diffusion-Based Generative Models | | | code |
| 21 | September 2022 | | Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow | | | code |
| 22 | October 2022 | ERNIE-ViLG 2.0 | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts | | | |
| 23 | December 2022 | DiT | Scalable Diffusion Models with Transformers | | | code |
| 24 | January 2023 | Simple Diffusion | Simple diffusion: End-to-end diffusion for high resolution images | | | |
| 25 | February 2023 | ControlNet | Adding Conditional Control to Text-to-Image Diffusion Models | | | |
| 26 | March 2023 | Consistency Models | Consistency Models | | | code |
| 27 | May 2023 | RAPHAEL | RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | | | |
| 28 | June 2023 | Wuerstchen | Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models | | | |
| 29 | July 2023 | SDXL | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | | | |
| 30 | September 2023 | PixArt-α | PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | | | code |
| 31 | October 2023 | DALL-E 3 | Improving Image Generation with Better Captions | | | |
| 32 | January 2024 | PIXART-δ | PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | | | |
| 33 | March 2024 | Stable Diffusion 3 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | | | code |
| 34 | March 2024 | PixArt-Σ | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | | | |
| 35 | July 2024 | AuraFlow | Introducing AuraFlow v0.1, an Open Exploration of Large Rectified Flow Models | | | code |
| 36 | August 2024 | Flux | Flux Announcement | | | code |
| 37 | October 2024 | Sana | SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | | | code |
| 38 | October 2024 | Stable Diffusion 3.5 | Introducing Stable Diffusion 3.5 | | | code |

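Many of the later lessons in the table build on classifier-free guidance (lesson 15), where one guidance weight blends the unconditional and conditional noise predictions from the same model. The core arithmetic is a single line; the NumPy sketch below uses toy arrays in place of a real denoiser's outputs:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy arrays standing in for a denoiser's two forward passes.
eps_u = np.zeros(4)
eps_c = np.ones(4)
print(cfg_combine(eps_u, eps_c, 1.0))  # scale 1.0 recovers the conditional prediction
print(cfg_combine(eps_u, eps_c, 7.5))  # larger scales amplify the conditioning signal
```

A guidance scale of 0 gives the unconditional model, 1 gives the plain conditional model, and values above 1 trade sample diversity for prompt adherence.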
Lessons to Add

  • Emu (abstract)
  • CogView (abstract)
  • CogView 2 (abstract)
  • CogView 3 (abstract)
  • Consistency Models (abstract)
  • Latent Consistency Models (abstract)
  • Scalable Diffusion Models with State Space Backbone (abstract)
  • Palette: Image-to-Image Diffusion Models (abstract)
  • MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (abstract)
  • Matryoshka Diffusion Models (abstract)
  • On the Importance of Noise Scheduling for Diffusion Models (abstract)
  • Analyzing and Improving the Training Dynamics of Diffusion Models (abstract)
  • Elucidating the Design Space of Diffusion-Based Generative Models (abstract)
  • Flow Matching for Generative Modeling (abstract)
  • U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models (abstract)
  • MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer (abstract)
  • DiffiT: Diffusion Vision Transformers for Image Generation (abstract)
  • Scaling Vision Transformers to 22 Billion Parameters (abstract)
  • DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention (abstract)
  • DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis (abstract)
  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models (abstract)
  • JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation (abstract)
  • Adversarial Diffusion Distillation (abstract)
  • Discrete Predictor-Corrector Diffusion Models for Image Synthesis (abstract)
  • One-step Diffusion with Distribution Matching Distillation (abstract)
  • Salient Object-Aware Background Generation using Text-Guided Diffusion Models (abstract)
  • Versatile Diffusion (abstract)
  • D3PM: Structured Denoising Diffusion Models in Discrete State-Spaces (abstract)

Resources

Most of the implementations have been consolidated into a single image and video diffusion repository, which is configurable through YAML files.

If you are interested in video diffusion models, take a look at the video diffusion models repository, where we are adding implementations of the latest video diffusion model papers, trained on an equivalent MNIST dataset for video.
