🚀 Building LLMs from Scratch

A comprehensive, step-by-step journey into building Large Language Models from the ground up

Author: Solomon Eshun

License: MIT · Python 3.8+ · PyTorch

This repository contains the complete source code, explanations, and visualizations for the "Building LLMs from Scratch" series. Whether you're a beginner curious about how ChatGPT works or an experienced developer wanting to understand transformer architecture deeply, this series will guide you through every component step by step.

📚 About This Series

This educational series breaks down the complexity of Large Language Models into digestible, hands-on tutorials. Each part builds upon the previous one, gradually constructing a complete transformer-based language model from scratch using PyTorch.

🎯 Learning Objectives:

  • Understand the fundamental architecture of transformer models
  • Implement each component (tokenization, embeddings, attention, etc.) from scratch (see the short sketch after this list)
  • Gain practical experience with PyTorch and deep learning concepts
  • Learn best practices for training and evaluating language models
  • Explore modern techniques used in state-of-the-art LLMs
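
To make the implementation objective concrete, here is a minimal sketch of the token-plus-positional-embedding step that Part 04 covers. The sizes and names are illustrative only, not taken from the series code:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the series uses its own hyperparameters
vocab_size, d_model, seq_len = 100, 16, 8

tok_emb = nn.Embedding(vocab_size, d_model)  # one learned vector per token ID
pos_emb = nn.Embedding(seq_len, d_model)     # one learned vector per position

ids = torch.randint(0, vocab_size, (1, seq_len))   # a batch of token IDs
x = tok_emb(ids) + pos_emb(torch.arange(seq_len))  # (1, seq_len, d_model)
print(x.shape)  # torch.Size([1, 8, 16])
```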

👥 Target Audience:

  • Students and researchers in AI/ML
  • Software engineers interested in NLP
  • Anyone curious about how LLMs actually work
  • Developers wanting to build custom language models

🛣️ Series Roadmap

| Part | Topic                                   | Status         | Article | Code |
|------|-----------------------------------------|----------------|---------|------|
| 01   | The Complete Theoretical Foundation     | ✅ Complete    | Medium  | N/A  |
| 02   | Tokenization                            | ✅ Complete    | Medium  | Code |
| 03   | Data Pipeline (Input-Target Pairs)      | ✅ Complete    | Medium  | Code |
| 04   | Token Embeddings & Positional Encoding  | ✅ Complete    | Medium  | Code |
| 05   | Complete Data Preprocessing Pipeline    | ✅ Complete    | Medium  | Code |
| 06   | The Attention Mechanism                 | ✅ Complete    | Medium  | Code |
| 07   | Self-Attention with Trainable Weights   | ✅ Complete    | Medium  | Code |
| 08   | Causal Attention                        | ✅ Complete    | Medium  | Code |
| 09   | Multi-Head Attention                    | 🔄 In Progress | Medium  | Code |
| 10   | Transformer Blocks & Architecture       | ⏳ Planned     | Medium  | Code |
| 11   | Training Loop & Optimization            | ⏳ Planned     | Medium  | Code |
| 12   | Model Evaluation & Fine-tuning          | ⏳ Planned     | Medium  | Code |

Legend: ✅ Complete | 🔄 In Progress | ⏳ Planned
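
For a flavor of where the attention parts (06-08) lead, here is a minimal sketch of scaled dot-product attention with a causal mask in PyTorch. It is illustrative only: it omits the trainable Q/K/V projections that Part 07 adds, and it is not the series' actual code:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x):
    """Simplified causal self-attention where Q = K = V = x.

    x: (batch, seq_len, d) tensor of token embeddings.
    """
    seq_len, d = x.size(1), x.size(-1)
    scores = x @ x.transpose(-2, -1) / d**0.5  # (batch, seq, seq)
    # Mask out future positions so each token attends only to the past
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)        # attention weights
    return weights @ x                         # weighted sum of values

out = causal_self_attention(torch.randn(1, 4, 8))  # 1 sequence, 4 tokens, 8 dims
print(out.shape)  # torch.Size([1, 4, 8])
```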

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Basic understanding of Python and neural networks
  • Familiarity with PyTorch (helpful but not required)

Installation

  1. Clone the repository:

     git clone https://github.com/soloeinsteinmit/llm-from-scratch.git
     cd llm-from-scratch

  2. Install dependencies:

     pip install -r requirements.txt

  3. Run the first code example (from Part 2):

     python src/part02_tokenization.py
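
Before running it, here is a rough idea of what a simple word-level tokenizer looks like. This is a simplified illustration that assumes word-level tokenization; the actual src/part02_tokenization.py may implement things differently:

```python
import re

# Build a tiny word-level vocabulary from sample text
text = "the quick brown fox jumps over the lazy dog"
words = re.findall(r"\w+|[^\w\s]", text)  # split into words and punctuation
vocab = {tok: idx for idx, tok in enumerate(sorted(set(words)))}

def encode(s):
    return [vocab[tok] for tok in re.findall(r"\w+|[^\w\s]", s)]

def decode(ids):
    inv = {idx: tok for tok, idx in vocab.items()}
    return " ".join(inv[i] for i in ids)

print(encode("the lazy fox"))          # token IDs depend on vocabulary order
print(decode(encode("the lazy fox")))  # "the lazy fox"
```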

📁 Repository Structure

llm-from-scratch/
├── README.md                 # You are here!
├── requirements.txt          # Python dependencies
├── LICENSE                   # MIT License
│
├── notebooks/                # Jupyter notebooks for interactive learning
│   ├── part02_tokenization.ipynb
│   └── ...
│
├── animations/               # Manim visualizations and diagrams
│   └── part-02-WordTokenizationScene.mp4    # Generated animation files
│
└── src/                      # Source code for each part
    ├── part02_tokenization.py
    └── utils/                # Helper functions and utilities

🎓 How to Use This Repository

For Learners

  1. Start with Part 01 on Medium for the theoretical foundation.
  2. Follow Part 02 and subsequent parts for hands-on coding.
  3. Run the code to see practical implementation.
  4. Experiment with the parameters and try modifications.
  5. Check the notebooks for interactive exploration.

For Educators

  • Use the code examples in your courses
  • Reference the visualizations for explanations
  • Adapt the materials for your curriculum
  • Contribute improvements and additional examples

For Researchers

  • Use as a foundation for your own model implementations
  • Reference the clean, well-documented code structure
  • Build upon the base architecture for your experiments

🎨 Visualizations

This series includes custom Manim animations that visualize complex concepts:

  • 🔄 Attention mechanisms - See how tokens "attend" to each other
  • 📊 Data flow - Understand how information moves through the model
  • 🧮 Matrix operations - Visualize the math behind transformers
  • 📈 Training dynamics - Watch the model learn in real-time

Animations are generated using Manim and available in the animations/ directory.
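
For readers curious how such a scene is built, here is a minimal Manim Community sketch. It is a hypothetical example, not one of the scenes shipped in animations/:

```python
# Render with: manim -pql scene.py TokenAttentionScene
from manim import Scene, Text, VGroup, CurvedArrow, Write, Create, RIGHT

class TokenAttentionScene(Scene):
    def construct(self):
        # Lay out three example tokens in a row
        tokens = VGroup(*[Text(t) for t in ["The", "cat", "sat"]]).arrange(RIGHT, buff=1.0)
        self.play(Write(tokens))
        # Curved arrow from "sat" back to "cat" to suggest one attention link
        arrow = CurvedArrow(tokens[2].get_top(), tokens[1].get_top(), angle=-1.2)
        self.play(Create(arrow))
        self.wait()
```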

🤝 Contributing

We welcome contributions from the community! This is an open-source educational project aimed at making LLM understanding accessible to everyone.

Ways to contribute:

  • 🐛 Report bugs or suggest improvements
  • 📝 Improve documentation and explanations
  • 🎨 Create additional visualizations
  • 🔧 Add new features or optimizations
  • 🌍 Translate content to other languages

📖 Additional Resources

Related Articles & Tutorials

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Community: Thanks to all contributors and learners who make this project better
  • Inspiration: Built upon the excellent work of researchers and educators in the field
  • Tools: Created with PyTorch, Manim, and lots of coffee ☕

📱 Connect & Follow

  • 📝 Medium: Follow the series on Medium
  • 💼 LinkedIn: Connect and discuss on LinkedIn
  • 🐙 GitHub: Star this repo and follow for updates

⭐ If you find this helpful, please give it a star! It helps others discover this resource.

Built with ❤️ for the open-source community
