- 📓 Notebook: bigram
  🧠 A simple character-level bigram language model implemented from scratch, in both a lookup-table and a neural-network version.
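A minimal sketch of the lookup-table variant (the tiny word list here is a stand-in for the notebook's actual training data): count adjacent-character pairs into a table, normalize rows into probabilities, and sample new words character by character.

```python
import torch

# Stand-in word list; the notebook would read its own dataset instead.
words = ["emma", "olivia", "ava"]

chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                      # '.' marks word start/end
itos = {i: c for c, i in stoi.items()}

# Count every bigram (ch1 -> ch2) into a lookup table.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalize rows into probabilities (with add-1 smoothing) and sample a new word.
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```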
- 📓 Paper followed: Bengio et al. 2003
  🔧 A multi-layer perceptron-based character-level neural net with per-character embedding vectors.
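A minimal sketch of the Bengio-style forward pass (layer sizes and the random batch are illustrative assumptions, not the notebook's exact configuration): context characters are looked up in an embedding table, concatenated, and passed through a tanh hidden layer to predict the next character.

```python
import torch
import torch.nn.functional as F

# Assumed sizes: 27-character vocabulary, 3-character context,
# 10-dim embeddings, 200 hidden units.
vocab_size, block_size, emb_dim, hidden = 27, 3, 10, 200

C  = torch.randn(vocab_size, emb_dim)                # character embedding table
W1 = torch.randn(block_size * emb_dim, hidden) * 0.1
b1 = torch.zeros(hidden)
W2 = torch.randn(hidden, vocab_size) * 0.1
b2 = torch.zeros(vocab_size)

X = torch.randint(0, vocab_size, (32, block_size))   # dummy context batch
Y = torch.randint(0, vocab_size, (32,))              # dummy next-char targets

emb = C[X]                                           # (32, block_size, emb_dim)
h = torch.tanh(emb.view(32, -1) @ W1 + b1)           # concatenate embeddings, hidden layer
logits = h @ W2 + b2
loss = F.cross_entropy(logits, Y)
print(loss.item())
```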
- 📓 Papers followed:
  - Bengio et al. 2003
  - Batch Normalization (original BatchNorm paper, Ioffe & Szegedy, 2015)
  - Rethinking “Batch” in BatchNorm (Wu & Johnson, 2021)
- 🔬 Applied Batch Normalization to the n-gram character-level model (a minimal layer sketch follows this list):
- Explored effects on forward-pass activations and backward-pass gradients.
- Highlighted pitfalls when normalization statistics are improperly scaled or applied.
- Analyzed internal dynamics and stability improvements during training.
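A minimal sketch of a batch-normalized hidden layer in this style of model, assuming illustrative layer sizes; the pre-activation bias is dropped because BatchNorm's per-batch mean subtraction would cancel it anyway.

```python
import torch

# Assumed sizes: 30-dim input (3 chars x 10-dim embeddings), 200 hidden units, batch of 32.
n_in, n_hidden, batch = 30, 200, 32

W1     = torch.randn(n_in, n_hidden) * (5/3) / n_in**0.5   # Kaiming-style init for tanh
bngain = torch.ones(1, n_hidden)                           # learnable scale (gamma)
bnbias = torch.zeros(1, n_hidden)                          # learnable shift (beta)

x = torch.randn(batch, n_in)
hpre = x @ W1                       # pre-activation; no bias since BatchNorm would cancel it

# Batch normalization: standardize over the batch dimension, then scale and shift.
bnmean = hpre.mean(0, keepdim=True)
bnstd  = hpre.std(0, keepdim=True)
hpre_norm = bngain * (hpre - bnmean) / (bnstd + 1e-5) + bnbias
h = torch.tanh(hpre_norm)

print(h.mean().item(), h.std().item())   # activations stay well-scaled for tanh
```

At inference time the batch statistics are typically replaced by running means and variances tracked during training, which is one of the coupling issues the “Rethinking Batch” paper examines.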
- 📓 Notebook: Back Prop
A manual implementation of backpropagation through a simple MLP, useful for understanding gradient flow and low-level training mechanics.
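A small example of the same idea on a toy layer (shapes are arbitrary): backpropagate through y = tanh(xW + b) by hand using the chain rule, then check the result against PyTorch autograd.

```python
import torch

# Forward pass: y = tanh(x @ W + b), scalar loss = y.sum()
x = torch.randn(4, 3)
W = torch.randn(3, 5, requires_grad=True)
b = torch.randn(5, requires_grad=True)

h = x @ W + b
y = torch.tanh(h)
loss = y.sum()
loss.backward()                    # autograd reference gradients

# Manual gradients, applying the chain rule layer by layer.
dy = torch.ones_like(y)            # d(loss)/dy
dh = dy * (1 - y**2)               # tanh'(h) = 1 - tanh(h)^2
dW = x.T @ dh                      # d(loss)/dW
db = dh.sum(0)                     # broadcasted bias gradient sums over the batch

print(torch.allclose(dW, W.grad), torch.allclose(db, b.grad))   # True True
```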
- 📓 Paper referenced: WaveNet (DeepMind, 2016)
  Built upon the previous MLP by deepening it into a tree-like structure inspired by the WaveNet architecture. While the original WaveNet achieves hierarchical feature extraction efficiently through causal dilated convolutions, those convolutions are not implemented in this notebook (see the grouping sketch below).
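A minimal sketch of that tree-like deepening, assuming illustrative sizes and module names (not necessarily the notebook's): consecutive character embeddings are fused in pairs at each layer, so the receptive field grows hierarchically instead of being flattened all at once.

```python
import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Fuse groups of n consecutive time steps into one feature vector."""
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):                       # x: (B, T, C)
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1) if x.shape[1] == 1 else x

# Assumed sizes: 8-character context, 27-char vocab, 10-dim embeddings, 68 hidden units.
vocab, block, emb, hid = 27, 8, 10, 68
model = nn.Sequential(
    nn.Embedding(vocab, emb),
    FlattenConsecutive(2), nn.Linear(emb * 2, hid), nn.Tanh(),   # fuse chars (1,2), (3,4), ...
    FlattenConsecutive(2), nn.Linear(hid * 2, hid), nn.Tanh(),   # fuse pairs of pairs
    FlattenConsecutive(2), nn.Linear(hid * 2, hid), nn.Tanh(),   # full context reached
    nn.Linear(hid, vocab),
)

X = torch.randint(0, vocab, (32, block))
print(model(X).shape)   # (32, 27) next-character logits
```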
- Repository: GPT
  A GPT-style decoder-only Transformer.
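A minimal sketch of the causal self-attention at the heart of a decoder-only Transformer (single head, illustrative dimensions; not the repository's actual configuration):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: each position attends only to itself and the past."""
    def __init__(self, n_embd, block_size):
        super().__init__()
        self.key   = nn.Linear(n_embd, n_embd, bias=False)
        self.query = nn.Linear(n_embd, n_embd, bias=False)
        self.value = nn.Linear(n_embd, n_embd, bias=False)
        # Lower-triangular mask forbids attention to future tokens.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                                   # x: (B, T, C)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)      # (B, T, T) attention scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                      # weighted sum of value vectors

# Illustrative sizes: 64-dim embeddings, context of 16 tokens.
attn = CausalSelfAttention(n_embd=64, block_size=16)
print(attn(torch.randn(2, 16, 64)).shape)                   # (2, 16, 64)
```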
A custom Byte Pair Encoding (BPE) tokenizer implemented from scratch (see the merge-loop sketch after this list), including:
- UTF-8 byte-level processing
- GPT-style regex-based token splitting
- Dynamic vocabulary building via merge rules
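A minimal sketch of the byte-level merge loop such a tokenizer is built around (helper names are illustrative, not the actual BPE.py interface; regex-based pre-splitting is omitted):

```python
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent token pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "low lower lowest"
ids = list(text.encode("utf-8"))        # start from raw UTF-8 bytes (vocab ids 0..255)
merges = {}                             # learned merge rules: pair -> new token id
num_merges = 5
for step in range(num_merges):
    pair = get_pair_counts(ids).most_common(1)[0][0]   # most frequent adjacent pair
    new_id = 256 + step                                # grow the vocabulary dynamically
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(ids, merges)
```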
📂 Files:
- 🧠 Tokenizer class: BPE.py
- 🧪 Debug notebook: tokenizer.ipynb