- 📓 Notebook: bigram
  🧠 A simple character-level bigram language model implemented from scratch, in both a lookup-table and a neural-network version.
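A minimal sketch of the lookup-table variant (the tiny word list here is a stand-in for the notebook's actual training data): count adjacent-character pairs into a table, normalize rows into probabilities, and sample new words character by character.

```python
import torch

# Stand-in word list; the notebook would read its own dataset instead.
words = ["emma", "olivia", "ava"]

chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                      # '.' marks word start/end
itos = {i: c for c, i in stoi.items()}

# Count every bigram (ch1 -> ch2) into a lookup table.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalize rows into probabilities (with add-1 smoothing) and sample a new word.
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```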
- 📓 Paper followed: Bengio et al. 2003
  🔧 A multi-layer perceptron-based character-level neural net with per-character embedding vectors.
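A minimal sketch of the Bengio-style forward pass (layer sizes and the random batch are illustrative assumptions, not the notebook's exact configuration): context characters are looked up in an embedding table, concatenated, and passed through a tanh hidden layer to predict the next character.

```python
import torch
import torch.nn.functional as F

# Assumed sizes: 27-character vocabulary, 3-character context,
# 10-dim embeddings, 200 hidden units.
vocab_size, block_size, emb_dim, hidden = 27, 3, 10, 200

C  = torch.randn(vocab_size, emb_dim)                # character embedding table
W1 = torch.randn(block_size * emb_dim, hidden) * 0.1
b1 = torch.zeros(hidden)
W2 = torch.randn(hidden, vocab_size) * 0.1
b2 = torch.zeros(vocab_size)

X = torch.randint(0, vocab_size, (32, block_size))   # dummy context batch
Y = torch.randint(0, vocab_size, (32,))              # dummy next-char targets

emb = C[X]                                           # (32, block_size, emb_dim)
h = torch.tanh(emb.view(32, -1) @ W1 + b1)           # concatenate embeddings, hidden layer
logits = h @ W2 + b2
loss = F.cross_entropy(logits, Y)
print(loss.item())
```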
- 📓 Papers followed:
  - Bengio et al. 2003
  - Batch Normalization (original BatchNorm paper, Ioffe & Szegedy, 2015)
  - Rethinking “Batch” in BatchNorm (Wu & Johnson, 2021)
- 🔬 Applied Batch Normalization to the n-gram character-level model (a minimal layer sketch follows this list):
- Explored effects on forward-pass activations and backward-pass gradients.
- Highlighted pitfalls when normalization statistics are improperly scaled or applied.
- Analyzed internal dynamics and stability improvements during training.
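A minimal sketch of a batch-normalized hidden layer in this style of model, assuming illustrative layer sizes; the pre-activation bias is dropped because BatchNorm's per-batch mean subtraction would cancel it anyway.

```python
import torch

# Assumed sizes: 30-dim input (3 chars x 10-dim embeddings), 200 hidden units, batch of 32.
n_in, n_hidden, batch = 30, 200, 32

W1     = torch.randn(n_in, n_hidden) * (5/3) / n_in**0.5   # Kaiming-style init for tanh
bngain = torch.ones(1, n_hidden)                           # learnable scale (gamma)
bnbias = torch.zeros(1, n_hidden)                          # learnable shift (beta)

x = torch.randn(batch, n_in)
hpre = x @ W1                       # pre-activation; no bias since BatchNorm would cancel it

# Batch normalization: standardize over the batch dimension, then scale and shift.
bnmean = hpre.mean(0, keepdim=True)
bnstd  = hpre.std(0, keepdim=True)
hpre_norm = bngain * (hpre - bnmean) / (bnstd + 1e-5) + bnbias
h = torch.tanh(hpre_norm)

print(h.mean().item(), h.std().item())   # activations stay well-scaled for tanh
```

At inference time the batch statistics are typically replaced by running means and variances tracked during training, which is one of the coupling issues the “Rethinking Batch” paper examines.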
- 📓 Notebook: Back Prop
A manual implementation of backpropagation through a simple MLP, useful for understanding gradient flow and low-level training mechanics.
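A small example of the same idea on a toy layer (shapes are arbitrary): backpropagate through y = tanh(xW + b) by hand using the chain rule, then check the result against PyTorch autograd.

```python
import torch

# Forward pass: y = tanh(x @ W + b), scalar loss = y.sum()
x = torch.randn(4, 3)
W = torch.randn(3, 5, requires_grad=True)
b = torch.randn(5, requires_grad=True)

h = x @ W + b
y = torch.tanh(h)
loss = y.sum()
loss.backward()                    # autograd reference gradients

# Manual gradients, applying the chain rule layer by layer.
dy = torch.ones_like(y)            # d(loss)/dy
dh = dy * (1 - y**2)               # tanh'(h) = 1 - tanh(h)^2
dW = x.T @ dh                      # d(loss)/dW
db = dh.sum(0)                     # broadcasted bias gradient sums over the batch

print(torch.allclose(dW, W.grad), torch.allclose(db, b.grad))   # True True
```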
- 📓 Paper referenced: WaveNet (DeepMind, 2016)
  Built upon the previous MLP by deepening it into a tree-like structure inspired by the WaveNet architecture. While the original WaveNet achieves hierarchical feature extraction efficiently through causal dilated convolutions, those convolutions are not implemented in this notebook (see the grouping sketch below).
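A minimal sketch of that tree-like deepening, assuming illustrative sizes and module names (not necessarily the notebook's): consecutive character embeddings are fused in pairs at each layer, so the receptive field grows hierarchically instead of being flattened all at once.

```python
import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Fuse groups of n consecutive time steps into one feature vector."""
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):                       # x: (B, T, C)
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1) if x.shape[1] == 1 else x

# Assumed sizes: 8-character context, 27-char vocab, 10-dim embeddings, 68 hidden units.
vocab, block, emb, hid = 27, 8, 10, 68
model = nn.Sequential(
    nn.Embedding(vocab, emb),
    FlattenConsecutive(2), nn.Linear(emb * 2, hid), nn.Tanh(),   # fuse chars (1,2), (3,4), ...
    FlattenConsecutive(2), nn.Linear(hid * 2, hid), nn.Tanh(),   # fuse pairs of pairs
    FlattenConsecutive(2), nn.Linear(hid * 2, hid), nn.Tanh(),   # full context reached
    nn.Linear(hid, vocab),
)

X = torch.randint(0, vocab, (32, block))
print(model(X).shape)   # (32, 27) next-character logits
```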
- Repository: GPT
  A GPT-style decoder-only Transformer.
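A minimal sketch of the causal self-attention at the heart of a decoder-only Transformer (single head, illustrative dimensions; not the repository's actual configuration):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: each position attends only to itself and the past."""
    def __init__(self, n_embd, block_size):
        super().__init__()
        self.key   = nn.Linear(n_embd, n_embd, bias=False)
        self.query = nn.Linear(n_embd, n_embd, bias=False)
        self.value = nn.Linear(n_embd, n_embd, bias=False)
        # Lower-triangular mask forbids attention to future tokens.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                                   # x: (B, T, C)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)      # (B, T, T) attention scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                      # weighted sum of value vectors

# Illustrative sizes: 64-dim embeddings, context of 16 tokens.
attn = CausalSelfAttention(n_embd=64, block_size=16)
print(attn(torch.randn(2, 16, 64)).shape)                   # (2, 16, 64)
```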
A custom Byte Pair Encoding (BPE) tokenizer implemented from scratch (see the merge-loop sketch after this list), including:
- UTF-8 byte-level processing
- GPT-style regex-based token splitting
- Dynamic vocabulary building via merge rules
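A minimal sketch of the byte-level merge loop such a tokenizer is built around (helper names are illustrative, not the actual BPE.py interface; regex-based pre-splitting is omitted):

```python
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent token pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "low lower lowest"
ids = list(text.encode("utf-8"))        # start from raw UTF-8 bytes (vocab ids 0..255)
merges = {}                             # learned merge rules: pair -> new token id
num_merges = 5
for step in range(num_merges):
    pair = get_pair_counts(ids).most_common(1)[0][0]   # most frequent adjacent pair
    new_id = 256 + step                                # grow the vocabulary dynamically
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(ids, merges)
```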
📂 Files:
- 🧠 Tokenizer class: BPE.py
- 🧪 Debug notebook: tokenizer.ipynb