Building a Transformer from scratch in Python, following the paper "Attention Is All You Need"
In progress:
- Input Embeddings
- Positional Encodings
- Layer Normalization
- Feed Forward
- Multi-Head Attention
- Residual Connection
- Encoder
- Decoder
- Linear Layer
- Transformer
- Task overview
- Tokenizer
- Dataset
- Training loop
- Validation loop
- Attention visualization
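
Of the components listed above, the positional encodings have a closed form given directly in the paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy sketch (the repo's own implementation may use a deep-learning framework instead; the function name here is illustrative):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings from "Attention Is All You Need".

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]      # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]   # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe
```

These encodings are added to the input embeddings so the model can use token order, since attention itself is permutation-invariant.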
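
The core of the multi-head attention block is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. A self-contained NumPy sketch of that inner operation, under the assumption that masking sets disallowed positions to a large negative value before the softmax (the repo may implement this differently):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V: arrays of shape (..., seq_len, d_k); mask, if given, is a
    boolean array broadcastable to the score shape (True = attend).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

Multi-head attention runs this in parallel over several learned projections of Q, K, and V, then concatenates the heads and applies a final linear projection.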
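
Layer normalization, applied around each sub-layer in the encoder and decoder, normalizes each token's features to zero mean and unit variance, then rescales with learned parameters. A NumPy sketch, assuming per-feature scale `gamma` and shift `beta` (names chosen for illustration):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (feature) axis, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

Unlike batch normalization, the statistics are computed per example, so behavior is identical at training and inference time.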