A small repository for Transformer pre-training experiments. Inspired by nanoGPT, but with extensions such as alternative positional encodings, optimizer sharding, gradient checkpointing, and experimental infrastructure.
It also includes code to work with the commaVQ dataset for video modelling.
Works with PyTorch 2.1.
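
As a minimal illustrative sketch (not this repo's actual model code), gradient checkpointing in plain PyTorch trades compute for activation memory by recomputing a block's activations during the backward pass instead of storing them:

```python
# Sketch only: wraps a generic Transformer block with PyTorch's
# gradient checkpointing; the real blocks in this repo may differ.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_head: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_head, batch_first=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations inside self.block are recomputed on backward.
        return checkpoint(self.block, x, use_reentrant=False)


model = CheckpointedBlock()
x = torch.randn(2, 16, 256, requires_grad=True)
model(x).sum().backward()
```

For optimizer sharding, upstream PyTorch provides `torch.distributed.optim.ZeroRedundancyOptimizer`, which partitions optimizer state across data-parallel ranks; the implementation in this repository may take a different approach.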