NanoGPT-speedrunning for the poor T4 enjoyers
Inspired by Modded NanoGPT and my goat Jonas Geiping (Cramming), I trained the custom GPT I've been working on over at Dagonet and got to the 3.28 val loss on a single T4.
Important! Note/Future-bugfix
As @main_horse pointed out, I wrote a method that had the DSMoE class send the current token to all experts and then apply the router weights, which removed the router's hard selection and turned it into a soft weighting instead. Hard routing gets the loss ~0.1 lower (about 10 steps faster), but wallclock time per step is 2x longer and init was 8x longer; working on GEMMs.
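For anyone unfamiliar with the distinction, here is a minimal sketch of soft weighting vs. hard top-k routing in a toy MoE layer; the module and names (`ToyMoE`, `top_k`, etc.) are illustrative assumptions, not the actual Dagonet `DSMoE` code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Toy MoE layer to show soft weighting vs. hard top-k routing (not the Dagonet DSMoE)."""
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward_soft(self, x):
        # The buggy behavior described above: every token runs through every expert,
        # then the router weights just blend the outputs -- no sparsity at all.
        w = F.softmax(self.router(x), dim=-1)                      # (B, T, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, D, E)
        return (outs * w.unsqueeze(-2)).sum(dim=-1)

    def forward_hard(self, x):
        # Intended behavior: each token is dispatched only to its top-k experts.
        w, idx = self.router(x).topk(self.top_k, dim=-1)           # hard selection
        w = F.softmax(w, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += w[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

A masked per-expert loop like `forward_hard` is the naive way to do the hard dispatch, which is consistent with the extra wallclock per step until the dispatch is batched into proper GEMMs.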
caveats:
- Fewer parameters than the ~120M from the main speedrun, for stability
- Data was just a ~1B-token subset of finewebedu10b, not filtered or anything; I had only processed that much at the time and will probably fix this later (rough prep sketch below)
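For reference, a rough sketch of how a ~1B-token slice could be pulled from the FineWeb-Edu 10BT sample with the GPT-2 tokenizer; the dataset/config names and output file here are assumptions about the pipeline, not the actual Dagonet preprocessing script:

```python
# Rough sketch (assumed pipeline, not the Dagonet script): stream the FineWeb-Edu
# 10BT sample and stop after ~1B GPT-2 tokens, writing raw uint16 ids to disk.
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
target_tokens = 1_000_000_000

ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

total = 0
with open("finewebedu_1B.bin", "wb") as f:  # filename is illustrative
    for doc in ds:
        toks = enc.encode_ordinary(doc["text"]) + [enc.eot_token]  # EOT between docs
        np.asarray(toks, dtype=np.uint16).tofile(f)                # GPT-2 ids fit in uint16
        total += len(toks)
        if total >= target_tokens:
            break
```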
Ranking | Time - Date | Data | Person | Description | Log
---|---|---|---|---|---
1 | 7.09m - 4/5/25 | ~3.27M tok (1024 * 8 * 4 * 100) | Vatsa | Now on the GPT-2 tokenizer with a shrunk vocab_size; also shrunk head_lm and n_experts for stability; fewer params, now ~73M | log |
2 | 11.69m - 4/4/25 | ~3.93M tok (1024 * 8 * 4 * 120) | Vatsa | lr tuning (5e-4) | log |
3 | 14.86m - 4/2/25 | ~5.21M tok (1024 * 8 * 4 * 160) | Vatsa | 3x lr, removed ckpt saves every step, less printing | log |
4 | 15.04m - 4/1/25 | ~3.89M tok (1024 * 5 * 4 * 190) | Vatsa | Used Muon instead | log |
5 | 37.17m - 4/1/25 | ~6.14M tok (1024 * 5 * 4 * 300) | Vatsa | Added PSGD | log |
6 | 70.61m - 3/31/25 | ~14M tok (1024 * 6 * 4 * 570) | Vatsa | First run; has DS-MoE, MLA+NSA hybrid, RoPE, etc. | log |
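The Data column is just the token budget; reading the four factors as seq_len * micro_batch * grad_accum * steps is my assumption from how the products line up, e.g. for the current top run:

```python
# Assumed meaning of the Data column factors: seq_len * micro_batch * grad_accum * steps
seq_len, micro_batch, grad_accum, steps = 1024, 8, 4, 100  # the 7.09m top run
tokens = seq_len * micro_batch * grad_accum * steps
print(f"{tokens:,} tokens")  # 3,276,800 -> the ~3.27M tok in the table
```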
A100 comparison run:

Ranking | Time - Date | Data | Person | Description | Log
---|---|---|---|---|---
1 | 7.63m - 4/1/25 | ~6.96M tok (1024 * 10 * 4 * 170) | Vatsa | Re-ran the (15.04m - 4/1/25) setup on an A100 to see how I look on a real GPU | log |