Help us build one of the world's top 10 LLMs while keeping it fully open source, which will accelerate everyone and everything that uses LLMs (science, technology, medicine, startups, businesses, etc.).
Check out our contributors leaderboard!
Our goals:
- GPT-1 Level by Dec 20 2025 ✓ Watch
- GPT-2 Level by Jan 20 2026
- GPT-3 Level by Apr 20 2026
- Top 150 in LMArena (GPT-4o-mini level) by June 2026
- Top 50 by Apr 2027
- Top 10 by Dec 2027
- We could aim for Top 1 by 2028, TBD
Can you make our LLM train faster and better?
👉 Full Setup Guide | Leaderboard | Multimodal Guide
We have successfully implemented "Hard Mode" multimodal image generation: a mini-version of Google Parti or DeepSeek Janus built from scratch, with no pre-trained weights.
- Visual Tokenizer: A custom VQ-VAE compresses 128x128 images into a 32x32 grid of discrete "visual words".
- Multimodal Transformer: A 40M parameter Llama-style transformer trained to predict both text and visual tokens in a single unified stream.
- Unified Vocabulary: Text (49k) + Image (1k) tokens interleaved:
  [BOS] {text} <seg_start> {visual_tokens} <seg_end> [EOS]
- Optimized Training: Powered by the Muon optimizer and Mixed Precision (Bfloat16), allowing for high-quality image synthesis on a single GPU.
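The unified stream in the list above can be illustrated with a small, dependency-free sketch. It is not the project's actual code: the exact vocabulary sizes (`TEXT_VOCAB`, `IMAGE_VOCAB`), the placement of the special token ids, and the helper names `build_sequence` and `split_sequence` are all assumptions made for illustration. The only figures taken from the text are the rough 49k/1k split, the 32x32 token grid, and the sequence layout.

```python
# Illustrative sketch only; sizes and id placement are assumptions,
# not values taken from the repository.
TEXT_VOCAB = 49_000   # assumed stand-in for the "49k" text vocabulary
IMAGE_VOCAB = 1_024   # assumed stand-in for the "1k" VQ-VAE codebook
GRID = 32             # 32x32 visual tokens per image (128 / 32 = 4 px per side per token)
TOKENS_PER_IMAGE = GRID * GRID

# Hypothetical special-token ids, placed after both vocabularies.
BOS = TEXT_VOCAB + IMAGE_VOCAB
EOS = BOS + 1
SEG_START = BOS + 2
SEG_END = BOS + 3
TOTAL_VOCAB = TEXT_VOCAB + IMAGE_VOCAB + 4


def build_sequence(text_ids, image_codes):
    """Lay out [BOS] {text} <seg_start> {visual_tokens} <seg_end> [EOS]."""
    if len(image_codes) != TOKENS_PER_IMAGE:
        raise ValueError(f"expected {TOKENS_PER_IMAGE} image codes, got {len(image_codes)}")
    if any(not 0 <= t < TEXT_VOCAB for t in text_ids):
        raise ValueError("text id out of range")
    if any(not 0 <= c < IMAGE_VOCAB for c in image_codes):
        raise ValueError("image code out of range")
    # Shift codebook indices above the text ids so both share one vocabulary.
    visual_tokens = [TEXT_VOCAB + c for c in image_codes]
    return [BOS, *text_ids, SEG_START, *visual_tokens, SEG_END, EOS]


def split_sequence(seq):
    """Recover the text ids and raw codebook indices from a unified sequence."""
    start = seq.index(SEG_START)
    end = seq.index(SEG_END)
    text_ids = seq[1:start]
    image_codes = [t - TEXT_VOCAB for t in seq[start + 1:end]]
    return text_ids, image_codes
```

Offsetting the image codes lets a single embedding table and a single softmax cover both modalities, so the transformer can predict text and visual tokens with the same next-token objective. At generation time, the codes recovered by a helper like `split_sequence` would be passed to the VQ-VAE decoder to reconstruct pixels.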
The model has been scaled to 1,000,000 training sequences on CIFAR-10, demonstrating the ability to generate class-specific images (frogs, birds, cars, etc.) from scratch in an autoregressive fashion.
- Once you measure an improvement over the baseline according to the Setup Guide, submit your code in a GitHub pull request.
- The LLM must train faster or achieve lower loss on any of the benchmarks (8M, 20M, 100M, 1B tokens).
- Lower loss takes priority over training speed because pretraining data is limited: if your submission trains slower but achieves better (lower) loss for the same number of tokens, it will probably be accepted, and vice versa.
- Add as little code as possible and keep it clean. Rewrite AI-generated pull request descriptions to improve their quality.
- Submissions are judged case by case; tradeoffs such as speed versus loss will be taken into account.
If you want to write a research paper improving this project, or if you or someone you know has extensive research experience and wants to contribute to this open-source initiative, contact me.
We will partner with compute providers while keeping all research/engineering/code fully open source.
Potential partners include: Hugging Face, NVIDIA, Microsoft, Google, Amazon, Meta, IBM, Oracle, Alibaba, Tencent, Huawei, Baidu, CoreWeave, Lambda Labs, Hyperbolic, Stability AI, OpenAI, Anthropic, xAI, Cohere, Mistral AI, Graphcore, Tenstorrent, Intel, AMD, Dell Technologies, ai2, a16z, Sequoia Capital, and more.