# Open Superintelligence Lab

Open research for everyone. We publish all of our research for the sake of accelerating science. Learn real AI research from a real research lab.
```bash
pip install -r requirements.txt
python train_moe.py
```

The purpose of this repository is to research better, faster, smarter LLMs.
This repository contains cutting-edge language model experiments and architectures. We believe scientists do their best work when given freedom to explore, so this is a space for your independent research and discovery.
Fork this repository, create a new experiment in the `experiments/` folder, then open a pull request to merge it back.
Each experiment below is validated on a specific git tag. Later commits may introduce breaking changes, so to run an experiment with the correct version of the repo, check out its validated tag:

```bash
git checkout <tag-name>
```

For example, both experiments below are validated on the `experiments-v1.0` tag.
| Experiment | Researcher | Validated Tag | Research Questions | Key Findings |
|---|---|---|---|---|
| *Your experiments will be added here* | | | | |
| Exp3: PLASA + GDN Hybrid | Overtimepog (GitHub) | `experiments-v1.0` | 1. Can per-layer adaptive sparse attention (PLASA) with progressive sparsity scheduling improve upon the uniform sparse attention tested in Exp1? <br> 2. Does the PROGRESSIVE_SPARSE schedule align with the transformer layer hierarchy (dense early layers, aggressively sparse middle layers, moderately sparse late layers)? <br> 3. Which combination produces the best efficiency-performance tradeoff across 8 patterns (4 original Full+Linear vs. 4 PLASA+Linear)? | 1. PLASA significantly outperforms uniform DSA: 26.1% lower validation loss and 122.7% higher accuracy vs. Exp1. All PLASA patterns beat all original full-attention patterns. <br> 2. Progressive sparsity validated: the Dense→Aggressive→Moderate schedule (k=L, k=L/4, k=L/2) confirms the middle-layer redundancy hypothesis and achieves 13.8% better performance than the full-attention baseline. <br> 3. Sandwich pattern optimal: L→P→P→L achieves the best results (val loss: 4.40, accuracy: 50.09%, perplexity: 81.56), with 18.5% lower loss and 38.1% higher accuracy than the best original pattern, and no training-time penalty. |
| Exp1: DSA + GDN Hybrid | Vuk Rosić (YouTube, GitHub) | `experiments-v1.0` | 1. Can replacing full attention with DeepSeek Sparse Attention (DSA) improve the efficiency and performance of a hybrid attention architecture that combines full attention and Gated DeltaNet (GDN)? <br> 2. Which combination of attention mechanisms across layers produces the best efficiency-performance tradeoff: (1) Full Attention + GDN, (2) DSA + GDN, (3) DSA only, or (4) Full Attention only? | 1. DSA trains faster in the beginning, but full attention seems to surpass it with more training; future work is to investigate this further. <br> 2. The best pattern so far is L → F → F → L (Gated DeltaNet → Full Attention → Full Attention → Gated DeltaNet); future work is to investigate this further. |
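To make the pattern notation and the progressive sparsity schedule above concrete, here is a minimal sketch. Everything in it is illustrative: the function, the split of layers into thirds, and the reading of k=L as "attend to all seq_len tokens" are assumptions for exposition, not the repository's actual code.

```python
# Illustrative sketch only -- names and structure are hypothetical,
# not the repository's actual implementation.

# A 4-layer hybrid pattern. In the table's notation, "L" = Gated DeltaNet
# (linear), "F" = full attention, "P" = PLASA. Exp3's best "sandwich"
# pattern is L -> P -> P -> L:
SANDWICH_PATTERN = ["gdn", "plasa", "plasa", "gdn"]

def progressive_top_k(layer_idx: int, num_layers: int, seq_len: int) -> int:
    """Progressive sparsity schedule in the spirit of Exp3, reading k = L
    as "attend to all seq_len tokens": dense early layers (k = seq_len),
    aggressively sparse middle layers (k = seq_len // 4), and moderately
    sparse late layers (k = seq_len // 2)."""
    third = num_layers / 3
    if layer_idx < third:            # early layers: dense
        return seq_len
    elif layer_idx < 2 * third:      # middle layers: aggressively sparse
        return max(1, seq_len // 4)
    else:                            # late layers: moderately sparse
        return max(1, seq_len // 2)

# Example: per-layer k for a 12-layer model at sequence length 2048
print([progressive_top_k(i, 12, 2048) for i in range(12)])
# [2048, 2048, 2048, 2048, 512, 512, 512, 512, 1024, 1024, 1024, 1024]
```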
- Fork this repository - click the "Fork" button at the top right of this page to create your own copy
- Clone your fork:
  ```bash
  git clone https://github.com/YOUR-USERNAME/blueberry-llm.git
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Read `CONTRIBUTING.md` for contribution guidelines
- Create your own experiment (a minimal skeleton is sketched after this list)
- Explore the `experiments/` folder for ongoing research and inspiration
- Once you finish your research, create a pull request to merge it back into this repo
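If you want a concrete starting point, a new experiment might begin from a skeleton like the one below. This is a hypothetical sketch, not a required template: the file path and names are assumptions, so check the existing experiments in `experiments/` for the real conventions.

```python
# experiments/my_experiment/run.py -- hypothetical skeleton, not a template
# mandated by this repo. Replace the stubs with your actual experiment.
"""ExpN: one-sentence statement of the research question."""


def main() -> None:
    # 1. Build the model variant that tests your hypothesis
    #    (e.g., a different per-layer attention pattern).
    # 2. Train it, ideally reusing the shared training utilities.
    # 3. Log validation loss, accuracy, and perplexity so your results
    #    are directly comparable to the findings table in this README.
    raise NotImplementedError("Replace with your experiment code")


if __name__ == "__main__":
    main()
```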
We don't prescribe what to research. Instead, we provide:
- Freedom to explore interesting ideas
- Infrastructure to test hypotheses
- A collaborative environment for learning
- `experiments/` - Research experiments with their own documentation
- `models/` - Model architectures and implementations (DeepSeek, Qwen3-Next)
- `training/` - Training scripts and utilities
- `configs/` - Configuration files
See CONTRIBUTING.md for guidelines on how to contribute to this project.