Record: MuonEq-R + Context-Only SLOT + XSA-all + QK-Gain 5.0#1276

Open
BiggerDABOSS wants to merge 1 commit into openai:main from BiggerDABOSS:submission/muoneqr-slot-xsa11-qkgain5

Record Submission: MuonEq-R + Context-Only SLOT + XSA-all + QK-Gain 5.0

Target: ~1.110 val_bpb | 8xH100 SXM | <16 MB artifact

Summary

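Four orthogonal improvements stacked on PR #549 (1.1194 BPB):

  • MuonEq-R optimizer: row-normalize before Newton-Schulz
  • Context-Only SLOT: causal per-window delta optimization on past tokens
  • XSA on all 11 layers (was 4)
  • QK_GAIN_INIT=5.0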

Architecture (PR #549 stack)

| Component | Setting |
| --- | --- |
| Layers | 11 (512d, 8H, 4KV) |
| MLP | 3x with LeakyReLU(0.5)^2 |
| BigramHash | 1536 |
| XSA | All 11 layers |
| RoPE | Partial (16/64 dims) |
| LN Scale | 1/sqrt(layer+1) |
| VE128 | Layers 9-10 |
| QK Gain | 5.0 |
| Weight avg | EMA(0.997) + Tight SWA (every 50) |
| Quantization | GPTQ-lite int6 + lzma |
| Optimizer | MuonEq-R + Parallel Muon |
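
Three of the settings above are simple enough to write out as scalar formulas. The sketch below is only an illustration of how those table entries could be read, not code taken from train_gpt.py. The function names are hypothetical, the layer index is assumed to be 0-based, and "LeakyReLU(0.5)^2" is assumed to mean a leaky ReLU with negative slope 0.5 followed by squaring.

```python
import math


def leaky_relu_squared(x, negative_slope=0.5):
    # Assumed reading of "LeakyReLU(0.5)^2": leaky ReLU, then square.
    y = x if x >= 0 else negative_slope * x
    return y * y


def ln_scale(layer_idx):
    # Assumed reading of "1/sqrt(layer+1)" with a 0-based layer index.
    return 1.0 / math.sqrt(layer_idx + 1)


def ema_update(ema_value, new_value, decay=0.997):
    # Standard exponential moving average step with the decay from the table.
    return decay * ema_value + (1.0 - decay) * new_value


if __name__ == "__main__":
    print([round(ln_scale(i), 4) for i in range(11)])
    print(leaky_relu_squared(-2.0), leaky_relu_squared(2.0))
```

Under these assumptions the per-layer scale shrinks from 1.0 at the first layer to 1/sqrt(11) at the last, and negative pre-activations still contribute a (damped) positive output after squaring.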

Legality

  • MuonEq-R: standard optimizer improvement
  • Context-Only SLOT: causal — delta optimized on past tokens only, new tokens excluded from loss
  • XSA-all: no new parameters, architectural choice
  • QK_GAIN=5.0: hyperparameter choice
  • Score-first TTT follows the legal protocol of PR #461 (Non-record: 11L Depth Recurrence + High-Yield Legal TTT, 1.14458 BPB)
  • No n-gram cache, no two-pass rescoring, no eval-time GPTQ
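
The causality claim for Context-Only SLOT comes down to which positions of an evaluation window may contribute to the loss used to optimize the delta. The following is a minimal sketch of that masking idea, assuming each window consists of already-scored context tokens followed by `num_new` tokens that are about to be scored. The helper names and the window layout are assumptions for illustration, not the submission's actual code.

```python
def context_only_mask(window_len, num_new):
    """Return True for positions allowed in the delta-optimization loss.

    Hypothetical layout: the last `num_new` positions are the new tokens
    to be scored, so they are excluded; everything before them is context.
    """
    if not 0 <= num_new <= window_len:
        raise ValueError("num_new must be between 0 and window_len")
    num_context = window_len - num_new
    return [i < num_context for i in range(window_len)]


def masked_mean(per_token_losses, mask):
    # Average only the losses at positions the mask keeps.
    kept = [loss for loss, keep in zip(per_token_losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    mask = context_only_mask(window_len=6, num_new=2)
    losses = [1.0, 2.0, 3.0, 4.0, 100.0, 100.0]
    print(mask)
    print(masked_mean(losses, mask))
```

With this mask, the losses on the two new tokens never reach the objective, so whatever is fitted on the window cannot depend on the tokens it will later be scored on.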

Credits

Files

  • README.md — detailed description
  • submission.json — metadata
  • train_gpt.py — full training + eval script
  • run.sh — launch script with all env vars

Built on the PR openai#549 stack (1.1194 BPB). Adds MuonEq-R optimizer (row-normalize before Newton-Schulz), Context-Only SLOT (causal per-window delta optimization on past tokens), XSA on all 11 layers (was 4), and QK_GAIN_INIT=5.0. Expected ~1.110 BPB on 8xH100 SXM.
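
As an illustration of "row-normalize before Newton-Schulz", here is a dependency-free sketch. It assumes the row normalization is a plain per-row L2 normalization of the update matrix and that the orthogonalization is the quintic Newton-Schulz iteration commonly used with Muon (coefficients 3.4445, -4.7750, 2.0315). Both choices, and every function name, are assumptions rather than details confirmed by this PR; the real script presumably operates on GPU tensors.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_normalize(g, eps=1e-7):
    # Assumed "R" step: scale every row to (approximately) unit L2 norm.
    out = []
    for row in g:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / (norm + eps) for v in row])
    return out


def newton_schulz(g, steps=5, eps=1e-7):
    # Quintic iteration X <- a*X + (b*A + c*A@A) @ X with A = X @ X^T,
    # which pushes the singular values of X toward 1 (approximately).
    a, b, c = 3.4445, -4.7750, 2.0315
    transposed = len(g) > len(g[0])
    x = transpose(g) if transposed else [row[:] for row in g]
    fro = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / (fro + eps) for v in row] for row in x]
    for _ in range(steps):
        xxt = matmul(x, transpose(x))
        xxt2 = matmul(xxt, xxt)
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(xxt, xxt2)]
        px = matmul(poly, x)
        x = [[a * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if transposed else x


def muoneq_r_direction(grad):
    # Hypothetical composition: row-normalize first, then orthogonalize.
    return newton_schulz(row_normalize(grad))


if __name__ == "__main__":
    # Rows with very different scales end up contributing equally.
    print(muoneq_r_direction([[3.0, 0.0], [0.0, 0.5]]))
```

In the toy example the two rows differ in scale by a factor of six, yet after row normalization they are treated identically, which is the equalizing effect the name suggests.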

Made-with: Cursor
BiggerDABOSS force-pushed the submission/muoneqr-slot-xsa11-qkgain5 branch from 9c4d846 to 73de0be on April 3, 2026 01:44.