
10L MLP3x + BigramHash(2048) + SWA + Stride-32: 1.1487 BPB #331

Open
Rhodrium wants to merge 1 commit into openai:main from Rhodrium:submission/10L-MLP3x-BigramHash-SWA-Stride32

@Rhodrium

Summary

  • 10-layer model with relu² MLPs (3x width, "MLP3x"), BigramHash(2048), SmearGate, OrthoInit, mixed int5/int6 quantization + zstd level 22, SWA (stochastic weight averaging), and stride-32 sliding-window evaluation
  • val_bpb: 1.1487 (mean of 3 seeds, std 0.0020)
  • 14.9 MB artifact; trained in 600 s on 8xH100 SXM
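Stride-32 sliding-window evaluation can be illustrated with the window bookkeeping alone. This is a minimal pure-Python sketch, not the code in train_gpt.py: the function name `sliding_windows` and the `seq_len` value in the example are hypothetical. It assumes each window scores only the positions the previous window did not, so every validation token is counted exactly once and, after the first window, each scored token sees at least `seq_len - stride` tokens of preceding context. Positions here index the predicted tokens; the one-token input/target shift is omitted for brevity.

```python
from typing import List, Tuple


def sliding_windows(num_tokens: int, seq_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, score_from) triples covering positions 0..num_tokens-1.

    The model would see tokens[start:end] as context; only positions in
    [score_from, end) count toward the loss, so each position is scored once.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("need 0 < stride <= seq_len")
    end = min(seq_len, num_tokens)
    windows = [(0, end, 0)]  # first window: score every position it contains
    while end < num_tokens:
        new_end = min(end + stride, num_tokens)
        start = max(0, new_end - seq_len)  # keep the window at most seq_len long
        windows.append((start, new_end, end))  # score only the newly added positions
        end = new_end
    return windows


if __name__ == "__main__":
    # Prints (0, 64, 0), (32, 96, 64), (36, 100, 96) on separate lines.
    for window in sliding_windows(num_tokens=100, seq_len=64, stride=32):
        print(window)
```

With stride 32 the model runs roughly `num_tokens / 32` forward passes, so evaluation is more expensive than scoring non-overlapping chunks, in exchange for much longer average context per scored token.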

3-Seed Results

| Seed | Steps | Step time | Sliding-window BPB (stride 32) | Artifact size |
|---|---|---|---|---|
| 1337 | 6,374 | 94 ms | 1.1503 | 14.90 MB |
| 42 | 6,626 | 91 ms | 1.1493 | 14.78 MB |
| 2025 | 6,622 | 91 ms | 1.1464 | 14.99 MB |
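The headline mean and std can be reproduced from the per-seed BPB values with Python's standard `statistics` module. The reported 0.0020 matches the sample (n-1) standard deviation; the population standard deviation of the same three values is about 0.0017.

```python
import statistics

# Per-seed sliding-window (stride 32) val BPB from the 3-seed results table.
seed_bpb = {1337: 1.1503, 42: 1.1493, 2025: 1.1464}
values = list(seed_bpb.values())

mean_bpb = statistics.mean(values)
sample_std = statistics.stdev(values)        # n - 1 denominator
population_std = statistics.pstdev(values)   # n denominator

# Prints: mean=1.1487 std=0.0020 pstd=0.0017
print(f"mean={mean_bpb:.4f} std={sample_std:.4f} pstd={population_std:.4f}")
```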

Test plan

  • Verify train_gpt.py runs successfully from the records folder on 8xH100
  • Confirm val_bpb reproduces within reported std
  • Verify artifact size < 16,000,000 bytes
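The last test-plan item can be checked with a small script. This is a minimal sketch assuming the counted artifact is the combined size of one or more files on disk (for example the training script plus the compressed weights); the function names and file names are hypothetical. The limit is 16,000,000 decimal bytes, which is smaller than 16 MiB (16,777,216 bytes).

```python
import os
import sys

# Decimal limit from the test plan; this is less than 16 MiB (16,777,216 bytes).
ARTIFACT_LIMIT_BYTES = 16_000_000


def artifact_bytes(paths):
    """Total on-disk size of all files assumed to make up the artifact."""
    return sum(os.path.getsize(p) for p in paths)


def fits_budget(paths, limit=ARTIFACT_LIMIT_BYTES):
    """True if the combined size is strictly below the limit."""
    return artifact_bytes(paths) < limit


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_size.py train_gpt.py model.bin.zst
    paths = sys.argv[1:]
    total = artifact_bytes(paths)
    status = "OK" if fits_budget(paths) else "TOO LARGE"
    print(f"{total:,} bytes ({status}, limit < {ARTIFACT_LIMIT_BYTES:,})")
```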

🤖 Generated with Claude Code

val_bpb: 1.1487 (mean of 3 seeds), 14.9MB artifact, 8xH100 SXM 600s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>