
Submission TrigramHash + PartialRoPE + HeadTemp + stride32 (val_bpb: 1.1450) #327

Open

Ananddna wants to merge 6 commits into openai:main from Ananddna:submission

Conversation

@Ananddna

Summary

val_bpb: 1.1450 (mean of 2 seeds)

Built on the 10L Int5-MLP + BigramHash + SWA foundation, adding 5 novel techniques:

Our Unique Contributions

  1. TrigramHashEmbedding — Hash consecutive token triplets (not just pairs) into 8192-bucket learned embeddings (dim=64). Captures 3-word patterns like "in the morning" as atomic units. Complementary to BigramHash. (A sketch follows this list.)

  2. Partial RoPE (50%) — Apply rotary position embeddings to only 50% of head dimensions. The remaining dims are position-free, enabling similarity matching regardless of position. Improves length generalization. (Sketch below.)

  3. Per-Head Temperature Scaling — Each attention head learns its own temperature parameter, allowing some heads to be sharp/focused and others broad/contextual. (Sketch below.)

  4. Eval Stride 32 — Reduced the sliding-window evaluation stride from 64 to 32 for finer-grained evaluation context. (Sketch below.)

  5. LoRA TTT Infrastructure — Added a LoRA-based test-time training framework (eval_val_with_ttt). The infrastructure is in place for future runs.
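
A minimal sketch of the TrigramHashEmbedding idea. The class name comes from the list above; the hash constants, the padding of the first two positions, and how the output joins the token-embedding stream are illustrative assumptions, not the submission's exact code:

```python
import torch
import torch.nn as nn

class TrigramHashEmbedding(nn.Module):
    """Hash each consecutive token triplet into one of `num_buckets`
    learned embedding vectors (sketch; constants are illustrative)."""

    def __init__(self, num_buckets: int = 8192, dim: int = 64):
        super().__init__()
        self.num_buckets = num_buckets
        self.emb = nn.Embedding(num_buckets, dim)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # idx: (B, T) token ids. Build (t-2, t-1, t) triplets by shifting;
        # the first two positions lack a full trigram, so pad with zeros.
        t2 = torch.roll(idx, shifts=2, dims=1)
        t1 = torch.roll(idx, shifts=1, dims=1)
        t2[:, :2] = 0
        t1[:, :1] = 0
        # Cheap multiplicative hash of the triplet into a bucket id
        # (the mixing constants here are assumptions).
        h = (t2 * 1_000_003 + t1 * 8191 + idx) % self.num_buckets
        # (B, T, dim); typically projected or concatenated into the
        # model's token-embedding stream.
        return self.emb(h)
```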
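Partial RoPE can be sketched as follows, assuming a GPT-NeoX-style rotation applied to the first half of the head dims; the function signature and the rotary/pass-through split are assumptions consistent with standard RoPE implementations:

```python
import torch

def partial_rope(q: torch.Tensor, k: torch.Tensor,
                 cos: torch.Tensor, sin: torch.Tensor,
                 rotary_frac: float = 0.5):
    """Apply RoPE to the first `rotary_frac` of head dims only.
    q, k: (B, H, T, D); cos, sin: (T, rotary_dims // 2)."""
    d_rot = int(q.shape[-1] * rotary_frac)

    def rotate(x: torch.Tensor) -> torch.Tensor:
        x_rot, x_pass = x[..., :d_rot], x[..., d_rot:]
        x1, x2 = x_rot.chunk(2, dim=-1)
        # Standard rotary rotation on the rotary slice...
        rotated = torch.cat((x1 * cos - x2 * sin,
                             x1 * sin + x2 * cos), dim=-1)
        # ...while the remaining dims pass through position-free.
        return torch.cat((rotated, x_pass), dim=-1)

    return rotate(q), rotate(k)
```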
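Per-head temperature is a one-parameter-per-head change to the attention softmax. A sketch, assuming a log-space parameterization so the learned temperatures stay positive:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerHeadTemperature(nn.Module):
    """Learned softmax temperature per attention head: low temperature
    -> sharp/focused head, high temperature -> broad/contextual head."""

    def __init__(self, n_heads: int):
        super().__init__()
        # log-temperature initialized to 0, i.e. temperature 1
        # (standard softmax) at the start of training.
        self.log_temp = nn.Parameter(torch.zeros(n_heads))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (B, H, T, T) pre-softmax attention logits
        # (already causally masked).
        temp = self.log_temp.exp().view(1, -1, 1, 1)
        return F.softmax(scores / temp, dim=-1)
```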
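Finally, a simplified sliding-window evaluation loop illustrating the stride change. The model API, the handling of the first window, and the bits normalization are simplified assumptions; halving the stride from 64 to 32 roughly doubles eval compute in exchange for fresher context per scored token:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_eval(model, tokens, ctx_len: int = 1024, stride: int = 32):
    """Score tokens with a sliding context window, counting only the
    last `stride` positions of each window (they see the most context).
    Simplified: the leading tokens of the first window are skipped,
    and the nats->bits value is per token, not per byte, unless the
    tokenizer is byte-level."""
    nll, counted = 0.0, 0
    for start in range(0, len(tokens) - 1, stride):
        window = tokens[start : start + ctx_len + 1]
        x = torch.tensor(window[:-1]).unsqueeze(0)
        y = torch.tensor(window[1:]).unsqueeze(0)
        logits = model(x)  # assumed to return (1, T, vocab) logits
        losses = F.cross_entropy(logits.squeeze(0), y.squeeze(0),
                                 reduction="none")
        nll += losses[-stride:].sum().item()
        counted += min(stride, losses.shape[0])
    return nll / counted / math.log(2)  # bits per token
```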

Results

Seed    val_bpb
42      1.1449
1337    1.1451
Mean    1.1450
Std     0.0001

Architecture

  • 10 layers, 512 dim, 8 heads, 4 KV heads
  • MLP 3x expansion, relu^2
  • SmearGate + BigramHash(10240) + TrigramHash(8192)
  • Partial RoPE (50% dims) + Per-head temperature
  • Int5 MLP / Int6 attention quantization
  • SWA (frac=0.4, every=50), Muon WD=0.04
  • Sliding eval stride=32, zstd-22 compression

@Ananddna
Author

Update: Trimmed unused LoRA TTT code from the submission to bring artifact size under the 16MB cap.

  • Model (int5/int6 + zstd-22): 15,940,693 bytes
  • Code: ~57,000 bytes (reduced from 61,164 by removing disabled TTT infrastructure)
  • Estimated total: ~15,998,000 bytes (under 16,000,000 limit)
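
For reference, this kind of size budget can be sanity-checked with the python-zstandard package; the checkpoint filename below is hypothetical:

```python
import zstandard as zstd

CAP = 16_000_000  # artifact cap in bytes

def compressed_size(path: str, level: int = 22) -> int:
    """Size of a file after zstd compression at the given level."""
    with open(path, "rb") as f:
        return len(zstd.ZstdCompressor(level=level).compress(f.read()))

model_bytes = compressed_size("model_quantized.bin")  # hypothetical path
code_bytes = 57_000  # approximate, from the numbers above
assert model_bytes + code_bytes < CAP, "over the 16,000,000-byte cap"
```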

Note: The training logs (final_s1.txt, final_s2.txt) were generated with the pre-trim version, which still included the TTT code. TTT was disabled (TTT_ENABLED=0) during those runs, so the scores are unaffected. The only change is the removal of dead code to fit under the artifact cap.

Ready for review.
