Record: 10L + Batched LoRA TTT (mean val_bpb=1.1180, 3 seeds) #713
hypery11 wants to merge 1 commit into openai:main
Conversation
3-seed validation: 1.1160 / 1.1210 / 1.1170 (std 0.0026). Per-document rank-8 LoRA on Q/V/LM-head, batch 64, 3 epochs. 15.75 MB artifact. Train 600 s, eval 496 s.
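The reported mean and standard deviation can be checked directly from the three seed values (a quick sanity check, not the repo's evaluation code):

```python
import statistics

seed_bpb = [1.1160, 1.1210, 1.1170]
mean = statistics.mean(seed_bpb)   # 1.1180, matching the title
std = statistics.stdev(seed_bpb)   # sample std ~= 0.0026, matching the summary
```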
Hi @hypery11 — interesting LoRA TTT approach with per-document batching. I wanted to flag a potential score-first compliance concern. Looking at:

```python
for epoch in range(ttt_epochs):  # 3 epochs
    for ci in range(max_chunks):
        ...
        if epoch == ttt_epochs - 1:  # score only on epoch 3
            # accumulate loss_sum
        if needs_train:  # train on non-last chunks
            loss.backward()
            cur_opt.step()
```

This means that when scoring on epoch 3, the LoRA weights have already been trained on the full document for 2 complete epochs. A token at position t in the document is scored using LoRA weights that were adapted on tokens including t itself (from epochs 1 and 2).

The README rule is: "you are only allowed to test-time train on validation set tokens you've already evaluated your model on." In the standard score-first TTT pattern (PR #461/#549/#726), each chunk is scored BEFORE the model trains on it, and that score is final — no re-scoring after training. Here, scoring happens after training, which appears to be the adapt-then-score pattern that PR #518 was closed for: @valerio-oai closed it because it "trains on the validation set by reporting the score on a doc after its weights have adapted to it."

Would you be able to clarify how this differs from the adapt-then-score pattern?
Results
Method
10-layer transformer (512d, 8/4 GQA, 3x MLP LeakyReLU(0.5)^2) with per-document batched LoRA test-time training.
LoRA rank-8 on Q/V projections + LM head. 64 documents batched in parallel. Per-doc reset, Adam lr=0.01, 256-token chunks, 3 epochs, score on final epoch. Mixed int5/int6 quantization + zstd-22.
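The per-document batched LoRA setup can be sketched as follows. This is an illustration with NumPy, not the PR's implementation: a frozen 512-dim linear layer plus a separate rank-8 A/B factor pair for each of the 64 batched documents, with B initialized to zero so the adapter is a no-op at per-doc reset.

```python
import numpy as np

rank, d_model, n_docs = 8, 512, 64
rng = np.random.default_rng(0)

W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)  # frozen base weight
A = rng.standard_normal((n_docs, rank, d_model)) * 0.01          # per-document LoRA A
B = np.zeros((n_docs, d_model, rank))                            # per-document LoRA B, zero at reset


def lora_forward(x):
    """x: (n_docs, seq, d_model) -> frozen base output + per-doc low-rank update."""
    base = x @ W.T
    delta = np.einsum('dsk,drk->dsr', x, A)      # project each doc's activations to rank r
    delta = np.einsum('dsr,dmr->dsm', delta, B)  # map back to d_model with that doc's B
    return base + delta


x = rng.standard_normal((n_docs, 16, d_model))
y = lora_forward(x)
# With B zero-initialized, y equals the frozen base output exactly,
# so resetting A/B per document restores the unadapted model.
```

Only A and B (64 × rank-8 factors) would be trained at test time; the base W stays frozen, which is what keeps the shipped artifact small.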
See README.md for full details.