Commit 4fb6969
Replace LoRA TTT with 30ep cosine full-model TTT in 16L XSA-all submission
Swap score-first LoRA TTT for the simpler and more effective cosine TTT
approach from PR openai#672 (1.0781 BPB): fine-tune all model weights on val
data for 30 epochs with cosine LR decay and per-layer LR groups (3x
MLP-out, 0.5x MLP-in), followed by sliding-window stride=64 eval.

1 parent 5825338 commit 4fb6969
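The recipe in the commit message (cosine LR decay, per-layer LR multipliers, then a stride-64 sliding-window eval) can be sketched in plain Python. This is an illustrative sketch, not the code from this commit: the function names, the group labels in `LR_MULT`, the per-step (rather than per-epoch) decay, and the zero floor on the learning rate are all assumptions; only the 3x / 0.5x multipliers and the stride of 64 come from the message above.

```python
import math

# Multipliers from the commit message: 3x for MLP output projections,
# 0.5x for MLP input projections. The group names and the 1x default
# for all other weights are assumptions made for this sketch.
LR_MULT = {"mlp_out": 3.0, "mlp_in": 0.5, "other": 1.0}


def cosine_lr(base_lr, step, total_steps, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def group_lrs(base_lr, step, total_steps):
    """Scale the shared cosine schedule by each group's multiplier."""
    lr = cosine_lr(base_lr, step, total_steps)
    return {name: lr * mult for name, mult in LR_MULT.items()}


def sliding_windows(n_tokens, seq_len, stride=64):
    """Return (start, end, score_from) triples for sliding-window eval.

    Each window feeds tokens [start, end) to the model but only the
    positions [score_from, end) are scored, so every token is scored
    exactly once while still seeing earlier context.
    """
    assert 0 < stride <= seq_len, "stride must not exceed the window length"
    windows = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows
```

In a real training loop, the values from `group_lrs` would be written into the optimizer's parameter groups before each update, and the loss from each window would be summed only over its scored positions before converting to BPB.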
2 files changed
Lines changed: 103 additions & 239 deletions
File tree
- records/track_10min_16mb/2026-03-25_16L_XSAall_GPTQ_EMA_PartialRoPE_TTT
Lines changed: 3 additions & 3 deletions
Diff content not captured. Changed lines: 5 (one line replaced) and 16-17 (two lines replaced).