
Record: Doc-Isolated TTT + Eval Optimizations #964

Open
vivekvar-dl wants to merge 1 commit into openai:main from vivekvar-dl:submission/doc-isolated-ttt


@vivekvar-dl

Summary

Built on PR #549 (LeakyReLU² + Legal TTT + Parallel Muon, 1.1194 BPB).

  • Document-Isolated TTT: Reset the TTT optimizer state at each BOS document boundary to prevent cross-document contamination. PR #461 (Non-record: 11L Depth Recurrence + High-Yield Legal TTT, 1.14458 BPB) showed that doc isolation alone improves BPB by 0.011, but it has never been applied to the frontier architecture.
  • Temperature scaling: Grid-search the softmax temperature T over 0.90-1.00 on the quantized model at eval time.
  • Base architecture unchanged: 11L/512d, LeakyReLU(0.5)², XSA4, Parallel Muon, GPTQ-lite int6+LZMA.
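The document-isolation idea can be sketched in plain Python so that it runs without a GPU. This is a minimal sketch under assumptions, not the PR #549 code. The `BOS_ID` value, the toy `MomentumSGD` optimizer over scalar parameters, the chunk length, and the helper names `split_documents` and `doc_isolated_ttt` are all hypothetical stand-ins. Following the bullet's wording, only the optimizer state is cleared at each BOS. Adapted weights still carry across documents in this sketch, because the PR does not say whether they are restored.

```python
from dataclasses import dataclass, field
from typing import Callable

BOS_ID = 1  # hypothetical BOS token id; the real tokenizer's id may differ


def split_documents(tokens: list[int], bos_id: int = BOS_ID) -> list[list[int]]:
    """Split a flat eval token stream into documents; each BOS starts a new one."""
    docs: list[list[int]] = []
    current: list[int] = []
    for tok in tokens:
        if tok == bos_id and current:
            docs.append(current)
            current = []
        current.append(tok)
    if current:
        docs.append(current)
    return docs


@dataclass
class MomentumSGD:
    """Toy SGD with momentum over scalar params (stand-in for the real TTT optimizer)."""
    lr: float = 0.1
    beta: float = 0.9
    buf: dict[str, float] = field(default_factory=dict)

    def reset(self) -> None:
        self.buf.clear()  # drop all momentum accumulated so far

    def step(self, params: dict[str, float], grads: dict[str, float]) -> None:
        for name, g in grads.items():
            v = self.beta * self.buf.get(name, 0.0) + g
            self.buf[name] = v
            params[name] -= self.lr * v


GradFn = Callable[[dict[str, float], list[int]], dict[str, float]]


def doc_isolated_ttt(tokens: list[int], params: dict[str, float], opt: MomentumSGD,
                     grad_fn: GradFn, chunk_len: int = 2,
                     bos_id: int = BOS_ID) -> list[dict[str, float]]:
    """Adapt params chunk by chunk, resetting optimizer state at every document start.

    Returns a snapshot of the momentum buffer taken at the start of each document.
    """
    momentum_at_doc_start: list[dict[str, float]] = []
    for doc in split_documents(tokens, bos_id):
        opt.reset()  # no momentum carries over from the previous document
        momentum_at_doc_start.append(dict(opt.buf))
        for i in range(0, len(doc), chunk_len):
            # In score-first ("legal") TTT, doc[i : i + chunk_len] would be
            # scored here before the model adapts on it.
            opt.step(params, grad_fn(params, doc[i : i + chunk_len]))
    return momentum_at_doc_start
```

Resetting per document means that stale momentum from one document never contributes to the updates made on the next. The trade-off is that each document's adaptation restarts from a cold optimizer.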

Status

Work in progress. Requesting compute credits for 8xH100 validation runs.

Dev validation (1xH100 NVL)

  • Base architecture reproduces correctly (1.39 BPB at 920 steps, consistent with 1xH100 scaling)
  • Tested and rejected: sp4096 vocab (at convergence, the higher per-token loss outweighs the tokens_per_byte gain), NorMuon, ProRes
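The sp4096 trade-off follows from how BPB is computed. Bits per byte equal the mean per-token cross-entropy in nats, divided by ln 2, times tokens per byte. A larger vocabulary lowers tokens per byte but raises per-token loss. It therefore helps only if the per-token loss grows by a smaller factor than the one by which tokens per byte shrinks. The sketch below computes BPB and the break-even per-token loss. The function names and example inputs are hypothetical, not measurements from this PR.

```python
import math


def bits_per_byte(loss_nats_per_token: float, tokens_per_byte: float) -> float:
    """BPB = (mean per-token cross-entropy in nats / ln 2) * tokens per byte."""
    return loss_nats_per_token / math.log(2) * tokens_per_byte


def break_even_loss(loss_old: float, tpb_old: float, tpb_new: float) -> float:
    """Per-token loss (nats) a new tokenizer must stay below to match the old BPB."""
    # Solve loss_new * tpb_new == loss_old * tpb_old for loss_new.
    return loss_old * tpb_old / tpb_new


# Arbitrary illustrative inputs: cutting tokens/byte from 0.5 to 0.4 lets
# per-token loss rise from 2.0 to 2.5 nats before BPB gets worse.
limit = break_even_loss(2.0, 0.5, 0.4)
```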

Target

1.09-1.11 BPB (pending 8xH100 validation)

Test plan

  • 3-seed validation on 8xH100 SXM
  • Statistical significance (p < 0.01 for a 0.005-nat improvement)
  • Verify artifact under 16MB
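One way to run the significance check is sketched below under assumptions. Each seed's improvement is taken as the baseline loss minus that seed's loss. A one-sided one-sample t-test then asks whether the mean improvement exceeds the 0.005-nat margin at p < 0.01. The function name, the comparison against a single fixed baseline value, and the choice of t-test are assumptions, not the challenge's official procedure. With three seeds there are two degrees of freedom, so the one-sided 1% critical value is 6.965.

```python
import math
import statistics

# One-sided Student-t critical values for alpha = 0.01, keyed by degrees of freedom.
T_CRIT_ONE_SIDED_99 = {2: 6.965, 3: 4.541, 4: 3.747}


def beats_baseline(baseline_loss: float, seed_losses: list[float],
                   margin: float = 0.005) -> tuple[bool, float]:
    """One-sided one-sample t-test of H0: mean improvement <= margin (losses in nats)."""
    improvements = [baseline_loss - loss for loss in seed_losses]
    n = len(improvements)
    df = n - 1
    if df not in T_CRIT_ONE_SIDED_99:
        raise ValueError("this sketch only tabulates 3 to 5 seeds")
    mean = statistics.mean(improvements)
    sd = statistics.stdev(improvements)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        # All seeds agree exactly: the test degenerates to a direct comparison.
        return mean > margin, math.inf if mean > margin else -math.inf
    t_stat = (mean - margin) / (sd / math.sqrt(n))
    return t_stat > T_CRIT_ONE_SIDED_99[df], t_stat
```

Because the critical value is large with only three seeds, the seed-to-seed spread has to be small relative to how far the mean improvement clears 0.005 nats.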

Built on PR openai#549 stack. Adds document-isolated TTT (reset optimizer
at BOS boundaries) and temperature scaling. Pending 8xH100 validation.
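The temperature-scaling grid search from the summary can be sketched as below. It assumes that logits are available as per-position lists and that the objective is mean cross-entropy in nats. The function names, the 11-point grid spacing, and tuning T on the same logits it is scored on are all assumptions. The PR only states the T=0.90-1.00 range and that the search runs on the quantized model at eval time.

```python
import math


def mean_nll(logits: list[list[float]], targets: list[int], temperature: float) -> float:
    """Mean cross-entropy (nats) of `targets` under softmax(logits / temperature)."""
    total = 0.0
    for row, y in zip(logits, targets):
        scaled = [z / temperature for z in row]
        m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_sum_exp - scaled[y]
    return total / len(targets)


def grid_search_temperature(logits: list[list[float]], targets: list[int],
                            lo: float = 0.90, hi: float = 1.00,
                            steps: int = 11) -> tuple[float, dict[float, float]]:
    """Evaluate each T on an evenly spaced grid and return the loss-minimizing one."""
    grid = [round(lo + (hi - lo) * i / (steps - 1), 4) for i in range(steps)]
    losses = {t: mean_nll(logits, targets, t) for t in grid}
    best_t = min(losses, key=losses.get)
    return best_t, losses
```

When every target is already the model's top logit, dividing by T < 1 only sharpens the distribution toward the correct token, so the search returns the lower edge of 0.90. On real eval data the optimum can land anywhere in the range.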