11L SmearGate + BigramHash(10240) + Causal TTT + Mixed Int5/Int6 + SWA #322

Draft
romainsantoli-web wants to merge 1 commit into openai:main from romainsantoli-web:champion-11L-smeargate-bigram-ttt

@romainsantoli-web

11L SmearGate + BigramHash(10240) + Causal TTT + Mixed Int5/Int6 + SWA

val_bpb: pending - awaiting RunPod credits for 8xH100 validation runs.
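For context, val_bpb is read here as validation bits per byte. That reading is an assumption based on the metric's name, since this PR does not define it. Under that assumption, a minimal sketch of the conversion from a summed token-level cross-entropy (in nats) looks like the following; `bits_per_byte` and its argument names are hypothetical and not taken from the submission's code.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte.

    Assumption: the metric normalizes by the number of raw bytes in the
    validation text, not by the number of tokens.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical numbers for illustration only (not results from this PR):
# 1000 tokens at a mean loss of 2.0 nats, covering 4000 bytes of text.
example = bits_per_byte(total_nll_nats=2.0 * 1000, total_bytes=4000)
print(f"{example:.4f} bits per byte")
```

Normalizing by bytes rather than tokens keeps the number comparable across submissions that use different tokenizers.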

Approach

Combines the strongest proven techniques from top submissions into a unified architecture targeting sub-1.135 val_bpb.

Status

Code complete and syntax-validated. Requesting compute credits to run 3-seed validation on 8xH100.

Expected Results

Based on ablation data from PRs #162, #180, #267, and #281, the target is ~1.133-1.137 val_bpb.

Built on PR #162 by @unnir and PR #180 by @thwu1.

…its)

Combines techniques from PR openai#162, openai#180, openai#267, openai#281:
- 11-layer GPT with U-Net skip connections, GQA
- SmearGate + BigramHash(10240)
- Mixed int5/int6 quantization + 3% magnitude pruning
- Causal TTT at eval time
- SWA(frac=0.4), WD=0.042, Z-loss
- Target: sub-1.135 val_bpb
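
To make the "Mixed int5/int6 quantization + 3% magnitude pruning" item above concrete, here is a minimal pure-Python sketch. It assumes global magnitude pruning over a flat weight list and symmetric uniform quantization with a single scale per tensor. The PR does not say which tensors get 5 bits and which get 6, nor how scales are chosen, so the function names and the `bits_by_group` mapping are hypothetical and this is not the submission's implementation.

```python
def magnitude_prune(weights, frac=0.03):
    """Zero out the `frac` smallest-magnitude entries of a flat weight list."""
    k = int(len(weights) * frac)
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_symmetric(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers, one scale."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 31 for int6
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Toy weights, for illustration only.
weights = [i / 100 for i in range(-50, 50)]
pruned = magnitude_prune(weights, frac=0.03)

# Hypothetical assignment of bit widths to parameter groups (an assumption).
bits_by_group = {"group_a": 5, "group_b": 6}
for name, bits in bits_by_group.items():
    q, scale = quantize_symmetric(pruned, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(pruned, restored))
    print(name, bits, "max abs error:", max_err)
```

In this sketch the rounding error per weight is bounded by half the scale, so the 6-bit groups reconstruct roughly twice as precisely as the 5-bit groups for the same weight range. Pruned zeros map exactly to the integer code 0.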

Awaiting RunPod 8xH100 credits for 3-seed validation.