
Record: 11L NonTTT VR+GA MixedInt5/6: val_bpb=1.1428 (3-seed, 8xH100) #516

Closed
Asukabot0 wants to merge 2 commits into openai:main from Asukabot0:submission/nonttt-vr-ga-mixed-quant-3seed


@Asukabot0

Summary

Non-TTT submission with Value Residual (ResFormer) + Gated Attention + Mixed Int5/Int6 quantization.

3-seed mean val_bpb: 1.1428 (sliding window, stride=64, int6 roundtrip)
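
For readers unfamiliar with the metric, here is a minimal sketch of how a sliding-window evaluation with stride 64 can be turned into bits per byte. The window bookkeeping is an assumption about how such an evaluation is usually arranged, not a copy of this PR's evaluation code, and the names `sliding_windows` and `bits_per_byte` are hypothetical.

```python
import math

def sliding_windows(n_tokens, context, stride):
    """Yield (start, end, score_from) spans.

    The model sees tokens[start:end]; only positions score_from..end-1
    count toward the loss, so every token is scored exactly once.
    """
    end = min(context, n_tokens)
    yield 0, end, 0  # first window: score everything it covers
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        start = max(0, new_end - context)
        yield start, new_end, end  # score only the newly covered tokens
        end = new_end

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is why the stride is reported next to the score.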

Approach

Built on the PR #315 base (EMA + Partial RoPE + LN Scale) with two architectural additions: Value Residual (ResFormer) and Gated Attention.

Key config

11L × 512 × 3xMLP, XSA4, EMA(0.997), Partial RoPE(16/64), LN Scale
Muon(momentum=0.99, lr=0.025, wd=0.04), warmdown=3000
TTT=OFF, Late QAT=OFF
Mixed int5/int6 + zstd-22
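
To illustrate what the int5/int6 roundtrip trades away, here is a minimal sketch of symmetric per-row quantization. The per-row scale and the helper names (`quantize_row`, `dequantize_row`, `roundtrip_error`) are assumptions for illustration; the PR does not state which scaling scheme or layer split it uses.

```python
def quantize_row(row, bits):
    """Symmetric per-row quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 31 for int6, 15 for int5
    peak = max(abs(x) for x in row)
    scale = peak / qmax if peak > 0 else 1.0
    # Clamp after rounding so every code fits in the signed range.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q, scale):
    """Map integer codes back to floats."""
    return [v * scale for v in q]

def roundtrip_error(row, bits):
    """Largest absolute error after quantize -> dequantize."""
    q, scale = quantize_row(row, bits)
    restored = dequantize_row(q, scale)
    return max(abs(a - b) for a, b in zip(row, restored))
```

Under this scheme the worst-case error is half a quantization step, so moving a layer from int6 to int5 roughly doubles its error while saving one bit per weight before compression.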

Results

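| Seed | Steps | ms/step | val_bpb | Artifact size (bytes) | Under 16MB? |
|------|-------|---------|---------|-----------------------|-------------|
| 1337 | 4622 | 129.82 | 1.14280 | 16,026,184 | NO |
| 42 | 4623 | 129.79 | 1.14281 | 16,339,774 | NO |
| 2025 | 4632 | 129.55 | 1.14287 | 16,244,044 | NO |
| Mean | 4626 | 129.72 | 1.14283 | 16,203,334 | NO |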
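
The Mean row and the size overage can be rechecked from the per-seed values with a few lines of Python. The sketch assumes the 16MB limit means 16,000,000 bytes, which matches the 26–340KB overage quoted under Known Issues; `LIMIT_BYTES`, `runs`, and `column_mean` are names chosen for this example.

```python
LIMIT_BYTES = 16_000_000  # assumption: 16MB counted in decimal megabytes

# seed -> (steps, ms/step, val_bpb, artifact bytes), copied from the table
runs = {
    1337: (4622, 129.82, 1.14280, 16_026_184),
    42:   (4623, 129.79, 1.14281, 16_339_774),
    2025: (4632, 129.55, 1.14287, 16_244_044),
}

def column_mean(index):
    """Mean of one table column across the three seeds."""
    values = [row[index] for row in runs.values()]
    return sum(values) / len(values)

mean_bpb = column_mean(2)
mean_bytes = column_mean(3)
# Bytes over the limit for each seed; positive means the run does not qualify.
overage = {seed: row[3] - LIMIT_BYTES for seed, row in runs.items()}
```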

Known Issues & Next Steps

This submission has two known issues that we have identified and are fixing:

  1. find_unused_parameters=True in DDP increases step time by about 53% (130ms vs. the expected ~85ms), which cuts the run from ~7000 steps to ~4600. All VR and GA params receive gradients, so the flag is unnecessary. Fix: set find_unused_parameters=False.
  2. Artifact slightly exceeds 16MB (by 26–340KB). Fix: more aggressive int5 layer selection or magnitude pruning.
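
The first fix can be sketched as follows. This is a minimal example, assuming the model is wrapped with PyTorch's `DistributedDataParallel` and one GPU per process; the helper names `ddp_kwargs` and `wrap_for_ddp` are hypothetical and not taken from the training script.

```python
def ddp_kwargs(local_rank):
    """Keyword arguments for DistributedDataParallel with unused-parameter
    detection turned off."""
    return {
        "device_ids": [local_rank],
        # Safe only when every parameter receives a gradient on every step.
        "find_unused_parameters": False,
    }

def wrap_for_ddp(model, local_rank):
    """Wrap an already-placed model for multi-GPU training."""
    # Imported lazily so ddp_kwargs can be inspected without PyTorch installed.
    from torch.nn.parallel import DistributedDataParallel as DDP
    return DDP(model, **ddp_kwargs(local_rank))
```

With the flag off, DDP skips the per-iteration traversal of the autograd graph that looks for parameters without gradients. If some parameter did go unused, the backward pass would raise an error rather than silently slow down, which also serves as a check on the claim that all VR and GA params receive gradients.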

With these fixes we expect:

  • Step time: ~85ms per step, giving ~7000 steps in 600s
  • val_bpb: ~1.12 range (matching single-GPU extrapolation)
  • Artifact: <16MB
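
The step counts follow directly from the wallclock budget. The sketch below reproduces the arithmetic; it ignores warmup and evaluation overhead, so it is an approximation, and `steps_in_budget` is a name chosen for this example.

```python
def steps_in_budget(budget_s, ms_per_step):
    """Whole training steps that fit in a wallclock budget."""
    return int(budget_s * 1000 // ms_per_step)

observed = steps_in_budget(600, 130)  # 4615, close to the ~4600 reported
expected = steps_in_budget(600, 85)   # 7058, close to the ~7000 projected
slowdown = 130 / 85 - 1               # about 0.53, the +53% step time
```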

Will submit an updated run after fixes.

Hardware

  • 8× H100 SXM 80GB
  • 600s wallclock (all seeds completed within budget)
  • Peak memory: ~27 GiB per GPU

Asukabot0 and others added 2 commits March 23, 2026 16:18
PR#315 config + Value Residual + Gated Attention + XSA4 + EMA
Mixed int5/int6 quantization for artifact <16MB
Defaults set for 8xH100 600s competition run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3-seed mean val_bpb: 1.14283 (sliding window exact, int6)
Seeds: 1337/42/2025, all completed within 600s wallclock on 8xH100.
Note: artifacts exceed 16MB limit (16.0–16.3MB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Asukabot0 Asukabot0 closed this Mar 23, 2026