Non-record: 11L Partial RoPE + XSA4 + VE128 + Tight SWA + GPTQ-lite (val_bpb=1.1804) #543

Open
rarce wants to merge 1 commit into openai:main from rarce:submission/2026-03-23_PR374Stack_GPTQ

rarce commented Mar 23, 2026

Summary

val_bpb: 1.1804 (post-quant, single seed) | 15.95 MB artifact | 8×H100 SXM, 615s

Non-record submission documenting a systematic combination of PR #374 frontier techniques with MLP width optimization and GPTQ-lite quantization.

Key Techniques

| Technique | Source | Impact |
| --- | --- | --- |
| Partial RoPE (16/64 dims) | PR #315 | Position-free 75% of head dims |
| LN Scale (1/sqrt(i+1)) | PR #315 | Damps deeper layers |
| XSA on last 4 layers | PR #265, #287 | GQA-aware self-value debiasing |
| Shared VE128 (layers 9,10) | PR #374 | Value embedding injection |
| Tight SWA (scale<0.2) | PR #374 | Zero-penalty weight averaging |
| Late QAT (lr_scale<0.1) | PR #297 | Avoids Muon momentum corruption |
| GPTQ-lite (clip search) | PR #379 | Per-tensor optimal clip ratio |
| MLP hidden=1408 | Novel | Faster steps → more training in 10min |
| Int6 layers 1-9 + int8 0,10 | Reference | Mixed precision quantization |
| zstd-22 | Standard | ~35% better than zlib |
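
For readers unfamiliar with the "GPTQ-lite (clip search)" row, the idea of a per-tensor clip ratio can be sketched roughly as below. This is an illustration only, not the code in train_gpt.py: the helper names (`fake_quantize`, `search_clip_ratio`), the candidate grid of ratios, the MSE objective, and the symmetric rounding scheme are all assumptions made for the example.

```python
def fake_quantize(weights, clip, bits=6):
    """Symmetric uniform quantize-dequantize of a flat list of floats.

    Values are rounded to integers in [-qmax, qmax] and mapped back to
    floats, so anything beyond +/-clip saturates.
    """
    qmax = 2 ** (bits - 1) - 1          # 31 for int6, 127 for int8
    scale = clip / qmax
    out = []
    for w in weights:
        q = round(w / scale)
        q = max(-qmax, min(qmax, q))    # saturate outliers
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_clip_ratio(weights, bits=6, ratios=(0.6, 0.7, 0.8, 0.9, 1.0)):
    """Try each clip ratio (fraction of max |w|) and keep the lowest-MSE one.

    The ratio grid is a hypothetical choice for this sketch.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 1.0, 0.0                 # all-zero tensor: nothing to clip
    best_ratio, best_err = None, float("inf")
    for r in ratios:
        err = mse(weights, fake_quantize(weights, r * max_abs, bits))
        if err < best_err:
            best_ratio, best_err = r, err
    return best_ratio, best_err


# Toy "tensor": many small weights plus a single outlier.
toy = [((i * 37) % 101 - 50) / 500.0 for i in range(1000)] + [0.5]
ratio, err = search_clip_ratio(toy, bits=6)
baseline = mse(toy, fake_quantize(toy, max(abs(w) for w in toy), bits=6))
print(f"best clip ratio {ratio}, MSE {err:.3e} (no clipping: {baseline:.3e})")
```

Because a ratio of 1.0 (no clipping) is in the grid, the search can never do worse than plain max-abs scaling on the chosen error metric. Whether it helps depends on how heavy-tailed each tensor is.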

Novel Contribution

MLP hidden=1408 vs 1536: The narrower MLP fits under the 16MB limit and completes 33% more training steps (137ms vs 178ms per step). The ~1,000 extra steps (4,071 vs 3,061) more than compensate for the reduced per-step capacity:

  • MLP 1536: 3061 steps, val_bpb 1.1958, 18MB (over limit)
  • MLP 1408: 4071 steps, val_bpb 1.1804, 15.95MB (under limit)
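
The trade-off can be checked directly from the reported numbers. The snippet below is a quick arithmetic sketch; the dictionary layout and variable names are made up for illustration, and only the figures themselves come from the two runs listed above.

```python
# Reported numbers from the two runs above.
wide = {"hidden": 1536, "steps": 3061, "ms_per_step": 178, "val_bpb": 1.1958}
narrow = {"hidden": 1408, "steps": 4071, "ms_per_step": 137, "val_bpb": 1.1804}

extra_steps = narrow["steps"] - wide["steps"]            # additional optimizer steps
step_gain = extra_steps / wide["steps"]                  # relative gain in step count
speedup = wide["ms_per_step"] / narrow["ms_per_step"]    # per-step speedup factor
bpb_delta = wide["val_bpb"] - narrow["val_bpb"]          # positive means narrow is better

print(f"extra steps: {extra_steps}")
print(f"step gain:   {step_gain:.1%}")
print(f"speedup:     {speedup:.2f}x per step")
print(f"bpb delta:   {bpb_delta:.4f}")
```

The step-count gain (about 33%) is slightly larger than the per-step speedup (about 1.30x), and the narrower model ends 0.0154 bpb lower on the reported figures.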

Metrics

| Metric | Value |
| --- | --- |
| Pre-quant val_bpb | 1.1770 |
| Post-quant val_bpb | 1.1804 |
| Quant gap | +0.0034 |
| Steps | 4,071 @ 137ms/step |
| Parameters | 25,224,291 |
| Artifact | 15,949,473 bytes |
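
As a sanity check, the derived figures follow from the raw metrics. This is a small sketch with two stated assumptions: "16MB" is taken to mean 16,000,000 bytes (decimal, which matches 15,949,473 bytes being reported as 15.95 MB), and the bits-per-parameter figure is only an average over the whole compressed artifact, which may contain more than weights.

```python
pre_quant_bpb = 1.1770
post_quant_bpb = 1.1804
params = 25_224_291
artifact_bytes = 15_949_473
limit_bytes = 16_000_000  # ASSUMPTION: "16MB" means decimal megabytes

quant_gap = post_quant_bpb - pre_quant_bpb        # cost of quantization in bpb
bits_per_param = artifact_bytes * 8 / params      # average over the whole artifact
headroom_bytes = limit_bytes - artifact_bytes     # bytes left under the assumed limit

print(f"quant gap:       +{quant_gap:.4f}")
print(f"bits per param:  {bits_per_param:.2f}")
print(f"headroom:        {headroom_bytes} bytes")
```

Under these assumptions the artifact averages a little over 5 bits per parameter after int6/int8 quantization and zstd compression, leaving about 50 KB of headroom.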

Test plan

  • Artifact under 16MB (15.95MB)
  • Trains in 615s on 8×H100 SXM
  • Post-quant roundtrip verified
  • train_gpt.py compiles and runs from records/ folder
  • Train log included
  • Multi-seed validation: not run (single seed, budget constrained)

rarce force-pushed the submission/2026-03-23_PR374Stack_GPTQ branch from 81ea3ef to 9096bd9 on March 23, 2026 15:40