Non-record: MDLM Masked Diffusion + Depth Recurrence — val_bpb 1.3428 (8×H100, seed=1337) by He-Wenhao · Pull Request #1582 · openai/parameter-golf

He-Wenhao · 2026-04-13T01:52:08Z

Summary

val_bpb: 1.3428 (int8+zlib roundtrip) | 14.73 MB | 8×H100 SXM, 600s | Beats #1403 by 0.0057 BPB

Extends the MDLM baseline (#1403) with depth recurrence and quantization improvements.
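The headline number is measured after an int8+zlib roundtrip, so a toy version of that roundtrip may help make the metric concrete. This is a minimal sketch, not the submission's serializer: the function name `int8_zlib_roundtrip`, the single per-tensor scale, and the use of `struct` for packing are all assumptions made for illustration.

```python
import struct
import zlib

def int8_zlib_roundtrip(weights):
    """Quantize floats to int8, zlib-compress, then restore (hypothetical helper)."""
    # Assumption: one symmetric scale per tensor; the PR does not spell this out.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    blob = zlib.compress(struct.pack(f"{len(q)}b", *q), level=9)
    restored = struct.unpack(f"{len(q)}b", zlib.decompress(blob))
    return [v * scale for v in restored], len(blob)

weights = [0.5, -0.25, 0.0, 1.0] * 64
restored, compressed_bytes = int8_zlib_roundtrip(weights)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"compressed bytes: {compressed_bytes}, worst abs error: {worst:.5f}")
```

The gap between the model's score before and after such a roundtrip is what the results table calls the quant penalty.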

Stack

Depth recurrence: physical layers L1–L3 looped 1× extra → 12 effective layers / 9 physical layers
QAT (STE): straight-through quantization at lr_scale < 0.40 (~last 480 steps of 8,049 total)
EMA (decay=0.997) applied before serialization
GPTQ-lite: 5-candidate percentile clip search (99.9%→100%) per row, min-MSE selection
Linear LR → 0 (Muon warmdown), relu² MLP, Muon WD=0.01
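The GPTQ-lite item in the stack above can be sketched as a per-row search over clipping thresholds. This is an illustrative sketch under stated assumptions, not the submission's code: the helper names are hypothetical, the five candidates are assumed to be evenly spaced between the 99.9th and 100th percentile of the row's absolute values, and quantization is assumed to be symmetric int8.

```python
import random

def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def quantize_row(row, clip):
    """Symmetric int8 quantization of one row with clipping threshold `clip`."""
    scale = clip / 127.0 if clip > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale

def clip_search_row(row, percentiles=(99.9, 99.925, 99.95, 99.975, 100.0)):
    """Try each candidate clip percentile; keep the one with the lowest MSE."""
    # Assumption: candidate spacing is even; the PR only gives the range and count.
    mags = sorted(abs(w) for w in row)
    best = None
    for p in percentiles:
        q, scale = quantize_row(row, percentile(mags, p))
        mse = sum((w - qi * scale) ** 2 for w, qi in zip(row, q)) / len(row)
        if best is None or mse < best[0]:
            best = (mse, p, q, scale)
    return best

random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(512)] + [25.0]  # one large outlier
mse, chosen_p, q, scale = clip_search_row(row)
print(f"chosen percentile: {chosen_p}, reconstruction MSE: {mse:.6f}")
```

Because the 100th percentile is always one of the candidates, the search can never do worse in MSE than plain max-abs scaling on the same row.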

Results (8×H100 SXM, seed=1337, 600s)

Metric	This PR	#1403
Pre-quant val_bpb	1.3379	1.3409
Post-roundtrip val_bpb	1.3428	1.3485
Quant penalty	0.0049	0.0076
Artifact	14.73 MB	15.63 MB
Steps	8,049	11,808
ms/step	74.6 ms	50.8 ms
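The table's derived figures can be cross-checked with a few lines of arithmetic: the quant penalty is the post-roundtrip score minus the pre-quant score, and steps times ms/step should land near the 600 s budget. The dictionary layout below is just a convenience for this check; the numbers are copied from the table.

```python
# Figures copied from the results table above.
runs = {
    "this PR": {"pre": 1.3379, "post": 1.3428, "steps": 8049, "ms_per_step": 74.6},
    "#1403": {"pre": 1.3409, "post": 1.3485, "steps": 11808, "ms_per_step": 50.8},
}

def quant_penalty(run):
    """Post-roundtrip val_bpb minus pre-quant val_bpb."""
    return run["post"] - run["pre"]

def wall_clock_seconds(run):
    """Total training time implied by step count and per-step latency."""
    return run["steps"] * run["ms_per_step"] / 1000.0

for name, run in runs.items():
    print(f"{name}: penalty={quant_penalty(run):.4f}, "
          f"train time={wall_clock_seconds(run):.1f}s")

gain = runs["#1403"]["post"] - runs["this PR"]["post"]
print(f"post-roundtrip improvement over #1403: {gain:.4f} BPB")
```

Both runs come out at roughly 600 s, which is consistent with the lower step count here being a consequence of the slower per-step time rather than a shorter budget.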

EMA + GPTQ-lite cuts the quant penalty from 0.0076 to 0.0049. Depth recurrence improves pre-quant quality (1.3379 vs 1.3409) despite fewer training steps (8,049 vs 11,808), because each forward pass applies 12 effective layers of compute.
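The effective-layer count follows from the order in which the physical layers are executed. The sketch below builds that order under two assumptions that the PR text does not confirm: "L1–L3" is read as 0-based layer indices 1 to 3, and the looped layers are repeated as a block rather than one layer at a time. The function name and parameters are hypothetical.

```python
def layer_schedule(num_physical=9, loop_first=1, loop_last=3, extra_passes=1):
    """Return physical layer indices in execution order (hypothetical helper)."""
    block = list(range(loop_first, loop_last + 1))    # layers that get re-run
    before = list(range(0, loop_first))               # layers ahead of the block
    after = list(range(loop_last + 1, num_physical))  # layers after the block
    return before + block * (1 + extra_passes) + after

schedule = layer_schedule()
print(schedule)
# [0, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8]
print(len(schedule), "effective layers from", len(set(schedule)), "physical layers")
# 12 effective layers from 9 physical layers
```

Since the repeated layers share weights, the parameter count (and so the artifact size) stays at that of 9 layers, while the per-step compute grows to that of 12.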

Extends PR openai#1403 MDLM baseline with depth recurrence (L1-L3 looped 1x extra = 12 effective layers), QAT/STE, EMA decay=0.997, GPTQ-lite clip search, linear LR->0, relu^2 MLP, Muon WD=0.01.

val_bpb: 1.3428 | quant penalty: 0.0049 | artifact: 14.73 MB
8xH100 SXM, 600s, seed=1337

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

He-Wenhao wants to merge 1 commit into openai:main from He-Wenhao:submission/mdlm-depth-recurrence
