Non-record: JEPA v3 — span-masked I-JEPA + VICReg, val_bpb 1.2321 #1581

Open
aiejvn wants to merge 1 commit into openai:main from aiejvn:submission-jepa-v3

aiejvn commented Apr 13, 2026

Non-record: JEPA v3 — span-masked I-JEPA + VICReg, val_bpb 1.2321

Builds on PR #1330 (JEPA v2 — why same-sequence next-k JEPA collapses in causal LMs). Two additions, plus an optimizer bug fix:

Span-masked JEPA: The context encoder sees target spans replaced with a learned mask embedding (jepa_mask_emb) rather than the actual tokens, while the target encoder sees the full unmasked sequence. This makes prediction genuinely hard: the context encoder cannot recover a target token from its own input and must rely on the surrounding context. Bigram hash contributions are explicitly zeroed at masked positions to prevent the Cantor hash from leaking token identity. Each sequence gets 4 spans, with lengths sampled from a geometric distribution with mean 16 (~6% of tokens masked per step).
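
As a rough illustration of the sampling described above, the sketch below draws 4 span lengths from a geometric distribution with mean 16 and marks them in a boolean mask. The function name `sample_span_mask`, the uniform choice of start positions, and the decision to let spans overlap are assumptions, not details taken from this PR. The sequence length is also not stated here; if it were 1024, the expected masked fraction would be at most 4 × 16 / 1024 ≈ 6.25%, which is consistent with the ~6% figure.

```python
import math
import random


def sample_span_mask(seq_len, num_spans=4, mean_len=16, rng=None):
    """Return a list of booleans; True marks a masked (target) position.

    Span lengths follow a geometric distribution on {1, 2, ...} with the
    given mean (mean_len must be > 1). Start positions are uniform and
    spans may overlap; both choices are assumptions of this sketch.
    """
    rng = rng or random.Random()
    p = 1.0 / mean_len
    mask = [False] * seq_len
    for _ in range(num_spans):
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        # Inverse-transform sample of a geometric variable with mean 1/p.
        length = int(math.log(u) / math.log(1.0 - p)) + 1
        length = min(length, seq_len)  # clip spans longer than the sequence
        start = rng.randrange(0, seq_len - length + 1)
        for i in range(start, start + length):
            mask[i] = True
    return mask


# Hypothetical sequence length of 1024 (not stated in the PR).
rng = random.Random(0)
fractions = [sum(sample_span_mask(1024, rng=rng)) / 1024 for _ in range(2000)]
print(f"mean masked fraction: {sum(fractions) / len(fractions):.3f}")
```

In the context-encoder path, positions where the mask is True would receive jepa_mask_emb in place of the token embedding and have their bigram hash contribution zeroed; the target encoder would ignore the mask entirely.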

VICReg anti-collapse regularization: A variance hinge and an off-diagonal covariance penalty (V-JEPA style) are applied to the predictor-side representations at masked positions. This guards against the predictor collapsing to a single point or a low-rank subspace, and it does so independently of the span masking. Target-side VICReg terms are logged as diagnostics only and contribute no gradient.
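
For reference, the two regularizers can be written compactly. The sketch below follows the standard VICReg formulation: a hinge on the per-dimension standard deviation, and the sum of squared off-diagonal covariance entries divided by the feature dimension. The function name `vicreg_terms`, the values `gamma=1.0` and `eps=1e-4` (the usual VICReg defaults), and the use of NumPy instead of the training framework are all assumptions; the PR does not state its hyperparameters or loss weights.

```python
import numpy as np


def vicreg_terms(z, gamma=1.0, eps=1e-4):
    """Variance hinge and off-diagonal covariance penalty for z of shape (n, d).

    Here z stands for predictor outputs gathered at masked positions.
    """
    n, d = z.shape
    z = z - z.mean(axis=0, keepdims=True)

    # Variance term: penalize dimensions whose std falls below gamma.
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    var_loss = np.mean(np.maximum(0.0, gamma - std))

    # Covariance term: push off-diagonal covariance entries toward zero.
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = np.sum(off_diag ** 2) / d
    return var_loss, cov_loss


rng = np.random.default_rng(0)
healthy = rng.normal(scale=2.0, size=(4096, 8))   # spread out, decorrelated
collapsed = np.ones((4096, 8))                    # every row identical
low_rank = np.repeat(rng.normal(scale=2.0, size=(4096, 1)), 8, axis=1)

for name, z in [("healthy", healthy), ("collapsed", collapsed), ("low_rank", low_rank)]:
    v, c = vicreg_terms(z)
    print(f"{name:10s} var_loss={v:.4f} cov_loss={c:.4f}")
```

The collapsed batch triggers the variance hinge, while the low-rank batch passes the variance check but is caught by the covariance penalty, which is why the two terms are used together.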

Optimizer bug fix (v2 regression): In v2, JEPAPredictor and jepa_mask_emb were absent from all three optimizer groups because only base_model.blocks was iterated (verifiable in b4a428b). As a result, the predictor stayed frozen at its zero initialization for the entire v2 run. Fixed by explicitly routing the predictor's matrix params to Muon and its scalar params to Adam.
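
A coverage check of the following kind would flag this class of bug before training starts. This is a minimal sketch, not the code in this PR: the parameter names and shapes in `PARAM_SHAPES` are hypothetical (only `jepa_mask_emb` is named in the description), the helpers `route_params` and `unassigned` are invented for illustration, and the rule "2-D tensors go to Muon, everything else to Adam" is an assumed reading of "matrix params" versus "scalar params". It is also a simplification, since real setups often send embeddings and output heads to Adam regardless of shape.

```python
# Hypothetical parameter names and shapes, for illustration only.
PARAM_SHAPES = {
    "blocks.0.attn.qkv.weight": (1536, 512),
    "blocks.0.mlp.fc.weight": (2048, 512),
    "jepa_predictor.fc1.weight": (512, 512),
    "jepa_predictor.fc1.bias": (512,),
    "jepa_mask_emb": (512,),
}


def route_params(param_shapes):
    """Send 2-D matrices to the Muon group and everything else to Adam."""
    muon = [name for name, shape in param_shapes.items() if len(shape) == 2]
    adam = [name for name, shape in param_shapes.items() if len(shape) != 2]
    return muon, adam


def unassigned(param_shapes, *groups):
    """Return the sorted names of parameters that appear in no group."""
    covered = {name for group in groups for name in group}
    return sorted(set(param_shapes) - covered)


# v2-style grouping: only names under "blocks." are collected.
v2_group = [name for name in PARAM_SHAPES if name.startswith("blocks.")]
print("left out by v2-style grouping:", unassigned(PARAM_SHAPES, v2_group))

# Routing every parameter by shape leaves nothing out.
muon_group, adam_group = route_params(PARAM_SHAPES)
assert unassigned(PARAM_SHAPES, muon_group, adam_group) == []
```

Any parameter that is in no optimizer group never receives an update, so a zero-initialized module in that position behaves as if it were absent from the model.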

Non-record reason: Trained for ~20 hours on a single AWS A10G.

| Submission | val_bpb |
|---|---|
| This (JEPA v3, span-masked + VICReg) | 1.2321 |
| PR #1330 (JEPA v2, next-k, no JEPA path) | 1.4617 |
| PR #1330 (JEPA v2, next-k, with JEPA) | 1.6047 |
