RecurLoRA: Quantization-Stable Shallow Recurrence with Low-Rank Corrective Adapters #1181
Open
Tanush1912 wants to merge 3 commits into openai:main from
Conversation
Novel contribution: shallow recurrence (layers 4 and 5 repeated once each) with rank-2 LoRA corrections on attention projections, RMSNorm before each repeat, and learnable alpha scaling. This yields 13 virtual layers from 11 physical layers at 28KB (0.18%) parameter overhead.

Hyperparameter changes from the PR openai#1179 base (1.1105 BPB):

- NEGATIVE_SLOPE: 0.5 -> 0.9 (validated +0.013 BPB in issue openai#140)
- QK_GAIN_INIT: 1.5 -> 4.0 (validated +0.006 BPB in PR openai#1176)
- TTT_ENABLED: 1 (score-first, legal variant)
- WARMDOWN_ITERS: 4000 (extended from 3500)
- BIGRAM_DIM: 160 (from 112)

Status: WIP - awaiting compute for 3-seed validation runs.
Both the A and B matrices are now initialized with N(0, 1e-3) instead of one being zero. This ensures all LoRA parameters receive gradients from step 1, which is critical under a 600s training budget, where delayed adapter activation wastes optimization steps. The alpha default is raised from 0.4 to 0.6 to amplify the early correction signal.
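Why the warm init matters can be shown with a toy gradient calculation (a sketch with hypothetical dimensions, not the submission code): for a LoRA path h = B(Ax), the gradient with respect to A is proportional to B, so a cold start with B = 0 leaves A frozen on step 1.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2  # hypothetical hidden size; rank 2 as in the PR

x = rng.normal(size=(d,))  # layer input
g = rng.normal(size=(d,))  # hypothetical upstream gradient dL/dh

def lora_grads(A, B):
    """Gradients of L w.r.t. the LoRA factors for h = B @ (A @ x)."""
    dB = np.outer(g, A @ x)    # dL/dB = g (Ax)^T
    dA = np.outer(B.T @ g, x)  # dL/dA = B^T g x^T  -> zero whenever B = 0
    return dA, dB

# Cold start (B = 0): A receives exactly zero gradient on step 1,
# so the adapter is inert until B drifts away from zero.
A0 = rng.normal(0, 1e-3, (r, d))
dA_cold, _ = lora_grads(A0, np.zeros((d, r)))

# Warm init (both factors ~ N(0, 1e-3)): both get gradients immediately.
A1 = rng.normal(0, 1e-3, (r, d))
B1 = rng.normal(0, 1e-3, (d, r))
dA_warm, dB_warm = lora_grads(A1, B1)
```

The trade-off is a small nonzero correction at initialization, which the N(0, 1e-3) scale keeps negligible.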
- Rename submission folder to RecurLoRA_Slope09_QKGain4_TTT
- Rewrite README: lead with architectural contribution, add scaling hypothesis, constraint-aware framing, prior failure table
- Fix LoRA gradient flow description (warm-init, not cold-start)
- Update submission.json title to match
Summary
Why this direction
Weight sharing has consistently failed in this competition due to quantization error accumulation across repeated layers (e.g. PR #363: +4.3 BPB at 3 cycles).
However, PR #686 demonstrated that shallow recurrence (<=2 repeats) remains stable under int6 quantization (~1.1182 BPB), suggesting that limited reuse is viable.
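The failure mode behind these numbers can be illustrated with a toy simulation (hypothetical width and weights, not the competition model): repeatedly applying one int6-quantized layer re-injects the same quantization error each cycle, so the deviation from the full-precision trajectory grows with the number of repeats.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32  # hypothetical width

# Random layer weight, spectrally normalized so repeated application
# keeps activations at a sane scale.
W = rng.normal(size=(d, d))
W /= np.linalg.norm(W, 2)

def quantize_int6(w):
    # Symmetric per-tensor int6 quantization: integer levels in [-31, 31].
    scale = np.abs(w).max() / 31
    return np.round(w / scale) * scale

Wq = quantize_int6(W)
x = rng.normal(size=(d,))

rel_err = []
h_true, h_q = x, x
for _ in range(3):  # three cycles through the same shared layer
    h_true = W @ h_true
    h_q = Wq @ h_q
    rel_err.append(np.linalg.norm(h_true - h_q) / np.linalg.norm(h_true))
# rel_err typically grows with each reuse: the identical quantization error
# compounds across cycles instead of averaging out across distinct layers.
```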
RecurLoRA builds on this by introducing per-pass low-rank corrective adapters on the repeated layers.
This raises effective depth from 11 to 13 layers without the instability of deep recurrence, in effect converting parameters that would otherwise be spent on duplicate layers into extra depth under the fixed 16MB budget.
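The repeated pass described above can be sketched as follows (a minimal numpy illustration with hypothetical names and dimensions; the real submission applies this per attention projection with learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hypothetical hidden size; rank 2 as in the PR

# Frozen projection weight shared by both passes through the layer.
W = rng.normal(0, 0.02, (d, d))

# Warm-initialized rank-2 LoRA factors, used only on the repeated pass.
A = rng.normal(0, 1e-3, (r, d))
B = rng.normal(0, 1e-3, (d, r))
alpha = 0.6  # learnable scale (default per this PR)

def rmsnorm(x, eps=1e-6):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def shared_proj(x, repeated=False):
    """Shared projection; the repeated pass adds the low-rank correction."""
    W_eff = W + alpha * (B @ A) if repeated else W
    return x @ W_eff.T

x = rng.normal(size=(4, d))
h1 = shared_proj(x)                           # physical pass
h2 = shared_proj(rmsnorm(h1), repeated=True)  # virtual pass: RMSNorm + LoRA
```

Each adapted projection costs only 2*d*r extra parameters (plus alpha), which is how the scheme stays at a ~0.18% overhead while the correction itself starts near zero.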
Status
Implementation complete and validated for:
Full training runs (3 seeds + ablations) queued pending compute.
Test plan