Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal) #318
Open
sseanliu wants to merge 1 commit into openai:main from
Conversation
…ded eval context Non-record research submission. Proposes caching K/V pairs across sliding windows to extend effective context from 2K to 50K+ tokens at eval time. Backward-looking, zero artifact cost, rule-compliant. Implementation provided but untested due to compute constraints. Base: PR openai#287 reproduction at 1.1284 BPB.
Summary
Research proposal for a novel eval-time technique: Neural Cache — caching K/V pairs across sliding windows to extend effective context from 2,048 to 50K+ tokens.
How it works
Standard sliding window processes each window independently. Neural Cache maintains a per-layer KV cache that grows as evaluation proceeds — each new window's queries attend to both current context AND cached K/V from previous windows.
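A minimal sketch of the mechanism described above, assuming a PyTorch model; all class and function names here are hypothetical illustrations, not the PR's actual (untested) implementation in eval_neural_cache.py:

```python
import torch
import torch.nn.functional as F

class NeuralKVCache:
    """Per-layer K/V cache that grows as evaluation slides across windows."""
    def __init__(self, n_layers, max_cache_tokens=4096):
        self.max_cache_tokens = max_cache_tokens  # cache=0 would disable caching
        self.k = [None] * n_layers
        self.v = [None] * n_layers

    def extend(self, layer, k_new, v_new):
        # Append this window's K/V, keeping only the most recent tokens.
        if self.k[layer] is None:
            k, v = k_new, v_new
        else:
            k = torch.cat([self.k[layer], k_new], dim=2)
            v = torch.cat([self.v[layer], v_new], dim=2)
        self.k[layer] = k[:, :, -self.max_cache_tokens:]
        self.v[layer] = v[:, :, -self.max_cache_tokens:]
        return self.k[layer], self.v[layer]

def cached_attention(q, k_win, v_win, cache, layer):
    """Current-window queries attend to cached K/V plus the current window."""
    k_all, v_all = cache.extend(layer, k_win, v_win)
    T, S = q.size(2), k_all.size(2)
    n_cached = S - T
    # Cached (past-window) tokens are fully visible; the current window is causal.
    mask = torch.ones(T, S, dtype=torch.bool)
    mask[:, n_cached:] = torch.tril(torch.ones(T, T, dtype=torch.bool))
    return F.scaled_dot_product_attention(q, k_all, v_all, attn_mask=mask)
```

With `max_cache_tokens=0` this reduces to standard independent-window attention; growing it is what extends the effective context beyond the 2,048-token window.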
Status
Implementation provided but untested: a `torch.compile` state-interaction bug prevented valid results before the compute budget ran out. The fix (use a freshly loaded `eval_model` instead of the compiled `base_model`) is identified in the code.

Base model: PR #287 reproduction at 1.1284 BPB (7,009 steps @ 85.6 ms/step, 8xH100 SXM with FA3).
Prior work by this author
Test plan
Run `eval_neural_cache.py` on a trained model with `cache=0` vs `cache=4096`.

Generated with Claude Code