
Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal)#318

Open
sseanliu wants to merge 1 commit into openai:main from sseanliu:submission/neural-cache-research

Conversation

@sseanliu

Summary

Research proposal for a novel eval-time technique: Neural Cache — caching K/V pairs across sliding windows to extend effective context from 2,048 to 50K+ tokens.

  • Zero artifact cost — no model changes, no extra parameters
  • Backward-looking only — caches K/V from already-evaluated tokens (rule-compliant)
  • Leverages existing flash_attn — FA3 natively supports seqlen_k > seqlen_q
  • Complementary to sliding window — sliding window gives overlapping context within 2K; Neural Cache extends beyond 2K

How it works

Standard sliding window processes each window independently. Neural Cache maintains a per-layer KV cache that grows as evaluation proceeds — each new window's queries attend to both current context AND cached K/V from previous windows.
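A minimal sketch of this mechanism, using PyTorch's `scaled_dot_product_attention` in place of the repo's `flash_attn` path (function and tensor names here are illustrative, not taken from the submission; identity Q/K/V projections keep it self-contained):

```python
import torch
import torch.nn.functional as F

def eval_with_kv_cache(windows, max_cache=4096):
    """Process sliding windows left-to-right; each window's queries attend
    to the current window AND K/V cached from already-evaluated windows."""
    cache_k = cache_v = None
    outputs = []
    for x in windows:                        # x: (batch, win_len, dim)
        q, k, v = x, x, x                    # identity projections (sketch only)
        if cache_k is not None:              # prepend the backward-looking cache
            k = torch.cat([cache_k, k], dim=1)   # seqlen_k > seqlen_q, FA3-style
            v = torch.cat([cache_v, v], dim=1)
        # Bottom-right-aligned causal mask: each query sees all cached keys
        # plus in-window keys up to its own position.
        Lq, Lk = q.shape[1], k.shape[1]
        mask = torch.ones(Lq, Lk, dtype=torch.bool).tril(diagonal=Lk - Lq)
        outputs.append(F.scaled_dot_product_attention(q, k, v, attn_mask=mask))
        if max_cache > 0:                    # grow the per-layer cache, trimmed
            cache_k, cache_v = k[:, -max_cache:], v[:, -max_cache:]
    return torch.cat(outputs, dim=1)
```

With `max_cache=0` this degenerates to the standard independent-window baseline, which is the cache=0 vs cache=4096 comparison the test plan calls for.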

Status

The implementation is provided but untested: a state-interaction bug between torch.compile and the eval path prevented valid results before the compute budget ran out. The fix (use a freshly loaded eval_model instead of the compiled base_model) is identified in the code.

Base model: PR #287 reproduction at 1.1284 BPB (7,009 steps @ 85.6ms/step, 8xH100 SXM with FA3).

Prior work by this author

Test plan

  • Someone with compute: run eval_neural_cache.py on a trained model with cache=0 vs cache=4096
  • Compare BPB to validate whether cross-window context improves compression
  • Test different cache sizes and layer subsets
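For scoring the cache=0 vs cache=4096 comparison, BPB can be computed from the summed cross-entropy loss in nats over the evaluated bytes; this is the standard conversion, with illustrative variable names:

```python
import math

def bits_per_byte(total_loss_nats, num_bytes):
    """Convert summed cross-entropy loss (in nats) over a byte-level
    eval set into bits per byte: bpb = nats / (ln 2 * bytes)."""
    return total_loss_nats / (math.log(2) * num_bytes)
```

A lower BPB for the cached run than for the cache=0 baseline on the same eval set would validate that cross-window context improves compression.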

Generated with Claude Code

…ded eval context

Non-record research submission. Proposes caching K/V pairs across sliding
windows to extend effective context from 2K to 50K+ tokens at eval time.
Backward-looking, zero artifact cost, rule-compliant. Implementation provided
but untested due to compute constraints. Base: PR openai#287 reproduction at 1.1284 BPB.
