
Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal)#318

Open
sseanliu wants to merge 1 commit into openai:main from sseanliu:submission/neural-cache-research

Conversation

@sseanliu

Summary

Research proposal for a novel eval-time technique: Neural Cache — caching K/V pairs across sliding windows to extend effective context from 2,048 to 50K+ tokens.

  • Zero artifact cost — no model changes, no extra parameters
  • Backward-looking only — caches K/V from already-evaluated tokens (rule-compliant)
  • Leverages existing flash_attn — FA3 natively supports seqlen_k > seqlen_q
  • Complementary to sliding window — sliding window gives overlapping context within 2K; Neural Cache extends beyond 2K

How it works

Standard sliding window processes each window independently. Neural Cache maintains a per-layer KV cache that grows as evaluation proceeds — each new window's queries attend to both current context AND cached K/V from previous windows.
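A minimal sketch of this mechanism, using PyTorch's `scaled_dot_product_attention` in place of the repo's `flash_attn` path (function and tensor names here are illustrative, not taken from the submission; identity Q/K/V projections keep it self-contained):

```python
import torch
import torch.nn.functional as F

def eval_with_kv_cache(windows, max_cache=4096):
    """Process sliding windows left-to-right; each window's queries attend
    to the current window AND K/V cached from already-evaluated windows."""
    cache_k = cache_v = None
    outputs = []
    for x in windows:                        # x: (batch, win_len, dim)
        q, k, v = x, x, x                    # identity projections (sketch only)
        if cache_k is not None:              # prepend the backward-looking cache
            k = torch.cat([cache_k, k], dim=1)   # seqlen_k > seqlen_q, FA3-style
            v = torch.cat([cache_v, v], dim=1)
        # Bottom-right-aligned causal mask: each query sees all cached keys
        # plus in-window keys up to its own position.
        Lq, Lk = q.shape[1], k.shape[1]
        mask = torch.ones(Lq, Lk, dtype=torch.bool).tril(diagonal=Lk - Lq)
        outputs.append(F.scaled_dot_product_attention(q, k, v, attn_mask=mask))
        if max_cache > 0:                    # grow the per-layer cache, trimmed
            cache_k, cache_v = k[:, -max_cache:], v[:, -max_cache:]
    return torch.cat(outputs, dim=1)
```

With `max_cache=0` this degenerates to the standard independent-window baseline, which is the cache=0 vs cache=4096 comparison the test plan calls for.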

Status

The implementation is provided but untested: a state-interaction bug between torch.compile and the eval path prevented valid results before the compute budget ran out. The fix (use a freshly loaded eval_model instead of the compiled base_model) is identified in the code.

Base model: PR #287 reproduction at 1.1284 BPB (7,009 steps @ 85.6ms/step, 8xH100 SXM with FA3).

Prior work by this author

Test plan

  • Someone with compute: run eval_neural_cache.py on a trained model with cache=0 vs cache=4096
  • Compare BPB to validate whether cross-window context improves compression
  • Test different cache sizes and layer subsets
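For scoring the cache=0 vs cache=4096 comparison, BPB can be computed from the summed cross-entropy loss in nats over the evaluated bytes; this is the standard conversion, with illustrative variable names:

```python
import math

def bits_per_byte(total_loss_nats, num_bytes):
    """Convert summed cross-entropy loss (in nats) over a byte-level
    eval set into bits per byte: bpb = nats / (ln 2 * bytes)."""
    return total_loss_nats / (math.log(2) * num_bytes)
```

A lower BPB for the cached run than for the cache=0 baseline on the same eval set would validate that cross-window context improves compression.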

Generated with Claude Code

…ded eval context

Non-record research submission. Proposes caching K/V pairs across sliding
windows to extend effective context from 2K to 50K+ tokens at eval time.
Backward-looking, zero artifact cost, rule-compliant. Implementation provided
but untested due to compute constraints. Base: PR openai#287 reproduction at 1.1284 BPB.
