feat(localFileStore): Add Cache Layer to LocalFileStore to Avoid I/O Overhead from Read Operations #1274
base: main
Conversation
Signed-off-by: CLFutureX <[email protected]>
@xingyaoww hey, PTAL, thanks

@xingyaoww hey, could you please look at it again when you have a chance?
```python
full_path = self.get_full_path(path)

if full_path in self.cache.files:
    return self.cache.read(full_path)
```
I think we might overwrite files like `base_state.json` - do you think this will handle it?
@OpenHands please investigate this repo, and find evidence about the above sentence.
Also, another reason we try to do this file-backed event storage is that on rare occasions, there could be too many events in a conversation that could OOM the machine / slow down the entire application.
So if we want to add a cache, it would be a TTL cache that can be automatically maintained and evicted when appropriate -- OpenHands, can you look into this and tell me if it is a good idea? Do not edit any code.
I'm on it! xingyaoww can track my progress at all-hands.dev
Investigation Complete: Cache Implementation Analysis for PR #1274
I've completed a thorough investigation of the codebase to address your concerns about file overwrites and the proposed cache implementation. Here are my findings:
Answers to Your Questions
Q1: "Will the cache handle base_state.json overwrites correctly?"
✅ YES - The current PR implementation handles overwrites correctly because:
- `base_state.json` IS overwritten frequently - I found 14 assignments to `execution_status` alone, which trigger auto-save via `__setattr__` in `state.py:236-257`
- All writes go through LocalFileStore - The cache is updated on line 52 of the PR: `self.cache.write(full_path, contents)`
- No bypass paths exist - I verified that no code writes directly to the filesystem for persistence (the only `open()` call is in `remote_conversation.py`, for LLM logs in a separate directory)
However, there's a theoretical risk if future code bypasses LocalFileStore.
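To make the overwrite scenario concrete, here is a minimal sketch of the property the PR must preserve; the `store_cls(root)` constructor and the `read`/`write` signatures are assumptions based on the PR's diff, not the repo's actual test API:

```python
# Minimal sketch (not the repo's test suite) of the Q1 overwrite scenario.
# Assumes the FileStore interface exposes write(path, contents) and
# read(path), as the PR's diff suggests.
import tempfile

def check_overwrite_consistency(store_cls):
    with tempfile.TemporaryDirectory() as root:
        store = store_cls(root)
        store.write("base_state.json", '{"execution_status": "running"}')
        assert store.read("base_state.json") == '{"execution_status": "running"}'
        # The overwrite below is the risky step: write-through caching must
        # refresh the cache here, or the next read returns the stale version.
        store.write("base_state.json", '{"execution_status": "finished"}')
        assert store.read("base_state.json") == '{"execution_status": "finished"}'
```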
Q2: "Is a TTL cache a good idea?"
❌ NO - A TTL cache is not appropriate for this use case because:
- Access patterns are sequential, not time-based - Events are accessed via iteration through the event log, not at random times
- No staleness issue - Files are immutable once written; they don't become "stale"
- Unpredictable eviction - TTL could evict files just before they're needed again during iteration
- Complexity without benefit - Requires tuning TTL values for different workloads
Q3: "Will the current PR cause OOM?"
- Unbounded cache growth -
InMemoryFileStoreuses a simpledict[str, str]with no eviction policy - Memory estimate for long conversations:
- 1,000 events = 1-5MB
- 10,000 events = 10-50MB
- Grows indefinitely with no upper bound
- Defeats the original design - File-backed storage was intentionally designed to prevent OOM in long-running conversations
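The memory figures above are simple arithmetic over an assumed average event size; a quick sketch of the estimate (the 2.5 KB default is an assumption, not a measurement):

```python
# Back-of-envelope estimate behind the figures above. The average
# serialized event size (avg_event_kb) is an assumed value, not measured.
def estimated_cache_mb(num_events: int, avg_event_kb: float = 2.5) -> float:
    return num_events * avg_event_kb / 1024

print(estimated_cache_mb(1_000))   # ~2.4 MB, within the 1-5 MB range
print(estimated_cache_mb(10_000))  # ~24.4 MB, within the 10-50 MB range
```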
Recommendations
🎯 Recommended: Selective Caching (Simplest & Safest)
Implementation (a sketch follows the benefits list below):
- Only cache `base_state.json` (the frequently rewritten file)
- Do NOT cache event files
- Max memory footprint: ~5 KB (one small JSON file)
Benefits:
- ✅ Prevents OOM
- ✅ Optimizes the main pain point (40+ rewrites per conversation)
- ✅ Preserves file-backed event storage design
- ✅ No external dependencies
- ✅ Minimal code change (~10 lines)
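A minimal sketch of the selective-caching idea, written as a wrapper for illustration; the `SelectivelyCachedFileStore` name, the `inner` delegate, and the `read`/`write` signatures are assumptions based on the PR's diff, not code from the repo:

```python
# Sketch of selective caching: only base_state.json is cached,
# event files always hit disk. All names here are illustrative.
class SelectivelyCachedFileStore:
    CACHED_BASENAMES = {"base_state.json"}

    def __init__(self, inner):
        self.inner = inner                    # the real LocalFileStore
        self._cache: dict[str, str] = {}      # tiny: one small JSON file

    def _cacheable(self, path: str) -> bool:
        return path.rsplit("/", 1)[-1] in self.CACHED_BASENAMES

    def read(self, path: str) -> str:
        if self._cacheable(path) and path in self._cache:
            return self._cache[path]          # hit: no disk I/O
        contents = self.inner.read(path)      # event files always hit disk
        if self._cacheable(path):
            self._cache[path] = contents
        return contents

    def write(self, path: str, contents: str) -> None:
        self.inner.write(path, contents)      # write-through to disk first
        if self._cacheable(path):
            self._cache[path] = contents      # keep cache fresh on overwrite
```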
Alternative: LRU Cache with Size Limit
If event caching is needed:
- Use `cachetools.LRUCache(maxsize=100)`
- Keeps most recently accessed N files
- Bounded memory usage (100 files = ~500KB-5MB)
- Better matches access patterns than TTL
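For comparison, a sketch of this bounded alternative; `cachetools.LRUCache` is a dict-like mapping that evicts the least recently used entry once `maxsize` is exceeded, while the wrapper class and its method signatures are illustrative assumptions:

```python
# Sketch of a bounded LRU cache layer. The wrapper names are
# illustrative; only the cachetools API is real.
from cachetools import LRUCache

class LRUCachedFileStore:
    def __init__(self, inner, maxsize: int = 100):
        self.inner = inner                # the real LocalFileStore
        self._cache = LRUCache(maxsize=maxsize)

    def read(self, path: str) -> str:
        try:
            return self._cache[path]      # hit: also refreshes LRU recency
        except KeyError:
            contents = self.inner.read(path)
            self._cache[path] = contents  # miss: insert, evicting LRU if full
            return contents

    def write(self, path: str, contents: str) -> None:
        self.inner.write(path, contents)  # write-through to disk
        self._cache[path] = contents      # overwrites stay consistent
```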
Evidence Documentation
I've created two detailed documents for your review:
- `/workspace/project/cache_analysis.md` - Comprehensive analysis with code examples and recommendations
- `/workspace/project/evidence_summary.md` - Visual evidence chain showing file overwrite behavior and memory growth scenarios
Key findings include:
- Traced the exact execution path: Field change → `__setattr__` → `_save_base_state()` → `fs.write()` → cache update
- Documented 14 `execution_status` state transitions in the codebase
- Verified no external file writes bypass the FileStore interface
- Analyzed EventLog access patterns and memory growth scenarios
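For readers who have not opened `state.py`, here is a hypothetical reconstruction of that auto-save pattern; none of the names below are taken from the actual file:

```python
# Hypothetical reconstruction of the traced path (field change ->
# __setattr__ -> _save_base_state() -> fs.write()); the real state.py
# code is not reproduced here, so every name below is illustrative.
class AutoSavedState:
    _TRACKED = {"execution_status"}

    def __init__(self, fs):
        object.__setattr__(self, "fs", fs)               # bypass the hook
        object.__setattr__(self, "execution_status", "idle")

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self._TRACKED:
            self._save_base_state()   # every tracked assignment writes the file

    def _save_base_state(self):
        contents = '{"execution_status": "%s"}' % self.execution_status
        self.fs.write("base_state.json", contents)

# Usage: each assignment rewrites base_state.json via the FileStore.
# state.execution_status = "running"
```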
Summary Table
| Approach | Memory | Performance | OOM Risk | Recommendation |
|---|---|---|---|---|
| Current PR | Unbounded | Best | HIGH | ❌ Not safe |
| No cache (main) | Minimal | Poor (I/O) | None | ✅ Safe but slow |
| Selective cache | ~5KB | Good | None | ✅✅ BEST |
| LRU cache | Bounded | Good | None | ✅ Good alternative |
| TTL cache | Unbounded* | Good | Medium | ❌ Not appropriate |
Conclusion
The current PR correctly handles `base_state.json` overwrites, but introduces an OOM risk. I recommend implementing selective caching (only `base_state.json`) as the safest and simplest solution that addresses the performance concern without introducing memory risks.
No code changes were made per your request - this was an investigation and analysis only.
☝️ oh yeah, i meant LRU cache, not TTL 🤕
> ☝️ oh yeah, i meant LRU cache, not TTL 🤕
It is indeed necessary to balance efficiency and memory usage here, so I've already made the adjustments based on cachetools.
Signed-off-by: CLFutureX <[email protected]>
…gent-sdk into feat_filestore_cache
Background:
In the current implementation of LocalFileStore, the read method fetches data from the local filesystem every time it is called, which results in significant I/O overhead when events are iterated over repeatedly.
Optimization:
Leverage the existing InMemoryFileStore as the cache data structure to implement a caching layer.
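A minimal sketch of the described design, assuming `InMemoryFileStore` stores contents in a `files` dict and `LocalFileStore` resolves paths via `get_full_path`, as the reviewed diff suggests; the method bodies are illustrative, not the repo's actual code:

```python
# Sketch of the cache layer from this PR, under the assumptions above.
import os

class InMemoryFileStore:
    """Dict-backed store, reused here as the cache data structure."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, contents: str) -> None:
        self.files[path] = contents

class LocalFileStore:
    def __init__(self, root: str) -> None:
        self.root = root
        self.cache = InMemoryFileStore()

    def get_full_path(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def read(self, path: str) -> str:
        full_path = self.get_full_path(path)
        if full_path in self.cache.files:        # cache hit: skip disk I/O
            return self.cache.read(full_path)
        with open(full_path) as f:               # cache miss: read from disk
            contents = f.read()
        self.cache.write(full_path, contents)    # populate for later iterations
        return contents

    def write(self, path: str, contents: str) -> None:
        full_path = self.get_full_path(path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as f:
            f.write(contents)
        self.cache.write(full_path, contents)    # write-through keeps reads fresh
```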