fix: batch remaining unbounded ChromaDB reads missed by #66 (#146)
SoundMindsAI wants to merge 2 commits into MemPalace:develop
Field report: 109,574 drawers on macOS ARM64

We're running mempalace v3.0.0 as the cross-session memory layer for a large technical project (86K drawers in one wing alone, 109K total). Here's what we hit and how we patched it, hopefully useful context for this PR.

The bug

PR #137's

Our workaround

We patched

import sqlite3

def _query_wing_room_counts():
    """Direct SQL against ChromaDB's SQLite: no variable limit."""
    db = _sqlite_path()
    conn = sqlite3.connect(db)
    rows = conn.execute("""
        SELECT json_extract(string_value, '$.wing') as wing,
               json_extract(string_value, '$.room') as room,
               COUNT(*) as cnt
        FROM embedding_metadata
        WHERE key = 'chroma:document'  -- or appropriate metadata key
        GROUP BY wing, room
    """).fetchall()
    conn.close()
    return rows

This is O(1) in memory regardless of palace size, returns in ~200ms for 109K drawers, and never hits the SQLite variable limit.

Why batched
fea9789 to 0f39f7d
@RobertoGEMartin Thanks for the detailed field report. This is exactly the kind of real-world validation we needed. 109K drawers across 8 wings is a serious stress test.
Good to hear that
Your point about
Would you be willing to test this branch against your 109K-drawer palace? That would give us confidence before merging. You can install directly from the branch:
Specifically interested in:
Appreciate the offer to help validate.
0f39f7d to 0aed178
Rebased onto latest main: merge conflicts resolved, all 546 tests passing, linting clean. Ready for review whenever you are.
b5c7f6d to 5d6b706
web3guru888 left a comment
🔧 Review of #146 — fix: batch remaining unbounded ChromaDB reads missed by #66
Scope: +636/−21 · 12 file(s) · touches core
- README.md (modified: +1/−0)
- mempalace/README.md (modified: +1/−0)
- mempalace/chromadb_utils.py (added: +55/−0)
- mempalace/config.py (modified: +2/−2)
- ⚠️ mempalace/mcp_server.py (modified: +13/−14)
- mempalace/miner.py (modified: +2/−1)
- tests/benchmarks/test_layers_bench.py (modified: +3/−3)
- tests/test_chromadb_utils.py (added: +132/−0)
- tests/test_config.py (modified: +19/−0)
- tests/test_layers.py (modified: +93/−1)
- tests/test_mcp_server_reads.py (added: +222/−0)
- tests/test_miner_status.py (added: +93/−0)
Technical Analysis
- 🔌 MCP server dispatch changes — verify JSON-RPC compliance and backward compatibility
- 🪟 Windows compatibility — verify path handling works cross-platform
- 🧠 Retrieval scoring changes — verify backward compatibility with existing Synapse configs
Issues
⚠️ Touches mempalace/mcp_server.py: core MCP server, maintainer guards this closely
Suggestions
- Magic number(s) 1200, 2026 — consider extracting to named constant(s)
Strengths
- ✅ Includes test coverage
🟡 Needs attention — touches guarded files and has items to address.
🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis
5d6b706 to ecd63f7
Hey! 👋 Just rebased this branch onto the latest
- mcp_server.py:
- miner.py: Kept develop's full
- test_mcp_server_reads.py: Fixed a small test isolation issue that surfaced after the rebase:
Everything is green: ruff clean and all 908 tests passing. Would appreciate a review when you get a chance!
ecd63f7 to f74eb90
col.get() without an explicit limit applies ChromaDB's small internal default, silently dropping drawers from status, Layer1, MCP read tools, miner status, and diary reads. Replace all unbounded col.get() calls with a new get_all() helper that paginates in safe batches. Also fix tilde expansion in config.py palace_path resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f74eb90 to e9508a0
Summary
Closes #40.
#66 fixed unbounded col.get() calls in cli.py and layers.py. This PR fixes the same class of bug in the remaining call sites that #66 did not cover.
There is zero overlap: this PR does not touch cli.py or layers.py. See #132 for prior discussion.
What this fixes
- mcp_server.py: 5 unbounded col.get() calls in tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy, tool_diary_read
- miner.py: status() had a hardcoded limit=10000 that silently dropped wings and rooms filed after the first 10k drawers
- config.py: palace_path containing ~ was passed unexpanded to ChromaDB, causing "palace not found" errors
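The tilde problem in the last item can be illustrated with a minimal sketch. The helper name `resolve_palace_path` is hypothetical and not taken from this PR; the only call it relies on is the standard library's `os.path.expanduser()`, which the change list names for config.py.

```python
import os


def resolve_palace_path(raw_path: str) -> str:
    """Expand a leading ~ so downstream code receives a real directory path.

    Hypothetical helper for illustration; the PR's actual config.py code may differ.
    """
    return os.path.expanduser(raw_path)


# A configured value such as "~/.mempalace/palace" no longer starts with "~"
# after expansion; paths without a tilde pass through unchanged.
expanded = resolve_palace_path("~/.mempalace/palace")
unchanged = resolve_palace_path("/tmp/palace")
```

Expanding once at configuration time means every later consumer sees the same resolved path, whether the value came from a config file or an environment variable.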
Also adds chromadb_utils.get_all(), a centralized batched-read utility so future call sites don't need to reimplement pagination inline.
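A batched-read helper of this kind might look roughly like the sketch below. It is not the code from this PR: the signature, the default `batch_size`, and the `FakeCollection` stand-in are all assumptions made for illustration. The only library behaviour relied on is that a ChromaDB collection's `get()` accepts `limit`, `offset`, `where`, and `include`, and the sketch further assumes that offset paging returns records in a stable order.

```python
from typing import Any, Optional


def get_all(col: Any, where: Optional[dict] = None,
            include: Optional[list] = None, batch_size: int = 1000) -> dict:
    """Read every matching record by paging with limit/offset (illustrative sketch)."""
    include = include or ["documents", "metadatas"]
    merged: dict = {"ids": []}
    for key in include:
        merged[key] = []
    offset = 0
    while True:
        kwargs = {"limit": batch_size, "offset": offset, "include": include}
        if where:
            kwargs["where"] = where
        page = col.get(**kwargs)
        ids = page.get("ids") or []
        if not ids:
            break  # nothing left to read
        merged["ids"].extend(ids)
        for key in include:
            merged[key].extend(page.get(key) or [])
        if len(ids) < batch_size:
            break  # short page means this was the last one
        offset += len(ids)
    return merged


class FakeCollection:
    """In-memory stand-in for a ChromaDB collection (illustration only)."""

    def __init__(self, n: int):
        self._ids = [f"d{i}" for i in range(n)]
        self._metas = [{"wing": f"wing{i % 3}"} for i in range(n)]
        self.calls = 0

    def get(self, limit=None, offset=0, where=None, include=None):
        self.calls += 1
        rows = [(i, m) for i, m in zip(self._ids, self._metas)
                if not where or all(m.get(k) == v for k, v in where.items())]
        end = None if limit is None else offset + limit
        rows = rows[offset:end]
        return {"ids": [r[0] for r in rows], "metadatas": [r[1] for r in rows]}


col = FakeCollection(2500)
everything = get_all(col, include=["metadatas"], batch_size=1000)
```

With 2,500 records and a batch size of 1,000, the helper issues three reads and stops as soon as a page comes back short, so callers never hold more than one page of raw results at a time beyond the merged output.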
Changes
- mempalace/chromadb_utils.py: get_all() utility
- mempalace/mcp_server.py: col.get() → get_all()
- mempalace/miner.py: limit=10000 → get_all()
- mempalace/config.py: os.path.expanduser() on palace_path
- README.md: chromadb_utils.py added to file reference table
- mempalace/README.md: chromadb_utils.py added to module table
Test plan
Tests for this PR's changes:
- test_chromadb_utils.py (7 tests): all records, docs+meta, where filter, empty collection, batching, no duplicates, filtered pagination
- test_mcp_server_reads.py (9 tests): status, list_wings, list_rooms, taxonomy, diary read (all entries + empty)
- test_miner_status.py (2 tests): accurate drawer count, multiple wings reported
- test_config.py (2 new tests): tilde expansion from file config and env var
Regression tests for #66's batched reads (no existing coverage):
- test_layers.py (2 tests): Layer1 pulls from all rooms, wing filter excludes other wings
Full suite: 49 passed, ruff clean
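As a rough illustration of what a "multiple wings reported" regression test guards against, the sketch below contrasts a capped read with a fully paged one. Everything here is hypothetical: `StubCollection`, `count_wings`, and the tiny cap of 10 stand in for a real collection, the miner's status logic, and the hardcoded limit=10000 respectively, and none of it is taken from the PR's test files.

```python
from collections import Counter


class StubCollection:
    """Minimal stand-in exposing get(limit, offset, include) (illustration only)."""

    def __init__(self, metadatas):
        self._metas = list(metadatas)

    def get(self, limit=None, offset=0, include=None):
        end = None if limit is None else offset + limit
        metas = self._metas[offset:end]
        ids = [f"drawer-{offset + i}" for i in range(len(metas))]
        return {"ids": ids, "metadatas": metas}


def count_wings(col, cap=None, batch_size=5):
    """Count drawers per wing; cap mimics a hardcoded limit, None pages through everything."""
    counts = Counter()
    offset = 0
    while cap is None or offset < cap:
        size = batch_size if cap is None else min(batch_size, cap - offset)
        page = col.get(limit=size, offset=offset, include=["metadatas"])
        if not page["ids"]:
            break
        counts.update(m["wing"] for m in page["metadatas"])
        offset += len(page["ids"])
    return counts


def test_wings_past_cap_are_reported():
    # 10 drawers in "alpha" come first, then 4 in "beta" beyond the cap.
    metas = [{"wing": "alpha"}] * 10 + [{"wing": "beta"}] * 4
    col = StubCollection(metas)
    capped = count_wings(col, cap=10)
    full = count_wings(col)
    assert set(capped) == {"alpha"}  # the capped read never sees "beta"
    assert dict(full) == {"alpha": 10, "beta": 4}
```

The capped read reports a plausible but incomplete result without raising any error, which is why this class of bug tends to need an explicit test rather than relying on failures in production.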