Skip to content

fix: batch ChromaDB reads to prevent silent truncation on large palaces#132

Closed
SoundMindsAI wants to merge 1 commit intoMemPalace:mainfrom
SoundMindsAI:fix/batch-chromadb-reads
Closed

fix: batch ChromaDB reads to prevent silent truncation on large palaces#132
SoundMindsAI wants to merge 1 commit intoMemPalace:mainfrom
SoundMindsAI:fix/batch-chromadb-reads

Conversation

@SoundMindsAI
Copy link
Copy Markdown

@SoundMindsAI SoundMindsAI commented Apr 7, 2026

Summary

Closes #40.

ChromaDB's col.get() without an explicit limit applies a small internal default (currently 10). On palaces with more drawers than that default, every read-only tool — mempalace status, Layer1 wake-up, and all MCP read tools (status, list_wings, list_rooms, get_taxonomy, diary_read) — silently returns incomplete results. Users see missing wings, undercounted rooms, and truncated Layer1 context with no error or warning. The miner.py status function had a limit=10000 that would similarly break once a palace exceeded 10k drawers.

PR #120 attempted to address these issues alongside several other fixes but was closed. PR #114 and #119 fixed related problems (shell injection, ChromaDB pin) but did not address the unbounded read truncation.

  • Add chromadb_utils.get_all() helper that paginates in safe batches, replace all unbounded col.get() call sites
  • Fix tilde expansion in config.py palace_path resolution (paths like ~/.mempalace/palace from config file or env var were passed unexpanded to ChromaDB, causing "palace not found" errors)

Changes

New file: mempalace/chromadb_utils.py — batched get_all() utility

Fixed call sites:

  • cli.py — status command
  • layers.py — Layer1 generation
  • mcp_server.pytool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy, tool_diary_read
  • miner.pystatus() (previously hardcoded limit=10000)

Config fix: config.pyos.path.expanduser() on palace_path from file config and env var

Documentation: Added chromadb_utils.py to file reference tables in README.md and mempalace/README.md

Test plan

  • test_chromadb_utils.py — 7 tests: all records, docs+meta, where filter, empty collection, batching, no duplicates, filtered pagination
  • test_layers.py — 2 tests: Layer1 pulls from all rooms, wing filter excludes other wings
  • test_mcp_server_reads.py — 9 tests: status, list_wings, list_rooms, taxonomy, diary read (all entries + empty)
  • test_miner_status.py — 2 tests: accurate drawer count, multiple wings reported
  • test_config.py — 2 new tests: tilde expansion from file config and env var
  • Full suite: 42 passed, ruff clean

col.get() without an explicit limit applies ChromaDB's small internal
default, silently dropping drawers from status, Layer1, MCP read tools,
miner status, and diary reads.  Replace all unbounded col.get() calls
with a new get_all() helper that paginates in safe batches.

Also fix tilde expansion in config.py palace_path resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 7, 2026

Duplicate of #66 which covers batched ChromaDB reads. Closing.

@bensig bensig closed this Apr 7, 2026
@SoundMindsAI
Copy link
Copy Markdown
Author

@bensig Thanks for the quick review. I took a closer look at #66 and I believe this PR covers significantly more ground — would you mind reconsidering?

What #66 fixes: Inline batching in cli.py and layers.py (2 call sites). No automated tests.

What this PR fixes in addition:

  • mcp_server.py — 5 unbounded col.get() calls across tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy, and tool_diary_read
  • miner.pystatus() had a hardcoded limit=10000 that would miss wings/rooms beyond 10k drawers
  • config.pypalace_path with ~ was passed unexpanded to ChromaDB, causing "palace not found" errors
  • Centralized chromadb_utils.get_all() utility so future call sites don't reimplement batching inline
  • 22 automated tests across 5 test files (all pass, no network/API required)
  • Closes status command shows max 10,000 drawers (hardcoded limit) + palace_path not expanded #40

Happy to adjust anything if needed — just wanted to make sure the additional coverage wasn't lost.

@SoundMindsAI
Copy link
Copy Markdown
Author

Rebased on top of #66 and opened #146 with the non-overlapping changes only. The new PR does not touch cli.py or layers.py — it covers the remaining call sites (#66 did not fix), the limit=10000 cap in miner.py, the tilde expansion bug in config.py, and adds 22 automated tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

status command shows max 10,000 drawers (hardcoded limit) + palace_path not expanded

2 participants