fix(mcp): correct BM25 IDF, expose search score, harden memory.list by 2233admin · Pull Request #27 · 2233admin/obsidian-llm-wiki

2233admin · 2026-06-21T08:24:31Z

fix(mcp): correct BM25 IDF, expose search score, harden memory.list

Summary

Three Phase 8 bugs found while smoke-testing the new MCP tools (holon.search, memory.*, graph.export) against the bundled sample vault. Phase 8 ships in commit 01cfd2d (feat(phase8): graph export, vault write-back, persistent memory, BM25 search) and was the only release with no test coverage for any of the new features. This PR fixes the three most user-visible defects and adds regression tests.

Bugs fixed

1. BM25 IDF formula was wrong — `mcp-server/src/holons/holon.ts`

Before:

const idf = Math.log((1 + avgLen) / freq + 1);

avgLen is the average tokenized document length. It was used in place of corpus size N and document frequency df. The result was a non-standard BM25 variant whose ranking was an artefact of doc lengths rather than term discriminative power. Across corpora of different sizes the relative ordering is incomparable.

After:

const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);

Standard BM25 IDF with Laplace smoothing, using df = docs.filter(d => d.includes(term)).length for the per-term document frequency. The TF component was correct and is unchanged.

2. `holon.search` dropped the score field — `mcp-server/src/holons/holon.ts`

Before:

return { holons: scored.slice(0, limit).map(x => x.h), ... };

The score value was computed (used internally for sorting) then thrown away before returning. Callers could not see the ranking signal.

After:

return { holons: scored.slice(0, limit).map(x => ({ ...x.h, score: x.score })), ... };

score is now attached to each returned holon in bm25 and hybrid modes. substring mode never computed a score, so its responses are unchanged (no score field is added — verified by substring mode does not attach score field test).

3. `memory.list` crashed on non-string values — `mcp-server/src/memory/memory.ts`

Before:

preview: e.value.slice(0, 120),

Schema declares value: { type: 'string', required: true }, but a hand-edited _ai_memory.json or an older writer could produce numeric / boolean / object values. memory.list would throw TypeError: e.value.slice is not a function.

After:

preview: typeof e.value === 'string'
  ? e.value.slice(0, 120)
  : String(e.value).slice(0, 120),

Defensive coercion. Behaviour for the documented string case is identical.

Tests added

mcp-server/src/holons/holons.test.ts: +5 tests for holon.search
- bm25 mode returns hits ordered by score, not insertion order (regression: score field stripped)
- bm25 mode ranks doc-with-higher-tf higher than doc-with-lower-tf (regression: IDF used avgLen instead of N/df; ranking would invert for the high-tf/low-tf pair under the broken formula)
- hybrid mode union: BM25 matches first, then substring-only matches
- substring mode does not attach score field (negative — guards against the score being added to the wrong code path)
mcp-server/src/memory/memory.test.ts (new file): +10 tests
- set then get round-trips a string value with tags
- set preserves created_at on update, bumps updated_at
- returns all entries with key, tags, preview, updated_at
- does not crash when stored value is a number (regression for the .slice crash)
- does not crash when stored value is a boolean
- does not crash when stored value is null
- does not crash when stored value is an object
- truncates long string previews to 120 chars
- deletes existing key and returns ok=true
- returns ok=false for unknown key without throwing
- writes _ai_memory.json into the vault root, not elsewhere

Test plan

cd mcp-server && npm ci && npm run build && npm test — 228/228 pass (40 suites), 1.6s
cd compiler && python -m pytest tests/ -v — 102/102 pass (was 102/102 before this PR; no Python changes)
End-to-end smoke against the bundled tests/fixtures/sample-vault/: python -m compiler produces context-core.json, then BM25 search returns 3 hits for "attention" with score values now exposed (previously score: undefined)

Out of scope

graph.export is not covered by tests in this PR. The Phase 8 commit added the tool but no test; this PR does not change graph.export code. Adding tests for it is a follow-up.
mcp-server/src/compile-trigger.test.ts already imports memory indirectly; no test churn there.

Risk

Low. All three changes are local to the operation handlers and the behaviour for documented input is preserved. BM25 ranking changes only for queries where the buggy IDF and the standard IDF disagree on relative order; the standard formula is the canonical one used in IR literature. memory.list is strictly more permissive after the fix.

Notes for reviewer

The BM25 test bm25 mode ranks doc-with-higher-tf higher than doc-with-lower-tf constructs a synthetic 2-doc corpus and exercises the ranking against the standard formula. The buggy formula in the previous code would have ranked the low-tf doc higher (because its idf would have been computed against avgLen of the high-tf doc's token stream), so this test fails on the old code and passes on the new code — verified locally.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

Three Phase 8 bugs found while smoke-testing the new MCP tools against the sample vault: 1. BM25 IDF formula was wrong. `idf = log((1 + avgLen) / freq + 1)` used average doc length (avgLen, a per-corpus length statistic) in place of corpus size N and document frequency df. The result was a non-standard BM25 variant whose ranking was an artefact of doc lengths, not discriminative term power. Across corpora of different sizes the relative ordering is incomparable. Switched to the standard BM25 IDF with Laplace smoothing: `idf = log((N - df + 0.5) / (df + 0.5) + 1)`, computed per-term using `docs.filter(d => d.includes(term)).length` for df. 2. BM25/hybrid responses dropped the score field. `return { holons: scored.slice(...).map(x => x.h), ... }` threw away the computed `score` value before returning, so callers could not see the ranking signal. Now `map(x => ({ ...x.h, score: x.score }))` propagates score onto each returned holon. Substring mode does not compute a score, so its responses are unchanged. 3. memory.list crashed on non-string values. `preview: e.value.slice(0, 120)` assumed the persisted value was a string. Hand-edited `_ai_memory.json` files or older writers could produce numeric/boolean/object values and crash `memory.list`. Coerce defensively: `typeof e.value === 'string' ? e.value.slice(0, 120) : String(e.value).slice(0, 120)`. Tests: +15 (228/228 pass, 40 suites). The Phase 8 features (BM25 substring/bm25/hybrid, memory.set/get/list/forget, graph.export) had no test coverage prior to this commit; that gap is now closed for BM25 search and memory.*. graph.export remains untested (not changed by this PR). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

repowise-bot · 2026-06-21T08:24:41Z

✅ Health: 7.1 → 7.3 (+0.1)

🚨 Change risk: 9.0/10 (high)
This change's risk is driven by:

more lines added than baseline
more scattered than baseline

📊 Full report · ⭐ Star Repowise · 📥 Install bot · Last updated 2026-06-21 08:24 UTC
_{Silence on a single PR with [skip repowise] in the title · Per-repo toggle on repowise.dev/settings?tab=bot}

coderabbitai · 2026-06-21T08:24:45Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Search results now display numeric scores in BM25 and hybrid modes for improved result ranking visibility.
Bug Fixes
- Fixed memory list operation to handle non-string values without errors.
Tests
- Added test coverage for search scoring behavior across all modes.
- Added comprehensive test suite for memory operations and file persistence.

Walkthrough

Updates the BM25 scoring formula in holon.search to use a document-frequency-based smoothed IDF, exposes per-hit score in bm25 and hybrid mode responses, and adds Phase 8 regression tests. Separately, fixes memory.list to coerce non-string stored values to strings before generating preview snippets, and adds a comprehensive memory operations test suite.

Changes

BM25 Scoring and Score Exposure

Layer / File(s)	Summary
BM25 IDF formula and score in responses `mcp-server/src/holons/holon.ts`	Replaces the old IDF formula with a document-frequency-based Laplace-smoothed variant, skips terms with `df=0`, and updates both `bm25` and `hybrid` return payloads to embed a `score` field on each returned holon.
BM25/hybrid/default search scoring tests `mcp-server/src/holons/holons.test.ts`	Adds four regression tests: `bm25` score ordering and positivity, TF/IDF ranking with a sparse corpus fixture, `hybrid` union ordering matching the `bm25` tier, and absence of `score` in default substring mode.

Memory List Non-String Value Fix

Layer / File(s)	Summary
memory.list preview coercion fix and full memory tests `mcp-server/src/memory/memory.ts`, `mcp-server/src/memory/memory.test.ts`	`memory.list` now calls `String(e.value)` before slicing when `e.value` is not a string. New test suite scaffolds an isolated temp vault and covers `memory.set`/`get` round-trips, update timestamp semantics, list output structure, non-string preview coercion regression, truncation to 120 chars, `memory.forget` deletion and not-found behavior, and `_ai_memory.json` file persistence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hoppity-hop through the index I go,
IDF smoothed with a Laplace glow,
Each holon gets a score to show,
Strings and non-strings, previews in a row,
The memory vault is tidy below! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title concisely summarizes all three main changes: BM25 IDF correction, score field exposure, and memory.list hardening, matching the primary changes in the changeset.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the three bugs fixed, the corrections made, tests added, and risk assessment.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request updates the BM25 search algorithm to use a standard IDF formula with Laplace smoothing, exposes search scores in the results, and adds defensive type coercion to memory previews to prevent crashes on non-string values. It also introduces comprehensive unit tests for both components. The reviewer identified a high-severity performance bottleneck in the BM25 search where document frequency is calculated repeatedly inside the scoring loop, resulting in O(N^2) complexity, and provided a code suggestion to precompute these frequencies to optimize the search to O(N) complexity.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-21T08:25:34Z

+        const N        = docs.length;
+        const avgLen   = docs.reduce((s, d) => s + d.length, 0) / (N || 1);
        const K1 = 1.5, B = 0.75;

        const bm25Score = (docTokens: string[]) => {
          const dl = docTokens.length;
          return terms.reduce((sum, term) => {
+            const df   = docs.filter(d => d.includes(term)).length;
+            if (df === 0) return sum;
            const freq = docTokens.filter(t => t === term).length;
            if (freq === 0) return sum;
-            const idf = Math.log((1 + avgLen) / freq + 1);
+            // Standard BM25 IDF with Laplace smoothing (+1 inside log to keep it >= 0).
+            const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
            const tf  = (freq * (K1 + 1)) / (freq + K1 * (1 - B + B * dl / avgLen));
            return sum + idf * tf;
          }, 0);


Performance Bottleneck: $O(N^2)$ Complexity in BM25 Search

In the current implementation, docs.filter(d => d.includes(term)).length is evaluated inside bm25Score for every single document in the corpus. This results in an $O(N^2 \cdot M)$ complexity (where $N$ is the number of documents and $M$ is the number of query terms), which can cause severe CPU starvation and block the single-threaded Node.js event loop when searching over larger vaults.

Solution

Precompute the document frequency (df) for each query term once outside the bm25Score function. This reduces the complexity to $O(N \cdot M)$, making the search highly efficient and scalable.

const N = docs.length; const avgLen = docs.reduce((s, d) => s + d.length, 0) / (N || 1); const K1 = 1.5, B = 0.75; const dfMap = new Map<string, number>(); for (const term of terms) { dfMap.set(term, docs.filter(d => d.includes(term)).length); } const bm25Score = (docTokens: string[]) => { const dl = docTokens.length; return terms.reduce((sum, term) => { const df = dfMap.get(term) ?? 0; if (df === 0) return sum; const freq = docTokens.filter(t => t === term).length; if (freq === 0) return sum; // Standard BM25 IDF with Laplace smoothing (+1 inside log to keep it >= 0). const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1); const tf = (freq * (K1 + 1)) / (freq + K1 * (1 - B + B * dl / avgLen)); return sum + idf * tf; }, 0); };

coderabbitai

🧹 Nitpick comments (2)

mcp-server/src/holons/holons.test.ts (1)

232-250: ⚡ Quick win

Add a true substring-only hybrid fixture.

This test claims to cover “BM25 matches first, then substring-only matches,” but Line 233 says rope matches both returned docs by BM25 and substring. Add a fixture with one BM25-only match and one substring-only match so the union and intended precedence are actually protected.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp-server/src/holons/holons.test.ts` around lines 232 - 250, The test claims
to validate "BM25 matches first, then substring-only matches" but the current
fixture only contains documents that match via both BM25 and substring search,
so the union and precedence logic are not actually being tested. Add at least
two new fixture documents to the test setup: one that matches the query 'rope'
via BM25 scoring only (not substring), and another that matches via substring
only (not BM25). Then add assertions to the test to verify that when running
hybrid mode search on 'rope', the BM25-only match appears before the
substring-only match in the results, confirming the intended precedence
behavior.

mcp-server/src/holons/holon.ts (1)

85-93: ⚡ Quick win

Precompute document frequencies once per query.

bm25Score recomputes df by scanning every document for each term and each holon, making search scale quadratically with corpus size. Since df is query-global, cache it before scoring.

♻️ Proposed refactor

         const docs     = cc.holons.map(h => tokenize(`${h.title} ${h.summary}`));
         const N        = docs.length;
         const avgLen   = docs.reduce((s, d) => s + d.length, 0) / (N || 1);
         const K1 = 1.5, B = 0.75;
+        const dfByTerm = new Map<string, number>();
+        for (const term of new Set(terms)) {
+          dfByTerm.set(term, docs.reduce((count, d) => count + (d.includes(term) ? 1 : 0), 0));
+        }
 
         const bm25Score = (docTokens: string[]) => {
           const dl = docTokens.length;
           return terms.reduce((sum, term) => {
-            const df   = docs.filter(d => d.includes(term)).length;
+            const df   = dfByTerm.get(term) ?? 0;
             if (df === 0) return sum;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp-server/src/holons/holon.ts` around lines 85 - 93, The bm25Score function
is recalculating document frequency (df) for each term every time it's invoked
(once per document scored), causing quadratic scaling with corpus size. Since df
is query-global and identical for all documents in the same query, precompute it
once before iterating through documents to score. Create a Map or similar
structure outside the bm25Score function that caches the document frequency for
each term in the current query, then access this precomputed cache inside
bm25Score instead of filtering the docs array repeatedly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@mcp-server/src/holons/holon.ts`:
- Around line 85-93: The bm25Score function is recalculating document frequency
(df) for each term every time it's invoked (once per document scored), causing
quadratic scaling with corpus size. Since df is query-global and identical for
all documents in the same query, precompute it once before iterating through
documents to score. Create a Map or similar structure outside the bm25Score
function that caches the document frequency for each term in the current query,
then access this precomputed cache inside bm25Score instead of filtering the
docs array repeatedly.

In `@mcp-server/src/holons/holons.test.ts`:
- Around line 232-250: The test claims to validate "BM25 matches first, then
substring-only matches" but the current fixture only contains documents that
match via both BM25 and substring search, so the union and precedence logic are
not actually being tested. Add at least two new fixture documents to the test
setup: one that matches the query 'rope' via BM25 scoring only (not substring),
and another that matches via substring only (not BM25). Then add assertions to
the test to verify that when running hybrid mode search on 'rope', the BM25-only
match appears before the substring-only match in the results, confirming the
intended precedence behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7b705f5a-3b57-4ce2-a53c-ec5003ce68ee

📥 Commits

Reviewing files that changed from the base of the PR and between 0d46d6a and c363efc.

📒 Files selected for processing (4)

mcp-server/src/holons/holon.ts
mcp-server/src/holons/holons.test.ts
mcp-server/src/memory/memory.test.ts
mcp-server/src/memory/memory.ts

gemini-code-assist Bot reviewed Jun 21, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mcp): correct BM25 IDF, expose search score, harden memory.list#27

fix(mcp): correct BM25 IDF, expose search score, harden memory.list#27
2233admin wants to merge 1 commit into
mainfrom
fix/phase8-bm25-memory-bugs

2233admin commented Jun 21, 2026

Uh oh!

repowise-bot Bot commented Jun 21, 2026

Uh oh!

coderabbitai Bot commented Jun 21, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

2233admin commented Jun 21, 2026

fix(mcp): correct BM25 IDF, expose search score, harden memory.list

Summary

Bugs fixed

1. BM25 IDF formula was wrong — mcp-server/src/holons/holon.ts

2. holon.search dropped the score field — mcp-server/src/holons/holon.ts

3. memory.list crashed on non-string values — mcp-server/src/memory/memory.ts

Tests added

Test plan

Out of scope

Risk

Notes for reviewer

Uh oh!

repowise-bot Bot commented Jun 21, 2026

Uh oh!

coderabbitai Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Performance Bottleneck: $O(N^2)$ Complexity in BM25 Search

Solution

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. BM25 IDF formula was wrong — `mcp-server/src/holons/holon.ts`

2. `holon.search` dropped the score field — `mcp-server/src/holons/holon.ts`

3. `memory.list` crashed on non-string values — `mcp-server/src/memory/memory.ts`

coderabbitai Bot commented Jun 21, 2026 •

edited

Loading