Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When cwd is a large monorepo root and path is a small subdirectory, the previous behaviour would create a new index rooted at cwd — scanning and embedding the entire ancestor tree (e.g. 4503 files, 8+ minutes) just to search a subdirectory. With this fix, getOrCreate only uses preferredRoot (cwd) if a DB already exists at that path. Otherwise it falls through to findEffectiveRoot(path), which scopes the index to the actual path being searched. Once a cwd-level index has been built (e.g. via lumen index), subsequent searches will reuse it and benefit from the shared project-wide index.
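The selection rule described above can be sketched as a small pure function. This is only an illustration: `chooseRoot`, `hasDB`, and `fallback` are hypothetical names standing in for the logic inside getOrCreate, the real DB lookup, and findEffectiveRoot, whose actual signatures are not shown in this PR text.

```go
package main

import "fmt"

// chooseRoot sketches the rule from the commit message: the preferred root
// (cwd) is used only when an index already exists there; otherwise the
// fallback decides based on the path being searched.
// hasDB and fallback are hypothetical stand-ins, not the project's API.
func chooseRoot(preferredRoot, path string, hasDB func(string) bool, fallback func(string) string) string {
	if preferredRoot != "" && hasDB(preferredRoot) {
		return preferredRoot
	}
	return fallback(path)
}

func main() {
	identity := func(p string) string { return p } // simplistic fallback for the demo
	noIndex := func(string) bool { return false }
	cwdIndexed := func(dir string) bool { return dir == "/mono" }

	// No index at cwd yet: scope to the searched path.
	fmt.Println(chooseRoot("/mono", "/mono/services/api", noIndex, identity))
	// Index already built at cwd: reuse the shared project-wide index.
	fmt.Println(chooseRoot("/mono", "/mono/services/api", cwdIndexed, identity))
}
```

Passing the existence check and the fallback as functions keeps the rule testable without touching the filesystem.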
Within a Claude session the MCP server receives many consecutive search calls. Previously every call re-walked the entire project tree to check if the index was stale — 1-3s of pure filesystem I/O even when nothing had changed (2s for a 4500-file monorepo). Add a lastCheckedAt timestamp to each cacheEntry. ensureIndexed skips EnsureFresh entirely if the index was confirmed fresh within the last 30s. ForceReindex bypasses the TTL. touchChecked updates both the projectDir entry and its effectiveRoot alias so the TTL is consistent regardless of which key the caller uses.
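A minimal sketch of the freshness TTL follows, assuming a map-backed cache keyed by directory. Only `cacheEntry`, `lastCheckedAt`, `touchChecked`, and the 30s window come from the commit message; the `indexCache` type, the `shouldCheck` helper, and the field layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// freshnessTTL matches the 30s window named in the commit message.
const freshnessTTL = 30 * time.Second

// cacheEntry is a simplified, assumed shape: the real struct holds more state.
type cacheEntry struct {
	effectiveRoot string
	lastCheckedAt time.Time
}

// indexCache is a hypothetical container for the entries.
type indexCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
}

// shouldCheck reports whether the freshness walk must run for key.
// A forced reindex always bypasses the TTL.
func (c *indexCache) shouldCheck(key string, now time.Time, forceReindex bool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if forceReindex {
		return true
	}
	e, ok := c.entries[key]
	if !ok || e.lastCheckedAt.IsZero() {
		return true
	}
	return now.Sub(e.lastCheckedAt) >= freshnessTTL
}

// touchChecked stamps both the projectDir entry and its effectiveRoot alias,
// so the TTL is consistent whichever key a later caller uses.
func (c *indexCache) touchChecked(projectDir string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[projectDir]
	if !ok {
		return
	}
	e.lastCheckedAt = now
	if alias, ok := c.entries[e.effectiveRoot]; ok {
		alias.lastCheckedAt = now
	}
}

// ttlDemo stamps via the subdirectory key and queries via the root alias.
func ttlDemo() (withinTTL, afterTTL, forced bool) {
	c := &indexCache{entries: map[string]*cacheEntry{
		"/repo/api": {effectiveRoot: "/repo"},
		"/repo":     {effectiveRoot: "/repo"},
	}}
	t0 := time.Date(2026, 3, 19, 12, 0, 0, 0, time.UTC)
	c.touchChecked("/repo/api", t0)
	withinTTL = c.shouldCheck("/repo", t0.Add(10*time.Second), false)
	afterTTL = c.shouldCheck("/repo", t0.Add(31*time.Second), false)
	forced = c.shouldCheck("/repo", t0.Add(10*time.Second), true)
	return withinTTL, afterTTL, forced
}

func main() {
	fmt.Println(ttlDemo()) // false true true
}
```

Taking `now` as a parameter instead of calling time.Now() inside the cache lets the TTL be exercised without sleeping.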
Previously findEffectiveRoot returned path itself when no existing index was found in the ancestry, causing each subdirectory search to create its own isolated index. This meant sibling directories got separate DBs, files were embedded multiple times, and cross-directory searches missed results. Now when no existing DB is found within the git repo boundary, the git root is returned as the effective root. All first searches in a repo share one index at the repo root. The git boundary cap is preserved so ancestor indexes above the repo root are still never adopted.
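The fallback can be sketched as an upward walk capped at the git root. `effectiveRoot` below is a simplified stand-in for findEffectiveRoot, not its real signature: the git root and the DB lookup are passed in as assumed parameters, and later refinements in this PR (such as the SkipDir handling) are left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideRoot reports whether path is root itself or lies beneath it.
func insideRoot(root, path string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// effectiveRoot walks from path up to gitRoot and returns the first directory
// that already has an index. If none is found, it returns gitRoot so that all
// first searches in a repo share one index. Directories above gitRoot are
// never inspected. hasDB is a hypothetical lookup.
func effectiveRoot(path, gitRoot string, hasDB func(string) bool) string {
	path = filepath.Clean(path)
	if gitRoot == "" {
		return path // assumed behaviour outside a git repo: index the path itself
	}
	gitRoot = filepath.Clean(gitRoot)
	if !insideRoot(gitRoot, path) {
		return path
	}
	for dir := path; ; dir = filepath.Dir(dir) {
		if hasDB(dir) {
			return dir
		}
		if dir == gitRoot {
			return gitRoot
		}
	}
}

func main() {
	none := func(string) bool { return false }
	apiIndexed := func(d string) bool { return d == "/repo/api" }

	fmt.Println(effectiveRoot("/repo/api/v1", "/repo", none))       // /repo
	fmt.Println(effectiveRoot("/repo/api/v1", "/repo", apiIndexed)) // /repo/api
}
```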
On macOS, t.TempDir() returns symlink paths (/var/folders/...) while git.RepoRoot() resolves them via EvalSymlinks (/private/var/folders/...). This mismatch caused filepath.Rel(effectiveRoot, input.Path) to produce paths with ".." components, which matched nothing in the database and returned empty search results from subdirectories. Fix by applying filepath.EvalSymlinks to both Path and Cwd in validateSearchInput so all path comparisons operate on resolved paths. Also fix TestE2E_GitRootFallbackSharedIndex: the third assertion searched from apiDir expecting pkg/ results, but pathPrefix filtering correctly restricts results to the searched subdirectory. Changed to search from the repo root to verify the shared index is complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
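The mismatch is easy to reproduce with any symlinked directory. The sketch below is not the project's validateSearchInput; `resolve`, `resolvedRel`, and `symlinkDemo` are hypothetical helpers that show why filepath.Rel needs both sides resolved with filepath.EvalSymlinks first.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolve makes p absolute and resolves every symlink in it.
func resolve(p string) (string, error) {
	abs, err := filepath.Abs(p)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

// resolvedRel computes path relative to root after resolving both.
func resolvedRel(root, path string) (string, error) {
	r, err := resolve(root)
	if err != nil {
		return "", err
	}
	p, err := resolve(path)
	if err != nil {
		return "", err
	}
	return filepath.Rel(r, p)
}

// symlinkDemo builds tmp/real/sub plus a symlink tmp/link -> tmp/real, then
// compares the purely lexical Rel with the resolved one.
func symlinkDemo() (raw, resolved string, err error) {
	tmp, mkErr := os.MkdirTemp("", "symlink-demo")
	if mkErr != nil {
		return "", "", mkErr
	}
	defer os.RemoveAll(tmp)

	realRoot := filepath.Join(tmp, "real")
	if err := os.MkdirAll(filepath.Join(realRoot, "sub"), 0o755); err != nil {
		return "", "", err
	}
	link := filepath.Join(tmp, "link")
	if err := os.Symlink(realRoot, link); err != nil {
		return "", "", err
	}

	viaLink := filepath.Join(link, "sub")
	raw, err = filepath.Rel(realRoot, viaLink) // lexical only: contains ".."
	if err != nil {
		return "", "", err
	}
	resolved, err = resolvedRel(realRoot, viaLink)
	return raw, resolved, err
}

func main() {
	raw, resolved, err := symlinkDemo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("unresolved:", raw)
	fmt.Println("resolved:  ", resolved)
}
```

Because filepath.Rel is purely lexical, the unresolved result climbs out of the root with a ".." component, while the resolved result is the plain subdirectory name that a path-prefix filter can match.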
Adds a table-driven e2e test that systematically exercises all path topology combinations: plain dir, git root, git subdirectory, sibling subdirectory sharing, cwd fallback, external worktree, internal worktree subdir, and symlink variants.

Key assertions:
- wantNoSymbols: verifies pathPrefix scoping actively excludes out-of-scope results (previously untested)
- wantMinFiles>=2: distinguishes correct full-root indexing from narrow subdir-only indexing
- second call in git-subdir-sibling: verifies sibling dirs share git-root index without reindexing

Closes the two structural gaps identified in PR #61: symlink path normalization regression and git-repo subdirectory pathPrefix filtering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
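To make the assertion names concrete, here is a rough sketch of how such a table row might be checked. The field names wantNoSymbols and wantMinFiles come from the commit message; the `topologyCase` and `hit` types, the `wantSymbols` field, the `check` helper, and the sample symbol names are all hypothetical, and the real test drives the actual search tool rather than hand-built results.

```go
package main

import "fmt"

// hit is an assumed, minimal search result.
type hit struct{ Symbol, File string }

// topologyCase is a hypothetical table row.
type topologyCase struct {
	name          string
	wantSymbols   []string // must appear in the results
	wantNoSymbols []string // must be excluded by pathPrefix scoping
	wantMinFiles  int      // lower bound on files in the index
}

// check returns one message per violated expectation.
func check(tc topologyCase, indexedFiles int, hits []hit) []string {
	seen := map[string]bool{}
	for _, h := range hits {
		seen[h.Symbol] = true
	}
	var failures []string
	for _, s := range tc.wantSymbols {
		if !seen[s] {
			failures = append(failures, fmt.Sprintf("%s: missing symbol %s", tc.name, s))
		}
	}
	for _, s := range tc.wantNoSymbols {
		if seen[s] {
			failures = append(failures, fmt.Sprintf("%s: out-of-scope symbol %s returned", tc.name, s))
		}
	}
	if indexedFiles < tc.wantMinFiles {
		failures = append(failures, fmt.Sprintf("%s: indexed %d files, want at least %d", tc.name, indexedFiles, tc.wantMinFiles))
	}
	return failures
}

func main() {
	tc := topologyCase{
		name:          "git-subdir",
		wantSymbols:   []string{"HandleAPI"},
		wantNoSymbols: []string{"PkgHelper"},
		wantMinFiles:  2,
	}
	// A narrow subdir-only index that also leaks an out-of-scope symbol.
	bad := []hit{{"HandleAPI", "api/handler.go"}, {"PkgHelper", "pkg/helper.go"}}
	for _, f := range check(tc, 1, bad) {
		fmt.Println(f)
	}
}
```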
Two tests (TestGenerateSessionContext_NoIndex, TestHookOutputJSON) called the real generateSessionContext with a non-existent path, which triggered spawnBackgroundIndexer. In a test binary os.Executable() returns the test binary itself, so the spawned process ran all tests with no -test.run filter — including those two — causing exponential process proliferation (fork bomb). Fixed by switching both tests to generateSessionContextInternal with a no-op bgIndexer mock, matching the pattern used by all other hook tests. Also added t.Cleanup(idx.Close) to four getOrCreate tests that created *index.Indexer values (holding open SQLite WAL file handles) without ever closing them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On SessionStart, spawn a detached 'lumen index <cwd>' process so the first search in a new session doesn't pay the full embed cost. Uses dependency injection (findDonor/bgIndexer callbacks) so the hook is testable without spawning real processes. In getOrCreate, pre-populate the freshness TTL cache entry when the index was stamped recently by background pre-warming, avoiding a redundant merkle walk on the very first search. Propagate seed warnings (sibling copy failures) through getOrCreate and up to SemanticSearchOutput so callers are informed rather than silently degraded. Adds LastIndexedAt() to Indexer to read the last_indexed_at metadata field written after every successful index run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
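The injection pattern can be sketched as follows. The `bgIndexer` callback name is taken from the commit message; `onSessionStart`, `spawnIndexer`, and `sessionDemo` are hypothetical names, and the real hook's signature and return value are not shown in this PR text. Detaching the child process is platform-specific and omitted here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// bgIndexer is the injected callback that starts background indexing.
type bgIndexer func(cwd string) error

// spawnIndexer is what a production callback might look like: re-invoke the
// current executable as "<exe> index <cwd>". Inside a test binary,
// os.Executable() returns the test binary itself, which is why tests must
// never be wired to a callback like this one.
func spawnIndexer(cwd string) error {
	exe, err := os.Executable()
	if err != nil {
		return err
	}
	return exec.Command(exe, "index", cwd).Start()
}

// onSessionStart is a hypothetical hook body that only talks to the callback.
func onSessionStart(cwd string, spawn bgIndexer) string {
	if err := spawn(cwd); err != nil {
		return "background indexing failed: " + err.Error()
	}
	return "background indexing started for " + cwd
}

// sessionDemo runs the hook with a recording mock instead of a real process.
func sessionDemo() (string, []string) {
	var spawned []string
	mock := func(cwd string) error {
		spawned = append(spawned, cwd)
		return nil
	}
	msg := onSessionStart("/does/not/exist", mock)
	return msg, spawned
}

func main() {
	msg, spawned := sessionDemo()
	fmt.Println(msg)
	fmt.Println("spawn requests:", spawned)
	_ = spawnIndexer // production wiring only; intentionally not called here
}
```

With the callback injected, a test can assert that indexing was requested exactly once without any child process being created.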
Key fixes:
1. splitOversizedChunks: account for filepath prefix in embed text
   - Checks now use budget = maxChars - (3 + len(filePath) + 1) so the
     full "// path\n" + content text stays within the model's context
   - Passes the tighter budget to splitChunk and createSubChunks
2. createSubChunks: skip header+overlap when it would exceed budget
   - Adds maxChars parameter; only prepends header+overlap lines if the
     total still fits, preventing non-convergent re-splitting cycles
   - Existing tests updated with maxChars=0 (no limit) to preserve behavior
3. index.go: second splitOversizedChunks pass after mergeUndersizedChunks
   - Prevents oversized chunks from surviving after merge expands them
4. Restore MarkdownChunker and re-register .md/.yaml/.json extensions
   - Languages were removed in a prior commit; restored chunker and tests
5. Fix findEffectiveRoot git-root fallback for SkipDir paths
   - Previously always defaulted to gitRoot; now checks pathCrossesSkipDir
   - testdata/ paths (a Go SkipDir convention) now scope to their own dir,
     preventing testdata/sample-project from indexing the entire repo
6. e2e: add TTL sleeps to TestE2E_IncrementalIndex
   - File changes must wait >1s for the freshness TTL to expire before
     re-searching, matching TestE2E_FreshnessTTLSkipsMerkleWalk pattern
7. e2e lang tests: cap LUMEN_MAX_CHUNK_TOKENS=100 for BERT context window
   - all-minilm uses 4x denser tokenisation than the 4-chars/token estimate
8. Update all cupaloy snapshots to reflect new chunk boundaries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
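The budget arithmetic from fix 1 can be checked with a few lines. The formula and the "// path\n" prefix come from the commit message; `embedText`, `contentBudget`, and `fits` are hypothetical helpers, not the project's functions, and the "maxChars=0 means no limit" convention from fix 2 is not modelled here.

```go
package main

import "fmt"

// embedText builds the text that is actually sent to the embedding model:
// a "// path" comment line followed by the chunk content.
func embedText(filePath, content string) string {
	return "// " + filePath + "\n" + content
}

// contentBudget is the number of characters left for content once the prefix
// is accounted for: 3 for "// ", len(filePath) for the path, 1 for "\n".
// Note it can go non-positive for very long paths; handling that case is
// outside this sketch.
func contentBudget(maxChars int, filePath string) int {
	return maxChars - (3 + len(filePath) + 1)
}

// fits reports whether content can be embedded without exceeding maxChars.
func fits(maxChars int, filePath, content string) bool {
	return len(content) <= contentBudget(maxChars, filePath)
}

func main() {
	const maxChars = 100
	path := "pkg/a.go" // 8 characters

	fmt.Println(contentBudget(maxChars, path))    // 88
	fmt.Println(len(embedText(path, "short")))    // 17
	fmt.Println(fits(maxChars, path, "short"))    // true
}
```

Checking against the content budget rather than maxChars itself is what keeps the prefixed text within the limit: content of exactly 88 characters yields an embed text of exactly 100.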
aeneasr added a commit that referenced this pull request on Mar 19, 2026
Summary
Test plan
🤖 Generated with Claude Code