Add pipeline verification, freshness tracking, and confidence tiers by JMRussas · Pull Request #2 · JMRussas/mcp-rag

JMRussas · 2026-02-28T19:55:46Z

Summary

Verify command: Index health checks (schema, sources, embeddings, FTS5) + configurable search quality tests from config.json
Stale detection: Two-level check (repo git commits + file SHA-256 hashes); it now also checks upstream repos for unpulled commits and detects embedding-model drift
Freshness dashboard: New pipeline.py freshness command — unified report showing index age, embed model consistency, per-source chunk counts, and aggregated issues
Confidence tiers: Search results classified as HIGH/MEDIUM/LOW based on configurable thresholds, with optional low-confidence filtering
Gotcha annotations: Anti-hallucination warnings displayed as CAUTION on search results
Provenance metadata: Stores embed_model, embed_dimensions, indexed_at, and per-repo git commits in index_metadata table
Database sources: db_sources config for indexing from external SQLite databases via SQL queries
Hybrid search: BM25 + vector fusion via Reciprocal Rank Fusion (RRF)
License: Switched to AGPL v3
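The file-level half of the two-level stale check can be sketched as below. This is a minimal illustration, not the project's implementation: it assumes the index recorded a `{relative_path: sha256_hex}` map at build time, and the names `sha256_of` and `find_stale_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_stale_files(root: Path, stored_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with hashes stored at index time (hypothetical layout)."""
    current = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        "changed": sorted(k for k in current if k in stored_hashes and current[k] != stored_hashes[k]),
        "added": sorted(k for k in current if k not in stored_hashes),
        "removed": sorted(k for k in stored_hashes if k not in current),
    }
```

In a two-level scheme, the cheap comparison of each repo's recorded git commit runs first, and per-file hashing only matters for repos whose commit has moved.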

Test plan

python -m pytest tests/ -q — 94 tests pass
Run python pipeline.py freshness on a RAG with a built index to verify dashboard output
Run python pipeline.py stale to verify upstream check and model drift detection
Rebuild a RAG and confirm embed_model/embed_dimensions appear in index_metadata

Generated by Claude Code · Claude Opus 4.6

Extends the search tool with two config-gated improvements:
1. Hybrid search: Runs BM25 (FTS5) alongside vector search and fuses results via Reciprocal Rank Fusion. Enabled with search.hybrid=true.
2. Reranking: Optional cross-encoder pass (sentence-transformers ONNX) that reranks candidates after retrieval. Enabled with reranker.enabled=true.
Both default to false, so existing deployments are unaffected. No schema changes required. Also adds tools/sync-forks.sh for fork propagation and source_subdir support in pipeline.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
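Reciprocal Rank Fusion itself is short. The sketch below assumes each retriever returns an ordered list of chunk IDs and uses the 1/(k + rank + 1) form (0-based rank) described later in this PR; `k=60` is the constant from the original RRF paper and is an assumption here, as is the function name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of chunk IDs (e.g. BM25 and vector results) with RRF.

    A chunk at 0-based position `rank` in a list contributes 1 / (k + rank + 1),
    so chunks ranked highly by both retrievers accumulate the largest scores.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] += 1.0 / (k + rank + 1)
    # Highest fused score first; ties broken by chunk ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never need a common scale. It also means fused scores are small (at most 1/61 per list with k=60), so they cannot be compared against cosine thresholds.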

Surface the review-driven development process and test coverage for hiring managers who won't click into commit history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Runs automatic schema, source coverage, embedding, and FTS5 checks. Supports optional config-driven test queries (semantic + keyword) to catch regressions after reindexing. Syncs to forks via sync-forks.sh. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
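A sketch of what the automatic checks could look like against a SQLite index. The table and column names (`chunks.source`, `chunks.embedding`, `chunks_fts`, `index_metadata`) and the function name are assumptions for illustration, not taken from the project's schema.

```python
import sqlite3

# Hypothetical schema names; the real index may differ.
REQUIRED_TABLES = ("chunks", "chunks_fts", "index_metadata")


def verify_index(conn: sqlite3.Connection, expected_sources: list[str]) -> list[str]:
    """Return human-readable issues; an empty list means the index looks healthy."""
    issues = []
    # FTS5 virtual tables are listed in sqlite_master with type 'table'.
    present = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    issues += [f"schema: missing table {t}" for t in REQUIRED_TABLES if t not in present]
    if "chunks" not in present:
        return issues  # nothing else can be checked without the main table

    # Source coverage: every configured source should have produced at least one chunk.
    counts = dict(conn.execute("SELECT source, COUNT(*) FROM chunks GROUP BY source"))
    issues += [f"sources: no chunks for {s}" for s in expected_sources if counts.get(s, 0) == 0]

    # Embeddings: every chunk should carry a vector.
    (missing_vectors,) = conn.execute("SELECT COUNT(*) FROM chunks WHERE embedding IS NULL").fetchone()
    if missing_vectors:
        issues.append(f"embeddings: {missing_vectors} chunk(s) without an embedding")

    # FTS5: the keyword index should cover the same number of rows as the chunk table.
    if "chunks_fts" in present:
        total = sum(counts.values())
        (fts_rows,) = conn.execute("SELECT COUNT(*) FROM chunks_fts").fetchone()
        if fts_rows != total:
            issues.append(f"fts5: {fts_rows} FTS rows vs {total} chunks")
    return issues
```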

When Ollama is unreachable, semantic search tests now set a flag and skip remaining search queries, but lookup tests (pure SQLite) continue running. Previously the break statement exited the entire test loop. Also renamed shadowed 'rows' variable to 'lookup_rows' for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
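The control-flow fix can be sketched as follows. `QualityTest`, `run_quality_tests`, and the injected `search`/`lookup` callables are hypothetical, and the built-in `ConnectionError` stands in for the HTTP client's `ConnectError`; a real implementation would catch whatever exception its client raises.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityTest:
    kind: str    # "search" (needs the embedding server) or "lookup" (pure SQLite)
    query: str
    expect: str  # substring expected somewhere in the results


def run_quality_tests(tests, search: Callable[[str], list], lookup: Callable[[str], list]) -> dict:
    """Run config-driven tests; an unreachable embedding server skips only the search tests."""
    results = {"passed": 0, "failed": 0, "skipped": 0}
    ollama_available = True
    for test in tests:
        if test.kind == "search":
            if not ollama_available:
                results["skipped"] += 1
                continue
            try:
                hits = search(test.query)
            except ConnectionError:
                # A `break` here would abort the loop and silently drop later lookup tests.
                ollama_available = False
                results["skipped"] += 1
                continue
        else:
            hits = lookup(test.query)
        results["passed" if any(test.expect in h for h in hits) else "failed"] += 1
    return results
```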

JMRussas · 2026-02-28T21:28:13Z

Review fixes (3a9fbd6):

Fixed ConnectError break bug — when Ollama is unreachable, semantic search tests now set an ollama_available flag and skip remaining search queries. Lookup tests (pure SQLite) continue running instead of being silently skipped.
Renamed shadowed "rows" variable to "lookup_rows" in the lookup test block for clarity.

Note: This PR still bundles hybrid search/reranker changes with the verify command. Since this repo uses squash merge, the final commit will be clean, but future PRs should keep features separate.

Protect implementation code while keeping it publicly viewable. Anyone who distributes or runs this as a service must open-source their derivative work under AGPL v3 or negotiate a commercial license. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Allows pulling chunks from SQLite databases via config-driven SQL queries, alongside the existing file-based repos. Useful for indexing coding standards or other structured content stored in databases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
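A minimal sketch of a db_sources loader. The config keys (`name`, `path`, `query`), the convention that the query's first column is a stable row ID, and the function name are assumptions, not the project's documented format.

```python
import sqlite3
from pathlib import Path


def load_db_source(source: dict) -> list[dict]:
    """Pull chunks from an external SQLite database described by a config entry.

    Hypothetical config shape:
        {"name": "standards", "path": "standards.db", "query": "SELECT id, title, body FROM rules"}
    """
    # Open read-only via a URI so the configured query cannot modify the source database.
    uri = Path(source["path"]).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(source["query"]).fetchall()
    finally:
        conn.close()
    chunks = []
    for row_id, *fields in rows:
        # Remaining non-NULL columns are joined into the chunk text.
        text = "\n".join(str(f) for f in fields if f is not None)
        chunks.append({"id": f"{source['name']}:{row_id}", "source": source["name"], "text": text})
    return chunks
```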

Store embed_model and embed_dimensions in index_metadata at build time. cmd_stale now detects model drift and checks upstream repos for unpulled commits via git fetch --dry-run. New cmd_freshness command provides a unified dashboard: index age, model consistency, per-source chunk counts, and aggregated issues list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
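Provenance writing and the drift/age checks might look roughly like this, assuming `index_metadata` is a simple key/value table. The function names and the 7-day age limit are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timezone


def write_provenance(conn: sqlite3.Connection, embed_model: str, embed_dimensions: int) -> None:
    """Record build-time provenance in a key/value index_metadata table (hypothetical layout)."""
    conn.execute("CREATE TABLE IF NOT EXISTS index_metadata (key TEXT PRIMARY KEY, value TEXT)")
    rows = {
        "embed_model": embed_model,
        "embed_dimensions": str(embed_dimensions),
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }
    conn.executemany("INSERT OR REPLACE INTO index_metadata VALUES (?, ?)", rows.items())
    conn.commit()


def freshness_issues(conn, configured_model: str, max_age_days: float = 7.0, now=None) -> list[str]:
    """Report embedding-model drift and index age."""
    meta = dict(conn.execute("SELECT key, value FROM index_metadata"))
    issues = []
    built_with = meta.get("embed_model")
    if built_with is None:
        issues.append("provenance: index predates embed_model tracking")
    elif built_with != configured_model:
        # Query vectors from a different model are not comparable with the stored vectors.
        issues.append(f"model drift: index built with {built_with}, config uses {configured_model}")
    if "indexed_at" in meta:
        if now is None:
            now = datetime.now(timezone.utc)
        age_days = (now - datetime.fromisoformat(meta["indexed_at"])).total_seconds() / 86400
        if age_days > max_age_days:
            issues.append(f"age: index is {age_days:.1f} days old")
    return issues
```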

Search results now show confidence tiers (HIGH/MEDIUM/LOW) based on configurable thresholds. Gotcha annotations display as CAUTION warnings. Add test_provenance.py for stale detection tests. Expand sync-forks list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
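A sketch of tier assignment for cosine scores, with the optional low-confidence filter and CAUTION line. The 0.85/0.65 defaults are the cosine thresholds cited elsewhere in this PR (the project reads them from config); `classify`, `format_results`, and the `(score, text, gotcha)` result shape are hypothetical.

```python
from enum import Enum


class Confidence(str, Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


def classify(score: float, high: float = 0.85, medium: float = 0.65) -> Confidence:
    """Map a cosine similarity to a confidence tier."""
    if score >= high:
        return Confidence.HIGH
    if score >= medium:
        return Confidence.MEDIUM
    return Confidence.LOW


def format_results(results, hide_low: bool = False) -> list[str]:
    """Render (score, text, gotcha) triples; `gotcha` is an optional anti-hallucination note."""
    lines = []
    for score, text, gotcha in results:
        tier = classify(score)
        if hide_low and tier is Confidence.LOW:
            continue  # optional low-confidence filtering
        lines.append(f"[{tier.value}] {score:.2f} {text}")
        if gotcha:
            lines.append(f"  CAUTION: {gotcha}")
    return lines
```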

- cmd_freshness: use config.get("ollama", {}) to avoid KeyError
- Remove dead stale_source_tags variable from cmd_stale
- Fix confidence tiers for hybrid search: RRF scores (~0.01) were compared against cosine thresholds (0.85/0.65), making all hybrid results "LOW". Now uses rank-calibrated thresholds for RRF/reranker.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JMRussas · 2026-03-08T16:05:42Z

Self-review defect fixes (13dcc33):

3 issues found and fixed:

cmd_freshness KeyError (pipeline.py:1161) — config["ollama"] accessed directly instead of config.get("ollama", {}). Would crash on minimal configs. Fixed to match cmd_stale's pattern.
Dead stale_source_tags variable (pipeline.py:996-1030) — Set populated but never read in cmd_stale. Removed.
Confidence tiers broken for hybrid search (server.py:383-400) — RRF scores (~0.008–0.03) were compared against cosine similarity thresholds (0.85/0.65), making every hybrid result "LOW" confidence. Added _rrf_confidence_thresholds() that computes rank-calibrated cutoffs: top-3 = HIGH, top-10 = MEDIUM. The search function now selects thresholds based on whether hybrid/reranker is active.
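One way a helper like `_rrf_confidence_thresholds()` might derive rank-calibrated cutoffs is sketched below. It is not the project's code: the names, `k=60`, and the choice to calibrate against a single list's score are assumptions.

```python
def rrf_confidence_thresholds(k: int = 60, high_rank: int = 3, medium_rank: int = 10) -> tuple[float, float]:
    """Hypothetical cutoffs for RRF scores of the form 1 / (k + rank + 1), 0-based rank.

    In one list, the chunk at rank n - 1 scores 1 / (k + n), so "top n" means a score of
    at least 1 / (k + n). Chunks found by both BM25 and vector search sum two such terms
    and clear these cutoffs more easily.
    """
    return 1.0 / (k + high_rank), 1.0 / (k + medium_rank)


def tier(score: float, high: float, medium: float) -> str:
    """Classify a score against a (high, medium) threshold pair."""
    return "HIGH" if score >= high else "MEDIUM" if score >= medium else "LOW"
```

Comparing the same RRF scores against the cosine pair (0.85/0.65) reproduces the original bug: every result lands in LOW.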

…tcha args
- Remove sentence-transformers from requirements.txt (optional dep, only needed when reranker.enabled=true, already lazy-imported in reranker.py)
- Fix upstream detection: compare local HEAD vs remote tracking ref instead of checking stderr from git fetch --dry-run (false positives)
- Fix reranker confidence thresholds: reranker scores (1/(rank+1)) are a different range than RRF scores, now use separate calibrated cutoffs
- cmd_gotcha: receive chunk_id and gotcha_text as function args instead of reading sys.argv directly, matching all other cmd_ functions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
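The ref-comparison approach to upstream detection can be sketched with plain git commands. The function names are hypothetical, and the `run` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Any, Callable

Runner = Callable[..., Any]  # subprocess.run, or a fake with the same call shape


def git(repo: str, *args: str, run: Runner = subprocess.run) -> str:
    """Run `git -C repo <args>` and return stripped stdout; raises CalledProcessError on failure."""
    proc = run(["git", "-C", repo, *args], capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def unpulled_commits(repo: str, run: Runner = subprocess.run) -> int:
    """Count upstream commits missing from local HEAD by comparing refs, not fetch output.

    Assumes the current branch has an upstream configured; otherwise `@{u}` fails to resolve.
    """
    git(repo, "fetch", "--quiet", run=run)  # refresh the remote-tracking refs
    local = git(repo, "rev-parse", "HEAD", run=run)
    remote = git(repo, "rev-parse", "@{u}", run=run)
    if local == remote:
        return 0
    # Commits reachable from the upstream tracking ref but not from HEAD.
    return int(git(repo, "rev-list", "--count", "HEAD..@{u}", run=run))
```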

JMRussas · 2026-03-08T16:21:29Z

Review fix round 2 (ceff8f7):

4 remaining issues fixed:

sentence-transformers removed from requirements.txt — optional dep (~500MB PyTorch/ONNX) only needed when reranker.enabled=true. Already lazy-imported in reranker.py. Now documented as install-on-demand comment.
Upstream detection made robust — was checking git fetch --dry-run stderr for any output (false positives from progress messages). Now does a proper comparison: fetches remote refs, then compares local HEAD vs @{u} (upstream tracking ref).
Reranker confidence thresholds fixed — reranker assigns 1/(rank+1) scores (1.0, 0.5, 0.33...) which is a completely different range than RRF 1/(k+rank+1). Was using RRF thresholds for both. Now: reranker uses 0.25/0.08 (top-3/top-10), RRF uses rank-calibrated, cosine uses config thresholds.
cmd_gotcha no longer reads sys.argv directly — now receives chunk_id and gotcha_text as function parameters, matching the pattern of all other cmd_ functions. Argument parsing stays in main().
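With the 1/(rank+1) reranker scores described above, the score at 0-based rank r is 1/(r+1), so an exact "top N" cutoff is 1/N. For reference, 0.25 is rank 3's score (so `>= 0.25` admits four results and `> 0.25` exactly three), and 0.08 falls between ranks 11 (≈0.083) and 12 (≈0.077), so it admits twelve. The sketch below derives cutoffs from N instead of hard-coding them and picks a threshold pair by scoring mode; every name in it is hypothetical.

```python
def reranker_score(rank: int) -> float:
    """Score for a 0-based position after reranking, following the 1/(rank+1) scheme."""
    return 1.0 / (rank + 1)


def top_n_cutoff(n: int) -> float:
    """Lowest reranker score still inside the top n results (0-based rank n - 1)."""
    return 1.0 / n


def select_thresholds(mode: str, cosine: tuple[float, float] = (0.85, 0.65), k: int = 60) -> tuple[float, float]:
    """Return (high, medium) cutoffs for the active mode: "reranker", "rrf", or "cosine"."""
    if mode == "reranker":
        return top_n_cutoff(3), top_n_cutoff(10)
    if mode == "rrf":
        # Single-list RRF score at 0-based rank n - 1 is 1 / (k + n).
        return 1.0 / (k + 3), 1.0 / (k + 10)
    return cosine  # plain vector search keeps the configured cosine thresholds


def tier(score: float, thresholds: tuple[float, float]) -> str:
    """Classify a score against a (high, medium) threshold pair."""
    high, medium = thresholds
    return "HIGH" if score >= high else "MEDIUM" if score >= medium else "LOW"
```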

JMRussas and others added 4 commits February 28, 2026 01:44

Add Development section to README

3217a50

JMRussas and others added 4 commits February 28, 2026 22:50

License: switch to AGPL v3

5bdf4ae

JMRussas changed the title from "Add verify command for index health checks" to "Add pipeline verification, freshness tracking, and confidence tiers" on Mar 8, 2026

JMRussas merged commit a347d2a into main Mar 8, 2026
1 of 2 checks passed

JMRussas merged 10 commits into main from feature/pipeline-verify
