Add pipeline verification, freshness tracking, and confidence tiers #2
Extends the search tool with two config-gated improvements:
1. Hybrid search: Runs BM25 (FTS5) alongside vector search and fuses results via Reciprocal Rank Fusion. Enabled with search.hybrid=true.
2. Reranking: Optional cross-encoder pass (sentence-transformers ONNX) that reranks candidates after retrieval. Enabled with reranker.enabled=true.
Both default to false, so existing deployments are unaffected. No schema changes required. Also adds tools/sync-forks.sh for fork propagation and source_subdir support in pipeline.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
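Reciprocal Rank Fusion scores each chunk by summing 1/(k + rank) over every ranked list it appears in, so BM25 and cosine scores never need to share a scale. A minimal sketch, assuming each retriever returns chunk IDs ordered best-first and the conventional k = 60 (the PR does not say which k it uses); rrf_fuse is a hypothetical name:

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    A chunk at 1-based rank r in a list contributes 1 / (k + r) to its score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hits from the two retrievers for a single query.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c4", "c3"]
for chunk_id, score in rrf_fuse([bm25_hits, vector_hits]):
    print(f"{chunk_id}: {score:.4f}")
```

Chunks returned by both retrievers (c1, c3) end up above those returned by only one (c4, c7), which is the behavior hybrid search relies on.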
Surface the review-driven development process and test coverage for hiring managers who won't click into commit history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs automatic schema, source coverage, embedding, and FTS5 checks. Supports optional config-driven test queries (semantic + keyword) to catch regressions after reindexing. Syncs to forks via sync-forks.sh. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
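The schema, source-coverage, and embedding checks reduce to a few SQLite queries. A minimal sketch, assuming hypothetical chunks(id, source, text) and embeddings(chunk_id, vector) tables; the real schema in pipeline.py may differ, and verify_index is an illustrative name:

```python
import sqlite3

# Hypothetical table names; adjust to the actual index schema.
REQUIRED_TABLES = {"chunks", "embeddings", "index_metadata"}


def verify_index(conn: sqlite3.Connection, expected_sources: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the index passed."""
    issues: list[str] = []

    # Schema check: every required table must exist.
    present = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    missing = REQUIRED_TABLES - present
    if missing:
        issues.append(f"missing tables: {sorted(missing)}")
        return issues  # the remaining checks would fail on absent tables

    # Source coverage: every configured source should contribute chunks.
    indexed = {row[0] for row in conn.execute("SELECT DISTINCT source FROM chunks")}
    for source in sorted(expected_sources - indexed):
        issues.append(f"source '{source}' has no chunks")

    # Embedding check: every chunk should have at least one embedding row.
    unembedded = conn.execute(
        "SELECT COUNT(*) FROM chunks c "
        "LEFT JOIN embeddings e ON e.chunk_id = c.id WHERE e.chunk_id IS NULL"
    ).fetchone()[0]
    if unembedded:
        issues.append(f"{unembedded} chunk(s) without embeddings")
    return issues


# Tiny in-memory index: source 'wiki' is configured but empty, chunk 2 is unembedded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, text TEXT);
    CREATE TABLE embeddings (chunk_id INTEGER, vector BLOB);
    CREATE TABLE index_metadata (key TEXT PRIMARY KEY, value TEXT);
    INSERT INTO chunks VALUES (1, 'docs', 'hello'), (2, 'docs', 'world');
    INSERT INTO embeddings VALUES (1, x'00');
""")
print(verify_index(conn, {"docs", "wiki"}))
```

Returning a list of issues instead of raising on the first failure lets one run report every problem after a reindex.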
When Ollama is unreachable, semantic search tests now set a flag and skip remaining search queries, but lookup tests (pure SQLite) continue running. Previously the break statement exited the entire test loop. Also renamed shadowed 'rows' variable to 'lookup_rows' for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
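The fix replaces a loop-wide break with a flag that gates only the queries needing embeddings. A minimal sketch, assuming each configured test query is a dict with a name, a kind of "search" or "lookup", and a query string, and that the search callable raises ConnectionError when Ollama is down; all of these names are hypothetical:

```python
from typing import Callable


def run_test_queries(
    queries: list[dict],
    run_search: Callable[[str], list],
    run_lookup: Callable[[str], list],
) -> dict[str, str]:
    """Run config-driven test queries and return a status per query name."""
    results: dict[str, str] = {}
    ollama_down = False  # set once instead of breaking out of the whole loop
    for q in queries:
        if q["kind"] == "search":
            if ollama_down:
                results[q["name"]] = "skipped (Ollama unreachable)"
                continue
            try:
                hits = run_search(q["query"])
            except ConnectionError:
                ollama_down = True
                results[q["name"]] = "skipped (Ollama unreachable)"
                continue
        else:
            # Lookups only read SQLite, so they still run when Ollama is down.
            hits = run_lookup(q["query"])
        results[q["name"]] = "pass" if hits else "FAIL (no results)"
    return results


def offline_search(query: str) -> list:
    raise ConnectionError("Ollama not reachable")


queries = [
    {"name": "s1", "kind": "search", "query": "retry policy"},
    {"name": "s2", "kind": "search", "query": "auth tokens"},
    {"name": "l1", "kind": "lookup", "query": "chunk:42"},
]
print(run_test_queries(queries, offline_search, lambda q: ["row"]))
```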
Review fixes (3a9fbd6):
Note: This PR still bundles hybrid search/reranker changes with the verify command. Since this repo uses squash merge, the final commit will be clean, but future PRs should keep features separate.
Protect implementation code while keeping it publicly viewable. Anyone who distributes or runs this as a service must open-source their derivative work under AGPL v3 or negotiate a commercial license. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows pulling chunks from SQLite databases via config-driven SQL queries, alongside the existing file-based repos. Useful for indexing coding standards or other structured content stored in databases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store embed_model and embed_dimensions in index_metadata at build time. cmd_stale now detects model drift and checks upstream repos for unpulled commits via git fetch --dry-run. New cmd_freshness command provides a unified dashboard: index age, model consistency, per-source chunk counts, and aggregated issues list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
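Recording the embedding model at build time lets later commands detect drift with a plain comparison. A minimal sketch, assuming index_metadata is a key/value table and the currently configured model is passed in from config; the helper names, the ISO timestamp format for indexed_at, and the example model names are hypothetical:

```python
import sqlite3
from datetime import datetime, timezone


def record_build_metadata(conn: sqlite3.Connection, model: str, dims: int) -> None:
    """Store the embedding model, vector size, and build time alongside the index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_metadata (key TEXT PRIMARY KEY, value TEXT)")
    rows = {
        "embed_model": model,
        "embed_dimensions": str(dims),
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }
    conn.executemany(
        "INSERT OR REPLACE INTO index_metadata (key, value) VALUES (?, ?)", rows.items())
    conn.commit()


def model_drift(conn: sqlite3.Connection, config_model: str) -> list[str]:
    """Compare the model recorded at build time with the one configured now."""
    meta = dict(conn.execute("SELECT key, value FROM index_metadata"))
    built_with = meta.get("embed_model")
    if built_with is None:
        return ["index predates model tracking; rebuild to record embed_model"]
    if built_with != config_model:
        return [f"index built with {built_with}, config now uses {config_model}; reindex required"]
    return []


conn = sqlite3.connect(":memory:")
record_build_metadata(conn, "nomic-embed-text", 768)
print(model_drift(conn, "nomic-embed-text"))   # no drift
print(model_drift(conn, "mxbai-embed-large"))  # drift reported
```

Because model_drift returns a list, its output can be appended directly to an aggregated issues list such as the one the freshness dashboard reports.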
Search results now show confidence tiers (HIGH/MEDIUM/LOW) based on configurable thresholds. Gotcha annotations display as CAUTION warnings. Add test_provenance.py for stale detection tests. Expand sync-forks list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
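For plain vector search, tiering is two threshold comparisons on cosine similarity. A minimal sketch using the 0.85/0.65 cosine thresholds cited in the review fixes as configurable defaults; the result dict shape, the gotcha key, and the output format are assumptions for illustration:

```python
def confidence_tier(score: float, high: float = 0.85, medium: float = 0.65) -> str:
    """Map a cosine similarity to a HIGH/MEDIUM/LOW label."""
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"


def format_result(result: dict) -> str:
    """Render one hit, surfacing any gotcha annotation as a CAUTION line."""
    line = f"[{confidence_tier(result['score'])}] {result['source']}: {result['text']}"
    if result.get("gotcha"):
        line += f"\n  CAUTION: {result['gotcha']}"
    return line


print(format_result({"score": 0.91, "source": "api.md", "text": "Tokens expire after 1h.",
                     "gotcha": "Clock skew can expire tokens early."}))
print(format_result({"score": 0.70, "source": "faq.md", "text": "Use the v2 endpoint."}))
```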
- cmd_freshness: use config.get("ollama", {}) to avoid KeyError
- Remove dead stale_source_tags variable from cmd_stale
- Fix confidence tiers for hybrid search: RRF scores (~0.01) were
compared against cosine thresholds (0.85/0.65), making all hybrid
results "LOW". Now uses rank-calibrated thresholds for RRF/reranker.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
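The RRF tier bug is easy to see numerically: with the common k = 60, a chunk ranked first by both retrievers scores only 2/61 (about 0.033), far below a 0.65 cosine cutoff. A minimal sketch of score-type-aware thresholds; the RRF cutoffs here are hypothetical values derived from rank positions, not the calibrated numbers used in the PR:

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """RRF score for a chunk, given its 1-based rank in each list that returned it."""
    return sum(1.0 / (k + r) for r in ranks)


# Hypothetical rank-calibrated RRF cutoffs (the PR's actual values are not shown):
#   HIGH   >= score of a chunk ranked 3rd by both retrievers   (2/63, about 0.0317)
#   MEDIUM >= score of a chunk ranked 1st by a single retriever (1/61, about 0.0164)
THRESHOLDS = {
    "cosine": (0.85, 0.65),
    "rrf": (rrf_score([3, 3]), rrf_score([1])),
}


def tier(score: float, score_type: str) -> str:
    """Pick HIGH/MEDIUM/LOW using cutoffs that match the score's scale."""
    high, medium = THRESHOLDS[score_type]
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"


best_hybrid = rrf_score([1, 1])     # ranked first by both retrievers
print(tier(best_hybrid, "cosine"))  # the original bug: prints LOW
print(tier(best_hybrid, "rrf"))     # scale-aware cutoffs: prints HIGH
```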
Self-review defect fixes (13dcc33): 3 issues found and fixed:
…tcha args
- Remove sentence-transformers from requirements.txt (optional dep, only needed when reranker.enabled=true, already lazy-imported in reranker.py)
- Fix upstream detection: compare local HEAD vs remote tracking ref instead of checking stderr from git fetch --dry-run (false positives)
- Fix reranker confidence thresholds: reranker scores (1/(rank+1)) fall in a different range than RRF scores; now use separate calibrated cutoffs
- cmd_gotcha: receive chunk_id and gotcha_text as function args instead of reading sys.argv directly, matching all other cmd_ functions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
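Comparing revision hashes is deterministic, unlike inferring new commits from git fetch --dry-run output. A minimal sketch, assuming each repo has an upstream tracking branch configured; unpulled_commits is a hypothetical name, and its git parameter exists only so the logic can be exercised without a real repository:

```python
import subprocess
from typing import Callable, Optional


def run_git(repo: str, *args: str) -> Optional[str]:
    """Run a git command inside repo; return stripped stdout, or None on failure."""
    proc = subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None


def unpulled_commits(repo: str, git: Callable[..., Optional[str]] = run_git) -> Optional[int]:
    """Count upstream commits missing from local HEAD, or None if undeterminable."""
    # A real fetch (not --dry-run) is needed so the remote-tracking ref is current.
    if git(repo, "fetch", "--quiet") is None:
        return None
    local = git(repo, "rev-parse", "HEAD")
    remote = git(repo, "rev-parse", "@{u}")  # fails if no upstream is configured
    if local is None or remote is None:
        return None
    if local == remote:
        return 0
    count = git(repo, "rev-list", "--count", "HEAD..@{u}")
    return int(count) if count is not None else None


# Simulated git responses keyed by argument tuple (no repository needed).
fake = {("fetch", "--quiet"): "", ("rev-parse", "HEAD"): "a1",
        ("rev-parse", "@{u}"): "b2", ("rev-list", "--count", "HEAD..@{u}"): "3"}
print(unpulled_commits("some/repo", git=lambda repo, *args: fake.get(args)))
```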
Review fix round 2 (ceff8f7): 4 remaining issues fixed:
Summary
- pipeline.py freshness command: unified report showing index age, embed model consistency, per-source chunk counts, and aggregated issues
- embed_model, embed_dimensions, indexed_at, and per-repo git commits in the index_metadata table
- db_sources config for indexing from external SQLite databases via SQL queries

Test plan
- python -m pytest tests/ -q: 94 tests pass
- python pipeline.py freshness on a RAG with a built index to verify dashboard output
- python pipeline.py stale to verify upstream check and model drift detection
- embed_model/embed_dimensions appear in index_metadata

Generated by Claude Code · Claude Opus 4.6