Add pipeline verification, freshness tracking, and confidence tiers #2

Merged
JMRussas merged 10 commits into main from feature/pipeline-verify
Mar 8, 2026

@JMRussas
Owner

@JMRussas JMRussas commented Feb 28, 2026

Summary

  • Verify command: Index health checks (schema, sources, embeddings, FTS5) + configurable search quality tests from config.json
  • Stale detection: Two-level check (repo git commits + file SHA-256 hashes), now also checks upstream for unpulled commits and detects embedding model drift
  • Freshness dashboard: New pipeline.py freshness command — unified report showing index age, embed model consistency, per-source chunk counts, and aggregated issues
  • Confidence tiers: Search results classified as HIGH/MEDIUM/LOW based on configurable thresholds, with optional low-confidence filtering
  • Gotcha annotations: Anti-hallucination warnings displayed as CAUTION on search results
  • Provenance metadata: Stores embed_model, embed_dimensions, indexed_at, and per-repo git commits in index_metadata table
  • Database sources: db_sources config for indexing from external SQLite databases via SQL queries
  • Hybrid search: BM25 + vector fusion via Reciprocal Rank Fusion (RRF)
  • License: Switched to AGPL v3
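The file-level half of the two-level stale check can be sketched as follows. This is a minimal illustration, not the code in pipeline.py: the helper names `file_sha256` and `changed_files`, and the idea of a recorded `{relative_path: hash}` mapping, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in fixed-size blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(recorded: dict, root: Path) -> list:
    """Return relative paths whose current hash differs from the recorded one.

    `recorded` is a hypothetical {relative_path: sha256_hex} mapping captured
    at index time. A file that has since been deleted also counts as changed.
    """
    stale = []
    for rel_path, old_hash in recorded.items():
        current = root / rel_path
        if not current.is_file() or file_sha256(current) != old_hash:
            stale.append(rel_path)
    return sorted(stale)
```

A repo-level check (comparing the indexed commit with the current one) would presumably run first, with the per-file hashes consulted only when the commit has moved.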

Test plan

  • python -m pytest tests/ -q — 94 tests pass
  • Run python pipeline.py freshness on a RAG with a built index to verify dashboard output
  • Run python pipeline.py stale to verify upstream check and model drift detection
  • Rebuild a RAG and confirm embed_model/embed_dimensions appear in index_metadata

Generated by Claude Code · Claude Opus 4.6

JMRussas and others added 4 commits February 28, 2026 01:44
Extends the search tool with two config-gated improvements:

1. Hybrid search: Runs BM25 (FTS5) alongside vector search and fuses
   results via Reciprocal Rank Fusion. Enabled with search.hybrid=true.

2. Reranking: Optional cross-encoder pass (sentence-transformers ONNX)
   that reranks candidates after retrieval. Enabled with reranker.enabled=true.

Both default to false — existing deployments are unaffected. No schema
changes required. Also adds tools/sync-forks.sh for fork propagation and
source_subdir support in pipeline.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
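For readers unfamiliar with Reciprocal Rank Fusion, a minimal sketch follows. It assumes the per-list score 1/(k + rank + 1) with a 0-based rank, as quoted later in this thread, and the conventional k=60. The function name `rrf_fuse` and the tie-breaking rule are illustrative, not taken from the repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list contributes 1 / (k + rank + 1) per document (rank is 0-based),
    so documents that rank well in more than one list accumulate a higher score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first; ties are broken by id to keep output deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]  # e.g. ordered by cosine similarity
bm25_hits = ["b", "d", "a"]    # e.g. ordered by FTS5 BM25
fused = rrf_fuse([vector_hits, bm25_hits])
```

Here "b" and "a" appear in both lists and therefore outrank "d" and "c", which appear in only one.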
Surface the review-driven development process and test coverage
for hiring managers who won't click into commit history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs automatic schema, source coverage, embedding, and FTS5 checks.
Supports optional config-driven test queries (semantic + keyword) to
catch regressions after reindexing. Syncs to forks via sync-forks.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When Ollama is unreachable, semantic search tests now set a flag and
skip remaining search queries, but lookup tests (pure SQLite) continue
running. Previously the break statement exited the entire test loop.

Also renamed shadowed 'rows' variable to 'lookup_rows' for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
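The control-flow change described in this commit can be sketched as below. It is a simplified stand-in, not the verify command itself: the test-dictionary shape, the `semantic_search` and `lookup` callables, and the use of the built-in `ConnectionError` in place of the HTTP client's connect error are all assumptions.

```python
def run_quality_tests(tests, semantic_search, lookup):
    """Run search and lookup tests, skipping only searches once the embedder is down."""
    ollama_available = True
    results = []
    for test in tests:
        if test["type"] == "search":
            if not ollama_available:
                results.append((test["name"], "skipped"))
                continue
            try:
                semantic_search(test["query"])
                results.append((test["name"], "passed"))
            except ConnectionError:
                # Set a flag instead of `break`, so later lookup tests still run.
                ollama_available = False
                results.append((test["name"], "skipped"))
        else:
            # Lookup tests only touch SQLite and do not need the embedder.
            lookup_rows = lookup(test["query"])
            results.append((test["name"], "passed" if lookup_rows else "failed"))
    return results
```

With a `break` in the `except` branch, every test after the first failed search would be dropped, including lookups that could have run.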
@JMRussas
Owner Author

Review fixes (3a9fbd6):

  1. Fixed ConnectError break bug — when Ollama is unreachable, semantic search tests now set an ollama_available flag and skip remaining search queries. Lookup tests (pure SQLite) continue running instead of being silently skipped.

  2. Renamed shadowed "rows" variable to "lookup_rows" in the lookup test block for clarity.

Note: This PR still bundles hybrid search/reranker changes with the verify command. Since this repo uses squash merge, the final commit will be clean, but future PRs should keep features separate.

JMRussas and others added 4 commits February 28, 2026 22:50
Protect implementation code while keeping it publicly viewable.
Anyone who distributes or runs this as a service must open-source
their derivative work under AGPL v3 or negotiate a commercial license.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows pulling chunks from SQLite databases via config-driven SQL queries,
alongside the existing file-based repos. Useful for indexing coding standards
or other structured content stored in databases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
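A rough sketch of what a config-driven database source could look like is shown here. The config keys (`name`, `path`, `query`) and the two-column result shape are assumptions for illustration; the actual db_sources schema is not shown in this thread.

```python
import sqlite3


def load_db_chunks(db_sources):
    """Run each configured query and turn every row into a chunk dictionary.

    Each entry in `db_sources` is assumed to look like:
        {"name": "standards", "path": "standards.db",
         "query": "SELECT title, body FROM rules"}
    and each query is assumed to return (title, text) pairs.
    """
    chunks = []
    for source in db_sources:
        conn = sqlite3.connect(source["path"])
        try:
            for title, text in conn.execute(source["query"]):
                chunks.append({"source": source["name"], "title": title, "text": text})
        finally:
            conn.close()
    return chunks
```

Because the query text comes from config.json rather than from user input, it is executed as written. That is only reasonable if the config file is trusted.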
Store embed_model and embed_dimensions in index_metadata at build time.
cmd_stale now detects model drift and checks upstream repos for unpulled
commits via git fetch --dry-run. New cmd_freshness command provides a
unified dashboard: index age, model consistency, per-source chunk counts,
and aggregated issues list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
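Model drift detection of this kind might look roughly like the sketch below. It assumes index_metadata is a simple key/value table, which this thread does not confirm. The function names and the model names in the usage lines are placeholders.

```python
import sqlite3


def record_provenance(conn, embed_model, embed_dimensions):
    """Store the embedding model and vector size at build time (assumed key/value schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_metadata (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO index_metadata (key, value) VALUES (?, ?)",
        [("embed_model", embed_model), ("embed_dimensions", str(embed_dimensions))],
    )
    conn.commit()


def detect_model_drift(conn, configured_model):
    """Return a human-readable issue string, or None when the index matches the config."""
    row = conn.execute(
        "SELECT value FROM index_metadata WHERE key = 'embed_model'"
    ).fetchone()
    if row is None:
        return "no embed_model recorded; index may predate provenance tracking"
    if row[0] != configured_model:
        return f"index built with {row[0]!r}, config now uses {configured_model!r}"
    return None


conn = sqlite3.connect(":memory:")
record_provenance(conn, "model-a", 768)
issue = detect_model_drift(conn, "model-b")
```

Drift matters because query vectors from one model are not comparable with stored vectors from another, even when the dimensions happen to match.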
Search results now show confidence tiers (HIGH/MEDIUM/LOW) based on
configurable thresholds. Gotcha annotations display as CAUTION warnings.
Add test_provenance.py for stale detection tests. Expand sync-forks list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
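As a rough illustration of threshold-based tiers with optional low-confidence filtering, consider the sketch below. The result-dictionary shape, the function names, and the `drop_low` flag are assumptions; the default cutoffs 0.85 and 0.65 are the cosine thresholds mentioned later in this thread.

```python
def classify_confidence(score, high=0.85, medium=0.65):
    """Map a cosine similarity score to a confidence tier."""
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"


def annotate_results(results, high=0.85, medium=0.65, drop_low=False):
    """Attach a confidence tier to each result, optionally filtering out LOW ones."""
    annotated = []
    for result in results:
        tier = classify_confidence(result["score"], high, medium)
        if drop_low and tier == "LOW":
            continue
        annotated.append({**result, "confidence": tier})
    return annotated


hits = [{"id": 1, "score": 0.91}, {"id": 2, "score": 0.70}, {"id": 3, "score": 0.40}]
kept = annotate_results(hits, drop_low=True)
```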
@JMRussas JMRussas changed the title from "Add verify command for index health checks" to "Add pipeline verification, freshness tracking, and confidence tiers" Mar 8, 2026
- cmd_freshness: use config.get("ollama", {}) to avoid KeyError
- Remove dead stale_source_tags variable from cmd_stale
- Fix confidence tiers for hybrid search: RRF scores (~0.01) were
  compared against cosine thresholds (0.85/0.65), making all hybrid
  results "LOW". Now uses rank-calibrated thresholds for RRF/reranker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JMRussas
Owner Author

JMRussas commented Mar 8, 2026

Self-review defect fixes (13dcc33):

3 issues found and fixed:

  1. cmd_freshness KeyError (pipeline.py:1161) — config["ollama"] accessed directly instead of config.get("ollama", {}). Would crash on minimal configs. Fixed to match cmd_stale's pattern.

  2. Dead stale_source_tags variable (pipeline.py:996-1030) — Set populated but never read in cmd_stale. Removed.

  3. Confidence tiers broken for hybrid search (server.py:383-400) — RRF scores (~0.008–0.03) were compared against cosine similarity thresholds (0.85/0.65), making every hybrid result "LOW" confidence. Added _rrf_confidence_thresholds() that computes rank-calibrated cutoffs: top-3 = HIGH, top-10 = MEDIUM. The search function now selects thresholds based on whether hybrid/reranker is active.
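To make item 3 concrete, the sketch below shows one way rank-calibrated cutoffs could be derived. It is not the body of _rrf_confidence_thresholds(): it assumes a single list contributes 1/(k + rank + 1) with a 0-based rank and k=60, and the names here are illustrative.

```python
def rrf_thresholds(k=60, high_rank=3, medium_rank=10):
    """Return (high, medium) cutoffs equal to the single-list RRF score
    of the result at 1-based positions `high_rank` and `medium_rank`."""
    high = 1.0 / (k + high_rank)      # 0-based rank high_rank - 1
    medium = 1.0 / (k + medium_rank)  # 0-based rank medium_rank - 1
    return high, medium


def tier(score, high, medium):
    """Classify a score against a (high, medium) threshold pair."""
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"


high, medium = rrf_thresholds()
top_hit = 1.0 / 61                     # best possible single-list RRF score at k=60
before_fix = tier(top_hit, 0.85, 0.65)  # cosine thresholds applied to an RRF score
after_fix = tier(top_hit, high, medium)
```

With cosine cutoffs even the top-ranked hybrid result lands in LOW, which is the defect described above. With rank-calibrated cutoffs the same score is HIGH.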

…tcha args

- Remove sentence-transformers from requirements.txt (optional dep, only
  needed when reranker.enabled=true, already lazy-imported in reranker.py)
- Fix upstream detection: compare local HEAD vs remote tracking ref
  instead of checking stderr from git fetch --dry-run (false positives)
- Fix reranker confidence thresholds: reranker scores (1/(rank+1)) fall in a
  different range than RRF scores, so they now use separate calibrated cutoffs
- cmd_gotcha: receive chunk_id and gotcha_text as function args instead
  of reading sys.argv directly, matching all other cmd_ functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
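The upstream comparison described in this commit might be sketched as follows. This is an assumption-laden illustration rather than the pipeline's code: `git_output` and `upstream_differs` are hypothetical names, and the `git` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess


def git_output(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def upstream_differs(repo, git=git_output):
    """Fetch remote refs, then compare local HEAD with the upstream tracking ref.

    Comparing commit hashes avoids relying on whatever `git fetch` prints to
    stderr, which can include progress messages.
    """
    git(repo, "fetch", "--quiet")
    local = git(repo, "rev-parse", "HEAD")
    upstream = git(repo, "rev-parse", "@{u}")
    return local != upstream
```

A difference in hashes can also mean the local branch is ahead of its upstream. Counting commits with `git rev-list --count HEAD..@{u}` would distinguish "behind" from "ahead" if that matters.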
@JMRussas
Owner Author

JMRussas commented Mar 8, 2026

Review fix round 2 (ceff8f7):

4 remaining issues fixed:

  1. sentence-transformers removed from requirements.txt — optional dep (~500MB PyTorch/ONNX) only needed when reranker.enabled=true. Already lazy-imported in reranker.py. Now documented in an install-on-demand comment.

  2. Upstream detection made robust — was checking git fetch --dry-run stderr for any output (false positives from progress messages). Now does a proper comparison: fetches remote refs, then compares local HEAD vs @{u} (upstream tracking ref).

  3. Reranker confidence thresholds fixed — the reranker assigns 1/(rank+1) scores (1.0, 0.5, 0.33...), which fall in a completely different range from RRF's 1/(k+rank+1). The code was using RRF thresholds for both. Now the reranker uses 0.25/0.08 (top-3/top-10), RRF uses rank-calibrated thresholds, and cosine uses the config thresholds.

  4. cmd_gotcha no longer reads sys.argv directly — now receives chunk_id and gotcha_text as function parameters, matching the pattern of all other cmd_ functions. Argument parsing stays in main().
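The pattern in item 4 can be illustrated with a short sketch. The body of `cmd_gotcha` here is hypothetical (the real command presumably writes to the index), and treating chunk_id as an integer is an assumption. The point is only that parsing lives in `main()` and the command receives plain arguments.

```python
import argparse


def cmd_gotcha(chunk_id, gotcha_text):
    """Hypothetical command body: return the annotation instead of storing it."""
    return {"chunk_id": chunk_id, "gotcha": gotcha_text.strip()}


def main(argv=None):
    """Parse arguments once, then dispatch to the command function."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    subcommands = parser.add_subparsers(dest="command", required=True)
    gotcha = subcommands.add_parser("gotcha")
    gotcha.add_argument("chunk_id", type=int)
    gotcha.add_argument("gotcha_text")
    args = parser.parse_args(argv)
    if args.command == "gotcha":
        return cmd_gotcha(args.chunk_id, args.gotcha_text)
    return None
```

Because `cmd_gotcha` no longer touches `sys.argv`, a test can call it directly with values instead of patching global state.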

@JMRussas JMRussas merged commit a347d2a into main Mar 8, 2026
1 of 2 checks passed