Skip to content

Reintroduce search with relevance ranking and session grouping (#17)#20

Merged
tony merged 23 commits into
masterfrom
search-ranking
May 25, 2026
Merged

Reintroduce search with relevance ranking and session grouping (#17)#20
tony merged 23 commits into
masterfrom
search-ranking

Conversation

@tony
Copy link
Copy Markdown
Owner

@tony tony commented May 24, 2026

Summary

  • Reintroduce search as a ranked, progress-aware alternative to grep
  • Results scored by rapidfuzz.fuzz.partial_ratio, sorted best-first
  • Session grouping with [session ...] headers
  • Pretty snippet-first output with amber highlights and dim provenance
  • Progress spinner with Enter-to-answer-now during collection
  • Flags: --threshold N, --no-rank, --no-group
  • Field-only queries work (agent:codex without text terms)
  • JSON/NDJSON output includes score and group_session_id fields
  • New src/agentgrep/ranking.py module

Closes #17

Test plan

  • uv run pytest --reruns 0 passes
  • uv run ruff check . clean
  • uv run ty check clean
  • just build-docs builds
  • agentgrep search libtmux — progress spinner, then ranked results with session grouping
  • agentgrep search --no-rank libtmux — discovery order, no scoring
  • agentgrep search --threshold 80 libtmux — filters low scores
  • agentgrep search agent:codex — field-only query works
  • Press Enter during long search — instant partial results
  • agentgrep search libtmux --json — JSON with score fields

tony added 7 commits May 24, 2026 12:57
…#17)

why: search returns with genuine differentiation from grep —
rapidfuzz relevance ranking, near-duplicate collapsing, and
session grouping.

what:
- Add SearchArgs with threshold, no_group, no_rank fields
- Register search subparser with ranking-specific flags
- Add SEARCH_DESCRIPTION and main() dispatch
- Add parse tests
…rouping

why: search needs to score results by relevance (best match first),
collapse near-duplicates (WRatio > 90), and group by session for
a coherent browsing experience.

what:
- Add ranking.py with rank_search_records (WRatio scoring + sort)
- Add collapse_near_duplicates (pairwise similarity, keep representative)
- Add group_by_session (OrderedDict grouping by session_id)
- Add parametrized tests for all three functions
… output

why: Complete the search command by connecting the ranking engine
to the CLI with progress feedback and pretty-style output.

what:
- Add run_search_command with eager collection + progress + ranking pipeline
- Add _print_search_text with score display and similar-count indicators
- Add _print_search_json for structured output with scores
- Wire dispatch in main() and re-export from __init__
- Add integration tests
… guard

why: collapse_near_duplicates runs pairwise WRatio between all
records — O(n²) with expensive C calls. It was called
unconditionally even with --no-rank, hanging on large result
sets. Users who pass --no-rank explicitly want fast unranked
output.

what:
- Skip collapse_near_duplicates entirely when --no-rank is set;
  emit records with score=0, similar_count=0
- Add size guard in collapse_near_duplicates: if len(scored) > 500,
  skip pairwise comparison and return records as-is
- Move rank + collapse imports inside the else branch (lazy load
  only when ranking is active)
… mix

why: grep and find both reject mixing --agent with agent: inline
predicates (via _grep_explicit_flags / _find_explicit_flags). The
reintroduced search subparser was missing this validation,
silently accepting nonsensical queries like
`agentgrep search --agent codex agent:claude bliss`.

what:
- Add _search_explicit_flags() mapping --agent and --type flags
- Pass explicit_flags to _maybe_compile_query in _build_search_args
- Parse-time error now raised on flag/field conflicts
why: --threshold only takes effect inside rank_search_records,
which is skipped when --no-rank is set. Silently accepting both
flags misleads the user into thinking their threshold filter is
active.

what:
- Add parse-time error when both --no-rank and --threshold > 0
- Split all-ranking-flags test into two valid cases
why: search subcommand was reintroduced but CLI_DESCRIPTION only
listed grep/fuzzy/find/ui.

what:
- Add search description to the CLI help intro text
@tony
Copy link
Copy Markdown
Owner Author

tony commented May 24, 2026

Needs changes

Two places where features silently disable themselves instead of doing their job:

1. collapse_near_duplicates silently turns off at 500 records

return []
if len(scored) > 500:
return [(r, s, 0) for r, s in scored]

The whole point of this function is pairwise dedup. At 500+ records it returns everything uncollapsed with similar_count=0 — the user asked for dedup and silently gets none.

The O(n^2) concern is valid but the fix should be a better algorithm, not a silent feature toggle. rapidfuzz.process.cdist is purpose-built for batch pairwise comparison with a C backend — it handles thousands of items. Alternatively, warn on stderr that dedup was skipped due to result set size so the user knows.

2. --no-rank silently disables dedup

if args.no_rank:
scored: list[tuple[agentgrep.SearchRecord, float]] = [(r, 0.0) for r in records]
collapsed: list[tuple[agentgrep.SearchRecord, float, int]] = [(r, 0.0, 0) for r in records]

--no-rank means "don't score/sort by relevance." It should not also mean "skip near-duplicate collapsing." These are independent features — a user who wants discovery-order results but still wants duplicates collapsed can't get that. The coupling exists to dodge the O(n^2) cost, not because ranking and dedup are conceptually linked.

Both issues share the same root: the pairwise comparison is too slow, so the code routes around it. Fix the algorithm (use cdist) and the workarounds become unnecessary.

@tony
Copy link
Copy Markdown
Owner Author

tony commented May 24, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Verified the fixes from the earlier review comment are in place: the size guard is removed from collapse_near_duplicates and --no-rank no longer silently skips dedup (both addressed in e97cc9b). The answered_early path correctly bypasses both ranking and collapse without re-coupling them. The dead branch in _iter_jsonl's text-mode loop is gone (905552b).

🤖 Generated with Claude Code

tony added 9 commits May 24, 2026 18:36
why: run_search_command created a SearchControl but never wired
up the AnswerNowInputListener thread, so pressing Enter during a
long search had no effect and the progress hint was hidden.

what:
- Wire AnswerNowInputListener with start/stop around run_search_query
- Set answer_now_hint based on TTY detection (stdin + stderr)
- Wrap run_search_query in try/finally to ensure listener.stop()
why: Large Codex and Claude-style JSONL sources can spend seconds inside parsing work before any deduped result is emitted, which leaves the CLI progress line looking frozen. Huge Codex tool-output records make this worse because they can hold the GIL while producing no searchable prompt record.

what:
- Add optional in-source progress updates with cooperative parser yields while preserving final deduped result semantics.
- Show source detail in CLI and TUI progress snapshots alongside source counters.
- Skip large Codex function_call_output lines before JSON decoding, discarding them cooperatively because they cannot produce prompt records.
- Cover progress callbacks, JSONL yielding, raw tool-output skipping, and progress-line formatting in tests.
why: Showing in-source progress made the live TTY status line long enough to wrap on narrow terminals. The renderer only clears one terminal row with carriage-return plus clear-line, so wrapped renders leave stale rows behind and look like a flood.
what:
- Make TTY progress rendering terminal-width aware, dropping optional detail and the answer-now hint before ANSI-safe truncation.
- Add a regression test for narrow terminal rendering.
- Preserve full detail formatting for callers without a width constraint.
why: The search CLI accepted malformed regex terms until matching reached Python's regex engine, producing a traceback after scanning started. Query-language type predicates also kept the default prompt-only coarse search filter, so history records were discarded before the compiled predicate could evaluate.
what:
- Validate `search --regex` terms at parse time with argparse-shaped errors.
- Track compiled query fields so `type:` predicates broaden the coarse search filter when `--type` was not explicit.
- Treat explicit default `--type` values as flag/field collisions across search, grep, and find.
- Add regression coverage for invalid search regexes, type predicate routing, and explicit default collisions.
… errors

why: Validation errors for --limit and --max-count called the
root parser's .error(), showing `usage: agentgrep [-h] ...`
instead of the subcommand's usage hint.

what:
- find --limit: bundle.parser → bundle.find_parser
- search --limit: bundle.parser → bundle.search_parser
- grep --max-count: bundle.parser → bundle.grep_parser
why: The early return at the top of _iter_jsonl dispatches to
_iter_jsonl_with_raw_skip when skip_line is set, making the
inline `if skip_line is not None` check unreachable.

what:
- Remove the dead branch from the text-mode iteration path
…no-rank

why: collapse_near_duplicates silently turned itself off at 500
records, and --no-rank silently skipped dedup. Both hacks avoided
the O(n²) cost instead of letting the C-accelerated WRatio calls
do their job. Ranking and dedup are independent features — a user
who wants discovery-order results should still get dedup.

what:
- Remove the 500-record size guard from collapse_near_duplicates
- Always run collapse_near_duplicates regardless of --no-rank
- Fix docstring: "above" → "at or above" for >= threshold
why: Docstring described scoring/collapse/grouping as unconditional
but --no-rank skips scoring and --no-group skips grouping.

what:
- Note --no-rank and --no-group bypass paths in the docstring
why: Function-level docstring was fixed to match >= semantics
but module docstring still said "above" (implying >).

what:
- Change "records above" to "records at or above" in module
  docstring to match the >= comparison in the implementation
tony added 4 commits May 24, 2026 18:48
why: `assert code in (0, 1)` is always true. The canned records
score 90 against "bliss" so threshold=99 always filters all of
them — code is deterministically 1.

what:
- Assert code == 1 and empty stdout directly
- Remove narration comments
why: `agentgrep search agent:codex` raised SystemExit even though
a compiled field query existed. The guard only checked for empty
terms, not for a compiled query. Additionally, field-only queries
produce empty query_text which makes WRatio return 0 for
everything — ranking is skipped in that case.

what:
- Check args.compiled before rejecting empty terms
- Skip ranking when query_text is empty (field-only query)
- Add test for field-only query parsing and execution
why: When the user pressed Enter for partial results, the
"Answering now: N matches" message appeared but then the CLI
hung for minutes running rank_search_records (O(n) WRatio calls)
and collapse_near_duplicates (O(n²) pairwise) on potentially
thousands of partial results — defeating the purpose of
answering now.

what:
- Check control.answer_now_requested() after collection returns
- Skip both ranking and collapse when answering early — emit
  records in discovery order with score=0, similar_count=0
- Collapse still runs normally for --no-rank (only answer-now
  bypasses it, preserving the earlier decoupling)
why: The parser guard rejecting --threshold with --no-rank had
no test verifying the error fires.

what:
- Add test_search_threshold_with_no_rank_rejected asserting
  SystemExit code 2 and error message mentioning both flags
tony added 2 commits May 24, 2026 21:58
why: collapse_near_duplicates ran O(n²) pairwise WRatio on the
full result set (~612M comparisons for 35K records), hanging the
CLI indefinitely. The engine already does exact dedup via
hash-based record_dedupe_key. Both grep and the TUI stream
results without pairwise dedup and work at scale.

what:
- Rewrite run_search_command to stream via iter_search_events,
  scoring each record inline with WRatio as it arrives (O(n))
- Remove collapse_near_duplicates from the pipeline entirely
- Text mode streams with session headers and per-record scores
- JSON/NDJSON stays eager for envelope integrity but skips
  collapse — ranking + grouping only
- Pass args.limit to SearchQuery so the engine caps early
- Apply post-ranking limit in eager path for JSON accuracy
- Update tests: remove similar_count assertions, fix
  monkeypatching for streaming vs eager paths
why: Without `readme = "README.md"` in [project], hatchling does not
include the README in package metadata, so the PyPI page is blank.

what:
- Add `readme = "README.md"` to [project] table
why: search was removed (#19) then reintroduced (#20) in the same
release cycle — the net change is that search gained ranking, not
that it was removed. Replace the stale breaking-change entry with
the shipped feature.

what:
- Remove "Remove search subcommand" breaking change (branch-internal)
- Add What's new entry for ranked search with session grouping
@tony tony changed the title Reintroduce search with rapidfuzz ranking, dedup, and session grouping (#17) Reintroduce search with relevance ranking and session grouping (#17) May 25, 2026
@tony tony merged commit c28bd5d into master May 25, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reintroduce search with dedup, grouping, and relevance ranking

1 participant