Reintroduce `search` with relevance ranking and session grouping (#17) by tony · Pull Request #20 · tony/agentgrep

tony · 2026-05-24T14:23:07Z

Summary

Reintroduce search as a ranked, progress-aware alternative to grep
Results scored by rapidfuzz.fuzz.partial_ratio, sorted best-first
Session grouping with [session ...] headers
Pretty snippet-first output with amber highlights and dim provenance
Progress spinner with Enter-to-answer-now during collection
Flags: --threshold N, --no-rank, --no-group
Field-only queries work (agent:codex without text terms)
JSON/NDJSON output includes score and group_session_id fields
New src/agentgrep/ranking.py module

Closes #17

Test plan

…#17)
why: search returns with genuine differentiation from grep: rapidfuzz relevance ranking, near-duplicate collapsing, and session grouping.
what:
- Add SearchArgs with threshold, no_group, no_rank fields
- Register search subparser with ranking-specific flags
- Add SEARCH_DESCRIPTION and main() dispatch
- Add parse tests

…rouping
why: search needs to score results by relevance (best match first), collapse near-duplicates (WRatio > 90), and group by session for a coherent browsing experience.
what:
- Add ranking.py with rank_search_records (WRatio scoring + sort)
- Add collapse_near_duplicates (pairwise similarity, keep representative)
- Add group_by_session (OrderedDict grouping by session_id)
- Add parametrized tests for all three functions
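The session grouping step described in this commit can be sketched as follows. This is a minimal illustration, not the code in ranking.py: the `Record` dataclass and its fields are hypothetical stand-ins, and only the "OrderedDict grouping by session_id" idea comes from the commit message.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for agentgrep's search record."""

    session_id: str
    text: str


def group_by_session(
    scored: list[tuple[Record, float]],
) -> OrderedDict[str, list[tuple[Record, float]]]:
    """Group scored records by session, keeping first-seen session order."""
    groups: OrderedDict[str, list[tuple[Record, float]]] = OrderedDict()
    for record, score in scored:
        groups.setdefault(record.session_id, []).append((record, score))
    return groups


# Input is assumed to be sorted best-first already.
ranked = [
    (Record("s2", "bliss config"), 95.0),
    (Record("s1", "bliss notes"), 90.0),
    (Record("s2", "blissful"), 80.0),
]
groups = group_by_session(ranked)
for session_id, items in groups.items():
    print(f"[session {session_id}]")
    for record, score in items:
        print(f"  {score:5.1f}  {record.text}")
```

Because the input is sorted best-first, each session appears at the position of its best-scoring record, and records inside a session keep their ranked order.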

… output
why: Complete the search command by connecting the ranking engine to the CLI with progress feedback and pretty-style output.
what:
- Add run_search_command with eager collection + progress + ranking pipeline
- Add _print_search_text with score display and similar-count indicators
- Add _print_search_json for structured output with scores
- Wire dispatch in main() and re-export from __init__
- Add integration tests

… guard
why: collapse_near_duplicates runs pairwise WRatio between all records: O(n²) with expensive C calls. It was called unconditionally even with --no-rank, hanging on large result sets. Users who pass --no-rank explicitly want fast unranked output.
what:
- Skip collapse_near_duplicates entirely when --no-rank is set; emit records with score=0, similar_count=0
- Add size guard in collapse_near_duplicates: if len(scored) > 500, skip pairwise comparison and return records as-is
- Move rank + collapse imports inside the else branch (lazy load only when ranking is active)

… mix
why: grep and find both reject mixing --agent with agent: inline predicates (via _grep_explicit_flags / _find_explicit_flags). The reintroduced search subparser was missing this validation, silently accepting nonsensical queries like `agentgrep search --agent codex agent:claude bliss`.
what:
- Add _search_explicit_flags() mapping --agent and --type flags
- Pass explicit_flags to _maybe_compile_query in _build_search_args
- Parse-time error now raised on flag/field conflicts

why: --threshold only takes effect inside rank_search_records, which is skipped when --no-rank is set. Silently accepting both flags misleads the user into thinking their threshold filter is active.
what:
- Add parse-time error when both --no-rank and --threshold > 0
- Split all-ranking-flags test into two valid cases
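A parse-time guard of this kind can be sketched with plain argparse. The parser layout, the default threshold of 0, and the error wording are assumptions for illustration. Only the flag names come from the PR.

```python
import argparse


def build_parser() -> tuple[argparse.ArgumentParser, argparse.ArgumentParser]:
    """Return (root parser, search subparser) for a reduced, hypothetical CLI."""
    parser = argparse.ArgumentParser(prog="agentgrep")
    subparsers = parser.add_subparsers(dest="command")
    search = subparsers.add_parser("search")
    search.add_argument("terms", nargs="*")
    search.add_argument("--threshold", type=int, default=0)
    search.add_argument("--no-rank", action="store_true")
    return parser, search


def parse(argv: list[str]) -> argparse.Namespace:
    parser, search = build_parser()
    args = parser.parse_args(argv)
    if args.command == "search" and args.no_rank and args.threshold > 0:
        # error() prints the usage line plus the message and exits with status 2.
        # Calling it on the subparser shows the `agentgrep search` usage.
        search.error("--threshold has no effect with --no-rank")
    return args


args = parse(["search", "--no-rank", "bliss"])
print(args.no_rank, args.threshold, args.terms)
```

Raising the error from the subparser rather than the root parser keeps the usage hint specific to the subcommand the user actually typed.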

why: search subcommand was reintroduced but CLI_DESCRIPTION only listed grep/fuzzy/find/ui.
what:
- Add search description to the CLI help intro text

tony · 2026-05-24T18:10:02Z

Needs changes

Two places where features silently disable themselves instead of doing their job:

1. collapse_near_duplicates silently turns off at 500 records

agentgrep/src/agentgrep/ranking.py

Lines 92 to 94 in 9a4840e

    
            return []
        if len(scored) > 500:
            return [(r, s, 0) for r, s in scored]

The whole point of this function is pairwise dedup. At 500+ records it returns everything uncollapsed with similar_count=0 — the user asked for dedup and silently gets none.

The O(n^2) concern is valid but the fix should be a better algorithm, not a silent feature toggle. rapidfuzz.process.cdist is purpose-built for batch pairwise comparison with a C backend — it handles thousands of items. Alternatively, warn on stderr that dedup was skipped due to result set size so the user knows.

2. --no-rank silently disables dedup

agentgrep/src/agentgrep/cli/render.py

Lines 493 to 495 in 19bb35d

    
    if args.no_rank:
        scored: list[tuple[agentgrep.SearchRecord, float]] = [(r, 0.0) for r in records]
        collapsed: list[tuple[agentgrep.SearchRecord, float, int]] = [(r, 0.0, 0) for r in records]

--no-rank means "don't score/sort by relevance." It should not also mean "skip near-duplicate collapsing." These are independent features — a user who wants discovery-order results but still wants duplicates collapsed can't get that. The coupling exists to dodge the O(n^2) cost, not because ranking and dedup are conceptually linked.

Both issues share the same root: the pairwise comparison is too slow, so the code routes around it. Fix the algorithm (use cdist) and the workarounds become unnecessary.
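To make the "independent features" point concrete, here is a minimal sketch of near-duplicate collapsing that works on unranked input. It deliberately swaps in the standard library's `difflib.SequenceMatcher` as the scorer so it runs without extra dependencies; it is not WRatio and not `rapidfuzz.process.cdist`, and it is still quadratic in the worst case. The function signature and the cutoff of 90 are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (not rapidfuzz's WRatio)."""
    return SequenceMatcher(None, a, b).ratio() * 100.0


def collapse_near_duplicates(
    scored: list[tuple[str, float]],
    cutoff: float = 90.0,
) -> list[tuple[str, float, int]]:
    """Keep the first record of each near-duplicate cluster, in input order.

    Returns (text, score, similar_count), where similar_count is the number
    of later records folded into the kept representative.
    """
    kept: list[list] = []
    for text, score in scored:
        for entry in kept:
            if similarity(entry[0], text) >= cutoff:
                entry[2] += 1  # fold into the existing representative
                break
        else:
            kept.append([text, score, 0])
    return [(text, score, count) for text, score, count in kept]


records = [
    "fix the bliss config loader",
    "fix the bliss config loader.",
    "unrelated prompt about tests",
]
# Discovery order, no relevance scores: the --no-rank case.
unranked = [(text, 0.0) for text in records]
collapsed = collapse_near_duplicates(unranked)
print(collapsed)
```

Nothing in the collapse step reads the score, which is why it can run whether or not ranking happened first.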

tony · 2026-05-24T21:59:22Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Verified the fixes from the earlier review comment are in place: the size guard is removed from collapse_near_duplicates and --no-rank no longer silently skips dedup (both addressed in e97cc9b). The answered_early path correctly bypasses both ranking and collapse without re-coupling them. The dead branch in _iter_jsonl's text-mode loop is gone (905552b).

🤖 Generated with Claude Code

why: run_search_command created a SearchControl but never wired up the AnswerNowInputListener thread, so pressing Enter during a long search had no effect and the progress hint was hidden.
what:
- Wire AnswerNowInputListener with start/stop around run_search_query
- Set answer_now_hint based on TTY detection (stdin + stderr)
- Wrap run_search_query in try/finally to ensure listener.stop()
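The start/stop wiring with try/finally can be sketched as below. `AnswerNowListener`, `run_with_listener`, and `should_show_hint` are hypothetical names, not agentgrep's actual classes; the stream is injectable so the example runs without a terminal.

```python
import io
import sys
import threading
from typing import Callable, TextIO


class AnswerNowListener:
    """Hypothetical listener: sets an event when a line arrives on `stream`."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream
        self.requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # readline() blocks until Enter or EOF; an empty string means EOF.
        if self._stream.readline():
            self.requested.set()

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 0.1) -> None:
        # A blocked readline() cannot be interrupted, so only wait briefly;
        # the daemon flag lets the process exit regardless.
        self._thread.join(timeout)


def should_show_hint() -> bool:
    """Only advertise Enter-to-answer-now when both ends are a terminal."""
    return sys.stdin.isatty() and sys.stderr.isatty()


def run_with_listener(stream: TextIO, work: Callable[[threading.Event], str]) -> str:
    listener = AnswerNowListener(stream)
    listener.start()
    try:
        return work(listener.requested)
    finally:
        listener.stop()  # runs even if work() raises


def fake_search(requested: threading.Event) -> str:
    requested.wait(timeout=1.0)
    return "partial" if requested.is_set() else "complete"


result = run_with_listener(io.StringIO("\n"), fake_search)
print(result)
```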

why: Large Codex and Claude-style JSONL sources can spend seconds inside parsing work before any deduped result is emitted, which leaves the CLI progress line looking frozen. Huge Codex tool-output records make this worse because they can hold the GIL while producing no searchable prompt record.
what:
- Add optional in-source progress updates with cooperative parser yields while preserving final deduped result semantics.
- Show source detail in CLI and TUI progress snapshots alongside source counters.
- Skip large Codex function_call_output lines before JSON decoding, discarding them cooperatively because they cannot produce prompt records.
- Cover progress callbacks, JSONL yielding, raw tool-output skipping, and progress-line formatting in tests.

why: Showing in-source progress made the live TTY status line long enough to wrap on narrow terminals. The renderer only clears one terminal row with carriage-return plus clear-line, so wrapped renders leave stale rows behind and look like a flood.
what:
- Make TTY progress rendering terminal-width aware, dropping optional detail and the answer-now hint before ANSI-safe truncation.
- Add a regression test for narrow terminal rendering.
- Preserve full detail formatting for callers without a width constraint.
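The drop-then-truncate order described here can be sketched as a pure function. This is an illustration under simplifying assumptions: `format_progress` and its parameters are hypothetical, and the sketch measures plain text only, so it does not implement the ANSI-safe truncation the commit mentions.

```python
import shutil
from typing import Optional


def format_progress(
    base: str,
    detail: str = "",
    hint: str = "",
    width: Optional[int] = None,
) -> str:
    """Build a one-row status line, dropping optional parts before truncating."""
    full = " ".join(part for part in (base, detail, hint) if part)
    if width is None or len(full) <= width:
        return full  # no width constraint, or everything fits
    without_detail = " ".join(part for part in (base, hint) if part)
    if len(without_detail) <= width:
        return without_detail  # first drop the in-source detail
    if len(base) <= width:
        return base  # then drop the answer-now hint
    if width <= 0:
        return ""
    return base[: width - 1] + "…"  # last resort: truncate the base text


columns = shutil.get_terminal_size(fallback=(80, 24)).columns
line = format_progress(
    "Searching 3/10", "codex: a.jsonl", "[Enter] answer now", width=columns
)
print(line)
```

Keeping the line within one row matters because a carriage return plus clear-line only erases the row the cursor is on.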

why: The search CLI accepted malformed regex terms until matching reached Python's regex engine, producing a traceback after scanning started. Query-language type predicates also kept the default prompt-only coarse search filter, so history records were discarded before the compiled predicate could evaluate.
what:
- Validate `search --regex` terms at parse time with argparse-shaped errors.
- Track compiled query fields so `type:` predicates broaden the coarse search filter when `--type` was not explicit.
- Treat explicit default `--type` values as flag/field collisions across search, grep, and find.
- Add regression coverage for invalid search regexes, type predicate routing, and explicit default collisions.
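Parse-time regex validation can be sketched like this. The reduced parser and the error wording are assumptions. The point is only that `re.compile` runs before any scanning starts and failures go through `parser.error`.

```python
import argparse
import re


def parse_search(argv: list[str]) -> argparse.Namespace:
    """Parse a reduced, hypothetical `search` command line."""
    parser = argparse.ArgumentParser(prog="agentgrep search")
    parser.add_argument("terms", nargs="*")
    parser.add_argument("--regex", action="store_true")
    args = parser.parse_args(argv)
    if args.regex:
        for term in args.terms:
            try:
                re.compile(term)
            except re.error as exc:
                # Exits with status 2 and a usage line instead of a traceback.
                parser.error(f"invalid regex {term!r}: {exc}")
    return args


ok = parse_search(["--regex", "bl.ss"])
literal = parse_search(["bl(iss"])  # not a regex, so no validation applies
print(ok.terms, literal.terms)
```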

… errors
why: Validation errors for --limit and --max-count called the root parser's .error(), showing `usage: agentgrep [-h] ...` instead of the subcommand's usage hint.
what:
- find --limit: bundle.parser → bundle.find_parser
- search --limit: bundle.parser → bundle.search_parser
- grep --max-count: bundle.parser → bundle.grep_parser

why: The early return at the top of _iter_jsonl dispatches to _iter_jsonl_with_raw_skip when skip_line is set, making the inline `if skip_line is not None` check unreachable.
what:
- Remove the dead branch from the text-mode iteration path

…no-rank
why: collapse_near_duplicates silently turned itself off at 500 records, and --no-rank silently skipped dedup. Both hacks avoided the O(n²) cost instead of letting the C-accelerated WRatio calls do their job. Ranking and dedup are independent features: a user who wants discovery-order results should still get dedup.
what:
- Remove the 500-record size guard from collapse_near_duplicates
- Always run collapse_near_duplicates regardless of --no-rank
- Fix docstring: "above" → "at or above" for >= threshold

why: Docstring described scoring/collapse/grouping as unconditional but --no-rank skips scoring and --no-group skips grouping.
what:
- Note --no-rank and --no-group bypass paths in the docstring

why: Function-level docstring was fixed to match >= semantics but module docstring still said "above" (implying >).
what:
- Change "records above" to "records at or above" in module docstring to match the >= comparison in the implementation

why: `assert code in (0, 1)` is always true. The canned records score 90 against "bliss" so threshold=99 always filters all of them, making code deterministically 1.
what:
- Assert code == 1 and empty stdout directly
- Remove narration comments

why: `agentgrep search agent:codex` raised SystemExit even though a compiled field query existed. The guard only checked for empty terms, not for a compiled query. Additionally, field-only queries produce empty query_text, which makes WRatio return 0 for everything, so ranking is skipped in that case.
what:
- Check args.compiled before rejecting empty terms
- Skip ranking when query_text is empty (field-only query)
- Add test for field-only query parsing and execution

why: When the user pressed Enter for partial results, the "Answering now: N matches" message appeared but then the CLI hung for minutes running rank_search_records (O(n) WRatio calls) and collapse_near_duplicates (O(n²) pairwise) on potentially thousands of partial results, defeating the purpose of answering now.
what:
- Check control.answer_now_requested() after collection returns
- Skip both ranking and collapse when answering early; emit records in discovery order with score=0, similar_count=0
- Collapse still runs normally for --no-rank (only answer-now bypasses it, preserving the earlier decoupling)
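The branching at this point in the PR history can be sketched as a small function with the ranking and collapse steps injected. `finalize`, its parameters, and the fake steps are hypothetical; the sketch only mirrors the rules stated in the commit messages (answer-now skips both steps, --no-rank and empty query text skip ranking only).

```python
from typing import Callable

Scored = tuple[str, float]
Collapsed = tuple[str, float, int]


def finalize(
    records: list[str],
    *,
    query_text: str,
    no_rank: bool,
    answered_early: bool,
    rank: Callable[[list[str], str], list[Scored]],
    collapse: Callable[[list[Scored]], list[Collapsed]],
) -> list[Collapsed]:
    if answered_early:
        # Answer-now: emit in discovery order, skipping both expensive steps.
        return [(r, 0.0, 0) for r in records]
    if no_rank or not query_text:
        # --no-rank or a field-only query: no scoring, but dedup still runs.
        scored = [(r, 0.0) for r in records]
    else:
        scored = rank(records, query_text)
    return collapse(scored)


calls: list[str] = []


def fake_rank(records: list[str], query_text: str) -> list[Scored]:
    calls.append("rank")
    return sorted(((r, float(len(r))) for r in records), key=lambda p: -p[1])


def fake_collapse(scored: list[Scored]) -> list[Collapsed]:
    calls.append("collapse")
    return [(r, s, 0) for r, s in scored]


common = dict(query_text="b", rank=fake_rank, collapse=fake_collapse)
early = finalize(["a", "bb"], no_rank=False, answered_early=True, **common)
unranked = finalize(["a", "bb"], no_rank=True, answered_early=False, **common)
ranked = finalize(["a", "bb"], no_rank=False, answered_early=False, **common)
print(calls)
```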

why: The parser guard rejecting --threshold with --no-rank had no test verifying the error fires.
what:
- Add test_search_threshold_with_no_rank_rejected asserting SystemExit code 2 and error message mentioning both flags

why: collapse_near_duplicates ran O(n²) pairwise WRatio on the full result set (~612M comparisons for 35K records), hanging the CLI indefinitely. The engine already does exact dedup via hash-based record_dedupe_key. Both grep and the TUI stream results without pairwise dedup and work at scale.
what:
- Rewrite run_search_command to stream via iter_search_events, scoring each record inline with WRatio as it arrives (O(n))
- Remove collapse_near_duplicates from the pipeline entirely
- Text mode streams with session headers and per-record scores
- JSON/NDJSON stays eager for envelope integrity but skips collapse (ranking + grouping only)
- Pass args.limit to SearchQuery so the engine caps early
- Apply post-ranking limit in eager path for JSON accuracy
- Update tests: remove similar_count assertions, fix monkeypatching for streaming vs eager paths
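The ~612M figure follows from n(n-1)/2 unordered pairs, and the streaming alternative scores each record once as it arrives. The sketch below shows both. The scorer is again `difflib.SequenceMatcher` as a dependency-free stand-in for WRatio, and `score_stream` is a hypothetical name, not `iter_search_events`.

```python
from collections.abc import Iterable, Iterator
from difflib import SequenceMatcher


def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n records."""
    return n * (n - 1) // 2


def score_stream(
    query: str,
    records: Iterable[str],
    threshold: float = 0.0,
) -> Iterator[tuple[str, float]]:
    """Score each record once as it arrives: one scorer call per record."""
    for text in records:
        # Stand-in scorer on a 0-100 scale (not rapidfuzz's WRatio).
        score = SequenceMatcher(None, query.lower(), text.lower()).ratio() * 100.0
        if score >= threshold:
            yield text, score


print(pairwise_comparisons(35_000))  # 612482500

for text, score in score_stream("bliss", ["bliss", "xyz"], threshold=50.0):
    print(f"{score:5.1f}  {text}")
```

Because `score_stream` is a generator, the first result can be printed before the source is exhausted, which is what lets text mode stream.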

why: Without `readme = "README.md"` in [project], hatchling does not include the README in package metadata, so the PyPI page is blank.
what:
- Add `readme = "README.md"` to [project] table

why: search was removed (#19) then reintroduced (#20) in the same release cycle; the net change is that search gained ranking, not that it was removed. Replace the stale breaking-change entry with the shipped feature.
what:
- Remove "Remove search subcommand" breaking change (branch-internal)
- Add What's new entry for ranked search with session grouping

Base automatically changed from remove-search to master May 24, 2026 15:58

tony force-pushed the search-ranking branch from 8b7a18f to be5e2f8 Compare May 24, 2026 16:02

tony force-pushed the search-ranking branch from be5e2f8 to 74a6db2 Compare May 24, 2026 17:10

tony added 7 commits May 24, 2026 12:57

agentgrep(docs[cli]): Add search to CLI_DESCRIPTION

2cb5a81

tony force-pushed the search-ranking branch from a159835 to 19bb35d Compare May 24, 2026 17:57

tony force-pushed the search-ranking branch from 98a2a8e to 19bb35d Compare May 24, 2026 19:12

tony added 9 commits May 24, 2026 18:36

agentgrep(docs[render]): Accurate run_search_command docstring

438047d

tony force-pushed the search-ranking branch from 9d8a2f3 to 523e778 Compare May 24, 2026 23:37

tony added 4 commits May 24, 2026 18:48

agentgrep(test[parser]): Cover --threshold + --no-rank rejection

8f871cf

tony force-pushed the search-ranking branch from 523e778 to 8f871cf Compare May 24, 2026 23:51

tony added 2 commits May 24, 2026 21:58

agentgrep(fix[packaging]): Add readme field to project metadata

682cc7e

tony force-pushed the search-ranking branch from aae5abe to 682cc7e Compare May 25, 2026 02:58

tony changed the title ~~Reintroduce search with rapidfuzz ranking, dedup, and session grouping (#17)~~ Reintroduce search with relevance ranking and session grouping (#17) May 25, 2026

tony merged commit c28bd5d into master May 25, 2026
3 checks passed

tony merged 23 commits into master from search-ranking
