Context
The agentgrep grep -v PATTERN text-output path is currently a
silent no-op: argparse accepts the flag, the command runs, and
the user gets back the matching records (not the inverted set).
The acknowledgement comment lives in
src/agentgrep/cli/render.py (the print_grep_results early
branch). Only grep -c -v and grep -L -v honor inversion today
because they collapse to a "did anything match?" question that
the eager record list can answer.
In the upcoming commit that lands with this issue, -v outside
-c/-L will be hard-rejected with exit 2 so we no longer
silently return wrong output. This issue tracks the real
implementation.
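The interim rejection could look roughly like the sketch below. The long option names and the error message are assumptions chosen for illustration, not agentgrep's actual parser; the only thing relied on is that argparse's parser.error() prints usage and exits with status 2.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: option spellings mirror grep/rg and are assumed here.
    parser = argparse.ArgumentParser(prog="agentgrep grep")
    parser.add_argument("pattern")
    parser.add_argument("-v", "--invert-match", action="store_true")
    parser.add_argument("-c", "--count", action="store_true")
    parser.add_argument("-L", "--files-without-match", action="store_true")
    return parser


def parse_grep_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.invert_match and not (args.count or args.files_without_match):
        # parser.error() writes usage and the message to stderr, then exits with status 2.
        parser.error("-v is only supported together with -c or -L")
    return args
```

With this shape, `-c -v PATTERN` and `-L -v PATTERN` still parse, while a bare `-v PATTERN` fails fast instead of returning the non-inverted records.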
Implementation paths
v2 — consumer-layer enumeration. Add a helper that calls
discover_sources() and re-reads each source's records, then
filters at the CLI consumer to lines that don't match the
pattern. The engine surface stays unchanged; the CLI does the
inversion. Cost: roughly 2x I/O for -v queries because every
source is read regardless of match.
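A minimal sketch of the v2 helper, assuming record-level inversion. The Record type, the helper name, and the read_records callable are hypothetical stand-ins; discover_sources is passed in as a callable because its real signature is not shown in this issue.

```python
import re
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for an agentgrep record.
    source: str
    text: str


def iter_inverted_records(
    pattern: str,
    discover_sources: Callable[[], Iterable[str]],
    read_records: Callable[[str], Iterable[Record]],
) -> Iterator[Record]:
    """Yield every record whose text does not match the pattern."""
    regex = re.compile(pattern)
    for source in discover_sources():
        # Every source is re-read in full, whether or not it held a match.
        # This is where the extra I/O for -v queries comes from.
        for record in read_records(source):
            if regex.search(record.text) is None:
                yield record
```

The engine is not involved at all here, which is the appeal of this path; the price is that the CLI duplicates the engine's notion of which records are candidates.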
v3 — engine include_non_matching mode. Add an opt-in flag
to iter_search_events so it emits RecordEmitted for every
record (matching or not) with a matched: bool field, and the
CLI consumer filters. Pros: single pass over the stores; engine
becomes the single source of truth for the universal candidate
set. Cons: touches the pydantic event union and every consumer
that reads it.
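The v3 shape might be sketched as follows. This is a toy model only: the dataclass stands in for the pydantic event, the function takes a plain list of strings instead of real stores, and the parameter list is assumed rather than taken from the actual iter_search_events signature.

```python
import re
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordEmitted:
    # Stand-in for the real event, which lives in a pydantic event union.
    text: str
    matched: bool = True


def iter_search_events(
    records: Iterable[str],
    pattern: str,
    *,
    include_non_matching: bool = False,
) -> Iterator[RecordEmitted]:
    """Single pass over the records; non-matches are emitted only on opt-in."""
    regex = re.compile(pattern)
    for text in records:
        matched = regex.search(text) is not None
        if matched or include_non_matching:
            yield RecordEmitted(text=text, matched=matched)


def inverted_texts(events: Iterable[RecordEmitted]) -> list[str]:
    # CLI-side filter for -v: keep only the records the engine marked as non-matching.
    return [event.text for event in events if not event.matched]
```

Defaulting matched to True and include_non_matching to False keeps existing consumers seeing the same events as before, though each consumer of the event union would still need to be checked for how it handles the new field.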
Decisions to make
- Should -v invert at the line level (rg parity: also emits
non-matching lines from records that have at least one match)
or at the record level (whole records that don't match)? The
rest of agentgrep treats records as atomic units, so
record-level may be the more consistent default.
- Should -v change the streaming live-output mode, or stay
eager-only? rg's -v streams; per-record inversion can stream;
per-line inversion needs the full record buffer either way.
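To make the first decision concrete, here is a small sketch of the two semantics side by side. Records are modelled as plain multi-line strings and both function names are hypothetical; nothing here reflects agentgrep's actual record type.

```python
import re
from collections.abc import Iterable, Iterator


def invert_by_record(records: Iterable[str], pattern: str) -> Iterator[str]:
    """Record-level: emit whole records that contain no match anywhere."""
    regex = re.compile(pattern)
    for record in records:
        if regex.search(record) is None:
            yield record


def invert_by_line(records: Iterable[str], pattern: str) -> Iterator[str]:
    """Line-level (rg-style): emit each non-matching line, even from matching records."""
    regex = re.compile(pattern)
    for record in records:
        # The whole record text must be in hand before it can be split into lines.
        for line in record.splitlines():
            if regex.search(line) is None:
                yield line
```

For a record such as "error: disk full" followed by "retrying" on a second line, the pattern `error` drops the whole record under record-level inversion but still emits "retrying" under line-level inversion.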
Related
- The interim "refuse outside -c/-L" commit:
(PR / commit hash to back-fill)
- TODO comment at src/agentgrep/cli/render.py in
print_grep_results documenting the v1 simplification.