search: WRatio scoring is meaningless for short queries against long records #22

@tony

Problem

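`rapidfuzz.fuzz.WRatio("libtmux", <10K-char transcript>)` returns ~60 for every record. The score doesn't differentiate results because WRatio is designed for comparing strings of similar length, not for finding a 7-char needle in a 10K-char haystack. When the lengths differ this much, WRatio falls back to a partial match scaled by 0.6, so any record that contains the term lands at about 60.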

All results show the same score, making `--threshold` useless and the score display meaningless.

Expected

A scoring strategy that works for record-level search:

  • Substring presence + match density (how many times the term appears)
  • Position weighting (match in first line vs buried at line 500)
  • Term coverage for multi-term AND queries
  • Or: use `rapidfuzz.fuzz.partial_ratio`, which is designed for substring matching in longer strings
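
A minimal sketch of how the first three ideas could combine into one 0-100 score that `--threshold` can act on. Everything here is an assumption for illustration, not agentgrep's actual code: the `score_record` name, the `RecordScore` fields, the `density_cap` default, and the 0.6 / 0.25 / 0.15 weights are all hypothetical and would need tuning against real records.

```python
from dataclasses import dataclass


@dataclass
class RecordScore:
    score: float     # 0-100, comparable to a --threshold value
    coverage: float  # fraction of query terms found in the record
    density: float   # capped occurrence count, averaged over found terms
    position: float  # 1.0 = first match at the start, ~0.0 = at the very end


def score_record(terms: list[str], text: str, density_cap: int = 5) -> RecordScore:
    """Score one record for a multi-term query (hypothetical helper).

    Case-insensitive substring matching only; no fuzzy matching is attempted.
    """
    needles = [t.lower() for t in terms if t]
    if not needles or not text:
        return RecordScore(0.0, 0.0, 0.0, 0.0)

    haystack = text.lower()
    found = 0
    density_sum = 0.0
    position_sum = 0.0

    for needle in needles:
        count = haystack.count(needle)
        if count == 0:
            continue
        found += 1
        # Match density: more occurrences score higher, capped so that very
        # long records cannot dominate purely through repetition.
        density_sum += min(count, density_cap) / density_cap
        # Position weighting: an early first match beats a late one.
        position_sum += 1.0 - haystack.find(needle) / len(haystack)

    if found == 0:
        return RecordScore(0.0, 0.0, 0.0, 0.0)

    coverage = found / len(needles)
    density = density_sum / found
    position = position_sum / found

    # Assumed weights: presence carries most of the score; density and
    # position separate records that all contain the term.
    score = 100.0 * coverage * (0.6 + 0.25 * density + 0.15 * position)
    return RecordScore(round(score, 1), coverage, density, position)
```

With these weights, a record that mentions the term once near the end scores in the low-to-mid 60s, while one that mentions it early or repeatedly scores higher, so the threshold separates results again. For strict AND queries, a caller could additionally drop records whose `coverage` is below 1.0.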

Evidence

```
agentgrep search --limit 5 --threshold 80 libtmux
No matches found.
```

Every record scores ~60, so threshold 80 filters everything.
