Skip to content

Conversation

brendan-kellam
Copy link
Contributor

@brendan-kellam brendan-kellam commented Oct 8, 2025

TLDR: This PR improves search performance for unbounded searches by bounding the TotalMaxMatchCount closely to MaxMatchDisplayCount. Comment contains the details:

// @note: Zoekt has several different ways to limit a given search. The two that
// we care about are `MaxMatchDisplayCount` and `TotalMaxMatchCount`:
// - `MaxMatchDisplayCount` truncates the number of matches AFTER performing
//   a search (specifically, after collating and sorting the results). The number of
//   results returned by the API will be less than or equal to this value.
//
// - `TotalMaxMatchCount` truncates the number of matches DURING a search. The results
//   returned by the API the API can be less than, equal to, or greater than this value.
//   Why greater? Because this value is compared _after_ a given shard has finished
//   being processed, the number of matches returned by the last shard may have exceeded
//   this value.
//
// Let's define two variables:
// - `actualMatchCount` : The number of matches that are returned by the API. This is
//   always less than or equal to `MaxMatchDisplayCount`.
// - `totalMatchCount` : The number of matches that zoekt found before it either
//   1) found all matches or 2) hit the `TotalMaxMatchCount` limit. This number is
//   not bounded and can be less than, equal to, or greater than both `TotalMaxMatchCount`
//   and `MaxMatchDisplayCount`.
//
//
// Our challenge is to determine whether or not the search returned all possible matches/
// (it was exaustive) or if it was truncated. By setting the `TotalMaxMatchCount` to
// `MaxMatchDisplayCount + 1`, we can determine which of these occurred by comparing
// `totalMatchCount` to `MaxMatchDisplayCount`.
//
// if (totalMatchCount ≤ actualMatchCount):
//     Search is EXHAUSTIVE (found all possible matches)
//     Proof: totalMatchCount ≤ MaxMatchDisplayCount < TotalMaxMatchCount
//         Therefore Zoekt stopped naturally, not due to limit
//     
// if (totalMatchCount > actualMatchCount):
//     Search is TRUNCATED (more matches exist)
//     Proof: totalMatchCount > MaxMatchDisplayCount + 1 = TotalMaxMatchCount
//         Therefore Zoekt hit the limit and stopped searching
//

This PR deprecates the following env vars:

  • SHARD_MAX_MATCH_COUNT
  • TOTAL_MAX_MATCH_COUNT
  • ZOEKT_MAX_WALL_TIME_MS

This comment has been minimized.

Copy link

coderabbitai bot commented Oct 8, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bkellam/search_time_improvements

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@brendan-kellam brendan-kellam merged commit c3fae1a into main Oct 8, 2025
10 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/search_time_improvements branch October 8, 2025 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant