
chore(suspect flags): Include filtered flag in output #95007


Merged
merged 9 commits into from
Jul 8, 2025

Conversation

aayush-se
Member

@aayush-se aayush-se commented Jul 7, 2025

  • Adds a filtering step that runs before RRF (reciprocal rank fusion)
    • Normalizes the KL and entropy scores with a Box-Cox transform, then keeps the scores whose z-score is >= the threshold
      • The threshold is currently 1.5 and can be adjusted if RRF turns out to be over- or under-filtered
  • Exposes whether each flag was filtered out as a boolean in the JSON response, so that all flags remain visible on the frontend if required
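
For readers who want a concrete picture of the filtering step described above, here is a minimal sketch. It is not the code from this PR: the names `boxcox`, `z_scores`, and `mark_filtered` are hypothetical, lambda is passed in rather than estimated, inputs are assumed to be strictly positive, and only a single score list is handled (how the KL and entropy results are combined is not shown here).

```python
import math
from statistics import mean, pstdev


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if lam == 0.0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        # All values identical: nothing stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mark_filtered(scores, lam=0.0, threshold=1.5):
    """Map each flag name to True if it is filtered out (z-score < threshold).

    `scores` is a hypothetical {flag_name: score} mapping. Every flag stays
    in the result, so a caller can still display the filtered ones.
    """
    names = list(scores)
    z = z_scores(boxcox([scores[n] for n in names], lam))
    return {name: zi < threshold for name, zi in zip(names, z)}


example = {"a": 0.01, "b": 0.012, "c": 0.011, "d": 0.009, "e": 0.01, "f": 5.0}
filtered = mark_filtered(example)
```

In this example only `f` sits far enough above the rest to clear the 1.5 threshold, so it is the only flag left unfiltered. Returning a boolean per flag, rather than dropping entries, mirrors the PR's stated goal of keeping every flag visible.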

TODO:

  • Update frontend to use these filtered scores when sorting by Heuristic + RRF or RRF
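
As a rough illustration of how a filtered boolean could feed an RRF ordering, the sketch below fuses two rankings with the standard reciprocal rank fusion formula, score = sum of 1 / (k + rank). Everything here is an assumption made for illustration: the function names, the choice of k = 60, the convention that a higher score means more suspect, and the decision to drop filtered flags before ranking. The frontend change itself is not part of this PR.

```python
def rrf_scores(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranking (rank starts at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    return scores


def rank_with_filter(kl, entropy, filtered, k=60):
    """Order the unfiltered flags by fused RRF score, best first.

    `kl` and `entropy` are hypothetical {flag_name: score} mappings and
    `filtered` is a {flag_name: bool} mapping like the one in the JSON response.
    """
    kept = [name for name in kl if not filtered.get(name, False)]
    # Assumption: higher scores rank first in both lists.
    by_kl = sorted(kept, key=lambda n: kl[n], reverse=True)
    by_entropy = sorted(kept, key=lambda n: entropy[n], reverse=True)
    fused = rrf_scores([by_kl, by_entropy], k)
    return sorted(kept, key=lambda n: fused[n], reverse=True)


kl = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.3}
entropy = {"a": 0.9, "b": 0.8, "c": 0.95, "d": 0.1}
order = rank_with_filter(kl, entropy, {"c": True})
```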

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 7, 2025
@aayush-se aayush-se marked this pull request as ready for review July 7, 2025 23:06
@aayush-se aayush-se requested review from a team as code owners July 7, 2025 23:06
@aayush-se aayush-se requested review from trillville and ram-senth July 7, 2025 23:06

codecov bot commented Jul 7, 2025

Codecov Report

Attention: Patch coverage is 97.26027% with 4 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/math.py 92.30% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #95007       +/-   ##
===========================================
+ Coverage   38.17%   87.82%   +49.65%     
===========================================
  Files        9858    10436      +578     
  Lines      556058   604299    +48241     
  Branches    23550    23550               
===========================================
+ Hits       212265   530718   +318453     
+ Misses     343426    73214   -270212     
  Partials      367      367               


@aayush-se aayush-se marked this pull request as draft July 7, 2025 23:56
@aayush-se aayush-se marked this pull request as ready for review July 8, 2025 00:39

@cursor cursor bot left a comment


Bug: Box-Cox Lambda Calculation Inconsistency

The boxcox_transform function calculates the optimal lambda parameter using the original values via _boxcox_normmax, even when it internally shifts non-positive values to shifted_values for the actual transformation. This leads to an inconsistency where the optimal lambda is determined for a different dataset than what is ultimately transformed, potentially yielding suboptimal results. Furthermore, _boxcox_normmax duplicates the shifting logic, which is inefficient and brittle.

src/sentry/seer/math.py#L109-L129

shifted_values = values

if lambda_param is not None:
    if lambda_param == 0.0:
        transformed = [math.log(max(v, 1e-10)) for v in shifted_values]
    else:
        transformed = [
            (pow(max(v, 1e-10), lambda_param) - 1) / lambda_param for v in shifted_values
        ]
    return transformed, lambda_param

optimal_lambda = _boxcox_normmax(values)

if optimal_lambda == 0.0:
    transformed = [math.log(max(v, 1e-10)) for v in shifted_values]
else:
    transformed = [
        (pow(max(v, 1e-10), optimal_lambda) - 1) / optimal_lambda for v in shifted_values
    ]
return transformed, optimal_lambda
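
One way to remove the inconsistency described in this report is to shift once and then both estimate lambda and transform on that same shifted list. The sketch below is a self-contained illustration under stated assumptions, not the contents of src/sentry/seer/math.py: `shift_to_positive`, `boxcox_llf`, `estimate_lambda`, and `boxcox_transform_sketch` are hypothetical names, the shift moves the minimum to 1.0 (the shift the real code uses is not shown in this thread), and `estimate_lambda` is a plain grid search over the Box-Cox log-likelihood standing in for `_boxcox_normmax`.

```python
import math


def shift_to_positive(values, floor=1.0):
    """Shift values so the minimum equals `floor`; no-op if already positive."""
    lowest = min(values)
    if lowest > 0:
        return list(values)
    offset = floor - lowest
    return [v + offset for v in values]


def boxcox_llf(lam, values):
    """Box-Cox log-likelihood of `lam` for strictly positive values."""
    n = len(values)
    logs = [math.log(v) for v in values]
    y = logs if lam == 0.0 else [(v ** lam - 1.0) / lam for v in values]
    mu = sum(y) / n
    var = sum((yi - mu) ** 2 for yi in y) / n
    if var <= 0.0:
        return float("-inf")
    return (lam - 1.0) * sum(logs) - 0.5 * n * math.log(var)


def estimate_lambda(values, lo=-2.0, hi=2.0, steps=81):
    """Grid-search stand-in for _boxcox_normmax (an assumption, not the real helper)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: boxcox_llf(lam, values))


def boxcox_transform_sketch(values):
    """Estimate lambda and transform using the same shifted data."""
    shifted = shift_to_positive(values)
    lam = estimate_lambda(shifted)  # estimated on shifted, not on raw values
    if lam == 0.0:
        transformed = [math.log(v) for v in shifted]
    else:
        transformed = [(v ** lam - 1.0) / lam for v in shifted]
    return transformed, lam


raw = [-1.0, 0.0, 0.5, 2.0, 10.0]
transformed, lam = boxcox_transform_sketch(raw)
```

Because the shift happens in exactly one place, the estimator never sees non-positive input and no longer needs its own copy of the shifting logic.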


    ]
    return transformed, lambda_param

optimal_lambda = _boxcox_normmax(values)
Contributor


should this be _boxcox_normmax(shifted_values)?


optimal_lambda = _boxcox_normmax(values)

if optimal_lambda == 0.0:
Member


Minor one: it looks like you can avoid the code duplication by first resolving lambda_param, e.g. lambda_param = _boxcox_normmax(values) if lambda_param is None else lambda_param, and then running a single transform branch.
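
A sketch of what that deduplication could look like, assuming lambda should be estimated only when the caller did not supply one. The function name and signature are hypothetical, and the estimator is passed in as a callable so the example stays self-contained; in the PR it would be `_boxcox_normmax`.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def boxcox_transform_dedup(
    values: Sequence[float],
    estimate_lambda: Callable[[Sequence[float]], float],
    lambda_param: Optional[float] = None,
) -> Tuple[List[float], float]:
    """Resolve lambda once, then run a single transform branch."""
    if lambda_param is None:
        # Only estimate when the caller did not provide a lambda.
        lambda_param = estimate_lambda(values)

    # Same 1e-10 clamp as the excerpt in this thread.
    clamped = [max(v, 1e-10) for v in values]
    if lambda_param == 0.0:
        transformed = [math.log(v) for v in clamped]
    else:
        transformed = [(v ** lambda_param - 1) / lambda_param for v in clamped]
    return transformed, lambda_param


# Usage: an explicit lambda skips estimation entirely.
squared, used_lambda = boxcox_transform_dedup([1.0, 3.0], lambda v: 0.0, lambda_param=2.0)
```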

@aayush-se aayush-se merged commit eb87fa0 into master Jul 8, 2025
64 of 65 checks passed
@aayush-se aayush-se deleted the suspect-flags/rrf-filtering branch July 8, 2025 19:09