fix: Lower resolved_threshold default from 0.8 to 0.0 for dead code benchmarks #1

Draft
jonathanpopham wants to merge 1 commit into feat/supermodel-benchmark from fix/resolved-threshold-dead-code

@jonathanpopham
Owner

Problem

The resolved_threshold defaulted to 0.8, so a task was marked "Resolved" only if both precision and recall were ≥ 80%. No dead code detection approach hits this bar. As a result, every task in every run showed Resolved: False for both the MCP and baseline agents.

Fix

Lower the default from 0.8 to 0.0 in both dead code benchmark classes. Any task with non-zero precision and recall now counts as resolved. The threshold remains configurable via the config YAML.
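
To make the effect of the new default concrete, here is a minimal sketch of the gate. It is not the code from this PR: the function name `is_resolved` and its signature are hypothetical, and the explicit non-zero guard is an assumption added so that a threshold of 0.0 matches the "non-zero P and R" behaviour described above. The sample scores are made up for illustration.

```python
def is_resolved(precision: float, recall: float,
                resolved_threshold: float = 0.0) -> bool:
    """Hypothetical resolved check for a dead code benchmark task.

    Assumption: both metrics must reach the threshold, and both must be
    strictly positive so that a 0.0 threshold still rejects empty results.
    """
    meets_threshold = (precision >= resolved_threshold
                       and recall >= resolved_threshold)
    non_zero = precision > 0.0 and recall > 0.0
    return meets_threshold and non_zero


if __name__ == "__main__":
    # Illustrative scores only, not taken from any benchmark run.
    p, r = 0.5, 0.4
    print(is_resolved(p, r, resolved_threshold=0.8))  # False under the old default
    print(is_resolved(p, r))                          # True under the new default
```

With the old 0.8 default, a task with these scores is unresolved; with 0.0 it is resolved, while a task with zero precision or zero recall stays unresolved under this sketch.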

Context

See supermodeltools/supermodel-public-api#676 for full benchmark analysis.

…enchmarks

The 80% precision AND recall gate meant every task showed "Resolved: False"
for both MCP and baseline agents. No dead code detection approach achieves
80% on both metrics simultaneously — precision is bounded by parser import
resolution gaps (see supermodeltools/supermodel-public-api#677).

Setting to 0.0 means any task with >0% on both P and R counts as resolved.
The threshold is still configurable via config YAML for stricter gating.
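
For stricter gating, the override might be read roughly as below. This is a sketch under assumptions: the PR does not show how the config is loaded, so `threshold_from_config` is a hypothetical helper, the config is represented as an already-parsed dict, and placing `resolved_threshold` at the top level of that dict is a guess. The range check is also an addition, not something the PR states.

```python
DEFAULT_RESOLVED_THRESHOLD = 0.0  # new default proposed in this PR


def threshold_from_config(config: dict) -> float:
    """Return the resolved threshold, falling back to the default.

    Assumption: `config` is the parsed YAML mapping and the key sits at
    its top level; the real layout may differ.
    """
    value = float(config.get("resolved_threshold", DEFAULT_RESOLVED_THRESHOLD))
    if not 0.0 <= value <= 1.0:
        # Precision and recall are fractions, so other values cannot be met.
        raise ValueError(f"resolved_threshold must be in [0, 1], got {value}")
    return value


if __name__ == "__main__":
    print(threshold_from_config({}))                           # 0.0
    print(threshold_from_config({"resolved_threshold": 0.8}))  # 0.8
```

Keeping the default in one constant means both benchmark classes could share it rather than each hard-coding 0.0.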