fix: Lower resolved_threshold default from 0.8 to 0.0 for dead code benchmarks by jonathanpopham · Pull Request #1 · jonathanpopham/mcpbr

jonathanpopham · 2026-03-25T19:30:22Z

Problem

The resolved_threshold defaulted to 0.8, meaning a task was marked "Resolved" only if both precision AND recall were ≥ 80%. No dead code detection approach hits this bar: every task in every run showed Resolved: False for both the MCP and baseline agents.
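To make the gate concrete, here is a minimal Python sketch of how precision, recall, and a both-metrics threshold interact. It is not the mcpbr scorer: the function names precision_recall and is_resolved are hypothetical, and scoring a set of reported symbols against a set of truly dead symbols is an assumption made for illustration.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Hypothetical scorer: compare reported dead symbols to ground truth.

    Precision = true positives / number reported.
    Recall    = true positives / number truly dead.
    Each metric falls back to 0.0 when its denominator is empty.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def is_resolved(precision: float, recall: float, threshold: float = 0.8) -> bool:
    # Hypothetical gate mirroring the rule described above: BOTH metrics
    # must reach the threshold (0.8 was the old default).
    return precision >= threshold and recall >= threshold


# Four symbols reported, two of them among the three truly dead ones.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(p, round(r, 2), is_resolved(p, r))
```

With the old 0.8 default, this agent scores P = 0.5 and R ≈ 0.67 and is not resolved, even though it found real dead code.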

Fix

Lower the default from 0.8 to 0.0 in both dead code benchmark classes, so any task with non-zero precision (P) and recall (R) counts as resolved. The threshold is still configurable via the config YAML.
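The sketch below illustrates the new default and the YAML override. It is not the mcpbr code: the class name DeadCodeBenchmarkConfig, the from_mapping helper, and the flat key layout are hypothetical; only the resolved_threshold name and the 0.8 to 0.0 change come from this PR. Because a bare >= comparison at 0.0 would also pass a task with P = R = 0, the sketch adds an explicit non-zero check to match the behavior described above. Whether the real gate works this way is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class DeadCodeBenchmarkConfig:
    # Hypothetical stand-in for the two dead code benchmark classes.
    # Only the field name resolved_threshold comes from the PR.
    resolved_threshold: float = 0.0  # new default (was 0.8)

    @classmethod
    def from_mapping(cls, cfg: Mapping[str, Any]) -> "DeadCodeBenchmarkConfig":
        # cfg stands for the dict loaded from the config YAML; the flat
        # key layout is an assumption. Missing keys keep the 0.0 default.
        return cls(resolved_threshold=float(cfg.get("resolved_threshold", 0.0)))

    def is_resolved(self, precision: float, recall: float) -> bool:
        # At threshold 0.0 a plain >= check would also accept P = R = 0,
        # so this sketch additionally requires both metrics to be non-zero.
        return (
            precision > 0.0
            and recall > 0.0
            and precision >= self.resolved_threshold
            and recall >= self.resolved_threshold
        )


default = DeadCodeBenchmarkConfig()
strict = DeadCodeBenchmarkConfig.from_mapping({"resolved_threshold": 0.8})
print(default.is_resolved(0.1, 0.05), strict.is_resolved(0.5, 0.9))
```

A config that sets resolved_threshold back to 0.8 restores the old, stricter gate, while omitting the key keeps the permissive 0.0 default.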

Context

See supermodeltools/supermodel-public-api#676 for full benchmark analysis.

…enchmarks

The 80% precision AND recall gate meant every task showed "Resolved: False" for both MCP and baseline agents. No dead code detection approach achieves 80% on both metrics simultaneously; precision is bounded by parser import resolution gaps (see supermodeltools/supermodel-public-api#677). Setting it to 0.0 means any task with >0% on both P and R counts as resolved. The threshold is still configurable via config YAML for stricter gating.

jonathanpopham wants to merge 1 commit into feat/supermodel-benchmark from fix/resolved-threshold-dead-code
