Benchmark Report: 4 repos tested (2026-03-30) #2
Mutation Test Benchmark Report
Run date: 2026-03-30T07:54:12Z
Total wall time: 3m43s (parallel execution)
Repos tested: 4
Results Summary
| Repo | PR | Build | Complexity | Mode | Duration |
|---|---|---|---|---|---|
| spring-projects/spring-petclinic | #2310 | gradle | medium | mutation | 0m47s |
| thombergs/buckpal | #46 | gradle | small | mutation | 1m06s |
| google/gson | #3000 | maven | large | mutation | 0m56s |
| iluwatar/java-design-patterns | #3454 | maven | large | mutation | 0m54s |
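The per-repo durations can be cross-checked against the reported total wall time. The following is a minimal sketch, assuming the `XmYYs` duration format used in the table; the helper names `parse_duration` and `format_duration` are hypothetical and not part of agent-forge.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '1m06s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)

def format_duration(total_seconds: int) -> str:
    """Render whole seconds back into the 'XmYYs' form."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds:02d}s"

# Durations copied from the Results Summary table.
durations = {
    "spring-projects/spring-petclinic": "0m47s",
    "thombergs/buckpal": "1m06s",
    "google/gson": "0m56s",
    "iluwatar/java-design-patterns": "0m54s",
}

total = sum(parse_duration(d) for d in durations.values())
print(format_duration(total))  # prints 3m43s
```

The four durations sum to 223 seconds (3m43s), the same value as the reported total wall time.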
Detailed Output
spring-projects/spring-petclinic PR#2310
l repo path — using mock 3-stage filter results
✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
├── 4 accepted, 0 failed
└── Mutation score: 50%
✓ [12/12] Generating report
╭────────────────────────────────── Results ───────────────────────────────────╮
│ Coverage: 0% → 0% (+0.0%) │
│ Tests generated: 0 (in 0 files) │
│ Iterations: 1/3 │
│ Results: 0 passed, 0 failed ✓ │
╰──────────────────────────────────────────────────────────────────────────────╯
⚠ Suggestions:
• Coverage below 80% — consider adding integration tests for remaining paths
• 4 mutant(s) still surviving — consider manual review of surviving cases
• Mutation score below 80% — consider adding more targeted tests for edge cases
• 4 new killing test(s) added — commit these to prevent future regression
╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│ Mutation score: 50% (4/8 mutants killed) │
│ Killed by existing: 0 │
│ Killed by new tests: 4 │
│ Still surviving: 4 │
│ Equivalent (filtered): 0 │
╰──────────────────────────────────────────────────────────────────────────────╯
Surviving Mutants — Not Caught by Tests
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Type ┃ Description ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001 │ wrong_return │ Wrong return: adds an exclamation mark, │
│ │ │ changing the meaning of the welcome message. │
│ M002 │ missing_boundary │ Missing boundary: adds style that hides the │
│ │ │ service box, making it inaccessible. │
│ M003 │ wrong_return │ Wrong return: changes the button text from │
│ │ │ 'Explore' to 'View', altering the user intent. │
│ M004 │ wrong_condition │ Wrong condition: reduces font size, impacting │
│ │ │ visibility and UI aesthetics. │
│ M005 │ wrong_return │ Wrong return: removes padding, leading to a │
│ │ │ cramped layout. │
│ M006 │ wrong_return │ Wrong return: eliminates top margin for │
│ │ │ service boxes, affecting spacing. │
└────────┴────────────────────┴────────────────────────────────────────────────┘
Report generated using 5 tool calls.
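The mutation score in the results box appears to be the number of killed mutants divided by killed plus surviving mutants, with equivalent (filtered) mutants left out of the denominator. This is inferred from the figures in this report, not confirmed against the pipeline source. A minimal sketch under that assumption, with a hypothetical function name:

```python
def mutation_score(killed_by_existing: int, killed_by_new: int, surviving: int) -> float:
    """Return the mutation score as a percentage.

    Assumption: equivalent mutants are filtered out beforehand and are
    not counted in the total, as the figures in this report suggest.
    """
    killed = killed_by_existing + killed_by_new
    total = killed + surviving
    if total == 0:
        # No non-equivalent mutants: report 0.0 rather than divide by zero.
        return 0.0
    return 100.0 * killed / total

# Figures from the spring-petclinic results box: 0 + 4 killed, 4 surviving.
print(f"{mutation_score(0, 4, 4):.0f}%")  # prints 50%
```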
thombergs/buckpal PR#46
mutation reflexion loop...
✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle: BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle: BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
├── 0 accepted, 1 failed
└── Entering mutation reflexion loop...
✓ [12/12] Generating report
╭────────────────────────────────── Results ───────────────────────────────────╮
│ Coverage: 0% → 0% (+0.0%) │
│ Tests generated: 2 (in 1 files) │
│ Iterations: 1/3 │
│ Results: 0 passed, 0 failed ✓ │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/io/reflectoring/buckpal/application/domain/service/SendMoneyServiceTest.java
⚠ Suggestions:
• Coverage below 80% — consider adding integration tests for remaining paths
• 2 mutant(s) still surviving — consider manual review of surviving cases
• Mutation score below 80% — consider adding more targeted tests for edge cases
╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│ Mutation score: 0% (0/2 mutants killed) │
│ Killed by existing: 0 │
│ Killed by new tests: 0 │
│ Still surviving: 2 │
│ Equivalent (filtered): 1 │
╰──────────────────────────────────────────────────────────────────────────────╯
Surviving Mutants — Not Caught by Tests
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Type ┃ Description ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001 │ missing_boundary │ Missing boundary: public access modifier was │
│ │ │ removed, potentially affecting visibility. │
│ M002 │ wrong_return │ Wrong return: class declaration was altered, │
│ │ │ changing intended interface implementation. │
└────────┴────────────────────┴────────────────────────────────────────────────┘
Report generated using 5 tool calls.
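The 3-stage filter named in the log (Build → Pass original → Fail mutant) can be read as a short-circuiting check: a candidate killing test is accepted only if it compiles, passes on the original code, and fails on the mutant. The sketch below illustrates that reading; `FilterResult`, `three_stage_filter`, and the callable parameters are hypothetical names, not agent-forge's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FilterResult:
    accepted: bool
    failed_stage: Optional[str]  # None when all three stages succeed

def three_stage_filter(
    builds: Callable[[], bool],
    passes_on_original: Callable[[], bool],
    fails_on_mutant: Callable[[], bool],
) -> FilterResult:
    """Accept a killing test only if every stage succeeds, in order."""
    if not builds():
        return FilterResult(False, "build")
    if not passes_on_original():
        return FilterResult(False, "pass_original")
    if not fails_on_mutant():
        return FilterResult(False, "fail_mutant")
    return FilterResult(True, None)

# A test whose build fails is rejected at the first stage; the later
# stages are never run.
result = three_stage_filter(lambda: False, lambda: True, lambda: True)
print(result.failed_stage)  # prints build
```

Under this reading, the buckpal run above, where the Gradle build failed, would be rejected at the first stage, which is consistent with the logged "0 accepted, 1 failed".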
google/gson PR#3000
9/12] Filtering equivalent mutants (3 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
✓ [10/12] Running tests against mutants
├── 0 killed, 3 survived, 0 build failures
└── 3 surviving mutant(s) need killing tests
✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
├── 1 accepted, 0 failed
└── Mutation score: 33%
✓ [12/12] Generating report
╭────────────────────────────────── Results ───────────────────────────────────╮
│ Coverage: 0% → 0% (+0.0%) │
│ Tests generated: 3 (in 1 files) │
│ Iterations: 1/3 │
│ Results: 3 passed, 0 failed ✓ │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/google/gson/stream/JsonReaderTest.java
⚠ Suggestions:
• Coverage below 80% — consider adding integration tests for remaining paths
• 2 mutant(s) still surviving — consider manual review of surviving cases
• Mutation score below 80% — consider adding more targeted tests for edge cases
• 1 new killing test(s) added — commit these to prevent future regression
╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│ Mutation score: 33% (1/3 mutants killed) │
│ Killed by existing: 0 │
│ Killed by new tests: 1 │
│ Still surviving: 2 │
│ Equivalent (filtered): 0 │
╰──────────────────────────────────────────────────────────────────────────────╯
Surviving Mutants — Not Caught by Tests
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Type ┃ Description ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001 │ wrong_operator │ Wrong operator: decrements success count │
│ │ │ instead of incrementing │
│ M002 │ wrong_return │ Wrong return value: returns 1.0 (100% success) │
│ │ │ instead of 0.0 when no transactions │
│ M003 │ wrong_operator │ Missing increment: total count never │
│ │ │ increases, breaking success rate calculation │
└────────┴────────────────────┴────────────────────────────────────────────────┘
Report generated using 5 tool calls.
iluwatar/java-design-patterns PR#3454
adle wrapper found — cannot run tests
✓ [6/12] Running tests
├── 2 passed, 0 failed
└── All tests passing
✓ [8/12] Generating mutations (8 mutants)
✓ [9/12] Filtering equivalent mutants (8 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
✓ [10/12] Running tests against mutants
├── 0 killed, 8 survived, 0 build failures
└── 8 surviving mutant(s) need killing tests
✓ [11/12] Generating killing tests (6 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
├── 6 accepted, 0 failed
└── Mutation score: 75%
✓ [12/12] Generating report
╭────────────────────────────────── Results ───────────────────────────────────╮
│ Coverage: 0% → 0% (+0.0%) │
│ Tests generated: 2 (in 1 files) │
│ Iterations: 1/3 │
│ Results: 2 passed, 0 failed ✓ │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/iluwatar/sessionserver/AppTest.java
⚠ Suggestions:
• Coverage below 80% — consider adding integration tests for remaining paths
• 2 mutant(s) still surviving — consider manual review of surviving cases
• Mutation score below 80% — consider adding more targeted tests for edge cases
• 6 new killing test(s) added — commit these to prevent future regression
╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│ Mutation score: 75% (6/8 mutants killed) │
│ Killed by existing: 0 │
│ Killed by new tests: 6 │
│ Still surviving: 2 │
│ Equivalent (filtered): 0 │
╰──────────────────────────────────────────────────────────────────────────────╯
Surviving Mutants — Not Caught by Tests
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Type ┃ Description ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M007 │ wrong_return │ Wrong return: describes the pattern as │
│ │ │ creating mutable objects instead of immutable. │
│ M008 │ wrong_return │ Wrong return: incorrectly states that the │
│ │ │ object's state can be modified. │
└────────┴────────────────────┴────────────────────────────────────────────────┘
Report generated using 5 tool calls.
Next Steps
- Review surviving mutants for test improvement opportunities
- Check killing test compilation success rate
- Compare mutation scores across repos
- Update prompt templates if pattern failures detected
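For the cross-repo comparison, one possible starting point is to rank the repos by mutation score and flag those under the 80% threshold mentioned in the suggestions. This is a sketch only: the killed/total pairs are copied from the results boxes above, and `rank_by_score` is a hypothetical helper.

```python
# (killed, total) pairs taken from each repo's Mutation Testing Results box.
results = {
    "spring-projects/spring-petclinic": (4, 8),
    "thombergs/buckpal": (0, 2),
    "google/gson": (1, 3),
    "iluwatar/java-design-patterns": (6, 8),
}

THRESHOLD = 80.0  # percentage used in the report's suggestions

def rank_by_score(data):
    """Return (repo, score) pairs sorted from lowest to highest score."""
    scored = [
        (repo, 100.0 * killed / total if total else 0.0)
        for repo, (killed, total) in data.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])

for repo, score in rank_by_score(results):
    flag = "below threshold" if score < THRESHOLD else "ok"
    print(f"{repo}: {score:.0f}% ({flag})")
```

Sorting from lowest to highest puts the repos that most need attention first; in this run all four fall below the threshold.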
Auto-generated by agent-forge benchmark pipeline