Benchmark Report: 4 repos tested (2026-03-30)

## Mutation Test Benchmark Report

**Run date:** 2026-03-30T07:54:12Z
**Total wall time:** 3m43s (parallel execution)
**Repos tested:** 4

### Results Summary

| Repo | PR | Build | Complexity | Mode | Duration |
|------|-----|-------|------------|------|----------|
| spring-projects/spring-petclinic | #2310 | gradle | medium | mutation | 0m47s |
| thombergs/buckpal | #46 | gradle | small | mutation | 1m06s |
| google/gson | #3000 | maven | large | mutation | 0m56s |
| iluwatar/java-design-patterns | #3454 | maven | large | mutation | 0m54s |
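
The per-repo durations in the table add up to the 3m43s reported as total wall time. A minimal sketch of that check, assuming the `XmYYs` format used in the table; the helper names `to_seconds` and `format_duration` are illustrative and not part of agent-forge:

```python
import re

# Durations copied from the Results Summary table.
durations = {
    "spring-projects/spring-petclinic": "0m47s",
    "thombergs/buckpal": "1m06s",
    "google/gson": "0m56s",
    "iluwatar/java-design-patterns": "0m54s",
}


def to_seconds(text: str) -> int:
    """Convert a string such as '1m06s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def format_duration(total: int) -> str:
    """Render whole seconds back into the table's 'XmYYs' format."""
    return f"{total // 60}m{total % 60:02d}s"


total_seconds = sum(to_seconds(d) for d in durations.values())
slowest = max(durations, key=lambda repo: to_seconds(durations[repo]))

print(format_duration(total_seconds))  # 3m43s
print(slowest)                         # thombergs/buckpal
```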

### Detailed Output


<details>
<summary>spring-projects/spring-petclinic PR#2310</summary>

```
l repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 4 accepted, 0 failed
    └── Mutation score: 50%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  0 (in 0 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  0 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 4 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge 
cases
  • 4 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  50% (4/8 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  4                                                    │
│       Still surviving:  4                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ wrong_return       │ Wrong return: adds an exclamation mark,        │
│        │                    │ changing the meaning of the welcome message.   │
│ M002   │ missing_boundary   │ Missing boundary: adds style that hides the    │
│        │                    │ service box, making it inaccessible.           │
│ M003   │ wrong_return       │ Wrong return: changes the button text from     │
│        │                    │ 'Explore' to 'View', altering the user intent. │
│ M004   │ wrong_condition    │ Wrong condition: reduces font size, impacting  │
│        │                    │ visibility and UI aesthetics.                  │
│ M005   │ wrong_return       │ Wrong return: removes padding, leading to a    │
│        │                    │ cramped layout.                                │
│ M006   │ wrong_return       │ Wrong return: eliminates top margin for        │
│        │                    │ service boxes, affecting spacing.              │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.

```
</details>


<details>
<summary>thombergs/buckpal PR#46</summary>

```
 mutation reflexion loop...
  ✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle:  BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle:  BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 0 accepted, 1 failed
    └── Entering mutation reflexion loop...
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  2 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  0 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/io/reflectoring/buckpal/application/domain/service/SendMoneyServiceTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge 
cases

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  0% (0/2 mutants killed)                              │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  0                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  1                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ missing_boundary   │ Missing boundary: public access modifier was   │
│        │                    │ removed, potentially affecting visibility.     │
│ M002   │ wrong_return       │ Wrong return: class declaration was altered,   │
│        │                    │ changing intended interface implementation.    │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.

```
</details>


<details>
<summary>google/gson PR#3000</summary>

```
9/12] Filtering equivalent mutants (3 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
  ✓ [10/12] Running tests against mutants
    ├── 0 killed, 3 survived, 0 build failures
    └── 3 surviving mutant(s) need killing tests
  ✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 1 accepted, 0 failed
    └── Mutation score: 33%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  3 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  3 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/google/gson/stream/JsonReaderTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge 
cases
  • 1 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  33% (1/3 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  1                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ wrong_operator     │ Wrong operator: decrements success count       │
│        │                    │ instead of incrementing                        │
│ M002   │ wrong_return       │ Wrong return value: returns 1.0 (100% success) │
│        │                    │ instead of 0.0 when no transactions            │
│ M003   │ wrong_operator     │ Missing increment: total count never           │
│        │                    │ increases, breaking success rate calculation   │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.

```
</details>


<details>
<summary>iluwatar/java-design-patterns PR#3454</summary>

```
adle wrapper found — cannot run tests
  ✓ [6/12] Running tests
    ├── 2 passed, 0 failed
    └── All tests passing
  ✓ [8/12] Generating mutations (8 mutants)
  ✓ [9/12] Filtering equivalent mutants (8 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
  ✓ [10/12] Running tests against mutants
    ├── 0 killed, 8 survived, 0 build failures
    └── 8 surviving mutant(s) need killing tests
  ✓ [11/12] Generating killing tests (6 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 6 accepted, 0 failed
    └── Mutation score: 75%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  2 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  2 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/iluwatar/sessionserver/AppTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge 
cases
  • 6 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  75% (6/8 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  6                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M007   │ wrong_return       │ Wrong return: describes the pattern as         │
│        │                    │ creating mutable objects instead of immutable. │
│ M008   │ wrong_return       │ Wrong return: incorrectly states that the      │
│        │                    │ object's state can be modified.                │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.

```
</details>
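
The logs above name a 3-stage filter (Build → Pass original → Fail mutant) for candidate killing tests. The following is a sketch of that idea only, assuming each stage can be expressed as a yes/no check; `three_stage_filter`, `FilterResult`, and the stage names are hypothetical and do not reflect agent-forge's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FilterResult:
    accepted: bool
    failed_stage: Optional[str]  # None when the test is accepted


def three_stage_filter(
    builds: Callable[[], bool],
    passes_on_original: Callable[[], bool],
    fails_on_mutant: Callable[[], bool],
) -> FilterResult:
    """Accept a candidate test only if it builds, passes on the
    original code, and fails on the mutated code. Stages run in
    order and stop at the first one that is not satisfied."""
    stages = [
        ("build", builds),
        ("pass_original", passes_on_original),
        ("fail_mutant", fails_on_mutant),
    ]
    for name, check in stages:
        if not check():
            return FilterResult(accepted=False, failed_stage=name)
    return FilterResult(accepted=True, failed_stage=None)


# Illustrative candidates with hard-coded stage outcomes.
candidates = {
    "compiles_and_kills": three_stage_filter(lambda: True, lambda: True, lambda: True),
    "does_not_build": three_stage_filter(lambda: False, lambda: True, lambda: True),
    "passes_on_mutant": three_stage_filter(lambda: True, lambda: True, lambda: False),
}

accepted = sum(result.accepted for result in candidates.values())
print(f"{accepted} accepted, {len(candidates) - accepted} failed")  # 1 accepted, 2 failed
```

Under this reading, a test that fails to build (as in the buckpal run) is rejected at the first stage and never counts as killing a mutant.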


### Next Steps

- [ ] Review surviving mutants for test improvement opportunities
- [ ] Check killing test compilation success rate
- [ ] Compare mutation scores across repos
- [ ] Update prompt templates if pattern failures detected

---
*Auto-generated by agent-forge benchmark pipeline*

