Benchmark Report: 4 repos tested (2026-03-30) #2

@github-actions

Mutation Test Benchmark Report

Run date: 2026-03-30T07:54:12Z
Total wall time: 3m43s (parallel execution)
Repos tested: 4

Results Summary

Repo                              PR     Build   Complexity  Mode      Duration
spring-projects/spring-petclinic  #2310  gradle  medium      mutation  0m47s
thombergs/buckpal                 #46    gradle  small       mutation  1m06s
google/gson                       #3000  maven   large       mutation  0m56s
iluwatar/java-design-patterns     #3454  maven   large       mutation  0m54s
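
The reported total wall time can be cross-checked against the Duration column. The sketch below parses the `XmYYs` strings from the table and sums them; the helper names are illustrative and not part of agent-forge. The sum comes to 3m43s, the same as the reported total, whereas fully overlapping runs would finish in roughly the longest single duration (1m06s), so it may be worth confirming how the benchmark measures wall time.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string such as '1m06s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def format_duration(total_seconds: int) -> str:
    """Render whole seconds back into the report's 'XmYYs' form."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds:02d}s"


# Durations copied from the Results Summary table.
durations = {
    "spring-projects/spring-petclinic": "0m47s",
    "thombergs/buckpal": "1m06s",
    "google/gson": "0m56s",
    "iluwatar/java-design-patterns": "0m54s",
}

total = sum(parse_duration(d) for d in durations.values())
longest = max(parse_duration(d) for d in durations.values())
print("sum of durations:", format_duration(total))      # 3m43s
print("longest single run:", format_duration(longest))  # 1m06s
```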

Detailed Output

spring-projects/spring-petclinic PR#2310
l repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 4 accepted, 0 failed
    └── Mutation score: 50%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  0 (in 0 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  0 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 4 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge cases
  • 4 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  50% (4/8 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  4                                                    │
│       Still surviving:  4                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ wrong_return       │ Wrong return: adds an exclamation mark,        │
│        │                    │ changing the meaning of the welcome message.   │
│ M002   │ missing_boundary   │ Missing boundary: adds style that hides the    │
│        │                    │ service box, making it inaccessible.           │
│ M003   │ wrong_return       │ Wrong return: changes the button text from     │
│        │                    │ 'Explore' to 'View', altering the user intent. │
│ M004   │ wrong_condition    │ Wrong condition: reduces font size, impacting  │
│        │                    │ visibility and UI aesthetics.                  │
│ M005   │ wrong_return       │ Wrong return: removes padding, leading to a    │
│        │                    │ cramped layout.                                │
│ M006   │ wrong_return       │ Wrong return: eliminates top margin for        │
│        │                    │ service boxes, affecting spacing.              │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.
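
The score lines in the Mutation Testing Results boxes are consistent with dividing killed mutants by killed plus surviving mutants, with equivalent mutants left out of the denominator. The sketch below assumes that definition; the function name and signature are illustrative, not taken from agent-forge.

```python
def mutation_score(killed_by_existing: int, killed_by_new: int, surviving: int) -> float:
    """Return the percentage of non-equivalent mutants that were killed."""
    killed = killed_by_existing + killed_by_new
    total = killed + surviving  # assumption: equivalent mutants are excluded
    if total == 0:
        return 0.0  # assumption: report 0% when there is nothing to kill
    return 100.0 * killed / total


# (killed by existing, killed by new tests, still surviving),
# copied from the four result boxes in this report.
reported = {
    "spring-projects/spring-petclinic": (0, 4, 4),
    "thombergs/buckpal": (0, 0, 2),
    "google/gson": (0, 1, 2),
    "iluwatar/java-design-patterns": (0, 6, 2),
}

for repo, counts in reported.items():
    print(f"{repo}: {mutation_score(*counts):.0f}%")
```

Under this assumption the four repos come out at 50%, 0%, 33%, and 75%, matching the reported scores.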

thombergs/buckpal PR#46
 mutation reflexion loop...
  ✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle:  BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
WARNING:agent_forge.tools.runners.gradle:Gradle test returned exit code 1
WARNING:agent_forge.tools.runners.gradle:  BUILD FAILED in 3s
WARNING:agent_forge.tools.runners.gradle:Test results directory not found: /home/runner/.agent-forge/work/repos/buckpal/build/test-results/test
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 0 accepted, 1 failed
    └── Entering mutation reflexion loop...
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  2 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  0 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/io/reflectoring/buckpal/application/domain/service/SendMoneyServiceTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge cases

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  0% (0/2 mutants killed)                              │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  0                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  1                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ missing_boundary   │ Missing boundary: public access modifier was   │
│        │                    │ removed, potentially affecting visibility.     │
│ M002   │ wrong_return       │ Wrong return: class declaration was altered,   │
│        │                    │ changing intended interface implementation.    │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.
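
The log labels the acceptance check as a 3-stage filter (Build → Pass original → Fail mutant). A minimal sketch of that idea follows, assuming each stage can be expressed as a callable returning a boolean; the names `three_stage_filter` and `FilterResult` are hypothetical and do not describe agent-forge's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FilterResult:
    accepted: bool
    stage: str  # stage that rejected the test, or "accepted"


def three_stage_filter(
    builds: Callable[[], bool],
    passes_on_original: Callable[[], bool],
    fails_on_mutant: Callable[[], bool],
) -> FilterResult:
    """Accept a candidate killing test only if it clears all three stages in order."""
    if not builds():
        return FilterResult(False, "build")
    if not passes_on_original():
        return FilterResult(False, "pass_original")
    if not fails_on_mutant():
        return FilterResult(False, "fail_mutant")
    return FilterResult(True, "accepted")


# A candidate whose build fails is rejected before any test is run.
print(three_stage_filter(lambda: False, lambda: True, lambda: True))
# A test that also passes on the mutant does not kill it, so it is rejected.
print(three_stage_filter(lambda: True, lambda: True, lambda: False))
```

Checking the stages in this order means a test that does not compile never reaches the more expensive test runs. If the BUILD FAILED warnings above belong to the rejected candidate, it would have been dropped at the first stage, which would fit the "0 accepted, 1 failed" line.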

google/gson PR#3000
9/12] Filtering equivalent mutants (3 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
  ✓ [10/12] Running tests against mutants
    ├── 0 killed, 3 survived, 0 build failures
    └── 3 surviving mutant(s) need killing tests
  ✓ [11/12] Generating killing tests (1 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 1 accepted, 0 failed
    └── Mutation score: 33%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  3 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  3 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/google/gson/stream/JsonReaderTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge cases
  • 1 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  33% (1/3 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  1                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M001   │ wrong_operator     │ Wrong operator: decrements success count       │
│        │                    │ instead of incrementing                        │
│ M002   │ wrong_return       │ Wrong return value: returns 1.0 (100% success) │
│        │                    │ instead of 0.0 when no transactions            │
│ M003   │ wrong_operator     │ Missing increment: total count never           │
│        │                    │ increases, breaking success rate calculation   │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.
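
Several runs in this report log a warning that a stage fell back to mock results because no local repo path was available, so their scores may not reflect real test execution. One way to flag such runs is to scan the captured log for those warnings. The sketch below assumes the `WARNING:<logger>:<message>` layout seen in the output above; `mock_stages` is a hypothetical helper, and lines that are truncated before the logger name are skipped.

```python
MOCK_MARKER = "using mock"


def mock_stages(log_lines):
    """Return the logger names that reported falling back to mock results."""
    stages = []
    for line in log_lines:
        if MOCK_MARKER not in line:
            continue
        # Lines look like 'WARNING:<logger name>:<message>'.
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "WARNING":
            stages.append(parts[1])
    return stages


# Sample lines copied from the google/gson output (\u2014 is the dash in the message).
sample = [
    "WARNING:agent_forge.engine.nodes.mutation_runner:"
    "No local repo path \u2014 using mock mutation run results",
    "  \u2713 [10/12] Running tests against mutants",
    "WARNING:agent_forge.engine.nodes.killing_test_runner:"
    "No local repo path \u2014 using mock 3-stage filter results",
]

print(mock_stages(sample))
```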

iluwatar/java-design-patterns PR#3454
adle wrapper found — cannot run tests
  ✓ [6/12] Running tests
    ├── 2 passed, 0 failed
    └── All tests passing
  ✓ [8/12] Generating mutations (8 mutants)
  ✓ [9/12] Filtering equivalent mutants (8 remain, 0 removed)
WARNING:agent_forge.engine.nodes.mutation_runner:No local repo path — using mock mutation run results
  ✓ [10/12] Running tests against mutants
    ├── 0 killed, 8 survived, 0 build failures
    └── 8 surviving mutant(s) need killing tests
  ✓ [11/12] Generating killing tests (6 tests)
WARNING:agent_forge.engine.nodes.killing_test_runner:No local repo path — using mock 3-stage filter results
  ✓ [11/12] 3-stage filter (Build → Pass original → Fail mutant)
    ├── 6 accepted, 0 failed
    └── Mutation score: 75%
  ✓ [12/12] Generating report

╭────────────────────────────────── Results ───────────────────────────────────╮
│        Coverage:  0% → 0% (+0.0%)                                            │
│ Tests generated:  2 (in 1 files)                                             │
│      Iterations:  1/3                                                        │
│         Results:  2 passed, 0 failed ✓                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
Generated Test Files
└── src/test/java/com/iluwatar/sessionserver/AppTest.java

⚠ Suggestions:
  • Coverage below 80% — consider adding integration tests for remaining paths
  • 2 mutant(s) still surviving — consider manual review of surviving cases
  • Mutation score below 80% — consider adding more targeted tests for edge cases
  • 6 new killing test(s) added — commit these to prevent future regression

╭────────────────────────── Mutation Testing Results ──────────────────────────╮
│        Mutation score:  75% (6/8 mutants killed)                             │
│    Killed by existing:  0                                                    │
│   Killed by new tests:  6                                                    │
│       Still surviving:  2                                                    │
│ Equivalent (filtered):  0                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
                    Surviving Mutants — Not Caught by Tests                     
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     ┃ Type               ┃ Description                                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ M007   │ wrong_return       │ Wrong return: describes the pattern as         │
│        │                    │ creating mutable objects instead of immutable. │
│ M008   │ wrong_return       │ Wrong return: incorrectly states that the      │
│        │                    │ object's state can be modified.                │
└────────┴────────────────────┴────────────────────────────────────────────────┘

Report generated using 5 tool calls.

Next Steps

  • Review surviving mutants for test improvement opportunities
  • Check killing test compilation success rate
  • Compare mutation scores across repos
  • Update prompt templates if pattern failures detected
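
For the cross-repo comparison, a small script is enough to rank the reported scores and flag those under the 80% threshold that the suggestion lines refer to. This is only a sketch: the scores are copied by hand from this report, and the helper names are illustrative.

```python
THRESHOLD = 80  # percent, taken from the "below 80%" suggestion lines

# Mutation scores as reported for each repo in this run.
scores = {
    "spring-projects/spring-petclinic": 50,
    "thombergs/buckpal": 0,
    "google/gson": 33,
    "iluwatar/java-design-patterns": 75,
}


def rank_scores(scores):
    """Return (repo, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def below_threshold(scores, threshold=THRESHOLD):
    """Return repos whose score is under the threshold, best first."""
    return [repo for repo, score in rank_scores(scores) if score < threshold]


for repo, score in rank_scores(scores):
    flag = "below threshold" if score < THRESHOLD else "ok"
    print(f"{score:3d}%  {repo}  ({flag})")

print("mean score:", sum(scores.values()) / len(scores))
```

In this run all four repos fall under the threshold, with java-design-patterns highest at 75% and buckpal lowest at 0%.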

Auto-generated by agent-forge benchmark pipeline
