# Section 13: AI Tournament & Balance Analysis

**Last Updated:** 2025-12-04

## Overview

This guide explains how to use the AI tournament, batch sweep, and balance analysis tools introduced in Phases 9 and 11. These utilities enable designers and developers to:

- Run large batches of AI-driven games in parallel
- Compare strategy and difficulty performance
- Identify balance issues and underutilized content
- Automate regression and balance testing in CI

## Running AI Tournaments

The tournament script executes multiple games in parallel, each using a configurable AI strategy (`BALANCED`, `AGGRESSIVE`, `DIPLOMATIC`, `HYBRID`). Telemetry is captured for each game, and results are aggregated into a single JSON file for analysis.

**Example:**
```bash
uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
```

**Key options:**
- `--games`: Number of games to run (default: 100)
- `--output`: Path to save the aggregated results
- `--strategies`: Strategies to test (e.g., `balanced aggressive`)
- `--seeds`: Random seeds for reproducibility
- `--worlds`: World configuration bundles
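
If you just want a quick look at the aggregated results without the full analysis script, a few lines of Python suffice. This is a minimal sketch: the schema assumed here (a top-level `games` list whose entries have `strategy` and `won` fields) is an illustration, not the script's documented output format — check the actual JSON before relying on it.

```python
# Sketch: compute per-strategy win rates from an aggregated tournament
# file. The "games"/"strategy"/"won" field names are assumptions.
import json
from collections import Counter

def win_rates(path: str) -> dict[str, float]:
    with open(path) as f:
        data = json.load(f)
    played: Counter[str] = Counter()
    won: Counter[str] = Counter()
    for game in data["games"]:
        played[game["strategy"]] += 1
        if game.get("won"):
            won[game["strategy"]] += 1
    return {strategy: won[strategy] / played[strategy] for strategy in played}
```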

## Running Batch Simulation Sweeps

The batch sweep script (Phase 11, M11.1) enables multi-dimensional parameter space exploration for comprehensive balance analysis. It generates the Cartesian product of parameter combinations and executes them in parallel, allowing you to:

- Stress-test balance across strategies, difficulties, seeds, worlds, and tick budgets
- Sample large parameter spaces efficiently
- Aggregate results for statistical analysis
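
The Cartesian expansion can be sketched with `itertools.product`. The parameter lists mirror the example configuration; `expand_sweeps` itself is a hypothetical helper for illustration, not the script's actual internals.

```python
# Sketch: expand parameter lists into the full Cartesian set of sweeps.
from itertools import product

def expand_sweeps(params: dict[str, list]) -> list[dict]:
    """Return one dict per combination of the given parameter lists."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*params.values())]

params = {
    "strategy": ["balanced", "aggressive", "diplomatic"],
    "difficulty": ["normal", "hard"],
    "seed": [42, 123, 456],
    "world": ["default"],
    "tick_budget": [100, 200],
}
sweeps = expand_sweeps(params)
print(len(sweeps))  # 3 * 2 * 3 * 1 * 2 = 36 combinations
```

With `sampling.mode: random`, the script would run only a subset rather than all combinations; the equivalent here would be `random.sample(sweeps, k=sample_count)`.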

### Configuration

Batch sweeps are configured via `content/config/batch_sweeps.yml`. You can override any parameter via CLI flags.

```yaml
parameters:
  strategies:
    - balanced
    - aggressive
    - diplomatic
  difficulties:
    - normal
    - hard
  seeds:
    - 42
    - 123
    - 456
  worlds:
    - default
  tick_budgets:
    - 100
    - 200

parallel:
  max_workers: null  # Auto-detect CPU count
  timeout_per_sweep: 300

output:
  dir: build/batch_sweeps
  include_telemetry: true

sampling:
  mode: full  # Options: full, random, latin_hypercube
  sample_count: 100
```

**Tip:** For very large parameter spaces, use `sampling.mode: random` and adjust `sample_count` to control the number of sweeps.
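
The precedence rule (CLI flags beat the YAML file; unset flags fall back to it) can be sketched as follows. `merge_config` and the flat dict shape are illustrative assumptions, not the script's real API.

```python
# Sketch: CLI values override file values; None means "flag not given",
# so the file's value survives.
def merge_config(file_params: dict, cli_overrides: dict) -> dict:
    merged = dict(file_params)
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged

file_params = {"strategies": ["balanced"], "seeds": [42], "tick_budgets": [100]}
cli = {"strategies": ["balanced", "aggressive"], "seeds": None, "tick_budgets": None}
print(merge_config(file_params, cli))
# {'strategies': ['balanced', 'aggressive'], 'seeds': [42], 'tick_budgets': [100]}
```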

### Running Batch Sweeps

**Basic execution with default configuration:**
```bash
uv run python scripts/run_batch_sweeps.py --output-dir build/sweeps --verbose
```

**Override parameters via CLI:**
```bash
uv run python scripts/run_batch_sweeps.py \
  --strategies balanced aggressive \
  --difficulties normal hard \
  --seeds 42 123 456 \
  --ticks 100 200 \
  --output-dir build/custom_sweeps
```

**Use a custom configuration file:**
```bash
uv run python scripts/run_batch_sweeps.py --config path/to/custom_sweeps.yml
```

### Output Format

Each sweep produces a JSON file containing:
- `parameters`: Full parameter set (strategy, difficulty, seed, world, tick_budget)
- `results`: Game outcome data (final_stability, actions_taken, story_seeds_activated)
- `telemetry`: Metrics and profiling data (environment, faction_legitimacy, economy)
- `metadata`: Timestamp, git commit, runtime info

A summary file, `batch_sweep_summary.json`, aggregates all results, including:
- Strategy-level statistics (average/min/max stability, win rates)
- Difficulty-level statistics
- Total sweep counts and failure rates
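
As a rough sketch of how the strategy-level statistics could be derived from the per-sweep files — the nesting of `parameters` and `results` follows the lists above, but treat the exact schema as an assumption:

```python
# Sketch: group final stability by strategy and compute avg/min/max,
# mirroring the kind of aggregation batch_sweep_summary.json contains.
from collections import defaultdict
from statistics import mean

def summarize(sweeps: list[dict]) -> dict[str, dict]:
    by_strategy: dict[str, list[float]] = defaultdict(list)
    for sweep in sweeps:
        strategy = sweep["parameters"]["strategy"]
        by_strategy[strategy].append(sweep["results"]["final_stability"])
    return {
        strat: {"avg": mean(vals), "min": min(vals), "max": max(vals)}
        for strat, vals in by_strategy.items()
    }

sweeps = [
    {"parameters": {"strategy": "balanced"}, "results": {"final_stability": 0.5}},
    {"parameters": {"strategy": "balanced"}, "results": {"final_stability": 0.75}},
    {"parameters": {"strategy": "aggressive"}, "results": {"final_stability": 0.25}},
]
print(summarize(sweeps)["balanced"])  # {'avg': 0.625, 'min': 0.5, 'max': 0.75}
```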

### CLI Options

| Flag                 | Description                         |
|----------------------|-------------------------------------|
| `--config, -c`       | Path to YAML configuration file     |
| `--strategies, -s`   | Override strategies to test         |
| `--difficulties, -d` | Override difficulty presets         |
| `--seeds`            | Override random seeds               |
| `--worlds, -w`       | Override world bundles              |
| `--ticks, -t`        | Override tick budgets               |
| `--workers`          | Maximum parallel workers            |
| `--output-dir, -o`   | Output directory for results        |
| `--json`             | Output the summary as JSON          |
| `--verbose, -v`      | Print progress during execution     |
| `--no-write`         | Skip writing individual sweep files |

## Analyzing Tournament Results

After running a tournament or batch sweep, use the analysis script to generate a comparative report.

**Example:**
```bash
uv run python scripts/analyze_ai_games.py build/tournament.json --report build/analysis.txt
```

**Key option:**
- `--report`: Path to save the analysis output

The report includes:
- Win rate comparison across strategies and difficulties
- Detection of unused story seeds
- Flagging of balance outliers and anomalies
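
The unused-seed check amounts to a set difference. This sketch assumes each run records the `story_seeds_activated` list described in the batch sweep output format; the set of all authored seeds is a hypothetical input.

```python
# Sketch: flag story seeds that never activated in any run.
def unused_story_seeds(all_seeds: set[str], runs: list[dict]) -> set[str]:
    activated: set[str] = set()
    for run in runs:
        activated.update(run.get("story_seeds_activated", []))
    return all_seeds - activated

runs = [
    {"story_seeds_activated": ["drought", "uprising"]},
    {"story_seeds_activated": ["drought"]},
]
print(unused_story_seeds({"drought", "uprising", "plague"}, runs))  # {'plague'}
```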

## Balance Iteration Workflow

1. **Initial Exploration:** Run batch sweeps with diverse parameter combinations to establish baseline metrics.
2. **Tournament Validation:** Run focused tournaments on specific strategy combinations.
3. **Analysis:** Use the analysis script to identify dominant strategies, underpowered or overpowered actions, and unused content.
4. **Adjustment:** Modify simulation parameters or authored content based on findings.
5. **Regression Testing:** Re-run batch sweeps to validate improvements and ensure no regressions.

## CI Integration

A nightly CI workflow automatically runs tournaments and batch sweeps, archiving results for ongoing balance review. See `.github/workflows/ai-tournament.yml` for details.

## Usage Tips

- Use different world configs and seeds to stress-test balance across scenarios.
- For large parameter spaces, start with `sampling.mode: random` and a reduced `sample_count`.
- Review the analysis report regularly to guide design iteration.
- Archived CI artifacts provide a historical record of balance changes.
- Use `--verbose` during development to monitor sweep progress.
- Use reproducible seeds for regression testing.

## See Also

- [How to Play Echoes](./how_to_play_echoes.md)