
Commit 7b9c029

Merge branch 'copilot/remote-lobster' FIXES: 58

2 parents: 70c747b + f252c65

6 files changed: 1802 additions & 786 deletions

content/config/batch_sweeps.yml

Lines changed: 72 additions & 0 deletions
# Batch Simulation Sweep Configuration
# Defines parameter ranges for multi-dimensional simulation sweeps.
# Used by scripts/run_batch_sweeps.py for balance analysis and regression testing.

# Parameter Grid - each combination is tested (Cartesian product)
parameters:
  # AI strategies to test
  strategies:
    - balanced
    - aggressive
    - diplomatic

  # Difficulty presets (maps to content/config/sweeps/difficulty-<preset>/)
  difficulties:
    - normal

  # Random seeds for deterministic reproducibility
  # Can be explicit list or range definition
  seeds:
    - 42
    - 123
    - 456

  # World bundles to test (from content/worlds/)
  worlds:
    - default

  # Tick budgets for simulation length
  tick_budgets:
    - 100

# Parallel execution settings
parallel:
  # Maximum worker processes (null = auto-detect based on CPU count)
  max_workers: null

  # Timeout per individual sweep in seconds
  timeout_per_sweep: 300

# Output configuration
output:
  # Directory for sweep result JSON files
  dir: build/batch_sweeps

  # Include full telemetry in output (increases file size)
  include_telemetry: true

  # Include game state summary in output
  include_summary: true

# Sampling configuration for large parameter spaces
# When enabled, samples from the grid instead of full Cartesian product
sampling:
  # Sampling mode: "full" (all combinations), "random", "latin_hypercube"
  mode: full

  # Number of samples to take (only used when mode != "full")
  sample_count: 100

  # Random seed for sampling reproducibility
  sample_seed: 42

# Metadata included in every sweep output
metadata:
  # Include git commit hash if available
  include_git_commit: true

  # Include timestamp
  include_timestamp: true

  # Include runtime environment info
  include_runtime_info: true
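This configuration is consumed by `scripts/run_batch_sweeps.py` (not shown in this commit view). As a rough illustration of what the parameter grid implies, the sketch below mirrors the `parameters` block as a Python dict and counts the full Cartesian product; the dict literal is a stand-in for parsing the YAML, not the script's actual loading code.

```python
from math import prod

# Mirror of the `parameters` block above (stand-in for YAML parsing).
parameters = {
    "strategies": ["balanced", "aggressive", "diplomatic"],
    "difficulties": ["normal"],
    "seeds": [42, 123, 456],
    "worlds": ["default"],
    "tick_budgets": [100],
}

# The full sweep count is the product of the option counts per axis.
grid_size = prod(len(values) for values in parameters.values())
print(grid_size)  # 3 strategies * 1 difficulty * 3 seeds * 1 world * 1 budget = 9
```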

docs/gengine/ai_tournament_and_balance_analysis.md

Lines changed: 132 additions & 11 deletions
# Section 13: AI Tournament & Balance Analysis

**Last Updated:** 2025-12-04

## Overview

This guide explains how to use the AI tournament, batch sweep, and balance analysis tools introduced in Phases 9 and 11. These utilities enable designers and developers to:

- Run large batches of AI-driven games in parallel
- Compare strategy and difficulty performance
- Identify balance issues and underutilized content
- Automate regression and balance testing in CI

## Running AI Tournaments

The tournament script executes multiple games in parallel, each using a configurable AI strategy (`BALANCED`, `AGGRESSIVE`, `DIPLOMATIC`, `HYBRID`). Telemetry is captured for each game, and results are aggregated into a single JSON file for analysis.

**Example:**

```bash
uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
```

**Key options:**

- `--games`: Number of games to run (default: 100)
- `--output`: Path to save the aggregated results
- `--strategies`: Strategies to test (e.g., `balanced aggressive`)
- `--seeds`: Random seeds for reproducibility
- `--worlds`: World configuration bundles
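The fan-out-and-aggregate shape described above can be sketched with the standard library. This is illustrative only, not the script's actual implementation: `play_game` is a hypothetical stand-in for a full game run, and a thread pool is used here for simplicity where the real script would more likely use worker processes.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def play_game(config):
    """Hypothetical stand-in for a single AI-driven game run."""
    strategy, seed = config
    # A real run would simulate the game here and collect its telemetry.
    return {"strategy": strategy, "seed": seed, "final_stability": 0.5}

def run_tournament(strategies, seeds):
    # Fan out every (strategy, seed) pairing to a worker pool.
    configs = [(s, seed) for s in strategies for seed in seeds]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(play_game, configs))
    # Aggregate all per-game records into one JSON document.
    return json.dumps({"games": results}, indent=2)

print(run_tournament(["balanced", "aggressive"], [42, 123]))
```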
## Running Batch Simulation Sweeps

The batch sweep script (Phase 11, M11.1) enables multi-dimensional parameter-space exploration for comprehensive balance analysis. It generates the Cartesian product of the configured parameter values and executes each combination in parallel, allowing you to:

- Stress-test balance across strategies, difficulties, seeds, worlds, and tick budgets
- Sample large parameter spaces efficiently
- Aggregate results for statistical analysis
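The Cartesian-product expansion can be pictured with `itertools.product`. This is a minimal sketch under an invented three-axis grid, not the script's actual code:

```python
from itertools import product

# One illustrative axis set, mirroring the configured parameter grid.
grid = {
    "strategy": ["balanced", "aggressive", "diplomatic"],
    "difficulty": ["normal", "hard"],
    "seed": [42, 123, 456],
}

# Each combination of one value per axis becomes one sweep's parameter set.
axes = list(grid)
sweeps = [dict(zip(axes, combo)) for combo in product(*grid.values())]

print(len(sweeps))  # 3 * 2 * 3 = 18 combinations
print(sweeps[0])    # {'strategy': 'balanced', 'difficulty': 'normal', 'seed': 42}
```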
### Configuration

Batch sweeps are configured via `content/config/batch_sweeps.yml`. Any parameter can be overridden with CLI flags.

```yaml
parameters:
  strategies:
    - balanced
    - aggressive
    - diplomatic
  difficulties:
    - normal
    - hard
  seeds:
    - 42
    - 123
    - 456
  worlds:
    - default
  tick_budgets:
    - 100
    - 200

parallel:
  max_workers: null  # Auto-detect CPU count
  timeout_per_sweep: 300

output:
  dir: build/batch_sweeps
  include_telemetry: true

sampling:
  mode: full  # Options: full, random, latin_hypercube
  sample_count: 100
```

**Tip:** For very large parameter spaces, use `sampling.mode: random` and adjust `sample_count` to control the number of sweeps.
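Random sampling over the grid might look like the following sketch, which draws a seeded subset so a sampled run stays reproducible. The function and its behavior are illustrative assumptions; the script's real sampler may differ.

```python
import random
from itertools import product

def sample_grid(grid, sample_count, sample_seed):
    """Draw a reproducible random subset of the full Cartesian product."""
    axes = list(grid)
    combos = [dict(zip(axes, c)) for c in product(*grid.values())]
    if sample_count >= len(combos):
        return combos  # Grid smaller than the budget: just run everything.
    # A seeded RNG returns the same subset on every run.
    return random.Random(sample_seed).sample(combos, sample_count)

grid = {"strategy": ["balanced", "aggressive"], "seed": [42, 123, 456]}
picked = sample_grid(grid, sample_count=3, sample_seed=42)
print(len(picked))  # 3
```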
### Running Batch Sweeps

**Basic execution with the default configuration:**

```bash
uv run python scripts/run_batch_sweeps.py --output-dir build/sweeps --verbose
```

**Override parameters via CLI:**

```bash
uv run python scripts/run_batch_sweeps.py \
  --strategies balanced aggressive \
  --difficulties normal hard \
  --seeds 42 123 456 \
  --ticks 100 200 \
  --output-dir build/custom_sweeps
```

**Use a custom configuration file:**

```bash
uv run python scripts/run_batch_sweeps.py --config path/to/custom_sweeps.yml
```

### Output Format

Each sweep produces a JSON file containing:

- `parameters`: Full parameter set (strategy, difficulty, seed, world, tick_budget)
- `results`: Game outcome data (final_stability, actions_taken, story_seeds_activated)
- `telemetry`: Metrics and profiling data (environment, faction_legitimacy, economy)
- `metadata`: Timestamp, git commit, runtime info

A summary file, `batch_sweep_summary.json`, aggregates all results, including:

- Strategy-level statistics (average/min/max stability, win rates)
- Difficulty-level statistics
- Total sweep counts and failure rates
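The strategy-level aggregation in the summary can be sketched as below. The field names follow the output description above; the records themselves are invented examples, and this is not the script's actual summarizer.

```python
from collections import defaultdict
from statistics import mean

# Invented sweep records shaped like the per-sweep JSON described above.
sweeps = [
    {"parameters": {"strategy": "balanced"}, "results": {"final_stability": 0.62}},
    {"parameters": {"strategy": "balanced"}, "results": {"final_stability": 0.58}},
    {"parameters": {"strategy": "aggressive"}, "results": {"final_stability": 0.41}},
]

# Group final_stability values by strategy, then reduce to summary stats.
by_strategy = defaultdict(list)
for sweep in sweeps:
    by_strategy[sweep["parameters"]["strategy"]].append(
        sweep["results"]["final_stability"]
    )

summary = {
    strategy: {"avg": mean(vals), "min": min(vals), "max": max(vals)}
    for strategy, vals in by_strategy.items()
}
print(summary["balanced"])  # average of the two balanced runs, plus min/max
```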
110+
111+
### CLI Options
112+
113+
**Common CLI Flags:**
114+
115+
| Flag | Description |
116+
|-------------------|---------------------------------------------|
117+
| `--config, -c` | Path to YAML configuration file |
118+
| `--strategies, -s`| Override strategies to test |
119+
| `--difficulties, -d` | Override difficulty presets |
120+
| `--seeds` | Override random seeds |
121+
| `--worlds, -w` | Override world bundles |
122+
| `--ticks, -t` | Override tick budgets |
123+
| `--workers` | Max parallel workers |
124+
| `--output-dir, -o`| Output directory for results |
125+
| `--json` | Output summary as JSON |
126+
| `--verbose, -v` | Print progress during execution |
127+
| `--no-write` | Skip writing individual sweep files |
## Analyzing Tournament Results

After running a tournament or batch sweep, use the analysis script to generate comparative reports.

**Example:**

```bash
uv run python scripts/analyze_ai_games.py build/tournament.json --report build/analysis.txt
```

**Key option:**

- `--report`: Path to save the analysis output

The report includes:

- Win rate comparison across strategies and difficulties
- Detection of unused story seeds
- Flagging of balance outliers and anomalies
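Outlier flagging of this kind can be illustrated with a simple z-score pass over per-strategy win rates. This is an illustrative heuristic with invented numbers, not the analyzer's actual method.

```python
from statistics import mean, stdev

# Invented per-strategy win rates, as the analysis report might tabulate.
win_rates = {"balanced": 0.34, "aggressive": 0.33, "diplomatic": 0.05, "hybrid": 0.28}

def flag_outliers(rates, threshold=1.2):
    """Flag strategies whose win rate sits far from the mean (|z| > threshold)."""
    mu = mean(rates.values())
    sigma = stdev(rates.values())
    return [name for name, rate in rates.items() if abs(rate - mu) / sigma > threshold]

print(flag_outliers(win_rates))  # ['diplomatic']
```

Here `diplomatic` wins so rarely that it sits well outside the spread of the other strategies, which is exactly the kind of anomaly worth investigating in authored content or AI scoring.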
## Balance Iteration Workflow

1. **Initial Exploration:** Run batch sweeps with diverse parameter combinations to establish baseline metrics.
2. **Tournament Validation:** Run focused tournaments on specific strategy combinations.
3. **Analysis:** Use the analysis script to identify dominant strategies, underpowered or overpowered actions, and unused content.
4. **Adjustment:** Modify simulation parameters or authored content based on findings.
5. **Regression Testing:** Re-run batch sweeps to validate improvements and ensure no regressions.
## CI Integration

A nightly CI workflow automatically runs tournaments and batch sweeps, archiving results for ongoing balance review. See `.github/workflows/ai-tournament.yml` for details.
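A nightly setup might look roughly like the sketch below. The repository's `.github/workflows/ai-tournament.yml` is authoritative; the schedule and step details here are illustrative assumptions, with commands taken from the examples above.

```yaml
# Illustrative sketch only; see .github/workflows/ai-tournament.yml for the real workflow.
name: nightly-ai-tournament
on:
  schedule:
    - cron: "0 3 * * *"  # Hypothetical nightly time
jobs:
  tournament:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
      - run: uv run python scripts/run_batch_sweeps.py --output-dir build/sweeps
      - uses: actions/upload-artifact@v4
        with:
          name: tournament-results
          path: build/
```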
## Usage Tips

- Use different world configs and seeds to stress-test balance across scenarios.
- For large parameter spaces, start with `sampling.mode: random` and a reduced `sample_count`.
- Review the analysis report regularly to guide design iteration.
- Archived CI artifacts provide a historical record of balance changes.
- Use `--verbose` during development to monitor sweep progress.
- Use reproducible seeds for regression testing.

## See Also

- [How to Play Echoes](./how_to_play_echoes.md)
