Commit f13ef3c

Resolve merge conflict in gamedev-agent-thoughts.txt after stash pop. Finalize merge.

2 parents d359675 + d01b646

6 files changed: 3337 additions & 3 deletions

File tree

.pm/tracker.md

Lines changed: 32 additions & 0 deletions

@@ -360,6 +360,7 @@ The project has closely followed the implementation plan with excellent tracking
 **In-Progress Tasks:**
 
 - **11.4.1** - Strategy parameter optimization (Issue #71, optional, not started, low priority)
+- **11.4.2** - Optimization engine robustness & UX polish (see task details)
 - **11.6.1** - Designer feedback tooling (Issue #70, optional, not started, low priority)
 
 **Optional Polish Tasks** (not included in phase counts):

@@ -454,6 +455,7 @@ The project has closely followed the implementation plan with excellent tracking
 - ✅ 11.2.1: Result aggregation and storage - **COMPLETED** (2025-12-04)
 - ✅ 11.3.1: Analysis and balance reporting - **COMPLETED** (2025-12-04)
 - ⬜ 11.4.1: Strategy parameter optimization - **OPTIONAL** (low priority, future work)
+- ⬜ 11.4.2: Optimization engine robustness & UX polish - **OPTIONAL** (medium priority, future work)
 - ✅ 11.5.1: CI integration for continuous validation - **COMPLETED** (2025-12-05)
 - ⬜ 11.6.1: Designer feedback loop and tooling - **OPTIONAL** (low priority, UX enhancement)
 - **Dependencies:** Phase 9 (AI tournaments) ✅ COMPLETE

@@ -584,6 +586,7 @@ The project has closely followed the implementation plan with excellent tracking
 | 11.2.1 | Result aggregation and storage (M11.2) | ✅ completed | Medium | gamedev-agent | 2025-12-05 |
 | 11.3.1 | Analysis and balance reporting (M11.3) | ✅ completed | High | gamedev-agent | 2025-12-05 |
 | 11.4.1 | Strategy parameter optimization (M11.4) | not-started | Low | gamedev-agent | 2025-12-05 |
+| 11.4.2 | Optimization engine robustness & UX polish | not-started | Medium | gamedev-agent | 2025-12-06 |
 | 11.5.1 | CI integration for continuous validation (M11.5) | ✅ completed | Medium | gamedev-agent | 2025-12-05 |
 | 11.6.1 | Designer feedback loop and tooling (M11.6) | not-started | Low | gamedev-agent | 2025-12-05 |
 | 10.2.1 | Harden difficulty sweep runtime & monitoring | not-started | Low | Gamedev Agent | 2025-12-02 |

@@ -1546,6 +1549,35 @@ The project has closely followed the implementation plan with excellent tracking
 5. (Future) Consider exposing internal strategy parameters per implementation plan Section 10.
 - **Last Updated:** 2025-12-04
 
+### 11.4.2 — Optimization Engine Robustness & UX Polish
+
+- **Status:** not-started (optional enhancement)
+- **Description:** Iterate on `scripts/optimize_strategies.py` to improve robustness, extensibility, and user experience after the initial implementation (PR #72). Focus areas include stronger validation, clearer feedback for long-running jobs, more flexible parameter handling, consistent logging, and forward-compatible result storage.
+- **Acceptance Criteria:**
+  - CLI validates YAML and CLI overrides with clear, actionable error messages for invalid ranges, missing/unknown fields, and conflicting options.
+  - Optional parallel evaluation mode (config flag and/or CLI flag) for grid/random search that preserves deterministic behavior when seeds are fixed.
+  - Result storage schema includes a simple version field and continues to read older rows safely (defensive parsing for missing/extra columns).
+  - Logging uses the standard `logging` module with at least `--verbose` and `--quiet` behaviors, replacing ad-hoc `stderr` writes.
+  - Categorical (non-numeric) parameter support is either minimally implemented for a representative case or explicitly documented as a future design (with clear limitations).
+  - All tests pass; coverage for `scripts/optimize_strategies.py` is maintained at or above the current baseline.
+- **Priority:** Medium
+- **Responsible:** gamedev-agent
+- **Dependencies:** 11.1.1 (batch sweeps), 11.2.1 (result aggregation), 11.3.1 (analysis & reporting), 11.4.1 (initial optimizer shipped)
+- **Risks & Mitigations:**
+  - Risk: Parallelization introduces non-determinism or intermittent failures.
+    - Mitigation: Make parallel mode opt-in, centralize RNG seeding, and add tests that assert stable results with fixed seeds.
+  - Risk: SQLite schema changes break existing data.
+    - Mitigation: Introduce a schema version column, keep queries backward compatible, and add a small migration/upgrade note in docs.
+  - Risk: Over-extending categorical parameter support.
+    - Mitigation: Limit scope to a single clear pattern or document it as future work instead of refactoring the core engine.
+- **Next Steps:**
+  1. Design validation rules and logging behavior for `optimize_strategies.py` CLI usage.
+  2. Prototype optional parallel evaluation for random/grid search and measure performance and determinism.
+  3. Extend SQLite schema and query helpers with a version field while keeping existing rows readable.
+  4. Update `docs/gengine/ai_tournament_and_balance_analysis.md` with an "Advanced Optimization" section describing new behaviors and trade-offs.
+  5. Coordinate with `test_agent` to ensure edge cases and failure paths are well covered.
+- **Last Updated:** 2025-12-06
+
 ### 11.5.1 — CI Integration for Continuous Validation (M11.5)
 
 - **GitHub Issue:** [#68](https://github.com/TheWizardsCode/GEngine/issues/68) **COMPLETED**
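The parallel-determinism criterion and its mitigation (centralized RNG seeding) could be sketched roughly as follows. `config_seed` and the parameter names are illustrative stand-ins, not part of the actual script:

```python
import hashlib
import json


def config_seed(base_seed: int, params: dict) -> int:
    """Derive a stable per-configuration seed from a base seed.

    The seed depends only on the base seed and the parameter values, not
    on evaluation order, so serial and parallel runs stay in agreement.
    """
    payload = json.dumps({"base": base_seed, "params": params}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


# The same configuration yields the same seed regardless of key order,
# so workers can evaluate configurations in any order.
a = config_seed(42, {"stability_low": 0.6, "stability_critical": 0.4})
b = config_seed(42, {"stability_critical": 0.4, "stability_low": 0.6})
```

A test asserting that parallel and serial runs produce identical results with a fixed base seed would then exercise exactly the acceptance criterion above.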

docs/gengine/ai_tournament_and_balance_analysis.md

Lines changed: 187 additions & 2 deletions

@@ -228,22 +228,207 @@ uv run python scripts/analyze_balance.py report build/batch_sweep_summary.json -
 
 See the sections below for details on statistical analysis, regression detection, and report formats.
 
+## Strategy Parameter Optimization
+
+The `optimize_strategies.py` script (Phase 11, M11.4) provides automated strategy parameter tuning using optimization algorithms to find well-balanced strategy configurations. It helps reduce dominant strategy win rate deltas and improve strategic diversity.
+
+### Optimization Algorithms
+
+Three optimization algorithms are supported:
+
+1. **Grid Search** (`--algorithm grid`): Exhaustive search over all parameter combinations. Best for small parameter spaces or when you need guaranteed coverage.
+
+2. **Random Search** (`--algorithm random`): Randomly samples parameter configurations. Efficient for large parameter spaces where exhaustive search is impractical.
+
+3. **Bayesian Optimization** (`--algorithm bayesian`): Uses Gaussian processes to explore the parameter space intelligently. Requires the `scikit-optimize` package (`pip install scikit-optimize`).
+
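To make the grid/random distinction concrete, here is a toy sketch. The objective function and parameter names are stand-ins for illustration, not the script's real evaluation loop:

```python
import itertools
import random


def grid_search(space, objective):
    """Exhaustively evaluate every combination of the listed values."""
    names = sorted(space)
    best = None
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def random_search(space, objective, n_samples, seed):
    """Sample n_samples points uniformly from (min, max) ranges."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    best = None
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


# Toy objective: distance from a made-up "ideal" value of 0.65.
def objective(params):
    return abs(params["stability_low"] - 0.65)


grid_best = grid_search({"stability_low": [0.5, 0.6, 0.7, 0.8]}, objective)
rand_best = random_search({"stability_low": (0.5, 0.8)}, objective, n_samples=50, seed=42)
```

Grid search is limited to the listed step values, while random search can land anywhere in the range; with many parameters the grid's combination count explodes, which is why random search scales better.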
+### Configuration
+
+Optimization can be configured via `content/config/optimization.yml` or CLI flags:
+
+```yaml
+parameters:
+  stability_low:
+    min: 0.5
+    max: 0.8
+    step: 0.1  # For grid search
+  stability_critical:
+    min: 0.3
+    max: 0.5
+  faction_low_legitimacy:
+    min: 0.3
+    max: 0.6
+
+targets:
+  - name: win_rate_delta
+    weight: 1.0
+    direction: minimize
+  - name: diversity
+    weight: 0.5
+    direction: maximize
+
+settings:
+  algorithm: random
+  n_samples: 50
+  tick_budget: 100
+  seeds: [42, 123, 456]
+  strategies: [balanced, aggressive, diplomatic]
+```
+
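Validation of a config in this shape (the kind of actionable error reporting targeted by task 11.4.2) might look roughly like the following sketch. `validate_parameter_ranges` is a hypothetical helper, not the script's actual API:

```python
def validate_parameter_ranges(parameters: dict) -> list[str]:
    """Return human-readable problems for each parameter spec; an empty
    list means the configuration is valid."""
    errors = []
    for name, spec in parameters.items():
        unknown = set(spec) - {"min", "max", "step"}
        if unknown:
            errors.append(f"{name}: unknown fields {sorted(unknown)}")
        if "min" not in spec or "max" not in spec:
            errors.append(f"{name}: both 'min' and 'max' are required")
            continue
        if spec["min"] >= spec["max"]:
            errors.append(f"{name}: min ({spec['min']}) must be below max ({spec['max']})")
        if spec.get("step", 1) <= 0:
            errors.append(f"{name}: step must be a positive number")
    return errors


ok = validate_parameter_ranges({"stability_low": {"min": 0.5, "max": 0.8, "step": 0.1}})
bad = validate_parameter_ranges({"stability_low": {"min": 0.9, "max": 0.8, "typo": 1}})
```

Collecting every problem into a list, rather than raising on the first one, lets the CLI print all issues in one pass.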
+### Running Optimization
+
+**Basic optimization with grid search:**
+```bash
+uv run python scripts/optimize_strategies.py optimize --algorithm grid
+```
+
+**Random search with more samples:**
+```bash
+uv run python scripts/optimize_strategies.py optimize --algorithm random --samples 100
+```
+
+**Bayesian optimization (requires scikit-optimize):**
+```bash
+uv run python scripts/optimize_strategies.py optimize --algorithm bayesian --samples 50
+```
+
+### Optimization Targets
+
+The optimizer supports multiple optimization targets:
+
+- **win_rate_delta**: Minimize the maximum win rate difference between strategies. Lower values indicate better balance.
+- **diversity**: Maximize strategic diversity (different strategies succeed in different scenarios). Uses entropy-based scoring.
+- **stability**: Steer average stability across simulations toward a target value.
+
+Multi-objective optimization produces a Pareto frontier showing trade-offs between competing objectives.
+
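One conventional way to fold weighted minimize/maximize targets into a single comparable score is to flip the sign of maximized metrics. This is a sketch of that idea, not necessarily how the optimizer scores internally:

```python
def combined_score(metrics: dict, targets: list[dict]) -> float:
    """Lower is better: minimized metrics add to the score,
    maximized metrics subtract, each scaled by its weight."""
    score = 0.0
    for target in targets:
        value = metrics[target["name"]]
        sign = 1.0 if target["direction"] == "minimize" else -1.0
        score += target["weight"] * sign * value
    return score


targets = [
    {"name": "win_rate_delta", "weight": 1.0, "direction": "minimize"},
    {"name": "diversity", "weight": 0.5, "direction": "maximize"},
]

# Two rows from the Pareto example table in this document.
balanced = combined_score({"win_rate_delta": 0.08, "diversity": 0.90}, targets)
compromise = combined_score({"win_rate_delta": 0.10, "diversity": 0.95}, targets)
```

With these weights the 0.10/0.95 row actually scores better than the 0.08/0.90 row; increasing the `win_rate_delta` weight flips that preference, which is why the weights in the config are the main lever for expressing design priorities.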
+### Pareto Frontier
+
+The Pareto frontier is the set of non-dominated configurations: no other configuration is at least as good in every objective and strictly better in at least one. View the frontier with:
+
+```bash
+uv run python scripts/optimize_strategies.py pareto --database build/sweep_results.db
+```
+
+This helps identify trade-offs such as:
+- Balance vs. difficulty (easier games may be more balanced)
+- Diversity vs. stability (more diverse outcomes may have wider stability ranges)
+
+#### Interpreting the Pareto Frontier
+
+Each point on the frontier represents a trade-off:
+
+- **If you want the most balanced game:** Look for points with the lowest `win_rate_delta`.
+- **If you want the most diverse strategies:** Look for points with the highest `diversity` score.
+- **If you want a compromise:** Choose a point that balances both objectives, or use the weights in your optimization config to bias toward your design goals.
+
+**Visualizing the Pareto Frontier:**
+You can plot the Pareto points (e.g., `win_rate_delta` vs. `diversity`) using your favorite plotting tool or spreadsheet. This helps you see the shape of the trade-off curve and pick a configuration that fits your needs.
+
+**Example:**
+If the Pareto frontier includes:
+
+| win_rate_delta | diversity | stability |
+|----------------|-----------|-----------|
+| 0.10 | 0.95 | 0.70 |
+| 0.12 | 0.98 | 0.68 |
+| 0.08 | 0.90 | 0.72 |
+
+You might choose the third row for the best balance (lowest `win_rate_delta`), the second for the best diversity, or the first as a compromise.
+
+**Tip:** No point on the Pareto frontier is strictly "best"—the right choice depends on your design priorities.
+
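The dominance rule behind the frontier can be shown with a small filter. This sketch assumes each point is a tuple of objectives transformed so that lower is always better:

```python
def pareto_front(points):
    """Keep non-dominated points. A point is dominated when another point
    is at least as good in every objective and strictly better in one.
    Each point is a tuple of objectives where lower is always better."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Rows from the example table as (win_rate_delta, -diversity), plus an
# extra fourth point that is strictly worse and should be filtered out.
candidates = [(0.10, -0.95), (0.12, -0.98), (0.08, -0.90), (0.11, -0.90)]
front = pareto_front(candidates)
```

The three table rows survive because each wins on at least one objective; the added (0.11, 0.90) point is dropped because (0.08, 0.90) beats it on delta without losing diversity.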
+### Output Files
+
+After optimization, results are saved to the output directory (default: `build/optimization/`):
+
+- `optimization_result.json`: Full optimization data including all evaluated configurations
+- `optimization_report.md`: Human-readable Markdown report with best parameters and the Pareto frontier
+
+### Integration with Result Storage
+
+Optimization results are automatically stored in the sweep results database (`build/sweep_results.db`) for historical tracking. Query past optimization runs:
+
+```bash
+uv run python scripts/optimize_strategies.py pareto --limit 10
+```
+
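You can also inspect the database directly with Python's built-in `sqlite3` module. The table and column names below are hypothetical placeholders; the real layout of `sweep_results.db` is defined by the scripts, so check it with `.schema` first:

```python
import sqlite3

# Hypothetical schema for illustration only, populated in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE optimization_runs ("
    "id INTEGER PRIMARY KEY, algorithm TEXT, "
    "win_rate_delta REAL, diversity REAL)"
)
conn.executemany(
    "INSERT INTO optimization_runs (algorithm, win_rate_delta, diversity) "
    "VALUES (?, ?, ?)",
    [("random", 0.10, 0.95), ("grid", 0.08, 0.90), ("random", 0.12, 0.98)],
)

# Most balanced runs first, capped like `pareto --limit 10`.
rows = conn.execute(
    "SELECT algorithm, win_rate_delta FROM optimization_runs "
    "ORDER BY win_rate_delta ASC LIMIT 10"
).fetchall()
```

Ad-hoc queries like this are handy for one-off questions the CLI does not answer, without touching the stored data.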
+### CLI Options
+
+| Flag | Description |
+|------|-------------|
+| `--algorithm, -a` | Optimization algorithm (grid, random, bayesian) |
+| `--config, -c` | Path to YAML configuration file |
+| `--samples, -n` | Number of samples for random/bayesian search |
+| `--ticks, -t` | Tick budget per sweep simulation |
+| `--seed` | Random seed for reproducibility |
+| `--output-dir, -o` | Output directory for results |
+| `--database, -d` | Path to sweep results database |
+| `--json` | Output as JSON instead of files |
+| `--verbose, -v` | Print progress information |
+| `--no-store` | Skip storing result in database |
+
+### Example Workflow
+
+1. **Run initial optimization:**
+   ```bash
+   uv run python scripts/optimize_strategies.py optimize --algorithm random --samples 50 --verbose
+   ```
+
+2. **Review results:**
+   ```bash
+   cat build/optimization/optimization_report.md
+   ```
+
+3. **Apply best parameters:** Update `src/gengine/ai_player/strategies.py` with the discovered optimal values for `StrategyConfig`.
+
+4. **Validate with batch sweeps:**
+   ```bash
+   uv run python scripts/run_batch_sweeps.py --output-dir build/validation
+   ```
+
+5. **Generate balance report:**
+   ```bash
+   uv run python scripts/analyze_balance.py report --database build/sweep_results.db
+   ```
+
+### Troubleshooting & FAQ
+
+- **Q: My optimization run is very slow or seems stuck.**
+  - Try reducing the number of samples, or use random search instead of grid search for large parameter spaces.
+  - Use the `--verbose` flag to monitor progress.
+- **Q: I get an error about missing or invalid parameters.**
+  - Check your YAML config and CLI flags for typos or out-of-range values. All parameter names must match those in your strategy config.
+- **Q: The Pareto frontier is empty or has only one point.**
+  - This can happen if all configurations are dominated or if your parameter ranges are too narrow. Try expanding the search space or adjusting your targets.
+- **Q: How do I add a new optimization target?**
+  - Edit your config to add a new target (e.g., `stability`) and re-run the optimizer. See the config example above.
+
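Step 3 of the workflow (applying the best parameters) can be scripted against `optimization_result.json`. The JSON shape shown here is a hypothetical illustration, not the file's guaranteed schema:

```python
import json

# Hypothetical shape of optimization_result.json, inlined for illustration.
raw = json.dumps({
    "best": {"params": {"stability_low": 0.62, "stability_critical": 0.41},
             "score": 0.08},
    "evaluations": 50,
})

result = json.loads(raw)
best_params = result.get("best", {}).get("params", {})

# These values would then be copied into StrategyConfig by hand (or by a
# small helper) in src/gengine/ai_player/strategies.py.
lines = [f"{name} = {value}" for name, value in sorted(best_params.items())]
```

Using `.get(...)` with defaults keeps the script tolerant of older result files that may lack some keys.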
 
 ## Balance Iteration Workflow
 
 ### Recommended Workflow
 
 1. **Initial Exploration:** Run batch sweeps with diverse parameter combinations to establish baseline metrics.
 2. **Tournament Validation:** Run focused tournaments on specific strategy combinations.
 3. **Analysis:** Use the analysis script to identify dominant strategies, underpowered/overpowered actions, and unused content.
-4. **Adjustment:** Modify simulation parameters or authored content based on findings.
-5. **Regression Testing:** Re-run batch sweeps to validate improvements and ensure no regressions.
+4. **Parameter Optimization:** Use `optimize_strategies.py` to find balanced parameter configurations automatically.
+5. **Adjustment:** Apply optimized parameters or modify authored content based on findings.
+6. **Regression Testing:** Re-run batch sweeps to validate improvements and ensure no regressions.
 
 ## CI Integration
 
 A nightly CI workflow automatically runs tournaments and batch sweeps, archiving results for ongoing balance review. See `.github/workflows/ai-tournament.yml` for details.
 
 ## Usage Tips
 
+## Best Practices & Advanced Tips
+
+- Start with a broad parameter sweep to understand the landscape, then narrow in on promising regions.
+- Use multiple random seeds to avoid overfitting to a single scenario.
+- Regularly review the Markdown and JSON reports to track progress and spot regressions.
+- Archive your optimization results and reports for future reference and reproducibility.
+- For advanced analysis, export Pareto points and plot them to visualize trade-offs.
+
 - Use different world configs and seeds to stress-test balance across scenarios.
 - For large parameter spaces, start with `sampling.mode: random` and a reduced `sample_count`.
 - Review the analysis report regularly to guide design iteration.
