diff --git a/content/config/overlays/example_tuning.yml b/content/config/overlays/example_tuning.yml new file mode 100644 index 00000000..fd733052 --- /dev/null +++ b/content/config/overlays/example_tuning.yml @@ -0,0 +1,46 @@ +# Example Tuning Overlay for Echoes of Emergence +# ================================================ +# +# This file demonstrates how to create configuration overlays for balance testing. +# Overlays merge with the base simulation.yml, allowing you to test parameter +# changes without modifying the base configuration. +# +# Usage: +# echoes-balance-studio test-tuning --overlay content/config/overlays/example_tuning.yml +# +# Structure: +# Only include the settings you want to change. Missing settings will use +# base configuration values. Nested sections are deep-merged. + +# Metadata (optional) - helps document what this overlay is testing +_meta: + name: "Example Tuning Overlay" + description: "Demonstrates adjusting economy and environment parameters" + author: "Balance Team" + created: "2025-12-05" + hypothesis: "Increasing regen_scale should improve stability without making the game too easy" + +# Economy adjustments +# Hypothesis: Slightly faster resource regeneration helps new players +# without trivializing resource management +economy: + # Increase base regeneration by 25% (0.8 -> 1.0) + regen_scale: 1.0 + # Slightly reduce the threshold for shortages to trigger earlier warnings + shortage_threshold: 0.25 + # Allow prices to fluctuate more for interesting market dynamics + price_max_boost: 0.6 + +# Environment adjustments +# Hypothesis: Softer scarcity pressure gives players more recovery time +environment: + # Reduce how much scarcity affects unrest + scarcity_unrest_weight: 0.00003 + # Allow biodiversity to recover faster + biodiversity_recovery_rate: 0.05 + +# Director adjustments (optional) +# Hypothesis: Longer quiet periods between story seeds reduce overwhelm +# director: +# global_quiet_ticks: 6 +# seed_quiet_ticks: 8 diff
--git a/docs/gengine/ai_tournament_and_balance_analysis.md b/docs/gengine/ai_tournament_and_balance_analysis.md index 14bea9a6..9a0b0a46 100644 --- a/docs/gengine/ai_tournament_and_balance_analysis.md +++ b/docs/gengine/ai_tournament_and_balance_analysis.md @@ -251,9 +251,32 @@ A nightly CI workflow automatically runs tournaments and batch sweeps, archiving - Use `--verbose` during development to monitor sweep progress. - Use reproducible seeds for regression testing. +## Designer Feedback Loop and Tooling + +For designer-friendly workflows that make balance iteration accessible without deep engineering knowledge, see the [Designer Balance Guide](./designer_balance_guide.md). This guide covers: + +- Running exploratory parameter sweeps with `echoes-balance-studio` +- Creating and testing config overlays +- Diagnosing dominant strategies +- Iterating on action costs and narrative pacing +- Example workflows with case studies + +Quick start: +```bash +# Run the balance studio +uv run echoes-balance-studio sweep --strategies balanced aggressive --ticks 50 + +# Test a tuning change +uv run echoes-balance-studio test-tuning --overlay content/config/overlays/example_tuning.yml + +# View historical reports +uv run echoes-balance-studio history --days 30 +``` + ## See Also +- [Designer Balance Guide](./designer_balance_guide.md) - Designer-focused balance workflows - [How to Play Echoes](./how_to_play_echoes.md) - [Implementation Plan](../simul/emergent_story_game_implementation_plan.md) - [README](../../README.md) - - [Testing Guide](./testing_guide.md) - - [Content Designer Workflow](./content_designer_workflow.md) +- [Testing Guide](./testing_guide.md) +- [Content Designer Workflow](./content_designer_workflow.md) diff --git a/docs/gengine/designer_balance_guide.md b/docs/gengine/designer_balance_guide.md new file mode 100644 index 00000000..e8389a59 --- /dev/null +++ b/docs/gengine/designer_balance_guide.md @@ -0,0 +1,446 @@ +# Designer Balance Guide + +A practical 
guide for game designers to diagnose balance issues and iterate on game parameters in Echoes of Emergence using the balance studio tooling. + +## Overview + +The balance studio provides designer-friendly workflows for: + +- Running exploratory parameter sweeps +- Comparing configuration variants +- Testing tuning changes with overlays +- Viewing historical balance reports + +This guide covers common balance iteration tasks and provides step-by-step workflows. + +## Quick Start + +### Installation + +The balance studio is included with the GEngine development environment: + +```bash +# Install dependencies +uv sync --group dev + +# Verify installation +uv run echoes-balance-studio --help +``` + +### Available Commands + +| Command | Purpose | +|---------|---------| +| `sweep` | Run exploratory balance sweeps | +| `compare` | Compare two configurations | +| `test-tuning` | Test overlay changes | +| `history` | View past sweep runs | +| `view` | Inspect a specific run | +| `report` | Generate HTML reports | +| `overlays` | List available overlays | + +## Diagnosing Dominant Strategies + +When one strategy consistently outperforms others, it indicates a balance issue that needs investigation. + +### Symptoms + +- Win rate differences >10% between strategies +- Players gravitating to a single approach +- AI tournaments showing lopsided results + +### Diagnostic Workflow + +1. **Run a multi-strategy sweep:** + + ```bash + uv run echoes-balance-studio sweep \ + --strategies balanced aggressive diplomatic \ + --seeds 42 123 456 789 \ + --ticks 100 + ``` + +2. **Check the results:** + + ``` + Strategy Results: + balanced: avg_stability=0.721 + aggressive: avg_stability=0.534 + diplomatic: avg_stability=0.698 + ``` + +3. **Identify the dominant strategy:** If one strategy has >10% higher win rate, it may need adjustment. + +4. **Generate a detailed report:** + + ```bash + uv run echoes-balance-studio report \ + --output build/balance_report.html + ``` + +5. 
**Review the HTML report** for: + - Win rate comparisons + - Action usage frequencies + - Story seed activation rates + +### Common Fixes + +| Issue | Typical Cause | Suggested Fix | +|-------|--------------|---------------| +| Aggressive too strong | Low stability penalty for aggression | Increase `environment.scarcity_unrest_weight` | +| Diplomatic too weak | Negotiation rewards too low | Adjust `progression.experience_per_negotiation` | +| Balanced dominates | Other strategies have skewed risk/reward | Review action costs and outcomes | + +## Iterating on Action Costs + +Action costs determine how expensive each player choice is, affecting strategy viability. + +### Understanding Action Economy + +Actions in Echoes consume resources and have effects: + +- **Direct costs**: Resources spent to take the action +- **Opportunity costs**: What else could be done instead +- **Side effects**: Stability, faction legitimacy, pollution impacts + +### Testing Cost Changes + +1. **Create an overlay to adjust costs:** + + ```yaml + # content/config/overlays/action_cost_test.yml + _meta: + name: "Action Cost Test" + hypothesis: "Increasing inspection rewards encourages exploration" + + progression: + experience_per_inspection: 8.0 # Increased from 5.0 + ``` + +2. **Test the change:** + + ```bash + uv run echoes-balance-studio test-tuning \ + --overlay content/config/overlays/action_cost_test.yml \ + --strategy balanced \ + --ticks 50 + ``` + +3. **Evaluate the results:** + + ``` + Results: + Baseline stability: 0.712 + With overlay: 0.745 + Delta: +0.033 + Impact: ✅ positive + ``` + +4. **If positive**, run a full comparison sweep to validate across strategies. + +### Case Study: Balancing Faction Interactions + +**Problem**: Players rarely use faction negotiation because the payoff is unclear. + +**Hypothesis**: Increasing negotiation experience rewards will encourage diplomatic play.
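An overlay such as the one used in this case study only lists the keys being changed; every other setting falls through from the base config because overlays are deep-merged, section by section. A small self-contained sketch of that merge semantics (mirroring the `deep_merge` helper this change adds; the sample values echo the negotiation tuning in this guide):

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay into base; nested dicts merge, overlay scalars win."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {
    "progression": {
        "experience_per_negotiation": 15.0,
        "experience_per_inspection": 5.0,
    }
}
overlay = {"progression": {"experience_per_negotiation": 25.0}}

merged = deep_merge(base, overlay)
# Only the overridden key changes; the sibling key survives from the base.
print(merged["progression"])
# -> {'experience_per_negotiation': 25.0, 'experience_per_inspection': 5.0}
```

Because the merge deep-copies both inputs, the base configuration dictionary is never mutated, so the same base can be reused across several overlay tests.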
+ +**Process**: + +```bash +# Create test overlay +cat > content/config/overlays/negotiation_boost.yml << 'EOF' +_meta: + name: "Negotiation Boost" + hypothesis: "Higher negotiation XP encourages diplomatic strategies" + +progression: + experience_per_negotiation: 25.0 # Up from 15.0 + diplomacy_multiplier: 1.3 +EOF + +# Test the change +uv run echoes-balance-studio test-tuning \ + --overlay content/config/overlays/negotiation_boost.yml \ + --strategy diplomatic \ + --ticks 100 + +# Compare strategies with the new settings +uv run echoes-balance-studio sweep \ + --strategies balanced aggressive diplomatic \ + --overlay content/config/overlays/negotiation_boost.yml \ + --ticks 100 \ + --seeds 42 123 456 +``` + +## Testing Narrative Pacing Changes + +The narrative director controls story seed activation and pacing. + +### Key Pacing Parameters + +| Parameter | Effect | +|-----------|--------| +| `director.max_active_seeds` | How many story arcs can run simultaneously | +| `director.global_quiet_ticks` | Minimum ticks between new seed activations | +| `director.seed_active_ticks` | How long a seed stays in "active" state | +| `director.seed_resolve_ticks` | How long resolution takes | + +### Testing Pacing Adjustments + +1. **Create a pacing overlay:** + + ```yaml + # content/config/overlays/slower_pacing.yml + _meta: + name: "Slower Narrative Pacing" + hypothesis: "More breathing room between story beats reduces overwhelm" + + director: + max_active_seeds: 1 + global_quiet_ticks: 8 # Up from 4 + seed_quiet_ticks: 10 # Up from 6 + ``` + +2. **Run a longer sweep to observe pacing effects:** + + ```bash + uv run echoes-balance-studio sweep \ + --overlay content/config/overlays/slower_pacing.yml \ + --ticks 200 \ + --seeds 42 + ``` + +3. **Check story seed activation counts** in the output telemetry. + +### Balancing Story Density + +Too many story seeds firing leads to chaos; too few leads to boredom. 
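One rough way to quantify density from sweep output is to bucket the activation rate per 100 ticks. The sketch below assumes telemetry exposes a list of activation tick numbers (a hypothetical shape); the thresholds follow the 2-5 activations per 100 ticks guidance in the Interpreting Results table later in this guide:

```python
def classify_pacing(activation_ticks: list[int], total_ticks: int) -> str:
    """Bucket story-seed density using per-100-tick ranges (assumed thresholds)."""
    per_100 = len(activation_ticks) / total_ticks * 100
    if per_100 == 0:
        return "broken pacing (no seeds fired)"
    if per_100 > 10:
        return "overwhelming"
    if 2 <= per_100 <= 5:
        return "good"
    return "borderline - inspect manually"


# Four activations over a 200-tick run -> 2 per 100 ticks.
print(classify_pacing([12, 55, 90, 140], 200))  # -> good
```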
+ +**Indicators of over-pacing:** +- Multiple story seeds active simultaneously +- Players unable to respond before new events +- Stability crashes from overlapping crises + +**Indicators of under-pacing:** +- Long stretches with no narrative events +- Players waiting with nothing to do +- Low engagement between crises + +## Example Workflows + +### Workflow 1: New Feature Balance Check + +When adding a new game feature, validate it doesn't break existing balance: + +```bash +# 1. Establish baseline +uv run echoes-balance-studio sweep \ + --strategies balanced aggressive diplomatic \ + --ticks 100 \ + --output-dir build/baseline + +# 2. Apply your feature changes to an overlay + +# 3. Test with overlay +uv run echoes-balance-studio sweep \ + --strategies balanced aggressive diplomatic \ + --overlay content/config/overlays/new_feature.yml \ + --ticks 100 \ + --output-dir build/with_feature + +# 4. Compare results manually or generate reports +uv run echoes-balance-studio report \ + --output build/feature_comparison.html +``` + +### Workflow 2: Difficulty Tuning + +Adjusting difficulty presets for different player skill levels: + +```bash +# Compare easy vs hard difficulty configs +uv run echoes-balance-studio compare \ + --config-a content/config/sweeps/difficulty-easy/simulation.yml \ + --config-b content/config/sweeps/difficulty-hard/simulation.yml \ + --strategies balanced \ + --ticks 100 +``` + +### Workflow 3: Regression Testing + +After making changes, verify you haven't broken balance: + +```bash +# Run sweep and ingest to database +uv run python scripts/run_batch_sweeps.py \ + --strategies balanced aggressive \ + --output-dir build/regression_test + +uv run python scripts/aggregate_sweep_results.py \ + ingest build/regression_test + +# Check historical trends +uv run echoes-balance-studio history --days 7 + +# Generate comparison report +uv run echoes-balance-studio report \ + --days 7 \ + --output build/regression_report.html +``` + +## Case Study: 
Balancing the Industrial Tier + +This example walks through a complete balance iteration for a specific faction. + +### Problem Statement + +The Industrial Tier faction (Union of Flux) is underperforming: +- Lower win rates when playing industrial-focused strategies +- Faction legitimacy rarely exceeds 0.5 +- Story seeds related to industry trigger less frequently + +### Investigation + +1. **Run targeted sweep:** + + ```bash + uv run echoes-balance-studio sweep \ + --strategies balanced aggressive \ + --ticks 150 \ + --seeds 42 123 456 789 1234 + ``` + +2. **Review telemetry for faction legitimacy** in the output JSON. + +3. **Identify issues:** + - Industrial production values too low + - Pollution costs outweigh benefits + - Faction investment actions have weak effects + +### Creating a Fix + +```yaml +# content/config/overlays/industrial_balance.yml +_meta: + name: "Industrial Tier Balance" + hypothesis: "Boosting industrial benefits and reducing pollution penalties" + +economy: + base_resource_weights: + materials: 3.0 # Up from 2.5 + energy: 4.5 # Up from 4.0 + +environment: + faction_invest_pollution_relief: 0.03 # Up from 0.02 + scarcity_pollution_weight: 0.00002 # Down from 0.00003 +``` + +### Testing the Fix + +```bash +# Quick validation +uv run echoes-balance-studio test-tuning \ + --overlay content/config/overlays/industrial_balance.yml \ + --strategy balanced \ + --ticks 100 + +# Full sweep comparison +uv run echoes-balance-studio sweep \ + --overlay content/config/overlays/industrial_balance.yml \ + --strategies balanced aggressive diplomatic \ + --ticks 150 \ + --seeds 42 123 456 +``` + +### Validating the Fix + +After the overlay shows positive results: + +1. Merge overlay values into the base config +2. Run full regression sweep +3. Update difficulty presets if needed +4. 
Document the change in commit message + +## Best Practices + +### Overlay Organization + +``` +content/config/overlays/ +├── economy/ +│ ├── resource_boost.yml +│ └── price_stability.yml +├── environment/ +│ ├── pollution_reduction.yml +│ └── biodiversity_focus.yml +├── narrative/ +│ ├── faster_pacing.yml +│ └── more_story_seeds.yml +└── experimental/ + └── wild_ideas.yml +``` + +### Testing Checklist + +Before merging a balance change: + +- [ ] Tested with at least 3 random seeds +- [ ] Compared against baseline configuration +- [ ] Checked all strategies (balanced, aggressive, diplomatic) +- [ ] Verified no dramatic win rate shifts +- [ ] Documented hypothesis and results +- [ ] Run against multiple difficulty levels if applicable + +### Interpreting Results + +| Metric | Good Range | Warning Signs | +|--------|------------|---------------| +| Avg Stability | 0.5 - 0.8 | Below 0.4 (too hard) or above 0.9 (too easy) | +| Win Rate Delta | < 10% | > 15% indicates dominant strategy | +| Actions/Game | 5-20 | Very low suggests boring; very high suggests chaos | +| Story Seed Activations | 2-5 per 100 ticks | None (broken pacing) or >10 (overwhelming) | + +## Troubleshooting + +### "No sweep runs found" + +The database may be empty or in the wrong location: + +```bash +# Check database exists +ls -la build/sweep_results.db + +# Ingest results if needed +uv run python scripts/aggregate_sweep_results.py \ + ingest build/batch_sweeps +``` + +### Sweep takes too long + +Reduce the parameter space: + +```bash +# Fewer seeds and lower tick budget for quick tests +uv run echoes-balance-studio sweep \ + --strategies balanced \ + --seeds 42 \ + --ticks 30 +``` + +### Overlay not applying + +Verify the overlay file: + +```bash +# Check syntax +python -c "import yaml; yaml.safe_load(open('path/to/overlay.yml'))" + +# List available overlays +uv run echoes-balance-studio overlays +``` + +## See Also + +- [AI Tournament & Balance Analysis](./ai_tournament_and_balance_analysis.md) - 
Detailed tournament tooling +- [How to Play Echoes](./how_to_play_echoes.md) - Gameplay mechanics reference +- [Implementation Plan](../simul/emergent_story_game_implementation_plan.md) - Technical details diff --git a/gamedev-agent-thoughts.txt b/gamedev-agent-thoughts.txt index 057d7aa3..8253f294 100644 --- a/gamedev-agent-thoughts.txt +++ b/gamedev-agent-thoughts.txt @@ -1,4 +1,84 @@ -# GameDev Agent Thoughts - Issue #63: Analysis and Balance Reporting (M11.3) +# GameDev Agent Thoughts - Issue #70: Designer Feedback Loop and Tooling (Task 11.6.1) + +## Task Analysis + +Working on Issue #70 - Phase 11, Milestone 11.6, Task 11.6.1. + +### Requirements + +1. Create CLI tool `echoes-balance-studio` with 4 guided workflows: + - "Run exploratory sweep" - interactive parameter selection + - "Compare two configs" - side-by-side comparison + - "Test tuning change" - apply YAML overlays and validate + - "View historical reports" - browse past balance reports + +2. Config Overlay System: + - Allow configuration changes via YAML overlays + - Store overlays in `content/config/overlays/` + - Merge cleanly with base simulation.yml + +3. Interactive Report Viewer (HTML Dashboard): + - Extend existing HTML report generation + - Add filtering and sorting capabilities + - Allow drill-down by strategy/difficulty + +4. Designer Documentation: + - How to diagnose dominant strategies + - Iterating on action costs + - Testing narrative pacing changes + - Example workflows with case studies + +5. Tests: + - At least 8 tests covering CLI commands, overlay loading/merging, and report generation + +## Implementation Summary + +### Files Created/Modified +1. CREATED: scripts/echoes_balance_studio.py - Main CLI tool with 7 workflows +2. CREATED: tests/scripts/test_balance_studio.py - 28 tests covering CLI and utilities +3. CREATED: docs/gengine/designer_balance_guide.md - Designer documentation +4. CREATED: content/config/overlays/example_tuning.yml - Example overlay +5. 
MODIFIED: pyproject.toml - Added echoes-balance-studio entry point +6. MODIFIED: docs/gengine/ai_tournament_and_balance_analysis.md - Link to new guide + +### CLI Workflows Implemented +1. `sweep` - Run exploratory balance sweeps +2. `compare` - Compare two configurations side-by-side +3. `test-tuning` - Test tuning changes with overlays +4. `history` - View historical sweep runs +5. `view` - View details of specific sweep run +6. `report` - Generate enhanced HTML balance report +7. `overlays` - List available overlay files + +### Tests Written (28 tests, 8 required) +- TestDeepMerge: 4 tests for config merging +- TestConfigLoading: 4 tests for YAML loading/saving +- TestOverlayValidation: 3 tests for overlay validation +- TestListOverlays: 3 tests for overlay listing +- TestHistoricalReports: 2 tests for history queries +- TestEnhancedHtmlReport: 2 tests for HTML report generation +- TestCLI: 8 tests for CLI commands +- TestIntegration: 2 slow integration tests + +## Verification + +- All 28 tests pass +- Ruff linting passes with no errors +- CLI entry point added to pyproject.toml + +## Progress +- [x] Create echoes_balance_studio.py with 7 workflows (4 required + 3 bonus) +- [x] Add entry point to pyproject.toml +- [x] Create content/config/overlays/ directory and example overlay +- [x] Create HTML dashboard with filtering/sorting +- [x] Create designer documentation +- [x] Write 28 tests (8+ required) +- [x] Link to existing documentation + +--- + +# Previous Task: Issue #63: Analysis and Balance Reporting (M11.3) ## Task Analysis diff --git a/pyproject.toml b/pyproject.toml index 8f75b2bc..9b2e0cfe 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -41,6 +41,7 @@ echoes-shell = "gengine.echoes.cli.shell:main" echoes-gateway-service = "gengine.echoes.gateway.main:main" echoes-gateway-shell = "gengine.echoes.gateway.client:main" echoes-llm-service = "gengine.echoes.llm.main:main" +echoes-balance-studio = 
"scripts.echoes_balance_studio:main" [build-system] requires = ["setuptools>=68.0.0"] diff --git a/scripts/echoes_balance_studio.py b/scripts/echoes_balance_studio.py new file mode 100644 index 00000000..98e7abc5 --- /dev/null +++ b/scripts/echoes_balance_studio.py @@ -0,0 +1,1560 @@ +#!/usr/bin/env python3 +"""Designer-facing balance studio for Echoes of Emergence. + +Provides guided workflows for balance iteration accessible to non-engineers: +- Run exploratory sweep: Interactive parameter selection with sensible defaults +- Compare two configs: Side-by-side comparison of sweep results +- Test tuning change: Apply YAML overlays and run quick validation +- View historical reports: Browse and view past balance reports + +Examples +-------- +Run the interactive balance studio:: + + uv run echoes-balance-studio + +Run exploratory sweep with defaults:: + + uv run echoes-balance-studio sweep --strategies balanced aggressive --ticks 50 + +Compare two configurations:: + + uv run echoes-balance-studio compare --config-a content/config/simulation.yml \\ + --config-b content/config/sweeps/difficulty-hard/simulation.yml + +Test a tuning change with overlay:: + + uv run echoes-balance-studio test-tuning \\ + --overlay content/config/overlays/example_tuning.yml + +View historical reports:: + + uv run echoes-balance-studio history --days 30 +""" + +from __future__ import annotations + +import argparse +import copy +import json +import os +import sys +from datetime import datetime, timedelta, timezone +from pathlib import Path +from typing import Any, Sequence + +import yaml + +# Ensure config environment is set +os.environ.setdefault("ECHOES_CONFIG_ROOT", "content/config") + +# Default paths +DEFAULT_BASE_CONFIG = Path("content/config/simulation.yml") +DEFAULT_OVERLAY_DIR = Path("content/config/overlays") +DEFAULT_OUTPUT_DIR = Path("build/balance_studio") +DEFAULT_DB_PATH = Path("build/sweep_results.db") + +# Available options +AVAILABLE_STRATEGIES = ["balanced", "aggressive", 
"diplomatic", "hybrid"] +AVAILABLE_DIFFICULTIES = ["tutorial", "easy", "normal", "hard", "brutal"] + + +# ============================================================================ +# Config Overlay System +# ============================================================================ + + +def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]: + """Deep merge overlay into base configuration. + + Parameters + ---------- + base + Base configuration dictionary. + overlay + Overlay configuration to merge in. + + Returns + ------- + dict[str, Any] + Merged configuration with overlay values taking precedence. + """ + result = copy.deepcopy(base) + + for key, value in overlay.items(): + if key in result and isinstance(result[key], dict) and isinstance(value, dict): + result[key] = deep_merge(result[key], value) + else: + result[key] = copy.deepcopy(value) + + return result + + +def load_config(path: Path) -> dict[str, Any]: + """Load a YAML configuration file. + + Parameters + ---------- + path + Path to YAML configuration file. + + Returns + ------- + dict[str, Any] + Configuration dictionary. + """ + if not path.exists(): + return {} + + with open(path, encoding="utf-8") as f: + data = yaml.safe_load(f) + return data if isinstance(data, dict) else {} + + +def load_config_with_overlay( + base_path: Path, + overlay_path: Path | None = None, +) -> dict[str, Any]: + """Load base configuration and optionally merge an overlay. + + Parameters + ---------- + base_path + Path to base configuration file. + overlay_path + Optional path to overlay file to merge. + + Returns + ------- + dict[str, Any] + Merged configuration. + """ + base = load_config(base_path) + + if overlay_path: + overlay = load_config(overlay_path) + return deep_merge(base, overlay) + + return base + + +def save_config(config: dict[str, Any], path: Path) -> None: + """Save configuration to YAML file. + + Parameters + ---------- + config + Configuration dictionary. 
+ path + Output path. + """ + path.parent.mkdir(parents=True, exist_ok=True) + with open(path, "w", encoding="utf-8") as f: + yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False) + + +def list_overlays(overlay_dir: Path = DEFAULT_OVERLAY_DIR) -> list[Path]: + """List available overlay files. + + Parameters + ---------- + overlay_dir + Directory containing overlay files. + + Returns + ------- + list[Path] + List of overlay file paths. + """ + if not overlay_dir.exists(): + return [] + + return sorted(overlay_dir.glob("*.yml")) + sorted(overlay_dir.glob("*.yaml")) + + +def validate_overlay(overlay: dict[str, Any]) -> list[str]: + """Validate an overlay configuration. + + Parameters + ---------- + overlay + Overlay configuration dictionary. + + Returns + ------- + list[str] + List of validation warnings (empty if valid). + """ + warnings: list[str] = [] + + # Check for recognized top-level keys + known_keys = { + "limits", "lod", "profiling", "focus", "director", "economy", + "environment", "progression", "per_agent_progression", "campaign", + "_meta" # Allow metadata for overlays + } + + for key in overlay: + if key not in known_keys: + warnings.append(f"Unknown top-level key: '{key}'") + + return warnings + + +# ============================================================================ +# Sweep Execution +# ============================================================================ + + +def run_exploratory_sweep( + strategies: list[str] | None = None, + difficulties: list[str] | None = None, + seeds: list[int] | None = None, + tick_budget: int = 50, + output_dir: Path = DEFAULT_OUTPUT_DIR, + config_overlay: Path | None = None, + verbose: bool = False, +) -> dict[str, Any]: + """Run an exploratory balance sweep with sensible defaults. + + Parameters + ---------- + strategies + Strategies to test (defaults to all). + difficulties + Difficulties to test (defaults to ["normal"]). + seeds + Random seeds (defaults to [42, 123, 456]). 
+ tick_budget + Ticks per sweep (default 50). + output_dir + Output directory for results. + config_overlay + Optional overlay to apply to base config. + verbose + Print progress to stderr. + + Returns + ------- + dict[str, Any] + Sweep results summary. + """ + from scripts.run_batch_sweeps import ( + BatchSweepConfig, + run_batch_sweeps, + write_sweep_outputs, + ) + + # Apply defaults + if strategies is None: + strategies = ["balanced", "aggressive"] + if difficulties is None: + difficulties = ["normal"] + if seeds is None: + seeds = [42, 123, 456] + + # Prepare output directory + output_dir.mkdir(parents=True, exist_ok=True) + + # If overlay specified, create temporary merged config + if config_overlay: + merged = load_config_with_overlay(DEFAULT_BASE_CONFIG, config_overlay) + temp_config_dir = output_dir / "temp_config" + temp_config_dir.mkdir(parents=True, exist_ok=True) + save_config(merged, temp_config_dir / "simulation.yml") + # Set environment variable for the overlay config + os.environ["ECHOES_CONFIG_ROOT"] = str(temp_config_dir) + + # Create sweep configuration + config = BatchSweepConfig( + strategies=strategies, + difficulties=difficulties, + seeds=seeds, + worlds=["default"], + tick_budgets=[tick_budget], + max_workers=min(4, os.cpu_count() or 1), + output_dir=output_dir, + include_telemetry=True, + ) + + if verbose: + sys.stderr.write("Running exploratory sweep:\n") + sys.stderr.write(f" Strategies: {strategies}\n") + sys.stderr.write(f" Difficulties: {difficulties}\n") + sys.stderr.write(f" Seeds: {seeds}\n") + sys.stderr.write(f" Tick budget: {tick_budget}\n") + if config_overlay: + sys.stderr.write(f" Overlay: {config_overlay}\n") + + # Run sweeps + report = run_batch_sweeps(config, verbose=verbose) + + # Write outputs + write_sweep_outputs(report, output_dir, verbose=verbose) + + return report.to_dict() + + +def compare_configs( + config_a_path: Path, + config_b_path: Path, + strategies: list[str] | None = None, + tick_budget: int = 30, + seeds: 
list[int] | None = None, + output_dir: Path = DEFAULT_OUTPUT_DIR, + verbose: bool = False, +) -> dict[str, Any]: + """Compare two configurations by running identical sweeps. + + Parameters + ---------- + config_a_path + Path to first configuration. + config_b_path + Path to second configuration. + strategies + Strategies to test. + tick_budget + Ticks per sweep. + seeds + Random seeds. + output_dir + Output directory. + verbose + Print progress. + + Returns + ------- + dict[str, Any] + Comparison results with delta analysis. + """ + from scripts.run_batch_sweeps import ( + BatchSweepConfig, + run_batch_sweeps, + ) + + if strategies is None: + strategies = ["balanced"] + if seeds is None: + seeds = [42, 123] + + results: dict[str, Any] = { + "config_a": str(config_a_path), + "config_b": str(config_b_path), + "comparison": {}, + } + + for label, config_path in [("a", config_a_path), ("b", config_b_path)]: + config_root = config_path.parent + + config = BatchSweepConfig( + strategies=strategies, + difficulties=["normal"], + seeds=seeds, + worlds=["default"], + tick_budgets=[tick_budget], + max_workers=2, + include_telemetry=False, + ) + + # Set environment for config root + old_env = os.environ.get("ECHOES_CONFIG_ROOT") + os.environ["ECHOES_CONFIG_ROOT"] = str(config_root) + + try: + if verbose: + sys.stderr.write(f"Running sweep with config {label}: {config_path}\n") + report = run_batch_sweeps(config, verbose=verbose) + results[f"config_{label}_results"] = report.to_dict() + finally: + if old_env: + os.environ["ECHOES_CONFIG_ROOT"] = old_env + else: + os.environ.pop("ECHOES_CONFIG_ROOT", None) + + # Compute deltas + if "config_a_results" in results and "config_b_results" in results: + a_stats = results["config_a_results"].get("strategy_stats", {}) + b_stats = results["config_b_results"].get("strategy_stats", {}) + + for strategy in set(a_stats.keys()) | set(b_stats.keys()): + a_avg = a_stats.get(strategy, {}).get("avg_stability", 0.0) + b_avg = b_stats.get(strategy, 
{}).get("avg_stability", 0.0) + delta = b_avg - a_avg + + results["comparison"][strategy] = { + "config_a_avg_stability": round(a_avg, 4), + "config_b_avg_stability": round(b_avg, 4), + "delta": round(delta, 4), + "change_percent": round((delta / a_avg * 100) if a_avg else 0, 2), + } + + return results + + +def test_tuning_change( + overlay_path: Path, + base_config: Path = DEFAULT_BASE_CONFIG, + strategy: str = "balanced", + tick_budget: int = 30, + seed: int = 42, + verbose: bool = False, +) -> dict[str, Any]: + """Test a tuning change by running a quick validation sweep. + + Parameters + ---------- + overlay_path + Path to overlay file. + base_config + Path to base configuration. + strategy + Strategy to test. + tick_budget + Ticks to run. + seed + Random seed. + verbose + Print progress. + + Returns + ------- + dict[str, Any] + Validation results with baseline comparison. + """ + from scripts.run_batch_sweeps import ( + BatchSweepConfig, + run_batch_sweeps, + ) + + # Validate overlay + overlay = load_config(overlay_path) + warnings = validate_overlay(overlay) + + results: dict[str, Any] = { + "overlay": str(overlay_path), + "overlay_content": overlay, + "validation_warnings": warnings, + "baseline": {}, + "with_overlay": {}, + "comparison": {}, + } + + config = BatchSweepConfig( + strategies=[strategy], + difficulties=["normal"], + seeds=[seed], + worlds=["default"], + tick_budgets=[tick_budget], + max_workers=1, + include_telemetry=True, + ) + + # Run baseline + if verbose: + sys.stderr.write("Running baseline sweep...\n") + + old_env = os.environ.get("ECHOES_CONFIG_ROOT") + os.environ["ECHOES_CONFIG_ROOT"] = str(base_config.parent) + + try: + baseline_report = run_batch_sweeps(config, verbose=verbose) + results["baseline"] = baseline_report.to_dict() + finally: + if old_env: + os.environ["ECHOES_CONFIG_ROOT"] = old_env + else: + os.environ.pop("ECHOES_CONFIG_ROOT", None) + + # Run with overlay + if verbose: + sys.stderr.write("Running sweep with 
overlay...\n") + + merged = load_config_with_overlay(base_config, overlay_path) + temp_dir = Path("/tmp/balance_studio_test") + temp_dir.mkdir(parents=True, exist_ok=True) + save_config(merged, temp_dir / "simulation.yml") + + os.environ["ECHOES_CONFIG_ROOT"] = str(temp_dir) + + try: + overlay_report = run_batch_sweeps(config, verbose=verbose) + results["with_overlay"] = overlay_report.to_dict() + finally: + if old_env: + os.environ["ECHOES_CONFIG_ROOT"] = old_env + else: + os.environ.pop("ECHOES_CONFIG_ROOT", None) + + # Compute comparison + baseline_stab = ( + results["baseline"] + .get("strategy_stats", {}) + .get(strategy, {}) + .get("avg_stability", 0.0) + ) + overlay_stab = ( + results["with_overlay"] + .get("strategy_stats", {}) + .get(strategy, {}) + .get("avg_stability", 0.0) + ) + delta = overlay_stab - baseline_stab + + if delta > 0.01: + impact = "positive" + elif delta < -0.01: + impact = "negative" + else: + impact = "neutral" + + results["comparison"] = { + "baseline_stability": round(baseline_stab, 4), + "overlay_stability": round(overlay_stab, 4), + "delta": round(delta, 4), + "impact": impact, + } + + return results + + +# ============================================================================ +# Historical Reports +# ============================================================================ + + +def get_historical_reports( + db_path: Path = DEFAULT_DB_PATH, + days: int | None = None, + limit: int = 20, +) -> list[dict[str, Any]]: + """Get list of historical sweep runs. + + Parameters + ---------- + db_path + Path to SQLite database. + days + Filter to last N days. + limit + Maximum reports to return. + + Returns + ------- + list[dict[str, Any]] + List of run summaries. 
+    """
+    import sqlite3
+
+    if not db_path.exists():
+        return []
+
+    conn = sqlite3.connect(str(db_path))
+    conn.row_factory = sqlite3.Row
+
+    query = """
+        SELECT
+            run_id,
+            timestamp,
+            git_commit,
+            total_sweeps,
+            completed_sweeps,
+            failed_sweeps,
+            strategies,
+            difficulties,
+            total_duration_seconds
+        FROM sweep_runs
+        WHERE 1=1
+    """
+    params: list[Any] = []
+
+    if days is not None:
+        cutoff = datetime.now(timezone.utc) - timedelta(days=days)
+        query += " AND timestamp >= ?"
+        params.append(cutoff.isoformat())
+
+    query += " ORDER BY timestamp DESC LIMIT ?"
+    params.append(limit)
+
+    try:
+        cursor = conn.execute(query, params)
+        rows = cursor.fetchall()
+
+        reports = []
+        for row in rows:
+            reports.append({
+                "run_id": row["run_id"],
+                "timestamp": row["timestamp"],
+                "git_commit": row["git_commit"],
+                "total_sweeps": row["total_sweeps"],
+                "completed_sweeps": row["completed_sweeps"],
+                "failed_sweeps": row["failed_sweeps"],
+                "strategies": json.loads(row["strategies"] or "[]"),
+                "difficulties": json.loads(row["difficulties"] or "[]"),
+                "duration_seconds": row["total_duration_seconds"],
+            })
+
+        return reports
+    finally:
+        conn.close()
+
+
+def view_report_details(
+    run_id: int,
+    db_path: Path = DEFAULT_DB_PATH,
+) -> dict[str, Any]:
+    """Get detailed results for a specific run.
+
+    Parameters
+    ----------
+    run_id
+        Run ID to query.
+    db_path
+        Path to SQLite database.
+
+    Returns
+    -------
+    dict[str, Any]
+        Detailed run results.
+    """
+    import sqlite3
+
+    if not db_path.exists():
+        return {"error": "Database not found"}
+
+    conn = sqlite3.connect(str(db_path))
+    conn.row_factory = sqlite3.Row
+
+    try:
+        # Get run metadata
+        cursor = conn.execute(
+            "SELECT * FROM sweep_runs WHERE run_id = ?",
+            (run_id,)
+        )
+        run_row = cursor.fetchone()
+
+        if not run_row:
+            return {"error": f"Run {run_id} not found"}
+
+        # Get sweep results
+        cursor = conn.execute(
+            """
+            SELECT strategy, difficulty,
+                   AVG(final_stability) as avg_stability,
+                   COUNT(*) as count,
+                   SUM(CASE WHEN error IS NULL THEN 1 ELSE 0 END) as completed
+            FROM sweep_results
+            WHERE run_id = ?
+            GROUP BY strategy, difficulty
+            """,
+            (run_id,)
+        )
+        result_rows = cursor.fetchall()
+
+        results_by_strategy: dict[str, list[dict[str, Any]]] = {}
+        for row in result_rows:
+            strategy = row["strategy"]
+            if strategy not in results_by_strategy:
+                results_by_strategy[strategy] = []
+            results_by_strategy[strategy].append({
+                "difficulty": row["difficulty"],
+                "avg_stability": round(row["avg_stability"], 4),
+                "count": row["count"],
+                "completed": row["completed"],
+            })
+
+        return {
+            "run_id": run_row["run_id"],
+            "timestamp": run_row["timestamp"],
+            "git_commit": run_row["git_commit"],
+            "total_sweeps": run_row["total_sweeps"],
+            "completed_sweeps": run_row["completed_sweeps"],
+            "failed_sweeps": run_row["failed_sweeps"],
+            "duration_seconds": run_row["total_duration_seconds"],
+            "results_by_strategy": results_by_strategy,
+        }
+    finally:
+        conn.close()
+
+
+# ============================================================================
+# Enhanced HTML Report
+# ============================================================================
+
+
+def generate_enhanced_html_report(
+    db_path: Path = DEFAULT_DB_PATH,
+    days: int | None = None,
+    filter_strategy: str | None = None,
+    filter_difficulty: str | None = None,
+    output_path: Path | None = None,
+) -> str:
+    """Generate an enhanced HTML balance report with filtering/sorting.
+
+    Parameters
+    ----------
+    db_path
+        Path to SQLite database.
+    days
+        Filter to last N days.
+    filter_strategy
+        Filter to specific strategy.
+    filter_difficulty
+        Filter to specific difficulty.
+    output_path
+        Optional path to save HTML file.
+
+    Returns
+    -------
+    str
+        HTML report content.
+    """
+    import sqlite3
+
+    if not db_path.exists():
+        return "<p>Database not found</p>"
+
+    conn = sqlite3.connect(str(db_path))
+    conn.row_factory = sqlite3.Row
+
+    # Build optional WHERE filters shared by both aggregate queries.
+    where = ["1=1"]
+    params: list[Any] = []
+    if days is not None:
+        cutoff = datetime.now(timezone.utc) - timedelta(days=days)
+        where.append("timestamp >= ?")
+        params.append(cutoff.isoformat())
+    if filter_strategy:
+        where.append("strategy = ?")
+        params.append(filter_strategy)
+    if filter_difficulty:
+        where.append("difficulty = ?")
+        params.append(filter_difficulty)
+
+    # Assumed sweep_results columns: final_stability, won, actions_taken.
+    def aggregate(group_col: str) -> dict[str, dict[str, Any]]:
+        cursor = conn.execute(
+            f"""
+            SELECT {group_col} as grp,
+                   COUNT(*) as count,
+                   AVG(final_stability) as avg_stability,
+                   MIN(final_stability) as min_stability,
+                   MAX(final_stability) as max_stability,
+                   AVG(won) * 100 as win_rate,
+                   AVG(actions_taken) as avg_actions
+            FROM sweep_results
+            WHERE {" AND ".join(where)}
+            GROUP BY {group_col}
+            """,
+            params,
+        )
+        return {row["grp"]: dict(row) for row in cursor.fetchall()}
+
+    try:
+        strategy_stats = aggregate("strategy")
+        difficulty_stats = aggregate("difficulty")
+    finally:
+        conn.close()
+
+    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
+    html = [
+        "<!DOCTYPE html><html><head><title>Balance Report</title></head><body>",
+        f"<p>Generated: {timestamp}</p>",
+    ]
+
+    # Summary boxes
+    html.append("<div class='filters'>Active Filters: ")
+    if filter_strategy:
+        af = "<span class='filter'>"
+        html.append(f"{af}Strategy: {filter_strategy}</span>")
+    if filter_difficulty:
+        af = "<span class='filter'>"
+        html.append(f"{af}Difficulty: {filter_difficulty}</span>")
+    if days:
+        html.append(f"<span class='filter'>Last {days} days</span>")
+    html.append("</div>")
+
+    # Strategy table (↕ marks sortable columns)
+    html.append("<table><thead><tr>")
+    html.append("<th>Strategy ↕</th>")
+    html.append("<th>Count ↕</th>")
+    html.append("<th>Avg Stability ↕</th>")
+    html.append("<th>Min ↕</th>")
+    html.append("<th>Max ↕</th>")
+    html.append("<th>Win Rate ↕</th>")
+    html.append("<th>Avg Actions ↕</th>")
+    html.append("</tr></thead><tbody>")
+
+    for strategy, stats in sorted(strategy_stats.items()):
+        win_rate = stats["win_rate"] or 0.0
+        html.append(f"<tr><td>{strategy}</td>")
+        html.append(f"<td>{stats['count']}</td>")
+        html.append(f"<td>{stats['avg_stability']:.3f}</td>")
+        html.append(f"<td>{stats['min_stability']:.3f}</td>")
+        html.append(f"<td>{stats['max_stability']:.3f}</td>")
+        html.append(f"<td>{win_rate:.1f}%</td>")
+        html.append(f"<td>{stats['avg_actions']:.1f}</td>")
+        html.append("</tr>")
+    html.append("</tbody></table>")
+
+    # Difficulty table
+    html.append("<table><thead><tr>")
+    html.append("<th>Difficulty ↕</th>")
+    html.append("<th>Count ↕</th>")
+    html.append("<th>Avg Stability ↕</th>")
+    html.append("<th>Min ↕</th>")
+    html.append("<th>Max ↕</th>")
+    html.append("<th>Win Rate ↕</th>")
+    html.append("</tr></thead><tbody>")
+
+    for difficulty, stats in sorted(difficulty_stats.items()):
+        win_rate = stats["win_rate"] or 0.0
+        html.append(f"<tr><td>{difficulty}</td>")
+        html.append(f"<td>{stats['count']}</td>")
+        html.append(f"<td>{stats['avg_stability']:.3f}</td>")
+        html.append(f"<td>{stats['min_stability']:.3f}</td>")
+        html.append(f"<td>{stats['max_stability']:.3f}</td>")
+        html.append(f"<td>{win_rate:.1f}%</td>")
+        html.append("