Merge pull request #220 from igerber/power-analysis-notebook

igerber · web-flow · commit 215fff51dd3e · 2026-03-21T09:37:43.000-04:00
Update power analysis tutorial with simulation-based features
diff --git a/docs/tutorials/06_power_analysis.ipynb b/docs/tutorials/06_power_analysis.ipynb
@@ -4,40 +4,15 @@
    "cell_type": "markdown",
    "id": "cell-0",
    "metadata": {},
-   "source": [
-    "# Power Analysis for Difference-in-Differences\n",
-    "\n",
-    "This notebook demonstrates how to use the power analysis tools in `diff-diff` for study design. We'll cover:\n",
-    "\n",
-    "1. Computing minimum detectable effects (MDE)\n",
-    "2. Calculating required sample sizes\n",
-    "3. Estimating statistical power\n",
-    "4. Creating power curves for visualization\n",
-    "5. Simulation-based power analysis for complex designs\n",
-    "6. Panel data considerations (ICC, multiple periods)"
-   ]
+   "source": "# Power Analysis for Difference-in-Differences\n\nThis notebook demonstrates how to use the power analysis tools in `diff-diff` for study design. We'll cover:\n\n1. Computing minimum detectable effects (MDE)\n2. Calculating required sample sizes\n3. Estimating statistical power\n4. Creating power curves for visualization\n5. Panel data considerations (ICC, multiple periods)\n6. Simulation-based power analysis for complex designs\n7. Power analysis for any estimator (staggered, synthetic DiD, triple difference)\n8. Finding MDE via simulation (bisection search)\n9. Finding required sample size via simulation (bisection search)\n10. Custom data generators\n11. Convenience functions\n12. Practical recommendations"
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "cell-1",
    "metadata": {},
    "outputs": [],
-   "source": [
-    "import numpy as np\n",
-    "import pandas as pd\n",
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "from diff_diff import (\n",
-    "    PowerAnalysis,\n",
-    "    DifferenceInDifferences,\n",
-    "    simulate_power,\n",
-    "    compute_mde,\n",
-    "    compute_power,\n",
-    "    compute_sample_size,\n",
-    "    plot_power_curve,\n",
-    ")"
-   ]
+   "source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nfrom diff_diff import (\n    PowerAnalysis,\n    DifferenceInDifferences,\n    CallawaySantAnna,\n    SyntheticDiD,\n    TripleDifference,\n    simulate_power,\n    simulate_mde,\n    simulate_sample_size,\n    compute_mde,\n    compute_power,\n    compute_sample_size,\n    plot_power_curve,\n)"
   },
   {
    "cell_type": "markdown",
@@ -475,15 +450,197 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "cjpvh2ze7lh",
+   "source": "## 8. Power Analysis for Any Estimator\n\nThe simulation-based approach works with **all 12 supported estimators**, not just basic DiD. An internal registry automatically selects the appropriate data-generating process (DGP) and fit signature for each registered estimator, so switching estimators usually only requires passing a different estimator object. See the support table below for the full list, and Section 11 for using custom DGPs with unsupported estimators.\n\n### Staggered Adoption Estimators",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "la7ps3nxufq",
+   "source": "# Power analysis with Callaway-Sant'Anna — the registry auto-selects\n# generate_staggered_data as the DGP and the correct fit kwargs\ncs = CallawaySantAnna()\n\ncs_results = simulate_power(\n    estimator=cs,\n    n_units=100,\n    n_periods=6,\n    treatment_effect=5.0,\n    treatment_fraction=0.5,\n    treatment_period=3,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(cs_results.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "q61cjchqjrd",
+   "source": "### Factor Model Estimators (Synthetic DiD)\n\nFor `SyntheticDiD` with the default placebo variance method, the DGP must generate **more control than treated units** (`treatment_fraction < 0.5`). The registry uses `generate_factor_data` automatically.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "x068rpe24gf",
+   "source": "# Synthetic DiD — note treatment_fraction=0.3 (placebo variance requires\n# more control units than treated units)\nsdid = SyntheticDiD()\n\nsdid_results = simulate_power(\n    estimator=sdid,\n    n_units=60,\n    n_periods=6,\n    treatment_effect=5.0,\n    treatment_fraction=0.3,\n    treatment_period=3,\n    sigma=3.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(sdid_results.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6qpu05hi18s",
+   "source": "### Triple Difference\n\n`TripleDifference` uses a fixed 2×2×2 factorial design (group × partition × time). Sample sizes are **rounded via `n_per_cell = max(2, n_units // 8)`**, so the minimum effective N is 16 (2 units per cell × 8 cells). The `effective_n_units` field in results tracks any rounding. Note that `simulate_sample_size()` uses a higher search floor of 64 from the registry.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "91uackfwqp",
+   "source": "# Triple Difference — n_units snaps to multiples of 8\nddd = TripleDifference()\n\nddd_results = simulate_power(\n    estimator=ddd,\n    n_units=64,\n    treatment_effect=3.0,\n    sigma=2.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(ddd_results.summary())\nif ddd_results.effective_n_units is not None:\n    print(f\"\\nEffective N (after grid rounding): {ddd_results.effective_n_units}\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6kb8ovmue4m",
+   "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data`) | TripleDifference | 16* |\n\n\\* DDD effective N rounds to `max(2, n_units // 8) * 8` with minimum 16. `simulate_sample_size()` uses a higher search floor of 64.\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).",
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4erqwll1af",
+   "source": "### Power Curve for a Staggered Estimator",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "ox3uab7h5bj",
+   "source": "# Power curve across effect sizes for Callaway-Sant'Anna\ncs_curve = simulate_power(\n    estimator=CallawaySantAnna(),\n    n_units=100,\n    n_periods=6,\n    effect_sizes=[1.0, 2.0, 3.0, 5.0, 7.0],\n    treatment_period=3,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nplot_power_curve(\n    cs_curve.power_curve_df(),\n    target_power=0.80,\n    title=\"CS Power Curve (100 units, 6 periods, SD=5)\",\n    figsize=(10, 6),\n)",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "kqw6y4du5u",
+   "source": "## 9. Finding MDE via Simulation\n\nThe analytical `PowerAnalysis.mde()` works for basic DiD, but complex estimators have no closed-form formula. `simulate_mde()` uses **bisection search** to find the minimum detectable effect: it repeatedly calls `simulate_power()` at different effect sizes, narrowing the bracket until the estimated power is within `tol` of the target or `max_steps` is reached. The returned MDE approximates the smallest effect that achieves the target power.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "p9a03aycu2",
+   "source": "# Find MDE for basic DiD via simulation\nmde_result = simulate_mde(\n    DifferenceInDifferences(),\n    n_units=100,\n    n_periods=4,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(mde_result.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1mgrw7qmish",
+   "source": "### Inspecting the Search Path\n\nThe `search_path` attribute records the effect size and power at each bisection step, which is useful for diagnosing convergence:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "3p2tmxivi2g",
+   "source": "# search_path is a List[Dict] — convert to DataFrame for display\nsearch_df = pd.DataFrame(mde_result.search_path)\nprint(search_df.to_string(index=False))",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "o4se5ofngjh",
+   "source": "### MDE for a Staggered Estimator\n\nThe same function works with any registered estimator:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "gwou7kv0ht9",
+   "source": "# MDE for Callaway-Sant'Anna\ncs_mde = simulate_mde(\n    CallawaySantAnna(),\n    n_units=100,\n    n_periods=6,\n    treatment_period=3,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(cs_mde.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ljyojxybdhc",
+   "source": "**Key parameters for `simulate_mde()`:**\n- `effect_range=(lo, hi)` — custom search bracket (auto-detected if omitted)\n- `tol` — convergence tolerance on power (default 0.02)\n- `max_steps` — maximum bisection steps (default 15)\n- `n_simulations` — simulations per step (use 500+ for production analyses)",
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "id": "qrfhvtizg58",
+   "source": "## 10. Finding Required Sample Size via Simulation\n\n`simulate_sample_size()` uses bisection search over `n_units` to find the smallest sample size that achieves the target power for a given effect size.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "8o018e2dnl6",
+   "source": "# Find required sample size for basic DiD\nn_result = simulate_sample_size(\n    DifferenceInDifferences(),\n    treatment_effect=5.0,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n)\n\nprint(n_result.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "gwtzwm3oedl",
+   "source": "### Inspecting the Search Path",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "ubg3hqe9ypa",
+   "source": "# View the bisection steps\nn_search_df = pd.DataFrame(n_result.search_path)\nprint(n_search_df.to_string(index=False))",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dblnwe076kf",
+   "source": "### Comparing Analytical and Simulation Results\n\nFor basic DiD, we can compare the simulation result against the analytical formula. With only 100 simulations per bisection step there will be Monte Carlo noise, so we expect **approximate** — not exact — agreement:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "bgpin5xmsud",
+   "source": "# Analytical sample size for the same effect and noise level.\n# Assumes `pa` is the PowerAnalysis instance created earlier in this notebook.\nanalytical = pa.sample_size(effect_size=5.0, sigma=5.0)\n\nprint(f\"{'Method':<25} {'Required N':>12}\")\nprint(\"-\" * 38)  # 25 + 1 + 12 characters, matching the header width\nprint(f\"{'Analytical:':<25} {analytical.required_n:>12}\")\nprint(f\"{'Simulation:':<25} {n_result.required_n:>12}\")\nprint(f\"\\nSimulation power at N: {n_result.power_at_n:.1%}\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "rnte1h09hra",
+   "source": "**Key parameters for `simulate_sample_size()`:**\n- `n_range=(lo, hi)` — custom search bracket for sample size (auto-detected if omitted)\n- `max_steps` — maximum bisection steps (default 15)\n- `n_simulations` — simulations per step (use 500+ for production analyses)",
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "id": "usp5iwyacop",
+   "source": "## 11. Custom Data Generators\n\nThe default DGPs cover common designs, but you can customize them in two ways:\n\n1. **Tweak the default DGP** with `data_generator_kwargs` (e.g., add multiple treatment cohorts)\n2. **Supply a fully custom DGP** with `data_generator`\n\n### Tweaking the Default DGP\n\nPass additional keyword arguments to the registry's DGP via `data_generator_kwargs`.\n\n> **Note:** Some keys are *protected* and cannot be overridden via `data_generator_kwargs` because they are controlled by the simulation function itself: `treatment_effect`, `noise_sd`, `n_units`, `n_periods`, `treatment_fraction`, `treatment_period`, `n_pre`, `n_post`.\n\nFor example, the default staggered DGP generates a single treatment cohort. Here we create a multi-cohort design:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "igft0epkiic",
+   "source": "# Multi-cohort staggered design: treatment starts at periods 2 and 4\ncs_multi = simulate_power(\n    estimator=CallawaySantAnna(),\n    n_units=120,\n    n_periods=6,\n    treatment_effect=5.0,\n    sigma=5.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n    data_generator_kwargs={\n        \"cohort_periods\": [2, 4],\n        \"never_treated_frac\": 0.3,\n    },\n)\n\nprint(cs_multi.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hznafqrhzoq",
+   "source": "### Fully Custom Data Generator\n\nFor designs not covered by the built-in DGPs, supply your own `data_generator` function. It receives the standard simulation parameters (in the example below: `n_units`, `n_periods`, `treatment_effect`, `treatment_fraction`, `treatment_period`, `noise_sd`, and `seed`) and must return a DataFrame. You may also need a custom `result_extractor` if your estimator returns non-standard results, and `estimator_kwargs` to pass the right column names to `fit()`.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "v06p7ubbj9p",
+   "source": "def my_dgp(n_units, n_periods, treatment_effect, treatment_fraction,\n           treatment_period, noise_sd, seed=None):\n    \"\"\"Custom DGP with heterogeneous unit effects.\"\"\"\n    rng = np.random.default_rng(seed)\n    n_treat = int(n_units * treatment_fraction)\n\n    rows = []\n    for i in range(n_units):\n        unit_fe = rng.normal(0, 3)  # heterogeneous unit effect\n        treated_unit = i < n_treat\n        for t in range(n_periods):\n            post = int(t >= treatment_period)\n            effect = treatment_effect * post if treated_unit else 0.0\n            y = unit_fe + 2.0 * t + effect + rng.normal(0, noise_sd)\n            rows.append({\n                \"unit\": i, \"period\": t, \"outcome\": y,\n                \"ever_treated\": int(treated_unit), \"post\": post,\n            })\n    return pd.DataFrame(rows)\n\n# Use the custom DGP with simulate_power\ncustom_results = simulate_power(\n    estimator=DifferenceInDifferences(),\n    n_units=80,\n    n_periods=4,\n    treatment_effect=4.0,\n    sigma=3.0,\n    n_simulations=100,\n    seed=42,\n    progress=False,\n    data_generator=my_dgp,\n    estimator_kwargs={\"outcome\": \"outcome\", \"treatment\": \"ever_treated\", \"time\": \"post\"},\n)\n\nprint(custom_results.summary())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
   {
    "cell_type": "markdown",
    "id": "cell-28",
    "metadata": {},
-   "source": [
-    "## 8. Convenience Functions\n",
-    "\n",
-    "For quick calculations, use the convenience functions:"
-   ]
+   "source": "## 12. Convenience Functions\n\nFor quick calculations, use the convenience functions:"
   },
   {
    "cell_type": "code",
@@ -509,18 +666,7 @@
    "cell_type": "markdown",
    "id": "cell-30",
    "metadata": {},
-   "source": [
-    "## 9. Practical Recommendations\n",
-    "\n",
-    "### Estimating Sigma (Residual SD)\n",
-    "\n",
-    "The residual standard deviation is crucial for power calculations. Options:\n",
-    "\n",
-    "1. **Pilot data**: Fit a model on historical data and get residual SD\n",
-    "2. **Literature**: Find similar studies and use their reported SDs\n",
-    "3. **Domain knowledge**: Expert judgment about outcome variability\n",
-    "4. **Sensitivity analysis**: Calculate power for a range of sigma values"
-   ]
+   "source": "## 13. Practical Recommendations\n\n### Estimating Sigma (Residual SD)\n\nThe residual standard deviation is crucial for power calculations. Options:\n\n1. **Pilot data**: Fit a model on historical data and get residual SD\n2. **Literature**: Find similar studies and use their reported SDs\n3. **Domain knowledge**: Expert judgment about outcome variability\n4. **Sensitivity analysis**: Calculate power for a range of sigma values"
   },
   {
    "cell_type": "code",
@@ -569,30 +715,17 @@
     "    print(f\"{power:>10.0%} {result.required_n:>15}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "eub4cew045u",
+   "source": "### Analytical vs. Simulation: When to Use Each\n\n| Approach | Best for | Advantages |\n|---|---|---|\n| **Analytical** (`PowerAnalysis`) | Basic 2×2 DiD, panel DiD | Fast, closed-form, no Monte Carlo noise |\n| **Simulation** (`simulate_power`, `simulate_mde`, `simulate_sample_size`) | Staggered, SDID, TROP, DDD, custom designs | Works with any estimator, reports bias/RMSE/coverage |\n\n**Rule of thumb:** Start with analytical power analysis for basic designs. Move to simulation when using specialized estimators or non-standard DGPs.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "cell-34",
    "metadata": {},
-   "source": [
-    "## Summary\n",
-    "\n",
-    "Key takeaways for DiD power analysis:\n",
-    "\n",
-    "1. **Always do a power analysis** before running a study\n",
-    "2. **MDE decreases** with sample size, more periods, and lower variance\n",
-    "3. **ICC matters** for panel data - high autocorrelation reduces effective sample size\n",
-    "4. **Use simulation** for complex designs (staggered, synthetic DiD)\n",
-    "5. **Be realistic about sigma** - err on the side of larger values\n",
-    "6. **Consider your smallest meaningful effect** - don't just target statistical significance\n",
-    "\n",
-    "For more on DiD estimation, see the other tutorials:\n",
-    "- `01_basic_did.ipynb` - Basic DiD estimation\n",
-    "- `02_staggered_did.ipynb` - Staggered adoption designs\n",
-    "- `03_synthetic_did.ipynb` - Synthetic DiD\n",
-    "- `04_parallel_trends.ipynb` - Testing assumptions\n",
-    "- `05_honest_did.ipynb` - Sensitivity analysis\n",
-    "- `07_pretrends_power.ipynb` - Pre-trends power analysis (Roth 2022)"
-   ]
+   "source": "## Summary\n\nKey takeaways for DiD power analysis:\n\n1. **Always do a power analysis** before running a study\n2. **MDE decreases** with sample size, more periods, and lower variance\n3. **ICC matters** for panel data — high autocorrelation reduces effective sample size\n4. **Use simulation** for complex designs (staggered, synthetic DiD, triple difference)\n5. **12 estimators are supported** out of the box via the auto-registry — just swap in the estimator\n6. **`simulate_mde()` and `simulate_sample_size()`** extend MDE and sample size calculations to any estimator via bisection search\n7. **Custom DGPs** let you model non-standard designs with `data_generator` and `data_generator_kwargs`\n8. **Be realistic about sigma** — err on the side of larger values\n9. **Consider your smallest meaningful effect** — don't just target statistical significance\n\nFor more on DiD estimation, see the other tutorials:\n- `01_basic_did.ipynb` — Basic DiD estimation\n- `02_staggered_did.ipynb` — Staggered adoption designs\n- `03_synthetic_did.ipynb` — Synthetic DiD\n- `04_parallel_trends.ipynb` — Testing assumptions\n- `05_honest_did.ipynb` — Sensitivity analysis\n- `07_pretrends_power.ipynb` — Pre-trends power analysis (Roth 2022)"
   }
  ],
  "metadata": {
@@ -602,4 +735,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}