
Commit 56b07ea

igerber and claude committed
fix: address AI review - document deff in REGISTRY.md, cache SurveyDesign

- Add REGISTRY.md notes for analytical deff parameter (variance/sample-size inflation formulas, deff vs rho distinction) and survey_config simulation path (supported estimators, mutual exclusivity, protected keys)
- Cache SurveyDesign in SurveyPowerConfig._build_survey_design() to avoid rebuilding per simulation iteration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6ca2c26 commit 56b07ea

2 files changed

Lines changed: 12 additions & 5 deletions


diff_diff/power.py

Lines changed: 10 additions & 5 deletions
@@ -148,12 +148,17 @@ def __post_init__(self) -> None:
             )
 
     def _build_survey_design(self) -> Any:
-        """Return user-supplied SurveyDesign or auto-build from DGP column names."""
-        if self.survey_design is not None:
-            return self.survey_design
-        from diff_diff.survey import SurveyDesign
+        """Return cached SurveyDesign (built once, reused across simulations)."""
+        if not hasattr(self, "_cached_survey_design"):
+            if self.survey_design is not None:
+                self._cached_survey_design = self.survey_design
+            else:
+                from diff_diff.survey import SurveyDesign
 
-        return SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc")
+                self._cached_survey_design = SurveyDesign(
+                    weights="weight", strata="stratum", psu="psu", fpc="fpc"
+                )
+        return self._cached_survey_design
 
     @property
     def min_viable_n(self) -> int:
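The new `_build_survey_design()` uses a lazy, `hasattr`-guarded attribute so the design object is built on the first call and reused afterwards. A minimal, self-contained sketch of the same pattern follows, assuming a hypothetical `CachedDesignConfig` dataclass and a plain dict standing in for `SurveyDesign` (neither is the library's code). It shows the expensive constructor running once across 100 simulated iterations:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CachedDesignConfig:
    """Hypothetical stand-in for a config object that builds its design lazily, once."""

    survey_design: Optional[Any] = None
    build_count: int = field(default=0, compare=False)

    def _make_default_design(self) -> dict:
        # Placeholder for an expensive constructor such as SurveyDesign(...)
        self.build_count += 1
        return {"weights": "weight", "strata": "stratum", "psu": "psu", "fpc": "fpc"}

    def build_design(self) -> Any:
        # The cache attribute does not exist until the first call
        if not hasattr(self, "_cached_design"):
            if self.survey_design is not None:
                # A user-supplied design is cached as-is
                self._cached_design = self.survey_design
            else:
                self._cached_design = self._make_default_design()
        return self._cached_design


cfg = CachedDesignConfig()
designs = [cfg.build_design() for _ in range(100)]  # e.g. 100 simulation iterations
print(cfg.build_count)                          # 1
print(all(d is designs[0] for d in designs))    # True
```

`functools.cached_property` would give the same once-per-instance behavior with less code, as long as the class has an instance `__dict__` (no `__slots__`). With either approach the cache is not refreshed if `survey_design` is reassigned after the first call, and plain attribute assignment as above would fail on a `frozen=True` dataclass, whereas `cached_property` writes to the instance `__dict__` directly.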

docs/methodology/REGISTRY.md

Lines changed: 2 additions & 0 deletions
@@ -2254,6 +2254,8 @@ n = 2(t_{α/2} + t_{1-κ})² σ² / MDE²
 - **Note:** `simulate_sample_size()` rejects `n_per_cell` in `data_generator_kwargs` for `TripleDifference` because `n_per_cell` is derived from `n_units` (the search variable). A fixed override would freeze the effective sample size across bisection iterations, making the search degenerate. Use `simulate_power()` with a fixed `n_per_cell` override instead, or pass a custom `data_generator`.
 - **Note:** The simulation-based power registry (`simulate_power`, `simulate_mde`, `simulate_sample_size`) uses a single-cohort staggered DGP by default. Estimators configured with `control_group="not_yet_treated"`, `clean_control="strict"`, or `anticipation>0` will receive a `UserWarning` because the default DGP does not match their identification strategy. Users must supply `data_generator_kwargs` (e.g., `cohort_periods=[2, 4]`, `never_treated_frac=0.0`) or a custom `data_generator` to match the estimator design.
 - **Note:** The `TripleDifference` registry adapter uses `generate_ddd_data`, a fixed 2×2×2 factorial DGP (group × partition × time). The `n_periods`, `treatment_period`, and `treatment_fraction` parameters are ignored — DDD always simulates 2 periods with balanced groups. `n_units` is mapped to `n_per_cell = max(2, n_units // 8)` (effective total N = `n_per_cell × 8`), so non-multiples of 8 are rounded down and values below 16 are clamped to 16. A `UserWarning` is emitted when simulation inputs differ from the effective DDD design. When rounding occurs, all result objects (`SimulationPowerResults`, `SimulationMDEResults`, `SimulationSampleSizeResults`) set `effective_n_units` to the actual sample size used; it is `None` when no rounding occurred. `simulate_sample_size()` snaps bisection candidates to multiples of 8 so that `required_n` is always a realizable DDD sample size. Passing `n_per_cell` in `data_generator_kwargs` suppresses the effective-N rounding warning but not warnings for ignored parameters (`n_periods`, `treatment_period`, `treatment_fraction`).
+- **Note:** The analytical power methods (`PowerAnalysis.power/mde/sample_size` and the `compute_power/compute_mde/compute_sample_size` convenience functions) accept a `deff` parameter (survey design effect, default 1.0). This inflates variance multiplicatively: `Var(ATT) *= deff`, and inflates required sample size: `n_total *= deff`. The `deff` parameter is **not redundant** with `rho` (intra-cluster correlation): `rho` models within-unit serial correlation in panel data via the Moulton factor `1 + (T-1)*rho`, while `deff` models the survey design effect from stratified multi-stage sampling (clustering + unequal weighting). A survey panel study may need both. Values `deff > 0` are accepted; `deff < 1.0` (net variance reduction, e.g., from stratification gain) emits a warning.
+- **Note:** The simulation-based power functions (`simulate_power/simulate_mde/simulate_sample_size`) accept a `survey_config` parameter (`SurveyPowerConfig` dataclass). When set, the simulation loop uses `generate_survey_did_data` instead of the default registry DGP, and automatically injects `SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc")` into the estimator's `fit()` call. Supported estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD. Unsupported (raises `ValueError`): TROP, SyntheticDiD, TripleDifference (generate_survey_did_data produces staggered cohort data incompatible with factor-model/DDD DGPs). `survey_config` and `data_generator` are mutually exclusive. `data_generator_kwargs` may not contain keys managed by `SurveyPowerConfig` (n_strata, psu_per_stratum, etc.) but may contain passthrough DGP params (unit_fe_sd, add_covariates, panel, strata_sizes). `simulate_sample_size` raises the bisection floor to `n_strata * psu_per_stratum * 2` to ensure viable survey structure.
 
 **Reference implementation(s):**
 - R: `pwr` package (general), `DeclareDesign` (simulation-based)
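The `deff` rules in the analytical-power note are simple enough to check by hand. The sketch below uses hypothetical helpers `apply_deff` and `moulton_factor` (not part of `diff_diff`): it applies the documented `Var(ATT) *= deff` and `n_total *= deff` rules, warns for `deff < 1.0`, and, as an assumption, raises `ValueError` for non-positive values. Rounding the inflated N to an integer is left to the caller.

```python
import warnings
from typing import Tuple


def apply_deff(variance: float, n_total: int, deff: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: inflate Var(ATT) and the required N by a survey design effect."""
    if deff <= 0:
        # Assumption: non-positive design effects are rejected with ValueError
        raise ValueError("deff must be > 0")
    if deff < 1.0:
        # deff < 1 means the design reduces variance (e.g. a stratification gain)
        warnings.warn("deff < 1.0 implies a net variance reduction", UserWarning)
    inflated_variance = variance * deff   # Var(ATT) *= deff
    inflated_n = n_total * deff           # n_total *= deff (not rounded here)
    return inflated_variance, inflated_n


def moulton_factor(n_periods: int, rho: float) -> float:
    """Serial-correlation inflation 1 + (T - 1) * rho, the quantity modeled by rho."""
    return 1 + (n_periods - 1) * rho


print(apply_deff(0.5, 200, deff=1.5))  # (0.75, 300.0)
print(moulton_factor(5, 0.25))         # 2.0
```

Keeping `moulton_factor` separate from `apply_deff` mirrors the note's point that `rho` (correlation across the T periods of a panel unit) and `deff` (stratified multi-stage sampling) describe different sources of variance inflation.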

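The `TripleDifference` sample-size mapping can be traced with a small sketch. `ddd_effective_n` follows the documented formula `n_per_cell = max(2, n_units // 8)` and returns `None` for the effective N when no rounding occurs, matching the `effective_n_units` convention. `snap_to_ddd_grid` is hypothetical: the note says bisection candidates are snapped to multiples of 8 but not in which direction, so rounding up (with the 16-unit minimum) is an assumption.

```python
from typing import Optional, Tuple


def ddd_effective_n(n_units: int) -> Tuple[int, Optional[int]]:
    """Map a requested n_units onto the 2x2x2 DDD design (8 cells)."""
    n_per_cell = max(2, n_units // 8)  # rounds down, clamps to at least 2 per cell
    effective_n = n_per_cell * 8
    # Report the effective N only when rounding or clamping changed it
    return n_per_cell, (effective_n if effective_n != n_units else None)


def snap_to_ddd_grid(candidate: int) -> int:
    """Hypothetical: snap a bisection candidate UP to a multiple of 8, minimum 16."""
    return max(16, -(-candidate // 8) * 8)  # ceiling division, then scale back


for n in (10, 64, 100):
    print(n, ddd_effective_n(n))
# 10 (2, 16)
# 64 (8, None)
# 100 (12, 96)
print(snap_to_ddd_grid(100))  # 104
```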
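The `survey_config` constraints in the simulation-power note can be summarized as a pre-flight check. `check_survey_args` is a hypothetical function, not the library's validator. The supported-estimator list and the bisection floor come from the note; the protected-key set contains only the two keys the note names (the rest are elided as "etc."); raising `ValueError` for the `data_generator` conflict is an assumption; and estimators absent from the supported list are treated as unsupported.

```python
from typing import Any, Callable, Mapping, Optional

# Estimators listed as supported with survey_config in the registry note
SUPPORTED_ESTIMATORS = frozenset({
    "DifferenceInDifferences", "TwoWayFixedEffects", "MultiPeriodDiD",
    "CallawaySantAnna", "SunAbraham", "ImputationDiD", "TwoStageDiD",
    "StackedDiD", "EfficientDiD",
})
# Partial set: only the keys the note names explicitly
PROTECTED_KEYS = frozenset({"n_strata", "psu_per_stratum"})


def check_survey_args(
    estimator_name: str,
    n_strata: int,
    psu_per_stratum: int,
    data_generator: Optional[Callable[..., Any]] = None,
    data_generator_kwargs: Optional[Mapping[str, Any]] = None,
) -> int:
    """Hypothetical pre-flight check; returns the simulate_sample_size bisection floor."""
    if estimator_name not in SUPPORTED_ESTIMATORS:
        # e.g. TROP, SyntheticDiD, TripleDifference
        raise ValueError(f"{estimator_name} is not supported with survey_config")
    if data_generator is not None:
        # Assumption: the mutual-exclusivity violation surfaces as ValueError
        raise ValueError("survey_config and data_generator are mutually exclusive")
    clashes = PROTECTED_KEYS.intersection(data_generator_kwargs or {})
    if clashes:
        raise ValueError(f"keys managed by survey_config: {sorted(clashes)}")
    # Floor stated in the note: n_strata * psu_per_stratum * 2
    return n_strata * psu_per_stratum * 2


floor = check_survey_args(
    "CallawaySantAnna", n_strata=5, psu_per_stratum=4,
    data_generator_kwargs={"unit_fe_sd": 1.0},  # passthrough DGP param is allowed
)
print(floor)  # 40
```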