Close axis-C/J silent-failures audit: B-spline derivative + PA survey cache
Bundles the two remaining S-complexity findings from the Phase 2 audit,
closing Phase 3 execution.
Finding #12 — ContinuousDiD B-spline degenerate knot (axis C, Minor,
`continuous_did_bspline.py:153`): `bspline_derivative_design_matrix`
silently swallowed `ValueError` from `scipy.interpolate.BSpline` in the
per-basis derivative loop, leaving affected columns of the derivative
design matrix as zero with no user-visible signal. Downstream
ContinuousDiD analytical inference then fed a biased `dPsi` into SE
computation. Fix aggregates failed-basis indices and emits ONE
`UserWarning` naming them. The all-identical-knot degenerate case
(single dose value, `knots[0] == knots[-1]`) remains silently handled —
derivatives there are mathematically zero, well-defined, and always
have been.
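
As a rough illustration of the aggregate-warning pattern described for Finding #12 (not the code in `continuous_did_bspline.py`), the sketch below collects failing basis indices and warns once. The function name `derivative_design_matrix`, the callable-per-basis interface, and the warning text are hypothetical; the real function builds its derivatives with `scipy.interpolate.BSpline`.

```python
import warnings
from typing import Callable, List, Sequence


def derivative_design_matrix(
    x: Sequence[float],
    basis_derivatives: Sequence[Callable[[float], float]],
) -> List[List[float]]:
    """Hypothetical stand-in for a per-basis derivative loop.

    Each callable evaluates one basis derivative and may raise ValueError,
    as a spline constructor can on degenerate knots.
    """
    n_basis = len(basis_derivatives)
    # Columns start at zero, so a failed basis leaves its column as zero.
    matrix = [[0.0] * n_basis for _ in x]
    failed: List[int] = []

    for j, deriv in enumerate(basis_derivatives):
        try:
            column = [deriv(xi) for xi in x]
        except ValueError:
            failed.append(j)  # record the index instead of passing silently
            continue
        for i, value in enumerate(column):
            matrix[i][j] = value

    if failed:
        # One aggregated warning, not one per failed basis function.
        warnings.warn(
            f"Derivative failed for basis indices {failed}; "
            "those columns are left as zero.",
            UserWarning,
            stacklevel=2,
        )
    return matrix


def _bad(_x: float) -> float:
    raise ValueError("degenerate knot")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = derivative_design_matrix([0.0, 1.0], [lambda v: 2.0 * v, _bad, _bad])
```

Two bases fail here, yet `caught` holds a single warning naming both indices, which is the user-visible signal the fix is meant to provide.
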
Finding #28 — PowerAnalysis survey-design cache staleness (axis J,
Major, `power.py:171-180`): `_build_survey_design()` populated
`self._cached_survey_design` on first call and never invalidated.
Mutating `config.survey_design` after `__init__` silently returned the
stale cached design. Default construction is microseconds and
user-provided designs are reference copies, so the cache never earned
its cost. Fix drops the cache entirely; method now reflects live
`self.survey_design` every call.
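
The staleness in Finding #28 can be reproduced with a toy model. This is a minimal sketch under stated assumptions: `Design`, `CachedConfig`, `LiveConfig`, and `build()` are hypothetical stand-ins, not the classes in `power.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Design:
    """Hypothetical stand-in for a survey design object."""
    weights: str = "weight"


@dataclass
class CachedConfig:
    survey_design: Optional[Design] = None
    _cached: Optional[Design] = None

    def build(self) -> Design:
        # Populated on first call and never invalidated: the staleness bug.
        if self._cached is None:
            if self.survey_design is not None:
                self._cached = self.survey_design
            else:
                self._cached = Design()
        return self._cached


@dataclass
class LiveConfig:
    survey_design: Optional[Design] = None

    def build(self) -> Design:
        # No cache: every call reflects the current attribute.
        if self.survey_design is not None:
            return self.survey_design
        return Design()


stale = CachedConfig()
stale.build()                              # first call fills the cache
stale.survey_design = Design(weights="w2")
stale_weights = stale.build().weights      # mutation is ignored

live = LiveConfig()
live.build()
live.survey_design = Design(weights="w2")
live_weights = live.build().weights        # mutation is picked up
live.survey_design = None
cleared_weights = live.build().weights     # falls back to the default
```

The cached variant keeps returning the default design after the mutation, while the cache-free variant follows both the mutation and the later reset.
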
Six new tests:
- `tests/test_continuous_did.py::TestBSplineDerivativeDegenerateBasis` (3):
single-dose silent contract, `ValueError`-forced aggregate warning,
happy-path no-warning regression.
- `tests/test_power.py::TestSurveyPowerConfigDesignStaleness` (3):
mutate-survey_design-picks-up-new, clearing-falls-back-to-default,
repeat-calls-equivalent regression.
REGISTRY notes added under §ContinuousDiD (edge cases) and §PowerAnalysis
(`survey_config` section).
Audit state post-PR: all 28 actionable Phase-2 findings resolved (26 in
prior PRs; #12 + #28 here). Three P1 follow-ups remain logged in
`TODO.md` from PR #337's discovered divergences (FW/PGD algorithmic
mismatch in `compute_synthetic_weights`, TROP grid-search on rank-
deficient Y, TROP bootstrap RNG unification). Those are post-audit
cleanup work, not Phase-3 scope.
No behavioral changes on clean inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/methodology/REGISTRY.md: 2 additions, 0 deletions
@@ -723,6 +723,7 @@ See `docs/methodology/continuous-did.md` Section 4 for full details.
 not-yet-treated controls. When `anticipation=0` (default), behavior is
 unchanged.
 - **Boundary knots**: Knots are built once from all treated doses (global, not per-cell) to ensure a common basis across (g,t) cells for aggregation. Evaluation grid is clamped to training-dose boundary knots (`range(dose)`). R's `contdid` v0.1.0 has an inconsistency where `splines2::bSpline(dvals)` uses `range(dvals)` instead of `range(dose)`, which can produce extrapolation artifacts at dose grid extremes. Our approach avoids extrapolation and is methodologically sound.
+- **Note:** `bspline_derivative_design_matrix` previously swallowed `ValueError` from `scipy.interpolate.BSpline` in the per-basis derivative loop, leaving affected columns of the derivative design matrix as zero with no user-facing signal. It now aggregates the failed basis indices and emits ONE `UserWarning` naming them so downstream ContinuousDiD inference doesn't silently use a biased `dPsi`. The all-identical-knot degenerate case (single dose value) remains silently handled; derivatives there are mathematically zero. Axis-C finding #12 in the Phase 2 silent-failures audit.
 - **Note:** The `TripleDifference` registry adapter uses `generate_ddd_data`, a fixed 2×2×2 factorial DGP (group × partition × time). The `n_periods`, `treatment_period`, and `treatment_fraction` parameters are ignored: DDD always simulates 2 periods with balanced groups. `n_units` is mapped to `n_per_cell = max(2, n_units // 8)` (effective total N = `n_per_cell × 8`), so non-multiples of 8 are rounded down and values below 16 are clamped to 16. A `UserWarning` is emitted when simulation inputs differ from the effective DDD design. When rounding occurs, all result objects (`SimulationPowerResults`, `SimulationMDEResults`, `SimulationSampleSizeResults`) set `effective_n_units` to the actual sample size used; it is `None` when no rounding occurred. `simulate_sample_size()` snaps bisection candidates to multiples of 8 so that `required_n` is always a realizable DDD sample size. Passing `n_per_cell` in `data_generator_kwargs` suppresses the effective-N rounding warning but not warnings for ignored parameters (`n_periods`, `treatment_period`, `treatment_fraction`).
 - **Note:** The analytical power methods (`PowerAnalysis.power/mde/sample_size` and the `compute_power/compute_mde/compute_sample_size` convenience functions) accept a `deff` parameter (survey design effect, default 1.0). This inflates variance multiplicatively: `Var(ATT) *= deff`, and inflates required sample size: `n_total *= deff`. The `deff` parameter is **not redundant** with `rho` (intra-cluster correlation): `rho` models within-unit serial correlation in panel data via the Moulton factor `1 + (T-1)*rho`, while `deff` models the survey design effect from stratified multi-stage sampling (clustering + unequal weighting). A survey panel study may need both. Values `deff > 0` are accepted; `deff < 1.0` (net variance reduction, e.g., from stratification gain) emits a warning.
 - **Note:** `simulate_power()` catches a narrow set of exception types (`ValueError`, `numpy.linalg.LinAlgError`, `KeyError`, `RuntimeError`, `ZeroDivisionError`) raised inside the per-simulation fit and result-extraction block, increments a per-effect failure counter, and skips the replicate. Programming errors (`TypeError`, `AttributeError`, `NameError`, `IndexError`, etc.) are allowed to propagate so that bugs in the estimator or custom result extractor surface loudly instead of being absorbed as simulation failures. The primary-effect failure count is surfaced on the result object as `SimulationPowerResults.n_simulation_failures`; a `UserWarning` still fires when the failure rate exceeds 10% for any effect size, and all-failed runs raise `RuntimeError`. This replaces the prior bare `except Exception` that swallowed root causes and kept the counter internal to the function (axis C, silent fallback, under the Phase 2 audit).
+- **Note:** `SurveyPowerConfig._build_survey_design()` no longer caches its return value in `self._cached_survey_design`. Mutating `config.survey_design` (or any other config field) after the first call used to silently return the stale cached design; the method now returns the live `self.survey_design` (or the default construction) every call. Construction is microseconds; the cache never earned its complexity. Axis-J finding #28 in the Phase 2 silent-failures audit.
 - **Note:** The simulation-based power functions (`simulate_power/simulate_mde/simulate_sample_size`) accept a `survey_config` parameter (`SurveyPowerConfig` dataclass). When set, the simulation loop uses `generate_survey_did_data` instead of the default registry DGP, and automatically injects `SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc")` into the estimator's `fit()` call. Supported estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD. Unsupported (raises `ValueError`): TROP, SyntheticDiD, TripleDifference (generate_survey_did_data produces staggered cohort data incompatible with factor-model/DDD DGPs). `survey_config` and `data_generator` are mutually exclusive. `data_generator_kwargs` may not contain keys managed by `SurveyPowerConfig` (n_strata, psu_per_stratum, etc.) but may contain passthrough DGP params (unit_fe_sd, add_covariates, strata_sizes). Repeated cross-section survey power (`panel=False`) is only supported for `CallawaySantAnna(panel=False)` with a matching `data_generator_kwargs={"panel": False}`; both mismatch directions are rejected. `estimator_kwargs` may not contain `survey_design` when `survey_config` is set (use `SurveyPowerConfig(survey_design=...)` instead). Estimator settings that require a multi-cohort DGP (`control_group="not_yet_treated"`, `control_group="last_cohort"`, `clean_control="strict"`) are rejected because the survey DGP uses a single cohort; use the custom `data_generator` path for these configurations. `simulate_sample_size` raises the bisection floor to `n_strata * psu_per_stratum * 2` to ensure viable survey structure and rejects `strata_sizes` in `data_generator_kwargs` (it depends on `n_units` which varies during bisection).
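
The effective-N rounding stated in the `TripleDifference` note above (`n_per_cell = max(2, n_units // 8)`, total N = `n_per_cell × 8`) can be checked with a few lines of arithmetic. The helper name `effective_ddd_sample` is hypothetical and exists only for this illustration; it is not part of the library.

```python
from typing import Tuple


def effective_ddd_sample(n_units: int) -> Tuple[int, int]:
    """Return (n_per_cell, effective_total_n) under the stated mapping."""
    n_per_cell = max(2, n_units // 8)  # 8 cells in the 2x2x2 design
    return n_per_cell, n_per_cell * 8


rounded = effective_ddd_sample(100)  # (12, 96): non-multiple of 8 rounds down
clamped = effective_ddd_sample(10)   # (2, 16): values below 16 clamp to 16
exact = effective_ddd_sample(64)     # (8, 64): multiples of 8 are unchanged
```
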