
Commit ca7b6bf

igerber and claude committed
Narrow auto-inject guard to nest=False + regression test for nest=True
The previous round's guard fired on any varying-strata + omitted-psu combination, which unnecessarily rejected `SurveyDesign(weights, strata, nest=True)`. Under `nest=True`, `SurveyDesign.resolve()` at `diff_diff/survey.py:299-302` combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=<group>` is re-labeled per stratum and the cross-stratum uniqueness check passes. Only the default `nest=False` path actually needs the up-front guard.

Narrows the guard by adding `not getattr(survey_design, "nest", False)` to its condition, and updates the error message to enumerate three actionable remediations (constant-within-group strata, `nest=True`, or an explicit `psu`).

Adds `test_auto_inject_with_varying_strata_nest_true_succeeds` under `TestSurveyWithinGroupValidation`, covering the newly-accepted path: it compares against an explicit `SurveyDesign(weights, strata, psu="group", nest=True)` and requires matching `overall_se` (to `rel=1e-6`) and identical `survey_metadata.df_survey`. The default `nest=False` regression (`test_auto_inject_with_varying_strata_raises`, which still expects a raise) remains unchanged.

Updates the `fit()` docstring and the REGISTRY.md survey IF expansion Note to enumerate the three auto-inject cases: (1) strata constant within group, (2) strata vary + `nest=True`, (3) strata vary + `nest=False` (rejected with a targeted `ValueError`).

All 338 tests pass (affected surface + slow MC coverage sim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
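To see why `nest=True` makes the auto-injected `psu=<group>` acceptable, here is a toy sketch in plain Python. The row layout, the tuple encoding of combined `(stratum, psu)` labels, and both helper functions are hypothetical; the real resolver in `diff_diff/survey.py` may build combined labels differently. Strata follow the new test's `period % 2` pattern, so every group spans two strata.

```python
from collections import defaultdict

# Hypothetical toy rows (group, period, stratum): strata vary across the
# cells of each group, following the new test's `period % 2` pattern.
rows = [(g, t, t % 2) for g in ("A", "B", "C") for t in range(4)]


def resolve_auto_psu_labels(rows, nest):
    """Labels for an auto-injected psu=<group> (toy model, not library code).

    nest=True combines (stratum, psu) into one label, so a group seen in two
    strata becomes two distinct PSUs; nest=False keeps the raw group label.
    """
    return [(s, g) if nest else g for g, _t, s in rows]


def unique_across_strata(rows, labels):
    """Cross-stratum check: every PSU label must sit in exactly one stratum."""
    strata_of_label = defaultdict(set)
    for (_g, _t, s), label in zip(rows, labels):
        strata_of_label[label].add(s)
    return all(len(strata) == 1 for strata in strata_of_label.values())


for nest in (False, True):
    labels = resolve_auto_psu_labels(rows, nest)
    ok = unique_across_strata(rows, labels)
    print(f"nest={nest}: unique across strata={ok}, n_psu={len(set(labels))}")
# nest=False: unique across strata=False, n_psu=3
# nest=True: unique across strata=True, n_psu=6
```

Under `nest=False` the three group labels each appear in two strata and fail the check; under `nest=True` the same groups split into six stratum-specific PSUs.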
1 parent dedc2a6 commit ca7b6bf

3 files changed

Lines changed: 80 additions & 25 deletions


diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 39 additions & 24 deletions
@@ -650,7 +650,15 @@ def fit(
 **Strata and PSU may vary across cells of a group** but
 must be constant within each ``(g, t)`` cell (trivially
 true in one-obs-per-cell panels; enforced otherwise with
-``ValueError``). When ``n_bootstrap > 0`` and a survey
+``ValueError``). Three supported combinations under the
+auto-injected ``psu=<group_col>``:
+(1) strata constant within group (any ``nest`` flag works);
+(2) strata vary within group **and** ``nest=True`` — the
+resolver re-labels the synthesized ``psu`` uniquely within
+strata; (3) strata vary within group **and** ``nest=False``
+— rejected up front with a targeted ``ValueError``; pass
+``SurveyDesign(..., nest=True)`` or an explicit
+``psu=<col>`` with globally-unique labels instead. When ``n_bootstrap > 0`` and a survey
 design is supplied, the multiplier bootstrap operates at
 the PSU level (Hall-Mammen wild PSU bootstrap) — under the
 default auto-inject this collapses to a group-level
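The docstring's PSU-level multiplier bootstrap can be illustrated with a minimal sketch, assuming Rademacher multipliers and an unscaled sum of per-group influence-function values. The function name, the `U` values, and the PSU maps are hypothetical; the library's scaling, centering, and its Mammen/Webb multiplier options are not reproduced. The point is structural: every group in a PSU shares one multiplier, so with `psu=<group>` the scheme is an ordinary group-clustered bootstrap.

```python
import random
import statistics


def wild_psu_bootstrap(u_by_group, psu_of_group, n_boot, seed=0):
    """Sketch of a wild multiplier bootstrap applied at the PSU level.

    Each PSU draws one Rademacher multiplier (+1 or -1, probability 1/2
    each) per replicate, and every group in that PSU reuses it. Returns the
    perturbed sums sum_g v[psu(g)] * U[g], one per bootstrap replicate.
    Strata play no role in the randomization here.
    """
    rng = random.Random(seed)
    psus = sorted(set(psu_of_group.values()))
    draws = []
    for _ in range(n_boot):
        v = {p: rng.choice((-1.0, 1.0)) for p in psus}
        draws.append(sum(v[psu_of_group[g]] * u for g, u in u_by_group.items()))
    return draws


# Hypothetical centered per-group IF contributions U[g].
U = {"g1": 1.0, "g2": -0.5, "g3": 0.75, "g4": -1.25}

# Default auto-inject: psu=<group>, so every group is its own PSU.
identity = {g: g for g in U}
# Hypothetical strictly-coarser PSU: two groups per PSU.
coarse = {"g1": "s1", "g2": "s1", "g3": "s2", "g4": "s2"}

for name, psu_map in (("psu=group", identity), ("coarser psu", coarse)):
    draws = wild_psu_bootstrap(U, psu_map, n_boot=20_000)
    # Rademacher multipliers give Var = sum of squared PSU totals:
    # about 3.375 for psu=group, about 0.5 for the coarser map.
    print(name, round(statistics.pvariance(draws), 3))
```

Coarsening the PSU lets within-PSU contributions of opposite sign cancel before the multiplier is applied, which is why the coarser map yields a much smaller bootstrap variance in this toy example.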
@@ -726,36 +734,43 @@ def fit(
 or resolved_survey.replicate_weights.shape[1] == 0
 )
 ):
-# Pre-auto-inject contract check: the auto-injected PSU
-# column reuses group labels with nest=False, but the
-# survey resolver enforces globally-unique PSU labels when
-# nest=False and strata are present (see
-# ``diff_diff/survey.py``). If strata varies within group,
-# the synthesized PSU column collides across strata and
-# resolution fails downstream with an opaque error. Flag
-# that configuration up front with an actionable message
-# pointing users to the explicit ``psu=<col>, nest=True``
-# path (REGISTRY.md survey IF expansion Note).
-if resolved_survey.strata is not None:
+# Pre-auto-inject contract check: the auto-inject path
+# synthesizes ``psu=<group>`` and preserves the user's
+# ``nest`` flag. Under ``nest=False`` (the default), the
+# survey resolver requires globally-unique PSU labels when
+# strata are present; if strata varies within group, the
+# synthesized PSU column reuses group labels across strata
+# and trips the cross-stratum PSU uniqueness check at
+# resolution time. Under ``nest=True`` the resolver
+# re-labels ``(stratum, psu)`` uniquely within strata
+# (``diff_diff/survey.py:299-302``), so varying strata is
+# fine — let the auto-inject proceed. Only the
+# ``nest=False`` + varying-strata + omitted-psu triple
+# warrants an up-front targeted error.
+if resolved_survey.strata is not None and not getattr(
+survey_design, "nest", False
+):
 _strata_varies_pre, _ = _strata_psu_vary_within_group(
 resolved_survey, data, group, survey_weights,
 )
 if _strata_varies_pre:
 raise ValueError(
 "ChaisemartinDHaultfoeuille survey support: "
 "strata that vary across cells of the same "
-"group require an explicit `psu=<col>` with "
-"`nest=True` so that `(stratum, psu)` pairs "
-"are globally unique. The default auto-"
-"injected `psu=<group>` path does NOT support "
-"this because the synthesized PSU column "
-"reuses group labels across strata and trips "
-"the cross-stratum PSU uniqueness check in "
-"survey resolution. Either (a) set strata "
-"constant within each group, or (b) pass "
-"`SurveyDesign(..., psu=<col>, nest=True)` "
-"with PSU labels that are unique within "
-"strata."
+"group require either an explicit "
+"`psu=<col>` (any column whose labels are "
+"globally unique within strata) or the "
+"original `SurveyDesign(..., nest=True)` "
+"flag so the auto-injected `psu=<group>` is "
+"re-labeled uniquely within strata by the "
+"resolver. The default `nest=False` auto-"
+"inject path reuses group labels across "
+"strata and trips the cross-stratum PSU "
+"uniqueness check in survey resolution. "
+"Either (a) set strata constant within each "
+"group, (b) pass `SurveyDesign(..., "
+"nest=True)`, or (c) pass an explicit "
+"`psu=<col>` with globally-unique labels."
 )
 
 from diff_diff.survey import SurveyDesign as _SurveyDesign
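The narrowed guard reduces to a small decision rule. The sketch below restates it in plain Python; `ToyDesign`, `strata_vary_within_group`, and `auto_inject_action` are hypothetical stand-ins (the real check calls `_strata_psu_vary_within_group` on the resolved design), and the replicate-weight part of the enclosing condition, only partly visible in the hunk, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyDesign:
    """Hypothetical stand-in for SurveyDesign: column names plus the nest flag."""
    weights: str
    strata: Optional[str] = None
    psu: Optional[str] = None
    nest: bool = False


def strata_vary_within_group(groups: Sequence, strata: Sequence) -> bool:
    """True if any group is observed in more than one stratum."""
    first_seen = {}
    for g, s in zip(groups, strata):
        if first_seen.setdefault(g, s) != s:
            return True
    return False


def auto_inject_action(design: ToyDesign, groups: Sequence, strata: Sequence) -> str:
    """Restates the narrowed guard: raise only for nest=False + varying strata."""
    if design.psu is not None:
        return "use explicit psu"  # no auto-injection, so the guard is not reached
    if (
        design.strata is not None
        and not getattr(design, "nest", False)
        and strata_vary_within_group(groups, strata)
    ):
        return "raise ValueError"
    return "auto-inject psu=<group>"


groups = ["A", "A", "B", "B"]
varying = [0, 1, 0, 1]   # stratum changes across the cells of each group
constant = [0, 0, 1, 1]  # stratum fixed within each group

print(auto_inject_action(ToyDesign("pw", "stratum"), groups, constant))
# auto-inject psu=<group>   (case 1)
print(auto_inject_action(ToyDesign("pw", "stratum", nest=True), groups, varying))
# auto-inject psu=<group>   (case 2)
print(auto_inject_action(ToyDesign("pw", "stratum"), groups, varying))
# raise ValueError          (case 3)
```

As in the diff, the within-group variation scan only runs once strata are present and `nest` is false, so `nest=True` designs skip it entirely.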

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -649,7 +649,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6)
 - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface
 - [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) **or replicate-weight variance (BRR/Fay/JK1/JKn/SDR)**, covering the main ATT surface, covariate adjustment (DID^X), heterogeneity testing, the TWFE diagnostic (fit and standalone `twowayfeweights()` helper), and HonestDiD bounds. Opt-in **PSU-level Hall-Mammen wild bootstrap** is also supported via `n_bootstrap > 0`.
-- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require an explicit `psu=<col>` with `nest=True`** so that `(stratum, psu)` labels are globally unique; the auto-injected `psu=<group>` path does NOT support this (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). `fit()` detects that combination before survey resolution and raises a targeted `ValueError` pointing users at the explicit-`psu, nest=True` path. Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
+- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=<col>` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=<group>` is re-labeled per stratum and the cell allocator proceeds. Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
 - **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Scope note (cell-period allocator):** The PSU-level bootstrap uses a group-level `group_id_to_psu_code` map and therefore requires PSU to be constant within each group. Combining `n_bootstrap > 0` with a PSU that varies within group raises `NotImplementedError`; the cell-level Hall-Mammen extension is deferred to a follow-up PR. The analytical TSL variance fully supports within-group-varying PSU via the cell-period allocator — use `n_bootstrap=0` for those designs. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design.
 
 ---
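The Note's cell-level expansion and PSU-level aggregation can be sketched on a toy panel. This is a minimal illustration under stated assumptions, not the library's allocator: the `U[g, t]` values are made up and taken as already cohort-centered, the variance is the standard stratified with-replacement estimator over PSU totals (no FPC), and singleton strata are simply skipped. It checks the group-sum identity and shows that, with PSU equal to group, the cell- and group-level expansions give the same variance.

```python
import math
from collections import defaultdict

# Hypothetical toy panel: 2 groups x 2 periods, 2 observations per (g, t)
# cell with unequal weights. Stratum = period % 2 (constant within each cell,
# varying across the cells of a group); PSU = (stratum, group), mimicking the
# label a nest=True resolver would build from an auto-injected psu=<group>.
obs = [
    {"g": g, "t": t, "w": w, "h": t % 2, "psu": (t % 2, g)}
    for g in ("A", "B") for t in (0, 1) for w in (1.0, 3.0)
]
# Hypothetical cohort-centered per-period IF attributions U[g, t].
u_cell = {("A", 0): 0.4, ("A", 1): -0.1, ("B", 0): -0.4, ("B", 1): 0.1}


def expand_cell_level(obs, u_cell):
    """psi_i = U[g_i, t_i] * w_i / W_{g_i, t_i} (weight share within the cell)."""
    cell_w = defaultdict(float)
    for o in obs:
        cell_w[(o["g"], o["t"])] += o["w"]
    return [u_cell[(o["g"], o["t"])] * o["w"] / cell_w[(o["g"], o["t"])] for o in obs]


def expand_group_level(obs, u_group):
    """psi_i = U[g_i] * w_i / W_{g_i} (the previous group-level convention)."""
    group_w = defaultdict(float)
    for o in obs:
        group_w[o["g"]] += o["w"]
    return [u_group[o["g"]] * o["w"] / group_w[o["g"]] for o in obs]


def binder_variance(obs, psi, stratum_key="h", psu_key="psu"):
    """Stratified with-replacement variance of sum(psi) from PSU totals, no FPC."""
    psu_total = defaultdict(float)
    for o, p in zip(obs, psi):
        psu_total[(o[stratum_key], o[psu_key])] += p
    totals_by_stratum = defaultdict(list)
    for (h, _), z in psu_total.items():
        totals_by_stratum[h].append(z)
    var = 0.0
    for zs in totals_by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # singleton strata need their own convention; skipped here
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return var


psi = expand_cell_level(obs, u_cell)

# Group-sum identity: psi summed over a group equals sum_t U[g, t].
u_group = {g: sum(v for (gg, _), v in u_cell.items() if gg == g) for g in ("A", "B")}
for g in ("A", "B"):
    total = sum(p for o, p in zip(obs, psi) if o["g"] == g)
    assert math.isclose(total, u_group[g])

print(round(binder_variance(obs, psi), 6))  # prints 0.68

# With PSU = group and a single stratum, PSU totals are the group sums under
# either expansion, so the two variances agree.
for o in obs:
    o["one_stratum"], o["group_psu"] = 0, o["g"]
v_cell = binder_variance(obs, psi, "one_stratum", "group_psu")
v_group = binder_variance(obs, expand_group_level(obs, u_group), "one_stratum", "group_psu")
assert math.isclose(v_cell, v_group)
```

Because each `psi_i` is a weight share of its cell's `U[g, t]`, summing over a cell returns that cell's attribution exactly; this is what lets PSU totals, and therefore the variance, match the group-level expansion whenever PSU is constant within group.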

tests/test_survey_dcdh.py

Lines changed: 40 additions & 0 deletions
@@ -1172,6 +1172,46 @@ def test_bootstrap_with_varying_psu_raises(self, base_data):
 survey_design=sd,
 )
 
+def test_auto_inject_with_varying_strata_nest_true_succeeds(self, base_data):
+"""When strata varies across cells of a group and the user
+passes ``nest=True`` with no explicit ``psu``, the auto-inject
+path is valid: ``SurveyDesign.resolve()`` combines
+``(stratum, psu)`` into globally-unique labels via the
+nest=True path (``diff_diff/survey.py:299-302``), so the
+cross-stratum PSU uniqueness check is satisfied. Byte-check
+against the explicit ``SurveyDesign(..., psu="group",
+nest=True)`` baseline — both paths resolve to the same design.
+"""
+df_ = base_data.copy()
+df_["pw"] = 1.0
+df_["stratum"] = df_["period"] % 2
+sd_auto = SurveyDesign(weights="pw", strata="stratum", nest=True)
+sd_explicit = SurveyDesign(
+weights="pw", strata="stratum", psu="group", nest=True,
+)
+r_auto = ChaisemartinDHaultfoeuille(seed=1).fit(
+df_, outcome="outcome", group="group",
+time="period", treatment="treatment",
+survey_design=sd_auto, L_max=2,
+)
+r_explicit = ChaisemartinDHaultfoeuille(seed=1).fit(
+df_, outcome="outcome", group="group",
+time="period", treatment="treatment",
+survey_design=sd_explicit, L_max=2,
+)
+assert np.isfinite(r_auto.overall_att)
+assert np.isfinite(r_auto.overall_se)
+if np.isfinite(r_auto.overall_se) and np.isfinite(r_explicit.overall_se):
+assert r_auto.overall_se == pytest.approx(
+r_explicit.overall_se, rel=1e-6
+)
+assert r_auto.survey_metadata is not None
+assert r_explicit.survey_metadata is not None
+assert (
+r_auto.survey_metadata.df_survey
+== r_explicit.survey_metadata.df_survey
+)
+
 def test_auto_inject_with_varying_strata_raises(self, base_data):
 """Auto-injected `psu=<group>` with nest=False cannot honor
 strata that vary across cells of a group — the synthesized PSU
