Skip to content

Commit 91eb2c7

Browse files
igerberclaude
andcommitted
Round-6 CI P1: narrow analytical guard to within-group-varying PSU only
The round-5 analytical guard in _survey_se_from_group_if was over- scoped: it fired on any panel with leaked centered mass, including PSU-within-group-constant regimes (PSU=group auto-inject, strictly- coarser-PSU-within-group-constant). But the docs said those regimes are the intended workaround. Panels with terminal missingness and PSU=group are the documented supported path and must not raise. Fix: add an analytical dispatcher inside `_survey_se_from_group_if` mirroring the bootstrap's `_psu_varies_within_group` routing. When PSU is within-group-constant on the eligible subset, fall back to the legacy group-level allocator (which uses U_centered[g] directly via the row-sum identity and does not trigger the sentinel-mass guard). Only when PSU actually varies within group does the cell allocator run — and only then can the sentinel-mass guard fire. Byte-identity: under PSU=group the row-sum identity makes the cell and group allocators statistically equivalent, but the legacy branch was the one in use before PR #323 introduced the cell allocator. Rerouting to it under within-group-constant regimes is a regression-free fallback (it's what PR #323 aimed to preserve via byte-identity in the first place). Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group` — same fixture as the varying-PSU failure regression but with auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and `n_bootstrap > 0` succeed with finite SE. Also updated the cell-level bootstrap `ValueError` text (and the _unroll_target_to_cells docstring) to no longer advertise `n_bootstrap=0` as a workaround — both paths now fail consistently on the varying-PSU carve-out, and the correct workaround is pre-processing or using an explicit `psu=<group_col>`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 262fc61 commit 91eb2c7

3 files changed

Lines changed: 103 additions & 7 deletions

File tree

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5863,6 +5863,38 @@ def _survey_se_from_group_if(
58635863
and U_centered_per_period.size > 0
58645864
)
58655865

5866+
# When the cell allocator would apply but PSU is within-group-
5867+
# constant (PSU=group, strictly-coarser PSU within-group-constant,
5868+
# or the auto-inject default), the cell and group allocators are
5869+
# equivalent at PSU-level aggregation via the row-sum identity
5870+
# `sum_{c in g} u_cell[c] == u_centered[g]`. Prefer the legacy
5871+
# group-level path in that regime: it sidesteps the sentinel-
5872+
# mass guard (below) that would otherwise fire spuriously on
5873+
# terminally-missing panels whose PSU structure does not require
5874+
# cell-level resolution. Matches the bootstrap dispatcher's
5875+
# routing rule (`_compute_dcdh_bootstrap` + `_psu_varies_within_group`).
5876+
if use_cell_allocator:
5877+
psu_arr = getattr(resolved, "psu", None)
5878+
if psu_arr is None:
5879+
use_cell_allocator = False
5880+
else:
5881+
psu_eff = np.asarray(psu_arr)[pos_mask]
5882+
eligible_set = set(eligible_groups)
5883+
elig_row_mask = np.array(
5884+
[g in eligible_set for g in gids_eff], dtype=bool
5885+
)
5886+
if elig_row_mask.any():
5887+
psu_varies_within = bool(
5888+
pd.DataFrame({
5889+
"g": gids_eff[elig_row_mask],
5890+
"p": psu_eff[elig_row_mask],
5891+
}).groupby("g")["p"].nunique().gt(1).any()
5892+
)
5893+
if not psu_varies_within:
5894+
use_cell_allocator = False
5895+
else:
5896+
use_cell_allocator = False
5897+
58665898
if use_cell_allocator:
58675899
tids_eff = np.asarray(time_ids)[pos_mask]
58685900
# Map row's group to an index in eligible_groups (−1 when the

diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -614,9 +614,13 @@ def _unroll_target_to_cells(
614614
mass at that sentinel cell. The cell-level bootstrap cannot
615615
allocate that mass to any PSU (the cell has no positive-weight
616616
obs), so silently dropping it would under-weight the group's
617-
bootstrap contribution. The conservative guard rejects the
618-
combination and points users to ``n_bootstrap=0`` (analytical
619-
TSL) as the documented alternative for such panels.
617+
bootstrap contribution. The analytical TSL path shares the same
618+
cell-period allocator and fires a matching guard in
619+
``_survey_se_from_group_if``, so both paths reject this regime
620+
consistently. Documented workarounds: pre-process the panel
621+
(drop late-exit groups or trim to a balanced sub-panel), or use
622+
an explicit ``psu=<group_col>`` so both analytical and bootstrap
623+
paths route through the legacy group-level allocator.
620624
621625
Returns ``(u_cell, psu_cell)`` of shape
622626
``(n_valid_cells_in_target,)`` each.
@@ -658,10 +662,14 @@ def _unroll_target_to_cells(
658662
"within-group-varying PSU: `_cohort_recenter_per_period` "
659663
"subtracts column means across the full period grid, so a "
660664
"group with no observation at period t acquires non-zero "
661-
"centered mass there, which the cell-level bootstrap "
662-
"cannot allocate to any PSU. Use `n_bootstrap=0` to fall "
663-
"back to analytical TSL variance (which supports this "
664-
"panel regime)."
665+
"centered mass there, which the cell-level allocator "
666+
"cannot allocate to any PSU. The analytical TSL path "
667+
"(`_survey_se_from_group_if`) fires a matching guard on "
668+
"the same regime, so both paths reject this panel "
669+
"consistently. Pre-process the panel to remove terminal "
670+
"missingness (drop late-exit groups or trim to a balanced "
671+
"sub-panel), or use an explicit `psu=<group_col>` so both "
672+
"paths route through the legacy group-level allocator."
665673
)
666674
return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False)
667675

tests/test_survey_dcdh.py

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2186,6 +2186,62 @@ def test_fit_raises_on_terminal_missingness_with_varying_psu(self):
21862186
survey_design=sd, L_max=1,
21872187
)
21882188

2189+
def test_fit_succeeds_on_terminal_missingness_with_psu_group(self):
2190+
"""Companion regression: the terminal-missingness + varying-PSU
2191+
sentinel-mass guard must NOT fire when PSU is within-group-
2192+
constant. Same fixture as the varying-PSU test above but with
2193+
`psu=<group column>` (auto-inject default) — both analytical
2194+
and bootstrap paths route through the legacy group-level
2195+
allocator (the analytical dispatcher in
2196+
`_survey_se_from_group_if` falls back to the group-level
2197+
allocator when PSU does not vary within group; the bootstrap
2198+
dispatcher in `_compute_dcdh_bootstrap` does the same). Fit
2199+
must succeed with finite SE.
2200+
"""
2201+
rows = []
2202+
for g in range(10):
2203+
if g < 5:
2204+
d_pattern = [0, 0, 0, 1, 1, 1]
2205+
elif g < 8:
2206+
d_pattern = [1, 1, 1, 1, 0, 0]
2207+
else:
2208+
d_pattern = [0, 0, 0, 0, 0, 0]
2209+
for t in range(6):
2210+
if g == 2 and t >= 4:
2211+
continue
2212+
d = d_pattern[t]
2213+
y = float(g) + 0.1 * t + 1.0 * d
2214+
rows.append({
2215+
"group": int(g),
2216+
"period": int(t),
2217+
"treatment": int(d),
2218+
"outcome": y,
2219+
"pw": 1.0,
2220+
})
2221+
df_ = pd.DataFrame(rows)
2222+
# Auto-inject: no explicit `psu` → `SurveyDesign` falls back to
2223+
# `psu=<group_col>` at fit() time. Within-group-constant.
2224+
sd = SurveyDesign(weights="pw")
2225+
import warnings as _w
2226+
for n_boot in (0, 50):
2227+
with _w.catch_warnings():
2228+
_w.simplefilter("ignore")
2229+
res = ChaisemartinDHaultfoeuille(
2230+
n_bootstrap=n_boot, seed=1,
2231+
).fit(
2232+
df_, outcome="outcome", group="group",
2233+
time="period", treatment="treatment",
2234+
survey_design=sd, L_max=1,
2235+
)
2236+
assert np.isfinite(res.overall_att), (
2237+
f"n_bootstrap={n_boot}: overall_att must be finite "
2238+
f"under PSU=group + terminal missingness."
2239+
)
2240+
assert np.isfinite(res.overall_se) and res.overall_se >= 0.0, (
2241+
f"n_bootstrap={n_boot}: overall_se must be finite "
2242+
f"under PSU=group + terminal missingness."
2243+
)
2244+
21892245
def test_bootstrap_dense_codes_under_singleton_baseline_excluded_group(self):
21902246
"""Regression for P0 #2: when a group is singleton-baseline-
21912247
excluded (e.g., an always-treated group whose baseline D=1

0 commit comments

Comments
 (0)