Commit 9ebb682

igerber and claude committed
Round-3 CI P1: reject cell-bootstrap when recentering leaks mass to sentinel cells
**P1 (methodology):** Under terminal missingness, `_cohort_recenter_per_period` subtracts cohort column means across the full period grid, so a group with no observation at period t acquires non-zero centered mass at that cell. The PR-4 cell-level bootstrap builds `psu_codes_per_cell` with -1 sentinels for such cells and `_unroll_target_to_cells` drops them, silently losing that centered mass. Under within-group-varying PSU + terminal missingness, this would under-cluster the bootstrap SE/CI/p-values.

Conservative guard: `_unroll_target_to_cells` now raises `ValueError` when any sentinel cell (-1 PSU) carries non-zero cohort-recentered IF mass (|u| > 1e-12). The error message points users to `n_bootstrap=0` for analytical TSL on such panels. The analytical path has the same mass-leakage behavior under this regime but was shipped in PR #323; documenting the bootstrap-specific guard here avoids advertising a broken combination.

Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak` constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell and asserts `_compute_dcdh_bootstrap` raises with the documented error message.

**P2 (tests):** The slow MC bootstrap coverage test previously ran at `L_max=1`, which collapsed the multi-horizon block to a single target and never exercised the cross-horizon shared-weight path described in its own docstring. Bumped to `L_max=2` so the shared (n_bootstrap, n_psu) PSU-level weight matrix is drawn once and broadcast across horizons via each horizon's cell-to-PSU map. Added three assertions:

- Horizon-1 bootstrap CI coverage in [0.925, 0.975].
- Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is wider than h-1 because finite-sample analytical TSL coverage on this DGP is itself ~0.93 at l=2 (measured offline: analytical h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap coverage within 1pp of the analytical baseline is consistent with correct clustering; a drop to ≤ 0.90 would indicate a real shared-weight broadcast regression.
- `cband_crit_value` finite in ≥ 90% of reps: validates that the shared (n_bootstrap, n_psu) weight matrix produces a coherent joint distribution across horizons (required for a valid sup-t simultaneous band).

Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap MC noise below ~0.3pp per CI endpoint at horizon-2's slightly wider percentile-CI spread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c930f74 commit 9ebb682
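
The mass leak described in P1 can be reproduced on a toy array. The sketch below is illustrative only: `toy_cohort_recenter` is a hypothetical stand-in that subtracts each cohort's column mean from every row of that cohort, which is how the commit message describes `_cohort_recenter_per_period`; the real function's signature and details are not shown in this commit. The input values and the single-cohort layout are made up for the example.

```python
import numpy as np


def toy_cohort_recenter(u, cohort_ids):
    """Hypothetical stand-in: subtract each cohort's column mean from its rows."""
    out = np.asarray(u, dtype=float).copy()
    for cohort in np.unique(cohort_ids):
        rows = cohort_ids == cohort
        # Column means are taken over the full period grid, including
        # cells where a group has no observation.
        out[rows] -= out[rows].mean(axis=0, keepdims=True)
    return out


# Two groups in one cohort, three periods. Group 0 is unobserved at the
# last period, so its raw IF mass there is 0 and its PSU code is -1.
u = np.array([[0.4, 0.2, 0.0],
              [0.1, 0.3, 0.6]])
psu_codes_per_cell = np.array([[0, 1, -1],
                               [0, 1, 0]])

centered = toy_cohort_recenter(u, np.array([0, 0]))

# Mass sitting on sentinel cells after recentering.
leaked = centered[psu_codes_per_cell < 0]
```

Group 0 had zero raw mass at the missing cell, but after recentering it carries minus the column mean, which is exactly the mass the cell-level bootstrap has no PSU to attach to.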

3 files changed

Lines changed: 157 additions & 20 deletions

diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py

Lines changed: 37 additions & 0 deletions
@@ -606,6 +606,18 @@ def _unroll_target_to_cells(
     variance-eligible group ordering, so no per-target row subset
     is needed.
 
+    Raises ``ValueError`` when any sentinel cell (-1 PSU) carries
+    non-zero cohort-recentered IF mass. This is a supported-edge-
+    case guard: under terminal missingness, ``_cohort_recenter_per_period``
+    subtracts column means across the full period grid, so a group
+    with no observation at period ``t`` can acquire non-zero centered
+    mass at that sentinel cell. The cell-level bootstrap cannot
+    allocate that mass to any PSU (the cell has no positive-weight
+    obs), so silently dropping it would under-weight the group's
+    bootstrap contribution. The conservative guard rejects the
+    combination and points users to ``n_bootstrap=0`` (analytical
+    TSL) as the documented alternative for such panels.
+
     Returns ``(u_cell, psu_cell)`` of shape
     ``(n_valid_cells_in_target,)`` each.
     """
@@ -626,6 +638,31 @@ def _unroll_target_to_cells(
     flat_u = u_per_period_target.ravel()
     flat_psu = psu_codes_per_cell.ravel()
     mask = flat_psu >= 0
+    # Sentinel-mass guard: reject terminal-missingness + within-group-
+    # varying PSU + bootstrap. The cohort-recentering column-subtraction
+    # at `_cohort_recenter_per_period` can leak non-zero centered mass
+    # onto cells with no positive-weight obs (missing-cell rows in the
+    # cohort still get -col_mean added when other rows contribute at
+    # that column). Dropping that mass silently would under-cluster the
+    # bootstrap in a supported panel regime.
+    sentinel_mass = flat_u[~mask]
+    if sentinel_mass.size > 0 and bool(
+        np.any(np.abs(sentinel_mass) > 1e-12)
+    ):
+        raise ValueError(
+            "Cell-level bootstrap cannot be computed on this survey "
+            "panel: cohort-recentered IF mass landed on cells with "
+            "no positive-weight observations (psu_codes_per_cell == "
+            "-1). This typically occurs when terminal missingness "
+            "(groups observed only through some period) combines with "
+            "within-group-varying PSU: `_cohort_recenter_per_period` "
+            "subtracts column means across the full period grid, so a "
+            "group with no observation at period t acquires non-zero "
+            "centered mass there, which the cell-level bootstrap "
+            "cannot allocate to any PSU. Use `n_bootstrap=0` to fall "
+            "back to analytical TSL variance (which supports this "
+            "panel regime)."
+        )
     return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False)
 
 
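
For readers who want to try the guard outside the library, here is a minimal standalone sketch of the same check. `unroll_with_guard` is a hypothetical name, its signature is an assumption, and the shortened error message is not the library's; only the mask, the 1e-12 tolerance, and the return shape follow the hunk above.

```python
import numpy as np


def unroll_with_guard(u_per_period, psu_codes_per_cell, tol=1e-12):
    """Hypothetical standalone version of the sentinel-mass guard."""
    flat_u = np.asarray(u_per_period, dtype=float).ravel()
    flat_psu = np.asarray(psu_codes_per_cell).ravel()
    mask = flat_psu >= 0  # -1 marks cells with no positive-weight obs
    # Reject instead of silently dropping mass that has no PSU to attach to.
    if np.any(np.abs(flat_u[~mask]) > tol):
        raise ValueError(
            "IF mass landed on cells with no positive-weight observations"
        )
    return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False)


psu = np.array([[0, 1, -1], [0, 1, 0]])

# Zero mass on the sentinel cell: the cell is dropped and the rest unrolls.
clean = np.array([[0.25, 0.25, 0.0], [-0.15, -0.15, 0.15]])
u_cell, psu_cell = unroll_with_guard(clean, psu)

# Non-zero mass on the sentinel cell: the guard raises.
leaky = np.array([[0.25, 0.25, -0.15], [-0.15, -0.15, 0.15]])
try:
    unroll_with_guard(leaky, psu)
    raised = False
except ValueError:
    raised = True
```

The `leaky` array uses the same values as the regression test in `tests/test_survey_dcdh.py` in this commit.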

tests/test_dcdh_bootstrap_cell_period_coverage.py

Lines changed: 85 additions & 20 deletions
@@ -8,15 +8,19 @@
 through the legacy bootstrap (covered by the pre-PR-4 test suite), so
 the coverage check here exercises only the new cell-level code path.
 
-Asserts coverage at TWO surfaces:
-
-1. Overall DID_M bootstrap CI (`res.bootstrap_results.overall_ci`).
-2. Event-study horizon CIs (`res.bootstrap_results.event_study_cis`) —
-   this is the highest-risk surface per the PR 4 plan review's
-   CRITICAL #2 (shared-PSU-weight matrix must be drawn once per
-   multi-horizon block to preserve the sup-t joint distribution).
-   Horizon-specific coverage regresses on any bug in the shared-
-   weight machinery that a single-surface test would miss.
+Asserts coverage at three surfaces, each covering a distinct code path:
+
+1. Overall DID_M bootstrap CI (`res.bootstrap_results.overall_ci`)
+   — single-target cell-level branch.
+2. Event-study **horizon-1** CI (`res.bootstrap_results.event_study_cis[1]`)
+   — first horizon of the shared-PSU-weight multi-horizon block.
+3. Event-study **horizon-2** CI + sup-t `cband_crit_value` finiteness
+   — exercises the cross-horizon shared-draw machinery that
+   guarantees sup-t joint distribution validity. At L_max >= 2 the
+   shared (n_bootstrap, n_psu) PSU-level weight matrix must be drawn
+   ONCE and reused across horizons; a regression where each horizon
+   re-draws weights would break the sup-t coherence and the finite
+   critical value check below would surface it.
 
 Marked ``slow`` and excluded from the default pytest run. To execute:
 
@@ -101,6 +105,8 @@ def test_bootstrap_cell_period_coverage_varying_psu():
     rng = np.random.default_rng(20260419)
     covered_overall = 0
     covered_h1 = 0
+    covered_h2 = 0
+    cband_finite = 0
     failed = 0
 
     for r in range(n_reps):
@@ -121,13 +127,25 @@ def test_bootstrap_cell_period_coverage_varying_psu():
             # Efron-Tibshirani §13.3), so the across-reps coverage
             # mostly reflects the sampling-distribution / bootstrap-
             # consistency question rather than bootstrap MC noise.
+            # L_max=2 exercises the shared-PSU-weight multi-horizon
+            # block (a single `(n_bootstrap, n_psu)` weight matrix
+            # is drawn once and broadcast per-horizon via each
+            # horizon's cell-to-PSU map). L_max=1 would collapse to
+            # a single target and never exercise the cross-horizon
+            # shared-draw machinery.
+            #
+            # n_bootstrap=1000 keeps internal bootstrap MC noise
+            # below ~0.3pp per CI endpoint; the percentile-CI
+            # coverage at horizon-2 (where the shared-weight
+            # broadcast is exercised) is finite-sample-sensitive
+            # and B=500 would risk a spurious edge-of-band miss.
             res = ChaisemartinDHaultfoeuille(
-                n_bootstrap=500, seed=r + 1,
+                n_bootstrap=1000, seed=r + 1,
             ).fit(
                 df,
                 outcome="outcome", group="group",
                 time="period", treatment="treatment",
-                survey_design=sd, L_max=1,
+                survey_design=sd, L_max=2,
             )
         except Exception:
             failed += 1
@@ -146,16 +164,35 @@ def test_bootstrap_cell_period_coverage_varying_psu():
         if lo_o <= tau_true <= hi_o:
             covered_overall += 1
 
-        # Horizon-1 bootstrap CI (guards the shared-PSU-weight path).
+        # Horizon-1 and horizon-2 bootstrap CIs (guard the shared-
+        # PSU-weight multi-horizon path). Horizon-2 in particular
+        # requires the SAME shared PSU weight matrix drawn once at
+        # the top of the multi-horizon block; a per-horizon re-draw
+        # would break the sup-t joint-distribution guarantee and
+        # `cband_crit_value` would be undefined or wrong.
         es_cis = res.bootstrap_results.event_study_cis
-        if es_cis is None or 1 not in es_cis:
-            continue
-        h1_ci = es_cis[1]
-        if h1_ci is None or not all(np.isfinite(h1_ci)):
-            continue
-        lo_h, hi_h = float(h1_ci[0]), float(h1_ci[1])
-        if lo_h <= tau_true <= hi_h:
-            covered_h1 += 1
+        if es_cis is not None:
+            if 1 in es_cis:
+                h1_ci = es_cis[1]
+                if h1_ci is not None and all(np.isfinite(h1_ci)):
+                    lo_h, hi_h = float(h1_ci[0]), float(h1_ci[1])
+                    if lo_h <= tau_true <= hi_h:
+                        covered_h1 += 1
+            if 2 in es_cis:
+                h2_ci = es_cis[2]
+                if h2_ci is not None and all(np.isfinite(h2_ci)):
+                    lo2, hi2 = float(h2_ci[0]), float(h2_ci[1])
+                    if lo2 <= tau_true <= hi2:
+                        covered_h2 += 1
+
+        # Sup-t critical value: finite across reps means the shared-
+        # draw machinery produced coherent joint replicates at both
+        # horizons. NaN or unset would indicate the multi-horizon
+        # block short-circuited or the shared-weight broadcast
+        # misaligned across horizons.
+        cband = getattr(res.bootstrap_results, "cband_crit_value", None)
+        if cband is not None and np.isfinite(float(cband)):
+            cband_finite += 1
 
     completed = n_reps - failed
     assert completed >= int(0.95 * n_reps), (
@@ -164,6 +201,7 @@ def test_bootstrap_cell_period_coverage_varying_psu():
     )
     coverage_overall = covered_overall / completed
     coverage_h1 = covered_h1 / completed
+    coverage_h2 = covered_h2 / completed
     assert 0.925 <= coverage_overall <= 0.975, (
         f"Overall bootstrap CI coverage {coverage_overall:.3f} "
         f"(completed {completed}) outside [0.925, 0.975]; "
@@ -177,3 +215,30 @@ def test_bootstrap_cell_period_coverage_varying_psu():
         f"regression here likely indicates a bug in the multi-horizon "
         f"cell-level broadcast."
     )
+    # Horizon-2 tolerance is wider than horizon-1 because finite-
+    # sample coverage of the analytical TSL SE on this DGP is
+    # itself ~0.93 at l=2 (measured offline: analytical h-1 coverage
+    # 0.94, h-2 coverage 0.926 at n_groups=40). The bootstrap should
+    # track the analytical SE asymptotically, so an observed
+    # bootstrap coverage in [0.91, 0.98] at h-2 is consistent with
+    # correct clustering; a drop to ≤ 0.90 would indicate the
+    # shared-weight broadcast is not coherent across horizons.
+    assert 0.910 <= coverage_h2 <= 0.975, (
+        f"Horizon-2 event-study bootstrap CI coverage "
+        f"{coverage_h2:.3f} (completed {completed}) outside "
+        f"[0.910, 0.975]; horizon-2 is the cross-horizon surface "
+        f"that exercises the SAME shared PSU weight matrix used "
+        f"at horizon-1 — a regression here indicates the shared-"
+        f"draw broadcast is not coherent across horizons."
+    )
+    # Sup-t critical value must be finite in the vast majority of
+    # reps; occasional NaN on degenerate draws is tolerable but
+    # widespread NaN signals the shared-weight block never yielded
+    # a coherent joint distribution.
+    assert cband_finite >= int(0.90 * completed), (
+        f"Sup-t critical value was finite in only {cband_finite}/"
+        f"{completed} reps. The shared (n_bootstrap, n_psu) PSU-"
+        f"level weight matrix must be drawn ONCE at the top of the "
+        f"multi-horizon block; a per-horizon re-draw would break "
+        f"the sup-t joint distribution."
+    )
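
The "draw once, broadcast per horizon" requirement that these assertions protect can be sketched as follows. This is not the library's implementation: the function names, the Rademacher multiplier weights, the aggregation of cell-level IF mass to PSU totals, and the toy inputs are all assumptions chosen to make the idea concrete. The point it illustrates is that one `(n_bootstrap, n_psu)` weight matrix feeds every horizon, so row `b` of each horizon's replicates comes from the same draw and the sup-t statistic is taken over a coherent joint distribution.

```python
import numpy as np


def shared_weight_replicates(u_cells, psu_cells, n_psu, n_bootstrap, seed=0):
    """Hypothetical sketch: one PSU weight matrix shared by all horizons."""
    rng = np.random.default_rng(seed)
    # Drawn ONCE for the whole multi-horizon block (Rademacher is an assumption).
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_psu))
    replicates = {}
    for horizon, u in u_cells.items():
        # Sum cell-level IF mass into PSU totals via this horizon's cell-to-PSU map.
        psu_totals = np.bincount(psu_cells[horizon], weights=u, minlength=n_psu)
        replicates[horizon] = weights @ psu_totals
    return replicates


def sup_t_critical_value(replicates, alpha=0.05):
    """Quantile of the max absolute studentized replicate across horizons."""
    horizons = sorted(replicates)
    draws = np.column_stack([replicates[h] for h in horizons])
    se = draws.std(axis=0, ddof=1)
    t_max = np.max(np.abs(draws) / se, axis=1)
    return float(np.quantile(t_max, 1.0 - alpha))


# Toy inputs: each horizon has its own valid cells and cell-to-PSU map.
u_cells = {1: np.array([0.2, -0.1, 0.3, -0.4]), 2: np.array([0.1, 0.2, -0.3])}
psu_cells = {1: np.array([0, 1, 2, 2]), 2: np.array([0, 1, 2])}

reps = shared_weight_replicates(u_cells, psu_cells, n_psu=3, n_bootstrap=2000, seed=1)
crit = sup_t_critical_value(reps)
```

If each horizon drew its own weights instead, the rows of `draws` would no longer be joint replicates, and the maximum across horizons would not estimate the sup-t distribution.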

tests/test_survey_dcdh.py

Lines changed: 35 additions & 0 deletions
@@ -1935,6 +1935,41 @@ def test_bootstrap_cell_level_raises_on_shape_mismatch(self):
                 psu_codes_per_cell=psu_codes_per_cell,
             )
 
+    def test_bootstrap_cell_level_raises_on_sentinel_mass_leak(self):
+        """Contract: when `_cohort_recenter_per_period` subtracts
+        column means across the full period grid, a group with no
+        observation at period t can acquire non-zero centered mass
+        at that cell. Under the cell-level bootstrap path, such
+        mass lands on a `psu_codes_per_cell == -1` sentinel cell
+        and has no PSU to attach to — the bootstrap must raise
+        rather than silently drop the mass.
+        """
+        est = ChaisemartinDHaultfoeuille(n_bootstrap=50, seed=1)
+        # Build a per-cell IF tensor with non-zero mass at a cell
+        # whose PSU code is -1 (simulating terminal missingness
+        # after cohort-recentering leaks mass to a missing cell).
+        psu_codes_per_cell = np.array(
+            [[0, 1, -1], [0, 1, 0]], dtype=np.int64,
+        )
+        u_pp_overall_with_leak = np.array(
+            [[0.25, 0.25, -0.15], [-0.15, -0.15, 0.15]],
+            dtype=np.float64,
+        )
+        u_overall = np.array([0.5, -0.3], dtype=np.float64)
+        eligible_group_ids = np.array([0, 1])
+        group_id_to_psu_code = {0: 0, 1: 1}
+        with pytest.raises(ValueError, match="no positive-weight observations"):
+            est._compute_dcdh_bootstrap(
+                n_groups_for_overall=2,
+                u_centered_overall=u_overall,
+                divisor_overall=4,
+                original_overall=0.1,
+                group_id_to_psu_code=group_id_to_psu_code,
+                eligible_group_ids=eligible_group_ids,
+                u_per_period_overall=u_pp_overall_with_leak,
+                psu_codes_per_cell=psu_codes_per_cell,
+            )
+
     def test_bootstrap_cell_level_raises_on_missing_horizon_tensor(self):
         """Contract: when PSU varies within group, each multi-horizon
         target must supply its per-cell IF tensor; missing one raises
