Commit 262fc61

igerber and claude committed
Round-5 CI P0: extend sentinel-mass guard to analytical TSL path
Addresses the P0 escalation: recommending `n_bootstrap=0` as a workaround for terminal missingness + within-group-varying PSU was incorrect. The analytical TSL path uses the same cell-period allocator and has the same silent mass-drop bug.

**Analytical guard parity.** `_survey_se_from_group_if` now computes `W_cell` (per-(g,t) weight totals) and, before the cell-to-obs expansion, checks whether any cell has `W_cell == 0` while the corresponding cohort-recentered IF mass is non-zero (beyond a `1e-12` tolerance). If so, it raises a targeted `ValueError` mirroring the bootstrap-side `_unroll_target_to_cells` guard. The message text is aligned across the two paths (same "no positive-weight observations" phrasing) so the regression test matches both.

**Docs cleanup.** Removed the "use n_bootstrap=0 as workaround" language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced it with the correct workaround: pre-process the panel (drop late-exit groups / trim to a balanced sub-panel), or use an explicit `psu=<group_col>` so the dispatcher routes through the legacy group-level path (which does not use the cell-period allocator and is not affected by the mass leak).

**Regression test update.** The end-to-end fit() regression now asserts `ValueError` on both `n_bootstrap=0` and `n_bootstrap > 0` under the terminally-missing + within-group-varying PSU fixture. This is technically a behavior change for panels previously covered silently by PR #323's cell-period analytical allocator: those panels used to produce finite (but silently mass-dropped) SEs and now raise. The change closes a real silent-correctness bug; the analytical path never had a principled treatment for the leaked mass in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
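
To make the failure mode concrete, the following is a minimal NumPy sketch of how recentering by cohort column means over the full period grid can leave non-zero mass on a cell with no observations. The arrays `U`, `W_cell`, and `cohort` hold assumed toy values, and `recenter_by_cohort` is a simplified, hypothetical stand-in for `_cohort_recenter_per_period`, not the library's implementation.

```python
import numpy as np

# Toy influence-function mass U[g, t] for 3 groups x 3 periods (assumed values).
# Group 2 is terminally missing: it is not observed at period 2, so its raw
# mass there is a structural zero.
U = np.array([
    [0.4, -0.1, 0.3],
    [0.2, 0.5, -0.6],
    [-0.3, 0.1, 0.0],
])

# Per-cell weight totals W[g, t]; 0.0 marks the cell with no observations.
W_cell = np.array([
    [2.0, 2.0, 2.0],
    [1.0, 1.0, 1.0],
    [3.0, 3.0, 0.0],
])

cohort = np.array([0, 0, 0])  # hypothetical: all groups share one cohort


def recenter_by_cohort(U, cohort):
    """Subtract each cohort's column mean, taken over the full period grid."""
    centered = U.astype(float).copy()
    for c in np.unique(cohort):
        rows = cohort == c
        centered[rows] -= U[rows].mean(axis=0)
    return centered


U_centered = recenter_by_cohort(U, cohort)

# Mass sitting on cells that have no observation to carry it.
leaked = U_centered[W_cell == 0]
print(leaked)
```

In this toy panel the period-2 column mean is about -0.1, so the empty cell of group 2 ends up with roughly +0.1 of centered mass. An expansion that only visits observed rows would drop that mass without any warning.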
1 parent def6503 commit 262fc61

4 files changed

Lines changed: 71 additions & 28 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed
 - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released.
 - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
-- **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. **Scope note:** when a panel has *terminal missingness* (groups observed only through an early period) combined with within-group-varying PSU, the cell-level bootstrap raises a targeted `ValueError` — cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, which the cell-level bootstrap cannot allocate to any PSU. Use `n_bootstrap=0` (analytical TSL variance, which supports that regime) on such panels. PSU-within-group-constant regimes (including PSU=group auto-inject) are unaffected.
+- **`ChaisemartinDHaultfoeuille.fit(survey_design=..., n_bootstrap > 0)` now supports within-group-varying PSU** — the PSU-level Hall-Mammen wild multiplier bootstrap has been extended from a group-level PSU map (one multiplier per group) to a cell-level PSU map (one multiplier per `(g, t)` cell's PSU). A dispatcher in `_compute_dcdh_bootstrap` detects PSU-within-group-constant regimes (including PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE is bit-identical to the previous release (guarded by the new `test_bootstrap_se_matches_pre_pr4_baseline` and the pre-existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to multiple PSUs receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. Multi-horizon bootstraps draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution. Closes the last `NotImplementedError` gate in the dCDH survey contract; replicate-weight variance and `n_bootstrap > 0` remain mutually exclusive by construction. **Scope note:** under survey designs with within-group-varying PSU, panels with *terminal missingness* (groups observed only through an early period) where the terminally-missing group is in a cohort whose other groups still contribute at the missing period now raise a targeted `ValueError` on **both** the cell-level bootstrap and the analytical TSL path. Cohort-recentering leaks centered IF mass onto cells with no positive-weight observations, and both paths share the cell-period allocator that cannot allocate that mass. The analytical guard is new in this release and closes a silent mass-drop bug introduced by the cell-period allocator in v3.1.x; pre-processing the panel (drop late-exit groups or trim to a balanced sub-panel) or using an explicit `psu=<group_col>` so the dispatcher routes through the legacy group-level path is the documented workaround. PSU-within-group-constant regimes (including PSU=group auto-inject) are unaffected.

 ## [3.1.3] - 2026-04-18
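
The changelog entry describes one multiplier per `(g, t)` cell's PSU, drawn from a shared `(n_bootstrap, n_psu)` matrix and broadcast through a cell-to-PSU map. The sketch below illustrates that broadcast under stated assumptions: it uses Rademacher multipliers purely for brevity (the entry names Hall-Mammen wild multipliers, whose exact weights are not shown in this diff), and `bootstrap_draws`, `u_cell`, and `cell_to_psu` are hypothetical names, not the library's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bootstrap, n_groups, n_periods = 200, 4, 3

# Hypothetical centered IF mass, one entry per (g, t) cell.
u_cell = rng.normal(size=(n_groups, n_periods))
u_flat = u_cell.ravel()
# Group index of each flattened cell (row-major order matches ravel()).
g_of_cell = np.repeat(np.arange(n_groups), n_periods)


def bootstrap_draws(u_flat, cell_to_psu, xi):
    """theta_b = sum over cells of xi[b, psu(cell)] * u_cell."""
    return xi[:, cell_to_psu] @ u_flat


# Case 1: PSU is constant within group (PSU = group). The cell-level draw
# collapses to the group-level draw on the per-group totals.
xi_group = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_groups))
cell_level = bootstrap_draws(u_flat, g_of_cell, xi_group)
group_level = xi_group @ u_cell.sum(axis=1)

# Case 2: PSU varies within group. Each group's last-period cell moves to a
# separate PSU, so it receives its own independent multiplier.
cell_to_psu = g_of_cell.copy()
last_cells = np.arange(n_groups) * n_periods + (n_periods - 1)
cell_to_psu[last_cells] += n_groups
xi_psu = rng.choice([-1.0, 1.0], size=(n_bootstrap, 2 * n_groups))
varying = bootstrap_draws(u_flat, cell_to_psu, xi_psu)
```

When PSU is constant within group, the two computations agree up to floating-point rounding, which is the intuition behind routing those regimes through the legacy group-level path. Reusing one `xi_psu` matrix across horizons, each with its own cell-to-PSU map, is what would keep the per-horizon draws jointly distributed.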

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 43 additions & 7 deletions
@@ -667,13 +667,18 @@ def fit(
 contributing cells to multiple PSUs receives independent
 multiplier draws per PSU (see the Survey + bootstrap
 contract Note in REGISTRY.md). **Scope note (terminal
-missingness):** on panels with terminally-missing groups
-(early exit / right-censoring) combined with within-group-
-varying PSU, the cell-level bootstrap raises
-``ValueError`` because cohort-recentering leaks centered
-IF mass onto cells with no positive-weight obs. Use
-``n_bootstrap=0`` for analytical TSL variance on those
-panels. **Replicate weights with ``n_bootstrap > 0``
+missingness + within-group-varying PSU):** on panels
+where a terminally-missing group is in a cohort whose
+other groups still contribute at the missing period,
+**both** the cell-level bootstrap and the analytical TSL
+path raise a targeted ``ValueError``. Cohort-recentering
+leaks centered IF mass onto cells with no positive-
+weight obs, which the cell-period allocator cannot
+allocate to any observation or PSU. Pre-process the
+panel (drop late-exit groups or trim to a balanced
+sub-panel), or use an explicit ``psu=<group_col>`` so
+the dispatcher routes through the legacy group-level
+path. **Replicate weights with ``n_bootstrap > 0``
 raises ``NotImplementedError``** (replicate variance is
 closed-form; bootstrap would double-count variance). See
 REGISTRY.md ``ChaisemartinDHaultfoeuille`` Notes for the
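
The pre-processing workaround named in the docstring can be sketched with pandas. Both helpers below are hypothetical and not part of the library, and the column names `group` and `period` are assumptions. The first keeps only groups observed through the panel's final period; the second keeps every group but cuts the panel at the earliest exit.

```python
import pandas as pd


def drop_early_exit_groups(df, group_col, time_col):
    """Keep only groups that are observed in the panel's final period."""
    final_period = df[time_col].max()
    last_seen = df.groupby(group_col)[time_col].transform("max")
    return df.loc[last_seen == final_period].copy()


def trim_to_common_window(df, group_col, time_col):
    """Keep all groups, but cut the panel at the earliest group exit."""
    common_end = df.groupby(group_col)[time_col].max().min()
    return df.loc[df[time_col] <= common_end].copy()


# Toy panel: group "c" exits after period 2.
panel = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "period": [1, 2, 3, 1, 2, 3, 1, 2],
    "y": [1.0, 1.5, 2.0, 0.5, 0.7, 0.9, 2.0, 2.2],
})

dropped = drop_early_exit_groups(panel, "group", "period")
trimmed = trim_to_common_window(panel, "group", "period")
```

Both helpers address terminal missingness only; gaps in the interior of a group's history are left untouched. Which of the two is appropriate depends on whether losing groups or losing late periods is less costly for the estimand at hand.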
@@ -5885,6 +5890,37 @@ def _survey_se_from_group_if(
         (elig_idx_eff[valid_cell], col_idx_eff[valid_cell]),
         w_eff[valid_cell],
     )
+    # Sentinel-mass guard (mirror of `_unroll_target_to_cells` on
+    # the bootstrap path). Under terminal missingness,
+    # `_cohort_recenter_per_period` subtracts cohort column means
+    # across the full period grid, so a group with no observation
+    # at period t can acquire non-zero centered mass at that cell.
+    # The cell-level expansion `psi_i = U[g,t] * (w_i / W_{g,t})`
+    # has no observation to attach that mass to (W_{g,t} = 0), so
+    # silently dropping it would understate the SE. Raise a
+    # targeted ValueError instead (consistent with the cell-level
+    # bootstrap's `_unroll_target_to_cells` guard).
+    missing_cell_mask = W_cell == 0
+    if missing_cell_mask.any():
+        leaked = U_centered_per_period[missing_cell_mask]
+        if leaked.size > 0 and bool(
+            np.any(np.abs(leaked) > 1e-12)
+        ):
+            raise ValueError(
+                "Analytical survey SE cannot be computed on this "
+                "panel: cohort-recentered IF mass landed on (g, t) "
+                "cells with no positive-weight observations "
+                "(W_{g, t} = 0). This typically occurs when "
+                "terminal missingness combines with within-group-"
+                "varying PSU: _cohort_recenter_per_period subtracts "
+                "column means across the full period grid, so a "
+                "group with no observation at period t acquires "
+                "non-zero centered mass there, which the cell-level "
+                "analytical expansion cannot allocate to any "
+                "observation. Pre-process the panel to remove "
+                "terminal missingness (drop late-exit groups or "
+                "trim to a balanced sub-panel) before fitting."
+            )
     # Lookup U_centered_per_period and W_cell per row.
     u_obs_cell = np.zeros(w_eff.shape[0], dtype=np.float64)
     u_obs_cell[valid_cell] = U_centered_per_period[
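
As a self-contained illustration of the guard in this hunk, the sketch below pairs the check with the cell-to-observation expansion `psi_i = U[g,t] * (w_i / W_{g,t})` named in the code comment. The function `expand_cell_mass_to_obs` and all toy values are assumptions made for illustration; only the `W_cell == 0` test, the `1e-12` tolerance, and the allocation formula come from the diff.

```python
import numpy as np


def expand_cell_mass_to_obs(U_centered, g_idx, t_idx, w, tol=1e-12):
    """Allocate cell-level mass to rows: psi_i = U[g, t] * w_i / W[g, t]."""
    w = np.asarray(w, dtype=float)
    W_cell = np.zeros(U_centered.shape, dtype=float)
    np.add.at(W_cell, (g_idx, t_idx), w)  # per-(g, t) weight totals

    # Guard: centered mass on a cell without positive-weight rows has
    # nowhere to go, so refuse rather than drop it silently.
    leaked = U_centered[W_cell == 0]
    if np.any(np.abs(leaked) > tol):
        raise ValueError(
            "cohort-recentered IF mass landed on (g, t) cells with "
            "no positive-weight observations"
        )

    W_row = W_cell[g_idx, t_idx]
    share = np.divide(w, W_row, out=np.zeros_like(w), where=W_row > 0)
    return U_centered[g_idx, t_idx] * share


# Two groups x two periods; group 0 has two rows per cell, group 1 has one.
g_idx = np.array([0, 0, 0, 0, 1, 1])
t_idx = np.array([0, 0, 1, 1, 0, 1])
w = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 4.0])
U_centered = np.array([[0.2, -0.5], [-0.2, 0.5]])

psi = expand_cell_mass_to_obs(U_centered, g_idx, t_idx, w)
# Summing psi within a group recovers that group's total mass.
group_totals = np.bincount(g_idx, weights=psi, minlength=2)

# Remove group 1's period-1 row: that cell is now empty but still carries 0.5.
keep = ~((g_idx == 1) & (t_idx == 1))
try:
    expand_cell_mass_to_obs(U_centered, g_idx[keep], t_idx[keep], w[keep])
    raised = False
except ValueError:
    raised = True
```

On the complete panel the per-group sums of `psi` match the row sums of `U_centered`, which is the telescoping property that makes the PSU=group case agree with the group-level allocator. Once a cell with non-zero mass loses all its rows, the total can no longer be preserved, which is why raising is preferable to returning a finite but understated SE.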
