
Commit b67c873

igerber and claude committed
Round-12 CI P3s: suppress warning on varying-PSU + drop bootstrap psu= workaround text
**P3 #1 (warning predicate inconsistent with "strictly coarser PSU" contract):** the new bootstrap warning block's comment said the warning fires only on strictly-coarser PSU designs, but the predicate `n_psu_eff_warn < n_groups_eff_warn` could also fire on supported varying-PSU designs whose eligible groups happened to share PSU labels across groups. Detect within-group-varying PSU explicitly (`.groupby("g")["p"].nunique().gt(1).any()`) and suppress the warning in that regime. Under auto-inject PSU=group and under within-group-varying PSU the warning now stays silent, matching the stated contract.

**P3 #2 (`_unroll_target_to_cells` suggested `psu=<group_col>` as a bootstrap workaround):** the Registry / CHANGELOG already clarified that `psu=<group_col>` is ONLY a Binder TSL workaround; the cell-level wild PSU bootstrap has no allocator fallback. The helper's docstring and `ValueError` message still advertised it as a bootstrap-path workaround. Dropped that suggestion and explicitly clarified: the varying-PSU bootstrap IS the cell-level path, so there is no legacy-allocator alternative to fall back to — pre-processing the panel is the only workaround on the bootstrap side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
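The predicate change described in P3 #1 can be illustrated in isolation. The following is a minimal sketch, not the library's code: the function names `psu_varies_within_group` and `should_warn_strictly_coarser` are hypothetical, and the inputs are assumed to be already restricted to positive-weight observations of eligible groups. Only the `groupby("g")["p"].nunique().gt(1).any()` chain is taken from the commit itself.

```python
import numpy as np
import pandas as pd


def psu_varies_within_group(group_ids, psu_labels):
    """True if any group carries more than one distinct PSU label."""
    frame = pd.DataFrame(
        {"g": np.asarray(group_ids), "p": np.asarray(psu_labels)}
    )
    # Same detection chain as quoted in the commit message.
    return bool(frame.groupby("g")["p"].nunique().gt(1).any())


def should_warn_strictly_coarser(group_ids, psu_labels):
    """Hypothetical stand-in for the revised warning predicate."""
    n_groups = len(np.unique(group_ids))
    n_psu = len(np.unique(psu_labels))
    # Old predicate: n_psu < n_groups only.
    # New predicate: additionally require within-group-constant PSU.
    return n_psu < n_groups and not psu_varies_within_group(
        group_ids, psu_labels
    )


groups = [1, 1, 2, 2, 3, 3]

# PSU = group: as many PSUs as groups, so no warning.
psu_equals_group = [1, 1, 2, 2, 3, 3]

# Strictly coarser: groups 1 and 2 share PSU 10, no group spans two PSUs.
psu_coarser = [10, 10, 10, 10, 20, 20]

# Within-group-varying: every group spans PSUs "a" and "b", so there are
# only 2 PSU labels for 3 groups, yet the design is not coarser-than-group.
psu_varying = ["a", "b", "a", "b", "a", "b"]

print(should_warn_strictly_coarser(groups, psu_equals_group))  # False
print(should_warn_strictly_coarser(groups, psu_coarser))       # True
print(should_warn_strictly_coarser(groups, psu_varying))       # False
```

The third design is the case the old predicate misclassified: it has fewer PSU labels than groups, so a bare count comparison would have fired the warning.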
1 parent a520c52 commit b67c873

2 files changed

Lines changed: 48 additions & 22 deletions


diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 35 additions & 16 deletions
@@ -2125,32 +2125,34 @@ def fit(
     "instead of replicate weights."
 )
 
-# Warning fires only when PSU is strictly coarser than
-# group (multiple eligible groups share a PSU label).
-# Under auto-inject psu=group or PSU that varies within
-# group (each group contributes to >= 1 PSU), the
-# warning should NOT fire because the Hall-Mammen
-# wild PSU bootstrap is either identical to a group-
-# level multiplier bootstrap (PSU=group) or finer-than-
-# group (varying PSU; the cell-level allocator honors
-# the per-cell PSU structure). Count unique PSUs
-# across ALL positive-weight obs of eligible groups,
-# not just the first label per group — under varying
-# PSU a group spans multiple PSUs.
+# Warning fires only when PSU is **strictly coarser
+# than group** on an otherwise within-group-constant
+# design (multiple eligible groups share a PSU label
+# but no group spans more than one PSU). Two regimes
+# are explicitly excluded:
+# - PSU=group (auto-inject default): identity-map
+# fast path — no warning needed.
+# - Within-group-varying PSU: the cell-level
+# allocator honors the per-cell PSU structure;
+# "n_psu < n_groups" is expected whenever cells
+# of a group share a PSU with cells of another
+# group, which does not indicate coarser-than-group
+# clustering in the Hall-Mammen sense.
+# Count unique PSUs across ALL positive-weight obs of
+# eligible groups AND detect within-group-varying
+# PSU; suppress the warning in that regime.
 psu_arr_warn = getattr(resolved_survey, "psu", None)
 if psu_arr_warn is None or _obs_survey_info is None:
     # No PSU info — can't compare to group count.
     n_psu_eff_warn, n_groups_eff_warn = -1, -1
+    psu_varies_within_warn = False
 else:
     obs_gids_warn = np.asarray(_obs_survey_info["group_ids"])
     obs_ws_warn = np.asarray(
         _obs_survey_info["weights"], dtype=np.float64
     )
     pos_mask_warn = obs_ws_warn > 0
     psu_codes_warn = np.asarray(psu_arr_warn)
-    # Restrict to positive-weight obs whose group is
-    # variance-eligible, then count unique PSU labels
-    # across that full set (not first-per-group).
     eligible_gid_set = set(_eligible_group_ids)
     elig_obs_mask_warn = pos_mask_warn & np.array(
         [g in eligible_gid_set for g in obs_gids_warn],
@@ -2162,9 +2164,26 @@ def fit(
             len(np.unique(elig_psu_labels_arr))
         )
         n_groups_eff_warn = len(_eligible_group_ids)
+        # Detect within-group-varying PSU on the
+        # eligible subset so we can suppress the
+        # "strictly coarser PSU" warning there.
+        psu_varies_within_warn = bool(
+            pd.DataFrame({
+                "g": obs_gids_warn[elig_obs_mask_warn],
+                "p": elig_psu_labels_arr,
+            })
+            .groupby("g")["p"]
+            .nunique()
+            .gt(1)
+            .any()
+        )
     else:
         n_psu_eff_warn, n_groups_eff_warn = -1, -1
-if 0 <= n_psu_eff_warn < n_groups_eff_warn:
+        psu_varies_within_warn = False
+if (
+    0 <= n_psu_eff_warn < n_groups_eff_warn
+    and not psu_varies_within_warn
+):
     warnings.warn(
         f"Bootstrap with survey_design uses Hall-Mammen "
         f"wild multiplier weights at the PSU level "
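The comments in this hunk insist on counting unique PSUs across all positive-weight observations of eligible groups, not the first label per group. A small sketch of why the two counts can disagree is given below. Both function names are hypothetical, `np.isin` is swapped in for the list comprehension used in the diff, and the toy arrays are made up for illustration.

```python
import numpy as np


def eligible_mask(group_ids, weights, eligible_groups):
    """Positive-weight observations whose group is variance-eligible."""
    group_ids = np.asarray(group_ids)
    weights = np.asarray(weights, dtype=np.float64)
    return (weights > 0) & np.isin(group_ids, list(eligible_groups))


def count_psus_all_obs(group_ids, psu_labels, weights, eligible_groups):
    """Count distinct PSU labels over the full eligible subset."""
    mask = eligible_mask(group_ids, weights, eligible_groups)
    return len(np.unique(np.asarray(psu_labels)[mask]))


def count_psus_first_per_group(group_ids, psu_labels, weights, eligible_groups):
    """Count distinct PSU labels using only each group's first label.

    Shown only for contrast: this undercounts when a group spans
    several PSUs.
    """
    mask = eligible_mask(group_ids, weights, eligible_groups)
    g = np.asarray(group_ids)[mask]
    p = np.asarray(psu_labels)[mask]
    # return_index gives the position of each group's first occurrence.
    _, first_idx = np.unique(g, return_index=True)
    return len(np.unique(p[first_idx]))


group_ids = [1, 1, 2, 2, 3]
psu_labels = [10, 11, 10, 12, 13]
weights = [1.0, 1.0, 1.0, 0.0, 1.0]   # fourth observation has zero weight
eligible = {1, 2}                      # group 3 is not variance-eligible

print(count_psus_all_obs(group_ids, psu_labels, weights, eligible))          # 2
print(count_psus_first_per_group(group_ids, psu_labels, weights, eligible))  # 1
```

Group 1 spans PSUs 10 and 11, so the first-label count collapses the design to a single PSU while the full count sees two.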

diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py

Lines changed: 13 additions & 6 deletions
@@ -687,10 +687,14 @@ def _unroll_target_to_cells(
 bootstrap contribution. The analytical TSL path shares the same
 cell-period allocator and fires a matching guard in
 ``_survey_se_from_group_if``, so both paths reject this regime
-consistently. Documented workarounds: pre-process the panel
-(drop late-exit groups or trim to a balanced sub-panel), or use
-an explicit ``psu=<group_col>`` so both analytical and bootstrap
-paths route through the legacy group-level allocator.
+consistently. **Documented workaround (bootstrap path):**
+pre-process the panel to remove terminal missingness (drop
+late-exit groups or trim to a balanced sub-panel). The within-
+group-varying-PSU bootstrap has no allocator fallback — unlike
+Binder TSL, where using an explicit ``psu=<group_col>`` routes
+the analytical path through the legacy group-level allocator;
+that fallback is not available on the bootstrap side because
+the cell-level wild PSU bootstrap IS the varying-PSU regime.
 
 Returns ``(u_cell, psu_cell)`` of shape
 ``(n_valid_cells_in_target,)`` each.
@@ -738,8 +742,11 @@ def _unroll_target_to_cells(
         "the same regime, so both paths reject this panel "
         "consistently. Pre-process the panel to remove terminal "
         "missingness (drop late-exit groups or trim to a balanced "
-        "sub-panel), or use an explicit `psu=<group_col>` so both "
-        "paths route through the legacy group-level allocator."
+        "sub-panel). The within-group-varying-PSU bootstrap has "
+        "no allocator fallback — unlike Binder TSL, switching to "
+        "`psu=<group_col>` does not help here because the varying-"
+        "PSU bootstrap IS the cell-level path, not an analytical "
+        "surface with a legacy-allocator alternative."
     )
 return flat_u[mask], flat_psu[mask].astype(np.int64, copy=False)
 
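The revised docstring and error message leave pre-processing the panel as the only bootstrap-side workaround. The sketch below shows one way the two suggested options might look in pandas. It is an assumption-laden illustration: the helper names, the column names `g` and `t`, and the reading of "late-exit group" as "a group whose observations stop before the panel's final period" are not taken from the library.

```python
import pandas as pd


def drop_groups_ending_early(df, group_col, time_col):
    """Keep only groups observed through the panel's final period."""
    final_period = df[time_col].max()
    # Last observed period of each row's group, aligned to the rows.
    group_last = df.groupby(group_col)[time_col].transform("max")
    return df[group_last == final_period].copy()


def trim_to_common_window(df, group_col, time_col):
    """Keep only periods up to the earliest group exit.

    This removes terminal missingness only; interior gaps within a
    group would need separate handling.
    """
    cutoff = df.groupby(group_col)[time_col].max().min()
    return df[df[time_col] <= cutoff].copy()


# Toy panel: groups A and B run through period 4, group C stops at 2.
panel = pd.DataFrame(
    {
        "g": list("AAAABBBBCC"),
        "t": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    }
)

kept = drop_groups_ending_early(panel, "g", "t")
trimmed = trim_to_common_window(panel, "g", "t")

print(sorted(kept["g"].unique()))  # ['A', 'B']
print(len(trimmed))                # 6
```

Dropping groups preserves the full time window at the cost of sample size, while trimming keeps every group but shortens the panel; which trade-off is acceptable depends on the analysis.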
