Skip to content

Commit 082368a

Browse files
igerberclaude
andcommitted
Address PR #363 R1 review (1 P0 + 2 P1 + 1 P3)
R1 P0 (methodology): cband_low/high now NaN-gated for horizons with se <= 0 or non-finite, matching the safe_inference contract for pointwise output. Previously a horizon with se=0 produced a finite band equal to the point estimate, overstating precision. R1 P1 (methodology): sup-t multiplier bootstrap now applies stratum-centered + small-sample-corrected PSU aggregation before the matmul, so Var_xi(xi @ Psi_psu_scaled) reproduces the analytical Binder-TSL variance term-for-term (V = sum_h (1-f_h)(n_h/(n_h-1)) sum_j (psi_hj - psi_h_bar)^2). Previously the bootstrap omitted centering and the small-sample correction; under stratified designs the critical value diverged from the analytical target. Verified at H=1 G=400 n_strata=4: bootstrap q=1.985 matches Phi^-1(0.975)=1.960. R1 P1 (code quality): event-study weights= filtering now translates data_filtered.index LABELS to POSITIONAL offsets via data.index.get_indexer, so non-RangeIndex inputs work correctly. Previous code treated index labels as positions and silently broke on custom int / string indices. Raises a clear ValueError when index alignment fails (duplicate or malformed labels). R1 P3 (docs): HeterogeneousAdoptionDiDEventStudyResults docstring updated to describe Phase 4.5 B weighted/survey support + new cband_* fields + variance_formula / effective_dose_mean; fit() docstring updated to reflect the Phase 4.5 B dispatch matrix and document the new cband kwarg. Regression tests: - test_zero_se_horizon_nan_gates_cband locks the P0 NaN gate - test_weights_nonrange_index_aligned_positionally locks the P1 row-order contract across RangeIndex vs custom int indices - Existing TestSupTReducesToNormalAtH1 covers the stratum-centered path via the unit-PSU reduction case Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a0063b8 commit 082368a

2 files changed

Lines changed: 491 additions & 166 deletions

File tree

diff_diff/had.py

Lines changed: 168 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -537,12 +537,18 @@ class HeterogeneousAdoptionDiDEventStudyResults:
537537
:class:`HeterogeneousAdoptionDiDResults.att` for the per-design
538538
formula, applied to ``ΔY_t = Y_{g,t} - Y_{g,F-1}``).
539539
se : np.ndarray, shape (n_horizons,)
540-
Per-horizon standard error on the beta-scale. Each horizon uses
541-
the INDEPENDENT per-period sandwich from the chosen design path
542-
(continuous: CCT-2014 robust divided by ``|den|``; mass-point:
543-
structural-residual 2SLS sandwich). Pointwise CIs only — joint
544-
cross-horizon covariance is not computed in Phase 2b (paper
545-
reports pointwise CIs per Pierce-Schott).
540+
Per-horizon standard error on the beta-scale. On unweighted fits
541+
each horizon uses the INDEPENDENT per-period sandwich from the
542+
chosen design path (continuous: CCT-2014 robust divided by
543+
``|den|``; mass-point: structural-residual 2SLS sandwich). On
544+
weighted fits (``weights=`` shortcut or ``survey=``) each horizon
545+
uses the Binder (1983) Taylor-series linearization via
546+
:func:`compute_survey_if_variance` on the per-unit β̂-scale IF
547+
(continuous + mass-point both route through the same helper).
548+
Pointwise CIs are always populated; a simultaneous confidence
549+
band is available only on the weighted path via ``cband_*``
550+
below. Joint cross-horizon analytical covariance is not computed
551+
in this release (tracked in TODO.md).
546552
t_stat, p_value : np.ndarray, shape (n_horizons,)
547553
Per-horizon inference triple element.
548554
conf_int_low, conf_int_high : np.ndarray, shape (n_horizons,)
@@ -587,8 +593,44 @@ class HeterogeneousAdoptionDiDEventStudyResults:
587593
cluster_name : str or None
588594
Column name of the cluster variable when cluster-robust SE is
589595
requested. ``None`` otherwise.
590-
survey_metadata : object or None
591-
Always ``None`` in Phase 2b. Field shape kept for future-compat.
596+
survey_metadata : SurveyMetadata or None
597+
Repo-standard survey metadata dataclass from
598+
:class:`diff_diff.survey.SurveyMetadata`. ``None`` when
599+
``fit()`` was called without ``survey=`` or ``weights=``;
600+
populated on the weighted event-study path (Phase 4.5 B). See
601+
:class:`HeterogeneousAdoptionDiDResults.survey_metadata` for
602+
the attribute contract.
603+
variance_formula : str or None
604+
Per-horizon variance family (applied uniformly across horizons).
605+
``"pweight"`` / ``"pweight_2sls"`` on the ``weights=`` shortcut,
606+
``"survey_binder_tsl"`` / ``"survey_binder_tsl_2sls"`` on the
607+
``survey=`` path. ``None`` on unweighted fits.
608+
effective_dose_mean : float or None
609+
Weighted denominator used by the β̂-scale rescaling (continuous
610+
paths: weighted sample mean of ``d`` or ``d - d_lower``;
611+
mass-point: weighted Wald-IV dose gap). ``None`` on unweighted
612+
fits.
613+
cband_low, cband_high : np.ndarray or None, shape (n_horizons,)
614+
Simultaneous confidence-band endpoints constructed by the
615+
multiplier-bootstrap sup-t procedure. ``None`` on unweighted
616+
fits and when ``fit(..., cband=False)`` is passed. Horizons
617+
with ``se <= 0`` or non-finite ``se`` are NaN (matches the
618+
pointwise inference gate from ``safe_inference``).
619+
cband_crit_value : float or None
620+
Sup-t multiplier-bootstrap critical value at level
621+
``1 - alpha``. Under a trivial resolved design (no strata /
622+
PSU / FPC) at ``H=1`` reduces to ``Φ⁻¹(1 − alpha/2) ≈ 1.96``
623+
up to Monte Carlo error; under stratified designs the helper
624+
applies PSU-aggregation + stratum-demeaning + ``sqrt(n_h /
625+
(n_h - 1))`` small-sample correction so the bootstrap
626+
variance matches the analytical Binder-TSL target term-for-
627+
term.
628+
cband_method : str or None
629+
``"multiplier_bootstrap"`` on the weighted event-study path
630+
with ``cband=True``, else ``None``.
631+
cband_n_bootstrap : int or None
632+
Number of multiplier-bootstrap replicates used to compute the
633+
sup-t critical value.
592634
bandwidth_diagnostics : list[BandwidthResult] or None
593635
Per-horizon bandwidth diagnostics on the continuous paths;
594636
``None`` on the mass-point path. When non-None, aligned with
@@ -2023,27 +2065,79 @@ def _sup_t_multiplier_bootstrap(
20232065
psu_weights, psu_ids = generate_survey_multiplier_weights_batch(
20242066
n_bootstrap, resolved_survey, bootstrap_weights, rng
20252067
)
2068+
# Aggregate Psi to PSU level, stratum-demean, and apply the
2069+
# small-sample correction so Var_xi(xi @ Psi_psu_scaled) matches
2070+
# the analytical Binder-TSL variance exactly (review R1 P1).
2071+
# Target:
2072+
# V = sum_h (1 - f_h) (n_h / (n_h - 1)) sum_j (psi_hj - psi_h_bar)²
2073+
# ``generate_survey_multiplier_weights_batch`` already bakes the
2074+
# (1 - f_h) FPC factor into the multipliers, so we only need to
2075+
# pre-process Psi at the PSU level (aggregate → stratum-demean →
2076+
# sqrt(n_h / (n_h - 1)) rescale).
2077+
n_psu = int(psu_weights.shape[1])
2078+
psu_id_to_col = {int(p): c for c, p in enumerate(psu_ids)}
2079+
Psi_psu = np.zeros((n_psu, n_horizons), dtype=np.float64)
20262080
if resolved_survey.psu is not None:
2027-
unit_psu = resolved_survey.psu
2028-
psu_id_to_col = {int(p): c for c, p in enumerate(psu_ids)}
2029-
unit_to_psu_col = np.array([psu_id_to_col[int(unit_psu[i])] for i in range(n_units)])
2081+
unit_psu = np.asarray(resolved_survey.psu)
2082+
for i in range(n_units):
2083+
col = psu_id_to_col[int(unit_psu[i])]
2084+
Psi_psu[col] += influence_matrix[i]
2085+
else:
2086+
# Each unit is its own PSU (psu_ids = np.arange(n_units)).
2087+
Psi_psu = influence_matrix.copy()
2088+
2089+
if resolved_survey.strata is not None:
2090+
strata = np.asarray(resolved_survey.strata)
2091+
# Build PSU -> stratum map (strata constant-within-PSU by
2092+
# SurveyDesign.resolve contract).
2093+
psu_stratum = np.empty(n_psu, dtype=strata.dtype)
2094+
if resolved_survey.psu is not None:
2095+
seen = np.zeros(n_psu, dtype=bool)
2096+
unit_psu = np.asarray(resolved_survey.psu)
2097+
for i in range(n_units):
2098+
col = psu_id_to_col[int(unit_psu[i])]
2099+
if not seen[col]:
2100+
psu_stratum[col] = strata[i]
2101+
seen[col] = True
2102+
else:
2103+
psu_stratum = strata.copy()
2104+
2105+
for h in np.unique(psu_stratum):
2106+
mask_h = psu_stratum == h
2107+
n_h = int(mask_h.sum())
2108+
if n_h < 2:
2109+
# Singleton / empty stratum contributes 0 variance
2110+
# regardless; the helper's lonely-PSU logic already
2111+
# zeros those multipliers. Skip centering to avoid
2112+
# a divide-by-zero on sqrt(n_h / (n_h - 1)).
2113+
continue
2114+
Psi_psu[mask_h] -= Psi_psu[mask_h].mean(axis=0, keepdims=True)
2115+
Psi_psu[mask_h] *= np.sqrt(n_h / (n_h - 1))
20302116
else:
2031-
unit_to_psu_col = np.arange(n_units)
2032-
all_bootstrap_weights = psu_weights[:, unit_to_psu_col] # (B, G)
2117+
# Single implicit stratum — demean across all PSUs, scale by
2118+
# sqrt(n_psu / (n_psu - 1)).
2119+
if n_psu >= 2:
2120+
Psi_psu -= Psi_psu.mean(axis=0, keepdims=True)
2121+
Psi_psu *= np.sqrt(n_psu / (n_psu - 1))
2122+
2123+
# PSU-level perturbations: (B, H) = (B, n_psu) @ (n_psu, H).
2124+
# No (1/n) prefactor — Psi_psu_scaled is already on the θ̂-scale
2125+
# matched to the analytical variance.
2126+
with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
2127+
perturbations = psu_weights @ Psi_psu # (B, H)
20332128
else:
20342129
all_bootstrap_weights = generate_bootstrap_weights_batch(
20352130
n_bootstrap, n_units, bootstrap_weights, rng
20362131
) # (B, G)
2037-
2038-
# Perturbations: (B, H) = (B, G) @ (G, H). Matches staggered:373
2039-
# idiom — no (1/n) prefactor; ``psi`` is already on θ̂-scale.
2040-
# Silence divide/invalid/overflow warnings from the matmul — NaN /
2041-
# inf rows from degenerate horizons propagate and are filtered by
2042-
# the finite-mask below, so these are expected at construction time.
2132+
# Unit-level iid multipliers: no stratum centering needed.
2133+
# Var(xi @ Psi) = sum_g psi_g² matches the trivial analytical
2134+
# variance from compute_survey_if_variance at the IF-scale-
2135+
# invariant tolerance (PR #359 convention).
2136+
with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
2137+
perturbations = all_bootstrap_weights @ influence_matrix # (B, H)
2138+
2139+
# t-statistics via per-horizon analytical SE.
20432140
with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
2044-
perturbations = all_bootstrap_weights @ influence_matrix # (B, H)
2045-
2046-
# t-statistics via per-horizon analytical SE.
20472141
safe_se = np.where(
20482142
(se_per_horizon > 0) & np.isfinite(se_per_horizon),
20492143
se_per_horizon,
@@ -2074,8 +2168,13 @@ def _sup_t_multiplier_bootstrap(
20742168
return float("nan"), None, None, n_valid
20752169

20762170
q = float(np.quantile(sup_t_dist[finite_mask], 1.0 - alpha))
2077-
cband_low = att_per_horizon - q * se_per_horizon
2078-
cband_high = att_per_horizon + q * se_per_horizon
2171+
# NaN-gate simultaneous-band endpoints for degenerate horizons the
2172+
# same way ``safe_inference`` gates pointwise output: a horizon with
2173+
# ``se <= 0`` or non-finite ``se`` gets a NaN band instead of the
2174+
# point estimate ± 0, avoiding misleading precision (review R1 P0).
2175+
se_valid_mask = (se_per_horizon > 0) & np.isfinite(se_per_horizon)
2176+
cband_low = np.where(se_valid_mask, att_per_horizon - q * se_per_horizon, np.nan)
2177+
cband_high = np.where(se_valid_mask, att_per_horizon + q * se_per_horizon, np.nan)
20792178
return q, cband_low, cband_high, n_valid
20802179

20812180

@@ -2693,17 +2792,36 @@ def fit(
26932792
FPC) must be constant within unit (sampling-unit-level
26942793
assignment); within-unit variance raises ``ValueError``.
26952794
Replicate-weight designs raise ``NotImplementedError``
2696-
(Phase 4.5 C). ``design="mass_point"`` and
2697-
``aggregate="event_study"`` raise ``NotImplementedError`` on
2698-
survey/weights (Phase 4.5 B).
2795+
(Phase 4.5 C). Phase 4.5 B support matrix: survey / weights
2796+
are now accepted on ALL design × aggregate combinations
2797+
(continuous × {overall, event-study}, mass-point × {overall,
2798+
event-study}); HAD pretests (``qug_test``, ``stute_test``,
2799+
``yatchew_hr_test``, joint variants,
2800+
``did_had_pretest_workflow``) still don't accept
2801+
survey/weights — deferred to Phase 4.5 C / C0.
26992802
weights : np.ndarray or None
27002803
Per-row sampling weights as a lightweight shortcut equivalent
27012804
to ``survey=SurveyDesign(weights=<col>)``. Produces the same
2702-
ATT; the SE uses lprobust's weighted-robust CCT-2014 formula
2703-
rather than Binder-TSL (no PSU/strata composition). Mutually
2805+
ATT; the SE uses the analytical weighted HC1 sandwich
2806+
(continuous: CCT-2014 weighted-robust; mass-point: pweight
2807+
2SLS sandwich) rather than Binder-TSL. Must be constant
2808+
within each unit; row-order aligned with ``data`` (index
2809+
labels are resolved to positional offsets via
2810+
``data.index.get_indexer``, so custom non-RangeIndex inputs
2811+
work as long as ``data.index`` is unique). Mutually
27042812
exclusive with ``survey=`` — passing both raises
2705-
``ValueError``. Must be constant within each unit (same
2706-
invariant as ``survey=``).
2813+
``ValueError``.
2814+
cband : bool, default True
2815+
Phase 4.5 B: controls the multiplier-bootstrap simultaneous
2816+
confidence band on the weighted event-study path. When
2817+
``True`` (default) and ``aggregate="event_study"`` AND
2818+
``weights=`` or ``survey=`` is supplied, the fit populates
2819+
``cband_low`` / ``cband_high`` / ``cband_crit_value`` /
2820+
``cband_method`` / ``cband_n_bootstrap`` on the result. When
2821+
``False`` those fields stay ``None``. No effect on
2822+
``aggregate="overall"`` or on unweighted event-study.
2823+
``n_bootstrap`` and ``seed`` (constructor params) control
2824+
replicate count and RNG; defaults are 999 / ``None``.
27072825
27082826
Returns
27092827
-------
@@ -3573,11 +3691,25 @@ def _fit_event_study(
35733691
f"weights length ({w_full.shape[0]}) does not match "
35743692
f"data rows ({int(data.shape[0])})."
35753693
)
3576-
# Filter to rows surviving the staggered last-cohort filter.
3577-
# data_filtered.index is the original integer positional index
3578-
# into `data`; use positional slicing via `.iloc` elsewhere,
3579-
# but here `.index` carries the row labels to match.
3580-
w_filtered = w_full[data_filtered.index.to_numpy()]
3694+
# Public ``weights`` contract is ROW-ORDER aligned with
3695+
# ``data``, NOT index-label aligned, so we must translate
3696+
# ``data_filtered``'s surviving index LABELS back to
3697+
# POSITIONAL offsets via ``data.index.get_indexer`` (handles
3698+
# custom int, string, or MultiIndex inputs uniformly; raises
3699+
# on duplicate labels that would make the mapping ambiguous).
3700+
# Review R1 P1: using ``data_filtered.index.to_numpy()`` as
3701+
# positions was a silent-failure vector on non-RangeIndex
3702+
# inputs.
3703+
positional_idx = data.index.get_indexer(data_filtered.index)
3704+
if np.any(positional_idx < 0):
3705+
raise ValueError(
3706+
"Cannot align weights to filtered panel: some "
3707+
"data_filtered rows could not be located in the "
3708+
"original data.index (possible duplicate / malformed "
3709+
"index labels). Pass a DataFrame with a unique index "
3710+
"or reset the index before calling fit()."
3711+
)
3712+
w_filtered = w_full[positional_idx]
35813713
weights_unit = _aggregate_unit_weights(data_filtered, w_filtered, unit_col)
35823714
raw_weights_unit = weights_unit
35833715
elif survey is not None:

0 commit comments

Comments
 (0)