Skip to content

Commit 792997d

Browse files
igerberclaude
andcommitted
Address PR #346 CI review round 7: P1 defer cluster validation + P3 registry refs
**P1 (Code Quality): cluster= must truly be ignored on continuous paths** `HeterogeneousAdoptionDiD.fit()` previously passed `self.cluster` into `_aggregate_first_difference()` before the design was resolved. The aggregator validates the cluster column eagerly (missing column, within-unit variance, NaN ID), so a valid continuous fit could abort just because a shared config supplied an irrelevant `cluster=`. This contradicted the documented "ignored with a warning on continuous paths" contract. Fix: defer cluster extraction until after design resolution. The first aggregation call now passes `cluster_col=None` unconditionally; a second aggregation pass with `cluster_col=cluster_arg` runs only when `resolved_design == "mass_point"`, which is the only path that consumes the extracted cluster array. Continuous paths emit the existing `UserWarning` and proceed to fit without touching the cluster column at all. **P3 (Methodology): registry checklist theorem references were stale** Round 6 fixed the theorem citations in `had.py` and the paper review doc but missed the Phase 2a checklist line in `REGISTRY.md`, which still said "Equation 7 / Theorem 3" for Design 1' identification and "Theorem 4, WAS_{d̲} under Assumption 6" for the continuous-near-d_lower path. Updated the checklist line to match: Theorem 1 / Equation 3 (identification) + Equation 7 (sample estimator) for Design 1'; Theorem 3 / Equation 11 for WAS_{d̲}. **Tests (+4 regression):** - test_missing_cluster_column_on_continuous_only_warns: continuous_at_zero + cluster='does_not_exist' -> warn + fit succeeds. - test_nan_cluster_on_continuous_only_warns: NaN cluster IDs on continuous path -> warn + fit succeeds. - test_within_unit_varying_cluster_on_continuous_only_warns: within-unit- varying cluster IDs on continuous -> warn + fit succeeds. - test_auto_design_ignores_irrelevant_cluster_on_continuous: design='auto' resolving to continuous_at_zero also ignores cluster gracefully. Targeted regression: 145 HAD tests + 524 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent afa0f6c commit 792997d

3 files changed

Lines changed: 80 additions & 4 deletions

File tree

diff_diff/had.py

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1078,16 +1078,21 @@ def fit(
10781078
data, outcome_col, dose_col, time_col, unit_col, first_treat_col
10791079
)
10801080

1081-
# ---- Aggregate to unit-level first differences ----
1082-
d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference(
1081+
# ---- Aggregate to unit-level first differences (no cluster yet) ----
1082+
# Defer cluster validation/extraction until after the design is
1083+
# resolved: the continuous paths ignore cluster= with a warning,
1084+
# so a malformed or irrelevant cluster column must not abort a
1085+
# valid continuous fit. Cluster extraction is re-run below only
1086+
# when resolved_design == "mass_point".
1087+
d_arr, dy_arr, _, _ = _aggregate_first_difference(
10831088
data,
10841089
outcome_col,
10851090
dose_col,
10861091
time_col,
10871092
unit_col,
10881093
t_pre,
10891094
t_post,
1090-
cluster_arg,
1095+
None,
10911096
)
10921097

10931098
n_obs = int(d_arr.shape[0])
@@ -1103,6 +1108,25 @@ def fit(
11031108
else:
11041109
resolved_design = design_arg
11051110

1111+
# ---- Extract cluster IDs (mass-point path only) ----
1112+
# Continuous paths ignore cluster= with a warning emitted later in
1113+
# the dispatch block; the cluster column is not read for them. On
1114+
# the mass-point path we now re-run the aggregation with
1115+
# cluster_col so validation (missing column / NaN / within-unit
1116+
# variance) fires only when cluster is actually going to be used.
1117+
cluster_arr: Optional[np.ndarray] = None
1118+
if resolved_design == "mass_point" and cluster_arg is not None:
1119+
_, _, cluster_arr, _ = _aggregate_first_difference(
1120+
data,
1121+
outcome_col,
1122+
dose_col,
1123+
time_col,
1124+
unit_col,
1125+
t_pre,
1126+
t_post,
1127+
cluster_arg,
1128+
)
1129+
11061130
# ---- Resolve d_lower ----
11071131
if resolved_design == "continuous_at_zero":
11081132
d_lower_val = 0.0

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2316,7 +2316,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step
23162316
- [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4.
23172317
- [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246).
23182318
- [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug.
2319-
- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
2319+
- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
23202320
- [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples.
23212321
- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand.
23222322
- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs.

tests/test_had.py

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1356,6 +1356,58 @@ def test_cluster_name_none_without_cluster(self):
13561356
)
13571357
assert r.cluster_name is None
13581358

1359+
def test_missing_cluster_column_on_continuous_only_warns(self):
1360+
"""Review P1 round 7: irrelevant cluster on continuous path must not
1361+
abort the fit. The cluster column doesn't even need to exist.
1362+
"""
1363+
d, dy = _dgp_continuous_at_zero(200, seed=0)
1364+
panel = _make_panel(d, dy)
1365+
est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="does_not_exist")
1366+
with warnings.catch_warnings(record=True) as w:
1367+
warnings.simplefilter("always")
1368+
r = est.fit(panel, "outcome", "dose", "period", "unit")
1369+
assert any("cluster" in str(warn.message).lower() for warn in w)
1370+
assert np.isfinite(r.att)
1371+
assert r.cluster_name is None
1372+
1373+
def test_nan_cluster_on_continuous_only_warns(self):
1374+
"""NaN cluster IDs on continuous path must not abort the fit."""
1375+
d, dy = _dgp_continuous_at_zero(200, seed=0)
1376+
cluster_unit = np.repeat(np.arange(100).astype(float), 2)
1377+
cluster_unit[0] = np.nan
1378+
panel = _make_panel(d, dy, extra_cols={"state": cluster_unit})
1379+
est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state")
1380+
with warnings.catch_warnings(record=True) as w:
1381+
warnings.simplefilter("always")
1382+
r = est.fit(panel, "outcome", "dose", "period", "unit")
1383+
assert any("cluster" in str(warn.message).lower() for warn in w)
1384+
assert np.isfinite(r.att)
1385+
1386+
def test_within_unit_varying_cluster_on_continuous_only_warns(self):
1387+
"""Within-unit-varying cluster IDs on continuous path must not abort."""
1388+
d, dy = _dgp_continuous_at_zero(200, seed=0)
1389+
panel = _make_panel(d, dy)
1390+
# Varies within unit (distinct value per row)
1391+
panel["state"] = np.arange(len(panel))
1392+
est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state")
1393+
with warnings.catch_warnings(record=True) as w:
1394+
warnings.simplefilter("always")
1395+
r = est.fit(panel, "outcome", "dose", "period", "unit")
1396+
assert any("cluster" in str(warn.message).lower() for warn in w)
1397+
assert np.isfinite(r.att)
1398+
1399+
def test_auto_design_ignores_irrelevant_cluster_on_continuous(self):
1400+
"""design='auto' resolving to a continuous path must also ignore cluster."""
1401+
d, dy = _dgp_continuous_at_zero(500, seed=0)
1402+
panel = _make_panel(d, dy)
1403+
est = HeterogeneousAdoptionDiD(design="auto", cluster="does_not_exist")
1404+
with warnings.catch_warnings(record=True) as w:
1405+
warnings.simplefilter("always")
1406+
r = est.fit(panel, "outcome", "dose", "period", "unit")
1407+
assert any("cluster" in str(warn.message).lower() for warn in w)
1408+
assert r.design == "continuous_at_zero"
1409+
assert np.isfinite(r.att)
1410+
13591411

13601412
# =============================================================================
13611413
# First-difference aggregation helper

0 commit comments

Comments
 (0)