Address PR #346 CI review round 7: P1 defer cluster validation + P3 registry refs

igerber · claude · igerber · commit 792997dd2c40 · 2026-04-20T20:32:29.000-04:00
**P1 (Code Quality): cluster= must truly be ignored on continuous paths**

`HeterogeneousAdoptionDiD.fit()` previously passed `self.cluster` into
`_aggregate_first_difference()` before the design was resolved. The
aggregator validates the cluster column eagerly (missing column,
within-unit variance, NaN ID), so a valid continuous fit could abort
just because a shared config supplied an irrelevant `cluster=`. This
contradicted the documented "ignored with a warning on continuous
paths" contract.

Fix: defer cluster extraction until after design resolution. The
first aggregation call now passes `cluster_col=None` unconditionally;
a second aggregation pass with `cluster_col=cluster_arg` runs only
when `resolved_design == "mass_point"` and `cluster_arg is not None`,
since that is the only path that consumes the extracted cluster array.
Continuous paths emit the
existing `UserWarning` and proceed to fit without touching the
cluster column at all.
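A minimal, stdlib-only sketch of the deferred-validation pattern described above, assuming a two-period panel given as a list of dicts. `aggregate_first_difference` and `fit_sketch` are hypothetical stand-ins, not the real `_aggregate_first_difference` or `HeterogeneousAdoptionDiD.fit()`. The validation checks (missing column, NaN ID, within-unit variance) mirror the three failure modes named in the P1 finding, but their exact form here is an assumption.

```python
import warnings


def aggregate_first_difference(rows, cluster_col=None):
    """Hypothetical stand-in for the private _aggregate_first_difference helper.

    rows: list of dicts with keys "unit", "period" (0 = pre, 1 = post),
    "dose", "outcome", plus optional extra columns. Returns per-unit
    (post-period dose, outcome first difference, cluster IDs or None).
    """
    by_unit = {}
    for r in rows:
        by_unit.setdefault(r["unit"], {})[r["period"]] = r
    units = sorted(by_unit)
    d = [by_unit[u][1]["dose"] for u in units]
    dy = [by_unit[u][1]["outcome"] - by_unit[u][0]["outcome"] for u in units]
    if cluster_col is None:
        return d, dy, None  # the cluster column is never read
    # Eager validation: the checks that used to abort valid continuous fits.
    if any(cluster_col not in r for r in rows):
        raise ValueError(f"cluster column {cluster_col!r} not found")
    cluster = []
    for u in units:
        ids = [by_unit[u][t][cluster_col] for t in (0, 1)]
        if any(i != i for i in ids):  # NaN check: NaN != NaN
            raise ValueError("NaN cluster ID")
        if ids[0] != ids[1]:
            raise ValueError("cluster ID varies within unit")
        cluster.append(ids[0])
    return d, dy, cluster


def fit_sketch(rows, resolved_design, cluster=None):
    """Deferred cluster handling; design resolution itself is elided here."""
    # Pass 1: no cluster, so a malformed cluster column cannot abort the fit.
    d, dy, _ = aggregate_first_difference(rows, cluster_col=None)
    cluster_arr = None
    if resolved_design == "mass_point" and cluster is not None:
        # Pass 2: validate and extract cluster IDs only when they will be used.
        _, _, cluster_arr = aggregate_first_difference(rows, cluster_col=cluster)
    elif cluster is not None:
        warnings.warn("cluster= is ignored on continuous designs", UserWarning)
    return {"n_units": len(d), "cluster": cluster_arr}
```

The cost of this layout is a second aggregation pass on the mass-point path; in exchange, only that path can fail on cluster validation, which is the contract the fix restores.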

**P3 (Methodology): registry checklist theorem references were stale**

Round 6 fixed the theorem citations in `had.py` and the paper review
doc but missed the Phase 2a checklist line in `REGISTRY.md`, which
still said "Equation 7 / Theorem 3" for Design 1' identification and
"Theorem 4, WAS_{d̲} under Assumption 6" for the continuous-near-d_lower
path. Updated the checklist line to match: Theorem 1 / Equation 3
(identification) + Equation 7 (sample estimator) for Design 1'; Theorem
3 / Equation 11 for WAS_{d̲}.
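For reference, the Phase 2a estimator whose citations this round corrects forms `β̂ = (mean(ΔY) - τ̂_bc) / den`, with `den = E[D_2]` for Design 1' and `den = E[D_2 - d̲]` for Design 1. The sketch below assumes `tau_bc` is supplied as a plain number rather than computed by `bias_corrected_local_linear`, and `was_beta_hat` is a hypothetical name. Returning NaN on a zero denominator is also an assumption; the real class routes degenerate cases through `safe_inference`.

```python
from statistics import fmean


def was_beta_hat(d, dy, tau_bc, d_lower=0.0):
    """beta_hat = (mean(dY) - tau_bc) / mean(D_2 - d_lower).

    d_lower=0.0 gives the Design 1' denominator mean(D_2); passing
    d_lower=min(d) gives the Design 1 WAS_{d_lower} denominator.
    """
    den = fmean(x - d_lower for x in d)
    if den == 0:
        return float("nan")  # assumed degenerate-case handling
    return (fmean(dy) - tau_bc) / den
```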

**Tests (+4 regression tests):**
- test_missing_cluster_column_on_continuous_only_warns: continuous_at_zero
  + cluster='does_not_exist' -> warn + fit succeeds.
- test_nan_cluster_on_continuous_only_warns: NaN cluster IDs on continuous
  path -> warn + fit succeeds.
- test_within_unit_varying_cluster_on_continuous_only_warns: within-unit-varying
  cluster IDs on continuous -> warn + fit succeeds.
- test_auto_design_ignores_irrelevant_cluster_on_continuous: design='auto'
  resolving to continuous_at_zero also ignores cluster gracefully.
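For context on the last test, the `design="auto"` rule documented in the Phase 2a registry checklist (strict first match; exact `d.min() == 0` resolves to `continuous_at_zero` unconditionally) can be sketched as below. `detect_design` is a hypothetical stand-in for `diff_diff.had._detect_design`; it uses exact equality for the modal-min share, whereas the real helper may apply a float tolerance, and the labels follow the design names used elsewhere in the registry.

```python
from statistics import median


def detect_design(d, zero_ratio=0.01, modal_threshold=0.02):
    """Strict first-match design detection on post-period doses D_{g,2}."""
    d_min = min(d)
    if d_min == 0:
        # Exact zero: Design 1' regardless of the median (even if median is 0).
        return "continuous_at_zero"
    if d_min < zero_ratio * median(d):
        return "continuous_at_zero"
    # Modal-min check runs only when d.min() > 0.
    share_at_min = sum(1 for x in d if x == d_min) / len(d)
    if share_at_min > modal_threshold:
        return "mass_point"
    return "continuous_near_d_lower"
```

The edge case named in the registry (3% of units at `D=0`, the rest in `Uniform(0.5, 1)`) hits the first branch and resolves to `continuous_at_zero`.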

Targeted regression run: 145 HAD tests, 524 in total across Phase 1 and
adjacent surfaces; all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/had.py b/diff_diff/had.py
@@ -1078,16 +1078,21 @@ def fit(
             data, outcome_col, dose_col, time_col, unit_col, first_treat_col
         )
 
-        # ---- Aggregate to unit-level first differences ----
-        d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference(
+        # ---- Aggregate to unit-level first differences (no cluster yet) ----
+        # Defer cluster validation/extraction until after the design is
+        # resolved: the continuous paths ignore cluster= with a warning,
+        # so a malformed or irrelevant cluster column must not abort a
+        # valid continuous fit. Cluster extraction is re-run below only
+        # when resolved_design == "mass_point".
+        d_arr, dy_arr, _, _ = _aggregate_first_difference(
             data,
             outcome_col,
             dose_col,
             time_col,
             unit_col,
             t_pre,
             t_post,
-            cluster_arg,
+            None,
         )
 
         n_obs = int(d_arr.shape[0])
@@ -1103,6 +1108,25 @@ def fit(
         else:
             resolved_design = design_arg
 
+        # ---- Extract cluster IDs (mass-point path only) ----
+        # Continuous paths ignore cluster= with a warning emitted later in
+        # the dispatch block; the cluster column is not read for them. On
+        # the mass-point path we now re-run the aggregation with
+        # cluster_col so validation (missing column / NaN / within-unit
+        # variance) fires only when cluster is actually going to be used.
+        cluster_arr: Optional[np.ndarray] = None
+        if resolved_design == "mass_point" and cluster_arg is not None:
+            _, _, cluster_arr, _ = _aggregate_first_difference(
+                data,
+                outcome_col,
+                dose_col,
+                time_col,
+                unit_col,
+                t_pre,
+                t_post,
+                cluster_arg,
+            )
+
         # ---- Resolve d_lower ----
         if resolved_design == "continuous_at_zero":
             d_lower_val = 0.0
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2316,7 +2316,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step
 - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4.
 - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246).
 - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug.
-- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
+- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
 - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples.
 - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand.
 - [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs.
diff --git a/tests/test_had.py b/tests/test_had.py
@@ -1356,6 +1356,58 @@ def test_cluster_name_none_without_cluster(self):
         )
         assert r.cluster_name is None
 
+    def test_missing_cluster_column_on_continuous_only_warns(self):
+        """Review P1 round 7: irrelevant cluster on continuous path must not
+        abort the fit. The cluster column doesn't even need to exist.
+        """
+        d, dy = _dgp_continuous_at_zero(200, seed=0)
+        panel = _make_panel(d, dy)
+        est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="does_not_exist")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            r = est.fit(panel, "outcome", "dose", "period", "unit")
+            assert any("cluster" in str(warn.message).lower() for warn in w)
+        assert np.isfinite(r.att)
+        assert r.cluster_name is None
+
+    def test_nan_cluster_on_continuous_only_warns(self):
+        """NaN cluster IDs on continuous path must not abort the fit."""
+        d, dy = _dgp_continuous_at_zero(200, seed=0)
+        cluster_unit = np.repeat(np.arange(100).astype(float), 2)
+        cluster_unit[0] = np.nan
+        panel = _make_panel(d, dy, extra_cols={"state": cluster_unit})
+        est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            r = est.fit(panel, "outcome", "dose", "period", "unit")
+            assert any("cluster" in str(warn.message).lower() for warn in w)
+        assert np.isfinite(r.att)
+
+    def test_within_unit_varying_cluster_on_continuous_only_warns(self):
+        """Within-unit-varying cluster IDs on continuous path must not abort."""
+        d, dy = _dgp_continuous_at_zero(200, seed=0)
+        panel = _make_panel(d, dy)
+        # Varies within unit (distinct value per row)
+        panel["state"] = np.arange(len(panel))
+        est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            r = est.fit(panel, "outcome", "dose", "period", "unit")
+            assert any("cluster" in str(warn.message).lower() for warn in w)
+        assert np.isfinite(r.att)
+
+    def test_auto_design_ignores_irrelevant_cluster_on_continuous(self):
+        """design='auto' resolving to a continuous path must also ignore cluster."""
+        d, dy = _dgp_continuous_at_zero(500, seed=0)
+        panel = _make_panel(d, dy)
+        est = HeterogeneousAdoptionDiD(design="auto", cluster="does_not_exist")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            r = est.fit(panel, "outcome", "dose", "period", "unit")
+            assert any("cluster" in str(warn.message).lower() for warn in w)
+        assert r.design == "continuous_at_zero"
+        assert np.isfinite(r.att)
+
 
 # =============================================================================
 # First-difference aggregation helper