Address codex R6 P3s on HAD: trends_lin already shipped + scope wording

igerber · claude · igerber · commit 16ad99cb7a5a · 2026-05-20T08:18:14.000-04:00
- P3 (Methodology): the promoted HAD materials described the Eq. 17/18 `trends_lin=True` linear-trend-detrended variant as "deferred per Phase 4". This conflated TWO different things:
  (a) the FEATURE, which is shipped via the `trends_lin: bool = False` keyword-only kwarg on HAD.fit(), joint_pretrends_test, and joint_homogeneity_test (PR #389; R-parity locked against DIDHAD::did_had(trends_lin=TRUE) v2.0.0 in test_did_had_parity.py); and
  (b) the PIERCE-SCHOTT NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel, which IS waived per REGISTRY Deviations Note #3.
  Updated 3 surfaces (paper-review L194, METHODOLOGY_REVIEW Eq. 18 Verified-Components row, test_methodology_had.py module docstring + TestHADJointStute class docstring) to distinguish "feature shipped + R-parity locked elsewhere" from "Pierce-Schott numerical replication waived".

- P3 (Documentation/Tests): the TestHADJointStute promotion narrative overstated H1 coverage as "H0 fail-to-reject and H1 reject on linear vs nonlinear DGPs" for both joint_pretrends_test and joint_homogeneity_test. Reality: H1 rejection is tested only on joint_homogeneity_test via a quadratic post-period DGP; joint_pretrends_test gets H0-only coverage in this file (an H1 test would require a violating-pretrends fixture that re-verifies bootstrap calibration already covered by test_had_pretests.py). Narrowed the wording in the METHODOLOGY_REVIEW Verified-Components row + TestHADJointStute class docstring. The CHANGELOG entry is unchanged: its H1 reject claim explicitly cites the homogeneity side via "H1 reject under nonlinear DGP", which is accurate.

All 35 methodology tests pass; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
@@ -697,7 +697,7 @@ and covariate-adjusted specifications.)
 - [x] Eq. 11 / Theorem 3 (`WAS_{d_lower}` under Assumption 6, mass-point path) — `tests/test_methodology_had.py::TestHADTheorem3MassPoint` (5 tests including Wald-IV closed-form equivalence at `atol=1e-9`)
 - [x] Theorem 4 (QUG null test, limit law `T_λ = (λ + E_1) / E_2` under Exp(1)/Exp(1)) — `tests/test_methodology_had.py::TestHADTheorem4QUG` (6 tests; MC distributional match against closed-form `F(t) = t/(1+t)` at KS-stat ≤ 0.05, n_draws=5000)
 - [x] Eq. 29 / Theorem 7 (Yatchew-HR linearity test, paper-literal `σ²_diff = 1/(2G)` normalization) — `tests/test_methodology_had.py::TestHADTheorem7YatchewHR` (6 tests; standard-normal limit, normalization lock, both `null="linearity"` and `null="mean_independence"` modes)
-- [x] Eq. 18 mean-independence variant (joint Stute pre-trends + homogeneity, sum-of-CvMs + shared-η Mammen wild bootstrap) — `tests/test_methodology_had.py::TestHADJointStute` (5 tests; H0 fail-to-reject and H1 reject on linear vs. nonlinear DGPs). Eq. 18 linear-trend-detrended variant deferred per REGISTRY checklist (Phase 4 follow-up, `trends_lin=True`).
+- [x] Eq. 18 joint Stute pre-trends + homogeneity (sum-of-CvMs + shared-η Mammen wild bootstrap; both mean-independence and linearity nulls) — `tests/test_methodology_had.py::TestHADJointStute` (5 tests). Coverage scope: H0 fail-to-reject on `joint_pretrends_test` (mean-independence) and `joint_homogeneity_test` (linearity); H1 rejection demonstrated on `joint_homogeneity_test` via a nonlinear DGP. **Out of scope for the new methodology file:** the `trends_lin=True` linear-trend-detrended variant is SHIPPED in the library (R-parity locked against `DIDHAD::did_had(..., trends_lin=TRUE)` v2.0.0; see REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend detrending shipped)" and `tests/test_did_had_parity.py`) but its methodology-walk-through tests are NOT duplicated in `test_methodology_had.py`. Pierce-Schott NUMERICAL replication against the published p=0.51 anchor on the LBD-restricted panel is the waived item (REGISTRY Deviations Note #3).
 - [x] R parity (`chaisemartin::did_had`) at `atol=1e-8` on 3 DGPs × 5 method combos (bit-exact, `rtol=0`) — `tests/test_did_had_parity.py::TestPointSEParity` + `TestYatchewParity` (5 direct parity tests; YatchewTest closed-form parity at `atol=1e-10`)
 - [x] `nprobust` (Calonico-Cattaneo-Farrell) port at machine precision (`atol=1e-14`) — `tests/test_nprobust_port.py` (7 classes spanning kernel constants, QR-based `(X'X)^{-1}`, three-stage MSE-DPI bandwidth, clustered variance, weighted local-linear, single-eval-point parity)
 - [x] Bandwidth selector (CCF MSE-DPI) at 1% tolerance — `tests/test_bandwidth_selector.py` (8 classes covering public-API wrapper, stage diagnostics)
diff --git a/docs/methodology/papers/dechaisemartin-2026-review.md b/docs/methodology/papers/dechaisemartin-2026-review.md
@@ -191,7 +191,7 @@ Alternative to Stute when `G` is large or heteroskedasticity is suspected.
 - [ ] Warnings for extensive-margin effects / positive mass of untreated (not fatal; suggests running existing DiD). **Status 2026-05-20 (partial):** `qug_test()` filters zero-dose observations upfront with a `UserWarning` naming the exclusion count — surfaces the *presence* of extensive-margin / positive-mass-of-untreated units to users running pre-tests. The paper-language "suggests running existing DiD" recommendation is NOT a separate fit-time warning on the main `HeterogeneousAdoptionDiD.fit()` path; this item remains open as a Low-priority follow-up tracked in `TODO.md`.
 - [x] Documentation of non-testability of Assumptions 5 and 6. **Phase 4 closure (2026-05-20):** `HeterogeneousAdoptionDiD.fit()` emits a `UserWarning` at fit time when `resolved_design ∈ {continuous_near_d_lower, mass_point}` (Design 1 family) explicitly flagging that point identification of `WAS_{d_lower}` requires Assumption 6, sign identification requires Assumption 5, and NEITHER is testable via pre-trends (`diff_diff/had.py`, search for "---- Assumption 5/6 warning on Design 1 paths ----"). The `HeterogeneousAdoptionDiD` class docstring + `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections cross-reference this and explicitly state that the available pre-tests verify ADJACENT identifying conditions: QUG tests the Theorem 4 / Design 1' support-infimum null `d_lower = 0` — adjacent evidence on the `d_lower = 0` clause of Assumption 4 only, NOT a test of full Assumption 4's boundary-density / conditional-mean smoothness / variance regularity statement; the raw `stute_test` / `yatchew_hr_test` helpers test Assumption 8 linearity (residuals from `dy ~ 1 + d`); `joint_pretrends_test` tests Assumption 7 mean-independence (intercept-only residuals via `null_form="mean_independence"`). None of these test Assumptions 5 or 6 directly. The composite workflow verdict string does NOT mention Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on the two-period `aggregate="overall"` path. The Assumption 5/6 caveat is surfaced separately by the Design 1 fit-time `UserWarning` and by T21 tutorial prose.
 - [x] Multi-period event-study extension (Appendix B.2). **Phase 2b implementation (2026-04):** `aggregate="event_study"` returns per-event-time WAS estimates using uniform `F-1` anchor. Staggered-timing contract (see L190 closure for full statement): when `first_treat_col` is supplied, the panel auto-filters to last-cohort + never-treated units with a `UserWarning` per Appendix B.2 prescription; when omitted on a multi-cohort panel, the estimator raises `ValueError` (fail-closed, see REGISTRY § "Library extension: Staggered-timing fail-closed"). Pointwise CIs per horizon (no joint cross-horizon covariance; matches paper's Pierce-Schott Figure 2). Pre-period placebos at `e <= -2`; the anchor `e = -1` is skipped since `ΔY = 0` there by construction.
-- [x] Joint Stute tests (paper Section 4.2 step 2 + Section 4.3 joint extension, pages 23-25 + 32). **Phase 3 follow-up (2026-04):** `stute_joint_pretest()` (residuals-in core) + `joint_pretrends_test()` (mean-independence null) + `joint_homogeneity_test()` (linearity null) in `diff_diff/had_pretests.py`. Sum-of-CvMs aggregation, shared-η Mammen wild bootstrap across horizons (Delgado-Manteiga 2001), per-horizon exact-linear short-circuit. Paper Eq (18) linear-trend detrending variant (Section 5.2 Pierce-Schott p=0.51) deferred to Phase 4 replication harness where the published value serves as parity anchor.
+- [x] Joint Stute tests (paper Section 4.2 step 2 + Section 4.3 joint extension, pages 23-25 + 32). **Phase 3 follow-up (2026-04):** `stute_joint_pretest()` (residuals-in core) + `joint_pretrends_test()` (mean-independence null) + `joint_homogeneity_test()` (linearity null) in `diff_diff/had_pretests.py`. Sum-of-CvMs aggregation, shared-η Mammen wild bootstrap across horizons (Delgado-Manteiga 2001), per-horizon exact-linear short-circuit. **Eq (18) linear-trend detrending variant SHIPPED (PR #389):** the `trends_lin: bool = False` keyword-only kwarg on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test` applies the per-group linear-trend slope `Y[g, F-1] - Y[g, F-2]` adjustment. R parity validated against `DIDHAD::did_had(..., trends_lin=TRUE)` v2.0.0 (`Credible-Answers/did_had`) — see REGISTRY § "Note (Phase 4 — Eq 17 / Eq 18 linear-trend detrending shipped)". The Pierce-Schott (2016) NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel is waived per REGISTRY Deviations Note #3.
 
 **Eq (18) transcription (paper page 31):** The Pierce-Schott linear-trend-detrended joint Stute test of pre-trends reads
 ```
diff --git a/tests/test_methodology_had.py b/tests/test_methodology_had.py
@@ -20,8 +20,13 @@
 - Theorem 4 (QUG):     T_lambda = (lambda + E_1) / E_2 limit law, lambda=0
                          under H_0: d_lower = 0
 - Eq. 18 / (Algorithm): joint Stute pre-trends + homogeneity
-                         (mean-independence variant; Eq. 18 detrending
-                         deferred per REGISTRY checklist)
+                         (mean-independence and linearity nulls).
+                         The trends_lin=True linear-trend-detrended
+                         variant is shipped in the library (R-parity
+                         locked against DIDHAD::did_had(trends_lin=TRUE)
+                         in tests/test_did_had_parity.py) but is
+                         OUT OF SCOPE for this methodology file (no
+                         coverage duplication).
 - Eq. 29 / Theorem 7:  T_hr = sqrt(G) (sigma2_lin - sigma2_diff) / sigma2_W
 
 See:
@@ -701,9 +706,23 @@ class TestHADJointStute:
     The library ships the mean-independence variant in
     ``joint_pretrends_test`` (residuals from OLS Y_t - Y_base ~ 1) and
     the linearity (homogeneity) variant in ``joint_homogeneity_test``
-    (residuals from OLS Y_t - Y_base ~ 1 + D). The Eq. 18
-    linear-trend-detrended variant is deferred per REGISTRY (Phase 4
-    follow-up); this class targets the shipped mean-independence variant.
+    (residuals from OLS Y_t - Y_base ~ 1 + D).
+
+    **Coverage scope of this class:** H0 fail-to-reject is exercised
+    for both ``joint_pretrends_test`` (mean-independence null) and
+    ``joint_homogeneity_test`` (linearity null) on a linear-DGP panel
+    where D is independent of pre-Y; H1 rejection is demonstrated on
+    ``joint_homogeneity_test`` only, via a nonlinear (D + D^2) post-
+    period DGP. An H1 violating-pretrends test for
+    ``joint_pretrends_test`` is not added here (a synthetic
+    correlated-D-vs-pre-Y DGP would re-verify the bootstrap
+    calibration covered by ``test_had_pretests.py``).
+
+    The ``trends_lin=True`` Eq. 17 / Eq. 18 linear-trend-detrended
+    variant is SHIPPED in the library and R-parity-locked against
+    ``DIDHAD::did_had(..., trends_lin=TRUE)`` in
+    ``tests/test_did_had_parity.py`` (3 DGPs x 5 method combos at
+    ``atol=1e-8``). It is OUT OF SCOPE for this methodology file.
     """
 
     def _build_multi_period_panel(