Address PR #392 R5 review (3 P3, all non-blocking)

igerber · claude · igerber · commit eb27bf5914e9 · 2026-04-26T10:23:48.000-04:00
R5 was ✅ Looks good; only P3 polish remained. All addressed:

P3 #1 (exact-pin nprobust): The parity contract runs through nprobust
numerical paths (DIDHAD's local-linear bandwidth + bias-correction
calls), so a fresh regeneration could drift if CRAN serves a newer
nprobust. Pin nprobust == 0.5.0 in both the R generator's stopifnot
guard and the parity test's metadata assertion alongside DIDHAD and
YatchewTest.

P3 #2 (workflow docstring): did_had_pretest_workflow's top-level
docstring still said "Eq 18 linear-trend detrending is a Phase 4
follow-up" which contradicts the shipped trends_lin behavior. Updated
to describe the forwarding contract (trends_lin → joint_pretrends_test
+ joint_homogeneity_test, consumed-placebo skip path on minimal
panels). Same fix on the StuteJointResult class docstring.

P3 #3 (parity test horizon-shape assertions): Added an explicit
"missing in Python" assertion in _zip_r_python: every R-mapped event
time must be present in Python's event_times (catches future
horizon-shape regressions where Python silently drops a horizon R
requested). Added an effects+placebo row-count sanity check in
test_yatchew_t_stat_parity (uses the previously unused effects/placebo
parametrize values to catch fixture drift).

Stats: 540 tests pass, 0 regressions. No estimator/methodology
changes; all P3 polish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
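
For reviewers, the "missing in Python" check from P3 #3 amounts to a set-difference (subset) test rather than an equality test. The following is a minimal standalone sketch of that idea only; `missing_horizons` is a hypothetical helper name used for illustration and does not exist in the test suite, and the event-time values are made up.

```python
from typing import Iterable, List


def missing_horizons(
    r_event_times: Iterable[int], py_event_times: Iterable[int]
) -> List[int]:
    """Return the R-mapped event times that Python did not emit, sorted.

    Hypothetical illustration of the subset check: extra Python horizons
    are allowed, but every R-mapped horizon must have a counterpart.
    """
    return sorted(set(r_event_times) - set(py_event_times))


# Python may emit extra horizons (here an e=0 anchor) without failing.
assert missing_horizons([-2, -1, 1, 2], [-2, -1, 0, 1, 2]) == []

# A horizon that R requested but Python dropped is reported.
assert missing_horizons([-2, 1, 3], [-2, 0, 1]) == [3]
```

A subset test is used instead of length equality because, per the diff comment, Python may emit horizons that R did not request.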
diff --git a/benchmarks/R/generate_did_had_golden.R b/benchmarks/R/generate_did_had_golden.R
@@ -25,13 +25,16 @@ library(jsonlite)
 library(DIDHAD)
 library(YatchewTest)
 
-# PR #392 R4 P3: pin exact upstream versions so future regeneration
-# does not silently re-anchor the goldens to a newer CRAN release
-# while CHANGELOG / REGISTRY / parity test still cite v2.0.0 / SHA
-# `edc09197`. Bump these pins (here AND in the parity test's
-# `test_metadata_versions_match`) when intentionally re-anchoring.
+# PR #392 R4 P3 / R5 P3: pin exact upstream versions so future
+# regeneration does not silently re-anchor the goldens to a newer
+# CRAN release while CHANGELOG / REGISTRY / parity test still cite
+# v2.0.0 / SHA `edc09197`. The parity contract runs through
+# `nprobust` numerical paths so we pin it too. Bump these pins
+# (here AND in the parity test's `test_metadata_versions_match`)
+# when intentionally re-anchoring.
 stopifnot(packageVersion("DIDHAD") == "2.0.0")
 stopifnot(packageVersion("YatchewTest") == "1.1.1")
+stopifnot(packageVersion("nprobust") == "0.5.0")
 
 # -------------------------------------------------------------------------
 # Panel builder: 5-period panel with F=4 (treatment onset at t=4).
diff --git a/diff_diff/had_pretests.py b/diff_diff/had_pretests.py
@@ -433,8 +433,10 @@ class StuteJointResult:
     :func:`joint_pretrends_test` (mean-independence: ``E[Y_t - Y_base | D]
     = mu_t``, design matrix ``[1]``) and :func:`joint_homogeneity_test`
     (linearity: ``E[Y_t - Y_base | D_t] = beta_{0,t} + beta_{fe,t} * D``,
-    design matrix ``[1, D]``). Eq 18 linear-trend detrending (paper
-    Section 5.2 Pierce-Schott application) is a Phase 4 follow-up.
+    design matrix ``[1, D]``). Both wrappers accept a ``trends_lin:
+    bool = False`` keyword-only flag (PR #392): when ``True``, applies
+    paper Eq 17 / Eq 18 linear-trend detrending before the joint CvM
+    using per-group slope ``Y[g, F-1] - Y[g, F-2]``.
 
     Attributes
     ----------
@@ -4322,9 +4324,17 @@ def did_had_pretest_workflow(
     users who need Yatchew robustness under multi-period data should
     call :func:`yatchew_hr_test` on each (base, post) pair manually.
 
-    Eq 18 linear-trend detrending (paper Section 5.2 Pierce-Schott
-    application) is a Phase 4 follow-up; the event-study path here
-    implements the simpler mean-independence / linearity nulls.
+    Eq 17 / Eq 18 linear-trend detrending (paper Section 5.2 Pierce-
+    Schott application) is now SHIPPED on the event-study path via
+    the ``trends_lin`` keyword-only parameter (PR #392 / Phase 4
+    R-parity). When ``trends_lin=True``, this workflow forwards the
+    flag to both :func:`joint_pretrends_test` and
+    :func:`joint_homogeneity_test`; the consumed placebo at
+    ``base_period - 1`` is auto-dropped from step 2 and the workflow
+    skips step 2 (``pretrends_joint=None``) if no earlier placebo
+    survives. Mirrors R ``DIDHAD::did_had(..., trends_lin=TRUE)``.
+    Mutually exclusive with ``aggregate="overall"`` (raises
+    ``NotImplementedError``).
 
     Parameters
     ----------
diff --git a/tests/test_did_had_parity.py b/tests/test_did_had_parity.py
@@ -169,22 +169,38 @@ def _zip_r_python(
     r_result: Dict[str, Any], py_result: Any, trends_lin: bool
 ) -> List[Tuple[int, int, str]]:
     """Build (r_row_idx, py_event_idx, r_rowname) tuples zipping R rows
-    to Python event-time positions for parity assertions."""
+    to Python event-time positions for parity assertions.
+
+    PR #392 R5 P3: also asserts that every R-mapped event time is
+    present in Python's ``event_times`` (a subset check, not
+    equality), so the mapping is total over R's reported rows. This
+    catches future horizon-shape regressions where Python silently
+    drops an event time the R fixture lists."""
     py_event_times = py_result.event_times.tolist()
     py_idx_by_event_time = {int(e): i for i, e in enumerate(py_event_times)}
     pairs = []
     r_event_ids = _as_list(r_result["event_id"])
     r_rownames = _as_list(r_result["rownames"])
+    expected_event_times = []
     for i, (r_id, rowname) in enumerate(zip(r_event_ids, r_rownames)):
         e = _r_id_to_event_time(int(r_id), trends_lin)
+        expected_event_times.append(e)
         if e not in py_idx_by_event_time:
-            # Should not happen for valid fixtures; surface explicit
-            # diagnostic if R reports a row our Python event_times lacks.
             raise AssertionError(
                 f"R row {rowname!r} (ID={r_id}) maps to our e={e}, but "
                 f"Python event_times = {py_event_times}. Mapping bug?"
             )
         pairs.append((i, py_idx_by_event_time[e], rowname))
+    # Subset assertion: every R-mapped event time must be present in
+    # Python's event_times. Length-equality is too strict (Python
+    # may emit additional horizons R didn't request, e.g. e=0 anchor),
+    # but every R row must find a Python counterpart.
+    missing_in_python = set(expected_event_times) - set(py_event_times)
+    assert not missing_in_python, (
+        f"event_times mismatch: R requested {sorted(expected_event_times)} "
+        f"(mapped from R IDs); Python emitted {sorted(py_event_times)}; "
+        f"missing in Python: {sorted(missing_in_python)}."
+    )
     return pairs
 
 
@@ -355,6 +371,20 @@ def test_yatchew_t_stat_parity(
     ):
         r_combo = fixture["fixtures"][dgp_name]["combos"][combo_name]
         r_result = r_combo["result"]
+        # PR #392 R5 P3: assert R's (effects + placebo) row count
+        # matches the parametrize spec; catches fixture drift where
+        # R's effects/placebo args don't drive the expected count.
+        # Under trends_lin, R drops one placebo (consumed); R's auto-
+        # truncation caps at the panel's max (did_het_adoption_main).
+        # Guarded so a missing yatchew_t reaches pytest.fail below.
+        if "yatchew_t" in r_result:
+            n_yatchew_rows = len(_as_list(r_result["yatchew_t"]))
+            expected_rows = effects + placebo - (1 if trends_lin else 0)
+            assert n_yatchew_rows == expected_rows, (
+                f"R fixture row count for {combo_name} = {n_yatchew_rows}, "
+                f"expected effects+placebo{'-1' if trends_lin else ''} = "
+                f"{expected_rows}; fixture/combo spec drift?"
+            )
         if "yatchew_t" not in r_result:
             pytest.fail(
                 f"{combo_name} expected to have yatchew_t in fixture; "
@@ -446,6 +476,15 @@ def test_metadata_versions_match(self, fixture):
             f"{meta['yatchewtest_version']!r}; the parity test pins exactly "
             f"1.1.1. Regenerate after bumping the pin."
         )
+        # PR #392 R5 P3: nprobust is on the parity contract path
+        # (DIDHAD's local-linear bandwidth + bias-correction calls go
+        # through it), so pin it exactly too. Bump in lockstep with
+        # the generator's stopifnot guards.
+        assert meta["nprobust_version"] == "0.5.0", (
+            f"Fixture was generated against nprobust="
+            f"{meta['nprobust_version']!r}; the parity test pins exactly "
+            f"0.5.0. Regenerate after bumping the pin."
+        )
 
     def test_metadata_n_dgps(self, fixture):
         meta = fixture["metadata"]