Address PR #353 CI review round 5 (1 P1 + 1 P3)

igerber · claude · igerber · commit e3f7450022fc · 2026-04-23T21:01:46.000-04:00
P1 - stute_joint_pretest G < _MIN_G_STUTE warn+NaN contract:
The joint core raised `ValueError` on G < 10, while single-horizon
`stute_test` emits a `UserWarning` and returns a NaN result on the
same condition. Because the event-study workflow dispatches into
the joint core for both step-2 pre-trends and step-3 homogeneity,
a staggered panel whose last-cohort auto-filter leaves fewer than
10 units would now crash the workflow instead of surfacing an
inconclusive report - a regression versus Phase 3's two-period
behavior.

Fix: mirror the single-horizon contract. Emit `UserWarning`
("below the minimum ... Returning NaN result") and return a
`StuteJointResult` with `cvm_stat_joint=nan`, `p_value=nan`,
`reject=False`, and a full-NaN `per_horizon_stats` dict keyed by
the validated horizon labels (so the diagnostic surface is
consistent with the NaN-propagation branch). `n_bootstrap <
_MIN_N_BOOTSTRAP` and non-numeric `alpha` still raise; only the
small-G branch relaxes.
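
The contract described above can be illustrated with a minimal,
self-contained sketch. It is not the library code: `JointResultSketch`
and `joint_test_sketch` are hypothetical stand-ins for
`StuteJointResult` and `stute_joint_pretest`, the result is trimmed to
four fields, and the real CvM computation is omitted.

```python
import math
import warnings
from dataclasses import dataclass

MIN_G = 10  # stand-in for _MIN_G_STUTE (the commit message gives 10)


@dataclass
class JointResultSketch:
    """Hypothetical, trimmed-down stand-in for StuteJointResult."""

    cvm_stat_joint: float
    p_value: float
    reject: bool
    per_horizon_stats: dict


def joint_test_sketch(residuals_by_horizon, n_units):
    """Toy joint test that warns and returns NaN when n_units < MIN_G."""
    # Validate/stringify labels first so the NaN result carries every key.
    labels = [str(k) for k in residuals_by_horizon]
    if n_units < MIN_G:
        warnings.warn(
            f"G = {n_units} is below the minimum {MIN_G}. "
            "Returning NaN result.",
            UserWarning,
            stacklevel=2,
        )
        return JointResultSketch(
            cvm_stat_joint=math.nan,
            p_value=math.nan,
            reject=False,
            per_horizon_stats={k: math.nan for k in labels},
        )
    # The real statistic and bootstrap are out of scope for this sketch.
    raise NotImplementedError("large-G branch omitted in this sketch")
```

A caller then sees a warning plus a NaN-filled result keyed by horizon
label, rather than an exception.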

Test updates:
- `test_small_G_raises` renamed to `test_small_G_warns_returns_nan`
  and rewritten to assert the new contract.
- New
  `test_event_study_small_panel_after_filter_inconclusive_not_crash`
  covers the workflow-level regression: a staggered fixture with 40
  early-cohort + 6 late-cohort units filters to G=6 after the
  validator's last-cohort auto-filter;
  `did_had_pretest_workflow(aggregate="event_study")` now completes
  without exception, emits the "below the minimum" warning, and
  surfaces a NaN joint-Stute report with `all_pass=False`.
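
Why a NaN p-value yields `all_pass=False` can be sketched as follows.
This assumes the workflow gate is a conjunction of "p-value is finite"
and "test did not reject"; `all_pass_sketch` is a hypothetical helper,
not the workflow's actual code, and it uses `math.isfinite` in place of
`np.isfinite` to stay dependency-free.

```python
import math


def all_pass_sketch(results):
    """Hypothetical gate: every test must be conclusive and not reject.

    `results` is an iterable of (p_value, reject) pairs.
    """
    return all(math.isfinite(p) and not reject for p, reject in results)


conclusive = [(0.40, False), (0.62, False)]
inconclusive = [(0.40, False), (math.nan, False)]  # small-G NaN result

print(all_pass_sketch(conclusive))    # True
print(all_pass_sketch(inconclusive))  # False
```

Under that assumption, `reject=False` on the NaN result does not read
as a pass, because the non-finite p-value fails the gate first.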

P3 - module docstring refresh:
`had_pretests.py` top-level docstring still said Phase 3 shipped
steps 1 + 3 only, that step 2 was deferred, and that
`did_had_pretest_workflow` was a two-period-only entry point. That
went stale after the joint-pretest follow-up landed. Rewrote the
docstring to describe: (a) the three single-horizon tests, (b) the
three new joint helpers (`stute_joint_pretest`,
`joint_pretrends_test`, `joint_homogeneity_test`), (c) both
workflow dispatch modes (`aggregate="overall"` two-period and
`aggregate="event_study"` multi-period), and (d) the narrowed
deferment - only Eq. 18 linear-trend detrending remains, tracked
in TODO for Phase 4 alongside the Pierce-Schott replication.

126 tests pass (125 existing + 1 new R5 workflow regression; 1
existing test converted from raise to warn, none removed);
black/ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/had_pretests.py b/diff_diff/had_pretests.py
@@ -2,8 +2,9 @@
 
 Paper Section 4 (de Chaisemartin, Ciccia, D'Haultfoeuille, Knau 2026,
 arXiv:2405.04465v6) prescribes a four-step pre-testing workflow for TWFE
-validity in HADs. Phase 3 ships steps 1 and 3 of that workflow (step 2 is
-deferred):
+validity in HADs. This module ships the tests and the composite workflow:
+
+Single-horizon tests:
 
 1. :func:`qug_test` - order-statistic ratio test of the support infimum
    ``H_0: d_lower = 0`` (paper Theorem 4). Closed-form, tuning-free.
@@ -14,16 +15,43 @@
    linearity test (paper Theorem 7 / Equation 29). Feasible at
    ``G >= 100k``.
 
-The composite :func:`did_had_pretest_workflow` runs the three implemented
-tests in sequence on a two-period HAD panel and returns a
-:class:`HADPretestReport` with a partial-workflow verdict. When all three
-fail-to-reject, the verdict explicitly flags that **the paper's step 2
-pre-trends test (Assumption 7) is NOT run** — callers do not receive an
-unconditional "TWFE safe" signal; the Assumption 7 check must be performed
-separately (e.g., via an event-study / placebo analysis) until the Phase 3
-follow-up patch lands the joint Equation 18 cross-horizon Stute variant.
-
-See ``docs/methodology/REGISTRY.md`` and ``TODO.md`` for the deferred items.
+Joint / multi-period tests (Phase 3 follow-up):
+
+4. :func:`stute_joint_pretest` - residuals-in core that generalizes the
+   single-horizon Stute CvM to K horizons with shared-η wild bootstrap
+   and sum-of-CvMs aggregation (Delgado 1993; Escanciano 2006).
+5. :func:`joint_pretrends_test` - data-in wrapper for the mean-
+   independence null (paper step 2 pre-trends across pre-period
+   placebos, Section 4.2 footnote 6 + Section 4.3 paragraph 1).
+6. :func:`joint_homogeneity_test` - data-in wrapper for the linearity
+   null across post-periods (paper Section 4.3 joint extension,
+   page 32).
+
+Composite workflow:
+
+:func:`did_had_pretest_workflow` has two dispatch modes:
+
+- ``aggregate="overall"`` (default, two-period panel): runs steps 1 + 3
+  via :func:`qug_test` + :func:`stute_test` + :func:`yatchew_hr_test`.
+  Paper step 2 is NOT run on this path (a two-period panel has no pre-
+  period placebo); the verdict explicitly flags the Assumption 7 gap
+  via the ``"paper step 2 deferred"`` caveat so callers do not get an
+  unconditional "TWFE safe" signal.
+- ``aggregate="event_study"`` (multi-period panel, >= 3 periods): runs
+  QUG at ``F`` + joint pre-trends Stute across earlier pre-periods +
+  joint homogeneity-linearity Stute across post-periods. Closes the
+  paper step-2 gap and does NOT emit the step-2-deferred caveat in the
+  verdict when at least one earlier pre-period is available. Step 4
+  (alternative linearity via Yatchew) is subsumed by joint Stute on
+  this path; the paper does not derive a joint Yatchew variant, so
+  users who need Yatchew robustness under multi-period data can call
+  :func:`yatchew_hr_test` on each ``(base, post)`` pair manually.
+
+Eq. 18 linear-trend detrending (paper Section 5.2 Pierce-Schott
+application, published p=0.51) is the one remaining deferred item;
+tracked in ``TODO.md`` and slated for Phase 4 alongside the replication
+harness. See ``docs/methodology/REGISTRY.md`` for the full algorithm
+narrative, invariants, and deviation notes.
 """
 
 from __future__ import annotations
@@ -1963,8 +1991,16 @@ def stute_joint_pretest(
             f"Found {int(np.sum(doses_arr < 0))} negative value(s)."
         )
 
-    if G < _MIN_G_STUTE:
-        raise ValueError(f"Joint Stute test requires G >= {_MIN_G_STUTE} units; got " f"G = {G}.")
+    # G < _MIN_G_STUTE (CvM statistic not well-calibrated): mirror the
+    # single-horizon `stute_test` contract - warn + return NaN result
+    # rather than raise, so callers (including the event-study workflow
+    # on a staggered panel whose last-cohort filter leaves fewer than
+    # 10 units) get an inconclusive diagnostic instead of a crash. The
+    # NaN p-value fails the workflow's `np.isfinite(p_value)` gate,
+    # so `all_pass` becomes False downstream.
+    # Note: the actual `warn + return` happens below after horizon
+    # labels are validated and collision-checked, so the NaN result
+    # carries full per-horizon diagnostic keys.
     if n_bootstrap < _MIN_N_BOOTSTRAP:
         raise ValueError(f"n_bootstrap must be >= {_MIN_N_BOOTSTRAP}; got " f"{n_bootstrap}.")
     if not isinstance(alpha, (int, float)) or not (0 < float(alpha) < 1):
@@ -2025,6 +2061,35 @@ def stute_joint_pretest(
     # stringification is injective on the provided keys.
     horizon_labels = str_labels
 
+    # Small-G NaN result (paired with the comment near the top of this
+    # function): mirror the single-horizon stute_test contract so the
+    # event-study workflow on a small or staggered-filtered panel gets
+    # an inconclusive diagnostic rather than an exception. Positioned
+    # AFTER the label-collision / shape-alignment guards so the NaN
+    # result carries a consistent per-horizon diagnostic surface.
+    if G < _MIN_G_STUTE:
+        warnings.warn(
+            f"stute_joint_pretest: G = {G} is below the minimum "
+            f"{_MIN_G_STUTE} for the CvM statistic to be well-calibrated. "
+            f"Returning NaN result.",
+            UserWarning,
+            stacklevel=2,
+        )
+        return StuteJointResult(
+            cvm_stat_joint=float("nan"),
+            p_value=float("nan"),
+            reject=False,
+            alpha=float(alpha),
+            horizon_labels=horizon_labels,
+            per_horizon_stats={k: float("nan") for k in horizon_labels},
+            n_bootstrap=int(n_bootstrap),
+            n_obs=int(G),
+            n_horizons=int(K),
+            seed=None if seed is None else int(seed),
+            null_form=str(null_form),
+            exact_linear_short_circuited=False,
+        )
+
     if any_nan:
         return StuteJointResult(
             cvm_stat_joint=float("nan"),
diff --git a/tests/test_had_pretests.py b/tests/test_had_pretests.py
@@ -1407,18 +1407,29 @@ def test_negative_dose_raises(self):
                 seed=0,
             )
 
-    def test_small_G_raises(self):
+    def test_small_G_warns_returns_nan(self):
+        """R5: G < _MIN_G_STUTE mirrors single-horizon stute_test -
+        warn + NaN result instead of raise. Prevents event-study
+        workflow crash when a last-cohort filter leaves fewer than 10
+        units."""
         G = 5  # below _MIN_G_STUTE (10)
         resid, fit, d = _multi_period_residuals(G, K=2)
-        with pytest.raises(ValueError, match="G >="):
-            stute_joint_pretest(
+        with pytest.warns(UserWarning, match="below the minimum"):
+            result = stute_joint_pretest(
                 residuals_by_horizon=resid,
                 fitted_by_horizon=fit,
                 doses=d,
                 design_matrix=np.ones((G, 1)),
                 n_bootstrap=199,
                 seed=0,
             )
+        assert np.isnan(result.cvm_stat_joint)
+        assert np.isnan(result.p_value)
+        assert result.reject is False
+        assert result.n_obs == G
+        # Full diagnostic surface preserved on the NaN result
+        assert set(result.per_horizon_stats.keys()) == set(str(k) for k in resid.keys())
+        assert all(np.isnan(v) for v in result.per_horizon_stats.values())
 
     def test_small_bootstrap_raises(self):
         G = 50
@@ -2453,6 +2464,60 @@ def test_event_study_all_conclusive_no_reject_admissible(self):
         verdict = _compose_verdict_event_study(qug, pretrends, homogeneity)
         assert "TWFE admissible under Section 4" in verdict
 
+    def test_event_study_small_panel_after_filter_inconclusive_not_crash(self):
+        """R5: staggered-panel last-cohort filter can leave fewer than
+        `_MIN_G_STUTE` (10) units. The joint Stute core must warn +
+        return NaN on small G (matching single-horizon stute_test) so
+        the event-study workflow surfaces an inconclusive report
+        rather than crashing. Regression against the original
+        ValueError-on-G<10 contract."""
+        parts = []
+        # First cohort: 40 units treated at 1999 - will be DROPPED by
+        # the last-cohort filter (F_last=2000 > 1999).
+        # Second cohort: only 6 units treated at 2000 - kept. After
+        # filter G = 6 < _MIN_G_STUTE, so the joint CvM is ill-
+        # calibrated and must return NaN via warn.
+        for cohort_ft, cohort_range in [(1999, (0, 40)), (2000, (40, 46))]:
+            for g in range(*cohort_range):
+                dose = 0.05 + 0.01 * (g - cohort_range[0])
+                for t in [1997, 1998, 1999, 2000, 2001]:
+                    is_post = t >= cohort_ft
+                    parts.append(
+                        {
+                            "unit": g,
+                            "period": t,
+                            "y": 0.1 * g + (0.3 * dose if is_post else 0.0),
+                            "d": dose if is_post else 0.0,
+                            "first_treat": cohort_ft,
+                        }
+                    )
+        df = pd.DataFrame(parts)
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            report = did_had_pretest_workflow(
+                df,
+                "y",
+                "d",
+                "period",
+                "unit",
+                first_treat_col="first_treat",
+                aggregate="event_study",
+                n_bootstrap=199,
+                seed=0,
+            )
+        # Workflow must complete (no crash) and surface an inconclusive
+        # report. Both joint tests (pretrends + homogeneity) should
+        # return NaN on the post-filter G=6 panel.
+        assert report.aggregate == "event_study"
+        if report.pretrends_joint is not None:
+            assert np.isnan(report.pretrends_joint.p_value)
+        assert report.homogeneity_joint is not None
+        assert np.isnan(report.homogeneity_joint.p_value)
+        assert report.all_pass is False
+        # At least one "below the minimum" warning from the joint core.
+        msgs = [str(w.message) for w in caught]
+        assert any("below the minimum" in m for m in msgs)
+
 
 class TestOrderedCategoricalChronology:
     """R2 P1 regressions: ordered-categorical time columns whose lexical