Skip to content

Commit e3f7450

Browse files
igerberclaude
andcommitted
Address PR #353 CI review round 5 (1 P1 + 1 P3)
P1 - stute_joint_pretest G<_MIN_G_STUTE warn+NaN contract: The joint core raised `ValueError` on G < 10, while single-horizon `stute_test` emits a `UserWarning` and returns a NaN result on the same condition. Because the event-study workflow dispatches into the joint core for both step-2 pre-trends and step-3 homogeneity, a staggered panel whose last-cohort auto-filter leaves fewer than 10 units would now crash the workflow instead of surfacing an inconclusive report - a regression versus Phase 3's two-period behavior. Fix: mirror the single-horizon contract. Emit `UserWarning` ("below the minimum ... Returning NaN result") and return a `StuteJointResult` with `cvm_stat_joint=nan`, `p_value=nan`, `reject=False`, and a full-NaN `per_horizon_stats` dict keyed by the validated horizon labels (so the diagnostic surface is consistent with the NaN-propagation branch). `n_bootstrap < _MIN_N_BOOTSTRAP` and non-numeric `alpha` still raise; only the small-G branch relaxes. Test updates: - `test_small_G_raises` renamed to `test_small_G_warns_returns_nan` and rewritten to assert the new contract. - New `test_event_study_small_panel_after_filter_inconclusive_not_ crash` covers the workflow-level regression: a staggered fixture with 40 early-cohort + 6 late-cohort units filters to G=6 after the validator's last-cohort auto-filter; `did_had_pretest_ workflow(aggregate="event_study")` now completes without exception, emits the "below the minimum" warning, and surfaces a NaN joint-Stute report with `all_pass=False`. P3 - module docstring refresh: `had_pretests.py` top-level docstring still said Phase 3 shipped steps 1 + 3 only, that step 2 was deferred, and that `did_had_pretest_workflow` was a two-period-only entry point. That drifted after the joint-pretest follow-up landed. Rewrote the docstring to describe: (a) the three single-horizon tests, (b) the three new joint helpers (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`), (c) both workflow dispatch modes (`aggregate="overall"` two-period and `aggregate="event_study"` multi-period), and (d) the narrowed deferment - only Eq. 18 linear-trend detrending remains, tracked in TODO for Phase 4 alongside the Pierce-Schott replication. 126 tests pass (125 + 1 new R5 workflow regression, -0 + 1 converted from raise to warn); black/ruff/mypy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent db170bd commit e3f7450

2 files changed

Lines changed: 147 additions & 17 deletions

File tree

diff_diff/had_pretests.py

Lines changed: 79 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,9 @@
22
33
Paper Section 4 (de Chaisemartin, Ciccia, D'Haultfoeuille, Knau 2026,
44
arXiv:2405.04465v6) prescribes a four-step pre-testing workflow for TWFE
5-
validity in HADs. Phase 3 ships steps 1 and 3 of that workflow (step 2 is
6-
deferred):
5+
validity in HADs. This module ships the tests and the composite workflow:
6+
7+
Single-horizon tests:
78
89
1. :func:`qug_test` - order-statistic ratio test of the support infimum
910
``H_0: d_lower = 0`` (paper Theorem 4). Closed-form, tuning-free.
@@ -14,16 +15,43 @@
1415
linearity test (paper Theorem 7 / Equation 29). Feasible at
1516
``G >= 100k``.
1617
17-
The composite :func:`did_had_pretest_workflow` runs the three implemented
18-
tests in sequence on a two-period HAD panel and returns a
19-
:class:`HADPretestReport` with a partial-workflow verdict. When all three
20-
fail-to-reject, the verdict explicitly flags that **the paper's step 2
21-
pre-trends test (Assumption 7) is NOT run** — callers do not receive an
22-
unconditional "TWFE safe" signal; the Assumption 7 check must be performed
23-
separately (e.g., via an event-study / placebo analysis) until the Phase 3
24-
follow-up patch lands the joint Equation 18 cross-horizon Stute variant.
25-
26-
See ``docs/methodology/REGISTRY.md`` and ``TODO.md`` for the deferred items.
18+
Joint / multi-period tests (Phase 3 follow-up):
19+
20+
4. :func:`stute_joint_pretest` - residuals-in core that generalizes the
21+
single-horizon Stute CvM to K horizons with shared-η wild bootstrap
22+
and sum-of-CvMs aggregation (Delgado 1993; Escanciano 2006).
23+
5. :func:`joint_pretrends_test` - data-in wrapper for the mean-
24+
independence null (paper step 2 pre-trends across pre-period
25+
placebos, Section 4.2 footnote 6 + Section 4.3 paragraph 1).
26+
6. :func:`joint_homogeneity_test` - data-in wrapper for the linearity
27+
null across post-periods (paper Section 4.3 joint extension,
28+
page 32).
29+
30+
Composite workflow:
31+
32+
:func:`did_had_pretest_workflow` has two dispatch modes:
33+
34+
- ``aggregate="overall"`` (default, two-period panel): runs steps 1 + 3
35+
via :func:`qug_test` + :func:`stute_test` + :func:`yatchew_hr_test`.
36+
Paper step 2 is NOT run on this path (a two-period panel has no pre-
37+
period placebo); the verdict explicitly flags the Assumption 7 gap
38+
via the ``"paper step 2 deferred"`` caveat so callers do not get an
39+
unconditional "TWFE safe" signal.
40+
- ``aggregate="event_study"`` (multi-period panel, >= 3 periods): runs
41+
QUG at ``F`` + joint pre-trends Stute across earlier pre-periods +
42+
joint homogeneity-linearity Stute across post-periods. Closes the
43+
paper step-2 gap and does NOT emit the step-2-deferred caveat in the
44+
verdict when at least one earlier pre-period is available. Step 4
45+
(alternative linearity via Yatchew) is subsumed by joint Stute on
46+
this path; the paper does not derive a joint Yatchew variant, so
47+
users who need Yatchew robustness under multi-period data can call
48+
:func:`yatchew_hr_test` on each ``(base, post)`` pair manually.
49+
50+
Eq. 18 linear-trend detrending (paper Section 5.2 Pierce-Schott
51+
application, published p=0.51) is the one remaining deferred item;
52+
tracked in ``TODO.md`` and slated for Phase 4 alongside the replication
53+
harness. See ``docs/methodology/REGISTRY.md`` for the full algorithm
54+
narrative, invariants, and deviation notes.
2755
"""
2856

2957
from __future__ import annotations
@@ -1963,8 +1991,16 @@ def stute_joint_pretest(
19631991
f"Found {int(np.sum(doses_arr < 0))} negative value(s)."
19641992
)
19651993

1966-
if G < _MIN_G_STUTE:
1967-
raise ValueError(f"Joint Stute test requires G >= {_MIN_G_STUTE} units; got " f"G = {G}.")
1994+
# G < _MIN_G_STUTE (CvM statistic not well-calibrated): mirror the
1995+
# single-horizon `stute_test` contract - warn + return NaN result
1996+
# rather than raise, so callers (including the event-study workflow
1997+
# on a staggered panel whose last-cohort filter leaves fewer than
1998+
# 10 units) get an inconclusive diagnostic instead of a crash. The
1999+
# NaN return still satisfies the workflow's `np.isfinite(p_value)`
2000+
# gating, so `all_pass` becomes False downstream.
2001+
# Note: the actual `warn + return` happens below after horizon
2002+
# labels are validated and collision-checked, so the NaN result
2003+
# carries full per-horizon diagnostic keys.
19682004
if n_bootstrap < _MIN_N_BOOTSTRAP:
19692005
raise ValueError(f"n_bootstrap must be >= {_MIN_N_BOOTSTRAP}; got " f"{n_bootstrap}.")
19702006
if not isinstance(alpha, (int, float)) or not (0 < float(alpha) < 1):
@@ -2025,6 +2061,35 @@ def stute_joint_pretest(
20252061
# stringification is injective on the provided keys.
20262062
horizon_labels = str_labels
20272063

2064+
# Small-G NaN result (paired with the comment near the top of this
2065+
# function): mirror the single-horizon stute_test contract so the
2066+
# event-study workflow on a small or staggered-filtered panel gets
2067+
# an inconclusive diagnostic rather than an exception. Positioned
2068+
# AFTER the label-collision / shape-alignment guards so the NaN
2069+
# result carries a consistent per-horizon diagnostic surface.
2070+
if G < _MIN_G_STUTE:
2071+
warnings.warn(
2072+
f"stute_joint_pretest: G = {G} is below the minimum "
2073+
f"{_MIN_G_STUTE} for the CvM statistic to be well-calibrated. "
2074+
f"Returning NaN result.",
2075+
UserWarning,
2076+
stacklevel=2,
2077+
)
2078+
return StuteJointResult(
2079+
cvm_stat_joint=float("nan"),
2080+
p_value=float("nan"),
2081+
reject=False,
2082+
alpha=float(alpha),
2083+
horizon_labels=horizon_labels,
2084+
per_horizon_stats={k: float("nan") for k in horizon_labels},
2085+
n_bootstrap=int(n_bootstrap),
2086+
n_obs=int(G),
2087+
n_horizons=int(K),
2088+
seed=None if seed is None else int(seed),
2089+
null_form=str(null_form),
2090+
exact_linear_short_circuited=False,
2091+
)
2092+
20282093
if any_nan:
20292094
return StuteJointResult(
20302095
cvm_stat_joint=float("nan"),

tests/test_had_pretests.py

Lines changed: 68 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1407,18 +1407,29 @@ def test_negative_dose_raises(self):
14071407
seed=0,
14081408
)
14091409

1410-
def test_small_G_raises(self):
1410+
def test_small_G_warns_returns_nan(self):
1411+
"""R5: G < _MIN_G_STUTE mirrors single-horizon stute_test -
1412+
warn + NaN result instead of raise. Prevents event-study
1413+
workflow crash when a last-cohort filter leaves fewer than 10
1414+
units."""
14111415
G = 5 # below _MIN_G_STUTE (10)
14121416
resid, fit, d = _multi_period_residuals(G, K=2)
1413-
with pytest.raises(ValueError, match="G >="):
1414-
stute_joint_pretest(
1417+
with pytest.warns(UserWarning, match="below the minimum"):
1418+
result = stute_joint_pretest(
14151419
residuals_by_horizon=resid,
14161420
fitted_by_horizon=fit,
14171421
doses=d,
14181422
design_matrix=np.ones((G, 1)),
14191423
n_bootstrap=199,
14201424
seed=0,
14211425
)
1426+
assert np.isnan(result.cvm_stat_joint)
1427+
assert np.isnan(result.p_value)
1428+
assert result.reject is False
1429+
assert result.n_obs == G
1430+
# Full diagnostic surface preserved on the NaN result
1431+
assert set(result.per_horizon_stats.keys()) == set(str(k) for k in resid.keys())
1432+
assert all(np.isnan(v) for v in result.per_horizon_stats.values())
14221433

14231434
def test_small_bootstrap_raises(self):
14241435
G = 50
@@ -2453,6 +2464,60 @@ def test_event_study_all_conclusive_no_reject_admissible(self):
24532464
verdict = _compose_verdict_event_study(qug, pretrends, homogeneity)
24542465
assert "TWFE admissible under Section 4" in verdict
24552466

2467+
def test_event_study_small_panel_after_filter_inconclusive_not_crash(self):
2468+
"""R5: staggered-panel last-cohort filter can leave fewer than
2469+
`_MIN_G_STUTE` (10) units. The joint Stute core must warn +
2470+
return NaN on small G (matching single-horizon stute_test) so
2471+
the event-study workflow surfaces an inconclusive report
2472+
rather than crashing. Regression against the original
2473+
ValueError-on-G<10 contract."""
2474+
parts = []
2475+
# First cohort: 40 units treated at 1999 - will be DROPPED by
2476+
# the last-cohort filter (F_last=2000 > 1999).
2477+
# Second cohort: only 6 units treated at 2000 - kept. After
2478+
# filter G = 6 < _MIN_G_STUTE, so the joint CvM is ill-
2479+
# calibrated and must return NaN via warn.
2480+
for cohort_ft, cohort_range in [(1999, (0, 40)), (2000, (40, 46))]:
2481+
for g in range(*cohort_range):
2482+
dose = 0.05 + 0.01 * (g - cohort_range[0])
2483+
for t in [1997, 1998, 1999, 2000, 2001]:
2484+
is_post = t >= cohort_ft
2485+
parts.append(
2486+
{
2487+
"unit": g,
2488+
"period": t,
2489+
"y": 0.1 * g + (0.3 * dose if is_post else 0.0),
2490+
"d": dose if is_post else 0.0,
2491+
"first_treat": cohort_ft,
2492+
}
2493+
)
2494+
df = pd.DataFrame(parts)
2495+
with warnings.catch_warnings(record=True) as caught:
2496+
warnings.simplefilter("always")
2497+
report = did_had_pretest_workflow(
2498+
df,
2499+
"y",
2500+
"d",
2501+
"period",
2502+
"unit",
2503+
first_treat_col="first_treat",
2504+
aggregate="event_study",
2505+
n_bootstrap=199,
2506+
seed=0,
2507+
)
2508+
# Workflow must complete (no crash) and surface an inconclusive
2509+
# report. Both joint tests (pretrends + homogeneity) should
2510+
# return NaN on the post-filter G=6 panel.
2511+
assert report.aggregate == "event_study"
2512+
if report.pretrends_joint is not None:
2513+
assert np.isnan(report.pretrends_joint.p_value)
2514+
assert report.homogeneity_joint is not None
2515+
assert np.isnan(report.homogeneity_joint.p_value)
2516+
assert report.all_pass is False
2517+
# At least one "below the minimum" warning from the joint core.
2518+
msgs = [str(w.message) for w in caught]
2519+
assert any("below the minimum" in m for m in msgs)
2520+
24562521

24572522
class TestOrderedCategoricalChronology:
24582523
"""R2 P1 regressions: ordered-categorical time columns whose lexical

0 commit comments

Comments
 (0)