Commit 0fd4564

igerber and claude committed
Address codex R5 P2+P3 on HAD: stute_test scope + verdict-language accuracy
- P2 (Methodology): tightened stute_test / yatchew_hr_test / class docstring to correctly attribute Assumption 7 (mean-independence pre-trends) to joint_pretrends_test (intercept-only residual form via null_form="mean_independence") rather than to the raw stute_test helper. The raw stute_test always fits dy ~ 1 + d and tests Assumption 8 linearity. Updated all 5 surfaces: stute_test Notes, yatchew_hr_test Notes (now also documents null="linearity" vs null="mean_independence" kwarg correctly, no longer references nonexistent "residual_form"), HeterogeneousAdoptionDiD class docstring (split into 4 distinct ADJACENT condition bullets), REGISTRY HAD checklist L2694 closure, paper-review L192 closure.
- P3 (Documentation/Tests): the new workflow / REGISTRY / paper-review prose said the composite verdict surfaces the Assumption 5/6 caveat. Actually the verdict string only flags the Assumption 7 step-2 gap on the aggregate="overall" path. Reworded in 4 surfaces (workflow Notes, HAD class docstring, REGISTRY L2694, paper-review L192) to clarify that the Assumption 5/6 caveat is surfaced by (a) the Design 1 fit-time UserWarning and (b) T21 tutorial prose, NOT by the workflow verdict string.
- P3 (Documentation/Tests): yatchew_hr_test Notes referenced a nonexistent "residual_form" selector. Replaced with the correct kwarg name "null" ({"linearity", "mean_independence"}) and described both branches.

All 35 methodology tests pass; full HAD + drift sweep 665 passed; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent cde7fa4 commit 0fd4564
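
The distinction the commit message draws between the two residual definitions can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names below (`residuals_linearity`, `residuals_mean_independence`, `covariance_with_dose`) are hypothetical and are not part of the `diff_diff` API, and the simulated data are made up. The sketch shows why a ΔY that is linear in dose leaves nothing for the `dy ~ 1 + d` residuals to detect, while the intercept-only `dy ~ 1` residuals still co-move with dose.

```python
import random
from statistics import fmean


def residuals_mean_independence(dy):
    """Residuals from the intercept-only fit dy ~ 1 (hypothetical helper)."""
    mean_dy = fmean(dy)
    return [y - mean_dy for y in dy]


def residuals_linearity(dy, d):
    """Residuals from the OLS fit dy ~ 1 + d (hypothetical helper)."""
    mean_d, mean_dy = fmean(d), fmean(dy)
    sxx = sum((x - mean_d) ** 2 for x in d)
    sxy = sum((x - mean_d) * (y - mean_dy) for x, y in zip(d, dy))
    slope = sxy / sxx
    intercept = mean_dy - slope * mean_d
    return [y - (intercept + slope * x) for x, y in zip(d, dy)]


def covariance_with_dose(resid, d):
    """Sample covariance between residuals and dose."""
    mean_d, mean_r = fmean(d), fmean(resid)
    return sum((x - mean_d) * (r - mean_r) for x, r in zip(d, resid)) / len(d)


# Simulated (made-up) data: the outcome change is exactly linear in dose plus noise.
rng = random.Random(0)
d = [rng.uniform(0.0, 1.0) for _ in range(200)]
dy = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in d]

# OLS residuals are orthogonal to dose by construction.
cov_linear = covariance_with_dose(residuals_linearity(dy, d), d)
# Intercept-only residuals keep the dose slope, so they co-vary with dose.
cov_mean_indep = covariance_with_dose(residuals_mean_independence(dy), d)
```

On data like this, a test built on the first residual form has no linear signal left to pick up, whereas one built on the second form would; that is why the two nulls cannot stand in for each other.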

4 files changed

Lines changed: 54 additions & 33 deletions


diff_diff/had.py

Lines changed: 23 additions & 12 deletions
@@ -2615,18 +2615,29 @@ class HeterogeneousAdoptionDiD:
26152615
is on the Design 1 family (``continuous_near_d_lower`` or
26162616
``mass_point``) so users are not silently led to interpret point
26172617
estimates as full point identification. The available pre-tests
2618-
(:func:`diff_diff.qug_test`, :func:`diff_diff.stute_test`,
2619-
:func:`diff_diff.yatchew_hr_test`) verify ADJACENT identifying
2620-
conditions: QUG tests the Theorem 4 / Design 1' support-infimum
2621-
null ``d_lower = 0`` — adjacent evidence on the ``d_lower = 0``
2622-
clause of Assumption 4 only, NOT a test of the full Assumption 4
2623-
statement (which also covers boundary-density positivity,
2624-
conditional-mean smoothness, conditional-variance regularity, and
2625-
bandwidth conditions); Assumption 7 mean-independence pre-trends
2626-
via Stute; Assumption 8 linearity / homogeneity via Yatchew. None
2627-
of these test Assumptions 5 or 6 directly. T21 (HAD pretest
2628-
workflow tutorial) shows the verdict-language convention that
2629-
surfaces this caveat to end users.
2618+
verify ADJACENT identifying conditions:
2619+
2620+
- :func:`diff_diff.qug_test`: Theorem 4 / Design 1' support-infimum
2621+
null ``d_lower = 0`` (adjacent evidence on the ``d_lower = 0``
2622+
clause of Assumption 4 only, NOT a test of the full Assumption 4
2623+
statement which also covers boundary-density positivity,
2624+
conditional-mean smoothness, conditional-variance regularity, and
2625+
bandwidth conditions).
2626+
- :func:`diff_diff.stute_test` / :func:`diff_diff.yatchew_hr_test`:
2627+
Assumption 8 linearity of ``E[ΔY | D_2]`` in ``D_2`` (residuals
2628+
from ``dy ~ 1 + d``).
2629+
- :func:`diff_diff.joint_pretrends_test`: Assumption 7
2630+
mean-independence pre-trends across multi-period placebos
2631+
(intercept-only residual form via ``null_form="mean_independence"``;
2632+
the raw ``stute_test`` / ``yatchew_hr_test`` helpers do NOT cover
2633+
Assumption 7 on their own).
2634+
2635+
None of these test Assumptions 5 or 6 directly. The Assumption 5/6
2636+
non-testability caveat is surfaced by the Design 1 fit-time
2637+
``UserWarning`` and by T21 (HAD pretest workflow tutorial) prose,
2638+
NOT by the composite workflow verdict string (which only flags the
2639+
Assumption 7 step-2 gap on the two-period ``aggregate="overall"``
2640+
path).
26302641
26312642
**Diagnostics coverage.** ``HeterogeneousAdoptionDiDResults.bandwidth_diagnostics``
26322643
and ``.bias_corrected_fit`` are populated only on the continuous

diff_diff/had_pretests.py

Lines changed: 29 additions & 19 deletions
@@ -1653,16 +1653,21 @@ def stute_test(
16531653
Notes
16541654
-----
16551655
**Scope (what this test does NOT cover).** ``stute_test`` targets
1656-
paper Assumption 8 (mean-independence of treatment effects /
1657-
pre-trends linearity, depending on the residual definition). It does
1656+
paper Assumption 8 (linearity of ``E[ΔY | D_2]`` in ``D_2``) — the
1657+
raw helper always fits ``dy ~ 1 + d`` and tests the linearity null;
1658+
it does NOT target Assumption 7 mean-independence pre-trends on its
1659+
own. For Assumption 7 mean-independence (residuals from intercept-
1660+
only ``dy ~ 1``), use :func:`joint_pretrends_test` (which routes
1661+
``null_form="mean_independence"`` into the joint CvM core). It does
16581662
NOT and CANNOT test Assumptions 5 and 6 from de Chaisemartin et al.
16591663
(2026) Section 3.1.2, which are required for sign / point
16601664
identification of ``WAS_{d_lower}`` on the Design 1 family
16611665
(``d_lower > 0``). Assumptions 5/6 are non-testable via pre-trends
16621666
(boundary-conditional expectations and counterfactual-mean alignment
1663-
statements). See :class:`HeterogeneousAdoptionDiD` class docstring
1664-
Notes for the full statement and T21 for the verdict-language
1665-
convention that surfaces this gap to end users.
1667+
statements); they are surfaced by the Design 1 fit-time
1668+
``UserWarning`` and by T21 tutorial prose, NOT by the workflow
1669+
verdict string. See :class:`HeterogeneousAdoptionDiD` class
1670+
docstring Notes for the full statement.
16661671
16671672
Sample-size gate: below ``G = 10`` the CvM statistic is not
16681673
well-calibrated. In that case the function emits ``UserWarning`` and
@@ -2141,15 +2146,18 @@ def yatchew_hr_test(
21412146
Notes
21422147
-----
21432148
**Scope (what this test does NOT cover).** ``yatchew_hr_test`` targets
2144-
paper Assumption 8 (linearity of ``E[ΔY | D_2]`` in ``D_2``, or
2145-
mean-independence depending on ``residual_form``). It does NOT and
2146-
CANNOT test Assumptions 5 and 6 from de Chaisemartin et al. (2026)
2147-
Section 3.1.2, which are required for sign / point identification of
2148-
``WAS_{d_lower}`` on the Design 1 family (``d_lower > 0``).
2149-
Assumptions 5/6 are non-testable via pre-trends. See
2150-
:class:`HeterogeneousAdoptionDiD` class docstring Notes for the full
2151-
statement and T21 for the verdict-language convention that surfaces
2152-
this gap to end users.
2149+
paper Assumption 8 (linearity of ``E[ΔY | D_2]`` in ``D_2``) under
2150+
``null="linearity"`` (default); ``null="mean_independence"`` swaps
2151+
the residual definition to intercept-only ``dy ~ 1`` for R parity
2152+
with ``YatchewTest::yatchew_test(order=0)`` on pre-trend placebos.
2153+
It does NOT and CANNOT test Assumptions 5 and 6 from de
2154+
Chaisemartin et al. (2026) Section 3.1.2, which are required for
2155+
sign / point identification of ``WAS_{d_lower}`` on the Design 1
2156+
family (``d_lower > 0``). Assumptions 5/6 are non-testable via
2157+
pre-trends; they are surfaced by the Design 1 fit-time
2158+
``UserWarning`` and by T21 tutorial prose, NOT by the workflow
2159+
verdict string. See :class:`HeterogeneousAdoptionDiD` class
2160+
docstring Notes for the full statement.
21532161
21542162
Sample-size gate: below ``G = 3`` the difference-variance estimator
21552163
is undefined; the function emits ``UserWarning`` and returns NaN
@@ -4599,12 +4607,14 @@ def did_had_pretest_workflow(
45994607
from de Chaisemartin et al. (2026) Section 3.1.2, which are required
46004608
for sign / point identification of ``WAS_{d_lower}`` on the Design 1
46014609
family (``d_lower > 0``). Assumptions 5/6 are non-testable via
4602-
pre-trends. The composite verdict surfaces this gap explicitly via
4603-
its ``"Assumption 7 gap"`` (when QUG defers) and via the
4610+
pre-trends. The composite verdict string does NOT mention
4611+
Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on
4612+
the two-period ``aggregate="overall"`` path. The Assumption 5/6
4613+
caveat is surfaced separately by (a) the
46044614
``HeterogeneousAdoptionDiD.fit()`` fit-time ``UserWarning`` (which
4605-
fires whenever the resolved design is Design 1 family). T21 (HAD
4606-
pretest workflow tutorial) shows the recommended user-facing
4607-
verdict-language convention.
4615+
fires whenever the resolved design is Design 1 family
4616+
``continuous_near_d_lower`` or ``mass_point``) and (b) T21 (HAD
4617+
pretest workflow tutorial) tutorial prose.
46084618
46094619
Survey/weighted data (Phase 4.5 C): under ``survey=`` or ``weights=``,
46104620
the workflow:

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -2691,7 +2691,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
26912691
- [x] Phase 5 (partial): README catalog one-liner, bundled `llms.txt` `## Estimators` entry, `docs/api/had.rst` (autoclass for the three classes), and `docs/references.rst` citation landed in PR #372 docs refresh.
26922692
- [x] Phase 5 (wave 2 first slice, PR #409): T21 HAD pretest workflow tutorial (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `did_had_pretest_workflow`. Uses a `Uniform[$0.01K, $50K]` dose-distribution variant of T20's brand-campaign panel (true support strictly positive but near-zero, chosen so QUG fails-to-reject `H0: d_lower = 0` in finite sample). Walks through `aggregate="overall"` (Steps 1 + 3 only, verdict explicitly flags Step 2 deferral) and upgrades to `aggregate="event_study"` (joint pre-trends Stute + joint homogeneity Stute close the gap). Side panel exercises both `yatchew_hr_test` null modes (`linearity` vs `mean_independence`). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (17 tests pinning panel composition, both verdict pivots, structural anchors, deterministic stats, bootstrap p-value tolerance bands per backend, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel).
26932693
- [x] Phase 5 (wave 2 second slice): T22 weighted/survey HAD tutorial (`docs/tutorials/22_had_survey_design.ipynb`) - shipped as the follow-up to PR #432. End-to-end walkthrough of `HeterogeneousAdoptionDiD` + `did_had_pretest_workflow` under `SurveyDesign(weights, strata, psu, fpc)` on a BRFSS-shape state-rollout panel (5 strata x 6 PSUs/stratum x 2 states/PSU = 60 states; post-stratification raking weights with CV ~ 0.30; FPC = 30 PSUs/stratum). Companion drift-test file `tests/test_t22_had_survey_design_drift.py` (32 tests pinning panel composition, naive-vs-survey SE inflation direction, design auto-detection, event-study cband-vs-pointwise width ordering, `_QUG_DEFERRED_SUFFIX` substring on `report.verdict` for both overall and event-study paths, the distinct `report.summary()` QUG-skip note on the event-study path, deterministic Yatchew sigma2_*, bootstrap p-value anchored windows of total width 0.30 (± 0.15 around seeded centers) per `feedback_strata_bootstrap_path_divergence`, workflow-surface separation between overall and event-study paths, and the weighted point-estimation contract via the `_fit_continuous` algebraic identity).
2694-
- [x] Documentation of non-testability of Assumptions 5 and 6. **Closed 2026-05-20:** `HeterogeneousAdoptionDiD` class docstring carries a "Non-testable assumptions (paper Section 3.1.2)" Notes block; `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections carry "Scope (what this test does NOT cover)" clauses explicitly stating they verify ADJACENT assumptions (Assumption 4 / 7 / 8) and CANNOT test Assumptions 5 or 6. Belt-and-suspenders: `HAD.fit()` emits a `UserWarning` in `diff_diff/had.py` (search for "---- Assumption 5/6 warning on Design 1 paths ----") whenever the resolved design is Design 1 family (`continuous_near_d_lower` or `mass_point`). T21 surfaces the caveat to end users via the verdict language.
2694+
- [x] Documentation of non-testability of Assumptions 5 and 6. **Closed 2026-05-20:** `HeterogeneousAdoptionDiD` class docstring carries a "Non-testable assumptions (paper Section 3.1.2)" Notes block; `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections carry "Scope (what this test does NOT cover)" clauses explicitly stating they verify ADJACENT identifying conditions (QUG: support-infimum null `d_lower = 0`; Stute / Yatchew: Assumption 8 linearity; `joint_pretrends_test`: Assumption 7 mean-independence) and CANNOT test Assumptions 5 or 6. The composite workflow verdict string does NOT mention Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on the two-period `aggregate="overall"` path. The Assumption 5/6 non-testability caveat is surfaced separately by (a) `HAD.fit()`'s fit-time `UserWarning` in `diff_diff/had.py` (search for "---- Assumption 5/6 warning on Design 1 paths ----") which fires whenever the resolved design is Design 1 family (`continuous_near_d_lower` or `mass_point`), and (b) T21 (HAD pretest workflow tutorial) tutorial prose.
26952695
- [x] Warnings for staggered treatment timing (redirect to `ChaisemartinDHaultfoeuille`). **Closed 2026-05-20:** fail-closed `ValueError` at `diff_diff/had.py:1511` (see Deviations § "Library extension: Staggered-timing fail-closed" for the rationale on raising vs warning).
26962696
- [ ] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work). **Status 2026-05-20:** current behavior is a Python `TypeError` (the `covariates=` kwarg is not in the `HAD.fit()` signature). Adding an explicit `**kwargs`-trap with `NotImplementedError` and a Theorem 6 pointer is a follow-up PR; tracked in `TODO.md` as Low priority — the existing TypeError is fail-closed.
26972697

docs/methodology/papers/dechaisemartin-2026-review.md

Lines changed: 1 addition & 1 deletion
@@ -189,7 +189,7 @@ Alternative to Stute when `G` is large or heteroskedasticity is suspected.
189189
- [x] Composite workflow `did_had_pretest_workflow()` (paper Section 4.2-4.3). **Phase 3 implementation (2026-04):** `aggregate="overall"` (default, two-period) runs QUG + Stute + Yatchew on a two-period panel; step 2 is NOT run on this path because a two-period panel has no pre-period placebo horizon. **Phase 3 follow-up (2026-04):** `aggregate="event_study"` (multi-period) runs QUG at F + joint pre-trends Stute + joint homogeneity-linearity Stute; closes the paper step-2 gap.
190190
- [x] Warnings for staggered treatment timing (direct users to existing `ChaisemartinDHaultfoeuille` in diff-diff). **Phase 4 closure (2026-05-20):** fail-closed `ValueError` at `diff_diff/had.py:1511` when multiple first-treat cohorts are detected without `first_treat_col`; the error message directs the user to either supply `first_treat_col` (which activates the last-cohort + never-treated auto-filter per Appendix B.2) or to use `ChaisemartinDHaultfoeuille` (`did_multiplegt_dyn`) for full staggered support. The fail-closed choice (over `UserWarning`) is documented in REGISTRY Deviations § "Staggered-timing fail-closed" as a library extension toward stricter safety than the paper's "Warn" prescription.
191191
- [ ] Warnings for extensive-margin effects / positive mass of untreated (not fatal; suggests running existing DiD). **Status 2026-05-20 (partial):** `qug_test()` filters zero-dose observations upfront with a `UserWarning` naming the exclusion count — surfaces the *presence* of extensive-margin / positive-mass-of-untreated units to users running pre-tests. The paper-language "suggests running existing DiD" recommendation is NOT a separate fit-time warning on the main `HeterogeneousAdoptionDiD.fit()` path; this item remains open as a Low-priority follow-up tracked in `TODO.md`.
192-
- [x] Documentation of non-testability of Assumptions 5 and 6. **Phase 4 closure (2026-05-20):** `HeterogeneousAdoptionDiD.fit()` emits a `UserWarning` at fit time when `resolved_design ∈ {continuous_near_d_lower, mass_point}` (Design 1 family) explicitly flagging that point identification of `WAS_{d_lower}` requires Assumption 6, sign identification requires Assumption 5, and NEITHER is testable via pre-trends (`diff_diff/had.py`, search for "---- Assumption 5/6 warning on Design 1 paths ----"). The `HeterogeneousAdoptionDiD` class docstring + `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections cross-reference this and explicitly state that the available pre-tests verify ADJACENT identifying conditions (QUG tests the Theorem 4 / Design 1' support-infimum null `d_lower = 0` — adjacent evidence on the `d_lower = 0` clause of Assumption 4 only, NOT a test of full Assumption 4's boundary-density / conditional-mean smoothness / variance regularity statement; Assumption 7 mean-independence pre-trends via Stute; Assumption 8 linearity / homogeneity via Yatchew) and do NOT and CANNOT test Assumptions 5 or 6 directly. T21 verdict logic surfaces the caveat to end users.
192+
- [x] Documentation of non-testability of Assumptions 5 and 6. **Phase 4 closure (2026-05-20):** `HeterogeneousAdoptionDiD.fit()` emits a `UserWarning` at fit time when `resolved_design ∈ {continuous_near_d_lower, mass_point}` (Design 1 family) explicitly flagging that point identification of `WAS_{d_lower}` requires Assumption 6, sign identification requires Assumption 5, and NEITHER is testable via pre-trends (`diff_diff/had.py`, search for "---- Assumption 5/6 warning on Design 1 paths ----"). The `HeterogeneousAdoptionDiD` class docstring + `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections cross-reference this and explicitly state that the available pre-tests verify ADJACENT identifying conditions: QUG tests the Theorem 4 / Design 1' support-infimum null `d_lower = 0` — adjacent evidence on the `d_lower = 0` clause of Assumption 4 only, NOT a test of full Assumption 4's boundary-density / conditional-mean smoothness / variance regularity statement; the raw `stute_test` / `yatchew_hr_test` helpers test Assumption 8 linearity (residuals from `dy ~ 1 + d`); `joint_pretrends_test` tests Assumption 7 mean-independence (intercept-only residuals via `null_form="mean_independence"`). None of these test Assumptions 5 or 6 directly. The composite workflow verdict string does NOT mention Assumptions 5 or 6 — it only flags the Assumption 7 step-2 gap on the two-period `aggregate="overall"` path. The Assumption 5/6 caveat is surfaced separately by the Design 1 fit-time `UserWarning` and by T21 tutorial prose.
193193
- [x] Multi-period event-study extension (Appendix B.2). **Phase 2b implementation (2026-04):** `aggregate="event_study"` returns per-event-time WAS estimates using uniform `F-1` anchor. Staggered-timing contract (see L190 closure for full statement): when `first_treat_col` is supplied, the panel auto-filters to last-cohort + never-treated units with a `UserWarning` per Appendix B.2 prescription; when omitted on a multi-cohort panel, the estimator raises `ValueError` (fail-closed, see REGISTRY § "Library extension: Staggered-timing fail-closed"). Pointwise CIs per horizon (no joint cross-horizon covariance; matches paper's Pierce-Schott Figure 2). Pre-period placebos at `e <= -2`; the anchor `e = -1` is skipped since `ΔY = 0` there by construction.
194194
- [x] Joint Stute tests (paper Section 4.2 step 2 + Section 4.3 joint extension, pages 23-25 + 32). **Phase 3 follow-up (2026-04):** `stute_joint_pretest()` (residuals-in core) + `joint_pretrends_test()` (mean-independence null) + `joint_homogeneity_test()` (linearity null) in `diff_diff/had_pretests.py`. Sum-of-CvMs aggregation, shared-η Mammen wild bootstrap across horizons (Delgado-Manteiga 2001), per-horizon exact-linear short-circuit. Paper Eq (18) linear-trend detrending variant (Section 5.2 Pierce-Schott p=0.51) deferred to Phase 4 replication harness where the published value serves as parity anchor.
195195
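
Since the reworded docs say the Assumption 5/6 caveat arrives as a fit-time `UserWarning` rather than in the verdict string, a caller who wants to record it has to capture warnings around the fit. The following is a minimal sketch using only the standard library; `fit_stub` is a hypothetical stand-in that mimics the documented behavior (warn on the Design 1 family), it is not the `diff_diff` implementation, and the warning text is illustrative.

```python
import warnings

# Design names taken from the docstrings in this diff.
DESIGN_1_FAMILY = {"continuous_near_d_lower", "mass_point"}


def fit_stub(resolved_design):
    """Hypothetical stand-in for a fit call; NOT the diff_diff implementation."""
    if resolved_design in DESIGN_1_FAMILY:
        warnings.warn(
            "Assumptions 5/6 are not testable via pre-trends (illustrative text).",
            UserWarning,
            stacklevel=2,
        )
    return {"design": resolved_design}


def fit_and_collect_warnings(resolved_design):
    """Run the stub fit and return (result, list of UserWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure repeated warnings are not suppressed by the default filter.
        warnings.simplefilter("always")
        result = fit_stub(resolved_design)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, UserWarning)
    ]
    return result, messages


_, design_1_messages = fit_and_collect_warnings("mass_point")
_, other_messages = fit_and_collect_warnings("continuous_at_zero")
```

The same `catch_warnings(record=True)` pattern should work around a real fit call, assuming the library emits the warning through the standard `warnings` module as the docstrings describe.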
