Commit 4a24758

igerber and claude committed
Address PR #402 R7 review (1 P1 + 1 P2)
P1 step-2 / Assumption 7 closure precondition: per docs/methodology/REGISTRY.md HeterogeneousAdoptionDiD § "Assumption 7 / step 2 closure" + had_pretests.py:4738-4756 + 2769, aggregate="event_study" closes paper Section 4.2 step 2 ONLY IF the panel has at least one earlier placebo pre-period beyond the base F-1. With only the base F-1 pre-period available (minimal 3-period event-study, or 4-period under trends_lin=True where the consumed F-2 placebo is auto-dropped), the workflow sets pretrends_joint=None, all_pass=False, and appends 'joint pre-trends skipped (no earlier pre-period)' to the verdict - step 2 stays uncovered.

The previous Step-3 wording in both _handle_had and _handle_had_event_study + the HAD Pretests intro in llms-full.txt said generically that aggregate="event_study" closes the step-2 gap, which is overbroad and could mislead agents on minimal valid event-study panels. All three surfaces now make the precondition explicit AND document the pretrends_joint=None / 'joint pre-trends skipped' fallback verdict so agents know what to expect when the precondition fails.

P2 missing regression coverage: the prior tests locked assumption labels and the QUG-under-survey caveat but did not lock the earlier-pre-period precondition - that is why the overstatement landed in the new agent-facing surfaces without tripping the existing guide / practitioner tests.

Tests added (2 new, 94 total):
- test_had_step_3_documents_earlier_pre_period_precondition_for_step_2: asserts both HAD handler variants surface the 'earlier pre-period' / placebo precondition AND the pretrends_joint=None / 'joint pre-trends skipped' fallback.
- test_llms_full_had_pretests_documents_earlier_pre_period_precondition: same lock on the HAD Pretests section in llms-full.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
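The fallback described in the commit message can be detected on the caller's side. The following is a minimal sketch, not code from this commit: the helper `step_2_ran` is hypothetical, and the report objects are `SimpleNamespace` stand-ins that only assume the `pretrends_joint`, `all_pass`, and `verdict` names quoted in the message. How the real workflow object exposes them is not shown in this diff.

```python
from types import SimpleNamespace

# Verdict suffix quoted in the commit message for the skipped case.
SKIP_SUFFIX = "joint pre-trends skipped (no earlier pre-period)"


def step_2_ran(report) -> bool:
    """Return True only when the joint pre-trends test produced a result.

    Hypothetical helper (not part of diff_diff). `report` is assumed to
    expose `pretrends_joint` and `verdict`, as named in the commit message.
    """
    if report.pretrends_joint is None:
        return False
    return SKIP_SUFFIX not in report.verdict


# Stand-in for a minimal event-study panel with only the base F-1 pre-period.
minimal = SimpleNamespace(pretrends_joint=None, all_pass=False, verdict=SKIP_SUFFIX)

# Stand-in for a panel with an earlier placebo pre-period; values are placeholders.
richer = SimpleNamespace(pretrends_joint=object(), all_pass=True, verdict="placeholder verdict")

print(step_2_ran(minimal))  # False
print(step_2_ran(richer))   # True
```

Note that a `True` result here only says the joint pre-trends test ran; whether Assumption 7 is supported still depends on the test outcome itself.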
1 parent da24a59 commit 4a24758

4 files changed

Lines changed: 78 additions & 5 deletions

diff_diff/guides/llms-full.txt

Lines changed: 1 addition & 1 deletion
@@ -1406,7 +1406,7 @@ results = did.fit(data, outcome='y', treatment='treated', time='post')
 
 ## HAD Pretests
 
-Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal. The workflow follows paper Section 4.2's three-step battery: **step 1** is the QUG support-infimum test (decides whether Design 1' or Design 1 applies); **step 2** is the Assumption 7 pre-trends test (joint Stute on the event-study path; explicitly NOT covered on the overall path because a single-pre-period panel cannot support the joint variant); **step 3** is the Assumption 8 linearity test (`stute_test` or `yatchew_hr_test`). On the default `aggregate="overall"` path the workflow runs steps 1 + 3 only and the returned `verdict` flags the Assumption 7 gap; pass `aggregate="event_study"` on a multi-period panel to close that gap.
+Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal. The workflow follows paper Section 4.2's three-step battery: **step 1** is the QUG support-infimum test (decides whether Design 1' or Design 1 applies); **step 2** is the Assumption 7 pre-trends test (joint Stute on the event-study path; explicitly NOT covered on the overall path because a single-pre-period panel cannot support the joint variant); **step 3** is the Assumption 8 linearity test (`stute_test` or `yatchew_hr_test`). On the default `aggregate="overall"` path the workflow runs steps 1 + 3 only and the returned `verdict` flags the Assumption 7 gap; pass `aggregate="event_study"` on a multi-period panel **with at least one earlier placebo pre-period beyond the base `F-1`** to close that gap. With only the base `F-1` pre-period available (minimal 3-period event-study, or 4-period under `trends_lin=True` where the consumed `F-2` placebo is dropped), the workflow still sets `pretrends_joint=None`, `all_pass=False`, and appends `joint pre-trends skipped (no earlier pre-period)` to the verdict — step 2 stays uncovered.
 
 ```python
 from diff_diff import (

diff_diff/practitioner.py

Lines changed: 16 additions & 4 deletions
@@ -865,7 +865,13 @@ def _handle_had(results: Any):
 "path - a single pre-period cannot support the joint "
 "Stute variant - and the returned verdict explicitly "
 "flags that gap. To close step 2, refit on a multi-period "
-"panel with aggregate='event_study'. On survey-weighted "
+"panel with aggregate='event_study' AND verify the panel "
+"has at least one earlier placebo pre-period beyond F-1; "
+"if only the base pre-period F-1 is available, the "
+"workflow still sets pretrends_joint=None, all_pass=False, "
+"and a 'joint pre-trends skipped (no earlier pre-period)' "
+"verdict suffix - in that case step 2 stays uncovered "
+"even on the event-study path. On survey-weighted "
 "fits (survey_design= / survey= / weights=) the workflow "
 "skips QUG with a UserWarning (permanent Phase 4.5 C0 "
 "deferral - extreme order statistics are not smooth "
@@ -1010,9 +1016,15 @@ def _handle_had_event_study(results: Any):
 "On multi-period unweighted panels, did_had_pretest_workflow "
 "with aggregate='event_study' runs QUG plus joint Stute "
 "pre-trends plus joint homogeneity-linearity Stute. The "
-"joint Stute variants close the paper Section 4.2 step-2 "
-"gap that the overall path explicitly flags as deferred. "
-"On survey-weighted fits (survey_design= / survey= / "
+"joint Stute pre-trends variant closes the paper Section "
+"4.2 step-2 gap ONLY IF the panel carries at least one "
+"earlier placebo pre-period beyond the base F-1. With "
+"only the base F-1 pre-period present (e.g. a minimal "
+"valid 3-period event-study fit, or a 4-period fit under "
+"trends_lin=True where the consumed F-2 placebo gets "
+"dropped), pretrends_joint=None, all_pass=False, and the "
+"verdict carries 'joint pre-trends skipped (no earlier "
+"pre-period)' - step 2 stays uncovered. On survey-weighted fits (survey_design= / survey= / "
 "weights=) the workflow skips QUG with a UserWarning "
 "(permanent Phase 4.5 C0 deferral) and returns a "
 "linearity-conditional verdict only - so step 1 coverage "

tests/test_guides.py

Lines changed: 24 additions & 0 deletions
@@ -547,6 +547,30 @@ def test_llms_practitioner_step_4_distinguishes_had_from_continuous(self):
             "ContinuousDiD routing."
         )
 
+    def test_llms_full_had_pretests_documents_earlier_pre_period_precondition(self):
+        # Same precondition as the practitioner test: per
+        # docs/methodology/REGISTRY.md HeterogeneousAdoptionDiD
+        # § "Assumption 7 / step 2 closure" + had_pretests.py:4738-4756 +
+        # 2769, aggregate="event_study" closes step 2 ONLY IF the
+        # panel carries at least one earlier placebo pre-period beyond
+        # the base F-1. The HAD Pretests section in llms-full.txt must
+        # document this precondition so agents do not assume any
+        # multi-period event-study fit closes step 2.
+        text = get_llm_guide("full")
+        pretests_start = text.index("## HAD Pretests")
+        pretests_end = text.index("## Honest DiD", pretests_start)
+        pretests_block = text[pretests_start:pretests_end]
+        lower = pretests_block.lower()
+        assert "earlier" in lower and ("pre-period" in lower or "placebo" in lower), (
+            "HAD Pretests section must document the 'earlier pre-period' "
+            "precondition for step-2 closure on the event-study path."
+        )
+        assert "skipped" in lower or "pretrends_joint=none" in lower, (
+            "HAD Pretests section must surface the "
+            "'joint pre-trends skipped' / pretrends_joint=None fallback "
+            "when no earlier pre-period exists."
+        )
+
     def test_llms_full_had_pretests_assumption_labels_correct(self):
         # Per docs/methodology/REGISTRY.md HeterogeneousAdoptionDiD
         # § "Assumptions / Theorems / Estimators":

tests/test_practitioner.py

Lines changed: 37 additions & 0 deletions
@@ -779,6 +779,43 @@ def test_had_results_dataclass_docstrings_match_weighted_mass_point_contract(sel
             "for mass-point fits."
         )
 
+    def test_had_step_3_documents_earlier_pre_period_precondition_for_step_2(
+        self, mock_had_results, mock_had_event_study_results
+    ):
+        # Per docs/methodology/REGISTRY.md HeterogeneousAdoptionDiD
+        # § "Assumption 7 / step 2 closure" + had_pretests.py:4738-4756 +
+        # 2769: aggregate="event_study" closes step 2 ONLY IF the panel
+        # carries at least one earlier placebo pre-period beyond the
+        # base F-1. With only F-1 available the workflow sets
+        # pretrends_joint=None, all_pass=False, and the verdict carries
+        # 'joint pre-trends skipped (no earlier pre-period)'. Both HAD
+        # handler variants must surface this precondition - otherwise
+        # agents reading the guidance can think any multi-period
+        # event-study fit closes step 2 when it does not.
+        for fixture in (mock_had_results, mock_had_event_study_results):
+            output = practitioner_next_steps(fixture, verbose=False)
+            step_3_steps = [s for s in output["next_steps"] if s["baker_step"] == 3]
+            assert len(step_3_steps) == 1
+            text = (step_3_steps[0].get("why", "") + " " + step_3_steps[0].get("code", "")).lower()
+            # Must mention "earlier" pre-period / placebo precondition.
+            assert "earlier" in text and ("pre-period" in text or "placebo" in text), (
+                "Step-3 text must mention the 'earlier pre-period' "
+                "precondition for closing Assumption 7 / step 2 on the "
+                "event-study path. With only the base F-1 pre-period "
+                "the workflow returns pretrends_joint=None and the "
+                "verdict carries 'joint pre-trends skipped (no earlier "
+                "pre-period)' - step 2 stays uncovered."
+            )
+            # Must mention the skip-fallback verdict so agents know
+            # what to expect when the precondition fails.
+            assert "skipped" in text or "pretrends_joint=none" in text, (
+                "Step-3 text must surface the 'joint pre-trends skipped' "
+                "/ pretrends_joint=None fallback when no earlier "
+                "pre-period exists - otherwise agents cannot tell "
+                "whether step 2 was actually covered on a minimal "
+                "event-study fit."
+            )
+
     def test_had_step_3_flags_qug_under_survey_deferral(
         self, mock_had_results, mock_had_event_study_results
     ):
