
Commit 80cb9ae

igerber and claude committed
Align second REGISTRY bullet with narrowed signature contract; add survey-scope regression test
R0 review on the prior commit caught two follow-on items:

(P2) REGISTRY.md:2555-2556 had two adjacent Phase 5 wave 1 bullets about HAD signatures. The first (already narrowed) correctly limits the regression-lock to parameter-name presence. The second still claimed "constructor / fit() signatures match the real API (regression-tested via inspect.signature)" - the same overstatement the prior commit fixed in the first bullet. Bring the second bullet in line with the narrower contract.

(P3) The new practitioner Step-3 caveats about the supported survey-pretest scope (pweight + PSU/FPC) and the deferred stratified + replicate-weight regimes were not regression-locked at the practitioner test layer. The existing test_had_step_3_flags_qug_under_survey_deferral only covers the QUG-skip / linearity-conditional wording, leaving the new scope qualifications free to drift silently. Add test_had_step_3_qualifies_supported_survey_scope asserting the supported subset is named explicitly (pweight + PSU + FPC) and the deferred regimes are flagged by name (stratif, replicate, NotImplementedError) on both HAD handler variants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6c7fc7f commit 80cb9ae
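
The "parameter-name presence" contract described in the commit message can be illustrated with a minimal sketch. This is not the repository's actual test: `StubEstimator`, `missing_params`, and the documented-name lists are hypothetical stand-ins, and only `inspect.signature` is a real standard-library call.

```python
import inspect


class StubEstimator:
    """Hypothetical stand-in for an estimator class (not the real API)."""

    def __init__(self, design="auto", alpha=0.05):
        self.design = design
        self.alpha = alpha

    def fit(self, data, outcome, dose, survey_design=None):
        return self


def missing_params(func, documented):
    """Return documented parameter names that are absent from func's signature."""
    actual = set(inspect.signature(func).parameters)
    return sorted(set(documented) - actual)


# Names a docs section might list (illustrative values only).
documented_init = ["design", "alpha"]
documented_fit = ["data", "outcome", "dose", "survey_design", "weights"]

init_gaps = missing_params(StubEstimator.__init__, documented_init)
fit_gaps = missing_params(StubEstimator.fit, documented_fit)
```

A check of this shape only compares names. Changing a default (say, `alpha=0.05` to `alpha=0.10`), a type annotation, or a return type would leave both results unchanged, which is why the narrowed wording avoids claiming that the documented signatures "match the real API".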

2 files changed

Lines changed: 42 additions & 1 deletion

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -2553,7 +2553,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 - [ ] Phase 4: Pierce-Schott (2016) replication harness reproduces Figure 2 values.
 - [ ] Phase 4: Full DGP 1/2/3 coverage-rate reproduction from Table 1.
 - [x] Phase 5 (wave 1, PR #402): `practitioner_next_steps()` integration for HAD results - `_handle_had` and `_handle_had_event_study` route both result classes through HAD-specific Baker et al. (2025) step guidance with bidirectional HAD ↔ ContinuousDiD Step-4 routing closure. The `_check_nan_att` helper extends to ndarray `att` (HAD event-study) via `np.all(np.isnan(arr))` semantics; scalar path bit-exact preserved. The `llms-full.txt` HAD section's documented constructor and `fit()` parameter lists are regression-locked against `inspect.signature(HeterogeneousAdoptionDiD.__init__)` and `HeterogeneousAdoptionDiD.fit` for parameter-name presence (defaults, type annotations, and return-type unions are not pinned by the current test).
-- [x] Phase 5 (wave 1, PR #402): `llms-full.txt` HeterogeneousAdoptionDiD section + result-class blocks + `## HAD Pretests` index + Choosing-an-Estimator row landed; constructor / fit() signatures match the real API (regression-tested via `inspect.signature`); result-class field tables enumerate every public dataclass field (regression-tested via `dataclasses.fields()`); `llms-practitioner.txt` Step 4 decision tree distinguishes ContinuousDiD (per-dose ATT(d), needs never-treated) from HeterogeneousAdoptionDiD (WAS, universal-rollout-compatible).
+- [x] Phase 5 (wave 1, PR #402): `llms-full.txt` HeterogeneousAdoptionDiD section + result-class blocks + `## HAD Pretests` index + Choosing-an-Estimator row landed; constructor / fit() parameter names are regression-locked against `inspect.signature(HeterogeneousAdoptionDiD.__init__)` and `HeterogeneousAdoptionDiD.fit` (parameter-name presence only - defaults, type annotations, and return-type unions are not pinned); result-class field tables enumerate every public dataclass field (regression-tested via `dataclasses.fields()`); `llms-practitioner.txt` Step 4 decision tree distinguishes ContinuousDiD (per-dose ATT(d), needs never-treated) from HeterogeneousAdoptionDiD (WAS, universal-rollout-compatible).
 - [x] Phase 5 (partial): README catalog one-liner, bundled `llms.txt` `## Estimators` entry, `docs/api/had.rst` (autoclass for the three classes), and `docs/references.rst` citation landed in PR #372 docs refresh.
 - [x] Phase 5 (wave 2 first slice, PR #409): T21 HAD pretest workflow tutorial (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `did_had_pretest_workflow`. Uses a `Uniform[$0.01K, $50K]` dose-distribution variant of T20's brand-campaign panel (true support strictly positive but near-zero, chosen so QUG fails-to-reject `H0: d_lower = 0` in finite sample). Walks through `aggregate="overall"` (Steps 1 + 3 only, verdict explicitly flags Step 2 deferral) and upgrades to `aggregate="event_study"` (joint pre-trends Stute + joint homogeneity Stute close the gap). Side panel exercises both `yatchew_hr_test` null modes (`linearity` vs `mean_independence`). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (16 tests pinning panel composition, both verdict pivots, structural anchors, deterministic stats, bootstrap p-value tolerance bands per backend, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel).
 - [ ] Phase 5 (remaining): T22 weighted/survey HAD tutorial - tracked in `TODO.md`.
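
The `dataclasses.fields()` check mentioned in the bullet above can be sketched in the same spirit. This is an illustration under assumptions rather than the project's test: `StubResults` and `undocumented_fields` are hypothetical, and "public" is assumed here to mean "no leading underscore".

```python
from dataclasses import dataclass, fields


@dataclass
class StubResults:
    """Hypothetical result class; field names are illustrative only."""

    att: float
    se: float
    _cache: object = None  # treated as private by the leading underscore


def undocumented_fields(result_cls, documented):
    """List public dataclass fields that a docs table fails to mention."""
    public = [f.name for f in fields(result_cls) if not f.name.startswith("_")]
    return [name for name in public if name not in documented]


complete = undocumented_fields(StubResults, {"att", "se"})
incomplete = undocumented_fields(StubResults, {"att"})
```

Unlike the signature check, this direction of comparison (every real field must be documented) catches a field that was added to the class but never added to the docs table.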

tests/test_practitioner.py

Lines changed: 41 additions & 0 deletions
@@ -894,6 +894,47 @@ def test_had_step_3_flags_qug_under_survey_deferral(
                 "assumption."
             )

+    def test_had_step_3_qualifies_supported_survey_scope(
+        self, mock_had_results, mock_had_event_study_results
+    ):
+        # Per diff_diff/had_pretests.py:1725-1740 + :1927-1940, only
+        # pweight + PSU/FPC survey designs are supported on HAD
+        # pretests. Stratified (SurveyDesign(strata=...)) and
+        # replicate-weight (BRR/Fay/JK1/JKn/SDR) designs raise
+        # NotImplementedError on the linearity kernels. Both HAD
+        # handlers' Step-3 text must call out the supported subset
+        # and the deferred regimes so agents don't generate
+        # `practitioner_next_steps` outputs that overstate what the
+        # workflow will run on a given survey design.
+        for fixture in (mock_had_results, mock_had_event_study_results):
+            output = practitioner_next_steps(fixture, verbose=False)
+            step_3_steps = [s for s in output["next_steps"] if s["baker_step"] == 3]
+            assert len(step_3_steps) == 1
+            text = step_3_steps[0].get("why", "").lower()
+            # Supported subset must be named explicitly.
+            assert "pweight" in text and "psu" in text and "fpc" in text, (
+                "Step-3 text must name the supported survey-pretest scope "
+                "(pweight + PSU/FPC) so agents do not assume any "
+                "survey_design= path is supported."
+            )
+            # Deferred regimes must be flagged explicitly so agents
+            # know not to attempt them.
+            assert "stratif" in text, (
+                "Step-3 text must explicitly note that stratified "
+                "(SurveyDesign(strata=...)) survey designs are not yet "
+                "supported on HAD pretests."
+            )
+            assert "replicate" in text, (
+                "Step-3 text must explicitly note that replicate-weight "
+                "(BRR/Fay/JK1/JKn/SDR) survey designs are not yet "
+                "supported on HAD pretests."
+            )
+            assert "notimplementederror" in text, (
+                "Step-3 text must name the actual exception raised "
+                "(NotImplementedError) so agents can match it in "
+                "error-handling paths."
+            )
+
     def test_had_step_3_pretest_assumption_labels_correct(self, mock_had_results):
         # Per docs/methodology/REGISTRY.md and diff_diff/had_pretests.py
         # docstrings:
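
The keyword-presence pattern used by the new test can be tried without the library's fixtures. The sketch below assumes only the output shape visible in the diff (a dict whose `"next_steps"` list holds dicts with `"baker_step"` and `"why"` keys). `missing_scope_tokens` and the sample wording are hypothetical, not text produced by `practitioner_next_steps`.

```python
# Lowercase substrings that the Step-3 text is expected to contain.
REQUIRED_TOKENS = (
    "pweight",
    "psu",
    "fpc",
    "stratif",
    "replicate",
    "notimplementederror",
)


def missing_scope_tokens(output):
    """Return required tokens absent from the single Step-3 'why' text."""
    step_3 = [s for s in output["next_steps"] if s["baker_step"] == 3]
    if len(step_3) != 1:
        raise ValueError(f"expected exactly one Step-3 entry, got {len(step_3)}")
    # Lowercasing makes the match case-insensitive, as in the test above.
    text = step_3[0].get("why", "").lower()
    return [tok for tok in REQUIRED_TOKENS if tok not in text]


# Hypothetical output whose Step-3 text forgets replicate-weight designs.
sample_output = {
    "next_steps": [
        {
            "baker_step": 3,
            "why": (
                "Survey pretests support pweight with PSU/FPC only; "
                "stratified designs raise NotImplementedError."
            ),
        },
        {"baker_step": 4, "why": "Illustrative Step-4 text."},
    ]
}

gaps = missing_scope_tokens(sample_output)
```

Matching on the stem `stratif` rather than a full word lets the check accept either "stratified" or "stratification", at the cost of not verifying what the sentence actually says about those designs.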
