P1 pretest assumption labels: the _handle_had step-3 why-text and the
llms-full.txt HAD Pretests section (qug_test/stute_test bullets)
misstated which paper assumption each shipped test actually tests:
- qug_test was labeled "Assumption 5 support condition", but QUG tests
H_0: d_lower = 0 (paper Theorem 4 / step 1 of the workflow).
Assumption 5 is the Design 1 sign-identification condition and is
NOT testable via pre-trends per REGISTRY.md:2270.
- stute_test was labeled "Assumption 7 mean-independence", but
stute_test is the Assumption 8 linearity test (paper Section 4.2
step 3 / Appendix D). Assumption 7 is pre-trends (step 2).
- did_had_pretest_workflow(aggregate="overall") was implied to cover
step 2, but the workflow runs steps 1 + 3 only; step 2 is
explicitly not covered on the overall path (had_pretests.py:4434-4441,
and the workflow's verdict flags the gap).
Rewrote both surfaces to match the actual contracts: QUG = paper
Theorem 4 support-infimum test (step 1, decides Design 1' vs Design 1);
Stute / Yatchew-HR = Assumption 8 linearity tests (step 3); Assumption
7 step 2 closure requires aggregate="event_study" (joint Stute
pre-trends). Assumption 7 / step 2 gap is explicitly flagged on the
overall path so agents do not assume coverage where there is none.
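The overall-vs-event-study coverage contract above can be sketched as a toy verdict function. This is an illustrative sketch only: `pretest_verdict` and its keys are hypothetical names, not the `did_had_pretest_workflow` API; it mirrors the step logic (step 1 always runs, step 2 only on the event-study path, step 3 always runs, and the Assumption 7 gap is flagged explicitly on the overall path).

```python
# Hypothetical sketch of the three-step pretest battery's coverage contract.
# Not the diff_diff implementation; names and the 0.05 cutoff are illustrative.

def pretest_verdict(p_qug, p_linearity, p_pretrends=None, aggregate="overall"):
    """Summarise which steps ran and whether each passed at the 5% level."""
    steps = {
        "step1_support_infimum": p_qug >= 0.05,   # H0: d_lower = 0 not rejected
        "step3_linearity": p_linearity >= 0.05,   # Assumption 8 linearity holds
    }
    if aggregate == "event_study":
        if p_pretrends is None:
            raise ValueError("event_study path needs a joint pre-trends p-value")
        steps["step2_pretrends"] = p_pretrends >= 0.05  # Assumption 7 closure
        assumption7_gap = False
    else:
        # Overall path: a single-pre-period panel cannot support the joint
        # pre-trends variant, so step 2 is skipped and the gap is flagged.
        assumption7_gap = True
    return {
        "steps": steps,
        "all_pass": all(steps.values()),
        "assumption7_gap_flagged": assumption7_gap,
    }
```

Flagging the gap in the returned verdict, rather than silently passing, is what keeps agents from assuming step-2 coverage on the overall path.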
P2 result-class field tables incomplete: HeterogeneousAdoptionDiDResults
table was missing n_mass_point, n_above_d_lower, cluster_name,
bias_corrected_fit, variance_formula, effective_dose_mean.
HeterogeneousAdoptionDiDEventStudyResults table was missing vcov_type,
cluster_name, bandwidth_diagnostics, bias_corrected_fit, filter_info.
Added all missing fields with correct types and descriptions.
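The field-table drift guard used here can be sketched self-contained. `ToyResults`, `DOC_TABLE`, and `undocumented_fields` below are hypothetical stand-ins, not the real result class or the shipped test; the pattern is the same: enumerate every public dataclass field via `dataclasses.fields()` and assert each appears as a backticked attribute in the documented markdown table.

```python
# Hypothetical sketch of a dataclass-vs-docs drift guard (toy class and table,
# not HeterogeneousAdoptionDiDResults or the shipped test_guides.py test).
import dataclasses
import re
from typing import Optional

@dataclasses.dataclass
class ToyResults:
    effect: float
    se: float
    n_obs: int
    _internal_cache: Optional[dict] = None  # private: excluded from the check

DOC_TABLE = """
| Attribute | Type | Description |
|-----------|------|-------------|
| `effect` | `float` | Point estimate |
| `se` | `float` | Standard error |
| `n_obs` | `int` | Units contributing |
"""

def undocumented_fields(cls, table):
    """Public dataclass fields missing from the documented table, sorted."""
    documented = set(re.findall(r"^\|\s*`(\w+)`", table, flags=re.MULTILINE))
    public = {f.name for f in dataclasses.fields(cls)
              if not f.name.startswith("_")}
    return sorted(public - documented)
```

A new field added to the class without a matching table row shows up in the returned list, which is exactly the future-drift case the real test is meant to catch.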
Tests added (3 new, 86 total):
- test_llms_full_had_results_class_field_lists_match_real_dataclass:
uses dataclasses.fields() to enumerate every public field on both
result classes and assert each appears in the documented table.
Catches future drift where new fields land but the guide is not
updated.
- test_llms_full_had_pretests_assumption_labels_correct: scans the
qug_test and stute_test bullets in the HAD Pretests section and
enforces positive labels (support-infimum / Theorem 4 / linearity)
and forbids positive Assumption-5 / Assumption-7 misclaims (negative
disclaimers like "QUG does NOT test Assumption 5" remain allowed).
- test_had_step_3_pretest_assumption_labels_correct: same checks on
the practitioner.py _handle_had step-3 why-text; also requires
positive acknowledgment of the Assumption 7 / step 2 gap on the
overall workflow path.
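The positive-label / negative-disclaimer distinction the two label tests enforce can be sketched like this. `qug_bullet_ok` and its regexes are hypothetical, not the shipped test logic: a required positive label must be present, and any mention of "Assumption 5" only passes if it is negated nearby.

```python
# Hypothetical sketch of the misclaim scan: require the positive label,
# forbid un-negated Assumption-5 claims, allow negative disclaimers.
import re

def qug_bullet_ok(bullet: str) -> bool:
    # Required positive labels for the QUG bullet.
    if not re.search(r"Theorem 4|support-infimum", bullet):
        return False
    # Any "Assumption 5" mention must be negated within a short window.
    for m in re.finditer(r"Assumption 5", bullet):
        window = bullet[max(0, m.start() - 40):m.start()]
        if not re.search(r"\bNOT\b|\bnot\b", window):
            return False
    return True
```

The window-based negation check is what lets disclaimers like "does NOT test Assumption 5" survive while a bare "Assumption 5 support condition" label fails.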
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff_diff/guides/llms-full.txt (32 additions, 16 deletions)
@@ -1226,7 +1226,7 @@ Each event study effect dict contains: `effect`, `se`, `t_stat`, `p_value`, `con
 
 ### HeterogeneousAdoptionDiDResults
 
-Single-period results container for `HeterogeneousAdoptionDiD`.
+Single-period results container for `HeterogeneousAdoptionDiD`. The table below enumerates every public dataclass field; a regression test in `tests/test_guides.py` (`test_llms_full_had_results_class_field_lists_match_real_dataclass`) compares this list against the real `dataclasses.fields()` of the result class.
 
 | Attribute | Type | Description |
 |-----------|------|-------------|
@@ -1243,16 +1243,22 @@ Single-period results container for `HeterogeneousAdoptionDiD`.
 | `n_obs` | `int` | Units contributing to estimation |
 | `n_treated` | `int` | Units with `D > d_lower` |
 | `n_control` | `int` | Units at or below `d_lower` |
+| `n_mass_point` | `int | None` | Mass-point design only: units exactly at `d_lower`; `None` on continuous designs |
+| `n_above_d_lower` | `int | None` | Mass-point design only: units strictly above `d_lower`; `None` on continuous designs |
 | `inference_method` | `str` | `"analytical_nonparametric"` or `"analytical_2sls"` |
+| `bias_corrected_fit` | `BiasCorrectedFit | None` | Phase 1c bias-corrected local-linear fit object (continuous designs); `None` on `mass_point` |
+| `variance_formula` | `str | None` | HAD-specific SE label on the weighted continuous path: `"pweight"` (CCT 2014 weighted-robust) or `"survey_binder_tsl"` (Binder 1983); `None` on unweighted / mass-point fits |
+| `effective_dose_mean` | `float | None` | Weighted denominator used by the β̂-scale rescaling on the weighted continuous path; `None` on unweighted fits |
-Per-horizon event-study results container for `HeterogeneousAdoptionDiD` with `aggregate="event_study"`. The anchor horizon `e = -1` is excluded by construction.
+Per-horizon event-study results container for `HeterogeneousAdoptionDiD` with `aggregate="event_study"`. The anchor horizon `e = -1` is excluded by construction. The table below enumerates every public dataclass field; a regression test (`test_llms_full_had_results_class_field_lists_match_real_dataclass`) compares this list against the real `dataclasses.fields()`.
 
 | Attribute | Type | Description |
 |-----------|------|-------------|
@@ -1263,11 +1269,6 @@ Per-horizon event-study results container for `HeterogeneousAdoptionDiD` with `a
-Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal.
+Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal. The workflow follows paper Section 4.2's three-step battery: **step 1** is the QUG support-infimum test (decides whether Design 1' or Design 1 applies); **step 2** is the Assumption 7 pre-trends test (joint Stute on the event-study path; explicitly NOT covered on the overall path because a single-pre-period panel cannot support the joint variant); **step 3** is the Assumption 8 linearity test (`stute_test` or `yatchew_hr_test`). On the default `aggregate="overall"` path the workflow runs steps 1 + 3 only and the returned `verdict` flags the Assumption 7 gap; pass `aggregate="event_study"` on a multi-period panel to close that gap.
-    aggregate='overall', # or 'event_study' for joint Stute on multi-period panels
+    aggregate='overall',
     survey_design=None) # SurveyDesign for survey-aware pretests (Phase 4.5 C)
 print(report.summary())
 print(report.all_pass, report.verdict)
 ```
 
 Individual tests:
 
-- `qug_test(d)` — Assumption 5 support condition. Extreme order statistics, Exp(1)/Exp(1) limit law. **Permanently rejects** non-`None` `survey_design=` / `weights=` (`NotImplementedError`) per Phase 4.5 C0 deferral — extreme-value functionals are not smooth in the empirical CDF, so standard survey machinery does not yield a calibrated test.
-- `stute_test(d, dy)` — Assumption 7 mean-independence of trends via Cramér-von Mises functional with Mammen wild bootstrap. Survey-aware via PSU-level Mammen multiplier bootstrap.
-- `yatchew_hr_test(d, dy, *, null="linearity")` — Assumption 8 linearity of `E[ΔY|D]` via Yatchew (1997) heteroskedasticity-robust variance-ratio test. The `null="mean_independence"` mode (R `YatchewTest::yatchew_test(order=0)`) is also exposed for placebo-style mean-independence testing. Survey-aware via closed-form weighted variance components (no bootstrap).
+- `qug_test(d)` — paper Theorem 4 support-infimum test (`H_0: d_lower = 0`; the QUG decides whether Design 1' or Design 1 applies in step 1 of the workflow). Extreme order statistics, Exp(1)/Exp(1) limit law. The QUG itself does NOT test Assumption 5 (which is the Design 1 sign-identification condition and is not testable via pre-trends per registry). **Permanently rejects** non-`None` `survey_design=` / `weights=` (`NotImplementedError`) per Phase 4.5 C0 deferral — extreme-value functionals are not smooth in the empirical CDF, so standard survey machinery does not yield a calibrated test.
+- `stute_test(d, dy)` — Assumption 8 linearity of `E[ΔY|D]` (paper Section 4.2 step 3) via Stute Cramér-von Mises functional with Mammen wild bootstrap. Survey-aware via PSU-level Mammen multiplier bootstrap.
+- `yatchew_hr_test(d, dy, *, null="linearity")` — Assumption 8 linearity of `E[ΔY|D]` (alternative test for step 3) via Yatchew (1997) heteroskedasticity-robust variance-ratio test. The `null="mean_independence"` mode (R `YatchewTest::yatchew_test(order=0)`) is also exposed for placebo-style mean-independence testing. Survey-aware via closed-form weighted variance components (no bootstrap).
 - `stute_joint_pretest(residuals_dict, d)` — joint Cramér-von Mises across K horizons with shared-η Mammen wild bootstrap (Delgado-Manteiga 2001 / Hlávka-Hušková 2020). Residuals-in core; the two data-in wrappers below construct residuals for the two paper-spelled nulls.
-- `joint_pretrends_test(...)` — joint pre-trends on K pre-periods (paper Section 4.2 step 2 closure on the event-study path).
-- `joint_homogeneity_test(...)` — joint linearity-and-homogeneity on K post-periods.
+- `joint_pretrends_test(...)` — Assumption 7 joint pre-trends on K pre-periods (paper Section 4.2 step 2 closure on the event-study path).
+- `joint_homogeneity_test(...)` — joint linearity-and-homogeneity on K post-periods (event-study step 3 alternative).
 
 The QUG-under-survey deferral is permanent; the linearity-family pretests support `survey_design=` (pweight, PSU, FPC) per Phase 4.5 C. Stratified designs and replicate-weight designs are deferred to follow-up PRs.