
Commit d152b50

igerber and claude committed
HAD Phase 5 wave 1: agent-facing surfaces
Add _handle_had + _handle_had_event_study to practitioner.py, routing both HeterogeneousAdoptionDiD result classes through HAD-specific Baker et al. (2025) step guidance:

- did_had_pretest_workflow (step 3)
- ContinuousDiD/CallawaySantAnna routing nudge (step 4)
- bandwidth_diagnostics + simultaneous bands (step 6)
- per-horizon WAS event-study disaggregation (step 7)
- design auto-detection + last-cohort-only-WAS framing (step 8)

Symmetric pair: _handle_continuous gains Step-4 routing to HAD on no-untreated panels; the HAD <-> ContinuousDiD routing loop is now bidirectional.

Extend _check_nan_att with an ndarray branch (lazy numpy import + np.all(np.isnan(arr)) semantics so partial-NaN arrays don't over-fire the warning). Scalar path bit-exact preserved across all 12 untouched handlers.

Add full HAD section + result-class blocks + "## HAD Pretests" index covering all 7 pretest entry points + Choosing-an-Estimator row to diff_diff/guides/llms-full.txt (the bundled-in-wheel agent reference). Tighten the existing "Continuous treatment intensity" Choosing row with "(some units untreated)" so the HAD vs ContinuousDiD contrast is explicit. Framing: "no untreated unit" / dose variation, never "no comparison group"; locked by negative-assertion tests on both handler text and the llms-full.txt section.

docs/doc-deps.yaml: remove the llms-full.txt deferral note on had.py and add llms-full.txt entries to the had.py, had_pretests.py, and practitioner.py blocks.

21 new tests (14 in tests/test_practitioner.py::TestHADDispatch + 6 in tests/test_guides.py::TestLLMsFullHADCoverage + 1 fixture-minimality regression locking the "handlers are STRING-ONLY at runtime" stability invariant).

Closes the Phase 5 "agent surfaces" gap; the T21 pretest tutorial and T22 weighted/survey tutorial remain queued as separate notebook PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 33afb6a commit d152b50

7 files changed: 723 additions & 42 deletions


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,11 @@ All notable changes to this project will be documented in this file.

Added a new `## [Unreleased]` section (with `### Added`) above the existing `## [3.3.2] - 2026-04-26` entry:

- **HAD `practitioner_next_steps()` handler + `llms-full.txt` reference section** (Phase 5). Adds `_handle_had` and `_handle_had_event_study` to `diff_diff/practitioner.py::_HANDLERS`, routing both `HeterogeneousAdoptionDiDResults` (single-period) and `HeterogeneousAdoptionDiDEventStudyResults` (event-study) through HAD-specific Baker et al. (2025) step guidance: `did_had_pretest_workflow` (step 3 — paper Section 4.2 step-2 closure on the event-study path), `ContinuousDiD` / `CallawaySantAnna` routing nudge (step 4 — fires on the wrong-estimator-for-this-data path), `bandwidth_diagnostics` inspection on continuous designs and simultaneous (sup-t) `cband_*` reading on weighted event-study fits (step 6), per-horizon WAS event-study disaggregation (step 7), and the explicit design-auto-detection / last-cohort-only-WAS framing (step 8). Symmetric pair: `_handle_continuous` gains a Step-4 nudge to `HeterogeneousAdoptionDiD` for ContinuousDiD users on no-untreated panels — the routing loop is now bidirectional. Extends `_check_nan_att` with an ndarray branch via lazy `numpy` import for HAD's per-horizon `att` array; uses `np.all(np.isnan(arr))` semantics so partial-NaN arrays (legitimate event-study output under degenerate horizon-specific designs) do not over-fire the warning. Scalar path is bit-exact preserved across all 12 untouched handlers. Adds full HAD section + `HeterogeneousAdoptionDiDResults` / `HeterogeneousAdoptionDiDEventStudyResults` blocks + `## HAD Pretests` index covering all 7 pretest entry points + Choosing-an-Estimator row to `diff_diff/guides/llms-full.txt` (the bundled-in-wheel agent reference). Tightens the existing `Continuous treatment intensity` Choosing row to `(some units untreated)` so the contrast with the new HAD row is explicit. Framing convention follows the "no untreated unit" / dose variation language; locked by negative-assertion tests on both the handler text and the `llms-full.txt` HAD section. `docs/doc-deps.yaml` updated to remove the `llms-full.txt` deferral note on `had.py` and add `llms-full.txt` entries to `had.py`, `had_pretests.py`, and `practitioner.py` blocks. Patch-level (additive on stable surfaces). 21 new tests (14 in `tests/test_practitioner.py::TestHADDispatch` + 6 in `tests/test_guides.py::TestLLMsFullHADCoverage` + 1 fixture-minimality regression locking the "handlers are STRING-ONLY at runtime" stability invariant). Closes the Phase 5 "agent surfaces" gap; T21 pretest tutorial and T22 weighted/survey tutorial remain queued as separate notebook PRs.

TODO.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ Deferred items from PR reviews that were not addressed before merge.

  | `HeterogeneousAdoptionDiD` Phase 3 R-parity: Phase 3 ships coverage-rate validation on synthetic DGPs (not tight point parity against `chaisemartin::stute_test` / `yatchew_test`). Tight numerical parity requires aligning bootstrap seed semantics and `B` across numpy/R and is deferred. | `tests/test_had_pretests.py` | Phase 3 | Low |
  | `HeterogeneousAdoptionDiD` Phase 3 nprobust bandwidth for Stute: some Stute variants on continuous regressors use nprobust-style optimal bandwidth selection. Phase 3 uses OLS residuals from a 2-parameter linear fit (no bandwidth selection). nprobust integration is a future enhancement; not in paper scope. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
  | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low |
- | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms-full.txt` HeterogeneousAdoptionDiD section (preserving UTF-8 fingerprint). README catalog + bundled `llms.txt` entry + `docs/api/had.rst` + `docs/references.rst` citation landed in PR #372 docs refresh. | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/llms-full.txt` | Phase 2a | Low |
+ | `HeterogeneousAdoptionDiD` Phase 5 follow-up tutorials (T21 HAD pretest workflow notebook + T22 weighted/survey HAD tutorial). `practitioner_next_steps()` HAD handlers + `llms-full.txt` HeterogeneousAdoptionDiD section + Choosing-an-Estimator row landed in Phase 5 wave 1. | `tutorials/`, `tests/test_t21_*_drift.py`, `tests/test_t22_*_drift.py` | Phase 2a | Low |
  | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
  | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
  | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |

diff_diff/guides/llms-full.txt

Lines changed: 155 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -590,6 +590,68 @@ results = est.fit(data, outcome='outcome', unit='unit', time='period',
590590
results.print_summary()
591591
```
592592

593+
### HeterogeneousAdoptionDiD
594+
595+
HeterogeneousAdoption DiD estimator (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). Targets a Weighted Average Slope (WAS) on **Heterogeneous Adoption Designs where no unit remains untreated** — every unit receives the treatment at some positive dose level, so the comparison structure comes from dose variation across units rather than from an untreated holdout. Treatment varies in intensity, not in status. Uses a bias-corrected local-linear estimator at the dose support boundary on continuous-dose designs (Design 1' / Design 1) and a 2SLS Wald-IV estimator on the mass-point design.
596+
597+
```python
598+
HeterogeneousAdoptionDiD(
599+
design: str = "auto", # "auto" / "continuous_at_zero" / "continuous_near_d_lower" / "mass_point"
600+
alpha: float = 0.05,
601+
n_bootstrap: int = 999, # Multiplier-bootstrap iterations for sup-t bands
602+
seed: int | None = None,
603+
h: float | None = None, # Bias-corrected local-linear bandwidth (auto-selected if None)
604+
b: float | None = None, # Pilot bandwidth (auto-selected if None)
605+
rcond: float | None = None,
606+
)
607+
```
608+
609+
**Alias:** `HAD`
610+
611+
**fit() parameters:**
612+
613+
```python
614+
had.fit(
615+
data: pd.DataFrame,
616+
outcome_col: str,
617+
unit_col: str,
618+
time_col: str,
619+
dose_col: str,
620+
first_treat_col: str | None = None, # Required on staggered panels (last-cohort auto-filter trigger)
621+
aggregate: str | None = None, # None (single scalar WAS) or "event_study" (per-horizon WAS)
622+
cband: bool = True, # Simultaneous (sup-t) confidence bands on weighted event-study fits
623+
survey_design: SurveyDesign | None = None, # Survey weights, strata, PSU, FPC
624+
weights: np.ndarray | None = None, # pweight shortcut (mutually exclusive with survey_design)
625+
) -> HeterogeneousAdoptionDiDResults | HeterogeneousAdoptionDiDEventStudyResults
626+
```
627+
628+
**Usage:**
629+
630+
```python
631+
from diff_diff import HeterogeneousAdoptionDiD, did_had_pretest_workflow
632+
633+
# Vet the testable identifying assumptions first:
634+
report = did_had_pretest_workflow(
635+
data, outcome_col='y', unit_col='unit', time_col='t',
636+
dose_col='d', first_treat_col='first_treat')
637+
print(report.summary())
638+
639+
# Single-period scalar WAS:
640+
est = HeterogeneousAdoptionDiD()
641+
results = est.fit(data, outcome_col='y', unit_col='unit',
642+
time_col='t', dose_col='d',
643+
first_treat_col='first_treat')
644+
print(results.summary())
645+
646+
# Multi-period per-horizon WAS:
647+
es = est.fit(data, outcome_col='y', unit_col='unit',
648+
time_col='t', dose_col='d',
649+
first_treat_col='first_treat',
650+
aggregate='event_study')
651+
```
652+
653+
**Staggered panels.** On multi-cohort panels with `aggregate="event_study"`, `fit()` auto-filters to the last treatment cohort plus never-treated units (paper Appendix B.2) and emits a `UserWarning` naming kept/dropped counts. The estimand is then a **last-cohort-only WAS**, not a multi-cohort average. For full multi-cohort staggered support, see `ChaisemartinDHaultfoeuille`.
654+
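The auto-filter in the staggered-panels note can be approximated in user code. The sketch below is illustrative only, not the library's internal implementation, and it assumes never-treated units are coded as NaN in the first-treatment column (the library's actual coding may differ):

```python
import numpy as np
import pandas as pd


def last_cohort_filter(data: pd.DataFrame, first_treat_col: str = "first_treat"):
    """Keep the last treatment cohort plus never-treated units.

    Hypothetical user-side sketch of the Appendix B.2 restriction
    described above; assumes never-treated units carry NaN in
    `first_treat` (an assumption, not a documented library contract).
    """
    ft = data[first_treat_col]
    last_cohort = ft.max()  # pandas max skips NaN, so never-treated is ignored here
    keep = ft.isna() | (ft == last_cohort)
    return data[keep], int(keep.sum()), int((~keep).sum())


panel = pd.DataFrame({
    "unit": [1, 2, 3, 4],
    "first_treat": [2010.0, 2012.0, 2012.0, np.nan],  # unit 4 never treated
})
kept, n_kept, n_dropped = last_cohort_filter(panel)
assert n_kept == 3 and n_dropped == 1  # 2012 cohort + never-treated survive
```

In practice `fit()` performs this filtering itself and emits the `UserWarning`; the sketch is only for reasoning about which units enter the last-cohort-only WAS.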
Unchanged context below the insertion:

### StackedDiD

Stacked DiD estimator (Wing, Freedman & Hollingsworth 2024). Addresses TWFE bias with corrective Q-weights.
@@ -1157,6 +1219,65 @@ Each event study effect dict contains: `effect`, `se`, `t_stat`, `p_value`, `con

Added after the preceding result class's **Methods:** `summary()`, `print_summary()`, `to_dataframe()` line:

### HeterogeneousAdoptionDiDResults

Single-period results container for `HeterogeneousAdoptionDiD`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | Point estimate of the WAS parameter on the β-scale |
| `se` | `float` | Standard error on the β-scale |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |
| `alpha` | `float` | CI level used at fit time |
| `design` | `str` | Resolved design: `"continuous_at_zero"`, `"continuous_near_d_lower"`, or `"mass_point"` |
| `target_parameter` | `str` | `"WAS"` (Design 1') or `"WAS_d_lower"` (Design 1 / mass-point) |
| `d_lower` | `float` | Support infimum (`0.0` on Design 1', `min(d)` otherwise) |
| `dose_mean` | `float` | `D_bar = (1/G) * sum(D_{g,2})` |
| `n_obs` | `int` | Units contributing to estimation |
| `n_treated` | `int` | Units with `D > d_lower` |
| `n_control` | `int` | Units at or below `d_lower` |
| `inference_method` | `str` | `"analytical_nonparametric"` or `"analytical_2sls"` |
| `vcov_type` | `str | None` | Mass-point only: `"classical"`, `"hc1"`, or `"cr1"` |
| `bandwidth_diagnostics` | `BandwidthResult | None` | MSE-DPI selector output (continuous designs); `None` on `mass_point` |
| `survey_metadata` | `SurveyMetadata | None` | Repo-standard survey metadata when `survey_design=` / `weights=` is supplied |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`

### HeterogeneousAdoptionDiDEventStudyResults

Per-horizon event-study results container for `HeterogeneousAdoptionDiD` with `aggregate="event_study"`. The anchor horizon `e = -1` is excluded by construction.

| Attribute | Type | Description |
|-----------|------|-------------|
| `event_times` | `np.ndarray` | Integer event-time labels `e = t - F`, sorted ascending |
| `att` | `np.ndarray` | Per-horizon WAS point estimates |
| `se` | `np.ndarray` | Per-horizon standard errors |
| `t_stat` | `np.ndarray` | Per-horizon t-statistics |
| `p_value` | `np.ndarray` | Per-horizon p-values |
| `conf_int_low` | `np.ndarray` | Pointwise CI lower bounds |
| `conf_int_high` | `np.ndarray` | Pointwise CI upper bounds |
| `cband_low` | `np.ndarray | None` | Simultaneous (sup-t) band lower bounds; `None` on unweighted fits or when `cband=False` |
| `cband_high` | `np.ndarray | None` | Simultaneous (sup-t) band upper bounds |
| `cband_crit_value` | `float | None` | Sup-t critical value used for the simultaneous band |
| `cband_method` | `str | None` | `"multiplier_bootstrap"` when populated |
| `cband_n_bootstrap` | `int | None` | Bootstrap iterations used for the band |
| `n_obs_per_horizon` | `np.ndarray` | Per-horizon contributing-unit counts |
| `alpha` | `float` | CI level used at fit time |
| `design` | `str` | Shared across horizons (paper Appendix B.2 invariant) |
| `target_parameter` | `str` | Same convention as the single-period result |
| `d_lower` | `float` | Support infimum, shared across horizons |
| `dose_mean` | `float` | `D_bar` on the fit sample |
| `F` | `object` | First-treatment period label |
| `n_units` | `int` | Unique units contributing to the fit (post last-cohort filter) |
| `inference_method` | `str` | `"analytical_nonparametric"` or `"analytical_2sls"` |
| `survey_metadata` | `SurveyMetadata | None` | Populated on weighted fits |
| `variance_formula` | `str | None` | Per-horizon variance family label |
| `effective_dose_mean` | `float | None` | Weighted denominator |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`
Unchanged context below the insertion:

### TROPResults

| Attribute | Type | Description |
@@ -1265,6 +1386,38 @@ did = DifferenceInDifferences(inference="wild_bootstrap", n_bootstrap=999,

Added after the existing wild-bootstrap usage example (ending `results = did.fit(data, outcome='y', treatment='treated', time='post')` and its closing code fence), a new top-level section:

## HAD Pretests

Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal.

```python
from diff_diff import (
    did_had_pretest_workflow,
    qug_test, stute_test, yatchew_hr_test,
    stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test,
)

# Composite workflow — bundles QUG + Stute + Yatchew per the paper's three-step battery
report = did_had_pretest_workflow(
    data, outcome_col='y', unit_col='unit', time_col='t',
    dose_col='d', first_treat_col='first_treat',
    aggregate='overall',   # or 'event_study' for joint Stute on multi-period panels
    survey_design=None)    # SurveyDesign for survey-aware pretests (Phase 4.5 C)
print(report.summary())
print(report.all_pass, report.verdict)
```

Individual tests:

- `qug_test(d)` — Assumption 5 support condition. Extreme order statistics, Exp(1)/Exp(1) limit law. **Permanently rejects** non-`None` `survey_design=` / `weights=` (`NotImplementedError`) per Phase 4.5 C0 deferral — extreme-value functionals are not smooth in the empirical CDF, so standard survey machinery does not yield a calibrated test.
- `stute_test(d, dy)` — Assumption 7 mean-independence of trends via Cramér-von Mises functional with Mammen wild bootstrap. Survey-aware via PSU-level Mammen multiplier bootstrap.
- `yatchew_hr_test(d, dy, *, null="linearity")` — Assumption 8 linearity of `E[ΔY|D]` via Yatchew (1997) heteroskedasticity-robust variance-ratio test. The `null="mean_independence"` mode (R `YatchewTest::yatchew_test(order=0)`) is also exposed for placebo-style mean-independence testing. Survey-aware via closed-form weighted variance components (no bootstrap).
- `stute_joint_pretest(residuals_dict, d)` — joint Cramér-von Mises across K horizons with shared-η Mammen wild bootstrap (Delgado-Manteiga 2001 / Hlávka-Hušková 2020). Residuals-in core; the two data-in wrappers below construct residuals for the two paper-spelled nulls.
- `joint_pretrends_test(...)` — joint pre-trends on K pre-periods (paper Section 4.2 step 2 closure on the event-study path).
- `joint_homogeneity_test(...)` — joint linearity-and-homogeneity on K post-periods.

The QUG-under-survey deferral is permanent; the linearity-family pretests support `survey_design=` (pweight, PSU, FPC) per Phase 4.5 C. Stratified designs and replicate-weight designs are deferred to follow-up PRs.

Unchanged context below the insertion:

## Honest DiD Sensitivity Analysis

Rambachan & Roth (2023) robust inference allowing bounded parallel trends violations.
@@ -1734,7 +1887,8 @@ DIFF_DIFF_BACKEND=rust pytest # Force Rust (fail if unavailable)

  | Staggered treatment timing | `CallawaySantAnna`, `ImputationDiD`, or `SunAbraham` |
  | Few treated units / synthetic control | `SyntheticDiD` |
  | Interactive fixed effects / factor confounding | `TROP` |
- | Continuous treatment intensity | `ContinuousDiD` |
+ | Continuous treatment intensity (some units untreated) | `ContinuousDiD` |
+ | No untreated unit / universal rollout (every unit treated at different doses) | `HeterogeneousAdoptionDiD` |
  | Two-criterion treatment, simultaneous (2x2x2 DDD) | `TripleDifference` |
  | Two-criterion treatment, staggered timing + eligibility | `StaggeredTripleDifference` |
  | Nonlinear outcome (binary/count) with staggered timing | `WooldridgeDiD` |
