
Commit 9b71f67

Merge pull request #359 from igerber/had-phase-4.5-survey-continuous
HAD Phase 4.5: survey support on continuous-dose paths
2 parents f894506 + 1998d75 commit 9b71f67

13 files changed

Lines changed: 2784 additions & 108 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
+- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` and are deferred to follow-up PRs. Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract.
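The `survey=` entry above routes variance through a Binder-style Taylor-linearization (TSL) of influence-function values aggregated to PSUs. A minimal sketch of that textbook calculation follows, assuming per-unit influence values `psi` scaled so that θ̂ − θ ≈ Σᵢ wᵢψᵢ, PSU ids nested within strata, and no finite-population correction. The name `tsl_variance` is hypothetical; it is not the `compute_survey_if_variance` helper in `diff_diff/survey.py`, whose signature and FPC handling are not shown in this commit.

```python
from collections import defaultdict


def tsl_variance(psi, weights, strata, psu):
    """Stratified, with-replacement Taylor-linearization variance of sum(w_i * psi_i).

    Hypothetical helper (sketch only): assumes psi_i are influence-function
    values scaled so that theta_hat - theta ~= sum_i w_i * psi_i, PSU ids are
    nested within strata, and no finite-population correction is applied.
    Returns (variance, df) with df = (#PSUs) - (#strata).
    """
    if not (len(psi) == len(weights) == len(strata) == len(psu)):
        raise ValueError("psi, weights, strata and psu must have equal length")

    # Step 1: aggregate weighted scores w_i * psi_i to PSU totals z_hj.
    totals = defaultdict(lambda: defaultdict(float))
    for p, w, h, j in zip(psi, weights, strata, psu):
        totals[h][j] += w * p

    # Step 2: within each stratum, add n_h / (n_h - 1) * sum_j (z_hj - zbar_h)^2.
    variance = 0.0
    n_psu = 0
    for h, psu_totals in totals.items():
        z = list(psu_totals.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU; variance is undefined")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zj - z_bar) ** 2 for zj in z)
        n_psu += n_h

    return variance, n_psu - len(totals)
```

For example, `tsl_variance([1.0, -1.0, 2.0, -2.0], [1.0] * 4, ["a", "a", "b", "b"], [1, 2, 1, 2])` returns `(20.0, 2)`. The degrees of freedom follow the common n_PSU − n_strata convention; whether `df_survey` is computed the same way is an assumption. Single-PSU strata are rejected here rather than collapsed, which is only one of several conventions survey software uses.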

TODO.md

Lines changed: 5 additions & 2 deletions
@@ -92,8 +92,11 @@ Deferred items from PR reviews that were not addressed before merge.
| Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low |
| `HeterogeneousAdoptionDiD` joint cross-horizon covariance on event study: per-horizon SEs use INDEPENDENT sandwiches in Phase 2b (paper-faithful pointwise CIs per Pierce-Schott Figure 2). A follow-up could derive an IF-based stacking of per-horizon scores for joint cross-horizon inference (needed for joint hypothesis tests across event-time horizons). Block-bootstrap is a reasonable alternative. | `diff_diff/had.py::_fit_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` event-study staggered-timing beyond last cohort: Phase 2b auto-filters staggered panels to the last cohort per paper Appendix B.2. Earlier-cohort treatment effects are not identified by HAD; redirecting to `ChaisemartinDHaultfoeuille` / `did_multiplegt_dyn` is the paper's prescription. A full staggered HAD would require a different identification path (out of paper scope). | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
-| `HeterogeneousAdoptionDiD`: survey-design integration (`survey=SurveyDesign(...)`). Currently raises `NotImplementedError`. Requires Taylor-linearization of the β-scale rescaling and replicate-weight-compatible 2SLS variance on the mass-point path. | `diff_diff/had.py` | Phase 2a | Medium |
-| `HeterogeneousAdoptionDiD`: `weights=` support. Deferred jointly with survey integration. nprobust's `lprobust` has no weight argument so the nonparametric continuous path needs a derivation; the 2SLS mass-point path needs weighted-sandwich parity. | `diff_diff/had.py` | Phase 2a | Medium |
+| `HeterogeneousAdoptionDiD` Phase 4.5 B: `survey=` / `weights=` on `design="mass_point"` (weighted 2SLS + weighted-sandwich variance; the Wooldridge 2010 Ch. 12 weighted-IV sandwich has a Stata `ivregress ... [pweight=...]` + R `AER::ivreg(weights=...)` parity anchor). Also ships `aggregate="event_study"` + survey/weights via per-horizon IPW + shared PSU multiplier bootstrap across horizons. This PR (Phase 4.5 A) raises `NotImplementedError` on both paths. | `diff_diff/had.py::_fit_mass_point_2sls`, `diff_diff/had.py::_fit_event_study` | Phase 4.5 B | Medium |
+| `HeterogeneousAdoptionDiD` Phase 4.5 C0: QUG-under-survey decision gate. `qug_test` uses a ratio of extreme order statistics `D_{(1)} / (D_{(2)} - D_{(1)})` — extreme-value theory under inverse-probability weighting is a research area, not a standard toolkit. Lit-review Guillou-Hall (2001), Chen-Chen (2004); likely outcome is `NotImplementedError` on `qug_test(..., weights=...)` with a clear pointer to the Stute/Yatchew/joint pretests as the survey-supported alternatives. | `diff_diff/had_pretests.py::qug_test` | Phase 4.5 C0 | Low |
+| `HeterogeneousAdoptionDiD` Phase 4.5 C: pretests under survey (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, `did_had_pretest_workflow`). Rao-Wu rescaled bootstrap for the Stute-family (weighted η generation + PSU clustering in the bootstrap draw); weighted OLS residuals + weighted variance estimator for Yatchew. | `diff_diff/had_pretests.py` | Phase 4.5 C | Medium |
+| `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low |
+| `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. | `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low |
| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium |
| `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` Eq 18 linear-trend detrending (Pierce-Schott style): the joint-Stute infrastructure shipped in the Phase 3 follow-up supports pre-trends (mean-indep) and post-homogeneity (linearity) nulls. The Pierce-Schott application (paper Section 5.2) uses a LINEAR-TREND detrending of pre-period outcomes before the joint CvM — `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})` — reaching p=0.51 on US-China tariff data. Extends `joint_pretrends_test` with a detrending mode or a separate Eq 18-specific helper. Deferred to Phase 4 replication harness (where the published p=0.51 serves as the parity anchor). | `diff_diff/had_pretests.py::joint_pretrends_test` | Phase 4 | Medium |
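The Eq 18 detrending quoted in the last TODO entry is a per-unit transform applied before the joint Cramér-von Mises statistic. A sketch of just that transform, for one unit's outcome path keyed by integer period, might look like this; `detrend_linear` is a hypothetical name, not the deferred `joint_pretrends_test` mode or helper.

```python
def detrend_linear(y_by_period, t_anchor):
    """Hypothetical helper: Eq 18-style linear-trend detrending for one unit g.

    y_by_period maps integer period t -> outcome Y_{g,t}. It must contain
    t_anchor and t_anchor - 1, which together define the unit's own slope.
    Returns {t: Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor) * (Y_{g,t_anchor} - Y_{g,t_anchor-1})}.
    """
    if t_anchor not in y_by_period or (t_anchor - 1) not in y_by_period:
        raise KeyError("need outcomes at t_anchor and t_anchor - 1")
    y_anchor = y_by_period[t_anchor]
    slope = y_anchor - y_by_period[t_anchor - 1]  # one-period trend at the anchor
    return {t: y - y_anchor - (t - t_anchor) * slope for t, y in y_by_period.items()}


# detrend_linear({0: 0.0, 1: 1.0, 2: 4.0, 3: 9.0}, 3) -> {0: 6.0, 1: 2.0, 2: 0.0, 3: 0.0}
```

By construction the anchor period and the period before it both map to zero, so only earlier pre-periods carry information into the joint test; a unit whose outcomes follow an exact linear trend is detrended to all zeros.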

benchmarks/R/generate_np_npreg_weighted_golden.R

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# Generate cross-language weighted-OLS parity fixture for HAD Phase 4.5.
#
# Purpose: no public weighted-CCF reference exists for bias-corrected
# local-linear (nprobust::lprobust has no weight argument; np::npreg uses
# its own internal local-linear algorithm that does not reduce to a
# straightforward weighted OLS at the intercept). To validate the weighted
# kernel-composition machinery in diff_diff._nprobust_port cross-language,
# we record the intercept from an R implementation of the SAME formula
# (weighted-OLS with one-sided Epanechnikov kernel) that the Python port
# implements. Bit-parity at atol=1e-14 locks in numerical consistency
# across BLAS reductions.
#
# This is NOT third-party validation of the weighted-CCF methodology. It is
# a regression lock against R↔Python drift on the weighted-OLS formula
# itself. Methodology confidence under informative weights comes from:
# 1. Analytic derivation in docs/methodology/REGISTRY.md
# 2. Uniform-weights bit-parity: weights=np.ones ≡ unweighted at 1e-14
# 3. Monte Carlo oracle consistency (tests/test_had_mc.py)
#
# Usage:
# Rscript benchmarks/R/generate_np_npreg_weighted_golden.R
#
# Output:
# benchmarks/data/np_npreg_weighted_golden.json
#
# Phase 4.5 of HeterogeneousAdoptionDiD (de Chaisemartin et al. 2026).
# Python test loader: tests/test_np_npreg_weighted_parity.py.

library(jsonlite)

# -------------------------------------------------------------------------
# Weighted local-linear at a boundary: manual weighted OLS with Epa kernel.
# Matches diff_diff/local_linear.py::local_linear_fit exactly.
# -------------------------------------------------------------------------

weighted_local_linear <- function(d, y, weights, eval_point = 0.0, h = 0.3) {
  # One-sided epanechnikov on [0, 1]: k(u) = (3/4)(1 - u^2), zero elsewhere.
  u <- (d - eval_point) / h
  kw <- ifelse(u >= 0 & u <= 1, 0.75 * (1 - u^2), 0)
  # Combined weights: user weights * kernel weights.
  combined <- kw * weights
  # Active window (non-zero combined weight).
  active <- combined > 0
  if (sum(active) < 2) {
    stop("Active window has fewer than 2 observations.")
  }
  # Weighted OLS of y ~ 1 + (d - eval_point), intercept is mu_hat at
  # eval_point.
  fit <- lm(y[active] ~ I(d[active] - eval_point), weights = combined[active])
  mu_hat <- as.numeric(coef(fit)[1])
  slope_hat <- as.numeric(coef(fit)[2])
  list(
    mu_hat = mu_hat,
    slope = slope_hat,
    n_active = as.integer(sum(active)),
    h = h,
    eval_point = eval_point
  )
}

# -------------------------------------------------------------------------
# DGPs: deterministic seeds for reproducibility.
# -------------------------------------------------------------------------

set.seed(20260424)

dgp1 <- local({
  G <- 500
  d <- runif(G, 0, 1)
  y <- 2 * d + 0.3 * d^2 + rnorm(G, sd = 0.25)
  w <- rep(1.0, G)
  list(d = d, y = y, w = w, eval_point = 0.0, h = 0.30,
       description = "Uniform weights, G=500, boundary=0")
})

dgp2 <- local({
  G <- 400
  d <- runif(G, 0, 1)
  y <- 2 * d + 0.3 * d^2 + rnorm(G, sd = 0.25)
  w <- exp(-d * 2.0)
  list(d = d, y = y, w = w, eval_point = 0.0, h = 0.25,
       description = "Informative weights (exp decay from boundary), G=400")
})

dgp3 <- local({
  G <- 200
  d <- runif(G, 0, 1)
  y <- 3 * d - d^2 + 0.5 * d^3 + rnorm(G, sd = 0.30)
  w <- pmax(0.1, runif(G, 0.5, 1.5))
  list(d = d, y = y, w = w, eval_point = 0.0, h = 0.35,
       description = "Small G=200, nonlinear m(d), bounded heterogeneous weights")
})

dgp4 <- local({
  G <- 400
  d_lower <- 0.1
  d <- runif(G, d_lower, 1)
  y <- 2 * (d - d_lower) + 0.3 * (d - d_lower)^2 + rnorm(G, sd = 0.25)
  w <- rep(1.0, G)
  list(d = d - d_lower, y = y, w = w, eval_point = 0.0, h = 0.30,
       description = "G=400, d_lower=0.1 shifted boundary=0 (Design 1 near-d_lower)",
       d_lower = d_lower)
})

# -------------------------------------------------------------------------
# Run each DGP through the weighted local-linear reference.
# -------------------------------------------------------------------------

run_one <- function(name, dgp) {
  cat(sprintf("Running %s: %s\n", name, dgp$description))
  res <- weighted_local_linear(
    d = dgp$d, y = dgp$y, weights = dgp$w,
    eval_point = dgp$eval_point, h = dgp$h
  )
  list(
    description = dgp$description,
    n = length(dgp$d),
    d = as.numeric(dgp$d),
    y = as.numeric(dgp$y),
    weights = as.numeric(dgp$w),
    eval_point = as.numeric(res$eval_point),
    h = as.numeric(res$h),
    kernel = "epanechnikov",
    n_active = res$n_active,
    mu_hat = as.numeric(res$mu_hat),
    slope = as.numeric(res$slope)
  )
}

out <- list(
  metadata = list(
    r_version = paste(R.Version()$major, R.Version()$minor, sep = "."),
    seed = 20260424L,
    generator = "generate_np_npreg_weighted_golden.R",
    algorithm = "manual weighted OLS with one-sided Epanechnikov kernel",
    purpose = "HAD Phase 4.5 cross-language weighted-LL parity",
    note = paste(
      "Regression lock on the weighted kernel + weighted OLS formula",
      "implemented in diff_diff.local_linear.local_linear_fit. Not a",
      "third-party validation of weighted-CCF methodology; see REGISTRY",
      "'Weighted extension (Phase 4.5)' for the parity-gap acknowledgement."
    )
  ),
  dgp1 = run_one("dgp1", dgp1),
  dgp2 = run_one("dgp2", dgp2),
  dgp3 = run_one("dgp3", dgp3),
  dgp4 = run_one("dgp4", dgp4)
)

out_path <- "benchmarks/data/np_npreg_weighted_golden.json"
dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE)

write_json(out, out_path, auto_unbox = TRUE, pretty = TRUE, digits = 14)
cat(sprintf("Wrote %s\n", out_path))
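The fixture above locks in one formula: weighted OLS of `y ~ 1 + (d - eval_point)` with combined weights equal to a one-sided Epanechnikov kernel times the user weights. For readers comparing against the Python side, a pure-Python sketch of the same computation, solving the 2×2 weighted normal equations in closed form, is shown below. It is a hypothetical re-implementation for illustration, not `diff_diff.local_linear.local_linear_fit`, whose signature is not shown in this commit, and it assumes the active window contains at least two distinct `d` values.

```python
def weighted_local_linear(d, y, weights, eval_point=0.0, h=0.3):
    """Hypothetical pure-Python mirror of the R reference: weighted OLS of
    y ~ 1 + (d - eval_point) with combined weight = one-sided Epanechnikov
    kernel on u in [0, 1] times the user weight. Returns intercept and slope."""
    sw = swx = swxx = swy = swxy = 0.0
    n_active = 0
    for di, yi, wi in zip(d, y, weights):
        u = (di - eval_point) / h
        # k(u) = 0.75 * (1 - u^2) on [0, 1], zero elsewhere (one-sided boundary kernel).
        k = 0.75 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0
        c = k * wi
        if c <= 0.0:  # same active-window rule as the R script: combined > 0
            continue
        x = di - eval_point
        n_active += 1
        sw += c
        swx += c * x
        swxx += c * x * x
        swy += c * yi
        swxy += c * x * yi
    if n_active < 2:
        raise ValueError("Active window has fewer than 2 observations.")
    # Closed-form solution of the 2x2 weighted normal equations.
    det = sw * swxx - swx * swx
    slope = (sw * swxy - swx * swy) / det
    mu_hat = (swy - slope * swx) / sw
    return {"mu_hat": mu_hat, "slope": slope, "n_active": n_active}
```

Rescaling every weight by the same constant multiplies each sum by that constant, so the intercept is unchanged; this is the uniform-weights property the script header relies on, where weights of all ones reproduce the unweighted fit.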
