Skip to content

Commit 2042b7c

Browse files
igerberclaude
andcommitted
Phase 4.5 C: HAD linearity-family pretests under survey
Closes the Phase 4.5 C0 promise (PR #367 commit 29f8b12). Linearity- family pretests now accept survey=/weights= keyword-only kwargs: - stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, did_had_pretest_workflow. Stute family: PSU-level Mammen multiplier bootstrap via generate_survey_multiplier_weights_batch. Each replicate draws (B, n_psu) Mammen multipliers, broadcast to per-obs perturbation eta_obs[g] = eta_psu[psu(g)], weighted OLS refit, weighted CvM via new _cvm_statistic_weighted helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both vector-valued empirical-process unit-level dependence (Delgado 1993; Escanciano 2006) AND PSU clustering (Krieger-Pfeffermann 1997). NOT Rao-Wu rescaling -- multiplier bootstrap is a different mechanism. Yatchew: closed-form weighted OLS + pweight-sandwich variance components (no bootstrap): sigma2_lin = sum(w * eps^2) / sum(w) sigma2_diff = sum(w_avg * diff^2) / (2 * sum(w)) [Reviewer CRITICAL #2] sigma4_W = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg) T_hr = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W where w_avg_g = (w_g + w_{g-1}) / 2 (Krieger-Pfeffermann 1997 Section 3). All three components reduce bit-exactly to existing unweighted formulas at w=ones(G); locked at atol=1e-14 by direct helper test. Workflow under survey/weights: skips the QUG step with UserWarning (per C0 deferral), sets qug=None on the report, dispatches the linearity family with survey-aware mechanism, appends "linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0" suffix to the verdict. all_pass drops the QUG-conclusiveness gate (one less precondition). HADPretestReport.qug retyped from QUGTestResults to Optional[QUGTestResults]; summary/to_dict/to_dataframe updated to None-tolerant rendering. Pweight shortcut routing: weights= passes through a synthetic trivial ResolvedSurveyDesign (new survey._make_trivial_resolved helper) so the same kernel handles both entry paths -- mirrors PR #363's R7 fix pattern on HAD sup-t. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise NotImplementedError at every entry point (defense in depth, reciprocal- guard discipline). The per-replicate weight-ratio rescaling for the OLS-on-residuals refit step is not covered by the multiplier-bootstrap composition; deferred to a parallel follow-up. Per-row weights= / survey=col aggregated to per-unit via existing HAD helpers (_aggregate_unit_weights, _aggregate_unit_resolved_survey; constant-within-unit invariant enforced) through new _resolve_pretest_unit_weights helper. Strictly-positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Stability invariants preserved: - Unweighted code paths bit-exact pre-PR (the new survey/weights branch is a separate if arm; existing 138 pretest tests pass unchanged). - Yatchew weighted variance components reduce to unweighted at w=1 at atol=1e-14 (locked by TestYatchewHRTestSurvey). - HADPretestReport schema bit-exact on the unweighted path; qug=None triggers the new None-tolerant rendering only on the survey path. 20 new tests across TestHADPretestWorkflowSurveyGuards (revised from C0 rejection-only to C functional + 2 mutex/replicate-weight retained), TestStuteTestSurvey (7), TestYatchewHRTestSurvey (7), TestJointStuteSurvey (5). Full pretest suite: 158 tests pass. Patch-level addition (additive on stable surfaces). See docs/methodology/REGISTRY.md "QUG Null Test" -- Note (Phase 4.5 C) for the full methodology. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2899e6c commit 2042b7c

6 files changed

Lines changed: 1269 additions & 224 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11+
- **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
1112
- **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
1213

1314
## [3.3.0] - 2026-04-25

TODO.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ Deferred items from PR reviews that were not addressed before merge.
9595
| `HeterogeneousAdoptionDiD` joint cross-horizon analytical covariance on the weighted event-study path: Phase 4.5 B ships multiplier-bootstrap sup-t simultaneous CIs on the weighted event-study path but pointwise analytical variance is still independent across horizons. A follow-up could derive the full H × H analytical covariance from the per-horizon IF matrix (`Psi.T @ Psi` under survey weighting) for an analytical alternative to the bootstrap. Would also let the unweighted event-study path ship a sup-t band. | `diff_diff/had.py::_fit_event_study` | follow-up | Low |
9696
| `HeterogeneousAdoptionDiD` unweighted event-study sup-t band: Phase 4.5 B ships sup-t only on the WEIGHTED event-study path (to preserve pre-PR bit-exact output on unweighted). Extending sup-t to unweighted event-study (either via the multiplier bootstrap with unit-level iid multipliers or via analytical joint cross-horizon covariance) is a symmetric follow-up. | `diff_diff/had.py::_fit_event_study` | follow-up | Low |
9797
| `HeterogeneousAdoptionDiD` survey-aware support-endpoint test (research, not engineering): if the academic literature ever publishes a calibrated support-infimum test under complex sampling — combining endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999) with survey-aware functional CLTs for the empirical process (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017) and tail-empirical-process theory (Drees 2003) — Phase 4.5 C0's permanent NotImplementedError on `qug_test(..., survey=...)` / `weights=` can be revisited and the bridge implemented against the published recipe. See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the decision rationale and the research-direction sketch. | `diff_diff/had_pretests.py::qug_test` | Phase 4.5 C0 (2026-04, decision shipped) | Low |
98-
| `HeterogeneousAdoptionDiD` Phase 4.5 C: pretests under survey (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, `did_had_pretest_workflow`). Rao-Wu rescaled bootstrap for the Stute-family (weighted η generation + PSU clustering in the bootstrap draw); weighted OLS residuals + weighted variance estimator for Yatchew. | `diff_diff/had_pretests.py` | Phase 4.5 C | Medium |
98+
| `HeterogeneousAdoptionDiD` survey-aware pretests with replicate-weight designs (BRR/Fay/JK1/JKn/SDR): Phase 4.5 C ships pweight + PSU/strata/FPC support via PSU-level Mammen multiplier bootstrap (Stute family) + closed-form weighted variance components (Yatchew); replicate-weight pretests are deferred — the per-replicate weight-ratio rescaling for the OLS-on-residuals refit step is not covered by the multiplier-bootstrap composition. Each linearity-family helper raises `NotImplementedError` on `survey.replicate_weights is not None`. | `diff_diff/had_pretests.py` | Phase 4.5 C follow-up | Low |
9999
| `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low |
100100
| `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. | `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low |
101101
| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium |

0 commit comments

Comments
 (0)