Skip to content

Commit 7ceff73

Browse files
igerberclaude
andcommitted
Phase 4.5 C0: QUG-under-survey decision gate
Add survey=/weights= kwargs to qug_test and did_had_pretest_workflow as keyword-only with default None. Both raise NotImplementedError when either kwarg is non-None, with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on survey=+weights= mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890. QUG-under-survey is permanently deferred. The test statistic uses extreme order statistics (D_(1), D_(2)) which are not smooth functionals of the empirical CDF -- standard survey machinery (Binder TSL, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the Exp(1)/Exp(1) limit law's independence assumption breaks; and the EVT-under-unequal- probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is temporary -- Phase 4.5 C will close it for the linearity-family pretests (stute_test, yatchew_hr_test, joint variants) via Rao-Wu rescaled bootstrap. Sister pretests keep their closed signatures in this release; Phase 4.5 C will add kwargs and implementation together to avoid API churn. - 11 new tests: 5 on TestQUGTest covering rejection / mutex / message- text checks / unweighted regression; 6 on new TestHADPretestWorkflowSurveyGuards covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression. - docs/methodology/REGISTRY.md: Note (Phase 4.5 C0) under QUG section with three-reason rationale plus a research-direction sketch (the theoretical bridge would combine Hall 1982 / Aarssen-de Haan 1994 / Hall-Wang 1999 endpoint EVT, Boistard-Lopuhaa-Ruiz-Gazen 2017 / Bertail-Chautru-Clemencon 2017 survey-aware functional CLT, and Drees 2003 tail-empirical-process theory -- publishable methodology research, not engineering work). - CHANGELOG.md: Phase 4.5 C0 entry under [Unreleased]. - TODO.md: replaces decision-gate row with carry-forward research row. Unweighted qug_test(d) and did_had_pretest_workflow(...) calls are bit-exact pre-PR (kwargs are keyword-only after *; positional path unchanged). 138 pretest tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b4d4628 commit 7ceff73

5 files changed

Lines changed: 335 additions & 5 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11+
- **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests (`stute_test`, `yatchew_hr_test`, joint variants) via Rao-Wu rescaled bootstrap. Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work.
1112
- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
1213
- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
1314
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).

TODO.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -94,7 +94,7 @@ Deferred items from PR reviews that were not addressed before merge.
9494
| `HeterogeneousAdoptionDiD` event-study staggered-timing beyond last cohort: Phase 2b auto-filters staggered panels to the last cohort per paper Appendix B.2. Earlier-cohort treatment effects are not identified by HAD; redirecting to `ChaisemartinDHaultfoeuille` / `did_multiplegt_dyn` is the paper's prescription. A full staggered HAD would require a different identification path (out of paper scope). | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
9595
| `HeterogeneousAdoptionDiD` joint cross-horizon analytical covariance on the weighted event-study path: Phase 4.5 B ships multiplier-bootstrap sup-t simultaneous CIs on the weighted event-study path but pointwise analytical variance is still independent across horizons. A follow-up could derive the full H × H analytical covariance from the per-horizon IF matrix (`Psi.T @ Psi` under survey weighting) for an analytical alternative to the bootstrap. Would also let the unweighted event-study path ship a sup-t band. | `diff_diff/had.py::_fit_event_study` | follow-up | Low |
9696
| `HeterogeneousAdoptionDiD` unweighted event-study sup-t band: Phase 4.5 B ships sup-t only on the WEIGHTED event-study path (to preserve pre-PR bit-exact output on unweighted). Extending sup-t to unweighted event-study (either via the multiplier bootstrap with unit-level iid multipliers or via analytical joint cross-horizon covariance) is a symmetric follow-up. | `diff_diff/had.py::_fit_event_study` | follow-up | Low |
97-
| `HeterogeneousAdoptionDiD` Phase 4.5 C0: QUG-under-survey decision gate. `qug_test` uses a ratio of extreme order statistics `D_{(1)} / (D_{(2)} - D_{(1)})` — extreme-value theory under inverse-probability weighting is a research area, not a standard toolkit. Lit-review Guillou-Hall (2001), Chen-Chen (2004); likely outcome is `NotImplementedError` on `qug_test(..., weights=...)` with a clear pointer to the Stute/Yatchew/joint pretests as the survey-supported alternatives. | `diff_diff/had_pretests.py::qug_test` | Phase 4.5 C0 | Low |
97+
| `HeterogeneousAdoptionDiD` survey-aware support-endpoint test (research, not engineering): if the academic literature ever publishes a calibrated support-infimum test under complex sampling — combining endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999) with survey-aware functional CLTs for the empirical process (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017) and tail-empirical-process theory (Drees 2003) — Phase 4.5 C0's permanent NotImplementedError on `qug_test(..., survey=...)` / `weights=` can be revisited and the bridge implemented against the published recipe. See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the decision rationale and the research-direction sketch. | `diff_diff/had_pretests.py::qug_test` | Phase 4.5 C0 (2026-04, decision shipped) | Low |
9898
| `HeterogeneousAdoptionDiD` Phase 4.5 C: pretests under survey (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, `did_had_pretest_workflow`). Rao-Wu rescaled bootstrap for the Stute-family (weighted η generation + PSU clustering in the bootstrap draw); weighted OLS residuals + weighted variance estimator for Yatchew. | `diff_diff/had_pretests.py` | Phase 4.5 C | Medium |
9999
| `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low |
100100
| `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. | `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low |

0 commit comments

Comments
 (0)