You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
R1 P3 — forward-looking docs were internally inconsistent about
Phase 4.5 C survey support for yatchew_hr_test. TODO.md correctly
distinguishes Rao-Wu (Stute family) from weighted-OLS+weighted-variance
(Yatchew); the new C0 docs in had_pretests.py, REGISTRY.md, and
CHANGELOG.md described the whole linearity family as Rao-Wu-based,
which would mislead the next implementer (Yatchew 1997 is a closed-form
variance-ratio test, not bootstrap-based).
Updated all three new docs to spell out the per-test mechanism
explicitly: Rao-Wu rescaled bootstrap for stute_test +
stute_joint_pretest + joint_pretrends_test + joint_homogeneity_test;
weighted OLS residuals + weighted variance estimator for
yatchew_hr_test. Now consistent with TODO.md.
Test message-text assertions still pass (assert "Rao-Wu" in msg holds
because Stute family still uses Rao-Wu).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: CHANGELOG.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
8
8
## [Unreleased]
9
9
10
10
### Added
11
-
- **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests (`stute_test`, `yatchew_hr_test`, joint variants) via Rao-Wu rescaled bootstrap. Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work.
11
+
- **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based). Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work.
12
12
- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
13
13
- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
14
14
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
Copy file name to clipboardExpand all lines: docs/methodology/REGISTRY.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2373,7 +2373,7 @@ Tuning-parameter-free test of `H_0: d̲ = 0` versus `H_1: d̲ > 0`. Shipped in `
2373
2373
1.**Extreme order statistics are not smooth functionals of the empirical CDF.** Standard survey machinery (Binder-TSL linearization via `compute_survey_if_variance`, Rao-Wu rescaled bootstrap via `bootstrap_utils.generate_rao_wu_weights`, Krieger-Pfeffermann (1997) EDF tests for complex surveys) all rely on Hadamard differentiability of the test statistic in the empirical CDF. The first two order statistics are NOT differentiable functionals — small perturbations to F near zero produce O(1) shifts in `D_{(1)}`. None of the standard survey-bootstrap or linearization tools give a calibrated test for QUG.
2374
2374
2.**The `Exp(1)/Exp(1)` limit law assumes iid sampling with smooth density at zero.** Under cluster sampling, `D_{(1)}` and `D_{(2)}` may both come from the same PSU, breaking the independence required for the Poisson-process limit of rescaled spacings near the boundary. Under stratification, the smallest dose may come from a small stratum that's systematically over- or under-sampled, biasing the test.
2375
2375
3.**The literature on EVT under unequal-probability sampling is sparse.** Quintos et al. (2001) and Beirlant et al. cover tail-INDEX estimation under unequal sample sizes. There is no off-the-shelf method for "test the support endpoint under complex sampling" in the standard survey-statistics toolkit. Adapting Hill / Pickands / DEdH estimators to the boundary problem would be novel research, not engineering. The de Chaisemartin et al. (2026) paper itself does not discuss survey extensions of QUG.
2376
-
The survey-compatible alternative for HAD pretesting is **joint Stute** (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF for which Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap give a calibrated survey-aware test. Phase 4.5 C ships this on `stute_test`, `yatchew_hr_test`, and the joint variants.
2376
+
The survey-compatible alternative for HAD pretesting is **joint Stute** (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF for which Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap give a calibrated survey-aware test. Phase 4.5 C ships survey support for the linearity family with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based).
2377
2377
**Research direction (out of scope for diff-diff):** the bridge IS sketchable by combining (a) endpoint-estimation EVT under iid (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006); (b) survey-aware functional CLT for the empirical process (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017); and (c) tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" `λ_eff = Σ_h W_h · f_h(0+)`. Under a "no boundary clumping" assumption (`P(D_{(1)}, D_{(2)}` in same PSU `| both ≤ δ) → 0`), the `Exp(1)/Exp(1)` limit law's pivotality is preserved and only the calibration needs a survey-aware bootstrap (subsampling within strata per Politis-Romano-Wolf, or Bertail et al.'s design-aware bootstrap). This is publishable methodology research — one paper, ~6-12 months for a methods PhD student. If the bridge gets built and published externally, this gate can be revisited.
2378
2378
2379
2379
*Algorithm variant - TWFE linearity test via Stute (1997) Cramér-von Mises with wild bootstrap (Section 4.3, Appendix D):*
0 commit comments