R1 P0 — Stute survey path silently accepted zero-weight units, which
leaked into the dose-variation check + CvM cusum + bootstrap refit while
contributing zero population mass. Extreme case: only zero-weight units
carry dose variation -> spurious finite test statistic with no warning.
Fix: strictly-positive guards on every survey-aware Stute / Yatchew /
workflow entry point (the weights= shortcut already had this; the
survey= branch was the gap).
R1 P1 #1 — aweight/fweight survey designs slipped through pweight-only
formulas silently (the variance components are derived assuming pweight
sandwich semantics). Fix: weight_type='pweight' guards added in
_resolve_pretest_unit_weights and on every direct-helper survey= branch
(stute_test, yatchew_hr_test, stute_joint_pretest). Mirrors HAD.fit
guard at had.py:2976 + survey._resolve_pweight_only at survey.py:914.
R1 P1 #2 — workflow's row-level weights= crashed on staggered event-
study panels because _validate_multi_period_panel filters to the last
cohort but the joint wrappers re-aggregate with the original full-
panel weights array. Fix: subset joint_weights to data_filtered's
rows via data.index.get_indexer(data_filtered.index) BEFORE passing
to the wrappers. Mirrors HeterogeneousAdoptionDiD.fit positional-
index pattern. The survey= path is unaffected (column references
resolve internally on data_filtered).
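
The positional-index subsetting in the P1 #2 fix can be illustrated on a toy panel. This is a sketch under stated assumptions: the frame, the `cohort` column, and the filter are stand-ins invented for the example, and the index is assumed unique (`Index.get_indexer` raises on a non-unique index).

```python
import numpy as np
import pandas as pd

# Toy panel with a non-default index; 'cohort' is an illustrative column name.
data = pd.DataFrame(
    {"unit": [1, 1, 2, 2, 3, 3], "cohort": [2019, 2019, 2020, 2020, 2020, 2020]},
    index=[10, 11, 12, 13, 14, 15],
)
weights = np.array([1.0, 1.0, 2.0, 2.0, 0.5, 0.5])  # row-aligned with `data`

# Stand-in for the last-cohort filter applied during panel validation.
data_filtered = data[data["cohort"] == data["cohort"].max()]

# Positions of the surviving rows within the original frame.
pos = data.index.get_indexer(data_filtered.index)
if (pos < 0).any():
    raise ValueError("filtered rows not found in the original index")

# Weights now have the same length and order as data_filtered.
joint_weights = weights[pos]
```

Indexing by position rather than by label keeps the weights array aligned even when the frame's index is not a default `RangeIndex`.
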
R1 P3 — REGISTRY C0 note still said "the same gate applies to
did_had_pretest_workflow" and "Phase 4.5 C uses Rao-Wu rescaling"; both
are stale post-C. Updated to clarify (a) workflow gate was temporary
and is now closed by C, (b) qug_test direct-helper gate remains
permanent, (c) C uses PSU-level Mammen multiplier bootstrap (NOT
Rao-Wu rescaling).
7 new tests in TestPhase45CR1Regressions covering: zero-weight survey
on stute_test / stute_joint_pretest / workflow; aweight rejection on
stute_test / workflow; fweight rejection on yatchew_hr_test; staggered
event-study workflow with weights= (catches the length-mismatch crash).
165 pretest tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/methodology/REGISTRY.md (2 additions & 2 deletions)
@@ -2425,11 +2425,11 @@ Tuning-parameter-free test of `H_0: d̲ = 0` versus `H_1: d̲ > 0`. Shipped in `
4. Theorem 4 establishes: asymptotic size `α`; uniform consistency against fixed alternatives; local power at rate `G` on the class `F^{d̲,d̄}_{m,K}` of differentiable cdfs with positive density and Lipschitz derivative.
5. Li et al. (2024, Theorem 2.4) implies the QUG test is asymptotically independent of the WAS / TWFE estimator, so conditional inference on WAS given non-rejection does not distort inference (asymptotically; the paper's Footnote 8 notes the extension to triangular arrays is conjectured but not proven).
- **Note:** Implementation is `O(G)` via `np.partition`; no sort required.
-
- **Note (Phase 4.5 C0):** `qug_test(..., survey=...)` and `qug_test(..., weights=...)` raise `NotImplementedError` permanently (Phase 4.5 C0 decision gate, 2026-04). The same gate applies to `did_had_pretest_workflow(..., survey=...)` / `weights=`. Three reasons survey extension is genuinely hard, not "we just haven't done the lit review":
+
- **Note (Phase 4.5 C0):** `qug_test(..., survey=...)` and `qug_test(..., weights=...)` raise `NotImplementedError` **permanently** (Phase 4.5 C0 decision gate, 2026-04 -- direct-helper gate is permanent). The Phase 4.5 C0 release also gated `did_had_pretest_workflow(..., survey=...)` / `weights=` with `NotImplementedError`, but that workflow gate was **temporary**: Phase 4.5 C (PR #370, 2026-04) replaces it with functional dispatch that skips the QUG step with `UserWarning` and runs the linearity family with the survey-aware mechanism (see Note (Phase 4.5 C) below for the full algorithm). Direct callers of `qug_test` still get the permanent rejection. Three reasons QUG-under-survey is genuinely hard, not "we just haven't done the lit review":
1. **Extreme order statistics are not smooth functionals of the empirical CDF.** Standard survey machinery (Binder-TSL linearization via `compute_survey_if_variance`, Rao-Wu rescaled bootstrap via `bootstrap_utils.generate_rao_wu_weights`, Krieger-Pfeffermann (1997) EDF tests for complex surveys) all rely on Hadamard differentiability of the test statistic in the empirical CDF. The first two order statistics are NOT differentiable functionals — small perturbations to F near zero produce O(1) shifts in `D_{(1)}`. None of the standard survey-bootstrap or linearization tools give a calibrated test for QUG.
2. **The `Exp(1)/Exp(1)` limit law assumes iid sampling with smooth density at zero.** Under cluster sampling, `D_{(1)}` and `D_{(2)}` may both come from the same PSU, breaking the independence required for the Poisson-process limit of rescaled spacings near the boundary. Under stratification, the smallest dose may come from a small stratum that's systematically over- or under-sampled, biasing the test.
3. **The literature on EVT under unequal-probability sampling is sparse.** Quintos et al. (2001) and Beirlant et al. cover tail-INDEX estimation under unequal sample sizes. There is no off-the-shelf method for "test the support endpoint under complex sampling" in the standard survey-statistics toolkit. Adapting Hill / Pickands / DEdH estimators to the boundary problem would be novel research, not engineering. The de Chaisemartin et al. (2026) paper itself does not discuss survey extensions of QUG.
-
The survey-compatible alternative for HAD pretesting is **joint Stute** (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF for which Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap give a calibrated survey-aware test. Phase 4.5 C ships survey support for the linearity family with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based).
+
The survey-compatible alternative for HAD pretesting is **joint Stute** (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF for which Krieger-Pfeffermann (1997) + a survey-aware multiplier bootstrap give a calibrated test. Phase 4.5 C (PR #370) ships survey support for the linearity family — the **PSU-level Mammen multiplier bootstrap** for `stute_test` and the joint variants (NOT Rao-Wu rescaling — multiplier bootstrap is a different mechanism), and **closed-form weighted OLS + pweight-sandwich variance components** for `yatchew_hr_test`. See the dedicated Note (Phase 4.5 C) below for the full algorithm.
**Research direction (out of scope for diff-diff):** the bridge IS sketchable by combining (a) endpoint-estimation EVT under iid (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006); (b) survey-aware functional CLT for the empirical process (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017); and (c) tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" `λ_eff = Σ_h W_h · f_h(0+)`. Under a "no boundary clumping" assumption (`P(D_{(1)}, D_{(2)}` in same PSU `| both ≤ δ) → 0`), the `Exp(1)/Exp(1)` limit law's pivotality is preserved and only the calibration needs a survey-aware bootstrap (subsampling within strata per Politis-Romano-Wolf, or Bertail et al.'s design-aware bootstrap). This is publishable methodology research — one paper, ~6-12 months for a methods PhD student. If the bridge gets built and published externally, this gate can be revisited.
- **Note (Phase 4.5 C):** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` accept `weights=` and `survey=ResolvedSurveyDesign` kwargs (or `survey=SurveyDesign` for the data-in entries). Mechanism varies by test:
- **Stute family** (`stute_test`, `stute_joint_pretest`, joint wrappers) uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap). Each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix; multipliers broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`. The bootstrap residual perturbation is `dy_b = fitted + eps * w * eta_obs`, followed by weighted OLS refit and weighted CvM recompute via `_cvm_statistic_weighted`. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence (Delgado 1993; Escanciano 2006) AND PSU clustering (Krieger-Pfeffermann 1997). PSU-shared multipliers are conservative under no-within-PSU outcome correlation (over-clustering gives conservative size in finite samples), asymptotically correct under the standard survey assumption that PSU is the ultimate sampling unit AND outcomes correlate within PSU. The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (constructed via `survey._make_trivial_resolved`) so the kernel is shared across both entry paths. NOT "Rao-Wu rescaled bootstrap" — different mechanism (the Rao-Wu kernel rescales per-unit weights via stratified PSU resampling, while this kernel applies multipliers without resampling).
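
The PSU-level multiplier draw and the per-observation broadcast described in the Stute-family bullet can be sketched with plain NumPy. This is an illustrative sketch, not the `bootstrap_utils` kernel: the function name `mammen_psu_multipliers` is hypothetical, the multipliers use the standard Mammen two-point distribution (mean 0, variance 1), and stratification, the weighted OLS refit, and the CvM recompute are omitted.

```python
import numpy as np


def mammen_psu_multipliers(psu_ids, n_bootstrap, rng):
    """Draw Mammen two-point multipliers per PSU and broadcast to observations."""
    psu_ids = np.asarray(psu_ids)
    # Map each observation to a dense PSU index 0..n_psu-1.
    uniques, psu_index = np.unique(psu_ids, return_inverse=True)
    n_psu = uniques.size

    sqrt5 = np.sqrt(5.0)
    low, high = -(sqrt5 - 1.0) / 2.0, (sqrt5 + 1.0) / 2.0
    p_low = (sqrt5 + 1.0) / (2.0 * sqrt5)  # P(eta = low); yields mean 0, variance 1

    draws = rng.random((n_bootstrap, n_psu))
    eta_psu = np.where(draws < p_low, low, high)  # shape (n_bootstrap, n_psu)
    # Every observation inherits the multiplier of its PSU.
    eta_obs = eta_psu[:, psu_index]  # shape (n_bootstrap, n_obs)
    return eta_psu, eta_obs


# Toy inputs: three PSUs with two observations each.
rng = np.random.default_rng(0)
psu = np.array([0, 0, 1, 1, 2, 2])
fitted = np.zeros(6)
eps = np.ones(6)
w = np.ones(6)

eta_psu, eta_obs = mammen_psu_multipliers(psu, n_bootstrap=3, rng=rng)
# Perturbed outcomes as in the note: dy_b = fitted + eps * w * eta_obs.
dy_b = fitted + eps * w * eta_obs  # shape (3, 6), one row per replicate
```

For the joint variant, the note says the multiplier matrix is shared across horizons within each replicate; in this sketch that would correspond to reusing the same `eta_obs` row for every horizon's residual vector rather than drawing again.
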