Skip to content

Commit 80b5710

Browse files
igerberclaude
andcommitted
Address PR #432 R2: synthesis framing on citations + wired-in regression for Stute call sites + n_psu spec fix
P2 #1 (citation overclaim): Davidson-Flachaire (2008) and Kreiss-Lahiri (2012) are cited as ingredients being synthesized for the stratified-PSU multiplier-centering rule, NOT as direct papers on this exact composition. Rephrase docs/references.rst entries to describe what each paper actually covers (wild bootstrap for heteroskedastic regression; bootstrap for time series) and to flag the stratified-survey extension as the library synthesis. Add an explicit "Citations below are ingredients, not direct papers on this exact composition" preface to REGISTRY § "Note (Stute stratified survey-bootstrap calibration)". P2 #2 (regression-coverage overclaim + spec drift): - The helper bit-parity regression at TestApplyStratumCentering only covers axis-0 algebra; it does NOT catch a future change that removes the apply_stratum_centering(..., psu_axis=1) calls from had_pretests.py:1985 (stute_test) or :3312 (stute_joint_pretest). Add test_stute_call_sites_invoke_apply_stratum_centering that monkey-patches the helper and asserts both call sites invoke it with psu_axis=1. Closes the disconnection-detection gap surfaced by R2 review. - CHANGELOG + REGISTRY documented n_psu_per_stratum=8 but the actual MC fixtures pass n_psu_per_stratum=6. Update docs to match fixture (the helper default of 8 is unchanged; only the slow MC test overrides differ). - REGISTRY + CHANGELOG wording updated to cite BOTH regressions (helper bit-parity for axis-0 algebra + wired-in monkeypatch for call-site connection) so the documentation matches what the test suite actually catches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 252dfba commit 80b5710

4 files changed

Lines changed: 83 additions & 7 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11-
- **HAD pretest workflow: stratified survey-design support (Phase 4.5 C continuation).** Lifts the `NotImplementedError` gate on `SurveyDesign(strata=...)` in `stute_test` (`had_pretests.py:1927-1940` pre-PR) and `stute_joint_pretest` (`:3259-3271` pre-PR), and by inheritance in `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` (the wrappers delegate to the joint Stute helper). Implements the standard stratified clustered wild bootstrap (Cameron, Gelbach & Miller 2008; Davidson & Flachaire 2008 §2.3; Djogbenou, MacKinnon & Nielsen 2019 Theorem 2; Kreiss & Lahiri 2012): within-stratum demean + `sqrt(n_h/(n_h-1))` Bessel rescale (Wu 1986; Liu 1988) applied to the PSU multipliers `psu_mults` BEFORE the per-obs broadcast `eta_obs = psu_mults[b, psu_col_idx]` in the wild-residual loop. Bootstrap CvM variance matches the analytical Binder-TSL stratified target `V_S = sum_h (1 - f_h) (n_h / (n_h - 1)) sum_j (psi_hj - psi_h_bar)²` exactly (the `(1 - f_h)` FPC factor was already baked in by `generate_survey_multiplier_weights_batch`; this PR bakes the remaining `(n_h / (n_h - 1))` factor and enforces within-stratum-mean-zero centering). New shared helper `bootstrap_utils.apply_stratum_centering(psu_mults, resolved_survey, psu_ids, psu_axis=...)` is called from both the new Stute path (psu_axis=1 on the multiplier matrix) AND the existing HAD sup-t event-study cband bootstrap (psu_axis=0 on the PSU-aggregated influence tensor; refactored bit-exactly from the inline block previously at `had.py:2172-2204`). Locks the algebraic identity architecturally instead of leaving parallel code blocks to drift. MC oracle consistency validated under a 4-stratum × 8-PSU/stratum stratified null DGP with weights+strata+PSU (200 seeded draws, empirical Type I at α=0.05 in `[0, 0.10]` — 3σ band; the FPC bake-in is covered separately by the helper-unit test `test_fpc_baked_in_helper_is_fpc_agnostic`); MC power validated under a known-alternative stratified DGP (rejection > 0.50). HAD sup-t event-study cband bit-parity preserved (`atol=1e-14, rtol=1e-14` on the refactored helper output + 29 existing cband tests passing post-refactor; that same helper-unit bit-parity test is what locks any revert of `apply_stratum_centering` end-to-end, regardless of which call site consumes it). See `docs/methodology/REGISTRY.md` § HeterogeneousAdoptionDiD — "Note (Stute stratified survey-bootstrap calibration)" for the full derivation. Remaining deferrals: `lonely_psu='adjust'` + singleton-strata (same pseudo-stratum centering gap as the HAD sup-t deviation at REGISTRY:2382) and replicate-weight designs (BRR/Fay/JK1/JKn/SDR — separate Rao-Wu / JKn bootstrap composition). Unblocks the realistic survey-weighted HAD workflow on BRFSS/CPS/NHANES/ACS-shaped designs.
11+
- **HAD pretest workflow: stratified survey-design support (Phase 4.5 C continuation).** Lifts the `NotImplementedError` gate on `SurveyDesign(strata=...)` in `stute_test` (`had_pretests.py:1927-1940` pre-PR) and `stute_joint_pretest` (`:3259-3271` pre-PR), and by inheritance in `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` (the wrappers delegate to the joint Stute helper). Implements the standard stratified clustered wild bootstrap (Cameron, Gelbach & Miller 2008; Davidson & Flachaire 2008 §2.3; Djogbenou, MacKinnon & Nielsen 2019 Theorem 2; Kreiss & Lahiri 2012): within-stratum demean + `sqrt(n_h/(n_h-1))` Bessel rescale (Wu 1986; Liu 1988) applied to the PSU multipliers `psu_mults` BEFORE the per-obs broadcast `eta_obs = psu_mults[b, psu_col_idx]` in the wild-residual loop. Bootstrap CvM variance matches the analytical Binder-TSL stratified target `V_S = sum_h (1 - f_h) (n_h / (n_h - 1)) sum_j (psi_hj - psi_h_bar)²` exactly (the `(1 - f_h)` FPC factor was already baked in by `generate_survey_multiplier_weights_batch`; this PR bakes the remaining `(n_h / (n_h - 1))` factor and enforces within-stratum-mean-zero centering). New shared helper `bootstrap_utils.apply_stratum_centering(psu_mults, resolved_survey, psu_ids, psu_axis=...)` is called from both the new Stute path (psu_axis=1 on the multiplier matrix) AND the existing HAD sup-t event-study cband bootstrap (psu_axis=0 on the PSU-aggregated influence tensor; refactored bit-exactly from the inline block previously at `had.py:2172-2204`). Locks the algebraic identity architecturally instead of leaving parallel code blocks to drift. MC oracle consistency validated under a 4-stratum × 6-PSU/stratum stratified null DGP with weights+strata+PSU (200 seeded draws, empirical Type I at α=0.05 in `[0, 0.10]` — 3σ band; the FPC bake-in is covered separately by the helper-unit test `test_fpc_baked_in_helper_is_fpc_agnostic`); MC power validated under a known-alternative stratified DGP (rejection > 0.50). HAD sup-t event-study cband bit-parity preserved (`atol=1e-14, rtol=1e-14` on the refactored helper output + 29 existing cband tests passing post-refactor; that helper-level bit-parity test locks the axis-0 algebra). A separate wired-in regression at `tests/test_had_pretests.py::TestStuteStratifiedSurveyBootstrap::test_stute_call_sites_invoke_apply_stratum_centering` monkey-patches the helper and asserts both Stute call sites (`stute_test` at `had_pretests.py:1985` and `stute_joint_pretest` at `:3312`) invoke it with `psu_axis=1` — that test fails if either call site is disconnected (the axis-0 helper-parity test alone does not catch that case). See `docs/methodology/REGISTRY.md` § HeterogeneousAdoptionDiD — "Note (Stute stratified survey-bootstrap calibration)" for the full derivation. Remaining deferrals: `lonely_psu='adjust'` + singleton-strata (same pseudo-stratum centering gap as the HAD sup-t deviation at REGISTRY:2382) and replicate-weight designs (BRR/Fay/JK1/JKn/SDR — separate Rao-Wu / JKn bootstrap composition). Unblocks the realistic survey-weighted HAD workflow on BRFSS/CPS/NHANES/ACS-shaped designs.
1212
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on cross-sectional `LinearRegression` / `compute_robust_vcov` plus panel `MultiPeriodDiD` / `TwoWayFixedEffects` (Phases 1 and 2 of the spillover-conley initiative). **Cross-sectional contract:** `conley_coords` (n × 2 array of lat/lon or projected coords), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"`; both kernels emit a `UserWarning` if the resulting meat has a materially negative eigenvalue — neither the radial 1-D Bartlett nor the uniform kernel is formally PSD-guaranteed; Conley 1999's explicit PSD formula is the 2-D separable lattice product window at Eq 3.14). Cross-sectional variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). **Panel contract (Phase 2, new):** Three new co-required kwargs `conley_time` (n-length array), `conley_unit` (n-length array), and `conley_lag_cutoff=<int>` (non-negative; 0 means within-period spatial only, no serial component) switch into the **block-decomposed panel sandwich** that matches R `conleyreg` with `lag_cutoff > 0`: `XeeX_total = Σ_t (within-period spatial sandwich) + Σ_u (within-unit Bartlett temporal sandwich, lag ∈ {1..L}, same-time excluded)`. This is NOT a multiplicative product kernel — verified empirically against `conleyreg::time_dist` and `XeeXhC` at ~1e-14 on the panel parity fixtures. The temporal kernel is hardcoded Bartlett `(1 - |lag|/(L+1))` regardless of `conley_kernel`, mirroring `conleyreg::time_dist.cpp`; documented as a `Note (deviation from R-symmetric API)` in REGISTRY. **Panel estimator wire-up (Phase 2):** `MultiPeriodDiD(vcov_type="conley", conley_lag_cutoff=...).fit(..., unit=...)` and `TwoWayFixedEffects(vcov_type="conley", conley_lag_cutoff=...).fit(..., unit=...)` lift the Phase 1 fit-time rejection; the `conley_time` and `conley_unit` arrays are auto-derived from the existing `time` and `unit` column-name arguments at fit-time. **`DifferenceInDifferences(vcov_type="conley")` continues to raise `NotImplementedError`** because `DiD.fit()` has no `unit` column declaration; the error message redirects users to `MultiPeriodDiD` / `TwoWayFixedEffects`. **Other constraints (Phase 1, unchanged):** `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `cluster_ids=` / `weights=` / `survey_design=` raises `NotImplementedError` (combined product kernel + Bertanha-Imbens 2014 weighted-Conley deferred to follow-up PRs). TWFE's default auto-cluster on the Conley path is silently dropped (the explicit `cluster=` raises with a clear redirect). `inference="wild_bootstrap"` + Conley raises (incompatible inference modes). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is a follow-up. **Implementation:** Helpers live in `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov` — the validator and sandwich helper now accept keyword-only `time` / `unit` / `lag_cutoff` for the panel path); `compute_robust_vcov` in `diff_diff/linalg.py` threads the new kwargs through. **R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9)** on **six** benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`): 3 cross-sectional (Phase 1) + 3 new panel fixtures (`panel_haversine_lag1`, `panel_haversine_lag2`, `panel_lat_lon_realistic_lag1`; n_units × T = 60×3, 80×5, 100×4 at lag={1,2,1}); observed max abs diff ~5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. New REGISTRY section `## ConleySpatialHAC`. Subsequent phases of the spillover-conley initiative (sparse k-d-tree fast path for `n > 20_000`; Conley + `cluster_ids` combined kernel; ring-indicator spillover-aware DiD per Butts 2021; survey-design / replicate-weight support; `SyntheticDiD` Conley path) are tracked in `TODO.md` under "Tech Debt from Code Reviews" → spillover-conley rows.
1313
- **Tutorial 21: HAD Pre-test Workflow** (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `HeterogeneousAdoptionDiD` building on Tutorial 20's brand-campaign framing. Uses a 60-DMA × 8-week panel close in shape to T20's but with the dose distribution drawn from `Uniform[$0.01K, $50K]` (vs T20's `[$5K, $50K]`); the true support is strictly positive but very near zero, chosen so the QUG step in `did_had_pretest_workflow` fails-to-reject `H0: d_lower = 0` in this finite sample and the verdict text fires the load-bearing "Assumption 7 deferred" pivot for the upgrade-arc narrative. (HAD's `design="auto"` selector — a separate min/median heuristic at `had.py::_detect_design`, NOT the QUG p-value — independently lands on the `continuous_at_zero` identification path with target `WAS` on this panel because `d.min() < 0.01 * median(|d|)`. The QUG test and the design selector are independent rules that point to the same identification path here.) Walks through three surfaces: (a) `did_had_pretest_workflow(aggregate="overall")` on a two-period collapse, where the verdict explicitly flags Step 2 (Assumption 7 pre-trends) as not run because a single pre-period structurally cannot support a pre-trends test, and the structural fields `pretrends_joint` / `homogeneity_joint` are both `None`; (b) `did_had_pretest_workflow(aggregate="event_study")` on the full multi-period panel, where the verdict reads "TWFE admissible under Section 4 assumptions" because all three testable diagnostics (QUG + joint pre-trends Stute over 3 horizons + joint homogeneity Stute over 4 horizons) fail-to-reject — non-rejection evidence under finite-sample power and test specification, not proof that the identifying assumptions hold; and (c) a side panel exercising both `yatchew_hr_test` null modes — `null="linearity"` (default, paper Theorem 7) vs `null="mean_independence"` (Phase 4 R-parity with R `YatchewTest::yatchew_test(order=0)`) — on the within-pre-period first-difference paired with post-period dose, illustrating the stricter null's larger residual variance (`sigma2_lin` 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (16 tests pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, bootstrap p-value tolerance bands per `feedback_bootstrap_drift_tests_need_backend_tolerance`, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel). T20's "Composite pretest workflow" Extensions bullet updated with a forward-pointer to T21. T22 weighted/survey HAD tutorial remains queued as a separate notebook PR.
1414
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `survey_design`** for analytical Binder TSL SE and replicate-weight bootstrap variance. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1233-1239` is replaced by a per-path multiplier-bootstrap-only gate (`survey_design + n_bootstrap > 0` under by_path / paths_of_interest still raises, since the survey-aware perturbation pivot for path-restricted IFs is methodologically underived). Per-path SE routes through the existing `_survey_se_from_group_if` cell-period allocator: the per-period IF (`U_pp_l_path`) is built with non-path switcher-side contributions skipped (control contributions are unchanged, matching the joiners/leavers IF convention; preserves the row-sum identity `U_pp.sum(axis=1) == U`), cohort-recentered via `_cohort_recenter_per_period`, then expanded to observations as `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})`. Replicate-weight designs unconditionally use the cell allocator (Class A contract from PR #323). New `_refresh_path_inference` helper post-call refreshes `safe_inference` on every populated entry across `multi_horizon_inference`, `placebo_horizon_inference`, `path_effects`, and `path_placebos` so all four surfaces use the same final `df_survey` after per-path replicate fits append `n_valid` to the shared accumulator. Path-enumeration ranking under `survey_design` remains unweighted (group-cardinality, not population-weight mass). Lonely-PSU policy stays sample-wide, not per-path. Telescope invariant: on a single-path panel, per-path SE matches the global non-by_path survey SE bit-exactly. **No R parity** — R `did_multiplegt_dyn` does not support survey weighting; this is a Python-only methodology extension. The global non-by_path TSL multiplier-bootstrap path is unaffected (anti-regression test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical::test_global_survey_plus_n_bootstrap_still_works` locks the per-path-only scope of the new gate). Cross-surface invariants regression-tested at `TestByPathSurveyDesignAnalytical` (~17 tests across gate / dispatch / analytical SE / replicate-weight SE / per-path placebos / `trends_linear` composition / unobserved-path warnings / final-df refresh regressions) and `TestByPathSurveyDesignTelescope`. See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path survey-design SE" for the full contract.

0 commit comments

Comments
 (0)