Skip to content

Commit 252dfba

Browse files
igerberclaude
andcommitted
Address PR #432 R1: narrow docs to match shipped tests + TODO cleanup
REGISTRY § Note (Stute stratified survey-bootstrap calibration): - Drop the "TestStuteCalibrationShift via git worktree + tools/dump_stute_baseline.py" pre/post baseline language. The committed test is a finite + range smoke; the architectural lock against any revert of apply_stratum_centering lives in the helper bit-parity test (atol=1e-14) and applies end-to-end regardless of which call site consumes it. - Drop the "FPC=20/stratum" claim from the MC oracle/power description. The MC fixture builds weights+strata+PSU (no fpc column); the FPC bake-in is covered by the helper-level test_fpc_baked_in_helper_is_fpc_agnostic unit test, separately cited in the narrowed wording. CHANGELOG: mirror the narrowed MC + helper-bit-parity language. tests/test_had_pretests.py: rename test_calibration_shift_direction_pin_non_strata to test_calibration_shift_non_strata_end_to_end_smoke; rewrite the docstring to describe the actual finite + range assertion and the helper-level bit-parity lock that catches reverts architecturally. TODO.md row 101: stratified Stute support is now SHIPPED via this PR. Narrow the row to the still-open follow-ups: replicate-weight designs (BRR/Fay/JK1/JKn/SDR) and lonely_psu='adjust'+singleton strata on the Stute family (pseudo-stratum centering derivation — same gap as HAD sup-t at REGISTRY:2382). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 62f64db commit 252dfba

4 files changed

Lines changed: 23 additions & 29 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11-
- **HAD pretest workflow: stratified survey-design support (Phase 4.5 C continuation).** Lifts the `NotImplementedError` gate on `SurveyDesign(strata=...)` in `stute_test` (`had_pretests.py:1927-1940` pre-PR) and `stute_joint_pretest` (`:3259-3271` pre-PR), and by inheritance in `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` (the wrappers delegate to the joint Stute helper). Implements the standard stratified clustered wild bootstrap (Cameron, Gelbach & Miller 2008; Davidson & Flachaire 2008 §2.3; Djogbenou, MacKinnon & Nielsen 2019 Theorem 2; Kreiss & Lahiri 2012): within-stratum demean + `sqrt(n_h/(n_h-1))` Bessel rescale (Wu 1986; Liu 1988) applied to the PSU multipliers `psu_mults` BEFORE the per-obs broadcast `eta_obs = psu_mults[b, psu_col_idx]` in the wild-residual loop. Bootstrap CvM variance matches the analytical Binder-TSL stratified target `V_S = sum_h (1 - f_h) (n_h / (n_h - 1)) sum_j (psi_hj - psi_h_bar)²` exactly (the `(1 - f_h)` FPC factor was already baked in by `generate_survey_multiplier_weights_batch`; this PR bakes the remaining `(n_h / (n_h - 1))` factor and enforces within-stratum-mean-zero centering). New shared helper `bootstrap_utils.apply_stratum_centering(psu_mults, resolved_survey, psu_ids, psu_axis=...)` is called from both the new Stute path (psu_axis=1 on the multiplier matrix) AND the existing HAD sup-t event-study cband bootstrap (psu_axis=0 on the PSU-aggregated influence tensor; refactored bit-exactly from the inline block previously at `had.py:2172-2204`). Locks the algebraic identity architecturally instead of leaving parallel code blocks to drift. MC oracle consistency validated under a 4-stratum × 8-PSU/stratum stratified null DGP (200 seeded draws, empirical Type I at α=0.05 in `[0, 0.10]` — 3σ band); MC power validated under a known-alternative stratified DGP (rejection > 0.50). HAD sup-t event-study cband bit-parity preserved (`atol=1e-14, rtol=1e-14` on the refactored helper output + 29 existing cband tests passing post-refactor). See `docs/methodology/REGISTRY.md` § HeterogeneousAdoptionDiD — "Note (Stute stratified survey-bootstrap calibration)" for the full derivation. Remaining deferrals: `lonely_psu='adjust'` + singleton-strata (same pseudo-stratum centering gap as the HAD sup-t deviation at REGISTRY:2382) and replicate-weight designs (BRR/Fay/JK1/JKn/SDR — separate Rao-Wu / JKn bootstrap composition). Unblocks the realistic survey-weighted HAD workflow on BRFSS/CPS/NHANES/ACS-shaped designs.
11+
- **HAD pretest workflow: stratified survey-design support (Phase 4.5 C continuation).** Lifts the `NotImplementedError` gate on `SurveyDesign(strata=...)` in `stute_test` (`had_pretests.py:1927-1940` pre-PR) and `stute_joint_pretest` (`:3259-3271` pre-PR), and by inheritance in `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` (the wrappers delegate to the joint Stute helper). Implements the standard stratified clustered wild bootstrap (Cameron, Gelbach & Miller 2008; Davidson & Flachaire 2008 §2.3; Djogbenou, MacKinnon & Nielsen 2019 Theorem 2; Kreiss & Lahiri 2012): within-stratum demean + `sqrt(n_h/(n_h-1))` Bessel rescale (Wu 1986; Liu 1988) applied to the PSU multipliers `psu_mults` BEFORE the per-obs broadcast `eta_obs = psu_mults[b, psu_col_idx]` in the wild-residual loop. Bootstrap CvM variance matches the analytical Binder-TSL stratified target `V_S = sum_h (1 - f_h) (n_h / (n_h - 1)) sum_j (psi_hj - psi_h_bar)²` exactly (the `(1 - f_h)` FPC factor was already baked in by `generate_survey_multiplier_weights_batch`; this PR bakes the remaining `(n_h / (n_h - 1))` factor and enforces within-stratum-mean-zero centering). New shared helper `bootstrap_utils.apply_stratum_centering(psu_mults, resolved_survey, psu_ids, psu_axis=...)` is called from both the new Stute path (psu_axis=1 on the multiplier matrix) AND the existing HAD sup-t event-study cband bootstrap (psu_axis=0 on the PSU-aggregated influence tensor; refactored bit-exactly from the inline block previously at `had.py:2172-2204`). Locks the algebraic identity architecturally instead of leaving parallel code blocks to drift. MC oracle consistency validated under a 4-stratum × 8-PSU/stratum stratified null DGP with weights+strata+PSU (200 seeded draws, empirical Type I at α=0.05 in `[0, 0.10]` — 3σ band; the FPC bake-in is covered separately by the helper-unit test `test_fpc_baked_in_helper_is_fpc_agnostic`); MC power validated under a known-alternative stratified DGP (rejection > 0.50). HAD sup-t event-study cband bit-parity preserved (`atol=1e-14, rtol=1e-14` on the refactored helper output + 29 existing cband tests passing post-refactor; that same helper-unit bit-parity test is what locks any revert of `apply_stratum_centering` end-to-end, regardless of which call site consumes it). See `docs/methodology/REGISTRY.md` § HeterogeneousAdoptionDiD — "Note (Stute stratified survey-bootstrap calibration)" for the full derivation. Remaining deferrals: `lonely_psu='adjust'` + singleton-strata (same pseudo-stratum centering gap as the HAD sup-t deviation at REGISTRY:2382) and replicate-weight designs (BRR/Fay/JK1/JKn/SDR — separate Rao-Wu / JKn bootstrap composition). Unblocks the realistic survey-weighted HAD workflow on BRFSS/CPS/NHANES/ACS-shaped designs.
1212
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on cross-sectional `LinearRegression` / `compute_robust_vcov` plus panel `MultiPeriodDiD` / `TwoWayFixedEffects` (Phases 1 and 2 of the spillover-conley initiative). **Cross-sectional contract:** `conley_coords` (n × 2 array of lat/lon or projected coords), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"`; both kernels emit a `UserWarning` if the resulting meat has a materially negative eigenvalue — neither the radial 1-D Bartlett nor the uniform kernel is formally PSD-guaranteed; Conley 1999's explicit PSD formula is the 2-D separable lattice product window at Eq 3.14). Cross-sectional variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). **Panel contract (Phase 2, new):** Three new co-required kwargs `conley_time` (n-length array), `conley_unit` (n-length array), and `conley_lag_cutoff=<int>` (non-negative; 0 means within-period spatial only, no serial component) switch into the **block-decomposed panel sandwich** that matches R `conleyreg` with `lag_cutoff > 0`: `XeeX_total = Σ_t (within-period spatial sandwich) + Σ_u (within-unit Bartlett temporal sandwich, lag ∈ {1..L}, same-time excluded)`. This is NOT a multiplicative product kernel — verified empirically against `conleyreg::time_dist` and `XeeXhC` at ~1e-14 on the panel parity fixtures. The temporal kernel is hardcoded Bartlett `(1 - |lag|/(L+1))` regardless of `conley_kernel`, mirroring `conleyreg::time_dist.cpp`; documented as a `Note (deviation from R-symmetric API)` in REGISTRY. **Panel estimator wire-up (Phase 2):** `MultiPeriodDiD(vcov_type="conley", conley_lag_cutoff=...).fit(..., unit=...)` and `TwoWayFixedEffects(vcov_type="conley", conley_lag_cutoff=...).fit(..., unit=...)` lift the Phase 1 fit-time rejection; the `conley_time` and `conley_unit` arrays are auto-derived from the existing `time` and `unit` column-name arguments at fit-time. **`DifferenceInDifferences(vcov_type="conley")` continues to raise `NotImplementedError`** because `DiD.fit()` has no `unit` column declaration; the error message redirects users to `MultiPeriodDiD` / `TwoWayFixedEffects`. **Other constraints (Phase 1, unchanged):** `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `cluster_ids=` / `weights=` / `survey_design=` raises `NotImplementedError` (combined product kernel + Bertanha-Imbens 2014 weighted-Conley deferred to follow-up PRs). TWFE's default auto-cluster on the Conley path is silently dropped (the explicit `cluster=` raises with a clear redirect). `inference="wild_bootstrap"` + Conley raises (incompatible inference modes). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is a follow-up. **Implementation:** Helpers live in `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov` — the validator and sandwich helper now accept keyword-only `time` / `unit` / `lag_cutoff` for the panel path); `compute_robust_vcov` in `diff_diff/linalg.py` threads the new kwargs through. **R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9)** on **six** benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`): 3 cross-sectional (Phase 1) + 3 new panel fixtures (`panel_haversine_lag1`, `panel_haversine_lag2`, `panel_lat_lon_realistic_lag1`; n_units × T = 60×3, 80×5, 100×4 at lag={1,2,1}); observed max abs diff ~5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. New REGISTRY section `## ConleySpatialHAC`. Subsequent phases of the spillover-conley initiative (sparse k-d-tree fast path for `n > 20_000`; Conley + `cluster_ids` combined kernel; ring-indicator spillover-aware DiD per Butts 2021; survey-design / replicate-weight support; `SyntheticDiD` Conley path) are tracked in `TODO.md` under "Tech Debt from Code Reviews" → spillover-conley rows.
1313
- **Tutorial 21: HAD Pre-test Workflow** (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `HeterogeneousAdoptionDiD` building on Tutorial 20's brand-campaign framing. Uses a 60-DMA × 8-week panel close in shape to T20's but with the dose distribution drawn from `Uniform[$0.01K, $50K]` (vs T20's `[$5K, $50K]`); the true support is strictly positive but very near zero, chosen so the QUG step in `did_had_pretest_workflow` fails-to-reject `H0: d_lower = 0` in this finite sample and the verdict text fires the load-bearing "Assumption 7 deferred" pivot for the upgrade-arc narrative. (HAD's `design="auto"` selector — a separate min/median heuristic at `had.py::_detect_design`, NOT the QUG p-value — independently lands on the `continuous_at_zero` identification path with target `WAS` on this panel because `d.min() < 0.01 * median(|d|)`. The QUG test and the design selector are independent rules that point to the same identification path here.) Walks through three surfaces: (a) `did_had_pretest_workflow(aggregate="overall")` on a two-period collapse, where the verdict explicitly flags Step 2 (Assumption 7 pre-trends) as not run because a single pre-period structurally cannot support a pre-trends test, and the structural fields `pretrends_joint` / `homogeneity_joint` are both `None`; (b) `did_had_pretest_workflow(aggregate="event_study")` on the full multi-period panel, where the verdict reads "TWFE admissible under Section 4 assumptions" because all three testable diagnostics (QUG + joint pre-trends Stute over 3 horizons + joint homogeneity Stute over 4 horizons) fail-to-reject — non-rejection evidence under finite-sample power and test specification, not proof that the identifying assumptions hold; and (c) a side panel exercising both `yatchew_hr_test` null modes — `null="linearity"` (default, paper Theorem 7) vs `null="mean_independence"` (Phase 4 R-parity with R `YatchewTest::yatchew_test(order=0)`) — on the within-pre-period first-difference paired with post-period dose, illustrating the stricter null's larger residual variance (`sigma2_lin` 7.01 vs 6.53) and smaller p-value (0.29 vs 0.49). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (16 tests pinning panel composition, both verdict pivots, structural anchors on both paths, deterministic QUG / Yatchew statistics, bootstrap p-value tolerance bands per `feedback_bootstrap_drift_tests_need_backend_tolerance`, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel). T20's "Composite pretest workflow" Extensions bullet updated with a forward-pointer to T21. T22 weighted/survey HAD tutorial remains queued as a separate notebook PR.
1414
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `survey_design`** for analytical Binder TSL SE and replicate-weight bootstrap variance. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1233-1239` is replaced by a per-path multiplier-bootstrap-only gate (`survey_design + n_bootstrap > 0` under by_path / paths_of_interest still raises, since the survey-aware perturbation pivot for path-restricted IFs is methodologically underived). Per-path SE routes through the existing `_survey_se_from_group_if` cell-period allocator: the per-period IF (`U_pp_l_path`) is built with non-path switcher-side contributions skipped (control contributions are unchanged, matching the joiners/leavers IF convention; preserves the row-sum identity `U_pp.sum(axis=1) == U`), cohort-recentered via `_cohort_recenter_per_period`, then expanded to observations as `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})`. Replicate-weight designs unconditionally use the cell allocator (Class A contract from PR #323). New `_refresh_path_inference` helper post-call refreshes `safe_inference` on every populated entry across `multi_horizon_inference`, `placebo_horizon_inference`, `path_effects`, and `path_placebos` so all four surfaces use the same final `df_survey` after per-path replicate fits append `n_valid` to the shared accumulator. Path-enumeration ranking under `survey_design` remains unweighted (group-cardinality, not population-weight mass). Lonely-PSU policy stays sample-wide, not per-path. Telescope invariant: on a single-path panel, per-path SE matches the global non-by_path survey SE bit-exactly. **No R parity** — R `did_multiplegt_dyn` does not support survey weighting; this is a Python-only methodology extension. The global non-by_path TSL multiplier-bootstrap path is unaffected (anti-regression test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical::test_global_survey_plus_n_bootstrap_still_works` locks the per-path-only scope of the new gate). Cross-surface invariants regression-tested at `TestByPathSurveyDesignAnalytical` (~17 tests across gate / dispatch / analytical SE / replicate-weight SE / per-path placebos / `trends_linear` composition / unobserved-path warnings / final-df refresh regressions) and `TestByPathSurveyDesignTelescope`. See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path survey-design SE" for the full contract.

0 commit comments

Comments
 (0)