
Commit a9ec8bd

igerber and claude committed
Conley Wave A: DiD support, combined cluster kernel, sparse k-d-tree, callable metric validation
Four mechanical extensions on top of the Phase 1+2 Conley sandwich (PR #426). All four touch the same surface (conley.py + linalg.py + estimators.py + twfe.py + tests/test_conley_vcov.py).

- DiD.fit() accepts `unit=<col>` as a fit-time kwarg (NOT on __init__; unused unless Conley is set; not part of get_params/set_params).
- Drops the prior fit-time redirect to MultiPeriodDiD.
- Validates unit/lag_cutoff/coords/cutoff_km; rejects survey_design and wild_bootstrap with NotImplementedError.
- On a 2-period panel, matches MPD(...).fit(post_periods=[1], reference_period=0) bit-exactly (atol=1e-10).

- compute_robust_vcov / LinearRegression / TWFE / DiD now accept cluster_ids alongside vcov_type='conley'; meat applies K_total[i,j] = K_space(d_ij/h) * 1{c_i == c_j}.
- Validator enforces cluster-time invariance on the panel block-decomposed path (cluster constant within unit across periods).
- Per-slice mask construction (NOT full n×n) preserves memory on panel paths; serial-component mask is trivially all-ones under the invariance contract.
- TWFE auto-cluster on Conley path still silently dropped; explicit cluster=<col> opts into combined kernel. DiD has no auto-cluster, so opt-in is fully explicit. DiD-vs-TWFE asymmetry documented.
- linalg validator's conley + cluster_ids reject and twfe's explicit-cluster reject both dropped.

- _compute_spatial_bartlett_meat_sparse uses scipy.spatial.cKDTree.query_ball_tree to build a CSR sparse kernel matrix instead of materializing the dense n×n distance.
- Auto-activates for n > _CONLEY_SPARSE_N_THRESHOLD (5_000) AND metric in {haversine, euclidean} AND kernel == "bartlett".
- Bartlett-only gate: bartlett at u=1 returns exactly 0 so the sparse path safely drops at-cutoff pairs; uniform at u=1 is 1 and would require closed-interval query semantics incompatible with haversine chord projection roundoff.
- Haversine projects to 3-D unit-sphere Cartesian; chord query radius matches arc-length cutoff with a 1e-12 relative epsilon for projection roundoff; exact great-circle is recomputed only for in-range neighbors.
- Private _conley_sparse: Optional[bool] kwarg controls the toggle (None=auto, True=force, False=force dense; True with unsupported config raises).
- Bit-identity parity vs dense at atol=1e-10 on synthetic fixtures; R parity at atol=1e-6 preserved on all 3 panel R fixtures with _conley_sparse=True forced.
- Renames _CONLEY_DENSE_WARN_N -> _CONLEY_DENSE_OOM_WARN_N (20_000) to disambiguate from the new 5_000 sparse threshold; warning text differentiates sparse-eligible vs dense-fallback paths.

- _validate_callable_metric_result checks shape (n,n), finite, non-negative, symmetric within atol=1e-10.
- Each failure raises a targeted ValueError naming the violated invariant. Previously, malformed callables produced opaque BLAS errors deep in the pipeline.

Tests

- tests/test_conley_vcov.py: 36 new tests across TestConleySparse, TestConleySparseRParityForced, TestConleyCluster, TestConleyDistanceMetrics extension.
- Existing DiD-rejection / TWFE-explicit-cluster-reject / linalg-conley-cluster-reject tests flipped to behavioral asserts.
- test_did_conley_matches_mpd_post_periods_1 locks the DiD-vs-MPD bit-exact agreement on a 2-period panel.
- Full regression sweep (test_conley_vcov + test_linalg + test_estimators + test_estimators_vcov_type + test_methodology_twfe + test_linalg_hc2_bm): 511 passed.

Docs

- docs/methodology/REGISTRY.md: new "Combined spatial + cluster product kernel" subsection with math + cluster-time-invariance contract + two-limit-fixture anchors; new "Performance / scale" subsection on the sparse path; new "Callable conley_metric validation" subsection; updated panel-API restrictions table; DiD-vs-TWFE asymmetry paragraph.
- CHANGELOG.md: Unreleased Wave A entry; Phase 1+2 entry's "DiD continues to raise" / "cluster_ids raises" text updated to reflect the lifted rejects.
- diff_diff/guides/llms.txt + llms-full.txt: DiD support + combined kernel + sparse path documented; restrictions list refreshed.
- README.md: catalog one-line refresh.
- TODO.md: rows for the four Wave A items removed; rows 121 (Conley + survey/weights, Bertanha-Imbens 2014) and 122 (SyntheticDiD Conley path, spatial-block bootstrap) retained for later waves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
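The combined product kernel named in the commit message, K_total[i,j] = K_space(d_ij/h) * 1{c_i == c_j}, can be illustrated with a small dense sketch. This is not the code in conley.py: the function names `bartlett` and `combined_kernel_meat` are hypothetical, the distance is plain Euclidean, and the full n×n matrix is materialized for clarity (the commit says the panel path builds per-slice masks instead).

```python
import numpy as np


def bartlett(u):
    """Bartlett (triangular) kernel: 1 - u for u < 1, exactly 0 for u >= 1."""
    u = np.asarray(u, dtype=float)
    return np.clip(1.0 - u, 0.0, None)


def combined_kernel_meat(X, resid, coords, cluster_ids, cutoff):
    """Hypothetical dense sketch of meat = S' K_total S, with score rows S_i = X_i * e_i.

    K_total[i, j] = K_space(d_ij / cutoff) * 1{cluster_i == cluster_j}.
    """
    scores = X * resid[:, None]                          # (n, k) score rows
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))             # Euclidean d_ij (assumption)
    k_space = bartlett(dist / cutoff)                    # spatial weights
    same_cluster = cluster_ids[:, None] == cluster_ids[None, :]
    k_total = k_space * same_cluster                     # product kernel
    return scores.T @ k_total @ scores                   # (k, k) meat
```

Two limits make a quick sanity check under these assumptions: with every observation in one cluster the mask is all-ones and the result is the purely spatial meat, and with every observation in its own cluster only the diagonal survives, which gives the heteroskedasticity-robust (HC0-style) meat.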
1 parent d5e5021 commit a9ec8bd

14 files changed

Lines changed: 1644 additions & 200 deletions


CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Large diffs are not rendered by default.

README.md

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [Honest DiD](https://diff-diff.readthedocs.io/en/stable/api/honest_did.html) - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
 - [Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html) - Roth (2022) minimum detectable violation and power curves
 - [Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/power.html) - analytical and simulation-based MDE, sample size, power curves for study design
-- Conley spatial HAC SE (`vcov_type="conley"`) on cross-sectional `LinearRegression` / `compute_robust_vcov` plus panel `MultiPeriodDiD` / `TwoWayFixedEffects` (with `conley_lag_cutoff` for within-unit Bartlett temporal HAC) - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg` on cross-sectional + panel fixtures
+- Conley spatial HAC SE (`vcov_type="conley"`) on cross-sectional `LinearRegression` / `compute_robust_vcov` plus panel `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` (with `conley_lag_cutoff` for within-unit Bartlett temporal HAC) - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg` on cross-sectional + panel fixtures, optional combined spatial + cluster product kernel via explicit `cluster=`, auto-activating sparse k-d-tree fast path for `n > 5_000`

 ## Survey Support

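The sparse k-d-tree fast path mentioned in the README line can be sketched roughly as follows, following the steps the commit message describes for the haversine case (unit-sphere projection, chord query radius with a 1e-12 relative epsilon, exact great-circle distance recomputed only for in-range neighbors). The helper names and the `EARTH_RADIUS_KM` value are assumptions made for illustration and are not taken from conley.py; the auto-activation threshold is not modeled.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the library's constant may differ


def to_unit_sphere(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to 3-D Cartesian points on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)]
    )


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def sparse_bartlett_kernel(lat_deg, lon_deg, cutoff_km):
    """Hypothetical sketch: CSR matrix of Bartlett weights for pairs within cutoff_km."""
    lat_deg = np.asarray(lat_deg, dtype=float)
    lon_deg = np.asarray(lon_deg, dtype=float)
    n = lat_deg.shape[0]
    # An arc of angle theta on the unit sphere spans a chord of length 2*sin(theta/2).
    theta = min(cutoff_km / EARTH_RADIUS_KM, np.pi)
    chord = 2.0 * np.sin(theta / 2.0) * (1.0 + 1e-12)   # small relative slack
    tree = cKDTree(to_unit_sphere(lat_deg, lon_deg))
    neighbors = tree.query_ball_tree(tree, r=chord)      # list of index lists, self included
    rows = np.repeat(np.arange(n), [len(nb) for nb in neighbors])
    cols = np.concatenate([np.asarray(nb, dtype=int) for nb in neighbors])
    # Exact great-circle distance only for the candidate pairs.
    d = haversine_km(lat_deg[rows], lon_deg[rows], lat_deg[cols], lon_deg[cols])
    weights = np.clip(1.0 - d / cutoff_km, 0.0, None)    # Bartlett: 0 at and beyond the cutoff
    return csr_matrix((weights, (rows, cols)), shape=(n, n))
```

Given an (n, k) array of score rows `scores`, the meat would then be `scores.T @ (K @ scores)` with `K` the returned matrix. Because the Bartlett weight is exactly 0 at the cutoff, pairs the tree query misses by roundoff contribute nothing, which is the reason the commit gives for gating the sparse path to the Bartlett kernel.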
TODO.md

Lines changed: 0 additions & 4 deletions
@@ -116,12 +116,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
 | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
-| `DifferenceInDifferences(vcov_type="conley")` support: Phase 2 lifted the panel rejection on `MultiPeriodDiD` / `TwoWayFixedEffects` but kept DiD rejected because `DiD.fit()` has no `unit` column declaration. A follow-up could either add a `unit` kwarg to `DiD.__init__` / `.fit()` and wire the block-decomposed sandwich, or document the redirect to `MultiPeriodDiD` permanently. | `diff_diff/estimators.py::DifferenceInDifferences.fit` | follow-up (spillover-conley) | Low |
-| Conley + cluster_ids combined product kernel `K_space(d_ij/h) · 1{cluster_i = cluster_j}`. Deferred to a follow-up PR per the Phase 2 scope decision (methodology PR only); tracked here for the next Conley wave. Currently raises `NotImplementedError` at the linalg validator (cross-sectional Conley + cluster) and at `TwoWayFixedEffects.fit` when the user sets `cluster=` explicitly. | `linalg.py::_validate_vcov_args`, `twfe.py::TwoWayFixedEffects.fit` | follow-up (spillover-conley) | Medium |
-| Conley sparse k-d-tree fast path via `scipy.spatial.cKDTree.query_ball_tree` for `n > 20_000`; lifts the dense O(n²) `UserWarning` that fires at the validator. Kernel must be compact-support (bartlett or uniform); callable metric not supported in the fast path. Performance-only; semantics unchanged. | `diff_diff/conley.py::_pairwise_distance_matrix`, `_compute_conley_vcov` | follow-up (spillover-conley) | Low |
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| Validate user-supplied callable `conley_metric` for shape `(n, n)`, finiteness, non-negativity, and symmetry. Currently `np.asarray(metric(coords, coords))` is accepted unchecked; a malformed callable produces opaque matmul errors and a non-symmetric distance matrix produces a non-symmetric vcov. CI reviewer flagged as P2 M3 in PR #(spillover-conley). | `diff_diff/conley.py::_pairwise_distance_matrix`, `_compute_conley_vcov` | follow-up (spillover-conley) | Low |

 #### Performance

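The last removed TODO row concerns validating the output of a user-supplied callable `conley_metric`. A minimal sketch of the four checks listed in the commit message (shape, finiteness, non-negativity, symmetry within atol=1e-10) might look like the following. The function name `validate_metric_result` and the error wording are hypothetical; the actual helper in this commit is `_validate_callable_metric_result`, whose signature is not shown here.

```python
import numpy as np


def validate_metric_result(dist, n, atol=1e-10):
    """Hypothetical sketch: check a callable metric's output before it reaches the sandwich.

    Each failed check raises a ValueError that names the violated invariant.
    """
    dist = np.asarray(dist, dtype=float)
    if dist.shape != (n, n):
        raise ValueError(
            f"metric must return shape {(n, n)}, got {dist.shape}"
        )
    if not np.all(np.isfinite(dist)):
        raise ValueError("metric returned non-finite distances (NaN or inf)")
    if np.any(dist < 0):
        raise ValueError("metric returned negative distances")
    if not np.allclose(dist, dist.T, rtol=0.0, atol=atol):
        raise ValueError(f"metric result is not symmetric within atol={atol}")
    return dist
```

Running the shape check first keeps the later checks well defined, and checking finiteness before symmetry avoids a NaN comparison silently reporting an asymmetric matrix.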