
Commit 3728831

igerber and claude committed
Reject Conley on panel estimators; remove dead conley result-class wiring
Address CI Codex review of PR #411 (P1#1 + P1#2):

P1#1: Panel estimators with vcov_type="conley" silently produced wrong SE because cross-sectional Conley over (unit, time) rows treated same-unit cross-time pairs as d_ij=0 -> K=1, mishandling the space-time HAC. Phase 1 supports cross-sectional Conley only; reject panel fits at fit-time on DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD with NotImplementedError. Practitioners pre-collapse to per-unit first-differences and call compute_robust_vcov directly. Phase 2 will add the space-time product kernel (Driscoll-Kraay) and lift the rejection. Granular Conley-arg validation collapsed into the single unconditional reject (cluster/absorb/coords/cutoff combinations all hit the same path).

P1#2: conley_metric was dropped at the result boundary and _format_vcov_label hard-coded "km" for the cutoff label even when metric was "euclidean". With panels rejected, the conley_cutoff_km / conley_kernel fields on DiDResults / MultiPeriodDiDResults are now unreachable; remove the dead fields, the dead arg passes from estimators.py / twfe.py, and the dead "conley" branch in _format_vcov_label.

Tests added: TWFE / DiD / MPD panel-rejection regressions, including a repeated-coords-across-periods regression per the CI reviewer's recommendation. 70 Conley tests + 401 targeted regression tests pass.

REGISTRY / CHANGELOG / llms.txt / README / TODO updated to reflect that the only supported Phase 1 Conley path is direct LinearRegression / compute_robust_vcov on a single-period design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f30e1f7 commit 3728831
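The failure mode described under P1#1 can be reproduced without the library. The following is a minimal sketch, assuming a haversine metric and a Bartlett kernel as named in the changelog; the helper names `haversine_km` and `bartlett` are illustrative and are not claimed to match the private helpers in `diff_diff/conley.py`. It stacks two units over two periods with repeated coordinates and shows that the same-unit cross-time pair lands at distance zero and therefore receives full kernel weight, regardless of the time gap.

```python
import math

EARTH_RADIUS_KM = 6371.01  # mean Earth radius quoted in the changelog entry


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bartlett(u):
    """Bartlett (triangular) kernel: 1 - |u| inside the cutoff, 0 outside."""
    return max(0.0, 1.0 - abs(u))


# Illustrative long-format panel: coordinates repeat across periods.
rows = [
    ("A", 2019, (52.52, 13.40)),
    ("A", 2020, (52.52, 13.40)),
    ("B", 2019, (48.14, 11.58)),
    ("B", 2020, (48.14, 11.58)),
]
cutoff_km = 200.0  # illustrative bandwidth

d_same_unit = haversine_km(rows[0][2], rows[1][2])   # A-2019 vs A-2020
d_cross_unit = haversine_km(rows[0][2], rows[2][2])  # A-2019 vs B-2019

w_same_unit = bartlett(d_same_unit / cutoff_km)    # full weight: d_ij = 0
w_cross_unit = bartlett(d_cross_unit / cutoff_km)  # beyond the cutoff
```

A purely spatial kernel has no way to down-weight the A-2019 / A-2020 pair, which is why the commit rejects panel fits until a time dimension is added to the kernel.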

10 files changed

Lines changed: 197 additions & 306 deletions


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
11-
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on `DifferenceInDifferences`, `TwoWayFixedEffects`, and `MultiPeriodDiD` (Phase 1 of the spillover-conley initiative). New keyword-only kwargs on `__init__`: `conley_coords=(<lat_col>, <lon_col>)` (column-name tuple from `data`), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"` is PSD-guaranteed; `"uniform"` emits `UserWarning` if the meat has a materially negative eigenvalue per Conley 1999 footnote 11). Variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). FWL composes cleanly because the meat depends only on scores `X·ε`, both of which within-transformation preserves — `TwoWayFixedEffects(vcov_type="conley", ...)` is supported, UNLIKE `hc2`/`hc2_bm` which need the full hat matrix. TWFE auto-cluster-at-unit is disabled when `vcov_type="conley"`; explicit `cluster=` raises `NotImplementedError` (combined product kernel deferred to Phase 2). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is queued for Phase 2. `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `weights=` / `survey_design=` / `absorb=` raises `NotImplementedError` (Bertanha-Imbens 2014 weighted-Conley + arbitrary FE projection are deferred to follow-up phases). `TwoWayFixedEffects(vcov_type="conley", inference="wild_bootstrap")` raises `NotImplementedError` (Conley analytical spatial-HAC and wild cluster bootstrap are different inference paths). 
Helpers live in new module `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov`); `compute_robust_vcov` in `diff_diff/linalg.py` imports the dispatch helpers. R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9) on three benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`); observed max abs diff 5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. `result.summary()` prints `"Conley spatial HAC (bartlett, cutoff=200.0km)"` via the extended `_format_vcov_label` helper. New REGISTRY section `## ConleySpatialHAC`. Tracked on `BRIEFING.md` as Phase 1 of the 6-phase initiative (Phase 2: two-way space×time + sparse fast path; Phase 3: ring-indicator spillover-aware DiD per Butts 2021; Phase 4a/4b: mechanical extension to IF-aggregation and sandwich-derived estimators; Phase 5: survey design support).
11+
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on cross-sectional `LinearRegression` / `compute_robust_vcov` (Phase 1 of the spillover-conley initiative). Keyword arguments: `conley_coords` (n × 2 array of lat/lon or projected coords), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"` is PSD-guaranteed; `"uniform"` emits `UserWarning` if the meat has a materially negative eigenvalue per Conley 1999 footnote 11). Variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). **Panel estimators (`DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD`) reject `vcov_type="conley"` at fit-time with `NotImplementedError`** — Phase 1's cross-sectional Conley does not handle the time dimension. Applying it over (unit, time) rows would treat same-unit cross-time pairs as `d_ij = 0 → K = 1`, mishandling the space-time HAC. Practitioners needing Conley with a panel design should pre-collapse to per-unit first-differences and call `compute_robust_vcov` directly on a single-period regression. Phase 2 will add the space-time product kernel (Driscoll-Kraay) for full panel support. `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `cluster_ids=` / `weights=` / `survey_design=` raises `NotImplementedError` (combined product kernel + Bertanha-Imbens 2014 weighted-Conley deferred to follow-up phases). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is queued for Phase 2. 
Helpers live in new module `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov`); `compute_robust_vcov` in `diff_diff/linalg.py` imports the dispatch helpers. R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9) on three benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`); observed max abs diff 5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. New REGISTRY section `## ConleySpatialHAC`. Tracked on `BRIEFING.md` as Phase 1 of the 6-phase initiative (Phase 2: space-time product kernel + sparse fast path + panel-estimator support; Phase 3: ring-indicator spillover-aware DiD per Butts 2021; Phase 4a/4b: mechanical extension to IF-aggregation and sandwich-derived estimators; Phase 5: survey design support).
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `survey_design`** for analytical Binder TSL SE and replicate-weight bootstrap variance. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1233-1239` is replaced by a per-path multiplier-bootstrap-only gate (`survey_design + n_bootstrap > 0` under by_path / paths_of_interest still raises, since the survey-aware perturbation pivot for path-restricted IFs is methodologically underived). Per-path SE routes through the existing `_survey_se_from_group_if` cell-period allocator: the per-period IF (`U_pp_l_path`) is built with non-path switcher-side contributions skipped (control contributions are unchanged, matching the joiners/leavers IF convention; preserves the row-sum identity `U_pp.sum(axis=1) == U`), cohort-recentered via `_cohort_recenter_per_period`, then expanded to observations as `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})`. Replicate-weight designs unconditionally use the cell allocator (Class A contract from PR #323). New `_refresh_path_inference` helper post-call refreshes `safe_inference` on every populated entry across `multi_horizon_inference`, `placebo_horizon_inference`, `path_effects`, and `path_placebos` so all four surfaces use the same final `df_survey` after per-path replicate fits append `n_valid` to the shared accumulator. Path-enumeration ranking under `survey_design` remains unweighted (group-cardinality, not population-weight mass). Lonely-PSU policy stays sample-wide, not per-path. Telescope invariant: on a single-path panel, per-path SE matches the global non-by_path survey SE bit-exactly. **No R parity** — R `did_multiplegt_dyn` does not support survey weighting; this is a Python-only methodology extension. 
The global non-by_path TSL multiplier-bootstrap path is unaffected (anti-regression test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical::test_global_survey_plus_n_bootstrap_still_works` locks the per-path-only scope of the new gate). Cross-surface invariants regression-tested at `TestByPathSurveyDesignAnalytical` (~17 tests across gate / dispatch / analytical SE / replicate-weight SE / per-path placebos / `trends_linear` composition / unobserved-path warnings / final-df refresh regressions) and `TestByPathSurveyDesignTelescope`. See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path survey-design SE" for the full contract.
- **Inference-field aliases on staggered result classes** for adapter / external-consumer compatibility. Read-only `@property` aliases expose the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names (matching `DiDResults` / `TROPResults` / `SyntheticDiDResults` / `HeterogeneousAdoptionDiDResults`) on every result class that previously only carried prefixed canonical fields: `CallawaySantAnnaResults`, `StackedDiDResults`, `EfficientDiDResults`, `ChaisemartinDHaultfoeuilleResults`, `StaggeredTripleDiffResults`, `WooldridgeDiDResults`, `SunAbrahamResults`, `ImputationDiDResults`, `TwoStageDiDResults` (mapping to `overall_*`); `ContinuousDiDResults` (mapping to `overall_att_*`, ATT-side as the headline, ACRT-side accessible unchanged via `overall_acrt_*`); `MultiPeriodDiDResults` (mapping to `avg_*`). `ContinuousDiDResults` additionally exposes `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases for naming consistency with the rest of the staggered family. Aliases are pure read-throughs over the canonical fields — no recomputation, no behavior change — so the `safe_inference()` joint-NaN contract (per CLAUDE.md "Inference computation") is inherited automatically (NaN canonical → NaN alias, locked at `tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan`). The native `overall_*` / `overall_att_*` / `avg_*` fields remain canonical for documentation and computation. Motivated by the `balance.interop.diff_diff.as_balance_diagnostic()` adapter (`facebookresearch/balance` PR #465) which calls `getattr(res, "se", None)` / `getattr(res, "conf_int", None)` without a fallback chain — pre-alias, every staggered result class returned `None` on those keys, silently dropping `se` and `conf_int` from the adapter's diagnostic dict. 23 alias-mechanic + balance-adapter regression tests at `tests/test_result_aliases.py`. Patch-level (additive on stable surfaces).
- **`ChaisemartinDHaultfoeuille.by_path` + non-binary integer treatment** — `by_path=k` now accepts integer-coded discrete treatment (D in Z, e.g. ordinal `{0, 1, 2}`); path tuples become integer-state tuples like `(0, 2, 2, 2)`. The previous `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1870` is replaced by a `ValueError` for continuous D (e.g. `D=1.5`) at fit-time per the no-silent-failures contract — the existing `int(round(float(v)))` cast in `_enumerate_treatment_paths` is now defensive (no-op for integer-coded D). Validated against R `did_multiplegt_dyn(..., by_path)` for D in `{0, 1, 2}` via the new `multi_path_reversible_by_path_non_binary` golden-value scenario (78 switchers, 3 paths, single-baseline custom DGP, F_g >= 4): per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol observed; SE_RTOL=0.15 envelope). **Deviation from R for D >= 10:** R's `did_multiplegt_by_path` derives the per-path baseline via `path_index$baseline_XX <- substr(path_index$path, 1, 1)`, which captures only the first character of the comma-separated path string (e.g. for `path = "12,12,..."` it captures `"1"` instead of `"12"`); this mis-allocates R's per-path control-pool subset for D >= 10. Python's tuple-key matching is correct in this regime — the per-path point estimates we compute are correct; R's per-path subset for the same path is buggy. The shipped parity scenario stays in `D in {0, 1, 2}` to avoid the R bug. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary`; cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary`.
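The Conley variance estimator quoted in the revised changelog entry, `(X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}`, can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in `diff_diff/conley.py`: the function name `conley_vcov_sketch` is hypothetical, the metric is plain Euclidean distance on made-up projected coordinates, and the kernel is a radial Bartlett window.

```python
import numpy as np


def conley_vcov_sketch(X, resid, dist, cutoff):
    """Sandwich (X'X)^-1 [sum_ij K(d_ij/h) s_i s_j'] (X'X)^-1 with Bartlett K."""
    scores = X * resid[:, None]                   # s_i = X_i * eps_i, shape (n, k)
    K = np.clip(1.0 - dist / cutoff, 0.0, None)   # Bartlett weights, shape (n, n)
    meat = scores.T @ K @ scores                  # sum over all pairs (i, j)
    bread = np.linalg.inv(X.T @ X)
    return bread @ meat @ bread


# Simulated single-period cross-section (illustrative data only).
rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(0.0, 10.0, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Dense n x n Euclidean distance matrix (the O(n^2) object the changelog warns about).
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

V_conley = conley_vcov_sketch(X, resid, dist, cutoff=2.0)

# With a cutoff below every off-diagonal distance only the i == j terms survive,
# so the estimator reduces to the heteroskedasticity-robust HC0 sandwich.
V_tiny = conley_vcov_sketch(X, resid, dist, cutoff=1e-9)
```

The HC0 reduction is a convenient sanity check when experimenting with cutoffs: shrinking the bandwidth toward zero should recover the familiar robust covariance.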

README.md

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
- [Honest DiD](https://diff-diff.readthedocs.io/en/stable/api/honest_did.html) - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
- [Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html) - Roth (2022) minimum detectable violation and power curves
- [Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/power.html) - analytical and simulation-based MDE, sample size, power curves for study design
127-
- Conley spatial HAC SE (`vcov_type="conley"`) on DifferenceInDifferences/TwoWayFixedEffects/MultiPeriodDiD - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg`
127+
- Conley spatial HAC SE (`vcov_type="conley"`) on cross-sectional `LinearRegression` / `compute_robust_vcov` - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg`

## Survey Support

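With the README entry now scoped to cross-sectional fits, the commit message's workaround for panel data is to pre-collapse to per-unit first-differences before requesting Conley standard errors. The sketch below shows only the collapse step, in plain Python on a made-up two-period panel; all column names and values are illustrative assumptions. The subsequent call into `LinearRegression` / `compute_robust_vcov` is left out because the commit does not show its full signature.

```python
from collections import defaultdict

# Illustrative long-format panel: (unit, period, outcome, lat, lon).
panel = [
    ("A", 0, 1.0, 52.52, 13.40),
    ("A", 1, 3.5, 52.52, 13.40),
    ("B", 0, 2.0, 48.14, 11.58),
    ("B", 1, 2.5, 48.14, 11.58),
    ("C", 0, 0.5, 50.11, 8.68),
    ("C", 1, 4.5, 50.11, 8.68),
]

# Group observations by unit, keyed by period.
by_unit = defaultdict(dict)
for unit, period, y, lat, lon in panel:
    by_unit[unit][period] = (y, lat, lon)

# One row per unit: first-differenced outcome plus that unit's coordinates.
cross_section = []
for unit, obs in sorted(by_unit.items()):
    (y0, lat, lon), (y1, _, _) = obs[0], obs[1]
    cross_section.append({"unit": unit, "dy": y1 - y0, "lat": lat, "lon": lon})
```

After the collapse every unit appears exactly once, so no pair of rows shares coordinates merely because it is the same unit observed twice.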
TODO.md

Lines changed: 3 additions & 5 deletions
@@ -113,13 +113,11 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
116-
| Conley + cluster_ids combined product kernel `K_space(d_ij/h) · 1{cluster_i = cluster_j}`. Phase 2 of the spillover-conley initiative will add this alongside the time-dimension extension (Driscoll-Kraay). Currently raises `NotImplementedError` at both the linalg validator and TWFE early-block. | `linalg.py::_validate_vcov_args`, `twfe.py`, `estimators.py` (DiD/MultiPeriodDiD `fit`) | Phase 2 (spillover-conley) | Medium |
117-
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError`. | `linalg.py::_validate_vcov_args`, `twfe.py`, `estimators.py` | Phase 5 (spillover-conley) | Medium |
118-
| Conley + `absorb=` (arbitrary FE projection beyond TWFE's two-FE within-transformation). FWL composability is proven analytically for TWFE's fixed two-FE design but not formally verified for arbitrary `absorb` dimensions; conservatively rejected at fit-time with a redirect to `fixed_effects=` dummies. Lift after empirical verification on multi-FE within-transformations. | `estimators.py::DifferenceInDifferences.fit`, `MultiPeriodDiD.fit` | follow-up (spillover-conley) | Low |
116+
| Conley space-time product kernel + panel-estimator wire-up. Phase 1 rejects `vcov_type="conley"` on `DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD` at fit-time because cross-sectional Conley over (unit, time) rows mishandles same-unit cross-time pairs (`d_ij = 0 → K = 1`). Phase 2 will add `K(d_ij, |s-t|) = K_space(d_ij/h_space) · K_time(|s-t|/h_time)` (Driscoll-Kraay) and lift the rejection. | `linalg.py`, `conley.py`, `estimators.py`, `twfe.py` | Phase 2 (spillover-conley) | High |
117+
| Conley + cluster_ids combined product kernel `K_space(d_ij/h) · 1{cluster_i = cluster_j}`. Phase 2 of the spillover-conley initiative will add this alongside the time-dimension extension. Currently raises `NotImplementedError` at the linalg validator (cross-sectional Conley + cluster). | `linalg.py::_validate_vcov_args` | Phase 2 (spillover-conley) | Medium |
118+
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
| Validate user-supplied callable `conley_metric` for shape `(n, n)`, finiteness, non-negativity, and symmetry. Currently `np.asarray(metric(coords, coords))` is accepted unchecked; a malformed callable produces opaque matmul errors and a non-symmetric distance matrix produces a non-symmetric vcov. CI reviewer flagged as P2 M3 in PR #(spillover-conley). | `diff_diff/conley.py::_pairwise_distance_matrix`, `_compute_conley_vcov` | follow-up (spillover-conley) | Low |
121-
| Extract common Conley estimator-level validation helper. Today `cluster=`, `survey_design=`, `conley_coords=`, and `conley_cutoff_km=` checks are duplicated across `DifferenceInDifferences.fit` (estimators.py:~370-400), `MultiPeriodDiD.fit` (estimators.py:~1395-1455), and `TwoWayFixedEffects.fit` (twfe.py:~165-205). A future Conley-feature change risks updating one estimator but not the others. CI reviewer flagged as P2 MT1. | `diff_diff/estimators.py`, `diff_diff/twfe.py` | follow-up (spillover-conley) | Low |
122-
| Strengthen `tests/test_conley_vcov.py::TestConleyTWFE::test_twfe_conley_FWL_invariance` to actually verify FWL equivalence between TWFE-within Conley and a full-dummy-FE design (build the dummy regression explicitly and compare the ATT coefficient + Conley SE). The current test only asserts both fits produce finite SEs — the name overstates the assertion. CI reviewer flagged as P2 DT3. | `tests/test_conley_vcov.py` | follow-up (spillover-conley) | Low |

#### Performance

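The new Phase 2 TODO row proposes the product kernel `K(d_ij, |s-t|) = K_space(d_ij/h_space) · K_time(|s-t|/h_time)`. A rough sketch of how such a kernel would weight pairs is given below. It assumes Bartlett windows in both dimensions, which is a choice made here for illustration since the TODO does not specify `K_time`; the function `product_kernel` and the bandwidth values are hypothetical.

```python
def bartlett(u):
    """Bartlett (triangular) kernel: 1 - |u| inside the cutoff, 0 outside."""
    return max(0.0, 1.0 - abs(u))


def product_kernel(d_km, dt, h_space_km, h_time):
    """K(d_ij, |s-t|) = K_space(d_ij / h_space) * K_time(|s-t| / h_time)."""
    return bartlett(d_km / h_space_km) * bartlett(dt / h_time)


h_space_km, h_time = 200.0, 4.0  # illustrative bandwidths, not library defaults

w_same_obs = product_kernel(0.0, 0, h_space_km, h_time)        # same row
w_same_unit_lag1 = product_kernel(0.0, 1, h_space_km, h_time)  # same unit, 1 period apart
w_same_unit_lag4 = product_kernel(0.0, 4, h_space_km, h_time)  # same unit, at the time cutoff
w_neighbor_lag1 = product_kernel(50.0, 1, h_space_km, h_time)  # 50 km away, 1 period apart
```

Unlike the purely spatial kernel, same-unit pairs no longer receive weight 1 automatically: their weight decays with the time gap and reaches zero at `h_time`.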