You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Reject Conley on panel estimators; remove dead conley result-class wiring
Address CI Codex review of PR #411 (P1#1 + P1#2):
P1#1 — Panel estimators with vcov_type="conley" silently produced wrong
SE because cross-sectional Conley over (unit, time) rows treated same-
unit cross-time pairs as d_ij=0 -> K=1, mishandling the space-time HAC.
Phase 1 supports cross-sectional Conley only; reject panel fits at
fit-time on DifferenceInDifferences, TwoWayFixedEffects, and
MultiPeriodDiD with NotImplementedError. Practitioners pre-collapse to
per-unit first-differences and call compute_robust_vcov directly. Phase
2 will add the space-time product kernel (Driscoll-Kraay) and lift the
rejection. Granular Conley-arg validation collapsed into the single
unconditional reject (cluster/absorb/coords/cutoff combinations all
hit the same path).
P1#2 — conley_metric was dropped at the result boundary and
_format_vcov_label hard-coded "km" for the cutoff label even when
metric was "euclidean". With panels rejected, the conley_cutoff_km /
conley_kernel fields on DiDResults / MultiPeriodDiDResults are now
unreachable; remove the dead fields, the dead arg passes from
estimators.py / twfe.py, and the dead "conley" branch in
_format_vcov_label.
Tests added: TWFE / DiD / MPD panel-rejection regressions, including a
repeated-coords-across-periods regression per the CI reviewer's
recommendation. 70 Conley tests + 401 targeted regression tests pass.
REGISTRY / CHANGELOG / llms.txt / README / TODO updated to reflect
that the only supported Phase 1 Conley path is direct
LinearRegression / compute_robust_vcov on a single-period design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: CHANGELOG.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
8
8
## [Unreleased]
9
9
10
10
### Added
11
-
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on `DifferenceInDifferences`, `TwoWayFixedEffects`, and `MultiPeriodDiD` (Phase 1 of the spillover-conley initiative). New keyword-only kwargs on `__init__`: `conley_coords=(<lat_col>, <lon_col>)` (column-name tuple from `data`), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"` is PSD-guaranteed; `"uniform"` emits `UserWarning` if the meat has a materially negative eigenvalue per Conley 1999 footnote 11). Variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). FWL composes cleanly because the meat depends only on scores `X·ε`, both of which within-transformation preserves — `TwoWayFixedEffects(vcov_type="conley", ...)` is supported, UNLIKE `hc2`/`hc2_bm` which need the full hat matrix. TWFE auto-cluster-at-unit is disabled when `vcov_type="conley"`; explicit `cluster=` raises `NotImplementedError` (combined product kernel deferred to Phase 2). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is queued for Phase 2. `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `weights=` / `survey_design=` / `absorb=` raises `NotImplementedError` (Bertanha-Imbens 2014 weighted-Conley + arbitrary FE projection are deferred to follow-up phases). `TwoWayFixedEffects(vcov_type="conley", inference="wild_bootstrap")` raises `NotImplementedError` (Conley analytical spatial-HAC and wild cluster bootstrap are different inference paths). Helpers live in new module `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov`); `compute_robust_vcov` in `diff_diff/linalg.py` imports the dispatch helpers. R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9) on three benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`); observed max abs diff 5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. `result.summary()` prints `"Conley spatial HAC (bartlett, cutoff=200.0km)"` via the extended `_format_vcov_label` helper. New REGISTRY section `## ConleySpatialHAC`. Tracked on `BRIEFING.md` as Phase 1 of the 6-phase initiative (Phase 2: two-way space×time + sparse fast path; Phase 3: ring-indicator spillover-aware DiD per Butts 2021; Phase 4a/4b: mechanical extension to IF-aggregation and sandwich-derived estimators; Phase 5: survey design support).
11
+
- **Conley (1999) spatial-HAC standard errors via `vcov_type="conley"`** on cross-sectional `LinearRegression` / `compute_robust_vcov` (Phase 1 of the spillover-conley initiative). Keyword arguments: `conley_coords` (n × 2 array of lat/lon or projected coords), `conley_cutoff_km=<float>` (positive finite bandwidth in km for haversine, or coord units for euclidean — REQUIRED, no default per the no-silent-failures contract), `conley_metric="haversine"|"euclidean"|callable` (default `"haversine"`; great-circle uses Earth's mean radius 6371.01 km matching R `conleyreg`), `conley_kernel="bartlett"|"uniform"` (default `"bartlett"` is PSD-guaranteed; `"uniform"` emits `UserWarning` if the meat has a materially negative eigenvalue per Conley 1999 footnote 11). Variance estimator `Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}` (Conley 1999 Eq 4.2). **Panel estimators (`DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD`) reject `vcov_type="conley"` at fit-time with `NotImplementedError`** — Phase 1's cross-sectional Conley does not handle the time dimension. Applying it over (unit, time) rows would treat same-unit cross-time pairs as `d_ij = 0 → K = 1`, mishandling the space-time HAC. Practitioners needing Conley with a panel design should pre-collapse to per-unit first-differences and call `compute_robust_vcov` directly on a single-period regression. Phase 2 will add the space-time product kernel (Driscoll-Kraay) for full panel support. `SyntheticDiD(vcov_type="conley")` raises `TypeError` (uses bootstrap variance, not analytical sandwich); `set_params` mirrors the constructor rejection. `vcov_type="conley"` + `cluster_ids=` / `weights=` / `survey_design=` raises `NotImplementedError` (combined product kernel + Bertanha-Imbens 2014 weighted-Conley deferred to follow-up phases). `n > 20_000` emits a `UserWarning` about the dense O(n²) distance-matrix memory; sparse k-d-tree fast path is queued for Phase 2. Helpers live in new module `diff_diff/conley.py` (`_haversine_km`, `_pairwise_distance_matrix`, `_bartlett_kernel`, `_uniform_kernel`, `_validate_conley_kwargs`, `_compute_conley_vcov`); `compute_robust_vcov` in `diff_diff/linalg.py` imports the dispatch helpers. R `conleyreg` parity (Düsterhöft 2021, CRAN v0.1.9) on three benchmark fixtures (`benchmarks/data/r_conleyreg_conley_golden.json`, regenerable via `benchmarks/R/generate_conley_golden.R`); observed max abs diff 5.7e-16. Earth radius 6371.01 km matches `conleyreg::haversine_dist`. Test file `tests/test_conley_vcov.py` skips parity cleanly when the JSON is absent. New REGISTRY section `## ConleySpatialHAC`. Tracked on `BRIEFING.md` as Phase 1 of the 6-phase initiative (Phase 2: space-time product kernel + sparse fast path + panel-estimator support; Phase 3: ring-indicator spillover-aware DiD per Butts 2021; Phase 4a/4b: mechanical extension to IF-aggregation and sandwich-derived estimators; Phase 5: survey design support).
12
12
- **`ChaisemartinDHaultfoeuille.by_path` and `paths_of_interest` now compose with `survey_design`** for analytical Binder TSL SE and replicate-weight bootstrap variance. The `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1233-1239` is replaced by a per-path multiplier-bootstrap-only gate (`survey_design + n_bootstrap > 0` under by_path / paths_of_interest still raises, since the survey-aware perturbation pivot for path-restricted IFs is methodologically underived). Per-path SE routes through the existing `_survey_se_from_group_if` cell-period allocator: the per-period IF (`U_pp_l_path`) is built with non-path switcher-side contributions skipped (control contributions are unchanged, matching the joiners/leavers IF convention; preserves the row-sum identity `U_pp.sum(axis=1) == U`), cohort-recentered via `_cohort_recenter_per_period`, then expanded to observations as `psi_i = U_pp[g_i, t_i] · (w_i / W_{g_i, t_i})`. Replicate-weight designs unconditionally use the cell allocator (Class A contract from PR #323). New `_refresh_path_inference` helper post-call refreshes `safe_inference` on every populated entry across `multi_horizon_inference`, `placebo_horizon_inference`, `path_effects`, and `path_placebos` so all four surfaces use the same final `df_survey` after per-path replicate fits append `n_valid` to the shared accumulator. Path-enumeration ranking under `survey_design` remains unweighted (group-cardinality, not population-weight mass). Lonely-PSU policy stays sample-wide, not per-path. Telescope invariant: on a single-path panel, per-path SE matches the global non-by_path survey SE bit-exactly. **No R parity** — R `did_multiplegt_dyn` does not support survey weighting; this is a Python-only methodology extension. The global non-by_path TSL multiplier-bootstrap path is unaffected (anti-regression test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSurveyDesignAnalytical::test_global_survey_plus_n_bootstrap_still_works` locks the per-path-only scope of the new gate). Cross-surface invariants regression-tested at `TestByPathSurveyDesignAnalytical` (~17 tests across gate / dispatch / analytical SE / replicate-weight SE / per-path placebos / `trends_linear` composition / unobserved-path warnings / final-df refresh regressions) and `TestByPathSurveyDesignTelescope`. See `docs/methodology/REGISTRY.md` §`ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path ...)` → "Per-path survey-design SE" for the full contract.
13
13
- **Inference-field aliases on staggered result classes** for adapter / external-consumer compatibility. Read-only `@property` aliases expose the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names (matching `DiDResults` / `TROPResults` / `SyntheticDiDResults` / `HeterogeneousAdoptionDiDResults`) on every result class that previously only carried prefixed canonical fields: `CallawaySantAnnaResults`, `StackedDiDResults`, `EfficientDiDResults`, `ChaisemartinDHaultfoeuilleResults`, `StaggeredTripleDiffResults`, `WooldridgeDiDResults`, `SunAbrahamResults`, `ImputationDiDResults`, `TwoStageDiDResults` (mapping to `overall_*`); `ContinuousDiDResults` (mapping to `overall_att_*`, ATT-side as the headline, ACRT-side accessible unchanged via `overall_acrt_*`); `MultiPeriodDiDResults` (mapping to `avg_*`). `ContinuousDiDResults` additionally exposes `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases for naming consistency with the rest of the staggered family. Aliases are pure read-throughs over the canonical fields — no recomputation, no behavior change — so the `safe_inference()` joint-NaN contract (per CLAUDE.md "Inference computation") is inherited automatically (NaN canonical → NaN alias, locked at `tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan`). The native `overall_*` / `overall_att_*` / `avg_*` fields remain canonical for documentation and computation. Motivated by the `balance.interop.diff_diff.as_balance_diagnostic()` adapter (`facebookresearch/balance` PR #465) which calls `getattr(res, "se", None)` / `getattr(res, "conf_int", None)` without a fallback chain — pre-alias, every staggered result class returned `None` on those keys, silently dropping `se` and `conf_int` from the adapter's diagnostic dict. 23 alias-mechanic + balance-adapter regression tests at `tests/test_result_aliases.py`. Patch-level (additive on stable surfaces).
14
14
- **`ChaisemartinDHaultfoeuille.by_path` + non-binary integer treatment** — `by_path=k` now accepts integer-coded discrete treatment (D in Z, e.g. ordinal `{0, 1, 2}`); path tuples become integer-state tuples like `(0, 2, 2, 2)`. The previous `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1870` is replaced by a `ValueError` for continuous D (e.g. `D=1.5`) at fit-time per the no-silent-failures contract — the existing `int(round(float(v)))` cast in `_enumerate_treatment_paths` is now defensive (no-op for integer-coded D). Validated against R `did_multiplegt_dyn(..., by_path)` for D in `{0, 1, 2}` via the new `multi_path_reversible_by_path_non_binary` golden-value scenario (78 switchers, 3 paths, single-baseline custom DGP, F_g >= 4): per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol observed; SE_RTOL=0.15 envelope). **Deviation from R for D >= 10:** R's `did_multiplegt_by_path` derives the per-path baseline via `path_index$baseline_XX <- substr(path_index$path, 1, 1)`, which captures only the first character of the comma-separated path string (e.g. for `path = "12,12,..."` it captures `"1"` instead of `"12"`); this mis-allocates R's per-path control-pool subset for D >= 10. Python's tuple-key matching is correct in this regime — the per-path point estimates we compute are correct; R's per-path subset for the same path is buggy. The shipped parity scenario stays in `D in {0, 1, 2}` to avoid the R bug. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary`; cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary`.
Copy file name to clipboardExpand all lines: README.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -124,7 +124,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
124
124
-[Honest DiD](https://diff-diff.readthedocs.io/en/stable/api/honest_did.html) - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
125
125
-[Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html) - Roth (2022) minimum detectable violation and power curves
126
126
-[Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/power.html) - analytical and simulation-based MDE, sample size, power curves for study design
127
-
- Conley spatial HAC SE (`vcov_type="conley"`) on DifferenceInDifferences/TwoWayFixedEffects/MultiPeriodDiD - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg`
127
+
- Conley spatial HAC SE (`vcov_type="conley"`) on cross-sectional `LinearRegression` / `compute_robust_vcov` - Conley (1999) spatial-correlation-aware SEs with parity vs R `conleyreg`
Copy file name to clipboardExpand all lines: TODO.md
+3-5Lines changed: 3 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -113,13 +113,11 @@ Deferred items from PR reviews that were not addressed before merge.
113
113
|`HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. |`diff_diff/had.py::_validate_had_panel_event_study`| Phase 2b | Low |
114
114
|`HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. |`diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference`| Phase 2a | Medium |
115
115
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. |`benchmarks/R/`, `benchmarks/julia/`, `tests/`| follow-up | Low |
116
-
| Conley + cluster_ids combined product kernel `K_space(d_ij/h) · 1{cluster_i = cluster_j}`. Phase 2 of the spillover-conley initiative will add this alongside the time-dimension extension (Driscoll-Kraay). Currently raises `NotImplementedError` at both the linalg validator and TWFE early-block. |`linalg.py::_validate_vcov_args`, `twfe.py`, `estimators.py` (DiD/MultiPeriodDiD `fit`)| Phase 2 (spillover-conley) |Medium|
117
-
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError`. |`linalg.py::_validate_vcov_args`, `twfe.py`, `estimators.py`| Phase 5 (spillover-conley) | Medium |
118
-
| Conley + `absorb=` (arbitrary FE projection beyond TWFE's two-FE within-transformation). FWL composability is proven analytically for TWFE's fixed two-FE design but not formally verified for arbitrary `absorb` dimensions; conservatively rejected at fit-time with a redirect to `fixed_effects=` dummies. Lift after empirical verification on multi-FE within-transformations. |`estimators.py::DifferenceInDifferences.fit`, `MultiPeriodDiD.fit`| follow-up (spillover-conley) |Low|
116
+
| Conley space-time product kernel + panel-estimator wire-up. Phase 1 rejects `vcov_type="conley"` on `DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD` at fit-time because cross-sectional Conley over (unit, time) rows mishandles same-unit cross-time pairs (`d_ij = 0 → K = 1`). Phase 2 will add `K(d_ij, |s-t|) = K_space(d_ij/h_space) · K_time(|s-t|/h_time)` (Driscoll-Kraay) and lift the rejection. |`linalg.py`, `conley.py`, `estimators.py`, `twfe.py`| Phase 2 (spillover-conley) |High|
117
+
| Conley + cluster_ids combined product kernel `K_space(d_ij/h) · 1{cluster_i = cluster_j}`. Phase 2 of the spillover-conley initiative will add this alongside the time-dimension extension. Currently raises `NotImplementedError` at the linalg validator (cross-sectional Conley + cluster). |`linalg.py::_validate_vcov_args`| Phase 2 (spillover-conley) | Medium |
118
+
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. |`linalg.py::_validate_vcov_args`| Phase 5 (spillover-conley) |Medium|
119
119
|`SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). |`synthetic_did.py::SyntheticDiD`| follow-up (spillover-conley) | Low |
120
120
| Validate user-supplied callable `conley_metric` for shape `(n, n)`, finiteness, non-negativity, and symmetry. Currently `np.asarray(metric(coords, coords))` is accepted unchecked; a malformed callable produces opaque matmul errors and a non-symmetric distance matrix produces a non-symmetric vcov. CI reviewer flagged as P2 M3 in PR #(spillover-conley). |`diff_diff/conley.py::_pairwise_distance_matrix`, `_compute_conley_vcov`| follow-up (spillover-conley) | Low |
121
-
| Extract common Conley estimator-level validation helper. Today `cluster=`, `survey_design=`, `conley_coords=`, and `conley_cutoff_km=` checks are duplicated across `DifferenceInDifferences.fit` (estimators.py:~370-400), `MultiPeriodDiD.fit` (estimators.py:~1395-1455), and `TwoWayFixedEffects.fit` (twfe.py:~165-205). A future Conley-feature change risks updating one estimator but not the others. CI reviewer flagged as P2 MT1. |`diff_diff/estimators.py`, `diff_diff/twfe.py`| follow-up (spillover-conley) | Low |
122
-
| Strengthen `tests/test_conley_vcov.py::TestConleyTWFE::test_twfe_conley_FWL_invariance` to actually verify FWL equivalence between TWFE-within Conley and a full-dummy-FE design (build the dummy regression explicitly and compare the ATT coefficient + Conley SE). The current test only asserts both fits produce finite SEs — the name overstates the assertion. CI reviewer flagged as P2 DT3. |`tests/test_conley_vcov.py`| follow-up (spillover-conley) | Low |
0 commit comments