Commit 6d371a8

igerber and claude committed
Remove dead Conley scaffolding from panel estimators; align paper review
Address two P3 findings from CI Codex review of PR #411:

P3 (Maintainability): DiD/MultiPeriodDiD/TWFE all reject vcov_type="conley" unconditionally at fit-time, but each fit() still materialized `_conley_coords_array` from data and forwarded `conley_coords`, `conley_cutoff_km`, `conley_metric`, `conley_kernel` to LinearRegression / solve_ols. Those code paths were unreachable behind the unconditional NotImplementedError raise. Removed the dead extraction + arg-passes from all three estimators. The constructor still accepts the conley_* kwargs for sklearn-style API symmetry (set_params/get_params round-trip works); they have no effect on the panel paths.

P3 (Documentation): `docs/methodology/papers/conley-1999-review.md` Requirements checklist and Tuning Parameters table still said the Bartlett kernel is "PSD by construction" and only flagged uniform as needing the negative-eigenvalue warning. Updated both surfaces to spell out the radial 1-D pairwise specialization vs Conley's explicit 2-D separable PSD lattice formula (Eq 3.14) and to apply the warning to both kernels, matching the registry and the runtime contract.

271 targeted regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7b17cbf commit 6d371a8

3 files changed

Lines changed: 3 additions & 42 deletions

diff_diff/estimators.py

Lines changed: 0 additions & 22 deletions
@@ -439,13 +439,6 @@ def fit(
         # For wild bootstrap, we don't need cluster SEs from the initial fit
         cluster_ids = data[self.cluster].values if self.cluster is not None else None

-        # Extract Conley coords array (n×2 float64) from the user's data.
-        # Validation of the column existence and the 2-tuple shape happened
-        # at the top of fit(); here we only need to materialize the array.
-        _conley_coords_array = None
-        if self.vcov_type == "conley" and self.conley_coords is not None:
-            _conley_coords_array = data[list(self.conley_coords)].to_numpy(dtype=np.float64)
-
         # When survey PSU is present, it overrides cluster for variance estimation
         effective_cluster_ids = _resolve_effective_cluster(
             resolved_survey, cluster_ids, self.cluster
@@ -487,10 +480,6 @@ def fit(
             weight_type=survey_weight_type,
             survey_design=_lr_survey,
             vcov_type=_fit_vcov_type,
-            conley_coords=_conley_coords_array,
-            conley_cutoff_km=self.conley_cutoff_km,
-            conley_metric=self.conley_metric,
-            conley_kernel=self.conley_kernel,
         ).fit(X, y, df_adjustment=n_absorbed_effects)

         coefficients = reg.coefficients_
@@ -1538,13 +1527,6 @@ def fit( # type: ignore[override]
         # Remap implicit "classical" + cluster to CR1 (legacy backward compat).
         _fit_vcov_type = self._resolve_effective_vcov_type(effective_cluster_ids)

-        # Extract Conley coords array (only when vcov_type='conley'; the
-        # estimator-level guards above already validated the column-name
-        # tuple against `data`).
-        _conley_coords_array_mp = None
-        if _fit_vcov_type == "conley" and self.conley_coords is not None:
-            _conley_coords_array_mp = data[list(self.conley_coords)].to_numpy(dtype=np.float64)
-
         # Note: Wild bootstrap for multi-period effects is complex (multiple coefficients)
         # For now, we use analytical inference even if inference="wild_bootstrap"
         coefficients, residuals, fitted, vcov = solve_ols(
@@ -1558,10 +1540,6 @@ def fit( # type: ignore[override]
             weights=survey_weights,
             weight_type=survey_weight_type,
             vcov_type=_fit_vcov_type,
-            conley_coords=_conley_coords_array_mp,
-            conley_cutoff_km=self.conley_cutoff_km,
-            conley_metric=self.conley_metric,
-            conley_kernel=self.conley_kernel,
         )

         # Compute survey vcov if applicable
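
The pattern this change relies on can be sketched in isolation. The class below is a hypothetical stand-in, not the diff-diff source: the name `PanelEstimatorSketch`, the `"hc1"` default, and the error message are assumptions made for illustration. It shows why the removed extraction was unreachable (the guard raises before anything reads the `conley_*` attributes) and why keeping the constructor kwargs still lets a `get_params`/`set_params` round-trip succeed.

```python
class PanelEstimatorSketch:
    """Hypothetical stand-in for a panel estimator; not the diff-diff API."""

    _PARAM_NAMES = (
        "vcov_type",
        "conley_coords",
        "conley_cutoff_km",
        "conley_metric",
        "conley_kernel",
    )

    def __init__(
        self,
        vcov_type="hc1",          # assumed default, for illustration only
        conley_coords=None,
        conley_cutoff_km=None,
        conley_metric="haversine",
        conley_kernel="bartlett",
    ):
        # The conley_* kwargs are stored but never consumed by fit().
        self.vcov_type = vcov_type
        self.conley_coords = conley_coords
        self.conley_cutoff_km = conley_cutoff_km
        self.conley_metric = conley_metric
        self.conley_kernel = conley_kernel

    def get_params(self):
        """Return constructor parameters as a dict (sklearn-style)."""
        return {name: getattr(self, name) for name in self._PARAM_NAMES}

    def set_params(self, **params):
        """Set constructor parameters; reject names the constructor lacks."""
        for name, value in params.items():
            if name not in self._PARAM_NAMES:
                raise ValueError(f"unknown parameter: {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, data):
        # Unconditional guard: nothing below runs with vcov_type="conley",
        # so any Conley-specific extraction after this point is dead code.
        if self.vcov_type == "conley":
            raise NotImplementedError(
                "vcov_type='conley' is not supported on this panel path"
            )
        # ... estimation for the supported vcov types would go here ...
        return self
```

Copying parameters between instances with `PanelEstimatorSketch().set_params(**est.get_params())` works regardless of `vcov_type`; only `fit()` rejects the Conley setting.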

diff_diff/twfe.py

Lines changed: 0 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -325,15 +325,6 @@ def fit( # type: ignore[override]
325325
# single source of truth.
326326
_fit_vcov_type = self._resolve_effective_vcov_type(survey_cluster_ids)
327327

328-
# Materialize Conley coords from data (validated above; this is just
329-
# array extraction). NOTE: data passed to LinearRegression is the
330-
# within-transformed matrix, but coords are still in the ORIGINAL
331-
# row order — within-transformation preserves row ordering, so the
332-
# coords align with the demeaned X 1:1.
333-
_twfe_conley_coords = None
334-
if _fit_vcov_type == "conley" and self.conley_coords is not None:
335-
_twfe_conley_coords = data[list(self.conley_coords)].to_numpy(dtype=np.float64)
336-
337328
if self.rank_deficient_action == "error":
338329
reg = LinearRegression(
339330
include_intercept=False,
@@ -344,10 +335,6 @@ def fit( # type: ignore[override]
                 weight_type=survey_weight_type,
                 survey_design=_lr_survey_twfe,
                 vcov_type=_fit_vcov_type,
-                conley_coords=_twfe_conley_coords,
-                conley_cutoff_km=self.conley_cutoff_km,
-                conley_metric=self.conley_metric,
-                conley_kernel=self.conley_kernel,
             ).fit(X, y, df_adjustment=df_adjustment)
         else:
             # Suppress generic warning, TWFE provides context-specific messages below
@@ -364,10 +351,6 @@ def fit( # type: ignore[override]
                     weight_type=survey_weight_type,
                     survey_design=_lr_survey_twfe,
                     vcov_type=_fit_vcov_type,
-                    conley_coords=_twfe_conley_coords,
-                    conley_cutoff_km=self.conley_cutoff_km,
-                    conley_metric=self.conley_metric,
-                    conley_kernel=self.conley_kernel,
                 ).fit(X, y, df_adjustment=df_adjustment)

         coefficients = reg.coefficients_

docs/methodology/papers/conley-1999-review.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -214,10 +214,10 @@ The paper itself does NOT distribute code. Conley's Section 5 empirical example
214214
- [ ] Coordinates supplied as two columns (lat, lon) or `(x, y)` projected.
215215
- [ ] Distance metric configured (haversine for lat/lon; euclidean for projected; callable for custom).
216216
- [ ] Cutoff `conley_cutoff_km > 0` (or unitless `conley_cutoff` for euclidean). Document that `h = 0` reduces to HC0.
217-
- [ ] Kernel choice `conley_kernel ∈ {"bartlett", "uniform"}`. Bartlett is PSD by construction; uniform is not in general (warn).
217+
- [ ] Kernel choice `conley_kernel ∈ {"bartlett", "uniform"}`. Conley's explicit PSD Bartlett (Eq 3.14) is the 2-D separable lattice product window; the radial 1-D pairwise Bartlett that diff-diff and R `conleyreg` implement is a practitioner specialization that is **not** formally PSD-guaranteed. Uniform is also not PSD in general. Apply the negative-eigenvalue warning to **both** kernels.
218218
- [ ] Score outer products `x_i ε̂_i` computed identically to HC0 path.
219219
- [ ] Robustness sweep: document that practitioners should report estimates at multiple cutoffs (Conley Section 5 standard).
220-
- [ ] If `conley_kernel="uniform"` and the resulting variance has any negative eigenvalues, warn or fall back to Bartlett.
220+
- [ ] If the resulting Conley meat / variance has any materially negative eigenvalues (under either Bartlett or uniform), warn the user (the implementation does this for both kernels).
221221

222222
---
223223

@@ -242,7 +242,7 @@ The paper itself does NOT distribute code. Conley's Section 5 empirical example
 | `vcov_method` | str | `"hc0"` | Set to `"conley"` to activate. |
 | `conley_coords` | tuple of 2 str | `None` | User specifies the two column names for lat/lon (or projected x/y). Required when `vcov_method="conley"`. |
 | `conley_cutoff_km` | float | `None` (no default) | User-supplied. Conley does not provide a plug-in selector. Recommend a robustness sweep (3-5 values spanning the relevant economic-distance range). For Phase 1, error if not supplied. |
-| `conley_kernel` | str | `"bartlett"` | `"bartlett"` is PSD by construction (Conley Eq 3.14 page 12) and is the practitioner default. `"uniform"` matches Conley's "truncated window" (page 11) but may fail PSD; emit warning. |
+| `conley_kernel` | str | `"bartlett"` | `"bartlett"` evaluated on pairwise distance `d_ij/h` is the practitioner default, matching R `conleyreg` and Stata `acreg`; this radial 1-D form is a specialization of Conley's explicit 2-D separable PSD-guaranteed Bartlett (Eq 3.14, page 12) and is not formally PSD-guaranteed itself. `"uniform"` matches Conley's "truncated window" (page 11) and is also not PSD in general (footnote 11). Emit a warning under either kernel when the resulting meat has a materially negative eigenvalue. |
 | `conley_metric` | str or callable | `"haversine"` | `"haversine"` for lat/lon (km); `"euclidean"` for projected coords (units = whatever the coord units are - so if coords are degrees, cutoff is in degrees); a callable `(coord_i, coord_j) -> float` for custom metrics (e.g., travel time, network distance). |

 ### Relation to Existing diff-diff Estimators
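
A minimal sketch of the check the updated checklist describes, assuming a plain NumPy setting rather than the diff-diff internals. The function names (`pairwise_distances`, `kernel_weights`, `conley_meat`), the relative tolerance `tol`, the Earth radius constant, and the `u <= 1` convention for the uniform window are assumptions for illustration; only the kernel names, the metric names, and the warn-under-both-kernels contract come from the documentation diff. The sketch builds the radial pairwise kernel matrix from `d_ij / h`, forms the meat from HC0-style scores, and warns when the smallest eigenvalue is materially negative, whichever kernel was used.

```python
import warnings

import numpy as np


def pairwise_distances(coords, metric="haversine", radius_km=6371.0):
    """Return the n x n distance matrix for an (n, 2) coordinate array."""
    coords = np.asarray(coords, dtype=np.float64)
    if metric == "euclidean":
        diff = coords[:, None, :] - coords[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "haversine":
        # Columns are assumed to be (lat, lon) in degrees; result is in km.
        lat = np.radians(coords[:, 0])
        lon = np.radians(coords[:, 1])
        dlat = lat[:, None] - lat[None, :]
        dlon = lon[:, None] - lon[None, :]
        a = (np.sin(dlat / 2) ** 2
             + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
        return 2.0 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    raise ValueError(f"unknown metric: {metric!r}")


def kernel_weights(dist, cutoff, kernel="bartlett"):
    """Radial 1-D kernel evaluated on u = d_ij / h (requires cutoff > 0)."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    u = dist / cutoff
    if kernel == "bartlett":
        return np.clip(1.0 - u, 0.0, None)
    if kernel == "uniform":
        return (u <= 1.0).astype(np.float64)
    raise ValueError(f"unknown kernel: {kernel!r}")


def conley_meat(X, resid, coords, cutoff, kernel="bartlett",
                metric="haversine", tol=1e-10):
    """Return sum_ij K(d_ij / h) s_i s_j' and warn if it is not PSD."""
    scores = np.asarray(X, dtype=np.float64) * np.asarray(resid)[:, None]
    K = kernel_weights(pairwise_distances(coords, metric), cutoff, kernel)
    meat = scores.T @ K @ scores
    meat = 0.5 * (meat + meat.T)  # symmetrize against round-off
    eigvals = np.linalg.eigvalsh(meat)
    # Same check for both kernels: neither radial form is PSD-guaranteed.
    if eigvals.min() < -tol * max(1.0, np.abs(eigvals).max()):
        warnings.warn(
            f"Conley meat has a negative eigenvalue ({eigvals.min():.3g}) "
            f"under kernel={kernel!r}",
            RuntimeWarning,
        )
    return meat
```

As a small illustration, three points on a line at x = 0, 1, 2 with a uniform window of cutoff 1.5, `X = np.eye(3)` and unit residuals give a meat equal to the kernel matrix, whose eigenvalues are 1 and 1 ± √2, so the warning fires. With a cutoff far smaller than every pairwise distance, the kernel matrix is the identity and the meat equals the HC0 sum of score outer products. A robustness sweep would call `conley_meat` at several cutoffs and report the resulting standard errors side by side.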