
Commit 324a71d

igerber and claude committed
Align public docstrings with Phase 1 Conley contract; drop redundant guards
Address P2/P3 findings from CI Codex review of PR #411 R-rebased:

P2: Public docstrings for the cross-sectional supported APIs and the rejected panel surfaces were missing or stale:
- `LinearRegression.__init__` and `solve_ols()` docstrings now document `conley_coords`, `conley_cutoff_km`, `conley_metric`, and `conley_kernel` (the four newly added kwargs) plus the cluster/weights/survey rejection contract, mirroring the `compute_robust_vcov` docstring.
- `SyntheticDiD` class docstring gains a `Notes (Conley spatial-HAC rejection)` block stating that `vcov_type` and `conley_*` kwargs raise `TypeError` at `__init__` / `set_params`, with the bootstrap-variance rationale.
- `TwoWayFixedEffects` class docstring gains a paragraph on the Phase 1 panel rejection (parallel to the existing HC2/Bell-McCaffrey paragraph), including the sklearn-style API-symmetry rationale for why the constructor kwargs are inherited from `DifferenceInDifferences`.

P3: Two cleanups:
- Removed the redundant `MultiPeriodDiD(absorb=..., vcov_type="conley")` pre-guard. The unconditional Conley rejection immediately after it already covered the same path, and the special-cased message was misleading: it told users to "drop absorb=" even though dropping `absorb=` would NOT make Conley available on MultiPeriodDiD.
- Renamed `test_bartlett_psd_on_random_distances` to `test_bartlett_kernel_finite_and_in_unit_interval`. The original name encoded a stronger property than the methodology contract guarantees (the radial 1-D Bartlett is a practitioner specialization and is not PSD-guaranteed). The renamed test instead asserts that the kernel matrix is finite, symmetric, and bounded in [0, 1]; the indefiniteness warning, which applies to both kernels, is locked separately by `test_indefinite_meat_warning_fires_for_bartlett`.

Doc surfaces:
- `docs/methodology/papers/colella-et-al-2019-review.md`: updated the Phase 1 parity target paragraph to state that Phase 1 ships only cross-sectional Conley with R `conleyreg` parity; Stata `acreg` TWFE parity is a Phase 2 target.
- PR body summary rewritten to match the shipped contract: cross-sectional `LinearRegression` / `compute_robust_vcov` are supported; DiD / MultiPeriodDiD / TWFE reject Conley at fit-time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ebf2a6e commit 324a71d

6 files changed

Lines changed: 102 additions & 15 deletions


diff_diff/estimators.py

Lines changed: 4 additions & 10 deletions
@@ -1374,19 +1374,13 @@ def fit( # type: ignore[override]
 "switch to fixed_effects= dummies for a full-dummy design."
 )
 
-# Reject Conley combinations early at the estimator level (see
-# DifferenceInDifferences.fit for the matching block and rationale).
-if absorb and self.vcov_type == "conley":
-raise NotImplementedError(
-"MultiPeriodDiD(absorb=..., vcov_type='conley') is deferred "
-"to a follow-up. Use fixed_effects= dummies for an equivalent "
-"FE design with the full projection, or drop absorb= for "
-"cross-sectional Conley."
-)
 # MultiPeriodDiD is intrinsically a multi-period panel estimator;
 # cross-sectional Conley does not apply (same rationale as
 # DifferenceInDifferences.fit's panel guard above). Phase 2 will
-# add a documented space-time HAC.
+# add a documented space-time HAC. The rejection is unconditional
+# — `absorb` and other Conley-adjacent kwargs cannot make
+# MultiPeriodDiD Conley-compatible because the panel structure is
+# the load-bearing reason Phase 1 cannot apply Conley here.
 if self.vcov_type == "conley":
 raise NotImplementedError(
 "MultiPeriodDiD(vcov_type='conley') is deferred to Phase 2 "

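The point of this hunk is that the fit-time check no longer branches on `absorb`. A minimal sketch of that guard shape follows; `PanelConleyGuardSketch` is a hypothetical stand-in rather than the `MultiPeriodDiD` implementation, and the message text is illustrative only.

```python
class PanelConleyGuardSketch:
    """Hypothetical panel estimator illustrating an unconditional Conley guard."""

    def __init__(self, vcov_type="hc1", absorb=None):
        self.vcov_type = vcov_type
        self.absorb = absorb

    def fit(self):
        # No `absorb` branch: dropping absorb= would not make Conley valid on
        # a multi-period panel, so one unconditional check covers every path.
        if self.vcov_type == "conley":
            raise NotImplementedError(
                "vcov_type='conley' is deferred to Phase 2 for panel estimators"
            )
        return self


# Both configurations hit the same rejection.
for absorb in (None, ["unit"]):
    try:
        PanelConleyGuardSketch(vcov_type="conley", absorb=absorb).fit()
    except NotImplementedError as exc:
        print(f"absorb={absorb!r}: {exc}")
```

With a single unconditional check there is only one message to keep accurate, which is the motivation the commit gives for deleting the special-cased pre-guard.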
diff_diff/linalg.py

Lines changed: 55 additions & 0 deletions
@@ -520,6 +520,30 @@ def solve_ols(
 raises ``NotImplementedError`` because the BM DOF helper is
 inconsistent with ``solve_ols``'s WLS transform. Tracked in
 ``TODO.md``.
+- ``"conley"``: Conley (1999) spatial-HAC sandwich. Requires
+``conley_coords`` (n × 2 array) and ``conley_cutoff_km`` (positive
+bandwidth, no default per Conley 1999 Section 5's sensitivity-grid
+recommendation). Combining with ``cluster_ids`` or ``weights``
+raises ``NotImplementedError`` (combined product kernel + Bertanha-
+Imbens 2014 weighted-Conley deferred to Phase 2+). Cross-sectional
+one-way only.
+conley_coords : ndarray of shape (n, 2), optional
+Required when ``vcov_type="conley"``. Two-column array of
+``[lat, lon]`` (degrees, for ``conley_metric="haversine"``) or
+projected coordinates (for ``conley_metric="euclidean"`` / callable
+metric).
+conley_cutoff_km : float, optional
+Required when ``vcov_type="conley"``. Positive finite bandwidth in
+km (haversine) or coord units (euclidean / callable).
+conley_metric : {"haversine", "euclidean", callable}, default "haversine"
+Distance metric. Haversine uses Earth's mean radius 6371.01 km
+(matching R ``conleyreg``). Euclidean treats coords as already
+projected. Callable signature ``(coords1, coords2) -> n×n``.
+conley_kernel : {"bartlett", "uniform"}, default "bartlett"
+Kernel evaluated on pairwise distance ``d_ij/h``. Both kernels emit
+a ``UserWarning`` if the resulting meat is materially indefinite;
+the radial 1-D Bartlett (matching R ``conleyreg``) is not formally
+PSD-guaranteed — see :func:`compute_robust_vcov`.
 
 Returns
 -------
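For readers checking the `conley_metric` and `conley_kernel` descriptions above, here is a rough NumPy sketch of the haversine distance using the 6371.01 km radius and the two kernels evaluated on `u = d_ij / h`. The function names are hypothetical (they are not `diff_diff`'s private helpers), and treating `u == 1` as inside the uniform window is an assumed boundary convention.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.01  # mean radius quoted in the docstring (R conleyreg)


def haversine_km(coords1, coords2):
    """Pairwise great-circle distances (km) between [lat, lon] rows in degrees."""
    lat1 = np.radians(coords1[:, 0])[:, None]
    lon1 = np.radians(coords1[:, 1])[:, None]
    lat2 = np.radians(coords2[:, 0])[None, :]
    lon2 = np.radians(coords2[:, 1])[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clipping guards against tiny floating-point excursions above 1.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def bartlett_kernel(u):
    """Radial 1-D Bartlett weight max(0, 1 - u) on u = d_ij / h."""
    return np.clip(1.0 - np.asarray(u, dtype=float), 0.0, None)


def uniform_kernel(u):
    """Truncated indicator: 1 when u <= 1, else 0 (boundary convention assumed)."""
    return (np.asarray(u, dtype=float) <= 1.0).astype(float)


coords = np.array([[0.0, 0.0], [1.0, 0.0]])  # [lat, lon] in degrees
D = haversine_km(coords, coords)
print(round(D[0, 1], 2))  # one degree of latitude, roughly 111.2 km
```

Because both kernels take the scaled distance `d_ij / h`, the metric choice only changes how `D` is built; the kernel step is identical for haversine, Euclidean, or a callable metric.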
@@ -2363,6 +2387,37 @@ class LinearRegression:
 sandwich, the class stores per-coefficient BM Satterthwaite DOF
 (``self._bm_dof``) and threads it into ``get_inference``.
 
+For ``"conley"`` (Conley 1999 spatial-HAC) the supported Phase 1
+path is the cross-sectional `LinearRegression` / `compute_robust_vcov`
+surface; requires ``conley_coords`` (n × 2 array) and a positive
+``conley_cutoff_km``. Combining ``vcov_type="conley"`` with
+``cluster_ids``, ``weights``, or ``survey_design`` raises
+``NotImplementedError`` (combined product kernel + Bertanha-Imbens
+2014 weighted-Conley deferred to Phase 2+). The panel DiD /
+MultiPeriodDiD / TwoWayFixedEffects estimators reject
+``vcov_type="conley"`` at fit-time entirely in Phase 1.
+conley_coords : ndarray of shape (n, 2), optional
+Required when ``vcov_type="conley"``. Two-column array of
+``[lat, lon]`` (degrees, for ``conley_metric="haversine"``) or
+projected coordinates (for ``conley_metric="euclidean"`` / callable
+metric). Raises ``ValueError`` when missing under Conley.
+conley_cutoff_km : float, optional
+Required when ``vcov_type="conley"``. Positive finite bandwidth in
+km (haversine) or coord units (euclidean / callable). No default
+per Conley 1999 Section 5's sensitivity-grid recommendation.
+conley_metric : {"haversine", "euclidean", callable}, default "haversine"
+Distance metric. Haversine uses Earth's mean radius 6371.01 km
+matching R ``conleyreg``. Euclidean treats the coords as already
+projected. Callable signature ``(coords1, coords2) -> n×n``.
+conley_kernel : {"bartlett", "uniform"}, default "bartlett"
+Kernel evaluated on pairwise distance ``d_ij/h``. ``"bartlett"`` is
+the radial 1-D specialization (matching R ``conleyreg``);
+``"uniform"`` is the truncated indicator. Both kernels emit a
+``UserWarning`` if the resulting meat is materially indefinite —
+neither is formally PSD-guaranteed in the radial pairwise form
+(Conley 1999's explicit PSD Bartlett formula is the 2-D separable
+product window, Eq 3.14, not the 1-D radial pairwise form).
+
 Attributes
 ----------
 coefficients_ : ndarray
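The sandwich that the `conley_*` parameters feed into can be written as `V = (X'X)^-1 M (X'X)^-1` with meat `M = sum_ij K(d_ij/h) e_i e_j x_i x_j'`. The sketch below is a textbook version under stated assumptions: no small-sample scaling, a precomputed kernel matrix `K`, and a hypothetical relative tolerance for what counts as "materially indefinite". `conley_sandwich` is an illustrative name, not `compute_robust_vcov`.

```python
import warnings

import numpy as np


def conley_sandwich(X, resid, K, rel_tol=1e-10):
    """Sketch of V = (X'X)^-1 M (X'X)^-1 with M = sum_ij K_ij e_i e_j x_i x_j'.

    K is a precomputed n x n kernel matrix K(d_ij / h); no DOF scaling applied.
    """
    X = np.asarray(X, dtype=float)
    e = np.asarray(resid, dtype=float)
    K = np.asarray(K, dtype=float)
    scores = X * e[:, None]                 # row i is e_i * x_i
    meat = scores.T @ K @ scores            # sum_ij K_ij (e_i x_i)(e_j x_j)'
    meat = 0.5 * (meat + meat.T)            # symmetrize against round-off
    eig = np.linalg.eigvalsh(meat)
    # Hypothetical threshold for "materially indefinite", relative to scale.
    if eig.min() < -rel_tol * np.abs(eig).max():
        warnings.warn("Conley meat is materially indefinite", UserWarning)
    bread = np.linalg.inv(X.T @ X)
    return bread @ meat @ bread


# Usage on simulated cross-sectional data with a radial Bartlett kernel.
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
resid = rng.normal(size=n)
coords = rng.uniform(0, 1, size=(n, 2))
D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
K = np.clip(1.0 - D / 0.3, 0.0, None)      # radial Bartlett, h = 0.3
V = conley_sandwich(X, resid, K)
```

Checking the eigenvalues of the meat rather than of `K` matches the docstring's wording: what matters for the variance estimate is whether `M` itself goes indefinite, which depends on `X` and the residuals as well as the kernel.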

diff_diff/synthetic_did.py

Lines changed: 13 additions & 0 deletions
@@ -86,6 +86,19 @@ class SyntheticDiD(DifferenceInDifferences):
 Random seed for reproducibility. If None (default), results
 will vary between runs.
 
+Notes (Conley spatial-HAC rejection)
+------------------------------------
+SyntheticDiD does not support the Conley (1999) spatial-HAC analytical
+sandwich. Passing ``vcov_type="conley"`` or any non-``None`` Conley
+keyword (``conley_coords``, ``conley_cutoff_km``, ``conley_metric``,
+``conley_kernel``) to ``__init__`` or ``set_params`` raises
+``TypeError``. Rationale: SyntheticDiD's variance is derived from
+bootstrap / jackknife / placebo resampling (Arkhangelsky et al. 2021
+Algorithms 2–4), not the sandwich identity Conley plugs into. Adding
+Conley support would require either an analytical SDID sandwich path
+or a spatial-block bootstrap (Politis-Romano 1994 territory). Tracked
+as a follow-up in ``TODO.md``.
+
 Attributes
 ----------
 results_ : SyntheticDiDResults
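The rejection contract in these Notes (a `TypeError` raised both at construction and in `set_params`) can be sketched as one shared check called from both entry points, so a `get_params` / `set_params` round trip cannot re-enable Conley. Everything here (`reject_conley`, `ResamplingEstimatorSketch`) is hypothetical and only mirrors the documented behavior, not SyntheticDiD's actual code.

```python
CONLEY_KWARGS = ("conley_coords", "conley_cutoff_km", "conley_metric", "conley_kernel")


def reject_conley(params):
    """Hypothetical helper: raise TypeError if params request Conley."""
    if params.get("vcov_type") == "conley":
        raise TypeError("vcov_type='conley' is unsupported: variance is resampling-based")
    passed = [key for key in CONLEY_KWARGS if params.get(key) is not None]
    if passed:
        raise TypeError(f"Conley keywords are unsupported: {passed}")


class ResamplingEstimatorSketch:
    """Hypothetical estimator whose variance comes from resampling, not a sandwich."""

    def __init__(self, n_bootstrap=200, **conley_like):
        unknown = set(conley_like) - {"vcov_type", *CONLEY_KWARGS}
        if unknown:
            raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")
        reject_conley(conley_like)  # checked before any state is stored
        self.n_bootstrap = n_bootstrap

    def get_params(self):
        return {"n_bootstrap": self.n_bootstrap}

    def set_params(self, **params):
        reject_conley(params)  # same check, so set_params cannot re-enable Conley
        for key, value in params.items():
            setattr(self, key, value)
        return self
```

Passing a Conley keyword explicitly as `None` is accepted, which matches the "any non-`None` Conley keyword" wording above.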

diff_diff/twfe.py

Lines changed: 14 additions & 0 deletions
@@ -67,6 +67,20 @@ class TwoWayFixedEffects(DifferenceInDifferences):
 ``TODO.md`` under Methodology/Correctness; also documented in
 ``docs/methodology/REGISTRY.md``.
 
+**Conley spatial-HAC (``vcov_type="conley"``) is rejected at fit-time
+in Phase 1.** TwoWayFixedEffects is intrinsically a multi-period panel
+estimator and Phase 1's cross-sectional Conley does not handle the
+time dimension — applying it over (unit, time) rows would treat same-
+unit cross-time pairs as ``d_ij = 0 → K = 1``, mishandling the space-
+time HAC. The supported Phase 1 path for Conley is direct
+``compute_robust_vcov`` / ``LinearRegression`` on a single-period
+regression. The ``conley_*`` kwargs are inherited from
+``DifferenceInDifferences.__init__`` for sklearn-style API symmetry
+(``get_params`` / ``set_params`` round-trip), but
+``TwoWayFixedEffects(vcov_type="conley").fit(...)`` raises
+``NotImplementedError``. Phase 2 will add the space-time product kernel
+(Driscoll-Kraay) and lift the rejection.
+
 Warning: TWFE can be biased with staggered treatment timing
 and heterogeneous treatment effects. Consider using
 more robust estimators (e.g., Callaway-Sant'Anna) for
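The docstring's `d_ij = 0 → K = 1` argument is easy to see numerically. The toy sketch below uses two units observed over three periods, Euclidean coordinates, and bandwidth 1 (all choices made for illustration; the helper names are hypothetical). Stacking (unit, time) rows gives every same-unit pair distance zero, so the Bartlett weight is 1 no matter how far apart the periods are, and the kernel matrix collapses to a unit-level cluster indicator with no decay in time rather than a space-time HAC.

```python
import numpy as np


def pairwise_euclidean(coords):
    """n x n Euclidean distance matrix for an (n, 2) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff * diff).sum(axis=-1))


def bartlett(u):
    """Radial Bartlett weight max(0, 1 - u)."""
    return np.clip(1.0 - u, 0.0, None)


unit_coords = np.array([[0.0, 0.0], [5.0, 0.0]])  # two units, 5 apart
n_periods = 3
# Stack (unit, time) rows: unit 0 at t = 0, 1, 2, then unit 1 at t = 0, 1, 2.
# Each row inherits its unit's location, so time never enters the distance.
panel_coords = np.repeat(unit_coords, n_periods, axis=0)
unit_id = np.repeat(np.arange(len(unit_coords)), n_periods)

K = bartlett(pairwise_euclidean(panel_coords) / 1.0)  # bandwidth h = 1
# Same-unit entries are 1 even for t = 0 vs t = 2; cross-unit entries are 0.
print(K)
```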

docs/methodology/papers/colella-et-al-2019-review.md

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ This is a parity gap relative to acreg - implementers must consult acreg source.
 
 ### Relation to Existing diff-diff Estimators
 
-- **Phase 1 parity target:** `vcov_method="conley"` on TWFE must match acreg to <=1e-6 on at least 2-3 fixtures. The `coords=("lat","lon")` and `cutoff_km=<float>` parameters map directly onto acreg's lat/lon + cutoff inputs.
+- **Phase 1 parity target (UPDATED):** Phase 1 ships `vcov_type="conley"` on **cross-sectional** `compute_robust_vcov` / `LinearRegression` only, with parity verified against R `conleyreg` (Düsterhöft 2021) to 1e-6 on three benchmark fixtures. Panel estimators (`DifferenceInDifferences`, `MultiPeriodDiD`, `TwoWayFixedEffects`) reject `vcov_type="conley"` at fit-time because the radial 1-D pairwise Conley does not handle the time dimension — applying it over (unit, time) rows would treat same-unit cross-time pairs as `d_ij = 0 → K = 1`, mishandling the space-time HAC. **Stata `acreg` parity for TWFE / panel space-time Conley is a Phase 2 target**, alongside the Driscoll-Kraay product-kernel implementation. The `coords` and `cutoff_km` parameter mapping below is still accurate for the cross-sectional path.
 - **Reduces to HC0** when the cutoff is small enough that `S = I` (no neighbour pairs). The paper does not state this explicitly, but the meat formula collapses to `X' diag(e^2) X` in that case, which is HC0 (White 1980, equation referenced page 4).
 - **Reduces to one-way clustering** when `S = block-diagonal indicator(same cluster)` (see Section 2, p. 6: Cameron et al. 2011 "can be embedded in this framework"). For multiway clustering, the paper says (page 6): "Multiway clustering assumes a particular *regularity condition* in the clustering structure ... However, in many real-life settings, this particular clustering structure may not hold." The acreg estimator is more flexible and the reduction to multiway clustering is approximate (binary `S` with the union-of-clusters structure).
 - **Cluster + spatial joint mode:** The paper does NOT formally combine cluster-robust with spatial-HAC. However, since `S` is arbitrary, one can construct `S` as the elementwise OR of the cluster-indicator matrix and the spatial-cutoff matrix; this gives a joint estimator. acreg likely exposes both options - verify.
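Both reductions in the review note can be checked numerically with a generic sandwich `(X'X)^-1 X'(S ∘ e e')X (X'X)^-1`. This is a sketch on simulated data: `sandwich` is a hypothetical helper, and neither variance applies a small-sample correction, so the targets are plain HC0 and CR0.

```python
import numpy as np


def sandwich(X, resid, S):
    """(X'X)^-1 X'(S o e e')X (X'X)^-1 for an n x n weight matrix S (no DOF scaling)."""
    scores = X * resid[:, None]
    bread = np.linalg.inv(X.T @ X)
    return bread @ (scores.T @ S @ scores) @ bread


rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# (1) Cutoff below the smallest off-diagonal distance: only self-pairs survive,
# so S = I and the middle term is X' diag(e^2) X, i.e. HC0.
coords = rng.uniform(0, 100, size=(n, 2))
D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
h = 0.5 * D[~np.eye(n, dtype=bool)].min()
S_spatial = (D / h <= 1.0).astype(float)  # uniform kernel
V_tiny_cutoff = sandwich(X, resid, S_spatial)
V_hc0 = bread @ ((X.T * resid**2) @ X) @ bread

# (2) S = same-cluster indicator: the sandwich is one-way cluster-robust (CR0).
clusters = np.arange(n) % 5
S_cluster = (clusters[:, None] == clusters[None, :]).astype(float)
V_cluster = sandwich(X, resid, S_cluster)
cluster_scores = np.array([(X * resid[:, None])[clusters == g].sum(axis=0)
                           for g in np.unique(clusters)])
V_cr0 = bread @ (cluster_scores.T @ cluster_scores) @ bread
```

The same helper also covers the "joint mode" idea from the last bullet: an elementwise OR of `S_cluster` and a spatial indicator is just another binary `S`.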

tests/test_conley_vcov.py

Lines changed: 15 additions & 4 deletions
@@ -83,16 +83,27 @@ def test_uniform_kernel_above_one_zero(self):
 def test_uniform_kernel_at_zero_one(self):
 np.testing.assert_allclose(_uniform_kernel(np.array([0.0])), 1.0)
 
-def test_bartlett_psd_on_random_distances(self):
-"""Bartlett-weighted Gram matrix has all eigenvalues >= -tol."""
+def test_bartlett_kernel_finite_and_in_unit_interval(self):
+"""Bartlett-weighted kernel matrix on random pairwise distances is
+finite, symmetric, and bounded in [0, 1]. We do NOT assert PSD here:
+the radial 1-D Bartlett on pairwise distance is a practitioner
+specialization of Conley 1999 (matching R conleyreg) and is NOT
+formally PSD-guaranteed — see REGISTRY ConleySpatialHAC. The
+runtime path emits a UserWarning if the resulting Conley meat is
+materially indefinite; that contract is locked separately in
+``test_indefinite_meat_warning_fires_for_bartlett``.
+"""
 rng = np.random.default_rng(seed=11)
 n = 25
 coords = rng.uniform(0, 1, size=(n, 2))
 diff = coords[:, None, :] - coords[None, :, :]
 D = np.sqrt((diff * diff).sum(axis=-1))
 K = _bartlett_kernel(D / 0.3)
-eigvals = np.linalg.eigvalsh(0.5 * (K + K.T)) # ensure symmetric
-assert eigvals.min() > -1e-12
+assert K.shape == (n, n)
+assert np.all(np.isfinite(K))
+assert np.all(K >= 0.0)
+assert np.all(K <= 1.0)
+np.testing.assert_allclose(K, K.T, atol=1e-15)
 
 
 # ---------------------------------------------------------------------------
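The renamed test checks only finiteness, symmetry, and [0, 1] bounds, and those properties do not imply PSD. A three-point counterexample for the uniform kernel (points 0.6 apart on a line, bandwidth 1, chosen for illustration) passes every bounded check, yet its kernel matrix has eigenvalue 1 - √2. The Bartlett matrix for the same points happens to stay positive definite, so this sketch does not by itself exhibit a Bartlett failure; it only shows why boundedness is the weaker, safer assertion.

```python
import numpy as np

# Three points 0.6 apart on a line, bandwidth h = 1 (illustrative choice).
coords = np.array([[0.0, 0.0], [0.6, 0.0], [1.2, 0.0]])
diff = coords[:, None, :] - coords[None, :, :]
u = np.sqrt((diff * diff).sum(axis=-1)) / 1.0

K_uniform = (u <= 1.0).astype(float)      # neighbours within h get weight 1
K_bartlett = np.clip(1.0 - u, 0.0, None)  # radial Bartlett on the same points

for name, K in (("uniform", K_uniform), ("bartlett", K_bartlett)):
    bounded = np.all(np.isfinite(K)) and np.all((K >= 0.0) & (K <= 1.0))
    symmetric = np.allclose(K, K.T)
    min_eig = np.linalg.eigvalsh(K).min()
    print(f"{name}: bounded={bounded}, symmetric={symmetric}, min eigenvalue={min_eig:.3f}")
```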
