Skip to content

Commit 2b86d8c

Browse files
igerberclaude
andcommitted
SpilloverDiD: replace Conley smoke test with plumbing-verification
Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test- coverage gap: the name promises a Conley-specific assertion but the body only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g., if SpilloverDiD ever stopped threading the Conley kwargs through to `solve_ols`) would have passed. Replace with: - `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the import site, captures the kwargs of the stage-2 invocation, and asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`, `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus fit-time-derived `conley_coords` / `conley_time` / `conley_unit` arrays of the right shape. Any silent fallback to HC1 fails this. - `test_conley_att_invariant_vs_hc1`: extracted from the old test — vcov choice does not change ATT (residualization + OLS fit are independent of variance). Now stands as a clean invariant rather than pretending to verify Conley-specific output. Also bumps CHANGELOG test count 156 -> 157 and updates the Conley description to "plumbing (verifies solve_ols is called with vcov_type= 'conley' + Conley kwargs, no silent HC1 fallback)". All 157 tests pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1a71eaf commit 2b86d8c

2 files changed

Lines changed: 67 additions & 15 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11-
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (156 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
11+
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
1212
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
1313

1414
### Changed

tests/test_spillover.py

Lines changed: 66 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -866,32 +866,84 @@ def test_conley_fit_runs(self):
866866
assert result.conley_lag_cutoff == 0
867867
assert np.isfinite(result.se)
868868

869-
def test_conley_se_differs_from_hc1(self):
870-
"""Conley SE differs from HC1 baseline (spatial correlation in errors)."""
869+
def test_conley_kwargs_threaded_to_solve_ols(self):
870+
"""Round-8 CI review P3 (test coverage gap): the previous test was a
871+
smoke test that only asserted finite SE + ATT invariance — a silent
872+
fallback to HC1 would have passed. This test plumbing-verifies that
873+
`solve_ols` is actually called with `vcov_type="conley"` AND the
874+
Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`,
875+
`conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`).
876+
"""
877+
from unittest.mock import patch
878+
871879
df = generate_butts_nonstaggered_dgp(
872880
seed=42, n_treated=20, n_near_control=80, n_far_control=100
873881
)
874-
est_hc1 = SpilloverDiD(
882+
est = SpilloverDiD(
875883
rings=[0.0, 100.0],
876884
conley_coords=("lat", "lon"),
877-
vcov_type="hc1",
885+
vcov_type="conley",
886+
conley_cutoff_km=200.0,
887+
conley_metric="haversine",
888+
conley_lag_cutoff=0,
878889
)
879-
result_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
880-
est_conley = SpilloverDiD(
890+
# Patch solve_ols at the import site in spillover.py so we can
891+
# observe the kwargs SpilloverDiD passes through at stage 2.
892+
import diff_diff.spillover as spillover_mod
893+
894+
captured: dict = {}
895+
896+
original_solve_ols = spillover_mod.solve_ols
897+
898+
def spy_solve_ols(*args, **kwargs):
899+
# Capture the LAST call's kwargs (stage 2 is the last solve_ols
900+
# invocation in fit()).
901+
captured.clear()
902+
captured.update(kwargs)
903+
return original_solve_ols(*args, **kwargs)
904+
905+
with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols):
906+
result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
907+
908+
# Conley kwargs reached solve_ols (no silent HC1 fallback).
909+
assert (
910+
captured.get("vcov_type") == "conley"
911+
), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}"
912+
assert captured.get("conley_cutoff_km") == 200.0
913+
assert captured.get("conley_metric") == "haversine"
914+
assert captured.get("conley_lag_cutoff") == 0
915+
# The fit-time-derived spatial / temporal arrays must be present and
916+
# have the right shape.
917+
coords = captured.get("conley_coords")
918+
assert coords is not None and coords.shape == (result.n_obs, 2)
919+
conley_time = captured.get("conley_time")
920+
conley_unit = captured.get("conley_unit")
921+
assert conley_time is not None and len(conley_time) == result.n_obs
922+
assert conley_unit is not None and len(conley_unit) == result.n_obs
923+
# And the reported SE is finite (the actual Conley computation
924+
# completed end-to-end).
925+
assert np.isfinite(result.se)
926+
927+
def test_conley_att_invariant_vs_hc1(self):
928+
"""Point-estimate invariance: vcov choice does not change ATT
929+
(the residualization + OLS fit are independent of variance).
930+
"""
931+
df = generate_butts_nonstaggered_dgp(
932+
seed=42, n_treated=20, n_near_control=80, n_far_control=100
933+
)
934+
result_hc1 = SpilloverDiD(
935+
rings=[0.0, 100.0],
936+
conley_coords=("lat", "lon"),
937+
vcov_type="hc1",
938+
).fit(df, outcome="y", unit="unit", time="time", treatment="D")
939+
result_conley = SpilloverDiD(
881940
rings=[0.0, 100.0],
882941
conley_coords=("lat", "lon"),
883942
vcov_type="conley",
884943
conley_cutoff_km=200.0,
885944
conley_lag_cutoff=0,
886-
)
887-
result_conley = est_conley.fit(df, outcome="y", unit="unit", time="time", treatment="D")
888-
# ATT point estimate unchanged across vcov; SE may differ.
945+
).fit(df, outcome="y", unit="unit", time="time", treatment="D")
889946
assert abs(result_hc1.att - result_conley.att) < 1e-10
890-
# Conley SE may be larger or smaller than HC1 depending on spatial
891-
# error correlation; just assert it's not identical.
892-
# (Synthetic DGP has independent errors so they may be very close;
893-
# use a loose tolerance — primarily a wiring test.)
894-
assert np.isfinite(result_conley.se)
895947

896948

897949
# =============================================================================

0 commit comments

Comments
 (0)