Skip to content

Commit fc48d69

Browse files
igerberclaude
andcommitted
SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)
New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated (`tau_total`), per-ring spillover effects on near-control units (`delta_j`). Reference: Butts, K. (2023, originally 2021) "Difference-in-Differences with Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage differences in differences" arXiv:2207.05943. Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. Methodology ----------- - Stage-2 regressor: time-varying `(1 - D_it) * Ring_{it,j}` (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is rank-deficient under TWFE; only the time-varying form supports the paper's identification (Prop 2.3). - Stage-1 subsample: Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) — not TwoStageDiD's `{D_it = 0}` — prevents spillover-contaminated near-controls from biasing the time FE. - Gardner identity (non-staggered): empirically bit-identical to direct single-stage TWFE ring regression on the full sample at atol=1e-10 (20-seed deterministic regression test). The reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. API --- SpilloverDiD( rings=[0, 50, 100, 200], conley_coords=("lat", "lon"), vcov_type="conley", # or "hc1" / cluster conley_cutoff_km=200.0, conley_lag_cutoff=0, ).fit(data, outcome="y", unit="unit", time="t", treatment="D") Binary `D` auto-converts to a Gardner `first_treat` column; users with canonical staggered data can pass `first_treat=` directly. Result is `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring DataFrame with coef/se/t_stat/p_value/CI), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries keyed to vcov columns plus an `"ATT"` alias. Identification-check policy --------------------------- - Period level (structural): every period must have at least one Omega_0 row, else time FE for that period is unidentified — hard ValueError. - Unit level (recoverable): units lacking Omega_0 rows (e.g. baseline- treated units with `D_it = 1` at all observed `t`) are warned-and- dropped; their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2. Mirrors TwoStageDiD's always-treated convention. Variance (Wave B MVP) --------------------- Stage-2 OLS variance via `solve_ols` — HC1, Conley spatial-HAC, and cluster-robust paths all flow through. The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_ variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). Reported SEs are conservative relative to the full GMM + Conley sandwich. Deferred features (planned follow-ups) -------------------------------------- - `event_study=True` per-event-time × ring coefficients (Butts Table 2) - `survey_design=` integration - `ring_method="count"` (count-of-treated-in-ring) - Data-driven `d_bar` selection (Butts 2021b / 2023 JUE Insight) - Gardner GMM first-stage correction at stage 2 - Sparse staggered ring-distance path - TwoStageDiD / SpilloverDiD shared-internals refactor Tests ----- 139 tests at `tests/test_spillover.py` across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC (50-seed default + 200-seed `@pytest.mark.slow` variant), Conley wiring, Gardner identity bit-identity (20-seed deterministic regression test against direct single-stage TWFE ring regression), coefficients-vs-vcov column alignment, and Omega_0 warn- and-drop. DGP factories at `tests/_dgp_utils.py::generate_butts_ nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. Documentation ------------- - `docs/methodology/REGISTRY.md` — new SpilloverDiD section adjacent to ConleySpatialHAC with the methodology spec, edge-case table, and documented deviations. - `docs/api/spillover.rst` — API reference with Wave B MVP limitations. - `diff_diff/guides/llms.txt` + `llms-full.txt` — agent-facing catalog entries. - `README.md` — one-line catalog entry under `## Estimators`. - `docs/references.rst` — Butts (2021/2023) + Gardner (2022) citations. - `docs/doc-deps.yaml` — `diff_diff/spillover.py` → `[REGISTRY.md#spillover, docs/api/spillover.rst]`. - `TODO.md` — deferred-features rows under "Tech Debt from Code Reviews" for the planned follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f2dbce5 commit fc48d69

18 files changed

Lines changed: 5157 additions & 0 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
### Added
11+
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (139 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC at 50 seeds + 200-seed `@pytest.mark.slow` variant, Conley wiring, Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
12+
1013
## [3.3.3] - 2026-05-15
1114

1215
### Added

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
106106
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
107107
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
108108
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
109+
- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing
109110
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
110111
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
111112
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves

TODO.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,13 @@ Deferred items from PR reviews that were not addressed before merge.
118118
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
119119
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
120120
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
121+
| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
122+
| `SpilloverDiD(event_study=True)` per-event-time × ring decomposition (Butts Section 5 / Table 2 `S^k_{it}` / `Ring^k_{it,j}`). Currently raises `NotImplementedError`. The implementation adds event-time dummies × ring covariates to the stage-2 design and emits a MultiIndex on `spillover_effects`. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Medium |
123+
| `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
124+
| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
125+
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
126+
| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
127+
| Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
121128

122129
#### Performance
123130

diff_diff/__init__.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -171,6 +171,10 @@
171171
TwoStageDiDResults,
172172
two_stage_did,
173173
)
174+
from diff_diff.spillover import (
175+
SpilloverDiD,
176+
)
177+
from diff_diff.results import SpilloverDiDResults # re-export
174178
from diff_diff.stacked_did import (
175179
StackedDiD,
176180
StackedDiDResults,
@@ -300,6 +304,7 @@
300304
"SunAbraham",
301305
"ImputationDiD",
302306
"TwoStageDiD",
307+
"SpilloverDiD",
303308
"TripleDifference",
304309
"TROP",
305310
"StackedDiD",
@@ -341,6 +346,7 @@
341346
"TwoStageDiDResults",
342347
"TwoStageBootstrapResults",
343348
"two_stage_did",
349+
"SpilloverDiDResults",
344350
"TripleDifferenceResults",
345351
"triple_difference",
346352
"StaggeredTripleDifference",

diff_diff/guides/llms-full.txt

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -461,6 +461,68 @@ results = est.fit(data, outcome='outcome', unit='unit',
461461
results.print_summary()
462462
```
463463

464+
### SpilloverDiD
465+
466+
Butts (2021) ring-indicator spillover-aware DiD. Augments two-stage Gardner with ring-indicator covariates that identify direct effect on treated (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and staggered timing in one estimator. Recommends `vcov_type="conley"` with cutoff = `d_bar` (paper Section 3.1).
467+
468+
```python
469+
SpilloverDiD(
470+
rings: list[float], # K+1 sorted breakpoints; K rings
471+
d_bar: float | None = None, # Far-away cutoff (defaults to max(rings))
472+
vcov_type: str = "hc1", # "hc1", "conley", or default cluster
473+
conley_coords: tuple[str, str] | None = None, # (lat_col, lon_col), required
474+
conley_metric: str = "haversine", # or "euclidean" / callable
475+
conley_cutoff_km: float | None = None,
476+
conley_lag_cutoff: int | None = None,
477+
cluster: str | None = None,
478+
alpha: float = 0.05,
479+
anticipation: int = 0,
480+
event_study: bool = False, # Deferred: raises NotImplementedError if True
481+
horizon_max: int | None = None, # Deferred (event-study mode)
482+
rank_deficient_action: str = "warn",
483+
)
484+
```
485+
486+
**fit() parameters:**
487+
488+
```python
489+
sp.fit(
490+
data: pd.DataFrame,
491+
outcome: str,
492+
unit: str,
493+
time: str,
494+
treatment: str | None = None, # binary D_it; auto-converted to first_treat
495+
first_treat: str | None = None, # OR onset time per unit (Gardner)
496+
covariates: list[str] | None = None,
497+
survey_design: object = None, # Deferred: NotImplementedError if non-None
498+
) -> SpilloverDiDResults
499+
```
500+
501+
**Restrictions (Wave B MVP — planned follow-ups):**
502+
503+
- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
504+
- `event_study=True` raises `NotImplementedError` (planned: per-event-time × ring decomposition per Butts Table 2)
505+
- `horizon_max=` raises `NotImplementedError` (used only with event_study)
506+
- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
507+
- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
508+
509+
**Usage:**
510+
511+
```python
512+
from diff_diff import SpilloverDiD
513+
514+
est = SpilloverDiD(
515+
rings=[0, 50, 100, 200],
516+
conley_coords=("lat", "lon"),
517+
vcov_type="conley",
518+
conley_cutoff_km=200.0,
519+
conley_lag_cutoff=0,
520+
)
521+
results = est.fit(data, outcome="y", unit="unit", time="time", treatment="D")
522+
print(f"tau_total = {results.att:.4f}")
523+
print(results.spillover_effects) # per-ring DataFrame
524+
```
525+
464526
### SyntheticDiD
465527

466528
Synthetic Difference-in-Differences (Arkhangelsky et al. 2021). Combines DiD with synthetic control by re-weighting control units.

diff_diff/guides/llms.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
5858
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
5959
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
6060
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
61+
- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing
6162
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
6263
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
6364
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves

0 commit comments

Comments
 (0)