Commit 890f414

authored
Merge pull request #159 from igerber/two-stage-notebook
Add tutorial notebook for Two-Stage DiD (Gardner 2022)
2 parents 8aaed67 + 113354e commit 890f414

2 files changed

Lines changed: 252 additions & 0 deletions

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -360,6 +360,8 @@ See `docs/performance-plan.md` for full optimization details and `docs/benchmark
 - `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
 - `09_real_world_examples.ipynb` - Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws)
 - `10_trop.ipynb` - Triply Robust Panel (TROP) estimation with factor model adjustment
+- `11_imputation_did.ipynb` - Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison
+- `12_two_stage_did.ipynb` - Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects
 
 ### Benchmarks
 
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Two-Stage DiD (Gardner 2022)\n",
+    "\n",
+    "This tutorial demonstrates the `TwoStageDiD` estimator, which implements the two-stage difference-in-differences method from Gardner (2022), \"Two-stage differences in differences\", with inference from Butts & Gardner (2022), \"did2s: Two-Stage Difference-in-Differences\".\n",
+    "\n",
+    "**When to use TwoStageDiD:**\n",
+    "- Staggered adoption settings where you want **GMM sandwich variance** that accounts for first-stage estimation uncertainty\n",
+    "- When you want **per-observation treatment effects** (`treatment_effects` DataFrame) for granular analysis\n",
+    "- As a **robustness check** alongside ImputationDiD: the point estimates are identical, so comparing the two sets of standard errors shows whether your conclusions depend on the choice of variance estimator"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore')\n",
+    "\n",
+    "from diff_diff import (\n",
+    "    TwoStageDiD, ImputationDiD, CallawaySantAnna,\n",
+    "    generate_staggered_data, plot_event_study\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Usage\n",
+    "\n",
+    "The two-stage estimator follows a simple algorithm:\n",
+    "1. Estimate unit and time fixed effects using only **untreated observations** (never-treated units plus the not-yet-treated periods of eventually treated units)\n",
+    "2. Residualize **all** outcomes using those estimated FEs\n",
+    "3. Regress residualized outcomes on treatment indicators to obtain the ATT\n",
+    "\n",
+    "This avoids TWFE bias because the fixed-effects model is estimated only on clean (untreated) data, so treated outcomes cannot contaminate the estimated counterfactual."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate staggered adoption data with known treatment effect\n",
+    "data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)\n",
+    "\n",
+    "# Fit the two-stage estimator\n",
+    "est = TwoStageDiD()\n",
+    "results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')\n",
+    "results.print_summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Event Study\n",
+    "\n",
+    "Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit with event study aggregation\n",
+    "est = TwoStageDiD()\n",
+    "results_es = est.fit(data, outcome='outcome', unit='unit', time='period',\n",
+    "                     first_treat='first_treat', aggregate='event_study')\n",
+    "\n",
+    "# Plot event study\n",
+    "plot_event_study(results_es, title='Two-Stage DiD Event Study')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# View event study effects as a table\n",
+    "results_es.to_dataframe(level='event_study')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "## Per-Observation Treatment Effects\n\nBoth `TwoStageDiD` and `ImputationDiD` provide a `treatment_effects` DataFrame containing one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight, equal to `1/n_valid` for observations with finite `tau_hat` and `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Per-observation treatment effects (available from the basic fit)\n",
+    "te = results.treatment_effects\n",
+    "print(f\"Shape: {te.shape}\")\n",
+    "print(f\"Columns: {list(te.columns)}\")\n",
+    "print()\n",
+    "te.head(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "## Comparison with Other Estimators\n\nTwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n\nCallawaySantAnna uses a fundamentally different estimation approach: it computes group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods and then aggregates them, so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n\n*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit all three estimators on the same data\n",
+    "ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                       time='period', first_treat='first_treat')\n",
+    "imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                          time='period', first_treat='first_treat')\n",
+    "cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',\n",
+    "                            time='period', first_treat='first_treat')\n",
+    "\n",
+    "print(\"Estimator Comparison (True effect = 2.0)\")\n",
+    "print(\"=\" * 55)\n",
+    "print(f\"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}\")\n",
+    "print(\"-\" * 55)\n",
+    "\n",
+    "for name, r in [(\"TwoStageDiD\", ts), (\"ImputationDiD\", imp), (\"CallawaySantAnna\", cs)]:\n",
+    "    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]\n",
+    "    print(f\"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Group Aggregation\n",
+    "\n",
+    "Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit with group aggregation\n",
+    "results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                                time='period', first_treat='first_treat',\n",
+    "                                aggregate='group')\n",
+    "results_grp.to_dataframe(level='group')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced Features\n",
+    "\n",
+    "### Anticipation\n",
+    "\n",
+    "If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), use the `anticipation` parameter to shift the treatment onset back."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compare ATT with and without anticipation\n",
+    "est_antic = TwoStageDiD(anticipation=1)\n",
+    "results_antic = est_antic.fit(data, outcome='outcome', unit='unit',\n",
+    "                              time='period', first_treat='first_treat')\n",
+    "print(f\"ATT (no anticipation): {results.overall_att:.3f}\")\n",
+    "print(f\"ATT (1-period anticipation): {results_antic.overall_att:.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### GMM Sandwich vs Conservative Variance\n",
+    "\n",
+    "The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:\n",
+    "\n",
+    "- **ImputationDiD's conservative variance** (Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary\n",
+    "- **TwoStageDiD's GMM sandwich** accounts for first-stage estimation uncertainty via an influence function correction term\n",
+    "- In practice the two usually agree closely; a large divergence may signal specification problems worth investigating\n",
+    "- Bootstrap inference is also available by setting `n_bootstrap` (e.g., `n_bootstrap=199`)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Horizon-by-horizon SE comparison\n",
+    "ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                          time='period', first_treat='first_treat',\n",
+    "                          aggregate='event_study')\n",
+    "imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                             time='period', first_treat='first_treat',\n",
+    "                             aggregate='event_study')\n",
+    "\n",
+    "print(\"Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance\")\n",
+    "print(\"=\" * 70)\n",
+    "print(f\"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}\")\n",
+    "print(\"-\" * 70)\n",
+    "\n",
+    "for h in sorted(ts_es.event_study_effects.keys()):\n",
+    "    ts_eff = ts_es.event_study_effects[h]\n",
+    "    imp_eff = imp_es.event_study_effects[h]\n",
+    "    if ts_eff.get('n_obs', 0) == 0:\n",
+    "        print(f\"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}\")\n",
+    "        continue\n",
+    "    effect = ts_eff['effect']\n",
+    "    gmm_se = ts_eff['se']\n",
+    "    cons_se = imp_eff['se']\n",
+    "    ratio = gmm_se / cons_se if cons_se > 0 else np.nan\n",
+    "    print(f\"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | Yes (`treatment_effects`) | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *The R Journal*, 14(3), 162-173."
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
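
The three-step algorithm described in the notebook's Basic Usage cell can be sketched without the library. The following is a minimal illustration in plain NumPy and pandas, not the `diff_diff` implementation: the helper names `make_panel` and `two_stage_att`, the simulated data, and the dummy-variable OLS used for the first stage are all assumptions made for this sketch. It produces only a point estimate and does not attempt the GMM sandwich variance.

```python
import numpy as np
import pandas as pd


def make_panel(n_units=60, n_periods=8, effect=2.0, seed=0):
    """Simulate a staggered-adoption panel (hypothetical helper for this sketch)."""
    rng = np.random.default_rng(seed)
    unit_fe = rng.normal(size=n_units)
    time_fe = rng.normal(size=n_periods)
    # Three cohorts: never treated (0), first treated in period 3, or in period 5.
    first_treat = np.array([0, 3, 5])[np.arange(n_units) % 3]

    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    cohort = first_treat[unit]
    treated = ((cohort > 0) & (period >= cohort)).astype(int)
    noise = rng.normal(scale=0.1, size=unit.size)
    outcome = unit_fe[unit] + time_fe[period] + effect * treated + noise

    return pd.DataFrame({
        "unit": unit, "period": period, "first_treat": cohort,
        "treated": treated, "outcome": outcome,
    })


def two_stage_att(df):
    """Two-stage point estimate of the ATT (hypothetical helper for this sketch)."""
    # Design matrix of unit dummies plus time dummies (one period dropped
    # so the two sets of dummies are not collinear).
    unit_dummies = pd.get_dummies(df["unit"], prefix="unit")
    time_dummies = pd.get_dummies(df["period"], prefix="period", drop_first=True)
    X = pd.concat([unit_dummies, time_dummies], axis=1).to_numpy(dtype=float)
    y = df["outcome"].to_numpy(dtype=float)
    d = df["treated"].to_numpy(dtype=float)

    # Stage 1: estimate the fixed effects on untreated observations only.
    untreated = d == 0
    coef, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

    # Residualize ALL outcomes with the stage-1 fixed effects.
    resid = y - X @ coef

    # Stage 2: regress residuals on the treatment indicator (no intercept).
    return float(d @ resid / d.sum())


panel = make_panel()
print(f"Two-stage ATT estimate: {two_stage_att(panel):.3f} (simulated effect = 2.0)")
```

Because the second-stage regression here has a single treatment indicator and no intercept, its coefficient equals the mean residualized outcome over treated observations, which is why the sketch uses a ratio of dot products instead of a regression routine.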
