Commit 06ada33

igerber and claude committed
Add tutorial notebook for Two-Stage DiD (Gardner 2022)
New tutorial 12 covering TwoStageDiD estimator: basic usage, event study, per-observation treatment effects, three-estimator comparison (TwoStageDiD vs ImputationDiD vs CallawaySantAnna), group aggregation, anticipation, and GMM vs conservative variance. Also adds tutorials 11 and 12 to CLAUDE.md listing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8aaed67 commit 06ada33

2 files changed

Lines changed: 285 additions & 0 deletions

File tree

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -360,6 +360,8 @@ See `docs/performance-plan.md` for full optimization details and `docs/benchmark
360360
- `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
361361
- `09_real_world_examples.ipynb` - Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws)
362362
- `10_trop.ipynb` - Triply Robust Panel (TROP) estimation with factor model adjustment
363+
- `11_imputation_did.ipynb` - Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison
364+
- `12_two_stage_did.ipynb` - Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects
363365

364366
### Benchmarks
365367

Lines changed: 283 additions & 0 deletions
@@ -0,0 +1,283 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Two-Stage DiD (Gardner 2022)\n",
8+
"\n",
9+
"This tutorial demonstrates the `TwoStageDiD` estimator, which implements the two-stage difference-in-differences method from Gardner (2022), \"Two-stage differences in differences\", with inference from Butts & Gardner (2022), \"did2s: Two-Stage Difference-in-Differences\".\n",
10+
"\n",
11+
"**When to use TwoStageDiD:**\n",
12+
"- Staggered adoption settings where you want **GMM sandwich variance** that accounts for first-stage estimation uncertainty\n",
13+
"- When you want **per-observation treatment effects** (`treatment_effects` DataFrame) for granular analysis\n",
14+
"- As a **robustness check** alongside ImputationDiD: the two produce identical point estimates but different standard errors, so agreement in inference indicates that conclusions do not hinge on the choice of variance estimator"
15+
]
16+
},
17+
{
18+
"cell_type": "code",
19+
"execution_count": null,
20+
"metadata": {},
21+
"outputs": [],
22+
"source": [
23+
"import numpy as np\n",
24+
"import warnings\n",
25+
"warnings.filterwarnings('ignore')\n",
26+
"\n",
27+
"from diff_diff import (\n",
28+
"    TwoStageDiD, ImputationDiD, CallawaySantAnna,\n",
29+
"    generate_staggered_data, plot_event_study\n",
30+
")"
31+
]
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"metadata": {},
36+
"source": [
37+
"## Basic Usage\n",
38+
"\n",
39+
"The two-stage estimator follows a simple algorithm:\n",
40+
"1. Estimate unit and time fixed effects using only **untreated observations** (never-treated + not-yet-treated periods)\n",
41+
"2. Residualize **all** outcomes using those estimated FEs\n",
42+
"3. Regress residualized outcomes on treatment indicators to obtain the ATT\n",
43+
"\n",
44+
"This avoids the bias of conventional TWFE under staggered adoption with heterogeneous effects: because the fixed effects are estimated only on clean (untreated) observations, treated outcomes cannot contaminate the counterfactual."
45+
]
46+
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": null,
50+
"metadata": {},
51+
"outputs": [],
52+
"source": [
53+
"# Generate staggered adoption data with known treatment effect\n",
54+
"data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)\n",
55+
"\n",
56+
"# Fit the two-stage estimator\n",
57+
"est = TwoStageDiD()\n",
58+
"results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')\n",
59+
"results.print_summary()"
60+
]
61+
},
62+
{
63+
"cell_type": "markdown",
64+
"metadata": {},
65+
"source": [
66+
"## Event Study\n",
67+
"\n",
68+
"Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment."
69+
]
70+
},
71+
{
72+
"cell_type": "code",
73+
"execution_count": null,
74+
"metadata": {},
75+
"outputs": [],
76+
"source": [
77+
"# Fit with event study aggregation\n",
78+
"est = TwoStageDiD()\n",
79+
"results_es = est.fit(data, outcome='outcome', unit='unit', time='period',\n",
80+
"                     first_treat='first_treat', aggregate='event_study')\n",
81+
"\n",
82+
"# Plot event study\n",
83+
"plot_event_study(results_es, title='Two-Stage DiD Event Study')"
84+
]
85+
},
86+
{
87+
"cell_type": "code",
88+
"execution_count": null,
89+
"metadata": {},
90+
"outputs": [],
91+
"source": [
92+
"# View event study effects as a table\n",
93+
"results_es.to_dataframe(level='event_study')"
94+
]
95+
},
96+
{
97+
"cell_type": "markdown",
98+
"metadata": {},
99+
"source": [
100+
"## Per-Observation Treatment Effects\n",
101+
"\n",
102+
"A feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n",
103+
"- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n",
104+
"- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n",
105+
"- `rel_time`: relative time since treatment\n",
106+
"- `weight`: aggregation weight (1/n_treated)\n",
107+
"\n",
108+
"This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
109+
]
110+
},
111+
{
112+
"cell_type": "code",
113+
"execution_count": null,
114+
"metadata": {},
115+
"outputs": [],
116+
"source": [
117+
"# Per-observation treatment effects (available from the basic fit)\n",
118+
"te = results.treatment_effects\n",
119+
"print(f\"Shape: {te.shape}\")\n",
120+
"print(f\"Columns: {list(te.columns)}\")\n",
121+
"print()\n",
122+
"te.head(10)"
123+
]
124+
},
125+
{
126+
"cell_type": "markdown",
127+
"metadata": {},
128+
"source": [
129+
"## Comparison with Other Estimators\n",
130+
"\n",
131+
"TwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n",
132+
"\n",
133+
"CallawaySantAnna uses a fundamentally different estimation approach: it computes group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods and then aggregates them, so point estimates may differ, especially under heterogeneous effects. Its standard errors are derived analytically from the influence function.\n",
134+
"\n",
135+
"*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
136+
]
137+
},
138+
{
139+
"cell_type": "code",
140+
"execution_count": null,
141+
"metadata": {},
142+
"outputs": [],
143+
"source": [
144+
"# Fit all three estimators on the same data\n",
145+
"ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
146+
"                       time='period', first_treat='first_treat')\n",
147+
"imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
148+
"                          time='period', first_treat='first_treat')\n",
149+
"cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',\n",
150+
"                            time='period', first_treat='first_treat')\n",
151+
"\n",
152+
"print(\"Estimator Comparison (True effect = 2.0)\")\n",
153+
"print(\"=\" * 55)\n",
154+
"print(f\"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}\")\n",
155+
"print(\"-\" * 55)\n",
156+
"\n",
157+
"for name, r in [(\"TwoStageDiD\", ts), (\"ImputationDiD\", imp), (\"CallawaySantAnna\", cs)]:\n",
158+
"    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]\n",
159+
"    print(f\"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}\")"
160+
]
161+
},
162+
{
163+
"cell_type": "markdown",
164+
"metadata": {},
165+
"source": [
166+
"## Group Aggregation\n",
167+
"\n",
168+
"Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period)."
169+
]
170+
},
171+
{
172+
"cell_type": "code",
173+
"execution_count": null,
174+
"metadata": {},
175+
"outputs": [],
176+
"source": [
177+
"# Fit with group aggregation\n",
178+
"results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
179+
"                                time='period', first_treat='first_treat',\n",
180+
"                                aggregate='group')\n",
181+
"results_grp.to_dataframe(level='group')"
182+
]
183+
},
184+
{
185+
"cell_type": "markdown",
186+
"metadata": {},
187+
"source": [
188+
"## Advanced Features\n",
189+
"\n",
190+
"### Anticipation\n",
191+
"\n",
192+
"If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), use the `anticipation` parameter to shift the treatment onset back."
193+
]
194+
},
195+
{
196+
"cell_type": "code",
197+
"execution_count": null,
198+
"metadata": {},
199+
"outputs": [],
200+
"source": [
201+
"# Compare ATT with and without anticipation\n",
202+
"est_antic = TwoStageDiD(anticipation=1)\n",
203+
"results_antic = est_antic.fit(data, outcome='outcome', unit='unit',\n",
204+
"                              time='period', first_treat='first_treat')\n",
205+
"print(f\"ATT (no anticipation): {results.overall_att:.3f}\")\n",
206+
"print(f\"ATT (1-period anticipation): {results_antic.overall_att:.3f}\")"
207+
]
208+
},
209+
{
210+
"cell_type": "markdown",
211+
"metadata": {},
212+
"source": [
213+
"### GMM Sandwich vs Conservative Variance\n",
214+
"\n",
215+
"The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:\n",
216+
"\n",
217+
"- **ImputationDiD's conservative variance** (Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary\n",
218+
"- **TwoStageDiD's GMM sandwich** accounts for first-stage estimation uncertainty via an influence function correction term\n",
219+
"- In practice they usually agree closely; large divergence signals potential specification concerns\n",
220+
"- Bootstrap inference is also available via `n_bootstrap=199`"
221+
]
222+
},
223+
{
224+
"cell_type": "code",
225+
"execution_count": null,
226+
"metadata": {},
227+
"outputs": [],
228+
"source": [
229+
"# Horizon-by-horizon SE comparison\n",
230+
"ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
231+
"                          time='period', first_treat='first_treat',\n",
232+
"                          aggregate='event_study')\n",
233+
"imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
234+
"                             time='period', first_treat='first_treat',\n",
235+
"                             aggregate='event_study')\n",
236+
"\n",
237+
"print(\"Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance\")\n",
238+
"print(\"=\" * 70)\n",
239+
"print(f\"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}\")\n",
240+
"print(\"-\" * 70)\n",
241+
"\n",
242+
"for h in sorted(ts_es.event_study_effects.keys()):\n",
243+
"    ts_eff = ts_es.event_study_effects[h]\n",
244+
"    imp_eff = imp_es.event_study_effects[h]\n",
245+
"    if ts_eff.get('n_obs', 0) == 0:\n",
246+
"        print(f\"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}\")\n",
247+
"        continue\n",
248+
"    effect = ts_eff['effect']\n",
249+
"    gmm_se = ts_eff['se']\n",
250+
"    cons_se = imp_eff['se']\n",
251+
"    ratio = gmm_se / cons_se if cons_se > 0 else np.nan\n",
252+
"    print(f\"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}\")"
253+
]
254+
},
255+
{
256+
"cell_type": "markdown",
257+
"metadata": {},
258+
"source": [
259+
"## Summary\n",
260+
"\n",
261+
"| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n",
262+
"|---------|-------------|---------------|------------------|\n",
263+
"| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n",
264+
"| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n",
265+
"| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical (influence function) |\n",
266+
"| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n",
267+
"| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n",
268+
"| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n",
269+
"\n",
270+
"**References:**\n",
271+
"- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n",
272+
"- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(3), 162-173."
273+
]
274+
}
275+
],
276+
"metadata": {
277+
"language_info": {
278+
"name": "python"
279+
}
280+
},
281+
"nbformat": 4,
282+
"nbformat_minor": 4
283+
}
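
The three-step algorithm in the notebook's "Basic Usage" cell (estimate fixed effects on untreated observations, residualize all outcomes, regress residuals on treatment) can be sketched without the library. The block below is a minimal illustration in plain NumPy and pandas, not the `diff_diff` implementation. `simulate_panel` and `two_stage_att` are hypothetical helpers, the simulated data mark never-treated units with `first_treat == 0` (an assumption of this sketch, not a statement about the library's convention), and only point estimates are produced. The second-stage regression here ignores first-stage estimation uncertainty, which is the gap the GMM sandwich variance is meant to close.

```python
import numpy as np
import pandas as pd


def simulate_panel(n_units=60, n_periods=8, effect=2.0, noise=0.1, seed=0):
    """Balanced panel with additive unit/time effects and staggered adoption.

    Assumption of this sketch: first_treat == 0 marks never-treated units.
    """
    rng = np.random.default_rng(seed)
    cohorts = np.array([0, 3, 5])                  # never, period 3, period 5
    first_treat = cohorts[np.arange(n_units) % 3]  # deterministic cohort mix
    df = pd.DataFrame({
        "unit": np.repeat(np.arange(n_units), n_periods),
        "period": np.tile(np.arange(n_periods), n_units),
    })
    df["first_treat"] = first_treat[df["unit"].to_numpy()]
    df["treated"] = (
        (df["first_treat"] > 0) & (df["period"] >= df["first_treat"])
    ).astype(int)
    unit_fe = rng.normal(size=n_units)
    time_fe = rng.normal(size=n_periods)
    df["outcome"] = (
        unit_fe[df["unit"].to_numpy()]
        + time_fe[df["period"].to_numpy()]
        + effect * df["treated"]
        + rng.normal(scale=noise, size=len(df))
    )
    return df


def two_stage_att(df):
    """Hypothetical helper: two-stage point estimate of the overall ATT."""
    # Design matrix: one dummy per unit, plus period dummies (first dropped)
    X = pd.concat(
        [
            pd.get_dummies(df["unit"], prefix="u"),
            pd.get_dummies(df["period"], prefix="t", drop_first=True),
        ],
        axis=1,
    ).to_numpy(dtype=float)
    y = df["outcome"].to_numpy(dtype=float)
    d = df["treated"].to_numpy(dtype=float)
    untreated = d == 0

    # Stage 1: fit unit and time fixed effects on untreated observations only
    beta, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

    # Residualize ALL outcomes with the stage-1 fixed effects
    resid = y - X @ beta

    # Stage 2: no-intercept OLS of residuals on the treatment indicator,
    # which reduces to the mean residual among treated observations
    att = (d @ resid) / (d @ d)

    # Per-observation effects for treated rows
    tau = df.loc[d == 1, ["unit", "period", "first_treat"]].copy()
    tau["tau_hat"] = resid[d == 1]
    tau["rel_time"] = tau["period"] - tau["first_treat"]
    return att, tau


df = simulate_panel()
att, tau = two_stage_att(df)
print(f"Two-stage ATT: {att:.3f} (simulated effect = 2.0)")

# Event-study style aggregation: average tau_hat at each relative time
print(tau.groupby("rel_time")["tau_hat"].mean().round(3))
```

Averaging `tau_hat` within `rel_time` (or within `first_treat`) mirrors the idea behind the event-study and group aggregations in the notebook, though the library's own aggregation and weighting may differ.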
