Commit bc23fa0

Merge pull request #199 from igerber/efficient-did-notebook
Add EfficientDiD documentation and tutorial notebook
2 parents 7e0f873 + 85b76f4 commit bc23fa0

6 files changed

Lines changed: 833 additions & 2 deletions

README.md

Lines changed: 52 additions & 1 deletion
@@ -70,7 +70,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
 - **Panel data support**: Two-way fixed effects estimator for panel designs
 - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
-- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), and Stacked DiD (Wing, Freedman & Hollingsworth 2024) estimators for heterogeneous treatment timing
+- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), and Efficient DiD (Chen, Sant'Anna & Xie 2025) estimators for heterogeneous treatment timing
 - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
 - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
 - **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -125,6 +125,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
 | `11_imputation_did.ipynb` | Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison |
 | `12_two_stage_did.ipynb` | Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects |
 | `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions |
+| `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference |

 ## Data Preparation

@@ -1071,6 +1072,56 @@ results = stacked_did(
 )
 ```

+### Efficient DiD (Chen, Sant'Anna & Xie 2025)
+
+Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*, producing tighter confidence intervals than standard estimators like Callaway-Sant'Anna when the stronger PT-All assumption holds.
+
+```python
+from diff_diff import EfficientDiD, generate_staggered_data
+
+# Generate sample data
+data = generate_staggered_data(n_units=300, n_periods=10,
+                               cohort_periods=[4, 6, 8], seed=42)
+
+# Fit with PT-All (overidentified, tighter SEs)
+edid = EfficientDiD(pt_assumption="all")
+results = edid.fit(data, outcome='outcome', unit='unit',
+                   time='period', first_treat='first_treat',
+                   aggregate='all')
+results.print_summary()
+
+# PT-Post mode (matches CS for post-treatment effects)
+edid_post = EfficientDiD(pt_assumption="post")
+results_post = edid_post.fit(data, outcome='outcome', unit='unit',
+                             time='period', first_treat='first_treat')
+```
+
+**Parameters:**
+
+```python
+EfficientDiD(
+    pt_assumption='all',  # 'all' (overidentified) or 'post' (matches CS post-treatment ATT)
+    alpha=0.05,  # Significance level
+    n_bootstrap=0,  # Bootstrap iterations (0 = analytical only)
+    bootstrap_weights='rademacher',  # 'rademacher', 'mammen', or 'webb'
+    seed=None,  # Random seed
+    anticipation=0,  # Anticipation periods
+)
+```
+
+> **Note:** Phase 1 supports the no-covariates path only. Use CallawaySantAnna with
+> `estimation_method='dr'` if you need covariate adjustment.
+
+**When to use Efficient DiD vs Callaway-Sant'Anna:**
+
+| Aspect | Efficient DiD | Callaway-Sant'Anna |
+|--------|--------------|-------------------|
+| Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation |
+| PT assumption | PT-All (stronger) or PT-Post | Conditional PT |
+| Efficiency | Achieves semiparametric bound | Not efficient |
+| Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
+| When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
+
 ### Triple Difference (DDD)

 Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
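
The README text added in this hunk says Efficient DiD weights comparison groups and baselines "via the inverse covariance matrix". The general idea behind inverse-covariance weighting can be sketched without the library: given several unbiased estimates of the same quantity with covariance matrix Ω, the minimum-variance linear combination uses weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1). The snippet below is a stdlib-only illustration of that textbook formula. The function names and the toy covariance matrix are assumptions made for the example, and it is not the `diff_diff` implementation.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs with Gauss-Jordan elimination and partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def min_variance_weights(omega):
    """Weights w = Omega^{-1} 1 / (1' Omega^{-1} 1); they sum to one."""
    x = solve(omega, [1.0] * len(omega))
    total = sum(x)
    return [v / total for v in x]


def combined_variance(omega, weights):
    """Variance w' Omega w of the weighted combination."""
    n = len(weights)
    return sum(weights[i] * omega[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Toy example (hypothetical numbers): two correlated estimates of one ATT.
omega = [[4.0, 1.0],
         [1.0, 2.0]]
weights = min_variance_weights(omega)
variance = combined_variance(omega, weights)
print([round(w, 4) for w in weights], round(variance, 4))
```

In this toy case the combined variance is smaller than the variance of either single estimate, which is the sense in which pooling several valid comparisons can tighten standard errors when they all identify the same parameter.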

docs/api/efficient_did.rst

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
+Efficient Difference-in-Differences
+====================================
+
+Semiparametrically efficient ATT estimator for staggered adoption designs
+from Chen, Sant'Anna & Xie (2025).
+
+This module implements the efficiency-bound-attaining estimator that:
+
+1. **Achieves the semiparametric efficiency bound** for ATT(g,t) estimation
+2. **Optimally weights** across comparison groups and baselines via the
+   inverse covariance matrix Ω*
+3. **Supports two PT assumptions**: PT-All (overidentified, tighter SEs) and
+   PT-Post (just-identified, matches CS for post-treatment effects)
+4. **Uses EIF-based inference** for analytical standard errors and multiplier
+   bootstrap
+
+.. note::
+
+   Phase 1 supports the **no-covariates** path only. The with-covariates
+   path (Phase 2) will be added in a future version.
+
+**When to use EfficientDiD:**
+
+- Staggered adoption design where you want **maximum efficiency**
+- You believe parallel trends holds across all pre-treatment periods (PT-All)
+- You want tighter confidence intervals than Callaway-Sant'Anna
+- You need a formal efficiency benchmark for comparing estimators
+
+**Reference:** Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient
+Difference-in-Differences and Event Study Estimators.
+
+.. module:: diff_diff.efficient_did
+
+EfficientDiD
+-------------
+
+Main estimator class for Efficient Difference-in-Differences.
+
+.. autoclass:: diff_diff.EfficientDiD
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :inherited-members:
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~EfficientDiD.fit
+      ~EfficientDiD.get_params
+      ~EfficientDiD.set_params
+
+EfficientDiDResults
+-------------------
+
+Results container for Efficient DiD estimation.
+
+.. autoclass:: diff_diff.efficient_did_results.EfficientDiDResults
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~EfficientDiDResults.summary
+      ~EfficientDiDResults.print_summary
+      ~EfficientDiDResults.to_dataframe
+
+EDiDBootstrapResults
+--------------------
+
+Bootstrap inference results for Efficient DiD.
+
+.. autoclass:: diff_diff.efficient_did_bootstrap.EDiDBootstrapResults
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Example Usage
+-------------
+
+Basic usage::
+
+    from diff_diff import EfficientDiD, generate_staggered_data
+
+    data = generate_staggered_data(n_units=300, n_periods=10,
+                                   cohort_periods=[4, 6, 8], seed=42)
+
+    edid = EfficientDiD(pt_assumption="all")
+    results = edid.fit(data, outcome='outcome', unit='unit',
+                       time='period', first_treat='first_treat',
+                       aggregate='all')
+    results.print_summary()
+
+PT-Post mode (matches CS for post-treatment ATT)::
+
+    edid_post = EfficientDiD(pt_assumption="post")
+    results_post = edid_post.fit(data, outcome='outcome', unit='unit',
+                                 time='period', first_treat='first_treat',
+                                 aggregate='all')
+    print(f"PT-All ATT: {results.overall_att:.4f} (SE={results.overall_se:.4f})")
+    print(f"PT-Post ATT: {results_post.overall_att:.4f} (SE={results_post.overall_se:.4f})")
+
+Bootstrap inference::
+
+    edid_boot = EfficientDiD(pt_assumption="all", n_bootstrap=999, seed=42)
+    results_boot = edid_boot.fit(data, outcome='outcome', unit='unit',
+                                 time='period', first_treat='first_treat',
+                                 aggregate='all')
+    print(f"Bootstrap SE: {results_boot.overall_se:.4f}")
+    print(f"Bootstrap CI: [{results_boot.overall_conf_int[0]:.4f}, "
+          f"{results_boot.overall_conf_int[1]:.4f}]")
+
+Comparison with Other Staggered Estimators
+------------------------------------------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 20 27 27 26
+
+   * - Feature
+     - EfficientDiD
+     - CallawaySantAnna
+     - ImputationDiD
+   * - Approach
+     - Optimal EIF-based weighting
+     - Separate 2x2 DiD aggregation
+     - Impute Y(0) via FE model
+   * - PT assumption
+     - PT-All (stronger) or PT-Post
+     - Conditional PT
+     - Strict exogeneity
+   * - Efficiency
+     - Achieves semiparametric bound
+     - Not efficient
+     - Efficient under homogeneity
+   * - Covariates
+     - Not yet (Phase 2)
+     - Supported (OR, IPW, DR)
+     - Supported
+   * - Bootstrap
+     - Multiplier bootstrap (EIF)
+     - Multiplier bootstrap
+     - Multiplier bootstrap
+   * - PT-Post equivalence
+     - Matches CS post-treatment ATT(g,t)
+     - Baseline
+     - Different framework
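
The API page above mentions EIF-based analytical standard errors and a multiplier bootstrap, and the README lists `'rademacher'`, `'mammen'`, and `'webb'` as weight options. As a rough sketch of how a multiplier bootstrap works in general, the snippet below perturbs per-unit influence-function values with mean-zero, unit-variance random weights and compares the spread of the perturbed averages with the analytical standard error. All function names here are hypothetical, the influence values are simulated, and the code does not claim to reproduce how `diff_diff` computes its bootstrap results.

```python
import math
import random
import statistics


def draw_weight(rng, kind):
    """Draw one multiplier weight with mean 0 and variance 1."""
    if kind == "rademacher":
        # +1 or -1 with equal probability.
        return rng.choice((-1.0, 1.0))
    if kind == "mammen":
        # Mammen's two-point distribution.
        root5 = math.sqrt(5.0)
        p_low = (root5 + 1.0) / (2.0 * root5)
        if rng.random() < p_low:
            return -(root5 - 1.0) / 2.0
        return (root5 + 1.0) / 2.0
    if kind == "webb":
        # Webb's six-point distribution: equal mass on +/- sqrt(3/2), 1, sqrt(1/2).
        magnitude = rng.choice((math.sqrt(1.5), 1.0, math.sqrt(0.5)))
        return magnitude * rng.choice((-1.0, 1.0))
    raise ValueError(f"unknown weight type: {kind!r}")


def analytical_se(influence):
    """Standard error from influence values: sqrt(mean(psi^2) / n)."""
    n = len(influence)
    return math.sqrt(sum(psi * psi for psi in influence) / n / n)


def multiplier_bootstrap_se(influence, n_bootstrap=999, kind="rademacher", seed=None):
    """Standard deviation of the perturbed averages (1/n) * sum(weight_i * psi_i)."""
    rng = random.Random(seed)
    n = len(influence)
    draws = [
        sum(draw_weight(rng, kind) * psi for psi in influence) / n
        for _ in range(n_bootstrap)
    ]
    return statistics.stdev(draws)


# Simulated, centered influence values stand in for a fitted estimator's EIF.
sim = random.Random(0)
psi = [sim.gauss(0.0, 1.0) for _ in range(200)]
print(f"analytical SE: {analytical_se(psi):.4f}")
for kind in ("rademacher", "mammen", "webb"):
    print(f"{kind:>10} SE: {multiplier_bootstrap_se(psi, kind=kind, seed=42):.4f}")
```

Because each weight has mean zero and unit variance, the bootstrap spread should land close to the analytical value. Some packages derive the bootstrap standard error from quantiles instead of the plain standard deviation used here.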

docs/api/index.rst

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,7 @@ Core estimator classes for DiD analysis:
    diff_diff.TripleDifference
    diff_diff.TROP
    diff_diff.ContinuousDiD
+   diff_diff.EfficientDiD

 Results Classes
 ---------------
@@ -49,6 +50,8 @@ Result containers returned by estimators:
    diff_diff.trop.TROPResults
    diff_diff.ContinuousDiDResults
    diff_diff.DoseResponseCurve
+   diff_diff.EfficientDiDResults
+   diff_diff.EDiDBootstrapResults

 Visualization
 -------------
@@ -195,6 +198,7 @@ Detailed documentation by module:
    triple_diff
    trop
    continuous_did
+   efficient_did
    results
    visualization
    diagnostics

docs/choosing_estimator.rst

Lines changed: 31 additions & 1 deletion
@@ -16,7 +16,7 @@ Start here and follow the questions:
 1. **Is treatment staggered?** (Different units treated at different times)

    - **No** → Go to question 2
-   - **Yes** → Use :class:`~diff_diff.CallawaySantAnna`
+   - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)

 2. **Do you have panel data?** (Multiple observations per unit over time)

@@ -63,6 +63,10 @@ Quick Reference
      - Few treated units, many controls
      - Synthetic parallel trends
      - ATT with unit/time weights
+   * - ``EfficientDiD``
+     - Staggered adoption with optimal efficiency
+     - PT-All (overidentified) or PT-Post
+     - Group-time ATT(g,t), aggregations
    * - ``ContinuousDiD``
      - Continuous dose / treatment intensity
      - Strong Parallel Trends (SPT) for dose-response; PT for binarized ATT
@@ -214,6 +218,32 @@ Use :class:`~diff_diff.ContinuousDiD` when:
    print(f"Overall ATT: {results.overall_att:.3f}")
    att_curve = results.dose_response_att.to_dataframe()

+Efficient DiD
+~~~~~~~~~~~~~
+
+Use :class:`~diff_diff.EfficientDiD` when:
+
+- You have staggered adoption and want **maximum statistical efficiency**
+- You believe parallel trends holds across all pre-treatment periods (PT-All)
+- You want tighter confidence intervals than Callaway-Sant'Anna
+- You need a formal efficiency benchmark for comparing estimators
+
+.. note::
+
+   Phase 1 supports the **no-covariates** path only. If you need covariate
+   adjustment, use :class:`~diff_diff.CallawaySantAnna` with ``estimation_method='dr'``
+   or :class:`~diff_diff.ImputationDiD`.
+
+.. code-block:: python
+
+   from diff_diff import EfficientDiD
+
+   edid = EfficientDiD(pt_assumption="all")  # or "post" for post-treatment CS match
+   results = edid.fit(data, outcome='y', unit='unit_id',
+                      time='period', first_treat='first_treat',
+                      aggregate='all')
+   results.print_summary()
+
 Common Pitfalls
 ---------------

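
The decision flow in `docs/choosing_estimator.rst` starts from the question "Is treatment staggered?", meaning different units are first treated at different times. One way to answer that question from data is to count the distinct first-treatment periods among treated units, as sketched below. The helper name and the markers assumed for never-treated units (`0`, `None`, NaN, or infinity) are illustrative guesses and are not taken from `diff_diff`.

```python
import math


def is_staggered(first_treat_by_unit, never_treated=(0, None)):
    """Return True when treated units start treatment in more than one period.

    first_treat_by_unit maps a unit id to its first treated period.
    Values in never_treated, NaN, and infinity are read as "never treated"
    (an assumption made for this sketch).
    """
    cohorts = set()
    for value in first_treat_by_unit.values():
        if value in never_treated:
            continue
        if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
            continue
        cohorts.add(value)
    return len(cohorts) > 1


# Hypothetical panels: unit id -> first treated period (0 = never treated).
staggered_panel = {"a": 4, "b": 6, "c": 8, "d": 0}
single_cohort_panel = {"a": 4, "b": 4, "c": 0}

print(is_staggered(staggered_panel))
print(is_staggered(single_cohort_panel))
```

If the check comes back true, the first branch of the flow above applies, and the choice between `CallawaySantAnna` and `EfficientDiD` then depends on whether PT-All is credible and whether covariates are needed.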