Commit 6de1bff

Merge pull request #41 from igerber/claude/update-roadmap-future-DYY4i
2 parents b6a1bae + 33fd41f commit 6de1bff

2 files changed

Lines changed: 192 additions & 197 deletions

File tree

ROADMAP.md

Lines changed: 158 additions & 113 deletions
@@ -2,190 +2,235 @@
22

33
This document outlines the feature roadmap for diff-diff, prioritized by practitioner value and academic credibility.
44

5-
## What Makes a Credible 1.0?
5+
For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
66

7-
A production-ready DiD library needs:
7+
---
8+
9+
## Current Status (v1.0.2)
810

9-
1. **Core estimators** - Basic DiD, TWFE, MultiPeriod, Staggered (Callaway-Sant'Anna), Synthetic DiD
10-
2. **Valid inference** - Robust SEs, cluster SEs, wild bootstrap for few clusters
11-
3. **Assumption diagnostics** - Parallel trends tests, placebo tests
12-
4. **Sensitivity analysis** - What if parallel trends is violated? (Rambachan-Roth)
13-
5. **Conditional parallel trends** - Covariate adjustment for staggered DiD
14-
6. **Documentation** - API reference site for discoverability
11+
diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis:
1512

16-
**All 1.0 blockers are complete.** diff-diff has feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis.
13+
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Synthetic DiD
14+
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap
15+
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
16+
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth)
17+
- **Study design**: Power analysis tools
1718

1819
---
1920

20-
## Status Overview
21-
22-
| Feature | Status | Priority | Why It Matters |
23-
|---------|--------|----------|----------------|
24-
| Honest DiD (Rambachan-Roth) | ✅ Done | | Reviewers expect sensitivity analysis |
25-
| CallawaySantAnna Covariates | ✅ Done | | Conditional PT often required in practice |
26-
| API Documentation Site | ✅ Done | | Credibility and discoverability |
27-
| Goodman-Bacon Decomposition | ✅ Done | | Explains when TWFE fails |
28-
| Power Analysis | ✅ Done | | Study design tool |
29-
| CallawaySantAnna Bootstrap | ✅ Done | | Valid inference with few clusters |
30-
| Sun-Abraham Estimator | Not Started | Post-1.0 | Alternative to CS, some prefer it |
31-
| Gardner's did2s | Not Started | Post-1.0 | Two-stage approach, available in pyfixest |
32-
| Local Projections DiD | Not Started | Post-1.0 | Dynamic effects (Dube et al. 2023) |
33-
| Borusyak-Jaravel-Spiess | Not Started | Post-1.0 | More efficient under homogeneous effects |
34-
| Double/Debiased ML | Not Started | Post-1.0 | High-dimensional covariates |
21+
## Near-Term Enhancements (v1.1–v1.2)
3522

36-
---
23+
High-value additions building on our existing foundation.
24+
25+
### Sun-Abraham Estimator
26+
27+
Interaction-weighted estimator providing an alternative to Callaway-Sant'Anna. Many practitioners run both as a robustness check.
28+
29+
- Event-study coefficients via saturated regression with cohort-time interactions
30+
- Different weighting scheme than CS; can give different results under heterogeneous effects
31+
- Agreement between CS and SA estimates strengthens confidence in the results
32+
33+
**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
3734

38-
## 1.0 Target Features
35+
### Borusyak-Jaravel-Spiess Imputation Estimator
36+
37+
More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation.
3938

40-
These would strengthen the 1.0 release but aren't strictly blocking.
39+
- Imputes untreated potential outcomes using pre-treatment data
40+
- More efficient under homogeneous effects assumption
41+
- Can handle unbalanced panels more naturally
4142

42-
### ✅ Goodman-Bacon Decomposition (Done)
43+
**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*.
4344

44-
Helps users understand *why* TWFE can be biased with staggered adoption. Shows weights on "forbidden comparisons" (already-treated as controls). Essential diagnostic before deciding whether to use Callaway-Sant'Anna.
45+
### Gardner's Two-Stage DiD (did2s)
4546

46-
- ✅ Decompose TWFE into 2x2 comparisons
47-
- ✅ Show weights by comparison type (clean vs. forbidden)
48-
- ✅ Visualization of decomposition (scatter and bar charts)
49-
- ✅ Integration with `TwoWayFixedEffects.decompose()` method
50-
- ✅ Automatic warning when TWFE detects staggered treatment timing
47+
Two-stage approach gaining traction in applied work. First residualizes outcomes, then estimates effects.
5148

52-
**Reference**: Goodman-Bacon (2021). *Journal of Econometrics*.
49+
- Stage 1: Estimate unit and time FEs using only untreated observations
50+
- Stage 2: Regress residualized outcomes on treatment indicators
51+
- Clean separation of identification and estimation
5352

54-
### ✅ Power Analysis Tools (Done)
53+
**Reference**: Gardner (2022). *Working Paper*.
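
The two stages above can be sketched in plain Python. This is a minimal illustration and not diff-diff's API: the function names `two_stage_did` and `make_panel`, the tuple layout, and the noise-free toy panel are all assumptions made for the example. Stage 1 fits the fixed effects by alternating averaging on untreated observations only, and Stage 2 is a no-intercept regression of the residualized outcome on the treatment indicator. It returns a point estimate only; valid standard errors must account for the first-stage estimation, which this sketch does not attempt.

```python
def two_stage_did(rows, n_iter=1000):
    """Point estimate from a two-stage DiD sketch (hypothetical helper).

    rows: list of (unit, period, y, d) tuples with d in {0, 1}.
    Assumes every unit and every period has at least one untreated row.
    """
    untreated = [(i, t, y) for i, t, y, d in rows if d == 0]
    units = sorted({i for i, _, _, _ in rows})
    periods = sorted({t for _, t, _, _ in rows})

    # Stage 1: unit and time effects from untreated observations only,
    # fitted by alternating least squares (repeated group averaging).
    alpha = {i: 0.0 for i in units}
    gamma = {t: 0.0 for t in periods}
    for _ in range(n_iter):
        for i in units:
            vals = [y - gamma[t] for u, t, y in untreated if u == i]
            alpha[i] = sum(vals) / len(vals)
        for t in periods:
            vals = [y - alpha[u] for u, s, y in untreated if s == t]
            gamma[t] = sum(vals) / len(vals)

    # Stage 2: regress residualized outcomes on the treatment indicator
    # (no intercept), which for a binary d is the mean residual among treated.
    resid = [(y - alpha[i] - gamma[t], d) for i, t, y, d in rows]
    return sum(r * d for r, d in resid) / sum(d * d for _, d in resid)


def make_panel(tau=2.0):
    """Noise-free staggered panel: y = unit effect + time effect + tau * d."""
    first_treated = {0: 2, 1: 2, 2: 3, 3: 3, 4: None, 5: None}
    rows = []
    for i, g in first_treated.items():
        for t in range(5):
            d = int(g is not None and t >= g)
            rows.append((i, t, 1.0 * i + 0.5 * t + tau * d, d))
    return rows


att = two_stage_did(make_panel(tau=2.0))
print(round(att, 6))
```

Because the toy data contain no noise and a constant effect, the estimate recovers `tau` up to numerical tolerance. A real implementation would likely use a sparse least-squares solver for Stage 1 rather than this averaging loop.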
5554

56-
Practitioners need to know "how many units/periods do I need to detect an effect of size X?" Now available in diff-diff.
55+
### Triple Difference (DDD) Estimators
5756

58-
- ✅ Minimum detectable effect given sample size
59-
- ✅ Required sample size for target power
60-
- ✅ Simulation-based power for any estimator (including staggered designs)
61-
- ✅ Visualization of power curves
62-
- ✅ Panel data considerations (ICC, multiple periods)
57+
Extends DiD to settings requiring a third differencing dimension. Common DDD implementations are invalid when covariates are needed for identification.
6358

64-
**References**: Bloom (1995); Burlig, Preonas, & Woerman (2020).
59+
- Regression adjustment, IPW, and doubly robust DDD estimators
60+
- Staggered adoption support with multiple comparison groups
61+
- Proper covariate integration (naive "two DiD difference" approaches fail)
62+
- Bias reduction and precision gains over standard approaches
6563

66-
### ✅ CallawaySantAnna Bootstrap Inference (Done)
64+
**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). *Working Paper*. R package: `triplediff`.
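
For orientation, the unconditional case (no covariates) can be worked through by hand: the triple difference is the DiD among the eligible group minus the DiD among the ineligible group. The sketch below uses made-up cell means and hypothetical names (`did`, `triple_difference`, the group labels). It shows only the simple case; per the roadmap entry above, this shortcut is not valid once covariates are needed for identification.

```python
def did(means, group):
    """2x2 DiD within one group: (treated post - pre) - (control post - pre)."""
    treated_change = means[(group, "treated", "post")] - means[(group, "treated", "pre")]
    control_change = means[(group, "control", "post")] - means[(group, "control", "pre")]
    return treated_change - control_change


def triple_difference(means):
    """Unconditional DDD: DiD for the eligible group minus DiD for the ineligible group."""
    return did(means, "eligible") - did(means, "ineligible")


# Illustrative cell means keyed by (group, state, period); values are invented.
cell_means = {
    ("eligible", "treated", "pre"): 10.0,
    ("eligible", "treated", "post"): 15.0,
    ("eligible", "control", "pre"): 9.0,
    ("eligible", "control", "post"): 11.0,
    ("ineligible", "treated", "pre"): 8.0,
    ("ineligible", "treated", "post"): 9.5,
    ("ineligible", "control", "pre"): 7.0,
    ("ineligible", "control", "post"): 8.0,
}

ddd = triple_difference(cell_means)
print(ddd)  # (5 - 2) - (1.5 - 1) = 2.5
```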
6765

68-
With few clusters or groups, analytical SEs may be unreliable. Multiplier bootstrap provides valid inference following the R `did` package approach.
66+
### Pre-Trends Power Analysis
6967

70-
- ✅ Multiplier bootstrap at unit level with influence function perturbation
71-
- ✅ Aggregate bootstrap samples for overall ATT, event study, and group effects
72-
- ✅ Rademacher, Mammen, and Webb weight distributions
73-
- ✅ Percentile confidence intervals and bootstrap p-values
68+
Assess whether pre-trends tests have adequate power to detect meaningful parallel trends violations. Complements our Honest DiD implementation.
7469

75-
**Reference**: Callaway & Sant'Anna (2021). *Journal of Econometrics*.
70+
- Minimum detectable violation size for pre-trends tests
71+
- Visualization of power against various violation magnitudes
72+
- Integration with existing parallel trends diagnostics
73+
74+
**Reference**: [Roth (2022)](https://www.aeaweb.org/articles?id=10.1257/aeri.20210236). *AER: Insights*. R package: `pretrends`.
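
As a rough illustration of "minimum detectable violation," consider the simplest possible case: a single pre-period coefficient tested with a two-sided z-test. This is a deliberate simplification, not the procedure in Roth (2022) or the `pretrends` package, which work with the joint distribution of several pre-period coefficients and a hypothesized trend. The function names below are hypothetical.

```python
from statistics import NormalDist


def pretest_power(violation, se, alpha=0.05):
    """Probability that a two-sided z-test rejects when the true pre-period
    coefficient equals `violation` and its standard error is `se`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = violation / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def minimum_detectable_violation(se, power=0.8, alpha=0.05):
    """Smallest positive violation detected with the target power (bisection)."""
    lo, hi = 0.0, 10.0 * se
    for _ in range(100):
        mid = (lo + hi) / 2
        if pretest_power(mid, se, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


mdv = minimum_detectable_violation(se=1.0)
print(round(mdv, 2))  # roughly 2.8 standard errors at 80% power
```

The practical reading: under these assumptions, a violation smaller than about 2.8 standard errors will usually pass the pre-test undetected, which is why a non-rejection alone is weak evidence for parallel trends.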
7675

7776
### Enhanced Visualization
7877

7978
- Synthetic control weight visualization (bar chart of unit weights)
80-
- ✅ Bacon decomposition visualization (scatter and bar charts)
81-
- Treatment adoption "staircase" plot
79+
- Treatment adoption "staircase" plot for staggered designs
80+
- Interactive plots with plotly backend option
8281

8382
---
8483

85-
## Post-1.0 Features
84+
## Medium-Term Enhancements (v1.3+)
8685

87-
These are valuable but can wait for future versions.
86+
Extending diff-diff to handle more complex settings.
8887

89-
### Sun-Abraham Estimator
88+
### Continuous Treatment DiD
9089

91-
Alternative to Callaway-Sant'Anna using interaction-weighted approach. Some practitioners prefer it; provides a robustness check.
90+
Many treatments have dose/intensity rather than binary on/off. Active research area with recent breakthroughs.
9291

93-
**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
92+
- Average treatment effect on the treated (ATT) parameters under generalized parallel trends
93+
- Dose-response curves and marginal effects
94+
- Handle settings where "dose" varies across units and time
95+
- Event studies with continuous treatments
9496

95-
### Gardner's Two-Stage DiD (did2s)
97+
**References**:
98+
- [Callaway, Goodman-Bacon & Sant'Anna (2024)](https://arxiv.org/abs/2107.02637). *NBER Working Paper*.
99+
- [de Chaisemartin, D'Haultfœuille & Vazquez-Bare (2024)](https://arxiv.org/abs/2402.05432). *AEA Papers and Proceedings*.
100+
101+
### de Chaisemartin-D'Haultfœuille Estimator
102+
103+
Handles treatment that switches on and off (reversible treatments), unlike most other methods.
96104

97-
Two-stage approach to staggered DiD that first residualizes outcomes using untreated observations, then estimates treatment effects. Available in pyfixest (Python) and did2s (R).
105+
- Allows units to move into and out of treatment
106+
- Time-varying, heterogeneous treatment effects
107+
- Comparison with never-switchers or flexible control groups
108+
- Different assumptions than CS/SA—useful for different settings
98109

99-
**Reference**: Gardner (2022). *Two-stage differences in differences*.
110+
**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
100111

101112
### Local Projections DiD
102113

103-
Implements local projections for dynamic treatment effects. Flexible approach that doesn't require specifying the full dynamic structure. Gaining traction in applied work.
114+
Implements local projections for dynamic treatment effects, without requiring the full dynamic structure to be specified.
115+
116+
- Flexible impulse response estimation
117+
- Robust to misspecification of dynamics
118+
- Natural handling of anticipation effects
119+
- Growing use in macroeconomics and policy evaluation
104120

105121
**Reference**: Dube, Girardi, Jordà, and Taylor (2023).
106122

107-
### Borusyak-Jaravel-Spiess Imputation Estimator
123+
### Nonlinear DiD
108124

109-
More efficient than Callaway-Sant'Anna when parallel trends holds across all periods. Uses imputation approach.
125+
For outcomes where linear models are inappropriate (binary, count, bounded).
110126

111-
**Reference**: Borusyak, Jaravel, and Spiess (2024).
127+
- Logit/probit DiD for binary outcomes
128+
- Poisson DiD for count outcomes
129+
- Flexible strategies for staggered designs with nonlinear models
130+
- Proper handling of incidence rate ratios and odds ratios
112131

113-
### Double/Debiased ML for DiD
132+
**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
114133

115-
For high-dimensional settings with many covariates. Uses ML for nuisance parameter estimation with cross-fitting.
134+
### Doubly Robust DiD + Synthetic Control
116135

117-
**Reference**: Chernozhukov et al. (2018), Chang (2020).
136+
Unified framework combining DiD and synthetic control with doubly robust identification—valid under *either* parallel trends or synthetic control assumptions.
118137

119-
### Alternative Inference Methods
138+
- ATT identified under parallel trends OR group-level SC condition
139+
- Semiparametric estimation framework
140+
- Multiplier bootstrap for valid inference under either assumption
141+
- Strengthens credibility by avoiding the DiD vs. SC trade-off
142+
143+
**Reference**: [Sun, Xie & Zhang (2025)](https://arxiv.org/abs/2503.11375). *Working Paper*.
120144

121-
- Randomization inference for small samples
122-
- Bayesian DiD with prior on parallel trends
123-
- Conformal inference for prediction intervals
145+
### Causal Duration Analysis with DiD
146+
147+
Extends DiD to duration/survival outcomes where standard methods fail (hazard rates, time-to-event).
148+
149+
- Duration analogue of parallel trends on hazard rates
150+
- Avoids distributional assumptions and hazard function specification
151+
- Visual and formal pre-trends assessment for duration data
152+
- Handles absorbing states approaching probability bounds
153+
154+
**Reference**: [Deaner & Ku (2025)](https://www.aeaweb.org/conference/2025/program/paper/k77Kh8iS). *AEA Conference Paper*.
124155

125156
---
126157

127-
## Release History
158+
## Long-Term Research Directions (v2.0+)
159+
160+
Frontier methods requiring more research investment.
128161

129-
### v0.9.0 (Current)
162+
### Matrix Completion Methods
130163

131-
- ✅ Callaway-Sant'Anna multiplier bootstrap inference
132-
- ✅ Rademacher, Mammen, and Webb weight distributions
133-
- ✅ Bootstrap SEs, CIs, and p-values for all aggregations (overall ATT, event study, group effects)
134-
- ✅ `CSBootstrapResults` dataclass for bootstrap results
164+
Unified framework encompassing synthetic control and regression approaches. Moves seamlessly between cross-sectional and time-series patterns.
135165

136-
### v0.8.0
166+
- Nuclear norm regularization for low-rank structure
167+
- Handles missing data patterns common in panel settings
168+
- Bridges synthetic control (few units, many periods) and regression (many units, few periods)
169+
- Confidence intervals via debiasing
137170

138-
- ✅ Power analysis tools (`PowerAnalysis`, `simulate_power`)
139-
- ✅ MDE, sample size, and power calculations
140-
- ✅ Simulation-based power for any DiD estimator
141-
- ✅ Power curve visualization (`plot_power_curve`)
142-
- ✅ Panel data support with ICC adjustment
171+
**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.
143172

144-
### v0.7.0
173+
### Causal Forests for DiD
145174

146-
- ✅ Goodman-Bacon decomposition for TWFE diagnostics
147-
- ✅ `plot_bacon()` visualization (scatter and bar charts)
148-
- ✅ `TwoWayFixedEffects.decompose()` integration
149-
- ✅ Automatic staggered treatment warning in TWFE
175+
Machine learning methods for discovering heterogeneous treatment effects in DiD settings.
150176

151-
### v0.6.0
177+
- Estimate treatment effect heterogeneity across covariates
178+
- Data-driven subgroup discovery
179+
- Combine with DiD identification for observational data
180+
- Honest confidence intervals for discovered heterogeneity
152181

153-
- ✅ **All 1.0 Blockers Complete**
154-
- ✅ Honest DiD sensitivity analysis (Rambachan & Roth 2023)
155-
- ✅ CallawaySantAnna covariate adjustment (DR, IPW, Reg)
156-
- ✅ API documentation site with Sphinx
182+
**References**:
183+
- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
184+
- Athey, Tibshirani & Wager (2019). *Annals of Statistics*.
157185

158-
### v0.5.0
186+
### Double/Debiased ML for DiD
187+
188+
For high-dimensional settings with many potential confounders.
159189

160-
- Wild cluster bootstrap (Rademacher, Webb, Mammen weights)
161-
- Placebo tests module
162-
- Tutorial notebooks
190+
- ML for nuisance parameter estimation (propensity, outcome models)
191+
- Cross-fitting for valid inference
192+
- Accommodates many covariates, with cross-fitting limiting overfitting bias
193+
- Doubly-robust estimation with ML flexibility
163194

164-
### v0.4.0
195+
**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.
165196

166-
- Callaway-Sant'Anna estimator for staggered DiD
167-
- Event study and group effects visualization
168-
- Parallel trends testing utilities
197+
### Alternative Inference Methods
169198

170-
### v0.3.0
199+
- **Randomization inference**: Exact p-values for small samples
200+
- **Bayesian DiD**: Priors on parallel trends violations
201+
- **Conformal inference**: Prediction intervals with finite-sample guarantees
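
To make the randomization-inference item concrete, here is a minimal sketch of an exact permutation p-value. It assumes the analysis has been reduced to one pre-to-post change per unit and that, under the sharp null of no effect for any unit, every assignment of the same number of treated units is equally likely. The function name and data are invented for the example and do not reflect a planned diff-diff interface.

```python
from itertools import combinations


def randomization_pvalue(changes, treated_idx):
    """Exact two-sided p-value for a difference in mean changes.

    changes: per-unit outcome changes (post minus pre).
    treated_idx: indices of the units that were actually treated.
    Enumerates every way of choosing the same number of treated units.
    """
    n, k = len(changes), len(treated_idx)

    def diff_in_means(idx):
        idx = set(idx)
        treated = [changes[i] for i in idx]
        control = [changes[i] for i in range(n) if i not in idx]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = abs(diff_in_means(treated_idx))
    stats = [abs(diff_in_means(c)) for c in combinations(range(n), k)]
    # Small tolerance so the observed assignment counts as a tie with itself.
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)


# Eight units, three treated; values are illustrative only.
changes = [5.1, 4.8, 5.5, 1.0, 0.7, 1.2, 0.9, 1.1]
p_value = randomization_pvalue(changes, treated_idx=[0, 1, 2])
print(p_value)  # 1 of the 56 possible assignments is as extreme as the observed one
```

Full enumeration is only feasible for small samples; larger designs would sample assignments at random instead.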
202+
203+
---
171204

172-
- Synthetic Difference-in-Differences
173-
- Multi-period DiD with event study
174-
- Data preparation utilities
205+
## Infrastructure Improvements
175206

176-
### v0.2.0
207+
Ongoing maintenance and developer experience.
177208

178-
- Two-Way Fixed Effects estimator
179-
- Fixed effects support (absorb parameter)
180-
- Cluster-robust standard errors
181-
- Formula interface
209+
### Performance
182210

183-
### v0.1.0
211+
- JIT compilation for bootstrap loops (numba)
212+
- Parallel bootstrap iterations
213+
- Sparse matrix handling for large fixed effects
214+
- Memory-efficient estimation for large panels
184215

185-
- Initial release with basic DiD estimator
216+
### Code Quality
217+
218+
- Extract shared within-transformation logic to utils
219+
- Consolidate linear regression helpers
220+
- Consider splitting `staggered.py` (1800+ lines)
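
For readers unfamiliar with the term, the within transformation referred to above is two-way demeaning. A shared helper might look roughly like the following sketch for a balanced panel; the name `within_transform` and the nested-list layout are assumptions, and the actual diff-diff code may be organized differently. Unbalanced panels need an iterative version that this sketch does not cover.

```python
def within_transform(y):
    """Two-way demeaning for a balanced panel stored as y[unit][period].

    Returns y_it - mean_i - mean_t + grand_mean, which removes additive
    unit and time fixed effects.
    """
    n_units, n_periods = len(y), len(y[0])
    unit_means = [sum(row) / n_periods for row in y]
    time_means = [sum(y[i][t] for i in range(n_units)) / n_units
                  for t in range(n_periods)]
    grand_mean = sum(unit_means) / n_units
    return [
        [y[i][t] - unit_means[i] - time_means[t] + grand_mean
         for t in range(n_periods)]
        for i in range(n_units)
    ]


# A purely additive panel (unit effect + time effect) demeans to zero.
unit_effects = [1.0, 2.0, 4.0]
time_effects = [0.0, 0.5, 1.5, 3.0]
panel = [[a + g for g in time_effects] for a in unit_effects]
demeaned = within_transform(panel)
print(max(abs(v) for row in demeaned for v in row) < 1e-12)
```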
221+
222+
### Documentation
223+
224+
- Real-world data examples (beyond synthetic)
225+
- Performance benchmarks vs. R packages
226+
- Video tutorials and worked examples
186227

187228
---
188229

189230
## Contributing
190231

191-
Interested in contributing? See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. Features marked "Not Started" are good candidates for contributions.
232+
Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
233+
234+
Key references for implementation:
235+
- [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
236+
- [Baker et al. (2025)](https://arxiv.org/pdf/2503.13323). "Difference-in-Differences Designs: A Practitioner's Guide."
