Commit 4328197

Update roadmap and TODO for v1.0.2 with future focus
- Remove completed items tracking (now in CHANGELOG.md)
- Reorganize roadmap into Near-Term, Medium-Term, Long-Term sections
- Add new research-backed future enhancements:
  - Continuous Treatment DiD (Callaway, Goodman-Bacon, Sant'Anna 2024)
  - de Chaisemartin-D'Haultfœuille (reversible treatments)
  - Nonlinear DiD (Wooldridge 2023)
  - Matrix Completion Methods (Athey et al.)
  - Causal Forests for HTE in DiD settings
- Clean up TODO.md by removing completed items
- Focus on actionable technical debt and improvements
1 parent b6a1bae commit 4328197

2 files changed

Lines changed: 151 additions & 199 deletions

ROADMAP.md

Lines changed: 117 additions & 115 deletions
@@ -2,190 +2,192 @@
22

33
This document outlines the feature roadmap for diff-diff, prioritized by practitioner value and academic credibility.
44

5-
## What Makes a Credible 1.0?
5+
For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
66

7-
A production-ready DiD library needs:
8-
9-
1. **Core estimators** - Basic DiD, TWFE, MultiPeriod, Staggered (Callaway-Sant'Anna), Synthetic DiD
10-
2. **Valid inference** - Robust SEs, cluster SEs, wild bootstrap for few clusters
11-
3. **Assumption diagnostics** - Parallel trends tests, placebo tests
12-
4. **Sensitivity analysis** - What if parallel trends is violated? (Rambachan-Roth)
13-
5. **Conditional parallel trends** - Covariate adjustment for staggered DiD
14-
6. **Documentation** - API reference site for discoverability
7+
---
158

16-
**All 1.0 blockers are complete.** diff-diff has feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis.
9+
## Current Status (v1.0.2)
1710

18-
---
11+
diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis:
1912

20-
## Status Overview
21-
22-
| Feature | Status | Priority | Why It Matters |
23-
|---------|--------|----------|----------------|
24-
| Honest DiD (Rambachan-Roth) | ✅ Done || Reviewers expect sensitivity analysis |
25-
| CallawaySantAnna Covariates | ✅ Done || Conditional PT often required in practice |
26-
| API Documentation Site | ✅ Done || Credibility and discoverability |
27-
| Goodman-Bacon Decomposition | ✅ Done || Explains when TWFE fails |
28-
| Power Analysis | ✅ Done || Study design tool |
29-
| CallawaySantAnna Bootstrap | ✅ Done || Valid inference with few clusters |
30-
| Sun-Abraham Estimator | Not Started | Post-1.0 | Alternative to CS, some prefer it |
31-
| Gardner's did2s | Not Started | Post-1.0 | Two-stage approach, available in pyfixest |
32-
| Local Projections DiD | Not Started | Post-1.0 | Dynamic effects (Dube et al. 2023) |
33-
| Borusyak-Jaravel-Spiess | Not Started | Post-1.0 | More efficient under homogeneous effects |
34-
| Double/Debiased ML | Not Started | Post-1.0 | High-dimensional covariates |
13+
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Synthetic DiD
14+
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap
15+
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
16+
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth)
17+
- **Study design**: Power analysis tools
3518

3619
---
3720

38-
## 1.0 Target Features
21+
## Near-Term Enhancements (v1.1–v1.2)
3922

40-
These would strengthen the 1.0 release but aren't strictly blocking.
23+
High-value additions building on our existing foundation.
4124

42-
### ✅ Goodman-Bacon Decomposition (Done)
25+
### Sun-Abraham Estimator
4326

44-
Helps users understand *why* TWFE can be biased with staggered adoption. Shows weights on "forbidden comparisons" (already-treated as controls). Essential diagnostic before deciding whether to use Callaway-Sant'Anna.
27+
Interaction-weighted estimator providing an alternative to Callaway-Sant'Anna. Many practitioners run both as a robustness check.
4528

46-
- ✅ Decompose TWFE into 2x2 comparisons
47-
- ✅ Show weights by comparison type (clean vs. forbidden)
48-
- ✅ Visualization of decomposition (scatter and bar charts)
49-
- ✅ Integration with `TwoWayFixedEffects.decompose()` method
50-
- ✅ Automatic warning when TWFE detects staggered treatment timing
29+
- Event-study coefficients via saturated regression with cohort-time interactions
30+
- Different weighting scheme than CS; can give different results under heterogeneous effects
31+
- Agreement between CS and SA estimates is a useful robustness check
5132

52-
**Reference**: Goodman-Bacon (2021). *Journal of Econometrics*.
33+
**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
5334

54-
### ✅ Power Analysis Tools (Done)
35+
### Borusyak-Jaravel-Spiess Imputation Estimator
5536

56-
Practitioners need to know "how many units/periods do I need to detect an effect of size X?" Now available in diff-diff.
37+
More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation.
5738

58-
- ✅ Minimum detectable effect given sample size
59-
- ✅ Required sample size for target power
60-
- ✅ Simulation-based power for any estimator (including staggered designs)
61-
- ✅ Visualization of power curves
62-
- ✅ Panel data considerations (ICC, multiple periods)
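
The analytical minimum detectable effect mentioned above can be sketched as follows. This is a minimal illustration that assumes a two-sided test and an approximately normal estimator; the function names and the example standard error are hypothetical and are not diff-diff's `PowerAnalysis` API.

```python
from statistics import NormalDist


def minimum_detectable_effect(se: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """MDE = (z_{1 - alpha/2} + z_{power}) * SE for a two-sided test."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


def power_for_effect(effect: float, se: float, alpha: float = 0.05) -> float:
    """Approximate power, ignoring the small rejection region in the opposite tail."""
    z = NormalDist()
    return z.cdf(abs(effect) / se - z.inv_cdf(1 - alpha / 2))


# Illustrative standard error of the DiD estimate (an assumed value).
mde = minimum_detectable_effect(se=0.5)
achieved_power = power_for_effect(mde, se=0.5)
```

By construction, plugging the MDE back into `power_for_effect` returns the target power. Panel features such as ICC and the number of periods enter only through the standard error passed in.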
39+
- Imputes untreated potential outcomes using pre-treatment data
40+
- More efficient under homogeneous effects assumption
41+
- Can handle unbalanced panels more naturally
6342

64-
**References**: Bloom (1995); Burlig, Preonas, & Woerman (2020).
43+
**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*.
6544

66-
### ✅ CallawaySantAnna Bootstrap Inference (Done)
45+
### Gardner's Two-Stage DiD (did2s)
6746

68-
With few clusters or groups, analytical SEs may be unreliable. Multiplier bootstrap provides valid inference following the R `did` package approach.
47+
Two-stage approach gaining traction in applied work. First residualizes outcomes, then estimates effects.
6948

70-
- ✅ Multiplier bootstrap at unit level with influence function perturbation
71-
- ✅ Aggregate bootstrap samples for overall ATT, event study, and group effects
72-
- ✅ Rademacher, Mammen, and Webb weight distributions
73-
- ✅ Percentile confidence intervals and bootstrap p-values
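
The three weight distributions named above share one property: mean 0 and variance 1. The sketch below lists their supports and shows how such weights perturb unit-level influence function values. It is a simplified illustration, not the diff-diff or R `did` implementation; the helper names and the example influence values are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Each distribution is a list of (value, probability) pairs with mean 0 and variance 1.
WEIGHT_DISTRIBUTIONS = {
    "rademacher": [(-1.0, 0.5), (1.0, 0.5)],
    "mammen": [
        (-(SQRT5 - 1) / 2, (SQRT5 + 1) / (2 * SQRT5)),
        ((SQRT5 + 1) / 2, (SQRT5 - 1) / (2 * SQRT5)),
    ],
    # Webb's six-point distribution: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2).
    "webb": [(s * math.sqrt(k / 2), 1 / 6) for s in (-1, 1) for k in (1, 2, 3)],
}


def moments(dist):
    """Exact mean and variance of a discrete weight distribution."""
    mean = sum(v * p for v, p in dist)
    var = sum((v - mean) ** 2 * p for v, p in dist)
    return mean, var


def multiplier_bootstrap_se(influence, dist_name="rademacher", n_boot=999, seed=0):
    """Bootstrap SE of an estimator from its unit-level influence function values."""
    rng = random.Random(seed)
    values, probs = zip(*WEIGHT_DISTRIBUTIONS[dist_name])
    n = len(influence)
    draws = []
    for _ in range(n_boot):
        weights = rng.choices(values, weights=probs, k=n)
        # Perturb each unit's influence value instead of re-estimating the model.
        draws.append(sum(w * psi for w, psi in zip(weights, influence)) / n)
    center = sum(draws) / n_boot
    return math.sqrt(sum((d - center) ** 2 for d in draws) / (n_boot - 1))


# Hypothetical influence function values for eight units.
influence = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5]
se = multiplier_bootstrap_se(influence, "webb")
```

Because the weights have unit variance, the bootstrap standard error should approximate `sqrt(sum(psi**2)) / n`, the plug-in standard error implied by the influence function.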
49+
- Stage 1: Estimate unit and time FEs using only untreated observations
50+
- Stage 2: Regress residualized outcomes on treatment indicators
51+
- Clean separation of identification and estimation
7452

75-
**Reference**: Callaway & Sant'Anna (2021). *Journal of Econometrics*.
53+
**Reference**: Gardner (2022). *Working Paper*.
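
The two stages can be sketched in plain Python on a small noise-free staggered panel. This is a simplified illustration of the point estimate only: it fits the fixed effects by alternating group means and omits the standard-error correction that accounts for first-stage estimation. The function names and the simulated panel are hypothetical, not a diff-diff API.

```python
def _group_means(obs, key_idx, other_fe, other_idx):
    """Mean of (y - other fixed effect) within each level of the key."""
    sums, counts = {}, {}
    for row in obs:
        k = row[key_idx]
        sums[k] = sums.get(k, 0.0) + row[2] - other_fe[row[other_idx]]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def two_stage_did(rows, n_iter=200):
    """rows: (unit, period, y, treated) tuples; returns the overall ATT estimate."""
    untreated = [(i, t, y) for i, t, y, d in rows if not d]
    unit_fe = {i: 0.0 for i, _, _, _ in rows}
    time_fe = {t: 0.0 for _, t, _, _ in rows}

    # Stage 1: fit y = unit_fe + time_fe on untreated observations only.
    # Units that are never observed untreated would keep a fixed effect of 0.
    for _ in range(n_iter):
        time_fe.update(_group_means(untreated, 1, unit_fe, 0))
        unit_fe.update(_group_means(untreated, 0, time_fe, 1))

    # Stage 2: regress residualized outcomes on the treatment dummy (no intercept).
    num = sum(d * (y - unit_fe[i] - time_fe[t]) for i, t, y, d in rows)
    den = sum(d * d for _, _, _, d in rows)
    return num / den


# Simulated panel: two cohorts (treated from period 2 and 3) plus never-treated units.
first_treated = {0: 2, 1: 2, 2: 3, 3: 3, 4: None, 5: None}
TRUE_ATT = 2.0
rows = []
for unit, start in first_treated.items():
    for period in range(5):
        d = int(start is not None and period >= start)
        y = 1.0 * unit + 0.5 * period + TRUE_ATT * d
        rows.append((unit, period, y, d))

att = two_stage_did(rows)  # recovers TRUE_ATT up to numerical tolerance
```

Replacing the single treatment dummy in stage 2 with event-time indicators would give an event-study version of the same idea.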
7654

7755
### Enhanced Visualization
7856

7957
- Synthetic control weight visualization (bar chart of unit weights)
80-
- ✅ Bacon decomposition visualization (scatter and bar charts)
81-
- Treatment adoption "staircase" plot
58+
- Treatment adoption "staircase" plot for staggered designs
59+
- Interactive plots with plotly backend option
8260

8361
---
8462

85-
## Post-1.0 Features
63+
## Medium-Term Enhancements (v1.3+)
8664

87-
These are valuable but can wait for future versions.
65+
Extending diff-diff to handle more complex settings.
8866

89-
### Sun-Abraham Estimator
67+
### Continuous Treatment DiD
9068

91-
Alternative to Callaway-Sant'Anna using interaction-weighted approach. Some practitioners prefer it; provides a robustness check.
69+
Many treatments have dose/intensity rather than binary on/off. Active research area with recent breakthroughs.
9270

93-
**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
71+
- Average treatment effect on the treated (ATT) parameters under generalized parallel trends
72+
- Dose-response curves and marginal effects
73+
- Handle settings where "dose" varies across units and time
74+
- Event studies with continuous treatments
9475

95-
### Gardner's Two-Stage DiD (did2s)
76+
**References**:
77+
- [Callaway, Goodman-Bacon & Sant'Anna (2024)](https://arxiv.org/abs/2107.02637). *NBER Working Paper*.
78+
- [de Chaisemartin, D'Haultfœuille & Vazquez-Bare (2024)](https://arxiv.org/abs/2402.05432). *AEA Papers and Proceedings*.
79+
80+
### de Chaisemartin-D'Haultfœuille Estimator
81+
82+
Handles treatment that switches on and off (reversible treatments), unlike most other methods.
9683

97-
Two-stage approach to staggered DiD that first residualizes outcomes using untreated observations, then estimates treatment effects. Available in pyfixest (Python) and did2s (R).
84+
- Allows units to move into and out of treatment
85+
- Time-varying, heterogeneous treatment effects
86+
- Comparison with never-switchers or flexible control groups
87+
- Different assumptions than CS/SA—useful for different settings
9888

99-
**Reference**: Gardner (2022). *Two-stage differences in differences*.
89+
**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
10090

10191
### Local Projections DiD
10292

103-
Implements local projections for dynamic treatment effects. Flexible approach that doesn't require specifying the full dynamic structure. Gaining traction in applied work.
93+
Implements local projections for dynamic treatment effects. Does not require specifying the full dynamic structure.
94+
95+
- Flexible impulse response estimation
96+
- Robust to misspecification of dynamics
97+
- Natural handling of anticipation effects
98+
- Growing use in macroeconomics and policy evaluation
10499

105100
**Reference**: Dube, Girardi, Jordà, and Taylor (2023).
106101

107-
### Borusyak-Jaravel-Spiess Imputation Estimator
102+
### Nonlinear DiD
108103

109-
More efficient than Callaway-Sant'Anna when parallel trends holds across all periods. Uses imputation approach.
104+
For outcomes where linear models are inappropriate (binary, count, bounded).
110105

111-
**Reference**: Borusyak, Jaravel, and Spiess (2024).
106+
- Logit/probit DiD for binary outcomes
107+
- Poisson DiD for count outcomes
108+
- Flexible strategies for staggered designs with nonlinear models
109+
- Proper handling of incidence rate ratios and odds ratios
112110

113-
### Double/Debiased ML for DiD
111+
**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
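
For intuition on incidence rate ratios, consider the two-group, two-period case only. A saturated Poisson model with group, period, and interaction terms fits the four cell means exactly, so the exponentiated interaction coefficient equals a ratio of ratios. The sketch below computes that quantity directly from cell means. It assumes parallel trends in ratio form, does not cover the staggered strategies in the reference, and uses a hypothetical function name and made-up means.

```python
import math


def poisson_did_2x2(mean_treat_pre, mean_treat_post, mean_ctrl_pre, mean_ctrl_post):
    """Ratio-of-ratios DiD for count outcomes in the 2x2 case."""
    ctrl_growth = mean_ctrl_post / mean_ctrl_pre
    irr = (mean_treat_post / mean_treat_pre) / ctrl_growth
    # Counterfactual treated mean if treated counts had grown like control counts.
    counterfactual = mean_treat_pre * ctrl_growth
    return {
        "log_irr": math.log(irr),  # interaction coefficient on the log scale
        "irr": irr,                # incidence rate ratio
        "att_levels": mean_treat_post - counterfactual,
    }


# Illustrative cell means of a count outcome.
result = poisson_did_2x2(10.0, 18.0, 8.0, 12.0)
```

Here the treated mean grows by a factor of 1.8 and the control mean by 1.5, so the incidence rate ratio is 1.2 and the effect in levels is 18 minus the counterfactual 15.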
114112

115-
For high-dimensional settings with many covariates. Uses ML for nuisance parameter estimation with cross-fitting.
113+
---
116114

117-
**Reference**: Chernozhukov et al. (2018), Chang (2020).
115+
## Long-Term Research Directions (v2.0+)
118116

119-
### Alternative Inference Methods
117+
Frontier methods requiring more research investment.
120118

121-
- Randomization inference for small samples
122-
- Bayesian DiD with prior on parallel trends
123-
- Conformal inference for prediction intervals
119+
### Matrix Completion Methods
124120

125-
---
121+
Unified framework encompassing synthetic control and regression approaches. Moves seamlessly between cross-sectional and time-series patterns.
126122

127-
## Release History
123+
- Nuclear norm regularization for low-rank structure
124+
- Handles missing data patterns common in panel settings
125+
- Bridges synthetic control (few units, many periods) and regression (many units, few periods)
126+
- Confidence intervals via debiasing
128127

129-
### v0.9.0 (Current)
128+
**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.
130129

131-
- ✅ Callaway-Sant'Anna multiplier bootstrap inference
132-
- ✅ Rademacher, Mammen, and Webb weight distributions
133-
- ✅ Bootstrap SEs, CIs, and p-values for all aggregations (overall ATT, event study, group effects)
134-
- `CSBootstrapResults` dataclass for bootstrap results
130+
### Causal Forests for DiD
135131

136-
### v0.8.0
132+
Machine learning methods for discovering heterogeneous treatment effects in DiD settings.
137133

138-
- ✅ Power analysis tools (`PowerAnalysis`, `simulate_power`)
139-
- ✅ MDE, sample size, and power calculations
140-
- ✅ Simulation-based power for any DiD estimator
141-
- ✅ Power curve visualization (`plot_power_curve`)
142-
- ✅ Panel data support with ICC adjustment
134+
- Estimate treatment effect heterogeneity across covariates
135+
- Data-driven subgroup discovery
136+
- Combine with DiD identification for observational data
137+
- Honest confidence intervals for discovered heterogeneity
143138

144-
### v0.7.0
139+
**References**:
140+
- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
141+
- Athey, Tibshirani & Wager (2019). *Annals of Statistics*.
145142

146-
- ✅ Goodman-Bacon decomposition for TWFE diagnostics
147-
- `plot_bacon()` visualization (scatter and bar charts)
148-
- `TwoWayFixedEffects.decompose()` integration
149-
- ✅ Automatic staggered treatment warning in TWFE
143+
### Double/Debiased ML for DiD
144+
145+
For high-dimensional settings with many potential confounders.
146+
147+
- ML for nuisance parameter estimation (propensity, outcome models)
148+
- Cross-fitting for valid inference
149+
- Handles many covariates without overfitting concerns
150+
- Doubly-robust estimation with ML flexibility
150151

151-
### v0.6.0
152+
**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.
152153

153-
- **All 1.0 Blockers Complete**
154-
- ✅ Honest DiD sensitivity analysis (Rambachan & Roth 2023)
155-
- ✅ CallawaySantAnna covariate adjustment (DR, IPW, Reg)
156-
- ✅ API documentation site with Sphinx
154+
### Alternative Inference Methods
157155

158-
### v0.5.0
156+
- **Randomization inference**: Exact p-values for small samples
157+
- **Bayesian DiD**: Priors on parallel trends violations
158+
- **Conformal inference**: Prediction intervals with finite-sample guarantees
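
As an illustration of the randomization inference item, the sketch below enumerates every possible assignment of treatment to units and computes an exact two-sided p-value for a simple DiD statistic. It assumes the sharp null of no effect for any unit, so each unit's pre-to-post change is the same under every assignment. The function names and the data are hypothetical.

```python
from itertools import combinations


def did_statistic(changes, treated):
    """Mean pre-to-post change of treated units minus that of control units."""
    treated = set(treated)
    t_vals = [c for i, c in enumerate(changes) if i in treated]
    c_vals = [c for i, c in enumerate(changes) if i not in treated]
    return sum(t_vals) / len(t_vals) - sum(c_vals) / len(c_vals)


def randomization_p_value(changes, treated):
    """Exact two-sided p-value over all assignments with the same number treated."""
    observed = abs(did_statistic(changes, treated))
    assignments = list(combinations(range(len(changes)), len(treated)))
    extreme = sum(
        abs(did_statistic(changes, a)) >= observed - 1e-12 for a in assignments
    )
    return extreme / len(assignments)


# Pre-to-post outcome changes for six units; units 0-2 were actually treated.
changes = [5.0, 4.0, 6.0, 1.0, 0.0, 2.0]
p_value = randomization_p_value(changes, treated=[0, 1, 2])  # 2 of 20 assignments: 0.1
```

Full enumeration is only practical with few units; larger samples would typically draw a random subset of assignments instead.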
159159

160-
- Wild cluster bootstrap (Rademacher, Webb, Mammen weights)
161-
- Placebo tests module
162-
- Tutorial notebooks
160+
---
163161

164-
### v0.4.0
162+
## Infrastructure Improvements
165163

166-
- Callaway-Sant'Anna estimator for staggered DiD
167-
- Event study and group effects visualization
168-
- Parallel trends testing utilities
164+
Ongoing maintenance and developer experience.
169165

170-
### v0.3.0
166+
### Performance
171167

172-
- Synthetic Difference-in-Differences
173-
- Multi-period DiD with event study
174-
- Data preparation utilities
168+
- JIT compilation for bootstrap loops (numba)
169+
- Parallel bootstrap iterations
170+
- Sparse matrix handling for large fixed effects
171+
- Memory-efficient estimation for large panels
175172

176-
### v0.2.0
173+
### Code Quality
177174

178-
- Two-Way Fixed Effects estimator
179-
- Fixed effects support (absorb parameter)
180-
- Cluster-robust standard errors
181-
- Formula interface
175+
- Extract shared within-transformation logic to utils
176+
- Consolidate linear regression helpers
177+
- Consider splitting `staggered.py` (1800+ lines)
182178

183-
### v0.1.0
179+
### Documentation
184180

185-
- Initial release with basic DiD estimator
181+
- Real-world data examples (beyond synthetic)
182+
- Performance benchmarks vs. R packages
183+
- Video tutorials and worked examples
186184

187185
---
188186

189187
## Contributing
190188

191-
Interested in contributing? See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. Features marked "Not Started" are good candidates for contributions.
189+
Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
190+
191+
Key references for implementation:
192+
- [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
193+
- [Baker et al. (2025)](https://arxiv.org/pdf/2503.13323). "Difference-in-Differences Designs: A Practitioner's Guide."
