|
2 | 2 |
|
3 | 3 | This document outlines the feature roadmap for diff-diff, prioritized by practitioner value and academic credibility. |
4 | 4 |
|
5 | | -## What Makes a Credible 1.0? |
| 5 | +For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). |
6 | 6 |
|
7 | | -A production-ready DiD library needs: |
8 | | - |
9 | | -1. ✅ **Core estimators** - Basic DiD, TWFE, MultiPeriod, Staggered (Callaway-Sant'Anna), Synthetic DiD |
10 | | -2. ✅ **Valid inference** - Robust SEs, cluster SEs, wild bootstrap for few clusters |
11 | | -3. ✅ **Assumption diagnostics** - Parallel trends tests, placebo tests |
12 | | -4. ✅ **Sensitivity analysis** - What if parallel trends is violated? (Rambachan-Roth) |
13 | | -5. ✅ **Conditional parallel trends** - Covariate adjustment for staggered DiD |
14 | | -6. ✅ **Documentation** - API reference site for discoverability |
| 7 | +--- |
15 | 8 |
|
16 | | -**All 1.0 blockers are complete.** diff-diff has feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis. |
| 9 | +## Current Status (v1.0.2) |
17 | 10 |
|
18 | | ---- |
| 11 | +diff-diff is a **production-ready** DiD library that has feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis:
19 | 12 |
|
20 | | -## Status Overview |
21 | | - |
22 | | -| Feature | Status | Priority | Why It Matters | |
23 | | -|---------|--------|----------|----------------| |
24 | | -| Honest DiD (Rambachan-Roth) | ✅ Done | — | Reviewers expect sensitivity analysis | |
25 | | -| CallawaySantAnna Covariates | ✅ Done | — | Conditional PT often required in practice | |
26 | | -| API Documentation Site | ✅ Done | — | Credibility and discoverability | |
27 | | -| Goodman-Bacon Decomposition | ✅ Done | — | Explains when TWFE fails | |
28 | | -| Power Analysis | ✅ Done | — | Study design tool | |
29 | | -| CallawaySantAnna Bootstrap | ✅ Done | — | Valid inference with few clusters | |
30 | | -| Sun-Abraham Estimator | Not Started | Post-1.0 | Alternative to CS, some prefer it | |
31 | | -| Gardner's did2s | Not Started | Post-1.0 | Two-stage approach, available in pyfixest | |
32 | | -| Local Projections DiD | Not Started | Post-1.0 | Dynamic effects (Dube et al. 2023) | |
33 | | -| Borusyak-Jaravel-Spiess | Not Started | Post-1.0 | More efficient under homogeneous effects | |
34 | | -| Double/Debiased ML | Not Started | Post-1.0 | High-dimensional covariates | |
| 13 | +- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Synthetic DiD |
| 14 | +- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap |
| 15 | +- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition |
| 16 | +- **Sensitivity analysis**: Honest DiD (Rambachan-Roth) |
| 17 | +- **Study design**: Power analysis tools |
35 | 18 |
|
36 | 19 | --- |
37 | 20 |
|
38 | | -## 1.0 Target Features |
| 21 | +## Near-Term Enhancements (v1.1–v1.2) |
39 | 22 |
|
40 | | -These would strengthen the 1.0 release but aren't strictly blocking. |
| 23 | +High-value additions building on our existing foundation. |
41 | 24 |
|
42 | | -### ✅ Goodman-Bacon Decomposition (Done) |
| 25 | +### Sun-Abraham Estimator |
43 | 26 |
|
44 | | -Helps users understand *why* TWFE can be biased with staggered adoption. Shows weights on "forbidden comparisons" (already-treated as controls). Essential diagnostic before deciding whether to use Callaway-Sant'Anna. |
| 27 | +Interaction-weighted estimator providing an alternative to Callaway-Sant'Anna. Many practitioners run both as a robustness check. |
45 | 28 |
|
46 | | -- ✅ Decompose TWFE into 2x2 comparisons |
47 | | -- ✅ Show weights by comparison type (clean vs. forbidden) |
48 | | -- ✅ Visualization of decomposition (scatter and bar charts) |
49 | | -- ✅ Integration with `TwoWayFixedEffects.decompose()` method |
50 | | -- ✅ Automatic warning when TWFE detects staggered treatment timing |
| 29 | +- Event-study coefficients via saturated regression with cohort-time interactions |
| 30 | +- Different weighting scheme than CS; can give different results under heterogeneous effects |
| 31 | +- Agreement between CS and SA estimates serves as a useful robustness check
51 | 32 |
|
52 | | -**Reference**: Goodman-Bacon (2021). *Journal of Econometrics*. |
| 33 | +**Reference**: Sun & Abraham (2021). *Journal of Econometrics*. |
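A minimal sketch of the aggregation step behind the interaction-weighted approach, assuming the cohort-by-relative-period effects have already been estimated (for example by the saturated regression described above). Each relative period's estimate is the average of the cohort-specific effects, weighted by each cohort's share among the cohorts observed at that relative period. The function name and input format are hypothetical and are not part of the diff-diff API.

```python
from collections import defaultdict


def interaction_weighted(catt, cohort_sizes):
    """Average cohort-specific effects into one estimate per relative period.

    catt: {(cohort, rel_period): estimate}   (hypothetical input format)
    cohort_sizes: {cohort: number of units in that cohort}
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (cohort, rel_period), estimate in catt.items():
        # A cohort only contributes to relative periods it is observed in.
        weighted_sum[rel_period] += cohort_sizes[cohort] * estimate
        total_weight[rel_period] += cohort_sizes[cohort]
    return {l: weighted_sum[l] / total_weight[l] for l in sorted(weighted_sum)}


# Illustrative numbers only: the 2020 cohort is not yet observed at rel_period 1.
catt = {(2019, 0): 1.0, (2019, 1): 2.0, (2020, 0): 3.0}
cohort_sizes = {2019: 30, 2020: 10}
iw = interaction_weighted(catt, cohort_sizes)
print(iw)
```

Because the weights are cohort shares, they are non-negative and sum to one within each relative period, which is what separates this aggregation from the implicit weights of a pooled TWFE event study.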
53 | 34 |
|
54 | | -### ✅ Power Analysis Tools (Done) |
| 35 | +### Borusyak-Jaravel-Spiess Imputation Estimator |
55 | 36 |
|
56 | | -Practitioners need to know "how many units/periods do I need to detect an effect of size X?" Now available in diff-diff. |
| 37 | +More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation. |
57 | 38 |
|
58 | | -- ✅ Minimum detectable effect given sample size |
59 | | -- ✅ Required sample size for target power |
60 | | -- ✅ Simulation-based power for any estimator (including staggered designs) |
61 | | -- ✅ Visualization of power curves |
62 | | -- ✅ Panel data considerations (ICC, multiple periods) |
| 39 | +- Imputes untreated potential outcomes using pre-treatment data |
| 40 | +- More efficient under homogeneous effects assumption |
| 41 | +- Can handle unbalanced panels more naturally |
63 | 42 |
|
64 | | -**References**: Bloom (1995); Burlig, Preonas, & Woerman (2020). |
| 43 | +**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*. |
65 | 44 |
|
66 | | -### ✅ CallawaySantAnna Bootstrap Inference (Done) |
| 45 | +### Gardner's Two-Stage DiD (did2s) |
67 | 46 |
|
68 | | -With few clusters or groups, analytical SEs may be unreliable. Multiplier bootstrap provides valid inference following the R `did` package approach. |
| 47 | +Two-stage approach gaining traction in applied work. First residualizes outcomes, then estimates effects. |
69 | 48 |
|
70 | | -- ✅ Multiplier bootstrap at unit level with influence function perturbation |
71 | | -- ✅ Aggregate bootstrap samples for overall ATT, event study, and group effects |
72 | | -- ✅ Rademacher, Mammen, and Webb weight distributions |
73 | | -- ✅ Percentile confidence intervals and bootstrap p-values |
| 49 | +- Stage 1: Estimate unit and time FEs using only untreated observations |
| 50 | +- Stage 2: Regress residualized outcomes on treatment indicators |
| 51 | +- Clean separation of identification and estimation |
74 | 52 |
|
75 | | -**Reference**: Callaway & Sant'Anna (2021). *Journal of Econometrics*. |
| 53 | +**Reference**: Gardner (2022). *Working Paper*. |
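The two stages listed above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the planned implementation: the fixed effects are fitted by alternating unit and time means on untreated observations, every unit and period must appear at least once among the untreated observations, and no standard errors are computed. The function name and the data layout are hypothetical.

```python
def two_stage_did(rows, n_iter=500):
    """rows: list of (unit, time, treated, y). Returns the stage-2 coefficient."""
    untreated = [(u, t, y) for u, t, d, y in rows if d == 0]
    unit_fe = {u: 0.0 for u, _, _ in untreated}
    time_fe = {t: 0.0 for _, t, _ in untreated}

    # Stage 1: fit y = unit_fe + time_fe on untreated observations only,
    # alternating between unit means and time means of the partial residuals.
    for _ in range(n_iter):
        for u in unit_fe:
            resid = [y - time_fe[t] for uu, t, y in untreated if uu == u]
            unit_fe[u] = sum(resid) / len(resid)
        for t in time_fe:
            resid = [y - unit_fe[u] for u, tt, y in untreated if tt == t]
            time_fe[t] = sum(resid) / len(resid)

    # Stage 2: regress residualized outcomes on the treatment indicator
    # (no intercept, since untreated residuals are centered by construction).
    num = den = 0.0
    for u, t, d, y in rows:
        y_resid = y - unit_fe[u] - time_fe[t]
        num += d * y_resid
        den += d * d
    return num / den


# Noise-free toy panel with a true effect of 2.0; units 0 and 1 are never treated.
unit_effect = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
time_effect = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5, 4: 2.0}
first_treated = {2: 2, 3: 3}
rows = []
for u in unit_effect:
    for t in time_effect:
        d = 1 if u in first_treated and t >= first_treated[u] else 0
        rows.append((u, t, d, unit_effect[u] + time_effect[t] + 2.0 * d))

tau_hat = two_stage_did(rows)
print(round(tau_hat, 6))
```

A real implementation would also need to adjust the stage-2 standard errors for the fact that the fixed effects are estimated in stage 1.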
76 | 54 |
|
77 | 55 | ### Enhanced Visualization |
78 | 56 |
|
79 | 57 | - Synthetic control weight visualization (bar chart of unit weights) |
80 | | -- ✅ Bacon decomposition visualization (scatter and bar charts) |
81 | | -- Treatment adoption "staircase" plot |
| 58 | +- Treatment adoption "staircase" plot for staggered designs |
| 59 | +- Interactive plots with plotly backend option |
82 | 60 |
|
83 | 61 | --- |
84 | 62 |
|
85 | | -## Post-1.0 Features |
| 63 | +## Medium-Term Enhancements (v1.3+) |
86 | 64 |
|
87 | | -These are valuable but can wait for future versions. |
| 65 | +Extending diff-diff to handle more complex settings. |
88 | 66 |
|
89 | | -### Sun-Abraham Estimator |
| 67 | +### Continuous Treatment DiD |
90 | 68 |
|
91 | | -Alternative to Callaway-Sant'Anna using interaction-weighted approach. Some practitioners prefer it; provides a robustness check. |
| 69 | +Many treatments vary in dose or intensity rather than being binary (on/off). This is an active research area.
92 | 70 |
|
93 | | -**Reference**: Sun & Abraham (2021). *Journal of Econometrics*. |
| 71 | +- Average treatment effect on the treated (ATT) parameters under generalized parallel trends
| 72 | +- Dose-response curves and marginal effects |
| 73 | +- Handle settings where "dose" varies across units and time |
| 74 | +- Event studies with continuous treatments |
94 | 75 |
|
95 | | -### Gardner's Two-Stage DiD (did2s) |
| 76 | +**References**: |
| 77 | +- [Callaway, Goodman-Bacon & Sant'Anna (2024)](https://arxiv.org/abs/2107.02637). *NBER Working Paper*. |
| 78 | +- [de Chaisemartin, D'Haultfœuille & Vazquez-Bare (2024)](https://arxiv.org/abs/2402.05432). *AEA Papers and Proceedings*. |
| 79 | + |
| 80 | +### de Chaisemartin-D'Haultfœuille Estimator |
| 81 | + |
| 82 | +Handles treatment that switches on and off (reversible treatments), unlike most other methods. |
96 | 83 |
|
97 | | -Two-stage approach to staggered DiD that first residualizes outcomes using untreated observations, then estimates treatment effects. Available in pyfixest (Python) and did2s (R). |
| 84 | +- Allows units to move into and out of treatment |
| 85 | +- Time-varying, heterogeneous treatment effects |
| 86 | +- Comparison with never-switchers or flexible control groups |
| 87 | +- Different assumptions than CS/SA—useful for different settings |
98 | 88 |
|
99 | | -**Reference**: Gardner (2022). *Two-stage differences in differences*. |
| 89 | +**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*. |
100 | 90 |
|
101 | 91 | ### Local Projections DiD |
102 | 92 |
|
103 | | -Implements local projections for dynamic treatment effects. Flexible approach that doesn't require specifying the full dynamic structure. Gaining traction in applied work. |
| 93 | +Implements local projections for dynamic treatment effects. Doesn't require specifying the full dynamic structure.
| 94 | + |
| 95 | +- Flexible impulse response estimation |
| 96 | +- Robust to misspecification of dynamics |
| 97 | +- Natural handling of anticipation effects |
| 98 | +- Growing use in macroeconomics and policy evaluation |
104 | 99 |
|
105 | 100 | **Reference**: Dube, Girardi, Jordà, and Taylor (2023). |
106 | 101 |
|
107 | | -### Borusyak-Jaravel-Spiess Imputation Estimator |
| 102 | +### Nonlinear DiD |
108 | 103 |
|
109 | | -More efficient than Callaway-Sant'Anna when parallel trends holds across all periods. Uses imputation approach. |
| 104 | +For outcomes where linear models are inappropriate (binary, count, bounded). |
110 | 105 |
|
111 | | -**Reference**: Borusyak, Jaravel, and Spiess (2024). |
| 106 | +- Logit/probit DiD for binary outcomes |
| 107 | +- Poisson DiD for count outcomes |
| 108 | +- Flexible strategies for staggered designs with nonlinear models |
| 109 | +- Proper handling of incidence rate ratios and odds ratios |
112 | 110 |
|
113 | | -### Double/Debiased ML for DiD |
| 111 | +**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*. |
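As a worked example of the incidence rate ratio in the simplest case: in a saturated 2x2 Poisson model with a log link, the exponentiated interaction coefficient equals the ratio of the treated group's growth to the control group's growth in mean counts. The sketch below computes that ratio directly from four cell means. The function name and the numbers are illustrative assumptions, and the ratio is interpretable as a treatment effect only if parallel trends holds on the multiplicative (log-mean) scale.

```python
import math


def ratio_of_ratios(treated_pre, treated_post, control_pre, control_post):
    """Return (IRR, log IRR) from the four cell means of a 2x2 count design."""
    treated_growth = treated_post / treated_pre
    control_growth = control_post / control_pre
    irr = treated_growth / control_growth
    # log(irr) is what the interaction coefficient of a saturated
    # Poisson regression would equal on the same cell means.
    return irr, math.log(irr)


# Illustrative cell means: the treated group triples, the control group doubles.
irr, log_irr = ratio_of_ratios(10.0, 30.0, 10.0, 20.0)
print(irr, round(log_irr, 4))
```

An IRR of 1.5 here reads as "counts in the treated group are 50% higher than they would have been had they grown at the control group's rate".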
114 | 112 |
|
115 | | -For high-dimensional settings with many covariates. Uses ML for nuisance parameter estimation with cross-fitting. |
| 113 | +--- |
116 | 114 |
|
117 | | -**Reference**: Chernozhukov et al. (2018), Chang (2020). |
| 115 | +## Long-Term Research Directions (v2.0+) |
118 | 116 |
|
119 | | -### Alternative Inference Methods |
| 117 | +Frontier methods requiring more research investment. |
120 | 118 |
|
121 | | -- Randomization inference for small samples |
122 | | -- Bayesian DiD with prior on parallel trends |
123 | | -- Conformal inference for prediction intervals |
| 119 | +### Matrix Completion Methods |
124 | 120 |
|
125 | | ---- |
| 121 | +Unified framework encompassing synthetic control and regression approaches. Exploits both cross-sectional and time-series patterns in the data.
126 | 122 |
|
127 | | -## Release History |
| 123 | +- Nuclear norm regularization for low-rank structure |
| 124 | +- Handles missing data patterns common in panel settings |
| 125 | +- Bridges synthetic control (few units, many periods) and regression (many units, few periods) |
| 126 | +- Confidence intervals via debiasing |
128 | 127 |
|
129 | | -### v0.9.0 (Current) |
| 128 | +**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*. |
130 | 129 |
|
131 | | -- ✅ Callaway-Sant'Anna multiplier bootstrap inference |
132 | | -- ✅ Rademacher, Mammen, and Webb weight distributions |
133 | | -- ✅ Bootstrap SEs, CIs, and p-values for all aggregations (overall ATT, event study, group effects) |
134 | | -- ✅ `CSBootstrapResults` dataclass for bootstrap results |
| 130 | +### Causal Forests for DiD |
135 | 131 |
|
136 | | -### v0.8.0 |
| 132 | +Machine learning methods for discovering heterogeneous treatment effects in DiD settings. |
137 | 133 |
|
138 | | -- ✅ Power analysis tools (`PowerAnalysis`, `simulate_power`) |
139 | | -- ✅ MDE, sample size, and power calculations |
140 | | -- ✅ Simulation-based power for any DiD estimator |
141 | | -- ✅ Power curve visualization (`plot_power_curve`) |
142 | | -- ✅ Panel data support with ICC adjustment |
| 134 | +- Estimate treatment effect heterogeneity across covariates |
| 135 | +- Data-driven subgroup discovery |
| 136 | +- Combine with DiD identification for observational data |
| 137 | +- Honest confidence intervals for discovered heterogeneity |
143 | 138 |
|
144 | | -### v0.7.0 |
| 139 | +**References**: |
| 140 | +- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*. |
| 141 | +- Athey, Tibshirani & Wager (2019). *Annals of Statistics*.
145 | 142 |
|
146 | | -- ✅ Goodman-Bacon decomposition for TWFE diagnostics |
147 | | -- ✅ `plot_bacon()` visualization (scatter and bar charts) |
148 | | -- ✅ `TwoWayFixedEffects.decompose()` integration |
149 | | -- ✅ Automatic staggered treatment warning in TWFE |
| 143 | +### Double/Debiased ML for DiD |
| 144 | + |
| 145 | +For high-dimensional settings with many potential confounders. |
| 146 | + |
| 147 | +- ML for nuisance parameter estimation (propensity, outcome models) |
| 148 | +- Cross-fitting for valid inference |
| 149 | +- Accommodates many covariates while limiting overfitting bias
| 150 | +- Doubly-robust estimation with ML flexibility |
150 | 151 |
|
151 | | -### v0.6.0 |
| 152 | +**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*. |
152 | 153 |
|
153 | | -- ✅ **All 1.0 Blockers Complete** |
154 | | -- ✅ Honest DiD sensitivity analysis (Rambachan & Roth 2023) |
155 | | -- ✅ CallawaySantAnna covariate adjustment (DR, IPW, Reg) |
156 | | -- ✅ API documentation site with Sphinx |
| 154 | +### Alternative Inference Methods |
157 | 155 |
|
158 | | -### v0.5.0 |
| 156 | +- **Randomization inference**: Exact p-values for small samples |
| 157 | +- **Bayesian DiD**: Priors on parallel trends violations |
| 158 | +- **Conformal inference**: Prediction intervals with finite-sample guarantees |
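To illustrate the randomization inference item, here is a minimal sketch of an exact p-value for a two-period DiD. It assumes a fixed number of treated units assigned completely at random and tests the sharp null of no effect for any unit. It enumerates every possible assignment, so it is only practical for small samples. The function name and inputs are hypothetical.

```python
from itertools import combinations


def randomization_p_value(changes, treated_ids):
    """changes: {unit: post-minus-pre outcome change}. Two-sided exact p-value."""
    units = sorted(changes)

    def did(treated):
        treated = set(treated)
        t = [changes[u] for u in units if u in treated]
        c = [changes[u] for u in units if u not in treated]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(did(treated_ids))
    assignments = list(combinations(units, len(treated_ids)))
    # Count assignments at least as extreme as the observed one
    # (small tolerance guards against floating-point ties).
    extreme = sum(1 for a in assignments if abs(did(a)) >= observed - 1e-12)
    return extreme / len(assignments)


# Illustrative data: six units, three treated.
changes = {"a": 5.0, "b": 6.0, "c": 5.5, "d": 1.0, "e": 0.5, "f": 1.5}
p_value = randomization_p_value(changes, ["a", "b", "c"])
print(p_value)
```

With six units and three treated there are only 20 possible assignments, so the smallest attainable two-sided p-value is 2/20 = 0.1. This shows why exact methods matter in small samples: no test statistic can reach conventional significance here.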
159 | 159 |
|
160 | | -- Wild cluster bootstrap (Rademacher, Webb, Mammen weights) |
161 | | -- Placebo tests module |
162 | | -- Tutorial notebooks |
| 160 | +--- |
163 | 161 |
|
164 | | -### v0.4.0 |
| 162 | +## Infrastructure Improvements |
165 | 163 |
|
166 | | -- Callaway-Sant'Anna estimator for staggered DiD |
167 | | -- Event study and group effects visualization |
168 | | -- Parallel trends testing utilities |
| 164 | +Ongoing maintenance and developer experience. |
169 | 165 |
|
170 | | -### v0.3.0 |
| 166 | +### Performance |
171 | 167 |
|
172 | | -- Synthetic Difference-in-Differences |
173 | | -- Multi-period DiD with event study |
174 | | -- Data preparation utilities |
| 168 | +- JIT compilation for bootstrap loops (numba) |
| 169 | +- Parallel bootstrap iterations |
| 170 | +- Sparse matrix handling for large fixed effects |
| 171 | +- Memory-efficient estimation for large panels |
175 | 172 |
|
176 | | -### v0.2.0 |
| 173 | +### Code Quality |
177 | 174 |
|
178 | | -- Two-Way Fixed Effects estimator |
179 | | -- Fixed effects support (absorb parameter) |
180 | | -- Cluster-robust standard errors |
181 | | -- Formula interface |
| 175 | +- Extract shared within-transformation logic to utils |
| 176 | +- Consolidate linear regression helpers |
| 177 | +- Consider splitting `staggered.py` (1800+ lines) |
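For context on the first item, a shared within-transformation helper might look roughly like the sketch below. It is a hypothetical standalone function, not existing diff-diff code, and it assumes a balanced panel, where two-way demeaning has the closed form `y - unit_mean - time_mean + grand_mean`. Unbalanced panels would need an iterative approach instead.

```python
def within_transform(panel):
    """Two-way demean a balanced panel given as {(unit, time): value}."""
    units = sorted({u for u, _ in panel})
    times = sorted({t for _, t in panel})
    if len(panel) != len(units) * len(times):
        raise ValueError("closed-form two-way demeaning requires a balanced panel")

    grand_mean = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[u, t] for t in times) / len(times) for u in units}
    time_mean = {t: sum(panel[u, t] for u in units) / len(units) for t in times}
    return {
        (u, t): panel[u, t] - unit_mean[u] - time_mean[t] + grand_mean
        for (u, t) in panel
    }


# A purely additive panel (unit effect + time effect) demeans to zero everywhere.
panel = {(u, t): 10.0 * u + t for u in range(3) for t in range(4)}
demeaned = within_transform(panel)
print(max(abs(v) for v in demeaned.values()))
```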
182 | 178 |
|
183 | | -### v0.1.0 |
| 179 | +### Documentation |
184 | 180 |
|
185 | | -- Initial release with basic DiD estimator |
| 181 | +- Real-world data examples (beyond synthetic) |
| 182 | +- Performance benchmarks vs. R packages |
| 183 | +- Video tutorials and worked examples |
186 | 184 |
|
187 | 185 | --- |
188 | 186 |
|
189 | 187 | ## Contributing |
190 | 188 |
|
191 | | -Interested in contributing? See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. Features marked "Not Started" are good candidates for contributions. |
| 189 | +Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. |
| 190 | + |
| 191 | +Key references for implementation: |
| 192 | +- [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*. |
| 193 | +- [Baker et al. (2025)](https://arxiv.org/pdf/2503.13323). "Difference-in-Differences Designs: A Practitioner's Guide." |