igerber
diff --git a/ROADMAP.md b/ROADMAP.md
Lines changed: 158 additions & 113 deletions
@@ -2,190 +2,235 @@
 
 This document outlines the feature roadmap for diff-diff, prioritized by practitioner value and academic credibility.
 
-## What Makes a Credible 1.0?
+For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
-A production-ready DiD library needs:
+---
+
+## Current Status (v1.0.2)
 
-1. ✅ **Core estimators** - Basic DiD, TWFE, MultiPeriod, Staggered (Callaway-Sant'Anna), Synthetic DiD
-2. ✅ **Valid inference** - Robust SEs, cluster SEs, wild bootstrap for few clusters
-3. ✅ **Assumption diagnostics** - Parallel trends tests, placebo tests
-4. ✅ **Sensitivity analysis** - What if parallel trends is violated? (Rambachan-Roth)
-5. ✅ **Conditional parallel trends** - Covariate adjustment for staggered DiD
-6. ✅ **Documentation** - API reference site for discoverability
+diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis:
 
-**All 1.0 blockers are complete.** diff-diff has feature parity with R's `did` + `HonestDiD` ecosystem for core DiD analysis.
+- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Synthetic DiD
+- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap
+- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
+- **Sensitivity analysis**: Honest DiD (Rambachan-Roth)
+- **Study design**: Power analysis tools
 
 ---
 
-## Status Overview
-
-| Feature | Status | Priority | Why It Matters |
-|---------|--------|----------|----------------|
-| Honest DiD (Rambachan-Roth) | ✅ Done | — | Reviewers expect sensitivity analysis |
-| CallawaySantAnna Covariates | ✅ Done | — | Conditional PT often required in practice |
-| API Documentation Site | ✅ Done | — | Credibility and discoverability |
-| Goodman-Bacon Decomposition | ✅ Done | — | Explains when TWFE fails |
-| Power Analysis | ✅ Done | — | Study design tool |
-| CallawaySantAnna Bootstrap | ✅ Done | — | Valid inference with few clusters |
-| Sun-Abraham Estimator | Not Started | Post-1.0 | Alternative to CS, some prefer it |
-| Gardner's did2s | Not Started | Post-1.0 | Two-stage approach, available in pyfixest |
-| Local Projections DiD | Not Started | Post-1.0 | Dynamic effects (Dube et al. 2023) |
-| Borusyak-Jaravel-Spiess | Not Started | Post-1.0 | More efficient under homogeneous effects |
-| Double/Debiased ML | Not Started | Post-1.0 | High-dimensional covariates |
+## Near-Term Enhancements (v1.1–v1.2)
 
----
+High-value additions building on our existing foundation.
+
+### Sun-Abraham Estimator
+
+Interaction-weighted estimator providing an alternative to Callaway-Sant'Anna. Many practitioners run both as a robustness check.
+
+- Event-study coefficients via saturated regression with cohort-time interactions
+- Different weighting scheme than CS; can give different results under heterogeneous effects
+- Agreement between CS and SA estimates is a useful robustness check
+
+**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
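
As a rough illustration of the interaction-weighted idea, the sketch below averages cohort-specific effects at a single relative period, using cohort sample shares as weights. The function name `iw_aggregate` and all input numbers are hypothetical and are not part of diff-diff's API; in practice the cohort-specific effects would come from the saturated regression described above.

```python
def iw_aggregate(cohort_effects, cohort_sizes):
    """Share-weighted average of cohort-specific effects at one relative period.

    cohort_effects: {cohort: estimated effect at this relative period}
    cohort_sizes:   {cohort: number of units in the cohort}
    Only cohorts that appear in cohort_effects contribute to the weights.
    """
    total = sum(cohort_sizes[c] for c in cohort_effects)
    return sum(cohort_effects[c] * cohort_sizes[c] / total for c in cohort_effects)


# Hypothetical inputs: three adoption cohorts, effects two periods after treatment.
effects_l2 = {2018: 1.0, 2019: 2.0, 2020: 4.0}
sizes = {2018: 50, 2019: 30, 2020: 20}
iw_estimate = iw_aggregate(effects_l2, sizes)
```

With these made-up numbers the larger, earlier cohort dominates the average, which is why the result can differ from an estimator that weights cohorts differently.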
 
-## 1.0 Target Features
+### Borusyak-Jaravel-Spiess Imputation Estimator
+
+More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation.
 
-These would strengthen the 1.0 release but aren't strictly blocking.
+- Imputes untreated potential outcomes from a model fit on untreated (never-treated and not-yet-treated) observations
+- More efficient under homogeneous effects assumption
+- Can handle unbalanced panels more naturally
 
-### ✅ Goodman-Bacon Decomposition (Done)
+**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*.
 
-Helps users understand *why* TWFE can be biased with staggered adoption. Shows weights on "forbidden comparisons" (already-treated as controls). Essential diagnostic before deciding whether to use Callaway-Sant'Anna.
+### Gardner's Two-Stage DiD (did2s)
 
-- ✅ Decompose TWFE into 2x2 comparisons
-- ✅ Show weights by comparison type (clean vs. forbidden)
-- ✅ Visualization of decomposition (scatter and bar charts)
-- ✅ Integration with `TwoWayFixedEffects.decompose()` method
-- ✅ Automatic warning when TWFE detects staggered treatment timing
+Two-stage approach gaining traction in applied work. First residualizes outcomes, then estimates effects.
 
-**Reference**: Goodman-Bacon (2021). *Journal of Econometrics*.
+- Stage 1: Estimate unit and time FEs using only untreated observations
+- Stage 2: Regress residualized outcomes on treatment indicators
+- Clean separation of identification and estimation
 
-### ✅ Power Analysis Tools (Done)
+**Reference**: Gardner (2022). *Working Paper*.
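
The two stages can be sketched in plain Python as below. This is a minimal illustration under strong assumptions (a single treatment dummy, no covariates, no standard errors), and the function name `did2s` and the toy panel are hypothetical rather than a planned diff-diff interface. Stage 1 fits the fixed effects by alternating averages instead of a regression library, which is a simplification chosen to keep the example dependency-free.

```python
from collections import defaultdict


def did2s(rows, n_iter=500):
    """Two-stage DiD sketch. rows: list of (unit, period, y, d) with d in {0, 1}.

    Assumes every unit and every period has at least one untreated observation;
    otherwise the missing fixed effect silently defaults to zero.
    """
    untreated = [r for r in rows if r[3] == 0]
    unit_fe = defaultdict(float)
    time_fe = defaultdict(float)

    # Stage 1: fit y = unit_fe + time_fe on untreated observations only,
    # alternating between the two sets of fixed effects until they settle.
    for _ in range(n_iter):
        sums, counts = defaultdict(float), defaultdict(int)
        for u, t, y, _d in untreated:
            sums[u] += y - time_fe[t]
            counts[u] += 1
        for u in sums:
            unit_fe[u] = sums[u] / counts[u]
        sums, counts = defaultdict(float), defaultdict(int)
        for u, t, y, _d in untreated:
            sums[t] += y - unit_fe[u]
            counts[t] += 1
        for t in sums:
            time_fe[t] = sums[t] / counts[t]

    # Stage 2: no-intercept OLS of residualized outcomes on the treatment dummy.
    resid = [(y - unit_fe[u] - time_fe[t], d) for u, t, y, d in rows]
    return sum(r * d for r, d in resid) / sum(d * d for _, d in resid)


# Toy staggered panel (hypothetical): two never-treated units, two cohorts.
first_treated = {0: None, 1: None, 2: 2, 3: 2, 4: 3, 5: 3}
true_effect = 2.0
panel = []
for u, start in first_treated.items():
    for t in range(5):
        d = int(start is not None and t >= start)
        panel.append((u, t, 0.5 * u + 0.1 * t * t + true_effect * d, d))
att_hat = did2s(panel)
```

Because the toy data are noise-free and additive, `att_hat` should recover `true_effect` up to numerical tolerance. Valid inference would additionally need to account for the first-stage estimation, which this sketch omits.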
 
-Practitioners need to know "how many units/periods do I need to detect an effect of size X?" Now available in diff-diff.
+### Triple Difference (DDD) Estimators
 
-- ✅ Minimum detectable effect given sample size
-- ✅ Required sample size for target power
-- ✅ Simulation-based power for any estimator (including staggered designs)
-- ✅ Visualization of power curves
-- ✅ Panel data considerations (ICC, multiple periods)
+Extends DiD to settings requiring a third differencing dimension. Common DDD implementations are invalid when covariates are needed for identification.
 
-**References**: Bloom (1995); Burlig, Preonas, & Woerman (2020).
+- Regression adjustment, IPW, and doubly robust DDD estimators
+- Staggered adoption support with multiple comparison groups
+- Proper covariate integration (the naive "difference of two DiDs" approach fails when covariates are needed)
+- Bias reduction and precision gains over standard approaches
 
-### ✅ CallawaySantAnna Bootstrap Inference (Done)
+**Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). *Working Paper*. R package: `triplediff`.
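
For orientation, the unconditional triple difference is just the difference between two 2x2 DiDs computed from cell means. The sketch below shows only that baseline case without covariates; it is not the regression-adjustment, IPW, or doubly robust estimator from the reference, and the names `did`, `triple_difference`, and the toy data are hypothetical.

```python
from statistics import mean


def did(cells):
    """2x2 DiD from cell means keyed by (treated_group, post)."""
    return (cells[(1, 1)] - cells[(1, 0)]) - (cells[(0, 1)] - cells[(0, 0)])


def triple_difference(obs):
    """obs: list of (treated_group, eligible, post, y), each indicator in {0, 1}."""
    def cell_means(eligible):
        return {
            (g, p): mean(y for gg, e, pp, y in obs if (gg, e, pp) == (g, eligible, p))
            for g in (0, 1) for p in (0, 1)
        }
    # DiD among eligible units minus DiD among ineligible units.
    return did(cell_means(1)) - did(cell_means(0))


# Toy data (hypothetical): one observation per cell, true DDD effect of 3.0.
toy = [
    (g, e, p, 1.0 * g + 0.5 * e + 0.3 * p + 0.2 * g * p + 0.4 * e * p + 0.1 * g * e + 3.0 * g * e * p)
    for g in (0, 1) for e in (0, 1) for p in (0, 1)
]
ddd_hat = triple_difference(toy)
```

All lower-order terms difference out here, leaving only the three-way interaction. Once identification requires conditioning on covariates, this simple subtraction is the approach the bullet above warns against.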
 
-With few clusters or groups, analytical SEs may be unreliable. Multiplier bootstrap provides valid inference following the R `did` package approach.
+### Pre-Trends Power Analysis
 
-- ✅ Multiplier bootstrap at unit level with influence function perturbation
-- ✅ Aggregate bootstrap samples for overall ATT, event study, and group effects
-- ✅ Rademacher, Mammen, and Webb weight distributions
-- ✅ Percentile confidence intervals and bootstrap p-values
+Assess whether pre-trends tests have adequate power to detect meaningful parallel trends violations. Complements our Honest DiD implementation.
 
-**Reference**: Callaway & Sant'Anna (2021). *Journal of Econometrics*.
+- Minimum detectable violation size for pre-trends tests
+- Visualization of power against various violation magnitudes
+- Integration with existing parallel trends diagnostics
+
+**Reference**: [Roth (2022)](https://www.aeaweb.org/articles?id=10.1257/aeri.20210236). *AER: Insights*. R package: `pretrends`.
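
A simplified version of the "minimum detectable violation" idea is sketched below for a single pre-period coefficient tested with a two-sided z-test. This is an assumption-laden illustration: Roth (2022) and the `pretrends` package handle joint tests across several pre-periods, which this sketch does not. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pretrend_test_power(violation, se, alpha=0.05):
    """Power of a two-sided z-test on one pre-period coefficient equal to `violation`."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = violation / se
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def minimum_detectable_violation(se, power=0.8, alpha=0.05, tol=1e-9):
    """Smallest positive violation detected with the target power, found by bisection."""
    lo, hi = 0.0, 10.0 * se
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pretrend_test_power(mid, se, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


mdv = minimum_detectable_violation(se=1.0)
```

Under these assumptions the minimum detectable violation at 80% power is roughly 2.8 standard errors, so a pre-trend smaller than that would often pass the test undetected.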
 
 ### Enhanced Visualization
 
 - Synthetic control weight visualization (bar chart of unit weights)
-- ✅ Bacon decomposition visualization (scatter and bar charts)
-- Treatment adoption "staircase" plot
+- Treatment adoption "staircase" plot for staggered designs
+- Interactive plots with plotly backend option
 
 ---
 
-## Post-1.0 Features
+## Medium-Term Enhancements (v1.3+)
 
-These are valuable but can wait for future versions.
+Extending diff-diff to handle more complex settings.
 
-### Sun-Abraham Estimator
+### Continuous Treatment DiD
 
-Alternative to Callaway-Sant'Anna using interaction-weighted approach. Some practitioners prefer it; provides a robustness check.
+Many treatments have dose/intensity rather than binary on/off. Active research area with recent breakthroughs.
 
-**Reference**: Sun & Abraham (2021). *Journal of Econometrics*.
+- Average treatment effect on the treated (ATT) parameters under generalized parallel trends
+- Dose-response curves and marginal effects
+- Handle settings where "dose" varies across units and time
+- Event studies with continuous treatments
 
-### Gardner's Two-Stage DiD (did2s)
+**References**:
+- [Callaway, Goodman-Bacon & Sant'Anna (2024)](https://arxiv.org/abs/2107.02637). *NBER Working Paper*.
+- [de Chaisemartin, D'Haultfœuille & Vazquez-Bare (2024)](https://arxiv.org/abs/2402.05432). *AEA Papers and Proceedings*.
+
+### de Chaisemartin-D'Haultfœuille Estimator
+
+Handles treatment that switches on and off (reversible treatments), unlike most other methods.
 
-Two-stage approach to staggered DiD that first residualizes outcomes using untreated observations, then estimates treatment effects. Available in pyfixest (Python) and did2s (R).
+- Allows units to move into and out of treatment
+- Time-varying, heterogeneous treatment effects
+- Comparison with never-switchers or flexible control groups
+- Different assumptions than CS/SA—useful for different settings
 
-**Reference**: Gardner (2022). *Two-stage differences in differences*.
+**Reference**: [de Chaisemartin & D'Haultfœuille (2020, 2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3980758). *American Economic Review*.
 
 ### Local Projections DiD
 
-Implements local projections for dynamic treatment effects. Flexible approach that doesn't require specifying the full dynamic structure. Gaining traction in applied work.
+Implements local projections for dynamic treatment effects without requiring the full dynamic structure to be specified.
+
+- Flexible impulse response estimation
+- Robust to misspecification of dynamics
+- Natural handling of anticipation effects
+- Growing use in macroeconomics and policy evaluation
 
 **Reference**: Dube, Girardi, Jordà, and Taylor (2023).
 
-### Borusyak-Jaravel-Spiess Imputation Estimator
+### Nonlinear DiD
 
-More efficient than Callaway-Sant'Anna when parallel trends holds across all periods. Uses imputation approach.
+For outcomes where linear models are inappropriate (binary, count, bounded).
 
-**Reference**: Borusyak, Jaravel, and Spiess (2024).
+- Logit/probit DiD for binary outcomes
+- Poisson DiD for count outcomes
+- Flexible strategies for staggered designs with nonlinear models
+- Proper handling of incidence rate ratios and odds ratios
 
-### Double/Debiased ML for DiD
+**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
 
-For high-dimensional settings with many covariates. Uses ML for nuisance parameter estimation with cross-fitting.
+### Doubly Robust DiD + Synthetic Control
 
-**Reference**: Chernozhukov et al. (2018), Chang (2020).
+Unified framework combining DiD and synthetic control with doubly robust identification—valid under *either* parallel trends or synthetic control assumptions.
 
-### Alternative Inference Methods
+- ATT identified under parallel trends OR group-level SC condition
+- Semiparametric estimation framework
+- Multiplier bootstrap for valid inference under either assumption
+- Strengthens credibility by avoiding the DiD vs. SC trade-off
+
+**Reference**: [Sun, Xie & Zhang (2025)](https://arxiv.org/abs/2503.11375). *Working Paper*.
 
-- Randomization inference for small samples
-- Bayesian DiD with prior on parallel trends
-- Conformal inference for prediction intervals
+### Causal Duration Analysis with DiD
+
+Extends DiD to duration/survival outcomes where standard methods fail (hazard rates, time-to-event).
+
+- Duration analogue of parallel trends on hazard rates
+- Avoids distributional assumptions and hazard function specification
+- Visual and formal pre-trends assessment for duration data
+- Handles absorbing states approaching probability bounds
+
+**Reference**: [Deaner & Ku (2025)](https://www.aeaweb.org/conference/2025/program/paper/k77Kh8iS). *AEA Conference Paper*.
 
 ---
 
-## Release History
+## Long-Term Research Directions (v2.0+)
+
+Frontier methods requiring more research investment.
 
-### v0.9.0 (Current)
+### Matrix Completion Methods
 
-- ✅ Callaway-Sant'Anna multiplier bootstrap inference
-- ✅ Rademacher, Mammen, and Webb weight distributions
-- ✅ Bootstrap SEs, CIs, and p-values for all aggregations (overall ATT, event study, group effects)
-- ✅ `CSBootstrapResults` dataclass for bootstrap results
+Unified framework encompassing synthetic control and regression approaches. Moves seamlessly between cross-sectional and time-series patterns.
 
-### v0.8.0
+- Nuclear norm regularization for low-rank structure
+- Handles missing data patterns common in panel settings
+- Bridges synthetic control (few units, many periods) and regression (many units, few periods)
+- Confidence intervals via debiasing
 
-- ✅ Power analysis tools (`PowerAnalysis`, `simulate_power`)
-- ✅ MDE, sample size, and power calculations
-- ✅ Simulation-based power for any DiD estimator
-- ✅ Power curve visualization (`plot_power_curve`)
-- ✅ Panel data support with ICC adjustment
+**Reference**: [Athey et al. (2021)](https://arxiv.org/abs/1710.10251). *Journal of the American Statistical Association*.
 
-### v0.7.0
+### Causal Forests for DiD
 
-- ✅ Goodman-Bacon decomposition for TWFE diagnostics
-- ✅ `plot_bacon()` visualization (scatter and bar charts)
-- ✅ `TwoWayFixedEffects.decompose()` integration
-- ✅ Automatic staggered treatment warning in TWFE
+Machine learning methods for discovering heterogeneous treatment effects in DiD settings.
 
-### v0.6.0
+- Estimate treatment effect heterogeneity across covariates
+- Data-driven subgroup discovery
+- Combine with DiD identification for observational data
+- Honest confidence intervals for discovered heterogeneity
 
-- ✅ **All 1.0 Blockers Complete**
-- ✅ Honest DiD sensitivity analysis (Rambachan & Roth 2023)
-- ✅ CallawaySantAnna covariate adjustment (DR, IPW, Reg)
-- ✅ API documentation site with Sphinx
+**References**:
+- [Kattenberg, Scheer & Thiel (2023)](https://ideas.repec.org/p/cpb/discus/452.html). *CPB Discussion Paper*.
+- Athey, Tibshirani & Wager (2019). *Annals of Statistics*.
 
-### v0.5.0
+### Double/Debiased ML for DiD
+
+For high-dimensional settings with many potential confounders.
 
-- Wild cluster bootstrap (Rademacher, Webb, Mammen weights)
-- Placebo tests module
-- Tutorial notebooks
+- ML for nuisance parameter estimation (propensity, outcome models)
+- Cross-fitting for valid inference
+- Accommodates many covariates while limiting overfitting bias
+- Doubly-robust estimation with ML flexibility
 
-### v0.4.0
+**Reference**: Chernozhukov et al. (2018). *The Econometrics Journal*.
 
-- Callaway-Sant'Anna estimator for staggered DiD
-- Event study and group effects visualization
-- Parallel trends testing utilities
+### Alternative Inference Methods
 
-### v0.3.0
+- **Randomization inference**: Exact p-values for small samples
+- **Bayesian DiD**: Priors on parallel trends violations
+- **Conformal inference**: Prediction intervals with finite-sample guarantees
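
To make the randomization inference item concrete, the sketch below computes an exact two-sided p-value for a difference in means by enumerating every assignment with the same number of treated units. It assumes a sharp null of no effect for any unit and complete randomization; in a DiD setting the `outcomes` could be unit-level pre/post changes. The function name and data are hypothetical, and full enumeration is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(outcomes, treated_idx):
    """Exact two-sided p-value for a difference in means under the sharp null."""
    n, k = len(outcomes), len(treated_idx)

    def diff_in_means(idx):
        idx = set(idx)
        treated = [y for i, y in enumerate(outcomes) if i in idx]
        control = [y for i, y in enumerate(outcomes) if i not in idx]
        return mean(treated) - mean(control)

    observed = abs(diff_in_means(treated_idx))
    # Test statistic under every possible assignment of k treated units.
    stats = [abs(diff_in_means(c)) for c in combinations(range(n), k)]
    # Small tolerance so floating-point ties count as "at least as extreme".
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)


# Hypothetical unit-level changes; the first three units are treated.
changes = [5.1, 4.8, 5.3, 1.0, 1.2, 0.9]
p_value = randomization_p_value(changes, [0, 1, 2])
```

With six units and three treated there are only 20 possible assignments, so the smallest attainable two-sided p-value is 2/20. That floor illustrates why exact p-values are informative but coarse in very small samples.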
+
+---
 
-- Synthetic Difference-in-Differences
-- Multi-period DiD with event study
-- Data preparation utilities
+## Infrastructure Improvements
 
-### v0.2.0
+Ongoing maintenance and developer experience.
 
-- Two-Way Fixed Effects estimator
-- Fixed effects support (absorb parameter)
-- Cluster-robust standard errors
-- Formula interface
+### Performance
 
-### v0.1.0
+- JIT compilation for bootstrap loops (numba)
+- Parallel bootstrap iterations
+- Sparse matrix handling for large fixed effects
+- Memory-efficient estimation for large panels
 
-- Initial release with basic DiD estimator
+### Code Quality
+
+- Extract shared within-transformation logic to utils
+- Consolidate linear regression helpers
+- Consider splitting `staggered.py` (1800+ lines)
+
+### Documentation
+
+- Real-world data examples (beyond synthetic)
+- Performance benchmarks vs. R packages
+- Video tutorials and worked examples
 
 ---
 
 ## Contributing
 
-Interested in contributing? See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues. Features marked "Not Started" are good candidates for contributions.
+Interested in contributing? Features in the "Near-Term" and "Medium-Term" sections are good candidates. See the [GitHub repository](https://github.com/igerber/diff-diff) for open issues.
+
+Key references for implementation:
+- [Roth et al. (2023)](https://www.sciencedirect.com/science/article/abs/pii/S0304407623001318). "What's Trending in Difference-in-Differences?" *Journal of Econometrics*.
+- [Baker et al. (2025)](https://arxiv.org/pdf/2503.13323). "Difference-in-Differences Designs: A Practitioner's Guide."