Commit 8c5ed73

Merge pull request #26 from igerber/claude/honest-did-implementation-AIFAH
2 parents 57bc1be + e40d6b4 commit 8c5ed73

8 files changed

Lines changed: 3350 additions & 12 deletions

File tree

CLAUDE.md

Lines changed: 13 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -53,7 +53,9 @@ mypy diff_diff
5353
- **`diff_diff/visualization.py`** - Plotting functions:
5454
- `plot_event_study` - Publication-ready event study coefficient plots
5555
- `plot_group_effects` - Treatment effects by cohort visualization
56-
- Works with MultiPeriodDiD, CallawaySantAnna, or DataFrames
56+
- `plot_sensitivity` - Honest DiD sensitivity analysis plots (bounds vs M)
57+
- `plot_honest_event_study` - Event study with honest confidence intervals
58+
- Works with MultiPeriodDiD, CallawaySantAnna, HonestDiD, or DataFrames
5759

5860
- **`diff_diff/utils.py`** - Statistical utilities:
5961
- Robust/cluster standard errors (`compute_robust_se`)
@@ -70,6 +72,14 @@ mypy diff_diff
7072
- `run_all_placebo_tests()` - Comprehensive suite of diagnostics
7173
- `PlaceboTestResults` - Dataclass for test results
7274

75+
- **`diff_diff/honest_did.py`** - Honest DiD sensitivity analysis (Rambachan & Roth 2023):
76+
- `HonestDiD` - Main class for computing bounds under parallel trends violations
77+
- `DeltaSD`, `DeltaRM`, `DeltaSDRM` - Restriction classes for smoothness and relative magnitudes
78+
- `HonestDiDResults` - Results with identified set bounds and robust CIs
79+
- `SensitivityResults` - Results from sensitivity analysis over M grid
80+
- `compute_honest_did()` - Convenience function for quick bounds computation
81+
- `sensitivity_plot()` - Convenience function for plotting sensitivity analysis
82+
7383
- **`diff_diff/prep.py`** - Data preparation utilities:
7484
- `generate_did_data` - Create synthetic data with known treatment effect
7585
- `make_treatment_indicator`, `make_post_indicator` - Create binary indicators
@@ -95,6 +105,7 @@ mypy diff_diff
95105
- `02_staggered_did.ipynb` - Staggered adoption with Callaway-Sant'Anna
96106
- `03_synthetic_did.ipynb` - Synthetic DiD with unit/time weights
97107
- `04_parallel_trends.ipynb` - Parallel trends testing and diagnostics
108+
- `05_honest_did.ipynb` - Honest DiD sensitivity analysis for parallel trends violations
98109

99110
### Test Structure
100111

@@ -106,6 +117,7 @@ Tests mirror the source modules:
106117
- `tests/test_wild_bootstrap.py` - Tests for wild cluster bootstrap
107118
- `tests/test_prep.py` - Tests for data preparation utilities
108119
- `tests/test_visualization.py` - Tests for plotting functions
120+
- `tests/test_honest_did.py` - Tests for Honest DiD sensitivity analysis
109121

110122
### Dependencies
111123

README.md

Lines changed: 161 additions & 1 deletion
@@ -75,6 +75,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
7575
- **Event study plots**: Publication-ready visualization of treatment effects
7676
- **Parallel trends testing**: Multiple methods including equivalence tests
7777
- **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
78+
- **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
7879
- **Data prep utilities**: Helper functions for common data preparation tasks
7980

8081
## Tutorials
@@ -87,6 +88,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
8788
| `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna, group-time effects, aggregation methods |
8889
| `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
8990
| `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
91+
| `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
9092

9193
## Data Preparation
9294

@@ -980,6 +982,81 @@ print(f"TOST p-value: {results['tost_p_value']:.4f}")
980982
print(f"Trends equivalent: {results['equivalent']}")
981983
```
982984

985+
### Honest DiD Sensitivity Analysis (Rambachan-Roth)
986+
987+
Pre-trends tests often have low power, and conditioning on passing them can exacerbate bias (Roth 2022). **Honest DiD** (Rambachan & Roth 2023) provides a sensitivity analysis that shows how robust your results are to violations of parallel trends.
988+
989+
```python
990+
from diff_diff import HonestDiD, MultiPeriodDiD
991+
992+
# First, fit a standard event study
993+
did = MultiPeriodDiD()
994+
event_results = did.fit(
995+
    data,
996+
    outcome='outcome',
997+
    treatment='treated',
998+
    time='period',
999+
    post_periods=[5, 6, 7, 8, 9]
1000+
)
1001+
1002+
# Compute honest bounds with relative magnitudes restriction
1003+
# M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
1004+
honest = HonestDiD(method='relative_magnitude', M=1.0)
1005+
honest_results = honest.fit(event_results)
1006+
1007+
print(honest_results.summary())
1008+
print(f"Original estimate: {honest_results.original_estimate:.4f}")
1009+
print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
1010+
print(f"Effect robust to violations: {honest_results.is_significant}")
1011+
```
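
To make the role of `M` concrete, here is a minimal sketch of the relative-magnitudes idea for the first post-treatment period only. It is not the package's implementation: it ignores sampling uncertainty and simply widens the point estimate by `M` times the largest period-to-period change in the pre-treatment coefficients. The function name `rm_identified_set` and all numbers are hypothetical.

```python
def rm_identified_set(beta_post, pre_coefs, M):
    """Toy identified set for the first post-period effect under a
    relative-magnitudes restriction, ignoring sampling uncertainty.

    pre_coefs: pre-period event-study coefficients in time order; the
    reference period (normalized to zero) is appended to the path.
    """
    path = list(pre_coefs) + [0.0]  # reference period normalized to zero
    # Largest period-to-period change observed before treatment.
    max_pre_change = max(abs(b - a) for a, b in zip(path, path[1:]))
    half_width = M * max_pre_change
    return beta_post - half_width, beta_post + half_width


# Hypothetical numbers, for illustration only.
pre = [0.10, -0.05, 0.02]
for M in (0.0, 0.5, 1.0, 2.0):
    lb, ub = rm_identified_set(beta_post=0.30, pre_coefs=pre, M=M)
    print(f"M={M:.1f}: [{lb:.3f}, {ub:.3f}]")
```

With `M=0` the set collapses to the point estimate (exact parallel trends), and it widens linearly as `M` grows. The robust confidence intervals reported by `HonestDiD` additionally account for estimation error, so they are expected to be wider than this toy set.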
1012+
1013+
**Sensitivity analysis over M values:**
1014+
1015+
```python
1016+
# How do results change as we allow larger violations?
1017+
sensitivity = honest.sensitivity_analysis(
1018+
    event_results,
1019+
    M_grid=[0, 0.5, 1.0, 1.5, 2.0]
1020+
)
1021+
1022+
print(sensitivity.summary())
1023+
print(f"Breakdown value: M = {sensitivity.breakdown_M}")
1024+
# Breakdown = smallest M where the robust CI includes zero
1025+
```
1026+
1027+
**Breakdown value:**
1028+
1029+
The breakdown value tells you how robust your conclusion is:
1030+
1031+
```python
1032+
breakdown = honest.breakdown_value(event_results)
1033+
if breakdown is None or breakdown >= 1.0:
1034+
    print("Result holds even if post-treatment violations are as bad as pre-treatment")
1035+
else:
1036+
    print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
1037+
```
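
The breakdown idea can be sketched as a simple search: walk up a grid of `M` values and stop at the first one whose interval contains zero. The snippet below is an illustration under stated assumptions, not the library's algorithm. `smallest_m_including_zero` and `toy_ci` are hypothetical names, and `toy_ci` stands in for whatever computes the robust CI at a given `M`.

```python
def smallest_m_including_zero(ci_at, m_grid):
    """Return the smallest M in m_grid whose interval contains zero, else None."""
    for m in sorted(m_grid):
        lb, ub = ci_at(m)
        if lb <= 0.0 <= ub:
            return m
    return None


def toy_ci(m):
    # Hypothetical interval that widens linearly in M around an estimate of 0.30.
    half_width = 0.10 + 0.15 * m
    return 0.30 - half_width, 0.30 + half_width


grid = [0.0, 0.5, 1.0, 1.5, 2.0]
print(smallest_m_including_zero(toy_ci, grid))             # first grid point covering zero
print(smallest_m_including_zero(toy_ci, [0.0, 0.5, 1.0]))  # None: zero never covered
```

A grid search only locates the breakdown point up to the grid spacing, and it returns `None` when the interval excludes zero at every grid point, which is why callers should handle the `None` case explicitly.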
1038+
1039+
**Smoothness restriction (alternative approach):**
1040+
1041+
```python
1042+
# Bounds second differences of trend violations
1043+
# M=0 forces violations to follow a linear extrapolation of pre-trends; M>0 allows deviations from linearity
1044+
honest_smooth = HonestDiD(method='smoothness', M=0.5)
1045+
smooth_results = honest_smooth.fit(event_results)
1046+
```
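
As a rough illustration of what "bounding second differences" means, the sketch below checks whether a hypothetical path of trend violations satisfies a smoothness bound. The helper names and the example paths are assumptions made for this example; they are not part of `diff_diff`.

```python
def second_differences(path):
    """Second differences of a sequence: (d[t+1] - d[t]) - (d[t] - d[t-1])."""
    return [path[i + 1] - 2 * path[i] + path[i - 1] for i in range(1, len(path) - 1)]


def satisfies_smoothness(path, M, tol=1e-12):
    """True if every second difference of path is at most M in absolute value."""
    return all(abs(d) <= M + tol for d in second_differences(path))


linear = [-0.2, -0.1, 0.0, 0.1, 0.2]  # exactly linear violation path
curved = [-0.2, -0.1, 0.0, 0.3, 0.9]  # slope changes after the reference period

print(satisfies_smoothness(linear, M=0.0))  # a linear path has zero second differences
print(satisfies_smoothness(curved, M=0.0))
print(satisfies_smoothness(curved, M=0.5))
```

A linear path passes even at `M=0`, while a path whose slope changes only passes once `M` is at least as large as its biggest change in slope.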
1047+
1048+
**Visualization:**
1049+
1050+
```python
1051+
from diff_diff import plot_sensitivity, plot_honest_event_study
1052+
1053+
# Plot sensitivity analysis
1054+
plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
1055+
1056+
# Event study with honest confidence intervals
1057+
plot_honest_event_study(event_results, honest_results)
1058+
```
1059+
9831060
### Placebo Tests
9841061

9851062
Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
@@ -1278,6 +1355,75 @@ SyntheticDiD(
12781355
| `get_unit_weights_df()` | Get unit weights as DataFrame |
12791356
| `get_time_weights_df()` | Get time weights as DataFrame |
12801357

1358+
### HonestDiD
1359+
1360+
```python
1361+
HonestDiD(
1362+
    method='relative_magnitude',  # 'relative_magnitude' or 'smoothness'
1363+
    M=None,                       # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
1364+
    alpha=0.05,                   # Significance level for CIs
1365+
    l_vec=None                    # Linear combination vector for target parameter
1366+
)
1367+
```
1368+
1369+
**fit() Parameters:**
1370+
1371+
| Parameter | Type | Description |
1372+
|-----------|------|-------------|
1373+
| `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
1374+
| `M` | float | Restriction parameter (overrides constructor value) |
1375+
1376+
**Methods:**
1377+
1378+
| Method | Description |
1379+
|--------|-------------|
1380+
| `fit(results, M)` | Compute bounds for given event study results |
1381+
| `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
1382+
| `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
1383+
1384+
### HonestDiDResults
1385+
1386+
**Attributes:**
1387+
1388+
| Attribute | Description |
1389+
|-----------|-------------|
1390+
| `original_estimate` | Point estimate under parallel trends |
1391+
| `lb` | Lower bound of identified set |
1392+
| `ub` | Upper bound of identified set |
1393+
| `ci_lb` | Lower bound of robust confidence interval |
1394+
| `ci_ub` | Upper bound of robust confidence interval |
1395+
| `ci_width` | Width of robust CI |
1396+
| `M` | Restriction parameter used |
1397+
| `method` | Restriction method ('relative_magnitude' or 'smoothness') |
1398+
| `alpha` | Significance level |
1399+
| `is_significant` | True if robust CI excludes zero |
1400+
1401+
**Methods:**
1402+
1403+
| Method | Description |
1404+
|--------|-------------|
1405+
| `summary()` | Get formatted summary string |
1406+
| `to_dict()` | Convert to dictionary |
1407+
| `to_dataframe()` | Convert to pandas DataFrame |
1408+
1409+
### SensitivityResults
1410+
1411+
**Attributes:**
1412+
1413+
| Attribute | Description |
1414+
|-----------|-------------|
1415+
| `M_grid` | Array of M values analyzed |
1416+
| `results` | List of HonestDiDResults for each M |
1417+
| `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
1418+
1419+
**Methods:**
1420+
1421+
| Method | Description |
1422+
|--------|-------------|
1423+
| `summary()` | Get formatted summary string |
1424+
| `plot(ax)` | Plot sensitivity analysis |
1425+
| `to_dataframe()` | Convert to pandas DataFrame |
1426+
12811427
### Data Preparation Functions
12821428

12831429
#### generate_did_data
@@ -1501,9 +1647,23 @@ This library implements methods from the following scholarly works:
15011647

15021648
- **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
15031649

1650+
- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
1651+
1652+
### Honest DiD / Sensitivity Analysis
1653+
1654+
The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
1655+
15041656
- **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
15051657

1506-
- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
1658+
This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
1659+
- **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
1660+
- **Smoothness (ΔSD)**: Bounds the second differences of trend violations; M=0 corresponds to linear extrapolation of pre-trends
1661+
- **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions
1662+
- **Robust Confidence Intervals**: Valid inference under partial identification
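
For reference, the two restriction classes named above can be written as follows, using the definitions from Rambachan & Roth (2023). Here $\delta_t$ denotes the violation of parallel trends in period $t$, with the reference period normalized so that $\delta_0 = 0$. The package's implementation may differ in details such as indexing.

```latex
% Smoothness: second differences of the violation path are bounded by M
\Delta^{SD}(M) = \left\{ \delta : \left| (\delta_{t+1} - \delta_t) - (\delta_t - \delta_{t-1}) \right| \le M \ \ \forall t \right\}

% Relative magnitudes: post-treatment changes are at most \bar{M} times
% the largest pre-treatment change
\Delta^{RM}(\bar{M}) = \left\{ \delta : \forall t \ge 0,\ \left| \delta_{t+1} - \delta_t \right| \le \bar{M} \cdot \max_{s < 0} \left| \delta_{s+1} - \delta_s \right| \right\}
```

Setting $M = 0$ in $\Delta^{SD}$ restricts $\delta$ to be exactly linear, and setting $\bar{M} = 0$ in $\Delta^{RM}$ recovers exact parallel trends in the post-treatment periods.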
1663+
1664+
- **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
1665+
1666+
Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
15071667

15081668
### Multi-Period and Staggered Adoption
15091669

TODO.md

Lines changed: 36 additions & 10 deletions
@@ -18,7 +18,7 @@ A production-ready DiD library needs:
1818

1919
| Feature | Status | Priority | Why It Matters |
2020
|---------|--------|----------|----------------|
21-
| **Honest DiD (Rambachan-Roth)** | Not Started | 1.0 Blocker | Reviewers expect sensitivity analysis |
21+
| **Honest DiD (Rambachan-Roth)** | ✅ Implemented | 1.0 Blocker | Reviewers expect sensitivity analysis |
2222
| **CallawaySantAnna Covariates** | ✅ Implemented | 1.0 Blocker | Conditional PT often required in practice |
2323
| **API Documentation Site** | Not Started | 1.0 Blocker | Credibility and discoverability |
2424
| Goodman-Bacon Decomposition | Not Started | 1.0 Target | Explains when TWFE fails |
@@ -35,17 +35,30 @@ A production-ready DiD library needs:
3535
These features are essential for a credible 1.0 release. Without them, the library has significant gaps compared to R alternatives.
3636

3737
### Honest DiD / Sensitivity Analysis (Rambachan-Roth)
38-
**Status**: Not Started
38+
**Status**: ✅ Implemented
3939
**Effort**: High
4040
**Practitioner Value**: ⭐⭐⭐⭐⭐
4141

4242
**Why this matters**: Pre-trends tests have low power and can exacerbate bias. Increasingly, journal reviewers and seminar audiences expect sensitivity analysis showing "how robust are results to violations of parallel trends?" This is becoming as standard as reporting robust SEs.
4343

44-
**Features needed**:
45-
- Compute bounds under restrictions on trend deviations (relative magnitudes)
46-
- Confidence intervals valid under partial identification
47-
- Breakdown analysis: "How much violation would nullify the result?"
48-
- Visualization of sensitivity curves
44+
**Implemented features**:
45+
- ✅ Relative magnitudes (ΔRM): Bounds post-treatment violations by M̄ × max pre-period violation
46+
- ✅ Smoothness (ΔSD): Bounds on second differences of trend violations
47+
- ✅ Combined restrictions (ΔSDRM): Both smoothness and relative magnitude bounds
48+
- ✅ FLCI (Fixed Length Confidence Interval) for smoothness restrictions
49+
- ✅ C-LF (Conditional Least Favorable) for relative magnitudes
50+
- ✅ Breakdown analysis: Find smallest M where robust CI includes zero
51+
- ✅ Sensitivity analysis over grid of M values
52+
- ✅ Visualization: `plot_sensitivity()` and `plot_honest_event_study()`
53+
- ✅ Comprehensive test suite (49 tests)
54+
- ✅ Tutorial notebook: `docs/tutorials/05_honest_did.ipynb`
55+
56+
**Future extensions** (post-1.0):
57+
- Improved C-LF implementation with direct optimization instead of grid search
58+
- Support for CallawaySantAnnaResults (currently only MultiPeriodDiDResults)
59+
- Event-study-specific bounds for each post-period
60+
- Hybrid inference methods
61+
- Simulation-based power analysis for honest bounds
4962

5063
**References**:
5164
- Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. *Review of Economic Studies*.
@@ -245,6 +258,19 @@ Beyond the API site:
245258

246259
## Completed Features
247260

261+
### v0.5.2
262+
- [x] **Honest DiD sensitivity analysis** (Rambachan & Roth 2023)
263+
- Relative magnitudes (ΔRM) and smoothness (ΔSD) restrictions
264+
- Combined restrictions (ΔSDRM)
265+
- FLCI and C-LF confidence interval methods
266+
- Breakdown value computation
267+
- Sensitivity analysis over M grid
268+
- `plot_sensitivity()` and `plot_honest_event_study()` visualization
269+
- HonestDiD, HonestDiDResults, SensitivityResults classes
270+
- DeltaSD, DeltaRM, DeltaSDRM restriction classes
271+
- Tutorial notebook: `05_honest_did.ipynb`
272+
- 49 comprehensive tests
273+
248274
### v0.5.1
249275
- [x] Comprehensive test coverage for `utils.py` module (72 tests)
250276
- [x] Tutorial notebooks in `docs/tutorials/`
@@ -267,10 +293,10 @@ Beyond the API site:
267293

268294
## Suggested 1.0 Milestone Plan
269295

270-
1. **CallawaySantAnna Covariates** - Makes the staggered estimator production-ready
271-
2. **Honest DiD (Rambachan-Roth)** - Addresses the key credibility gap
296+
1. **CallawaySantAnna Covariates** - Makes the staggered estimator production-ready
297+
2. **Honest DiD (Rambachan-Roth)** - Addresses the key credibility gap
272298
3. **API Documentation Site** - Professional presentation
273299
4. **Goodman-Bacon Decomposition** - Key diagnostic for TWFE users
274300
5. **Power Analysis** - Study design tool practitioners need
275301

276-
With these five additions, diff-diff would be competitive with R's `did` + `HonestDiD` ecosystem.
302+
With items 1-2 complete, diff-diff now has feature parity with R's `did` + `HonestDiD` ecosystem for core sensitivity analysis. The remaining items (3-5) will complete the 1.0 release.

diff_diff/__init__.py

Lines changed: 23 additions & 0 deletions
@@ -25,6 +25,8 @@
2525
from diff_diff.visualization import (
2626
    plot_event_study,
2727
    plot_group_effects,
28+
    plot_sensitivity,
29+
    plot_honest_event_study,
2830
)
2931
from diff_diff.prep import (
3032
make_treatment_indicator,
@@ -54,6 +56,16 @@
5456
leave_one_out_test,
5557
run_all_placebo_tests,
5658
)
59+
from diff_diff.honest_did import (
60+
    HonestDiD,
61+
    HonestDiDResults,
62+
    SensitivityResults,
63+
    DeltaSD,
64+
    DeltaRM,
65+
    DeltaSDRM,
66+
    compute_honest_did,
67+
    sensitivity_plot,
68+
)
5769

5870
__version__ = "0.5.0"
5971
__all__ = [
@@ -73,6 +85,8 @@
7385
# Visualization
7486
"plot_event_study",
7587
"plot_group_effects",
88+
"plot_sensitivity",
89+
"plot_honest_event_study",
7690
# Parallel trends testing
7791
"check_parallel_trends",
7892
"check_parallel_trends_robust",
@@ -99,4 +113,13 @@
99113
"create_event_time",
100114
"aggregate_to_cohorts",
101115
"rank_control_units",
116+
# Honest DiD sensitivity analysis
117+
"HonestDiD",
118+
"HonestDiDResults",
119+
"SensitivityResults",
120+
"DeltaSD",
121+
"DeltaRM",
122+
"DeltaSDRM",
123+
"compute_honest_did",
124+
"sensitivity_plot",
102125
]
