igerber
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 13 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 161 additions & 1 deletion
diff --git a/TODO.md b/TODO.md
Lines changed: 36 additions & 10 deletions
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
Lines changed: 23 additions & 0 deletions
@@ -53,7 +53,9 @@ mypy diff_diff
 - **`diff_diff/visualization.py`** - Plotting functions:
   - `plot_event_study` - Publication-ready event study coefficient plots
   - `plot_group_effects` - Treatment effects by cohort visualization
-  - Works with MultiPeriodDiD, CallawaySantAnna, or DataFrames
+  - `plot_sensitivity` - Honest DiD sensitivity analysis plots (bounds vs M)
+  - `plot_honest_event_study` - Event study with honest confidence intervals
+  - Works with MultiPeriodDiD, CallawaySantAnna, HonestDiD, or DataFrames
 
 - **`diff_diff/utils.py`** - Statistical utilities:
   - Robust/cluster standard errors (`compute_robust_se`)
@@ -70,6 +72,14 @@ mypy diff_diff
   - `run_all_placebo_tests()` - Comprehensive suite of diagnostics
   - `PlaceboTestResults` - Dataclass for test results
 
+- **`diff_diff/honest_did.py`** - Honest DiD sensitivity analysis (Rambachan & Roth 2023):
+  - `HonestDiD` - Main class for computing bounds under parallel trends violations
+  - `DeltaSD`, `DeltaRM`, `DeltaSDRM` - Restriction classes for smoothness (ΔSD), relative magnitudes (ΔRM), and the combined restriction (ΔSDRM)
+  - `HonestDiDResults` - Results with identified set bounds and robust CIs
+  - `SensitivityResults` - Results from sensitivity analysis over M grid
+  - `compute_honest_did()` - Convenience function for quick bounds computation
+  - `sensitivity_plot()` - Convenience function for plotting sensitivity analysis
+
 - **`diff_diff/prep.py`** - Data preparation utilities:
   - `generate_did_data` - Create synthetic data with known treatment effect
   - `make_treatment_indicator`, `make_post_indicator` - Create binary indicators
@@ -95,6 +105,7 @@ mypy diff_diff
   - `02_staggered_did.ipynb` - Staggered adoption with Callaway-Sant'Anna
   - `03_synthetic_did.ipynb` - Synthetic DiD with unit/time weights
   - `04_parallel_trends.ipynb` - Parallel trends testing and diagnostics
+  - `05_honest_did.ipynb` - Honest DiD sensitivity analysis for parallel trends violations
 
 ### Test Structure
 
@@ -106,6 +117,7 @@ Tests mirror the source modules:
 - `tests/test_wild_bootstrap.py` - Tests for wild cluster bootstrap
 - `tests/test_prep.py` - Tests for data preparation utilities
 - `tests/test_visualization.py` - Tests for plotting functions
+- `tests/test_honest_did.py` - Tests for Honest DiD sensitivity analysis
 
 ### Dependencies
 
 
@@ -75,6 +75,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 - **Event study plots**: Publication-ready visualization of treatment effects
 - **Parallel trends testing**: Multiple methods including equivalence tests
 - **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
+- **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
 - **Data prep utilities**: Helper functions for common data preparation tasks
 
 ## Tutorials
@@ -87,6 +88,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
 | `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna, group-time effects, aggregation methods |
 | `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
 | `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
+| `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
 
 ## Data Preparation
 
@@ -980,6 +982,81 @@ print(f"TOST p-value: {results['tost_p_value']:.4f}")
 print(f"Trends equivalent: {results['equivalent']}")
 ```
 
+### Honest DiD Sensitivity Analysis (Rambachan-Roth)
+
+Pre-trends tests often have low power, and conditioning an analysis on passing them can exacerbate bias (Roth 2022). **Honest DiD** (Rambachan & Roth 2023) instead provides a sensitivity analysis showing how robust your results are to violations of parallel trends.
+
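For reference, Rambachan & Roth (2023) write both restrictions in terms of δ, the bias in the event-study coefficients from differential trends (so β_t = τ_t + δ_t, with event time t < 0 before treatment and the reference period normalized to δ_0 = 0). ΔSD bounds changes in the slope of δ; ΔRM bounds each post-treatment change in δ by M̄ times the largest pre-treatment change:

```math
\begin{aligned}
\Delta^{SD}(M) &= \left\{ \delta : \left| (\delta_{t+1} - \delta_t) - (\delta_t - \delta_{t-1}) \right| \le M \ \text{for all } t \right\} \\
\Delta^{RM}(\bar{M}) &= \left\{ \delta : \left| \delta_{t+1} - \delta_t \right| \le \bar{M} \cdot \max_{s < 0} \left| \delta_{s+1} - \delta_s \right| \ \text{for all } t \ge 0 \right\}
\end{aligned}
```

The `M` argument in the examples plays the role of M under `method='smoothness'` and of M̄ under `method='relative_magnitude'`.
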
+```python
+from diff_diff import HonestDiD, MultiPeriodDiD
+
+# First, fit a standard event study
+did = MultiPeriodDiD()
+event_results = did.fit(
+    data,
+    outcome='outcome',
+    treatment='treated',
+    time='period',
+    post_periods=[5, 6, 7, 8, 9]
+)
+
+# Compute honest bounds under the relative magnitudes restriction
+# M=1 allows each post-treatment change in the trend violation to be as large
+# as the largest period-to-period change observed before treatment
+honest = HonestDiD(method='relative_magnitude', M=1.0)
+honest_results = honest.fit(event_results)
+
+print(honest_results.summary())
+print(f"Original estimate: {honest_results.original_estimate:.4f}")
+print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
+print(f"Effect robust to violations: {honest_results.is_significant}")
+```
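To make the role of M concrete, here is a minimal population-level sketch of the ΔRM identified set. It is not the library's implementation: the helper `rm_identified_set` is hypothetical, it treats the event-study coefficients as known (so it returns the identified set rather than a robust CI), and it assumes the reference period is normalized to zero with pre-period coefficients ordered oldest to newest. Because each post-period change in δ is at most c = M̄ × (largest pre-period change), the worst-case bias of a weighted average l'δ has a closed form.

```python
import numpy as np


def rm_identified_set(beta_pre, beta_post, M_bar, l_vec=None):
    """Hypothetical helper: identified set for l'tau_post under Delta^RM(M_bar).

    Population-level sketch that treats the event-study coefficients as known,
    so it returns the identified set, not a robust confidence interval.
    beta_pre:  [beta_{-T}, ..., beta_{-1}] (reference period 0 normalized to 0)
    beta_post: [beta_1, ..., beta_Tbar]
    """
    beta_pre = np.asarray(beta_pre, dtype=float)
    beta_post = np.asarray(beta_post, dtype=float)
    if l_vec is None:
        # Default target: simple average of the post-period effects
        l_vec = np.full(len(beta_post), 1.0 / len(beta_post))
    l_vec = np.asarray(l_vec, dtype=float)

    # Pre-period violations are identified: delta_pre = beta_pre, delta_0 = 0
    delta_pre = np.append(beta_pre, 0.0)
    max_pre_change = np.max(np.abs(np.diff(delta_pre)))

    # Every post-period change |delta_{t+1} - delta_t| is at most c
    c = M_bar * max_pre_change
    # delta_t = d_1 + ... + d_t with |d_k| <= c, so l'delta is maximized by
    # d_k = c * sign(l_k + ... + l_Tbar); the bias range is symmetric around 0
    tail_sums = np.cumsum(l_vec[::-1])[::-1]
    max_bias = c * np.sum(np.abs(tail_sums))

    theta_hat = l_vec @ beta_post
    return theta_hat - max_bias, theta_hat + max_bias


lb, ub = rm_identified_set([0.0, 0.1], [1.0, 1.2], M_bar=1.0)
print(f"Identified set for the average effect: [{lb:.2f}, {ub:.2f}]")
```

With `beta_pre=[0.0, 0.1]` and `beta_post=[1.0, 1.2]`, the largest pre-period change is 0.1, so `M_bar=1.0` gives the set (0.95, 1.25) for the average effect, while `M_bar=0.0` collapses it to the point estimate 1.1.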
+
+**Sensitivity analysis over M values:**
+
+```python
+# How do results change as we allow larger violations?
+sensitivity = honest.sensitivity_analysis(
+    event_results,
+    M_grid=[0, 0.5, 1.0, 1.5, 2.0]
+)
+
+print(sensitivity.summary())
+print(f"Breakdown value: M = {sensitivity.breakdown_M}")
+# Breakdown = smallest M where the robust CI includes zero
+```
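Conceptually, a sensitivity table is the robust interval evaluated at each M in the grid, with the breakdown value read off as the first M whose interval contains zero. The sketch below uses a hypothetical `ci_at(M)` callable and a toy interval in place of the library's internals; note that a grid only brackets the breakdown point, so its resolution matters.

```python
def sensitivity_table(ci_at, M_grid):
    """Evaluate a robust interval over a grid of M values.

    ci_at: hypothetical callable mapping M to a (lb, ub) robust interval.
    Returns the rows (M, lb, ub, includes_zero) and the grid breakdown value,
    i.e. the smallest grid M whose interval contains zero (None if none does).
    """
    rows = []
    breakdown = None
    for M in sorted(M_grid):
        lb, ub = ci_at(M)
        includes_zero = lb <= 0.0 <= ub
        rows.append((M, lb, ub, includes_zero))
        if includes_zero and breakdown is None:
            breakdown = M
    return rows, breakdown


def toy_ci(M):
    # Toy interval that widens linearly in M (stand-in for a robust CI)
    return 0.25 - 0.2 * M, 0.75 + 0.2 * M


rows, breakdown = sensitivity_table(toy_ci, [0, 0.5, 1.0, 1.5, 2.0])
for M, lb, ub, inc in rows:
    print(f"M={M:<4} CI=[{lb:+.2f}, {ub:+.2f}] includes zero: {inc}")
print(f"Grid breakdown value: M = {breakdown}")
```

With this toy interval the grid reports a breakdown of 1.5, even though the lower bound first reaches zero at M = 1.25.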
+
+**Breakdown value:**
+
+The breakdown value is the smallest M at which the robust CI includes zero; the larger it is, the more robust your conclusion:
+
+```python
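+breakdown = honest.breakdown_value(event_results)
+if breakdown is None:
+    # Assumption: None means significant over the whole search range, as with SensitivityResults.breakdown_M
+    print("Result is robust over the entire range of M searched")
+elif breakdown >= 1.0:
+    print("Result holds even if post-treatment violations are as large as the worst pre-treatment ones")
+else:
+    print(f"Result requires post-treatment violations smaller than {breakdown:.1f}x the pre-treatment ones")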
+```
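Assuming the interval widens as M grows (the restriction sets are nested in M, so the identified set does), the breakdown value can be located to a tolerance by bisection instead of a fixed grid. This is a hedged sketch with a hypothetical `breakdown_by_bisection` helper and a toy `ci_at(M)`; it is not necessarily how `HonestDiD.breakdown_value` uses its `tol` argument.

```python
def breakdown_by_bisection(ci_at, M_max=10.0, tol=1e-4):
    """Smallest M whose interval ci_at(M) contains zero, found by bisection.

    Assumes the interval widens monotonically as M grows. Returns 0.0 if the
    interval already contains zero at M=0 and None if it still excludes zero
    at M_max (robust over the whole search range).
    """
    def includes_zero(M):
        lb, ub = ci_at(M)
        return lb <= 0.0 <= ub

    if includes_zero(0.0):
        return 0.0
    if not includes_zero(M_max):
        return None
    lo, hi = 0.0, M_max  # invariant: excludes zero at lo, includes zero at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if includes_zero(mid):
            hi = mid
        else:
            lo = mid
    return hi


def toy_ci(M):
    # Toy interval that widens linearly in M (stand-in for a robust CI)
    return 0.25 - 0.2 * M, 0.75 + 0.2 * M


print(f"Breakdown value: M = {breakdown_by_bisection(toy_ci):.3f}")
```

For the toy interval, the lower bound `0.25 - 0.2*M` hits zero at M = 1.25, which bisection recovers to within `tol`.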
+
+**Smoothness restriction (alternative approach):**
+
+```python
+# Bound the second differences (changes in slope) of the trend violations by M
+# M=0 restricts violations to a linear trend, i.e. linear extrapolation of pre-trends
+honest_smooth = HonestDiD(method='smoothness', M=0.5)
+smooth_results = honest_smooth.fit(event_results)
+```
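Under ΔSD, M caps how much the slope of the trend violation can change between consecutive periods, so M=0 extrapolates the last pre-treatment slope linearly. The sketch below computes the resulting identified set in closed form. It is a hypothetical population-level helper (`sd_identified_set`), not the library's FLCI: it treats coefficients as known, assumes the reference period is normalized to zero with β_{-1} as the last pre-period coefficient, and does not check whether the pre-period coefficients themselves satisfy the restriction.

```python
import numpy as np


def sd_identified_set(beta_pre, beta_post, M, l_vec=None):
    """Hypothetical helper: identified set for l'tau_post under Delta^SD(M).

    Population-level sketch: coefficients are treated as known, the reference
    period 0 is normalized to zero, and only second-difference constraints that
    involve post-period deltas are imposed (pre-period feasibility is not checked).
    beta_pre:  [beta_{-T}, ..., beta_{-1}]
    beta_post: [beta_1, ..., beta_Tbar]
    """
    beta_pre = np.asarray(beta_pre, dtype=float)
    beta_post = np.asarray(beta_post, dtype=float)
    T = len(beta_post)
    if l_vec is None:
        # Default target: simple average of the post-period effects
        l_vec = np.full(T, 1.0 / T)
    l_vec = np.asarray(l_vec, dtype=float)

    # Last pre-treatment slope of the violation: s0 = delta_0 - delta_{-1}
    s0 = 0.0 - beta_pre[-1]
    t = np.arange(1, T + 1)
    # M = 0: linear extrapolation, delta_t = t * s0
    center_bias = s0 * np.sum(t * l_vec)

    # Slopes s_t = s0 + e_1 + ... + e_t with |e_j| <= M, hence
    # delta_t = t * s0 + sum_{j<=t} (t - j + 1) * e_j.
    # Weight of e_j in l'delta: sum_{t>=j} (t - j + 1) * l_t
    weights = np.array([np.sum((t[j - 1:] - j + 1) * l_vec[j - 1:]) for j in t])
    max_dev = M * np.sum(np.abs(weights))

    theta_hat = l_vec @ beta_post
    # tau_t = beta_t - delta_t, so the bias range is subtracted from theta_hat
    return theta_hat - center_bias - max_dev, theta_hat - center_bias + max_dev


lb, ub = sd_identified_set([0.0, -0.1], [1.0, 1.2], M=0.05)
print(f"Identified set for the average effect: [{lb:.2f}, {ub:.2f}]")
```

With `beta_pre=[0.0, -0.1]` (a violation rising by 0.1 per period before treatment) and `beta_post=[1.0, 1.2]`, `M=0` gives the single point 0.95, the average effect after removing the extrapolated trend, and `M=0.05` widens it to (0.85, 1.05).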
+
+**Visualization:**
+
+```python
+from diff_diff import plot_sensitivity, plot_honest_event_study
+
+# Plot sensitivity analysis
+plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
+
+# Event study with honest confidence intervals
+plot_honest_event_study(event_results, honest_results)
+```
+
 ### Placebo Tests
 
 Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
@@ -1278,6 +1355,75 @@ SyntheticDiD(
 | `get_unit_weights_df()` | Get unit weights as DataFrame |
 | `get_time_weights_df()` | Get time weights as DataFrame |
 
+### HonestDiD
+
+```python
+HonestDiD(
+    method='relative_magnitude',  # 'relative_magnitude' or 'smoothness'
+    M=None,               # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
+    alpha=0.05,           # Significance level for CIs
+    l_vec=None            # Linear combination vector for target parameter
+)
+```
+
+**fit() Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
+| `M` | float | Restriction parameter (overrides constructor value) |
+
+**Methods:**
+
+| Method | Description |
+|--------|-------------|
+| `fit(results, M)` | Compute bounds for given event study results |
+| `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
+| `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
+
+### HonestDiDResults
+
+**Attributes:**
+
+| Attribute | Description |
+|-----------|-------------|
+| `original_estimate` | Point estimate under parallel trends |
+| `lb` | Lower bound of identified set |
+| `ub` | Upper bound of identified set |
+| `ci_lb` | Lower bound of robust confidence interval |
+| `ci_ub` | Upper bound of robust confidence interval |
+| `ci_width` | Width of robust CI |
+| `M` | Restriction parameter used |
+| `method` | Restriction method ('relative_magnitude' or 'smoothness') |
+| `alpha` | Significance level |
+| `is_significant` | True if robust CI excludes zero |
+
+**Methods:**
+
+| Method | Description |
+|--------|-------------|
+| `summary()` | Get formatted summary string |
+| `to_dict()` | Convert to dictionary |
+| `to_dataframe()` | Convert to pandas DataFrame |
+
+### SensitivityResults
+
+**Attributes:**
+
+| Attribute | Description |
+|-----------|-------------|
+| `M_grid` | Array of M values analyzed |
+| `results` | List of HonestDiDResults for each M |
+| `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
+
+**Methods:**
+
+| Method | Description |
+|--------|-------------|
+| `summary()` | Get formatted summary string |
+| `plot(ax)` | Plot sensitivity analysis |
+| `to_dataframe()` | Convert to pandas DataFrame |
+
 ### Data Preparation Functions
 
 #### generate_did_data
@@ -1501,9 +1647,23 @@ This library implements methods from the following scholarly works:
 
 - **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
 
+- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
+
+### Honest DiD / Sensitivity Analysis
+
+The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
+
 - **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
 
-- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
+  This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
+  - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of the largest pre-treatment period-to-period violation
+  - **Smoothness (ΔSD)**: Bounds second differences of trend violations; M=0 corresponds to linear extrapolation of pre-trends
+  - **Breakdown Analysis**: Finds the smallest violation magnitude that would overturn the conclusions
+  - **Robust Confidence Intervals**: Provides valid inference under partial identification
+
+- **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
+
+  Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
 
 ### Multi-Period and Staggered Adoption
 
 
@@ -18,7 +18,7 @@ A production-ready DiD library needs:
 
 | Feature | Status | Priority | Why It Matters |
 |---------|--------|----------|----------------|
-| **Honest DiD (Rambachan-Roth)** | Not Started | 1.0 Blocker | Reviewers expect sensitivity analysis |
+| **Honest DiD (Rambachan-Roth)** | ✅ Implemented | 1.0 Blocker | Reviewers expect sensitivity analysis |
 | **CallawaySantAnna Covariates** | ✅ Implemented | 1.0 Blocker | Conditional PT often required in practice |
 | **API Documentation Site** | Not Started | 1.0 Blocker | Credibility and discoverability |
 | Goodman-Bacon Decomposition | Not Started | 1.0 Target | Explains when TWFE fails |
@@ -35,17 +35,30 @@ A production-ready DiD library needs:
 These features are essential for a credible 1.0 release. Without them, the library has significant gaps compared to R alternatives.
 
 ### Honest DiD / Sensitivity Analysis (Rambachan-Roth)
-**Status**: Not Started
+**Status**: ✅ Implemented
 **Effort**: High
 **Practitioner Value**: ⭐⭐⭐⭐⭐
 
 **Why this matters**: Pre-trends tests have low power and can exacerbate bias. Increasingly, journal reviewers and seminar audiences expect sensitivity analysis showing "how robust are results to violations of parallel trends?" This is becoming as standard as reporting robust SEs.
 
-**Features needed**:
-- Compute bounds under restrictions on trend deviations (relative magnitudes)
-- Confidence intervals valid under partial identification
-- Breakdown analysis: "How much violation would nullify the result?"
-- Visualization of sensitivity curves
+**Implemented features**:
+- ✅ Relative magnitudes (ΔRM): Bounds post-treatment violations by M̄ × max pre-period violation
+- ✅ Smoothness (ΔSD): Bounds on second differences of trend violations
+- ✅ Combined restrictions (ΔSDRM): Both smoothness and relative magnitude bounds
+- ✅ FLCI (Fixed Length Confidence Interval) for smoothness restrictions
+- ✅ C-LF (Conditional Least Favorable) for relative magnitudes
+- ✅ Breakdown analysis: Find smallest M where robust CI includes zero
+- ✅ Sensitivity analysis over grid of M values
+- ✅ Visualization: `plot_sensitivity()` and `plot_honest_event_study()`
+- ✅ Comprehensive test suite (49 tests)
+- ✅ Tutorial notebook: `docs/tutorials/05_honest_did.ipynb`
+
+**Future extensions** (post-1.0):
+- Improved C-LF implementation with direct optimization instead of grid search
+- Support for CallawaySantAnnaResults (currently only MultiPeriodDiDResults)
+- Event-study-specific bounds for each post-period
+- Hybrid inference methods
+- Simulation-based power analysis for honest bounds
 
 **References**:
 - Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. *Review of Economic Studies*.
@@ -245,6 +258,19 @@ Beyond the API site:
 
 ## Completed Features
 
+### v0.5.2
+- [x] **Honest DiD sensitivity analysis** (Rambachan & Roth 2023)
+  - Relative magnitudes (ΔRM) and smoothness (ΔSD) restrictions
+  - Combined restrictions (ΔSDRM)
+  - FLCI and C-LF confidence interval methods
+  - Breakdown value computation
+  - Sensitivity analysis over M grid
+  - `plot_sensitivity()` and `plot_honest_event_study()` visualization
+  - HonestDiD, HonestDiDResults, SensitivityResults classes
+  - DeltaSD, DeltaRM, DeltaSDRM restriction classes
+  - Tutorial notebook: `05_honest_did.ipynb`
+  - 49 comprehensive tests
+
 ### v0.5.1
 - [x] Comprehensive test coverage for `utils.py` module (72 tests)
 - [x] Tutorial notebooks in `docs/tutorials/`
@@ -267,10 +293,10 @@ Beyond the API site:
 
 ## Suggested 1.0 Milestone Plan
 
-1. **CallawaySantAnna Covariates** - Makes the staggered estimator production-ready
-2. **Honest DiD (Rambachan-Roth)** - Addresses the key credibility gap
+1. ✅ **CallawaySantAnna Covariates** - Makes the staggered estimator production-ready
+2. ✅ **Honest DiD (Rambachan-Roth)** - Addresses the key credibility gap
 3. **API Documentation Site** - Professional presentation
 4. **Goodman-Bacon Decomposition** - Key diagnostic for TWFE users
 5. **Power Analysis** - Study design tool practitioners need
 
-With these five additions, diff-diff would be competitive with R's `did` + `HonestDiD` ecosystem.
+With items 1-2 complete, diff-diff now covers the core sensitivity analysis of R's `did` + `HonestDiD` ecosystem; the future extensions listed above mark the remaining gaps. Items 3-5 will complete the 1.0 release.
@@ -25,6 +25,8 @@
 from diff_diff.visualization import (
     plot_event_study,
     plot_group_effects,
+    plot_sensitivity,
+    plot_honest_event_study,
 )
 from diff_diff.prep import (
     make_treatment_indicator,
@@ -54,6 +56,16 @@
     leave_one_out_test,
     run_all_placebo_tests,
 )
+from diff_diff.honest_did import (
+    HonestDiD,
+    HonestDiDResults,
+    SensitivityResults,
+    DeltaSD,
+    DeltaRM,
+    DeltaSDRM,
+    compute_honest_did,
+    sensitivity_plot,
+)
 
 __version__ = "0.5.0"
 __all__ = [
@@ -73,6 +85,8 @@
     # Visualization
     "plot_event_study",
     "plot_group_effects",
+    "plot_sensitivity",
+    "plot_honest_event_study",
     # Parallel trends testing
     "check_parallel_trends",
     "check_parallel_trends_robust",
@@ -99,4 +113,13 @@
     "create_event_time",
     "aggregate_to_cohorts",
     "rank_control_units",
+    # Honest DiD sensitivity analysis
+    "HonestDiD",
+    "HonestDiDResults",
+    "SensitivityResults",
+    "DeltaSD",
+    "DeltaRM",
+    "DeltaSDRM",
+    "compute_honest_did",
+    "sensitivity_plot",
 ]