Commit d21d29f

Merge pull request #55 from igerber/claude/v1.4.0-performance-improvements
feat: Major performance improvements for v1.4.0
2 parents b4685e3 + 0d9a928 commit d21d29f

15 files changed

Lines changed: 1218 additions & 353 deletions

CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.4.0] - 2026-01-11
+
+### Added
+- **Unified linear algebra backend** (`diff_diff/linalg.py`)
+- `solve_ols()` - Optimized OLS solver using scipy's gelsy LAPACK driver
+- `compute_robust_vcov()` - Vectorized (clustered) robust variance-covariance
+- Single optimization point for all estimators; prepares for future Rust backend
+- New `tests/test_linalg.py` with comprehensive tests
+
+### Changed
+- **Major performance improvements** - All estimators now significantly faster
+- BasicDiD/TWFE @ 10K: 0.835s → 0.011s (76x faster, now 4.2x faster than R)
+- CallawaySantAnna @ 10K: 2.234s → 0.109s (20x faster, now 7.2x faster than R)
+- All results numerically identical to previous versions
+- **CallawaySantAnna optimizations** (`staggered.py`)
+- Pre-computed wide-format outcome matrix and cohort masks
+- Vectorized ATT(g,t) computation using numpy operations (23x faster)
+- Batch bootstrap weight generation
+- Vectorized multiplier bootstrap using matrix operations (26x faster)
+- **TWFE optimization** (`twfe.py`)
+- Cached groupby indexes for within-transformation
+- **All estimators migrated** to unified `linalg.py` backend
+- `estimators.py`, `twfe.py`, `staggered.py`, `triple_diff.py`,
+`synthetic_did.py`, `sun_abraham.py`, `utils.py`
+
+### Behavioral Changes
+- **Rank-deficient design matrices**: The new `gelsy` LAPACK driver handles
+rank-deficient matrices gracefully (returning a least-norm solution) rather
+than raising an explicit error. Previously, `DifferenceInDifferences` would
+raise `ValueError("Design matrix is rank-deficient")`. Users relying on this
+error for collinearity detection should validate their design matrices
+separately. Results remain numerically correct for well-specified models.
+
 ## [1.3.1] - 2026-01-10

 ### Added
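The "Behavioral Changes" entry in this hunk asks callers who relied on the old `ValueError` to validate their design matrices themselves. A minimal sketch of such a pre-check follows, mirroring the `np.linalg.matrix_rank` test that this commit removes from `_fit_ols`. The helper name `assert_full_column_rank` is hypothetical and not part of the diff-diff API.

```python
import numpy as np


def assert_full_column_rank(X: np.ndarray) -> int:
    """Raise ValueError if X has linearly dependent columns; return its rank.

    Hypothetical helper for illustration; not part of the diff-diff API.
    """
    rank = int(np.linalg.matrix_rank(X))
    if rank < X.shape[1]:
        raise ValueError(
            f"Design matrix is rank-deficient (rank {rank} < {X.shape[1]} columns)."
        )
    return rank


# Intercept, a treatment dummy, and an exact copy of the dummy (collinear).
treated = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])
X_bad = np.column_stack([np.ones(6), treated, treated])
X_ok = X_bad[:, :2]

print(assert_full_column_rank(X_ok))
try:
    assert_full_column_rank(X_bad)
except ValueError as exc:
    print(exc)
```

Calling a check like this before `fit()` restores the fail-fast behavior for collinear inputs, at the cost of one extra rank computation per model.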
@@ -282,6 +315,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `to_dict()` and `to_dataframe()` export methods
 - `is_significant` and `significance_stars` properties

+[1.4.0]: https://github.com/igerber/diff-diff/compare/v1.3.1...v1.4.0
 [1.3.1]: https://github.com/igerber/diff-diff/compare/v1.3.0...v1.3.1
 [1.3.0]: https://github.com/igerber/diff-diff/compare/v1.2.1...v1.3.0
 [1.2.1]: https://github.com/igerber/diff-diff/compare/v1.2.0...v1.2.1

ROADMAP.md

Lines changed: 18 additions & 15 deletions
@@ -6,7 +6,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).

 ---

-## Current Status (v1.3.1)
+## Current Status (v1.4.0)

 diff-diff is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:

@@ -15,35 +15,38 @@ diff-diff is a **production-ready** DiD library with feature parity with R's `di
 - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
+- **Performance**: Now faster than R at scale (see below)

 ---

 ## Priority: Performance Improvements

-**Status:** Planning complete, implementation pending
+**Status:** ✅ Phase 1 Complete (v1.4.0)

-Benchmarks show diff-diff is 3-17x slower than R's fixest for BasicDiD/TWFE at large scales (10K+ units). This is our top priority for v1.4.
+Phase 1 pure Python optimizations exceeded targets. diff-diff now **beats R** at scale:

-### Summary
+| Estimator | v1.3 (10K scale) | v1.4 (10K scale) | vs R |
+|-----------|------------------|------------------|------|
+| BasicDiD/TWFE | 0.835s | **0.011s** | **4.2x faster than R** |
+| CallawaySantAnna | 2.234s | **0.109s** | **7.2x faster than R** |
+| SyntheticDiD | Already 37x faster | N/A | **37x faster than R** |

-| Estimator | Current (10K scale) | Target | Approach |
-|-----------|---------------------|--------|----------|
-| BasicDiD/TWFE | 0.835s (R: 0.049s) | Match R | Rust backend |
-| CallawaySantAnna | 2.234s (R: 0.816s) | Match R | Vectorization + Rust |
-| SyntheticDiD | Already 37-1600x faster than R | Maintain | N/A |
+### What Was Done (v1.4.0)

-### Approach
+1. **Unified `linalg.py` backend** - Single OLS/SE implementation for all estimators
+2. **Vectorized cluster-robust SE** - Eliminated O(n × clusters) loop
+3. **Pre-computed data structures** - Wide-format outcome matrix, cohort masks
+4. **Vectorized bootstrap** - Matrix operations instead of nested loops

-1. **Phase 1:** Pure Python optimizations (vectorized cluster SE, scipy lstsq, cached groupby)
-2. **Phase 2:** Rust backend via PyO3 for performance-critical paths (cluster SE, demeaning, bootstrap)
+### Phase 2 (Future)

-The Rust backend will be optional with graceful fallback to pure Python.
+Rust backend remains available if further optimization needed, but pure Python now exceeds R performance.

 **Full details:** [docs/performance-plan.md](docs/performance-plan.md)

 ---

-## Near-Term Enhancements (v1.4)
+## Near-Term Enhancements (v1.5)

 High-value additions building on our existing foundation.

@@ -99,7 +102,7 @@ Extend the existing `TripleDifference` estimator to handle staggered adoption se

 ---

-## Medium-Term Enhancements (v1.5+)
+## Medium-Term Enhancements (v1.6+)

 Extending diff-diff to handle more complex settings.


diff_diff/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@
 plot_sensitivity,
 )

-__version__ = "1.3.1"
+__version__ = "1.4.0"
 __all__ = [
 # Estimators
 "DifferenceInDifferences",

diff_diff/estimators.py

Lines changed: 29 additions & 38 deletions
@@ -17,12 +17,12 @@
 import numpy as np
 import pandas as pd

+from diff_diff.linalg import compute_r_squared, compute_robust_vcov, solve_ols
 from diff_diff.results import DiDResults, MultiPeriodDiDResults, PeriodEffect
 from diff_diff.utils import (
 WildBootstrapResults,
 compute_confidence_interval,
 compute_p_value,
-compute_robust_se,
 validate_binary,
 wild_bootstrap_se,
 )
@@ -261,8 +261,11 @@ def fit(
 X = np.column_stack([X, dummies[col].values.astype(float)])
 var_names.append(col)

-# Fit OLS
-coefficients, residuals, fitted, r_squared = self._fit_ols(X, y)
+# Fit OLS using unified backend
+coefficients, residuals, fitted, vcov = solve_ols(
+X, y, return_fitted=True, return_vcov=False
+)
+r_squared = compute_r_squared(y, residuals)

 # Extract ATT (coefficient on interaction term)
 att_idx = 3 # Index of interaction term
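The changelog in this commit states that `solve_ols()` uses scipy's `gelsy` LAPACK driver, but `diff_diff/linalg.py` itself is not part of this excerpt. The following is a minimal sketch of that kind of solve using `scipy.linalg.lstsq`, not the library's implementation. The names `ols_gelsy` and `r_squared` are hypothetical, and the R² formula follows the code this commit removes from `_fit_ols`.

```python
import numpy as np
from scipy.linalg import lstsq


def ols_gelsy(X, y):
    """Least-squares fit via LAPACK gelsy (illustrative, not diff-diff's solve_ols)."""
    # lstsq returns (solution, residues, effective rank, singular values);
    # only the solution and the rank are used here.
    coef, _, rank, _ = lstsq(X, y, lapack_driver="gelsy")
    fitted = X @ coef
    return coef, y - fitted, fitted, rank


def r_squared(y, residuals):
    """R-squared as 1 - SS_res / SS_tot, guarding against a constant outcome."""
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot if ss_tot > 0 else 0.0


rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + 0.01 * rng.normal(size=n)

coef, residuals, fitted, rank = ols_gelsy(X, y)
print(coef, rank, r_squared(y, residuals))
```

The effective rank returned by `lstsq` is one way a caller could notice a rank-deficient design without a separate decomposition, though the threshold it uses depends on the `cond` argument.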
@@ -285,13 +288,13 @@ def fit(
 )
 elif self.cluster is not None:
 cluster_ids = data[self.cluster].values
-vcov = compute_robust_se(X, residuals, cluster_ids)
+vcov = compute_robust_vcov(X, residuals, cluster_ids)
 se = np.sqrt(vcov[att_idx, att_idx])
 t_stat = att / se
 p_value = compute_p_value(t_stat, df=df)
 conf_int = compute_confidence_interval(att, se, self.alpha, df=df)
 elif self.robust:
-vcov = compute_robust_se(X, residuals)
+vcov = compute_robust_vcov(X, residuals)
 se = np.sqrt(vcov[att_idx, att_idx])
 t_stat = att / se
 p_value = compute_p_value(t_stat, df=df)
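This hunk swaps `compute_robust_se` for `compute_robust_vcov`, which the roadmap describes as eliminating an O(n × clusters) loop. The body of `compute_robust_vcov` is not shown in this excerpt, so the following is only a sketch of one way to vectorize a cluster-robust sandwich estimator. It assumes the plain CR0 form with no small-sample correction, which may differ from what diff-diff applies, and both function names are hypothetical.

```python
import numpy as np


def cluster_vcov(X, residuals, cluster_ids):
    """CR0 cluster-robust sandwich estimator without a Python loop over clusters.

    Illustrative only: no small-sample correction is applied, which may
    differ from diff-diff's compute_robust_vcov.
    """
    k = X.shape[1]
    scores = X * residuals[:, None]  # per-observation score x_i * e_i, shape (n, k)
    # Map arbitrary cluster labels to integer codes 0..G-1.
    _, codes = np.unique(cluster_ids, return_inverse=True)
    cluster_scores = np.zeros((codes.max() + 1, k))
    np.add.at(cluster_scores, codes, scores)  # sum scores within each cluster
    meat = cluster_scores.T @ cluster_scores
    XtX = X.T @ X
    # (X'X)^{-1} meat (X'X)^{-1}, using solve() rather than an explicit inverse.
    return np.linalg.solve(XtX, np.linalg.solve(XtX, meat).T)


def cluster_vcov_loop(X, residuals, cluster_ids):
    """Reference version that builds a boolean mask per cluster, for comparison."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        mask = cluster_ids == g
        s = X[mask].T @ residuals[mask]
        meat += np.outer(s, s)
    bread = np.linalg.inv(X.T @ X)
    return bread @ meat @ bread


rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
residuals = rng.normal(size=n)
cluster_ids = rng.integers(0, 20, size=n)

V = cluster_vcov(X, residuals, cluster_ids)
print(np.allclose(V, cluster_vcov_loop(X, residuals, cluster_ids)))
```

The loop version scans all n observations once per cluster to build each mask, which is where the O(n × clusters) cost comes from; the vectorized version groups the scores in a single pass.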
@@ -300,7 +303,7 @@ def fit(
 # Classical OLS standard errors
 n = len(y)
 k = X.shape[1]
-mse = np.sum(residuals ** 2) / (n - k)
+mse = np.sum(residuals**2) / (n - k)
 # Use solve() instead of inv() for numerical stability
 # solve(A, B) computes X where AX=B, so this yields (X'X)^{-1} * mse
 vcov = np.linalg.solve(X.T @ X, mse * np.eye(k))
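The comment in this hunk claims that `np.linalg.solve(X.T @ X, mse * np.eye(k))` yields (X'X)^{-1} * mse. A short standalone check of that identity on synthetic data, assuming nothing about diff-diff beyond the lines shown in the hunk:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

coef = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = y - X @ coef
mse = np.sum(residuals**2) / (n - k)

# Solve (X'X) V = mse * I for V, as in the diff above.
vcov_solve = np.linalg.solve(X.T @ X, mse * np.eye(k))
# Same quantity via an explicit inverse, shown only for comparison.
vcov_inv = mse * np.linalg.inv(X.T @ X)

se = np.sqrt(np.diag(vcov_solve))
print(np.allclose(vcov_solve, vcov_inv), se)
```

Both forms agree on a well-conditioned design like this one; the `solve()` form simply avoids materializing the inverse as a separate step.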
@@ -352,10 +355,15 @@ def fit(

 return self.results_

-def _fit_ols(self, X: np.ndarray, y: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]:
+def _fit_ols(
+self, X: np.ndarray, y: np.ndarray
+) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]:
 """
 Fit OLS regression.

+This method is kept for backwards compatibility. Internally uses the
+unified solve_ols from diff_diff.linalg for optimized computation.
+
 Parameters
 ----------
 X : np.ndarray
@@ -367,32 +375,12 @@ def _fit_ols(self, X: np.ndarray, y: np.ndarray) -> Tuple[np.ndarray, np.ndarray
 -------
 tuple
 (coefficients, residuals, fitted_values, r_squared)
-
-Raises
-------
-ValueError
-If design matrix is rank-deficient (perfect multicollinearity).
 """
-# Check for rank deficiency (perfect multicollinearity)
-rank = np.linalg.matrix_rank(X)
-if rank < X.shape[1]:
-raise ValueError(
-f"Design matrix is rank-deficient (rank {rank} < {X.shape[1]} columns). "
-"This indicates perfect multicollinearity. Check your fixed effects "
-"and covariates for linear dependencies."
-)
-
-# Solve normal equations: β = (X'X)^(-1) X'y
-coefficients = np.linalg.lstsq(X, y, rcond=None)[0]
-
-# Compute fitted values and residuals
-fitted = X @ coefficients
-residuals = y - fitted
-
-# Compute R-squared
-ss_res = np.sum(residuals ** 2)
-ss_tot = np.sum((y - np.mean(y)) ** 2)
-r_squared = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0.0
+# Use unified OLS backend
+coefficients, residuals, fitted, _ = solve_ols(
+X, y, return_fitted=True, return_vcov=False
+)
+r_squared = compute_r_squared(y, residuals)

 return coefficients, residuals, fitted, r_squared

@@ -442,7 +430,7 @@ def _run_wild_bootstrap_inference(
 t_stat = bootstrap_results.t_stat_original

 # Also compute vcov for storage (using cluster-robust for consistency)
-vcov = compute_robust_se(X, residuals, cluster_ids)
+vcov = compute_robust_vcov(X, residuals, cluster_ids)

 return se, p_value, conf_int, t_stat, vcov, bootstrap_results

@@ -889,8 +877,11 @@ def fit( # type: ignore[override]
 X = np.column_stack([X, dummies[col].values.astype(float)])
 var_names.append(col)

-# Fit OLS
-coefficients, residuals, fitted, r_squared = self._fit_ols(X, y)
+# Fit OLS using unified backend
+coefficients, residuals, fitted, _ = solve_ols(
+X, y, return_fitted=True, return_vcov=False
+)
+r_squared = compute_r_squared(y, residuals)

 # Degrees of freedom
 df = len(y) - X.shape[1] - n_absorbed_effects
@@ -900,13 +891,13 @@ def fit( # type: ignore[override]
 # For now, we use analytical inference even if inference="wild_bootstrap"
 if self.cluster is not None:
 cluster_ids = data[self.cluster].values
-vcov = compute_robust_se(X, residuals, cluster_ids)
+vcov = compute_robust_vcov(X, residuals, cluster_ids)
 elif self.robust:
-vcov = compute_robust_se(X, residuals)
+vcov = compute_robust_vcov(X, residuals)
 else:
 n = len(y)
 k = X.shape[1]
-mse = np.sum(residuals ** 2) / (n - k)
+mse = np.sum(residuals**2) / (n - k)
 # Use solve() instead of inv() for numerical stability
 # solve(A, B) computes X where AX=B, so this yields (X'X)^{-1} * mse
 vcov = np.linalg.solve(X.T @ X, mse * np.eye(k))
