Commit 6d4f973

igerber and claude committed
docs: Document v1.4.0 performance architecture in CLAUDE.md
Add documentation for the unified linalg.py backend and performance optimizations introduced in v1.4.0:

- Add linalg.py module entry with solve_ols, compute_robust_vcov details
- Add Performance Architecture section explaining key optimizations
- Document CallawaySantAnna-specific optimizations (pre-computed structures, vectorized ATT, batch bootstrap)
- Add test_linalg.py to test structure
- Add unified linear algebra backend to design patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 8a57092 commit 6d4f973

1 file changed

Lines changed: 46 additions & 0 deletions

CLAUDE.md

@@ -74,6 +74,13 @@ mypy diff_diff
  - `bacon_decompose()` - Convenience function for quick decomposition
  - Integrated with `TwoWayFixedEffects.decompose()` method

- **`diff_diff/linalg.py`** - Unified linear algebra backend (v1.4.0):
  - `solve_ols()` - OLS solver using scipy's gelsy LAPACK driver (QR-based, faster than SVD)
  - `compute_robust_vcov()` - Vectorized HC1 and cluster-robust variance-covariance estimation
  - `compute_r_squared()` - R-squared and adjusted R-squared computation
  - Single optimization point for all estimators (reduces code duplication)
  - Cluster-robust SEs use pandas groupby instead of O(n × clusters) loop

- **`diff_diff/results.py`** - Dataclass containers for estimation results:
  - `DiDResults`, `MultiPeriodDiDResults`, `SyntheticDiDResults`, `PeriodEffect`
  - Each provides `summary()`, `to_dict()`, `to_dataframe()` methods
@@ -144,6 +151,44 @@ mypy diff_diff
   - `fixed_effects` parameter creates dummy variables (for low-dimensional FE)
   - `absorb` parameter uses within-transformation (for high-dimensional FE)
4. **Results objects**: Rich dataclass objects with statistical properties (`is_significant`, `significance_stars`)
5. **Unified linear algebra backend**: All estimators use `linalg.py` for OLS and variance estimation

### Performance Architecture (v1.4.0)

diff-diff v1.4.0 is substantially faster than v1.3 and is now **faster than R** at every benchmarked scale. Key optimizations:

#### Unified `linalg.py` Backend

All estimators use a single optimized OLS/SE implementation:

- **scipy.linalg.lstsq with 'gelsy' driver**: QR-based solving, faster than NumPy's default SVD-based solver
- **Vectorized cluster-robust SE**: Uses pandas groupby aggregation instead of O(n × clusters) Python loop
- **Single optimization point**: Changes to `linalg.py` benefit all estimators
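
As a rough illustration of the driver choice described above, the sketch below solves one least-squares problem with `scipy.linalg.lstsq(..., lapack_driver="gelsy")` and with `numpy.linalg.lstsq`, then checks that the coefficients agree. The synthetic data and variable names are assumptions made for this example; it does not reproduce the internals of `solve_ols()`.

```python
import numpy as np
from scipy.linalg import lstsq

# Synthetic regression problem (illustrative data, not from the benchmarks).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# 'gelsy' uses a column-pivoted QR (complete orthogonal) factorization.
# With this driver SciPy returns no residues or singular values, so the
# residuals are computed explicitly from the fitted coefficients.
beta_gelsy, _, rank, _ = lstsq(X, y, lapack_driver="gelsy")
residuals = y - X @ beta_gelsy

# NumPy's lstsq relies on an SVD-based LAPACK routine.
beta_numpy, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_gelsy, beta_numpy), rank)
```

Both calls should return the same coefficients on a well-conditioned design matrix, so the choice of driver is about speed, not about the estimates.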
167+
168+
```python
169+
# All estimators import from linalg.py
170+
from diff_diff.linalg import solve_ols, compute_robust_vcov
171+
172+
# Example usage
173+
coefficients, residuals, vcov = solve_ols(X, y, cluster_ids=cluster_ids)
174+
```
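
To make the groupby-based cluster-robust SE idea more concrete, here is a minimal sketch that builds the sandwich "meat" two ways: with a per-cluster Python loop and with a single pandas groupby aggregation. The function names (`cluster_meat_loop`, `cluster_meat_groupby`, `cluster_robust_vcov`) and the small-sample adjustment `G/(G-1) * (n-1)/(n-k)` are assumptions chosen for illustration and may differ from what `compute_robust_vcov()` actually does.

```python
import numpy as np
import pandas as pd


def cluster_meat_loop(X, resid, cluster_ids):
    """Reference version: one boolean mask over all n rows per cluster."""
    ids = np.asarray(cluster_ids)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(ids):
        mask = ids == g                    # O(n) work for every cluster
        score_g = X[mask].T @ resid[mask]  # summed score of cluster g
        meat += np.outer(score_g, score_g)
    return meat


def cluster_meat_groupby(X, resid, cluster_ids):
    """Vectorized version: sum per-row scores within clusters in one pass."""
    scores = X * resid[:, None]            # (n, k) per-observation scores
    cluster_scores = (
        pd.DataFrame(scores).groupby(np.asarray(cluster_ids)).sum().to_numpy()
    )                                      # (G, k) per-cluster score sums
    return cluster_scores.T @ cluster_scores


def cluster_robust_vcov(X, resid, cluster_ids, meat_fn=cluster_meat_groupby):
    """Sandwich estimator with an assumed small-sample adjustment."""
    n, k = X.shape
    n_clusters = np.unique(cluster_ids).size
    bread = np.linalg.inv(X.T @ X)
    adjustment = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    return adjustment * (bread @ meat_fn(X, resid, cluster_ids) @ bread)


# Synthetic data for the comparison.
rng = np.random.default_rng(1)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
cluster_ids = rng.integers(0, 50, size=n)
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

vcov_fast = cluster_robust_vcov(X, resid, cluster_ids)
vcov_slow = cluster_robust_vcov(X, resid, cluster_ids, meat_fn=cluster_meat_loop)
print(np.allclose(vcov_fast, vcov_slow))
```

The two versions compute the same matrix; the groupby variant simply avoids scanning all n rows once per cluster.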

#### CallawaySantAnna Optimizations (`staggered.py`)

- **Pre-computed data structures**: `_precompute_structures()` creates wide-format outcome matrix and cohort masks once
- **Vectorized ATT(g,t)**: `_compute_att_gt_fast()` uses numpy operations (23x faster than loop-based)
- **Batch bootstrap weights**: `_generate_bootstrap_weights_batch()` generates all weights at once
- **Matrix-based bootstrap**: Bootstrap iterations use matrix operations instead of nested loops (26x faster)
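
The ideas in the list above can be sketched in a few lines of NumPy. The example below is a simplified stand-in, not the code in `staggered.py`: it assumes no covariates, never-treated units as the control group, the period before treatment as the base period, and Rademacher multiplier weights. The names `att_gt`, `influence`, `Y`, and `first_treat` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods = 200, 6

# Pre-computed structures (built once): a wide units x periods outcome
# matrix and each unit's first treatment period (0 = never treated).
Y = rng.normal(size=(n_units, n_periods))
first_treat = rng.choice([0, 3, 4], size=n_units)
treated_cell = (first_treat[:, None] > 0) & (
    np.arange(n_periods) >= first_treat[:, None]
)
Y += 2.0 * treated_cell  # true effect of 2.0 in every treated cell


def att_gt(Y, first_treat, g, t):
    """Unconditional ATT(g, t): difference in mean outcome changes."""
    delta = Y[:, t] - Y[:, g - 1]          # change since the base period
    treated = first_treat == g             # cohort mask
    control = first_treat == 0             # never-treated mask
    return delta[treated].mean() - delta[control].mean()


def influence(Y, first_treat, g, t):
    """Influence-function values of the difference in means above."""
    delta = Y[:, t] - Y[:, g - 1]
    treated = first_treat == g
    control = first_treat == 0
    psi = np.zeros(len(delta))
    psi[treated] = (delta[treated] - delta[treated].mean()) / treated.mean()
    psi[control] = -(delta[control] - delta[control].mean()) / control.mean()
    return psi


att = att_gt(Y, first_treat, g=3, t=4)
psi = influence(Y, first_treat, g=3, t=4)

# Batch bootstrap: draw every weight at once, then get all bootstrap
# perturbations from a single matrix-vector product.
n_boot = 999
W = rng.choice([-1.0, 1.0], size=(n_boot, n_units))
draws_matrix = W @ psi / n_units

# Equivalent loop over bootstrap iterations, kept only for comparison.
draws_loop = np.array([np.mean(W[b] * psi) for b in range(n_boot)])

boot_se = draws_matrix.std(ddof=1)
print(att, boot_se, np.allclose(draws_matrix, draws_loop))
```

With several (g, t) cells, `psi` becomes a units x cells matrix and the same `W @ psi` product yields every bootstrap draw for every cell in one call.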

#### Performance Results

| Estimator | v1.3 (10K scale) | v1.4 (10K scale) | vs R |
|-----------|------------------|------------------|------|
| BasicDiD/TWFE | 0.835s | 0.011s | **4x faster than R** |
| CallawaySantAnna | 2.234s | 0.109s | **8x faster than R** |
| SyntheticDiD | Already optimized | N/A | **37x faster than R** |

See `docs/performance-plan.md` for full optimization details and `docs/benchmarks.rst` for validation results.

### Documentation

@@ -189,6 +234,7 @@ Tests mirror the source modules:
- `tests/test_sun_abraham.py` - Tests for SunAbraham interaction-weighted estimator
- `tests/test_triple_diff.py` - Tests for Triple Difference (DDD) estimator
- `tests/test_bacon.py` - Tests for Goodman-Bacon decomposition
- `tests/test_linalg.py` - Tests for unified OLS backend and robust variance estimation
- `tests/test_utils.py` - Tests for parallel trends, robust SE, synthetic weights
- `tests/test_diagnostics.py` - Tests for placebo tests
- `tests/test_wild_bootstrap.py` - Tests for wild cluster bootstrap
