MultiPeriodDiD was producing astronomically wrong estimates (~252 trillion
instead of ~2-5) due to rank-deficient design matrices being solved
incorrectly by the gelsy LAPACK driver.
Changes:
- Python: Switch from gelsy to gelsd driver (SVD-based with truncation)
- Rust: Replace least_squares() with explicit SVD + truncated pseudoinverse
- Add comprehensive tests for rank-deficient matrices in both backends
- Add Rust vs NumPy equivalence tests for rank-deficient cases
- Document NaN standard errors limitation in TODO.md
The gelsd driver properly handles rank-deficient matrices by truncating
small singular values below rcond * max(S), producing valid minimum-norm
solutions instead of garbage coefficients.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
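The truncation rule described in the commit message can be sketched in NumPy. This is an illustration, not the project's code: `truncated_svd_solve` is a hypothetical helper, the `rcond=1e-10` threshold is an assumed value, and the toy design matrix simply has one column that is the sum of two others. `np.linalg.lstsq` (which NumPy documents as SVD-based) serves as the reference.

```python
import numpy as np

def truncated_svd_solve(X, y, rcond=1e-10):
    """Minimum-norm least-squares solution via truncated SVD (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only singular values above rcond * max(S); the rest are treated as zero.
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    beta = Vt.T @ (s_inv * (U.T @ y))
    return beta, int(keep.sum())

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The fourth column equals x1 + x2, so X has 4 columns but rank 3.
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

beta, rank = truncated_svd_solve(X, y)
beta_ref, _, rank_ref, _ = np.linalg.lstsq(X, y, rcond=1e-10)

print("effective rank:", rank)
print("coefficients:  ", beta)
```

Because the null direction is dropped rather than inverted, the coefficients stay on the scale of the data instead of being divided by a near-zero singular value.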
- `compute_robust_vcov()` - Vectorized HC1 and cluster-robust variance-covariance estimation
- `compute_r_squared()` - R-squared and adjusted R-squared computation
- `LinearRegression` - High-level OLS helper class with unified coefficient extraction and inference
@@ -240,7 +240,7 @@ diff-diff achieved significant performance improvements in v1.4.0, now **faster
All estimators use a single optimized OLS/SE implementation:
-- **scipy.linalg.lstsq with 'gelsy' driver**: QR-based solving, faster than NumPy's default SVD-based solver
+- **scipy.linalg.lstsq with 'gelsd' driver**: SVD-based solving that properly handles rank-deficient matrices (critical for MultiPeriodDiD and other estimators with potentially redundant columns)
### NaN Standard Errors for Rank-Deficient Matrices
**Problem**: When the design matrix is rank-deficient (e.g., MultiPeriodDiD with redundant period dummies + treatment interactions), the coefficients are now computed correctly via SVD truncation, but the variance-covariance matrix computation produces NaN values.
**Root cause**: The vcov computation in `compute_robust_vcov()` computes `(X'X)^{-1}` which doesn't exist for rank-deficient matrices. The current implementation uses Cholesky factorization which fails silently, producing NaN values.
**Affected estimators**:
- `MultiPeriodDiD` - when the design matrix has redundant columns
- Any estimator using `solve_ols()` with rank-deficient X
**Potential fix**: Use the Moore-Penrose pseudoinverse `(X'X)^+` instead of `(X'X)^{-1}` for the bread matrix in the sandwich estimator. This would provide valid (though potentially conservative) standard errors for the identifiable parameters.
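A minimal sketch of the pseudoinverse idea, assuming an HC1 sandwich of the form `scale * bread @ meat @ bread`. The names `pinv_gram` and `hc1_vcov_pinv` are hypothetical and are not part of `compute_robust_vcov()`. The `rcond` threshold and the use of the effective rank (rather than the column count) in the HC1 factor are assumptions of this sketch.

```python
import numpy as np

def pinv_gram(X, rcond=1e-10):
    """Return (X'X)^+ and the effective rank, built from the SVD of X (hypothetical helper)."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rcond * s.max()
    inv_s2 = np.zeros_like(s)
    inv_s2[keep] = 1.0 / s[keep] ** 2
    # V diag(1/s^2) V' restricted to the retained singular values.
    return (Vt.T * inv_s2) @ Vt, int(keep.sum())

def hc1_vcov_pinv(X, resid, rcond=1e-10):
    """HC1 sandwich with a pseudoinverse bread (sketch, not the library's implementation)."""
    n = X.shape[0]
    bread, rank = pinv_gram(X, rcond)
    meat = X.T @ (X * resid[:, None] ** 2)
    scale = n / (n - rank)  # assumption: degrees of freedom use the effective rank
    return scale * bread @ meat @ bread

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Redundant fourth column makes X'X singular, so (X'X)^{-1} does not exist.
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

beta = np.linalg.lstsq(X, y, rcond=1e-10)[0]
resid = y - X @ beta
vcov = hc1_vcov_pinv(X, resid)
se = np.sqrt(np.diag(vcov))
print("standard errors:", se)
```

In this toy design the intercept is identifiable, and its variance from the pseudoinverse agrees with the variance obtained after dropping the redundant column. The entries for the three collinear columns describe the minimum-norm solution only and should not be read as standard errors of individually identified effects.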
**Workaround**: Users can use bootstrap inference which doesn't rely on the analytical vcov.
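As a rough illustration of that workaround, the following is a generic pairs (row-resampling) bootstrap written in plain NumPy. It is not the library's bootstrap API: `pairs_bootstrap_se` is a hypothetical helper, and `n_boot`, `seed`, and `rcond` are assumed values. Each replicate refits with an SVD-based solver, so no `(X'X)^{-1}` is ever formed.

```python
import numpy as np

def pairs_bootstrap_se(X, y, n_boot=500, seed=0, rcond=1e-10):
    """Bootstrap standard errors by resampling rows with replacement (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    draws = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        # SVD-based minimum-norm fit; small singular values are truncated via rcond.
        draws[b] = np.linalg.lstsq(X[idx], y[idx], rcond=rcond)[0]
    return draws.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Same kind of rank-deficient toy design: the last column is x1 + x2.
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

boot_se = pairs_bootstrap_se(X, y)
print("bootstrap SEs:", boot_se)
```

Only identifiable quantities (here, the intercept) have a meaningful interpretation; the spread of the collinear coefficients reflects the minimum-norm convention.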