Implement R-style rank deficiency handling instead of silent SVD truncation
Replace silent SVD truncation with R's lm() approach for rank-deficient matrices:
- Detect rank deficiency using pivoted QR decomposition
- Warn users with clear message listing dropped columns
- Set NaN for coefficients of linearly dependent columns
- Compute valid SEs for identified coefficients only
- Expand vcov matrix with NaN for dropped rows/columns
Add rank_deficient_action parameter ("warn", "error", "silent") to control behavior.
Hybrid Rust/Python routing:
- Full-rank matrices use fast Rust backend (when available)
- Rank-deficient matrices use Python backend for proper NA handling
  (ndarray-linalg doesn't support QR with pivoting)
Also fixes tutorial notebook 02 to avoid rank deficiency by including
both treated cohorts in the MultiPeriodDiD example.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
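The detection step and the `rank_deficient_action` behavior described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this commit: the helper name `detect_dropped_columns`, its return values, the `rcond` tolerance rule, and the warning text are all assumptions. Note also that `scipy.linalg.qr` pivots on column norms, so which of two collinear columns gets dropped may differ from R's `lm()`.

```python
import warnings

import numpy as np
from scipy.linalg import qr


def detect_dropped_columns(X, column_names=None,
                           rank_deficient_action="warn", rcond=1e-7):
    """Hypothetical helper: find linearly dependent columns via pivoted QR.

    Returns (rank, kept_indices, dropped_indices).
    """
    if rank_deficient_action not in ("warn", "error", "silent"):
        raise ValueError("rank_deficient_action must be 'warn', 'error' or 'silent'")

    X = np.asarray(X, dtype=float)
    # With pivoting=True, |diag(R)| is non-increasing, so small trailing
    # entries mark columns that add no new information.
    _, R, pivot = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    # Assumed tolerance rule: relative to the largest diagonal entry.
    tol = rcond * diag.max() if diag.size else 0.0
    rank = int(np.sum(diag > tol))

    kept = np.sort(pivot[:rank])
    dropped = np.sort(pivot[rank:])

    if dropped.size:
        names = [column_names[i] if column_names is not None else f"column {i}"
                 for i in dropped]
        msg = "Rank-deficient design matrix; dropping: " + ", ".join(names)
        if rank_deficient_action == "error":
            raise ValueError(msg)
        if rank_deficient_action == "warn":
            warnings.warn(msg, UserWarning)
    return rank, kept, dropped
```

Under these assumptions, a full-rank matrix returns an empty `dropped` array, which is the case the commit message routes to the Rust backend.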
- Single optimization point for all estimators (reduces code duplication)
- Cluster-robust SEs use pandas groupby instead of O(n × clusters) loop
+- **Rank deficiency handling** (R-style):
+  - Detects rank-deficient matrices using pivoted QR decomposition
+  - `rank_deficient_action` parameter: "warn" (default), "error", or "silent"
+  - Dropped columns have NaN coefficients (like R's `lm()`)
+  - VCoV matrix has NaN for rows/cols of dropped coefficients
+  - Warnings include column names when provided
- **`diff_diff/_backend.py`** - Backend detection and configuration (v2.0.0):
- Detects optional Rust backend availability
@@ -240,16 +247,20 @@ diff-diff achieved significant performance improvements in v1.4.0, now **faster
All estimators use a single optimized OLS/SE implementation:
-- **scipy.linalg.lstsq with 'gelsd' driver**: SVD-based solving that properly handles rank-deficient matrices (critical for MultiPeriodDiD and other estimators with potentially redundant columns)
+- **R-style rank deficiency handling**: Uses pivoted QR to detect linearly dependent columns, drops them, sets NaN for their coefficients, and emits informative warnings (following R's `lm()` approach)
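To make the "drop, then expand with NaN" behavior concrete, here is a minimal sketch of fitting on the identified columns and writing the results back into full-size arrays. It is not the library's implementation: the function name `ols_with_nan_expansion` and the `rcond` tolerance are hypothetical, and it uses the classical homoskedastic variance estimator for brevity, whereas the estimators in this project may use robust or cluster-robust variants.

```python
import numpy as np
from scipy.linalg import qr


def ols_with_nan_expansion(X, y, rcond=1e-7):
    """Hypothetical sketch: OLS that reports NaN for dropped columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    # Pivoted QR: small trailing |diag(R)| entries flag dependent columns.
    _, R, pivot = qr(X, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > rcond * diag.max()))
    kept = np.sort(pivot[:rank])

    # Fit only the identified (linearly independent) columns.
    Xk = X[:, kept]
    beta_k, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    resid = y - Xk @ beta_k
    sigma2 = resid @ resid / (n - rank)          # assumes n > rank
    vcov_k = sigma2 * np.linalg.inv(Xk.T @ Xk)   # classical (non-robust) vcov

    # Expand back to full size, leaving NaN for dropped columns.
    coef = np.full(k, np.nan)
    coef[kept] = beta_k
    vcov = np.full((k, k), np.nan)
    vcov[np.ix_(kept, kept)] = vcov_k
    se = np.sqrt(np.diag(vcov))
    return coef, se, vcov
```

Keeping the output arrays at their original length means coefficient positions still line up with the caller's column names, which is the same convention R's `lm()` follows when it reports `NA` for aliased terms.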