Unify Rust TROP inner solver to SVD; close finding #23 grid-search divergence
Closes the grid-search half of silent-failures finding #23 (TODO row 86).
The `xfail(strict=True)` regression `test_grid_search_rank_deficient_Y`
baselined a ~6% ATT divergence between Rust and Python on two near-parallel
control units. Root cause: Rust's `solve_joint_no_lowrank` used iterative
block coordinate descent (50 iter, tol=1e-8) while Python used SVD-based
minimum-norm least squares. On rank-deficient Y the two solvers converge
to different stationary points of the same objective.
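The rank-deficiency argument can be illustrated with a toy problem. This is a minimal sketch, not the TROP objective: the two identical regressor columns, the zero starting point, and the function names `coordinate_descent` and `minimum_norm` are all assumptions made for illustration. Both routines minimize the same sum of squares, yet they return different coefficient vectors because every pair with `a + b = c` is a minimizer.

```python
def coordinate_descent(x, y, n_iter=50, tol=1e-8):
    """Minimize sum((y - a*x - b*x)**2) by alternating exact updates of a, then b."""
    a = b = 0.0
    xx = sum(v * v for v in x)
    for _ in range(n_iter):
        # Exact minimizer in a with b held fixed, then in b with the new a held fixed.
        a_new = sum(xi * (yi - b * xi) for xi, yi in zip(x, y)) / xx
        b_new = sum(xi * (yi - a_new * xi) for xi, yi in zip(x, y)) / xx
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def minimum_norm(x, y):
    """Among all (a, b) with a + b = c, the smallest a**2 + b**2 is at a = b = c / 2."""
    c = sum(xi * yi for xi, yi in zip(x, y)) / sum(v * v for v in x)
    return c / 2.0, c / 2.0


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]          # y = 2 * x, so any a + b = 2 fits exactly

cd = coordinate_descent(x, y)  # first coordinate absorbs the whole fit
mn = minimum_norm(x, y)        # fit is split evenly across the duplicated column
```

Starting from zero, the coordinate-descent iterate stops at a minimizer that depends on update order and initialization, whereas the minimum-norm solution is unique. That is the kind of gap the SVD port removes.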
Python is canonical (SVD / minimum-norm least squares per Golub & Van Loan).
Rust's iterative solver was a speed optimization, not a methodology choice.
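Port the Rust inner TWFE step to SVD-based WLS that mirrors Python's
`np.linalg.lstsq(rcond=None)` step-for-step, with numpy-compatible
`rcond = eps * max(n, k)`, where `eps` is machine epsilon and n x k is the
shape of the design matrix. Singular values below `rcond` times the largest
singular value are treated as zero.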
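The truncation rule can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the Rust `solve_wls_svd` code: the helper names `lstsq_svd` and `wls_svd` and the synthetic data are hypothetical, and the sketch only aims to reproduce what `np.linalg.lstsq(rcond=None)` returns on a rank-deficient weighted problem.

```python
import numpy as np


def lstsq_svd(A, b):
    """Minimum-norm least squares via thin SVD with numpy-style rcond truncation."""
    n, k = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rcond = np.finfo(A.dtype).eps * max(n, k)
    cutoff = rcond * s.max()
    keep = s > cutoff                      # drop numerically zero singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ b))      # pseudo-inverse applied to b


def wls_svd(X, y, w):
    """Weighted least squares: scale rows by sqrt(w), then solve the OLS problem."""
    sw = np.sqrt(w)
    return lstsq_svd(X * sw[:, None], y * sw)


rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
X = np.column_stack([X, X[:, 0] + X[:, 1]])   # fourth column is collinear: rank 3
y = rng.normal(size=12)
w = rng.uniform(0.5, 2.0, size=12)

beta = wls_svd(X, y, w)
sw = np.sqrt(w)
ref = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
```

Because both paths discard the same near-zero singular value and apply the pseudo-inverse to the remainder, `beta` and `ref` should agree to floating-point tolerance even though the design matrix is rank-deficient.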
Changes
- rust/src/linalg.rs: promote ndarray_to_faer to pub(crate) so trop.rs can reuse it.
- rust/src/trop.rs: add a module-private solve_wls_svd helper (thin SVD plus rcond truncation, matching numpy's minimum-norm semantics). Rewrite the solve_joint_no_lowrank body to flatten y/weights row-major, build the [intercept | unit_dummies[1..] | time_dummies[1..]] design matrix, apply sqrt-weights, and solve via solve_wls_svd. The function signature is unchanged, so all 4 call sites (LOOCV, FISTA TWFE step x2, bootstrap) benefit transitively.
- tests/test_rust_backend.py: remove @pytest.mark.xfail from test_grid_search_rank_deficient_Y; the gap is closed. Bootstrap-seed test retains its xfail (row 87 RNG mismatch, out of scope).
- docs/methodology/REGISTRY.md: update the TROP Global Estimation bullet at the existing `np.linalg.lstsq` line to note Rust and Python now both use SVD-based minimum-norm WLS with numpy-compatible rcond.
- TODO.md: delete row 86 (grid-search divergence entry).
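The design-matrix construction described for rust/src/trop.rs can be sketched in plain Python. This is a hedged illustration only: the function name `build_twfe_design` is hypothetical, and the periods-by-units layout of `y` and `weights` is an assumption, since the commit message does not state the array orientation.

```python
import math


def build_twfe_design(y, weights):
    """Build the sqrt-weighted TWFE design and target from nested lists.

    Assumed layout: y[t][i] and weights[t][i] with t = period, i = unit.
    Columns: [intercept | unit dummies for units 1.. | time dummies for periods 1..].
    """
    n_t, n_u = len(y), len(y[0])
    rows, targets = [], []
    for t in range(n_t):
        for i in range(n_u):               # row-major flattening: unit varies fastest
            sw = math.sqrt(weights[t][i])
            unit = [1.0 if i == j else 0.0 for j in range(1, n_u)]   # unit 0 is baseline
            time = [1.0 if t == s else 0.0 for s in range(1, n_t)]   # period 0 is baseline
            row = [1.0] + unit + time
            rows.append([sw * v for v in row])
            targets.append(sw * y[t][i])
    return rows, targets


y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]            # 2 periods x 3 units
weights = [[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]]
X_w, y_w = build_twfe_design(y, weights)
```

Dropping the first unit and first period dummy avoids collinearity with the intercept, and scaling both sides by the square root of the weights turns the weighted problem into an ordinary least-squares problem that an SVD solver can handle directly.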
Verification
- maturin develop --release --features accelerate: clean build, no warnings.
- pytest tests/test_rust_backend.py::TestTROPRustEdgeCaseParity: grid-search test now passes; bootstrap-seed test correctly xfails.
- pytest tests/test_rust_backend.py -k TROP -m '': 23 passed, 1 xfailed, no regressions.
- pytest tests/test_trop.py: 83 passed, 37 deselected (slow).
- TestTROPGlobalRustVsNumpy (incl. lambda_nn=0 low-rank FISTA path): 8 passed — FISTA TWFE step unchanged in behavior on well-conditioned data.
- grep for other 'for _ in 0..50' coordinate-descent patterns in rust/src/*.rs: none found.
Non-goals
- No changes to row 87 (bootstrap RNG mismatch — Rust rand crate vs numpy default_rng ~28% SE gap on seed=42). Separate PR.
- No changes to linalg.rs::solve_ols (rcond=1e-7 is load-bearing for MultiPeriodDiD / DiD / TWFE).
- No public API changes.
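To see why the two rcond conventions are kept separate, the sketch below compares how many singular values each one retains. The singular values, the dimensions, and the helper name `effective_rank` are made up for illustration, and it is an assumption that both cutoffs are applied relative to the largest singular value.

```python
import sys


def effective_rank(singular_values, rcond):
    """Count singular values above rcond times the largest one."""
    cutoff = rcond * max(singular_values)
    return sum(1 for s in singular_values if s > cutoff)


s = [3.0, 1.0, 1e-9]        # hypothetical spectrum with one small direction
n, k = 200, 3               # hypothetical design-matrix shape

numpy_rcond = sys.float_info.epsilon * max(n, k)   # eps * max(n, k)
rank_numpy_style = effective_rank(s, numpy_rcond)  # keeps the 1e-9 direction
rank_fixed = effective_rank(s, 1e-7)               # drops the 1e-9 direction
```

A fixed 1e-7 threshold is far more aggressive than the numpy-style one, so swapping one for the other could change which directions an estimator treats as identified. That is consistent with leaving `solve_ols` untouched here.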
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TODO.md: 0 additions, 1 deletion
@@ -83,7 +83,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. |`linalg.py::_compute_cr2_bm`| Phase 1a | Medium |
 | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. |`benchmarks/R/generate_clubsandwich_golden.R`| Phase 1a | Medium |
 |`honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. |`honest_did.py`|#334| Low |
-| TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. |`trop_global.py`, `rust/`| follow-up | Medium |
 | TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. |`trop_global.py`, `rust/`| follow-up | Medium |
 |`bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). |`benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py`| Phase 1c | Low |
 |`bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). |`diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py`| Phase 1c | Medium |