Merge pull request #345 from igerber/fix/rust-cleanup-synthetic-weights

igerber · web-flow · commit 6b24c428d149 · 2026-04-20T19:39:07.000-04:00
Remove dead Rust compute_synthetic_weights (follow-up to PR #344)
diff --git a/TODO.md b/TODO.md
@@ -83,7 +83,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
 | Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
 | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low |
-| Rust `compute_synthetic_weights` + `compute_synthetic_weights_internal` (now dead code) can be removed from `rust/src/weights.rs:43-117` in a future Rust-cleanup PR. Python-side wrapper was deleted (post-audit cleanup for finding #22) and its sole caller now inlines Frank-Wolfe via `_sc_weight_fw`. The Rust symbol remains callable via `from diff_diff._rust_backend import compute_synthetic_weights` but no Python code calls it. Removal requires `maturin develop` rebuild. No functional impact of leaving it. | `rust/src/weights.rs` | follow-up | Low |
 | TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
 | TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. | `trop_global.py`, `rust/` | follow-up | Medium |
 | `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
diff --git a/rust/src/lib.rs b/rust/src/lib.rs
@@ -28,8 +28,7 @@ fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> {
         m
     )?)?;
 
-    // Synthetic control weights (legacy projected gradient descent)
-    m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?)?;
+    // Simplex projection
     m.add_function(wrap_pyfunction!(weights::project_simplex, m)?)?;
 
     // SDID weights (Frank-Wolfe matching R's synthdid)
diff --git a/rust/src/weights.rs b/rust/src/weights.rs
@@ -1,7 +1,6 @@
 //! Synthetic control weight computation.
 //!
 //! This module provides optimized implementations of:
-//! - Legacy synthetic control weight optimization (projected gradient descent)
 //! - Frank-Wolfe synthetic control weights (matching R's synthdid)
 //! - Simplex projection
 //! - SDID unit and time weight computation
@@ -11,111 +10,6 @@ use ndarray::linalg::general_mat_vec_mul;
 use numpy::{PyArray1, PyReadonlyArray1, PyReadonlyArray2, ToPyArray};
 use pyo3::prelude::*;
 
-/// Maximum number of optimization iterations.
-const MAX_ITER: usize = 1000;
-
-/// Default convergence tolerance (matches Python's _OPTIMIZATION_TOL).
-const DEFAULT_TOL: f64 = 1e-8;
-
-/// Default step size for gradient descent.
-const DEFAULT_STEP_SIZE: f64 = 0.1;
-
-// =========================================================================
-// Legacy synthetic control weights (projected gradient descent)
-// =========================================================================
-
-/// Compute synthetic control weights via projected gradient descent.
-///
-/// Solves: min_w ||Y_treated - Y_control @ w||² + lambda * ||w||²
-/// subject to: w >= 0, sum(w) = 1
-///
-/// # Arguments
-/// * `y_control` - Control unit outcomes matrix (n_pre, n_control)
-/// * `y_treated` - Treated unit outcomes (n_pre,)
-/// * `lambda_reg` - L2 regularization parameter
-/// * `max_iter` - Maximum number of iterations (default: 1000)
-/// * `tol` - Convergence tolerance (default: 1e-6)
-///
-/// # Returns
-/// Optimal weights (n_control,) that sum to 1
-#[pyfunction]
-#[pyo3(signature = (y_control, y_treated, lambda_reg=0.0, max_iter=None, tol=None))]
-pub fn compute_synthetic_weights<'py>(
-    py: Python<'py>,
-    y_control: PyReadonlyArray2<'py, f64>,
-    y_treated: PyReadonlyArray1<'py, f64>,
-    lambda_reg: f64,
-    max_iter: Option<usize>,
-    tol: Option<f64>,
-) -> PyResult<Bound<'py, PyArray1<f64>>> {
-    let y_control_arr = y_control.as_array();
-    let y_treated_arr = y_treated.as_array();
-
-    let weights =
-        compute_synthetic_weights_internal(&y_control_arr, &y_treated_arr, lambda_reg, max_iter, tol)?;
-
-    Ok(weights.to_pyarray(py))
-}
-
-/// Internal implementation of synthetic weight computation.
-fn compute_synthetic_weights_internal(
-    y_control: &ArrayView2<f64>,
-    y_treated: &ArrayView1<f64>,
-    lambda_reg: f64,
-    max_iter: Option<usize>,
-    tol: Option<f64>,
-) -> PyResult<Array1<f64>> {
-    let n_control = y_control.ncols();
-    let max_iter = max_iter.unwrap_or(MAX_ITER);
-    let tol = tol.unwrap_or(DEFAULT_TOL);
-
-    // Precompute Hessian: H = Y_control' @ Y_control + lambda * I
-    let h = {
-        let ytc = y_control.t().dot(y_control);
-        let mut h = ytc;
-        // Add regularization to diagonal
-        for i in 0..n_control {
-            h[[i, i]] += lambda_reg;
-        }
-        h
-    };
-
-    // Precompute linear term: f = Y_control' @ Y_treated
-    let f = y_control.t().dot(y_treated);
-
-    // Initialize with uniform weights
-    let mut weights = Array1::from_elem(n_control, 1.0 / n_control as f64);
-
-    // Projected gradient descent
-    let step_size = DEFAULT_STEP_SIZE;
-    let mut prev_weights = weights.clone();
-
-    for _ in 0..max_iter {
-        // Gradient: grad = H @ weights - f
-        let grad = h.dot(&weights) - &f;
-
-        // Gradient step
-        weights = &weights - step_size * &grad;
-
-        // Project onto simplex
-        weights = project_simplex_internal(&weights.view());
-
-        // Check convergence
-        let diff: f64 = weights
-            .iter()
-            .zip(prev_weights.iter())
-            .map(|(a, b)| (a - b).powi(2))
-            .sum();
-        if diff.sqrt() < tol {
-            break;
-        }
-
-        prev_weights.assign(&weights);
-    }
-
-    Ok(weights)
-}
-
 // =========================================================================
 // Simplex projection
 // =========================================================================
@@ -806,23 +700,6 @@ mod tests {
         assert!(result.iter().all(|&x| x >= -1e-10));
     }
 
-    #[test]
-    fn test_compute_weights_sum_to_one() {
-        let y_control = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]];
-        let y_treated = array![2.0, 5.0, 8.0];
-
-        let weights =
-            compute_synthetic_weights_internal(&y_control.view(), &y_treated.view(), 0.0, None, None)
-                .unwrap();
-
-        let sum: f64 = weights.sum();
-        assert!((sum - 1.0).abs() < 1e-6, "Weights should sum to 1, got {}", sum);
-        assert!(
-            weights.iter().all(|&w| w >= -1e-10),
-            "Weights should be non-negative"
-        );
-    }
-
     #[test]
     fn test_sum_normalize() {
         let v = array![2.0, 3.0, 5.0];
diff --git a/tests/test_rust_backend.py b/tests/test_rust_backend.py
@@ -106,8 +106,26 @@ def test_bootstrap_different_seeds(self):
 
     # Tests for `compute_synthetic_weights` direct Rust binding removed in
     # the silent-failures audit post-cleanup (finding #22). The helper was
-    # deleted from the Python layer; remaining Rust symbol is dead code
-    # (tracked as a low-priority Rust cleanup TODO).
+    # deleted from the Python layer and the Rust symbol was subsequently
+    # removed from `rust/src/weights.rs` + unregistered in `rust/src/lib.rs`.
+
+    def test_compute_synthetic_weights_is_removed(self):
+        """Regression guard against accidental re-export of the deleted
+        `compute_synthetic_weights` PyO3 binding (silent-failures finding
+        #22). If this test fails, someone reintroduced the binding — audit
+        the reason before adding it back."""
+        import diff_diff._rust_backend as rb
+
+        with pytest.raises(ImportError):
+            from diff_diff._rust_backend import (  # noqa: F401
+                compute_synthetic_weights,
+            )
+
+        assert not hasattr(rb, "compute_synthetic_weights"), (
+            "compute_synthetic_weights was removed from the Rust backend "
+            "in the post-audit cleanup for finding #22; its presence here "
+            "indicates accidental re-export."
+        )
 
     # =========================================================================
     # Simplex Projection Tests

Original file line number	Diff line number	Diff line change
`@@ -28,8 +28,7 @@ fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> {`
`28`	`28`	`m`
`29`	`29`	`)?)?;`
`30`	`30`
`31`		`- // Synthetic control weights (legacy projected gradient descent)`
`32`		`- m.add_function(wrap_pyfunction!(weights::compute_synthetic_weights, m)?)?;`
	`31`	`+ // Simplex projection`
`33`	`32`	`m.add_function(wrap_pyfunction!(weights::project_simplex, m)?)?;`
`34`	`33`
`35`	`34`	`// SDID weights (Frank-Wolfe matching R's synthdid)`