Improve crossover dual simplex#948

Open
rg20 wants to merge 7 commits intoNVIDIA:mainfrom
rg20:improve_crossover_dual_simplex

Conversation

@rg20
Contributor

@rg20 rg20 commented Mar 10, 2026

Description

This PR includes two performance improvements:

  1. Take advantage of hyper-sparsity when computing reduced costs during dual pushes in the crossover code.
  2. Optimize the right-looking LU factorization by replacing the linear degree-bucket search with an O(1) removal.

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

rg20 added 2 commits March 10, 2026 11:42
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
@rg20 rg20 requested a review from a team as a code owner March 10, 2026 18:47
@rg20 rg20 requested review from aliceb-nv and chris-maes March 10, 2026 18:47
@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 10, 2026
@rg20 rg20 added this to the 26.04 milestone Mar 10, 2026
@rg20
Contributor Author

rg20 commented Mar 10, 2026

/ok to test 5f377a2

@coderabbitai

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Introduced a sparse row representation (Arow) and sparse delta_y handling in the dual-simplex crossover; added per-bucket position-tracking state (col_pos, row_pos, last_in_row) and propagated signature changes through right-looking LU factorization routines.

Changes

Sparse Dual Computation
cpp/src/dual_simplex/crossover.cpp
Added Arow (CSR) parameter to dual_push and call sites; constructed Arow via lp.A.to_compressed_row(Arow) in crossover. Implemented sparse vs dense delta_y strategy, introduced delta_zN, delta_expanded, and delta_y_dense work buffers, sparse expansion using Arow, and updated y/delta_zN updates to apply only nonzero sparse contributions. Minor control-flow/formatting adjustments around the new paths.

LU Factorization Position Tracking
cpp/src/dual_simplex/right_looking_lu.cpp
Added initialize_bucket_positions and per-bucket position arrays col_pos, row_pos; extended load_elements to accept last_in_row. Threaded col_pos, row_pos, and last_in_row through update_Cdegree_and_col_count, update_Rdegree_and_row_count, schur_complement, remove_pivot_col, and top-level LU routines (right_looking_lu, right_looking_lu_row_permutation_only) with signature updates and state maintenance for O(1) bucket removals.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check — ✅ Passed: The title "Improve crossover dual simplex" directly relates to the main changes: performance improvements in crossover/dual-simplex code including sparsity optimization and LU factorization enhancement.
Description check — ✅ Passed: The description clearly outlines the two performance improvements implemented (hyper-sparsity optimization in dual pushes and O(1) degree-bucket search in right-looking LU), matching the file changes provided.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)

418-436: Consider hoisting delta_expanded allocation outside the while loop.

Currently delta_expanded is allocated as a new std::vector<f_t>(n, 0.) on every iteration of the while (superbasic_list.size() > 0) loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations.

Moving the allocation before the loop and using std::fill to reset would avoid this overhead.

♻️ Suggested refactor
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...
     
     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     
     // Iterate directly over sparse delta_y instead of checking zeros
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 418 - 436, Hoist the
allocation of delta_expanded (and optionally delta_zN) out of the while
(superbasic_list.size() > 0) loop: create std::vector<f_t> delta_expanded(n)
once before the loop and inside each iteration reset its contents with
std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly
reset delta_zN or ensure it is resized once and overwritten). Update the loop
body that accumulates into delta_expanded and the subsequent assignment to
delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers;
if n can change, ensure you call delta_expanded.resize(n) before the fill.
cpp/src/dual_simplex/right_looking_lu.cpp (1)

649-661: Extract the bucket-position bootstrap into one helper.

The col_pos / row_pos initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to initialize_degree_data() would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 2b21118 and 5f377a2.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

Comment on lines 405 to 409
// Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
element_t<i_t, f_t>* e = &elements[p1];
const i_t i = e->i;
i_t row_last = kNone;
for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
row_last = p3;
}
work_estimate += 2 * Rdegree[i];
row_last_workspace[i] = row_last;
row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
}
work_estimate += 4 * Cdegree[pivot_j];

⚠️ Potential issue | 🟡 Minor

Fix work_estimate accounting after pivot elimination.

Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.

Also applies to: 499-509

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 405 - 409,
work_estimate is using Cdegree[pivot_j] and Rdegree[pivot_i] after those entries
have been set to -1 by update_Cdegree_and_col_count() /
update_Rdegree_and_row_count(), causing work_estimate to decrease incorrectly;
fix by using the actual degree before it is clobbered — either capture a local
variable like int deg_j = Cdegree[pivot_j] and/or deg_i = Rdegree[pivot_i]
before calling the update functions, or compute the degree from the
corresponding count arrays (e.g. col_count[pivot_j] / row_count[pivot_i]) or by
traversing first_in_col/first_in_row if counts are not available — then use
deg_j/deg_i when updating work_estimate instead of reading Cdegree/Rdegree after
they were set to -1.

elements[nz].x = A.x[p];
elements[nz].next_in_column = kNone;
if (p > col_start) { elements[nz - 1].next_in_column = nz; }
elements[nz].next_in_row = kNone; // set the current next in row to None (since we don't know
Contributor

Would you mind restoring the comments here? They are helpful.

}
work_estimate += 2 * Rdegree[i];
row_last_workspace[i] = row_last;
row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
Contributor

Nit: I think this would read more clearly if we did

const i_t i = elements[p1].i;
row_last_workspace[i] = last_in_row[i]; 

initialize_degree_data(A, column_list, Cdegree, Rdegree, col_count, row_count, work_estimate);

// Position arrays for O(1) degree-bucket removal
std::vector<i_t> col_pos(n);
Contributor

We should document what col_pos stores, as we do with other variables. I think it is:

std::vector<i_t> col_pos(n);  // if Cdegree[j] = nz, then j is in col_count[nz][col_pos[j]]

col_pos[col_count[d][pos]] = pos;
}
}
std::vector<i_t> row_pos(n);
Contributor

Could we add the comment:

std::vector<i_t> row_pos(n); // If Rdegree[i] = nz, then i is in row_count[nz][row_pos[i]]

std::vector<i_t>& Cdegree,
std::vector<std::vector<i_t>>& row_count,
std::vector<std::vector<i_t>>& col_count,
std::vector<i_t>& last_in_row,
Contributor

What's the difference between row_last_workspace and last_in_row?


// Position arrays for O(1) degree-bucket removal
// col_count has m+1 buckets, row_count has n+1 buckets
std::vector<i_t> col_pos(n);
Contributor

Same here: can we add a comment explaining col_pos?

col_pos[col_count[d][pos]] = pos;
}
}
std::vector<i_t> row_pos(m);
Contributor

Same here: can we add a comment explaining row_pos?

element_t<i_t, f_t>* e = &elements[p1];
const i_t i = e->i;
i_t last = kNone;
i_t last_surviving = kNone;
Contributor

What's the difference between last and last_surviving? I'm not sure last is used. Maybe we can use last for last_surviving?

f_t dot = 0.0;
for (i_t p = col_start; p < col_end; ++p) {
dot += lp.A.x[p] * delta_y[lp.A.i[p]];
std::vector<f_t> delta_expanded(n, 0.);
Contributor

This always computes delta_z using delta_z = -sum_{i : delta_y_i != 0} A(i, non_basic).

However, when delta_y is dense it can be better to compute it using the previous method.

I dynamically decide which method to use in dual simplex by looking at the sparsity of delta_y: if delta_y has fewer than 30% nonzeros I use this approach, otherwise I use the previous method.

Also, we should probably pull the code for computing this out into a function that can be used in both dual simplex and crossover.

But if you want you can merge this as is, and I can make those changes.

rg20 added 4 commits March 16, 2026 07:40
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test cc54886

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
cpp/src/dual_simplex/right_looking_lu.cpp (1)

371-372: ⚠️ Potential issue | 🟡 Minor

Preserve pivot degrees until Schur work accounting is finished.

Line 371 and Line 408 set pivot degrees to -1, but schur_complement() still uses those pivot degrees for work_estimate terms, which can push the estimate downward incorrectly. Please cache pivot degrees (or defer sentinel assignment) before those bookkeeping terms are computed.

Also applies to: 408-409

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 371 - 372, The code
currently sets Cdegree[pivot_j] = -1 (in the block around pivot handling) before
schur_complement() computes work_estimate terms, which causes those terms to use
the sentinel -1 and undercount work; preserve the original pivot degree value
until after all Schur work accounting is finished by caching the degree (e.g.,
save old_degree = Cdegree[pivot_j]) or by deferring the sentinel assignment
until after schur_complement() and any work_estimate calculations complete;
update both spots where Cdegree[pivot_j] is set to -1 (the one near pivot_j and
the one around the other referenced location) to use the cached value approach
and only assign -1 after work_estimate is computed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb075e7a-fa49-46ae-b96a-caf985a7b449

📥 Commits

Reviewing files that changed from the base of the PR and between 5f377a2 and cc54886.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test 3e2352b

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/dual_simplex/crossover.cpp (1)

421-453: Consider extracting sparse/dense delta_z computation into a shared utility.

As noted in prior review discussion, the sparse vs. dense computation of delta_z = -A^T * delta_y could be factored into a shared function for use in both dual simplex and crossover. This would reduce code duplication and ensure consistent threshold tuning.

This can be deferred to a follow-up PR as suggested by the reviewer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 421 - 453, Extract the
sparse vs dense computation of delta_z (currently computing delta_zN using
delta_y_sparse / delta_y_dense, delta_expanded, nonbasic_list, lp.A and Arow)
into a shared utility function (e.g., compute_delta_z_from_delta_y) that accepts
delta_y in sparse or dense form, A (or Arow), nonbasic_list and outputs
delta_zN; move the sparse path (loop over delta_y_sparse.i/x and Arow to build
delta_expanded) and the dense path (delta_y_sparse.to_dense → dot product over
lp.A columns) into the single function, preserve the existing 30% sparsity
threshold outside the helper, and update callers in dual simplex and crossover
to call the new utility to avoid duplication and centralize threshold/tuning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cecd1046-6c96-4a21-a4a0-df1a91062ace

📥 Commits

Reviewing files that changed from the base of the PR and between cc54886 and 3e2352b.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/crossover.cpp

Comment on lines +462 to 466
// Optimized: Only update non-zero elements from sparse representation
for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
const i_t i = delta_y_sparse.i[nnz_idx];
y[i] += step_length * delta_y_sparse.x[nnz_idx];
}

⚠️ Potential issue | 🟡 Minor

Add explicit cast for consistency and to avoid signed/unsigned comparison warning.

Line 428 uses static_cast<i_t>(delta_y_sparse.i.size()) for the same pattern, but line 463 omits it. This inconsistency may trigger compiler warnings under strict settings.

Proposed fix
-    for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
+    for (i_t nnz_idx = 0; nnz_idx < static_cast<i_t>(delta_y_sparse.i.size()); ++nnz_idx) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Optimized: Only update non-zero elements from sparse representation
for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
const i_t i = delta_y_sparse.i[nnz_idx];
y[i] += step_length * delta_y_sparse.x[nnz_idx];
}
// Optimized: Only update non-zero elements from sparse representation
for (i_t nnz_idx = 0; nnz_idx < static_cast<i_t>(delta_y_sparse.i.size()); ++nnz_idx) {
const i_t i = delta_y_sparse.i[nnz_idx];
y[i] += step_length * delta_y_sparse.x[nnz_idx];
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 462 - 466, The loop over
sparse entries in crossover.cpp uses delta_y_sparse.i.size() as the upper bound
without casting, causing a signed/unsigned comparison warning and inconsistency
with the earlier loop; change the loop condition to use
static_cast<i_t>(delta_y_sparse.i.size()) so the type of nnz_idx (i_t) matches
the cast size, keeping the body unchanged (variables: delta_y_sparse, nnz_idx,
i_t, y, step_length).
