Conversation
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
/ok to test 5f377a2
📝 Walkthrough

Introduced a sparse row representation (Arow) and sparse delta_y handling in the dual-simplex crossover; added per-bucket position-tracking state (col_pos, row_pos, last_in_row) and propagated signature changes through right-looking LU factorization routines.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)
418-436: Consider hoisting the `delta_expanded` allocation outside the while loop.

Currently `delta_expanded` is allocated as a new `std::vector<f_t>(n, 0.)` on every iteration of the `while (superbasic_list.size() > 0)` loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations. Moving the allocation before the loop and using `std::fill` to reset would avoid this overhead.

♻️ Suggested refactor

```diff
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...

     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     // Iterate directly over sparse delta_y instead of checking zeros
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cpp` around lines 418 - 436: Hoist the allocation of delta_expanded (and optionally delta_zN) out of the while (superbasic_list.size() > 0) loop: create std::vector<f_t> delta_expanded(n) once before the loop and inside each iteration reset its contents with std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly reset delta_zN or ensure it is resized once and overwritten). Update the loop body that accumulates into delta_expanded and the subsequent assignment to delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers; if n can change, ensure you call delta_expanded.resize(n) before the fill.

cpp/src/dual_simplex/right_looking_lu.cpp (1)
649-661: Extract the bucket-position bootstrap into one helper.

The col_pos/row_pos initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to initialize_degree_data() would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 405-409: work_estimate is using Cdegree[pivot_j] and
Rdegree[pivot_i] after those entries have been set to -1 by
update_Cdegree_and_col_count() / update_Rdegree_and_row_count(), causing
work_estimate to decrease incorrectly; fix by using the actual degree before it
is clobbered — either capture a local variable like int deg_j = Cdegree[pivot_j]
and/or deg_i = Rdegree[pivot_i] before calling the update functions, or compute
the degree from the corresponding count arrays (e.g. col_count[pivot_j] /
row_count[pivot_i]) or by traversing first_in_col/first_in_row if counts are not
available — then use deg_j/deg_i when updating work_estimate instead of reading
Cdegree/Rdegree after they were set to -1.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e
📒 Files selected for processing (2)
cpp/src/dual_simplex/crossover.cpp
cpp/src/dual_simplex/right_looking_lu.cpp
```cpp
  // Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
  for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
    element_t<i_t, f_t>* e = &elements[p1];
    const i_t i            = e->i;
    i_t row_last           = kNone;
    for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
      row_last = p3;
    }
    work_estimate += 2 * Rdegree[i];
    row_last_workspace[i] = row_last;
    row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
  }
  work_estimate += 4 * Cdegree[pivot_j];
```
Fix work_estimate accounting after pivot elimination.
Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.
Also applies to: 499-509
```cpp
    elements[nz].x              = A.x[p];
    elements[nz].next_in_column = kNone;
    if (p > col_start) { elements[nz - 1].next_in_column = nz; }
    elements[nz].next_in_row = kNone;  // set the current next in row to None (since we don't know
```
Would you mind restoring the comments here? Those are helpful.
```cpp
    }
    work_estimate += 2 * Rdegree[i];
    row_last_workspace[i] = row_last;
    row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
```
Nit: I think this would read more clearly if we did

```cpp
const i_t i           = elements[p1].i;
row_last_workspace[i] = last_in_row[i];
```
```cpp
  initialize_degree_data(A, column_list, Cdegree, Rdegree, col_count, row_count, work_estimate);

  // Position arrays for O(1) degree-bucket removal
  std::vector<i_t> col_pos(n);
```
We should document what col_pos stores, as we do with other variables. I think it is

```cpp
std::vector<i_t> col_pos(n);  // if Cdegree[j] = nz, then j is in col_count[nz][col_pos[j]]
```
```cpp
      col_pos[col_count[d][pos]] = pos;
    }
  }
  std::vector<i_t> row_pos(n);
```
Could we add the comment

```cpp
std::vector<i_t> row_pos(n);  // if Rdegree[i] = nz, then i is in row_count[nz][row_pos[i]]
```
```cpp
                      std::vector<i_t>& Cdegree,
                      std::vector<std::vector<i_t>>& row_count,
                      std::vector<std::vector<i_t>>& col_count,
                      std::vector<i_t>& last_in_row,
```
What's the difference between row_last_workspace and last_in_row?
```cpp
  // Position arrays for O(1) degree-bucket removal
  // col_count has m+1 buckets, row_count has n+1 buckets
  std::vector<i_t> col_pos(n);
```
Same here, can we add a comment explaining col_pos?
```cpp
      col_pos[col_count[d][pos]] = pos;
    }
  }
  std::vector<i_t> row_pos(m);
```
Same here, can we add a comment explaining row_pos?
```cpp
    element_t<i_t, f_t>* e = &elements[p1];
    const i_t i            = e->i;
    i_t last               = kNone;
    i_t last_surviving     = kNone;
```
What's the difference between last and last_surviving? I'm not sure last is used. Maybe we can use last for last_surviving?
cpp/src/dual_simplex/crossover.cpp
Outdated
```cpp
      f_t dot = 0.0;
      for (i_t p = col_start; p < col_end; ++p) {
        dot += lp.A.x[p] * delta_y[lp.A.i[p]];
      std::vector<f_t> delta_expanded(n, 0.);
```
This always computes delta_z using delta_z = -sum_{delta_y_i != 0} A(i, non_basic).
However, when delta_y is dense it can be better to compute it using the previous method.
In dual simplex I dynamically decide which method to use by looking at the sparsity of delta_y: if fewer than 30% of its entries are nonzero I use this way, otherwise I use the previous method.
Also, we should probably pull the code for computing this out into a function that can be used in both dual simplex and crossover.
But if you want you can merge this as is, and I can make those changes.
Allocate buffers once before the superbasic loop and reset with std::fill each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review). Made-with: Cursor
/ok to test cc54886
♻️ Duplicate comments (1)
cpp/src/dual_simplex/right_looking_lu.cpp (1)
371-372: ⚠️ Potential issue | 🟡 Minor

Preserve pivot degrees until Schur work accounting is finished.

Line 371 and Line 408 set pivot degrees to -1, but schur_complement() still uses those pivot degrees for work_estimate terms, which can push the estimate downward incorrectly. Please cache pivot degrees (or defer the sentinel assignment) before those bookkeeping terms are computed.

Also applies to: 408-409
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 371 - 372, The code currently sets Cdegree[pivot_j] = -1 (in the block around pivot handling) before schur_complement() computes work_estimate terms, which causes those terms to use the sentinel -1 and undercount work; preserve the original pivot degree value until after all Schur work accounting is finished by caching the degree (e.g., save old_degree = Cdegree[pivot_j]) or by deferring the sentinel assignment until after schur_complement() and any work_estimate calculations complete; update both spots where Cdegree[pivot_j] is set to -1 (the one near pivot_j and the one around the other referenced location) to use the cached value approach and only assign -1 after work_estimate is computed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eb075e7a-fa49-46ae-b96a-caf985a7b449
📒 Files selected for processing (2)
cpp/src/dual_simplex/crossover.cpp
cpp/src/dual_simplex/right_looking_lu.cpp
/ok to test 3e2352b
Actionable comments posted: 1
🧹 Nitpick comments (1)
cpp/src/dual_simplex/crossover.cpp (1)
421-453: Consider extracting the sparse/dense delta_z computation into a shared utility.

As noted in the prior review discussion, the sparse vs. dense computation of delta_z = -A^T * delta_y could be factored into a shared function for use in both dual simplex and crossover. This would reduce code duplication and ensure consistent threshold tuning. This can be deferred to a follow-up PR as suggested by the reviewer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cpp` around lines 421 - 453, Extract the sparse vs dense computation of delta_z (currently computing delta_zN using delta_y_sparse / delta_y_dense, delta_expanded, nonbasic_list, lp.A and Arow) into a shared utility function (e.g., compute_delta_z_from_delta_y) that accepts delta_y in sparse or dense form, A (or Arow), nonbasic_list and outputs delta_zN; move the sparse path (loop over delta_y_sparse.i/x and Arow to build delta_expanded) and the dense path (delta_y_sparse.to_dense → dot product over lp.A columns) into the single function, preserve the existing 30% sparsity threshold outside the helper, and update callers in dual simplex and crossover to call the new utility to avoid duplication and centralize threshold/tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 462-466: The loop over sparse entries in crossover.cpp uses
delta_y_sparse.i.size() as the upper bound without casting, causing a
signed/unsigned comparison warning and inconsistency with the earlier loop;
change the loop condition to use static_cast<i_t>(delta_y_sparse.i.size()) so
the type of nnz_idx (i_t) matches the cast size, keeping the body unchanged
(variables: delta_y_sparse, nnz_idx, i_t, y, step_length).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cecd1046-6c96-4a21-a4a0-df1a91062ace
📒 Files selected for processing (1)
cpp/src/dual_simplex/crossover.cpp
```cpp
    // Optimized: Only update non-zero elements from sparse representation
    for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
      const i_t i = delta_y_sparse.i[nnz_idx];
      y[i] += step_length * delta_y_sparse.x[nnz_idx];
    }
```
Add explicit cast for consistency and to avoid signed/unsigned comparison warning.
Line 428 uses static_cast<i_t>(delta_y_sparse.i.size()) for the same pattern, but line 463 omits it. This inconsistency may trigger compiler warnings under strict settings.
Proposed fix

```diff
-    for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
+    for (i_t nnz_idx = 0; nnz_idx < static_cast<i_t>(delta_y_sparse.i.size()); ++nnz_idx) {
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
    // Optimized: Only update non-zero elements from sparse representation
    for (i_t nnz_idx = 0; nnz_idx < static_cast<i_t>(delta_y_sparse.i.size()); ++nnz_idx) {
      const i_t i = delta_y_sparse.i[nnz_idx];
      y[i] += step_length * delta_y_sparse.x[nnz_idx];
    }
```
Description
This PR includes two performance improvements.
Issue
Checklist