Race batch PDLP and dual simplex in strong branching / reliability branching #994

Open
Kh4ster wants to merge 74 commits into release/26.04 from race_strong_branching_realibility_branching

Conversation

Contributor

@Kh4ster Kh4ster commented Mar 25, 2026

This PR makes the following changes to batch PDLP:

  • Enable batch PDLP in reliability branching
  • Add work stealing so batch PDLP and Dual Simplex can run concurrently, each stealing LPs from the other when one solves an LP first
  • Use the correct problem representation, including cuts, for batch PDLP
  • Use a PDLP warm-start cache across strong branching at the root and in reliability branching
  • Increase the batch PDLP tolerance to obtain higher-quality solutions
  • Increase the iteration limit so instances that need a high iteration count (with low cost per iteration) can still come through (only while solving the original LP to obtain warm-start data)
  • Add heuristics that skip batch PDLP, avoiding overhead when Dual Simplex is clearly superior
  • Don't store and copy the primal/dual solution unless it is needed, to save memory
  • Handle batch PDLP errors better, so Dual Simplex can continue strong branching even if BPDLP fails
  • No early exit in BPDLP when the initial PDLP warm-start solution is already feasible
  • Correct the BPDLP objective when there is an offset

Currently, BPDLP remains off by default, both at the root and in reliability branching.
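The work-stealing item above can be sketched with a shared pool of atomic flags: each solver claims an LP via compare-and-swap, so whichever one reaches it first wins and the other skips it. This is a hedged standalone illustration, not the cuOpt implementation; `shared_sb_context` and `run_race` are names invented for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Each flag starts at 0; a solver "claims" LP i by atomically flipping it to
// 1, so an LP is solved by exactly one of the two racing solvers.
struct shared_sb_context {
  std::vector<std::atomic<int>> solved;
  explicit shared_sb_context(std::size_t n) : solved(n) {}
  bool try_claim(std::size_t i)
  {
    int expected = 0;
    return solved[i].compare_exchange_strong(expected, 1);
  }
};

// Run two workers (stand-ins for batch PDLP and Dual Simplex) over the same
// pool; owner[i] records which worker claimed LP i. Returns how many LPs
// were claimed in total.
int run_race(std::size_t n_lps, std::vector<int>& owner)
{
  shared_sb_context ctx(n_lps);
  auto worker = [&](int id) {
    for (std::size_t i = 0; i < n_lps; ++i) {
      if (ctx.try_claim(i)) owner[i] = id;  // "solve" LP i here
    }
  };
  std::thread pdlp(worker, 1);
  std::thread simplex(worker, 2);
  pdlp.join();
  simplex.join();
  int claimed = 0;
  for (int o : owner) claimed += (o != 0);
  return claimed;
}
```

Because the claim is a single compare-exchange, no LP is ever solved twice even when both workers scan the pool concurrently.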

Kh4ster and others added 30 commits February 9, 2026 17:47
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
Contributor Author

Kh4ster commented Apr 1, 2026

/ok to test de8fac6

Contributor

@chris-maes chris-maes left a comment


LGTM. I mostly checked the interface code; I did not review the deep changes made to support reliability branching in batch PDLP.

Contributor Author

Kh4ster commented Apr 2, 2026

/ok to test 8f0e2f6

Contributor Author

Kh4ster commented Apr 2, 2026

Still waiting for one last thing before merging

Contributor Author

Kh4ster commented Apr 2, 2026

/ok to test 329c255

Contributor Author

Kh4ster commented Apr 2, 2026

/ok to test aecae93

Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test 0d5ae1c

Contributor Author

Kh4ster commented Apr 3, 2026

@coderabbitai review


coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test 7504340

Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test a7f79fa

Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test dd07576

Contributor Author

Kh4ster commented Apr 3, 2026

@coderabbitai full review


coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)

547-584: ⚠️ Potential issue | 🟠 Major

Assert or remap the slack-column suffix invariant.

remove_columns(cols_to_remove) treats new_slacks as arbitrary column indices, but Lines 560-584 still feed the first n entries of root_soln, objective, lower, and upper into PDLP. That only stays aligned if every slack index already lives in [n, lp.num_cols). Otherwise the compacted matrix, the vectors, and the branching indices no longer describe the same columns.

Minimal guard if appended slacks are the intended invariant
   cuopt::mps_parser::mps_data_model_t<i_t, f_t> mps_model;
   int m = lp.num_rows;
   int n = lp.num_cols - new_slacks.size();
   original_root_soln_x.resize(n);
+  for (i_t slack : new_slacks) {
+    assert(slack >= n &&
+           "simplex_problem_to_mps_data_model assumes slack columns are appended");
+  }

If that invariant is not guaranteed, this helper needs a real old→new column map and the callers need to remap the candidate indices as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 547 - 584, The code
removes columns via A_no_slacks.remove_columns(cols_to_remove) but still blindly
slices the first n entries of root_soln, lp.objective, lp.lower, lp.upper into
PDLP (original_root_soln_x and mps_model.set_* calls), which is only correct if
all slack indices in new_slacks are appended (>= n); add an explicit invariant
check or build a proper old→new column map: either assert all new_slacks[j] >= n
before proceeding (and document the invariant), or construct a mapping from
original column index → compacted index after remove_columns and use it to remap
original_root_soln_x, the objective vector, lower/upper bound arrays, and any
branching candidate indices so the CSR matrix and these vectors stay aligned
(refer to cols_to_remove, A_no_slacks, original_root_soln_x,
mps_model.set_objective_coefficients, set_variable_lower_bounds,
set_variable_upper_bounds).
cpp/src/pdlp/pdlp.cu (1)

801-864: ⚠️ Potential issue | 🟠 Major

Mark the last resized-batch climbers as solved before this return.

Unlike Line 865, this early-return branch never calls sb_view_.mark_solved(...) for the climbers still left in the batch. Any Dual Simplex task that has not started those branches yet will keep solving them even though PDLP already finished them, so the final chunk cannot be stolen/skipped.

Suggested fix
     if (original_batch_size != climber_strategies_.size()) {
  #ifdef BATCH_VERBOSE_MODE
       std::cout << "Original batch size was " << original_batch_size << " but is now "
                 << climber_strategies_.size() << std::endl;
  #endif
@@
         if (current_termination_strategy_.get_termination_status(i) !=
             pdlp_termination_status_t::ConcurrentLimit) {
           batch_solution_to_return_
             .get_additional_termination_informations()[climber_strategies_[i].original_index]
             .solved_by = method_t::PDLP;
         }
       }
+      if (sb_view_.is_valid()) {
+        for (size_t i = 0; i < climber_strategies_.size(); ++i) {
+          sb_view_.mark_solved(climber_strategies_[i].original_index);
+        }
+      }
       current_termination_strategy_.fill_gpu_terms_stats(total_pdlp_iterations_);
       RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_));
       current_termination_strategy_.convert_gpu_terms_stats_to_host(
         batch_solution_to_return_.get_additional_termination_informations());
       return optimization_problem_solution_t<i_t, f_t>{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 801 - 864, This early-return path fails to
notify the solver-bookkeeping that the remaining resized-batch climbers are
finished; before returning, iterate over
current_termination_strategy_.get_terminations_status().size() and call
sb_view_.mark_solved(...) for each climber using
climber_strategies_[i].original_index (and the appropriate termination/solution
metadata you have just copied into batch_solution_to_return_), mirroring the
behavior used at the later return (line 865) so in-progress Dual Simplex tasks
can stop/skip those chunks.
♻️ Duplicate comments (6)
cpp/src/branch_and_bound/pseudo_costs.hpp (1)

416-425: ⚠️ Potential issue | 🟠 Major

Invalidate the PDLP warm-start cache when the LP context changes.

Once populated flips true, this struct has no reset path, so reusing the same pseudo_costs_t after cuts/slacks/original-LP changes can replay stale warm-start vectors and scalars into a different batch LP. Add a reset() helper here and call it from the context-reset path.

Suggested fix
 struct batch_pdlp_warm_cache_t {
   const raft::handle_t batch_pdlp_handle{};
   rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
   rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
   f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
   f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
   i_t pdlp_iteration{-1};
   f_t percent_solved_by_batch_pdlp_at_root{f_t(0.0)};
   bool populated{false};
+
+  void reset()
+  {
+    initial_primal.resize(0, batch_pdlp_handle.get_stream());
+    initial_dual.resize(0, batch_pdlp_handle.get_stream());
+    step_size                           = std::numeric_limits<f_t>::signaling_NaN();
+    primal_weight                       = std::numeric_limits<f_t>::signaling_NaN();
+    pdlp_iteration                      = -1;
+    percent_solved_by_batch_pdlp_at_root = f_t(0.0);
+    populated                           = false;
+  }
 };

As per coding guidelines "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 416 - 425, The
batch_pdlp_warm_cache_t struct can retain stale warm-start data after the LP
context changes because there is no reset path; add a member method reset() on
batch_pdlp_warm_cache_t that clears initial_primal and initial_dual
(release/resize to 0 on batch_pdlp_handle.get_stream()), sets step_size and
primal_weight back to NaN, pdlp_iteration to -1,
percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and
then call this reset() from the pseudo_costs_t code path that
reinitializes/changes the LP context (where the context-reset currently occurs)
so any time cuts/slacks/original-LP are changed the warm cache is invalidated.
cpp/src/pdlp/solve.cu (2)

1049-1057: ⚠️ Potential issue | 🟠 Major

Validate shared_sb_solved before taking subspans.

The only guard here is !empty(). If the caller passes fewer solved flags than new_bounds.size(), subspan(i, current_batch_size) walks past the end of the coordination array and corrupts cooperative termination state. Check the span length once before the loop.

Suggested fix
+  cuopt_assert(settings.shared_sb_solved.empty() ||
+                 settings.shared_sb_solved.size() >= max_batch_size,
+               "shared_sb_solved must cover every batch entry");
   for (size_t i = 0; i < max_batch_size; i += optimal_batch_size) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 1049 - 1057, The code only checks
settings.shared_sb_solved.empty() but not its length relative to new_bounds, so
subspan(i, current_batch_size) can out-of-bounds; before the batching loop
validate that when settings.shared_sb_solved is non-empty its size() (or
extent()) is >= max_batch_size (or original_new_bounds.size()), and if not,
either trim max_batch_size/current_batch_size accordingly or return/error; then
inside the loop you can safely call settings.shared_sb_solved.subspan(i,
current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you
reference settings.shared_sb_solved, original_new_bounds/new_bounds,
max_batch_size and subspan in the check and handle the shorter span case
consistently.

968-972: ⚠️ Potential issue | 🟠 Major

Don't let sub_batch_size bypass the memory ceiling.

memory_max_batch_size is computed specifically to avoid OOM, but a positive settings.sub_batch_size overwrites it unchecked. A caller can request a sub-batch larger than the memory-checked limit and reintroduce the allocation failure this code just avoided.

Suggested fix
   size_t optimal_batch_size = use_optimal_batch_size
                                 ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                                 : max_batch_size;
-  if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+  if (settings.sub_batch_size > 0) {
+    const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+    cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+                 "sub_batch_size exceeds the memory-checked batch size");
+    optimal_batch_size = requested_sub_batch_size;
+  }

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 968 - 972, The current logic lets
settings.sub_batch_size override the memory-checked limit, which can reintroduce
OOMs; instead, when computing optimal_batch_size in solve.cu (symbols:
optimal_batch_size, settings.sub_batch_size, memory_max_batch_size,
detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp
settings.sub_batch_size to the memory ceiling (e.g., use
min(settings.sub_batch_size, memory_max_batch_size) or validate and reject
values > memory_max_batch_size) so the final optimal_batch_size never exceeds
memory_max_batch_size (and still respects max_batch_size), then keep the
cuopt_assert as a final sanity check.
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)

303-305: ⚠️ Potential issue | 🔴 Critical

Keep batch solution collection opt-out, not opt-in.

With this default at false, generic batch LP solves now return empty primal/dual/reduced-cost buffers unless every caller remembers to opt in. Existing batch paths in cpp/tests/linear_programming/pdlp_test.cu still slice those buffers unconditionally, so this regresses the non-MIP batch API.

Suggested fix
-  // By default to save memory and speed we don't store and copy each climber's primal and dual
-  // solutions We only retrieve termination statistics and the objective values
-  bool generate_batch_primal_dual_solution{false};
+  // Preserve the historical batch behavior and collect per-climber solutions unless callers opt out.
+  bool generate_batch_primal_dual_solution{true};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp` around lines
303 - 305, Defaulting generate_batch_primal_dual_solution to false makes batch
solves opt‑in and breaks callers that expect populated primal/dual/reduced-cost
buffers; change the bool generate_batch_primal_dual_solution default to true (so
batch solution collection is enabled by default/opt‑out) and update the adjacent
comment to reflect "opt-out" semantics so existing tests and code that slice
these buffers continue to work without modification.
cpp/src/branch_and_bound/shared_strong_branching_context.hpp (1)

38-56: ⚠️ Potential issue | 🟠 Major

Make the view helpers release-safe and overflow-safe.

These accessors still rely on assert, so a bad index becomes unchecked UB in release builds. subview() also computes offset + count in i_t before the cast, which can overflow; validate the casted sizes separately and use those converted indices for operator[]/subspan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp` around lines 38
- 56, The accessors (is_solved, mark_solved, subview) use asserts and can
overflow in release builds; replace the assert-based checks with runtime bounds
validation and throw std::out_of_range on failure, and avoid signed arithmetic
overflow in subview by converting indices to size_t first. Concretely: in
is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast local_idx to size_t
(e.g. size_t idx = static_cast<size_t>(local_idx)), verify local_idx >= 0 and
idx < solved.size(), and throw std::out_of_range with a clear message instead of
asserting; in subview(i_t offset, i_t count) cast both to size_t (size_t off =
static_cast<size_t>(offset), cnt = static_cast<size_t>(count)), ensure offset
and count are non-negative, check off <= solved.size() and cnt <= solved.size()
- off, then return {solved.subspan(off, cnt)} to avoid offset+count overflow.
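The hardened accessors described above can be sketched as follows. This is an illustrative standalone version (a raw pointer plus length standing in for the span member; `sb_view_t` is an invented name, not the cuOpt type): bounds checks throw `std::out_of_range` so they survive release builds, and all index arithmetic happens in `size_t` so `offset + count` cannot overflow a signed index type.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>

template <typename i_t>
struct sb_view_t {
  std::atomic<int>* solved{};
  std::size_t count{};

  // Runtime-checked read: rejects negative and past-the-end indices.
  bool is_solved(i_t local_idx) const
  {
    if (local_idx < 0 || static_cast<std::size_t>(local_idx) >= count)
      throw std::out_of_range("sb_view_t::is_solved: index out of range");
    return solved[static_cast<std::size_t>(local_idx)].load() != 0;
  }

  void mark_solved(i_t local_idx)
  {
    if (local_idx < 0 || static_cast<std::size_t>(local_idx) >= count)
      throw std::out_of_range("sb_view_t::mark_solved: index out of range");
    solved[static_cast<std::size_t>(local_idx)].store(1);
  }

  // Convert to size_t first, then check cnt <= count - off, so the sum of
  // two signed values is never formed and cannot overflow.
  sb_view_t subview(i_t offset, i_t n) const
  {
    if (offset < 0 || n < 0)
      throw std::out_of_range("sb_view_t::subview: negative offset/count");
    const auto off = static_cast<std::size_t>(offset);
    const auto cnt = static_cast<std::size_t>(n);
    if (off > count || cnt > count - off)
      throw std::out_of_range("sb_view_t::subview: range out of bounds");
    return {solved + off, cnt};
  }
};
```

The key detail is checking `cnt <= count - off` after the unsigned conversion rather than computing `off + cnt`.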
cpp/tests/linear_programming/pdlp_test.cu (1)

2243-2254: ⚠️ Potential issue | 🟠 Major

Replace the fixed 200 ms sleep with deterministic synchronization.

On afiro-sized batches PDLP can finish before the delay. That makes these tests flaky and can skip the ConcurrentLimit path entirely. Use a latch/promise/barrier that the solver thread signals once the cooperative path is live, then apply the external marks.

Also applies to: 2309-2320

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 2243 - 2254, The
fixed sleep is flaky; replace it with a deterministic synchronization (e.g.,
std::promise/std::future or std::latch) so the main thread waits until the
solver thread actually enters the cooperative DS path. Create a promise/future
(or latch), pass the signal into the pdlp_thread lambda (or have solve_lp expose
a hook) and have the solver thread set the promise when the cooperative path is
live; then have the main thread wait on the future before calling
sb_view.mark_solved(i). Apply the same change to the other occurrence around the
second pdlp_thread usage.
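The handshake suggested above can be sketched with `std::promise`/`std::future`: the solver thread fulfills the promise once its cooperative path is live, and the test thread waits on the future before applying the external marks instead of sleeping for a fixed 200 ms. This is a hedged illustration; the comments mark where the real solver calls would go, and `race_with_handshake` is an invented name.

```cpp
#include <future>
#include <thread>
#include <vector>

int race_with_handshake()
{
  std::promise<void> cooperative_ready;
  std::future<void> ready = cooperative_ready.get_future();
  std::vector<int> external_marks;

  std::thread solver([&] {
    // ... set up the batch and enter the cooperative DS/PDLP path ...
    cooperative_ready.set_value();  // signal: external marks are now safe
    // ... continue solving; externally marked LPs get skipped ...
  });

  ready.wait();                 // deterministic, unlike a fixed sleep
  external_marks.push_back(0);  // stand-in for sb_view.mark_solved(0)
  solver.join();
  return static_cast<int>(external_marks.size());
}
```

A `std::latch` (C++20) would work equally well; the point is that the signal comes from the solver thread itself, so the test cannot race past the cooperative setup.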
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the code calls
initialize_pseudo_costs_with_estimate(...) which populates pc.strong_branch_down
and pc.strong_branch_up, but the function continues and later merges/overwrites
those estimates with UNSET/NaN values; to fix, add an early return (or
conditional skip of the merge/overwrite block) immediately after calling
initialize_pseudo_costs_with_estimate so the function exits with the
estimate-only pseudo-costs intact and does not proceed to the later
merge/overwrite logic before update_pseudo_costs_from_strong_branching().
- Around line 1587-1595: The current gate "if (rb_mode != 2)" skips the whole
taskloop when rb_mode==2 even if use_pdlp is false, so unreliable_list
candidates never get evaluated; change the condition to allow Dual Simplex
fallback when PDLP is disabled by using "if (rb_mode != 2 || !use_pdlp)" (or
logically equivalent) around the pragma/taskloop that references score_mutex,
sb_view, dual_simplex_obj_down/up, dual_simplex_status_down/up and
unreliable_list, and update any related log text so it correctly reflects that
DS will run when PDLP is unavailable.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Line 541: The pdlp_warm_cache member on pseudo_costs_t is mutable shared state
and can race when multiple workers perform reliability/strong branching; make
the cache worker-local or add synchronization: either move the
batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache out of the shared
pseudo_costs_t (create a stack/local instance inside the branching worker
routine) or add a dedicated omp_mutex_t (e.g., pdlp_warm_cache_mutex) and wrap
every access/update to pdlp_warm_cache in a lock/unlock pair (similar to
existing omp_atomic_t/omp_mutex_t usage) to prevent cross-contamination of
device buffers, scalars, and populated state. Ensure all functions that read or
write pdlp_warm_cache (reliability/strong branching paths) use the new mutex or
the local per-worker instance.

In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 194-197: The member mip_batch_pdlp_reliability_branching is not
initialized in the class constructor, leaving it with indeterminate value;
update the constructor initializer list (next to
mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.

In `@cpp/src/mip_heuristics/CMakeLists.txt`:
- Line 17: Remove clique_table.cu from the MIP_LP_NECESSARY_FILES list and add
it to the MIP_NON_LP_FILES list in the CMakeLists.txt so that clique_table.cu
(which includes mip_heuristics/utils.cuh) is built only as MIP-non-LP code;
specifically, edit the lines that construct the MIP_LP_NECESSARY_FILES and
MIP_NON_LP_FILES variables to stop listing clique_table.cu in the former and
include it in the latter, and verify any targets or install/grouping logic that
consumes those variables still references the file from the new list.

In `@cpp/src/pdlp/pdlp.cuh`:
- Around line 142-143: The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.

In `@cpp/src/pdlp/solve.cu`:
- Around line 897-903: The batch memory estimator always accounts for three full
output buffers but the actual allocations for the third (solution collection)
buffer are skipped when generate_batch_primal_dual_solution is false; update the
total_memory calculation in solve.cu so it conditionally charges the third
output buffer only when generate_batch_primal_dual_solution is true (adjust the
same logic around the corresponding block later in the function where
allocations are made; see references to total_memory, trial_batch_size,
problem.get_n_variables(), problem.get_n_constraints(), and
generate_batch_primal_dual_solution and apply the same fix for the duplicate
estimator at the other occurrence mentioned in the comment).

In `@cpp/src/pdlp/solver_settings.cu`:
- Around line 62-88: The bulk warm-start path in set_pdlp_warm_start_data is
writing initial_primal_weight and initial_step_size directly and bypasses the
numeric guards; update set_pdlp_warm_start_data to validate those values the
same way as set_initial_primal_weight and set_initial_step_size (either by
calling those two setters from set_pdlp_warm_start_data or by applying the same
cuopt_expects checks: >0, !isinf, !isnan) so the warm-start overload enforces
the same invariants before assigning initial_primal_weight_ and
initial_step_size_.

In `@cpp/src/pdlp/translate.hpp`:
- Line 12: Remove the unused CUDA device header include from translate.hpp:
delete the line including <mip_heuristics/problem/problem.cuh> because
translate.hpp only references detail::problem_t from
optimization_problem_interface.hpp and does not use any symbols from
problem.cuh; after removal, confirm translate.hpp still compiles standalone and
that no other code relies on that include being present.

---

Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 547-584: The code removes columns via
A_no_slacks.remove_columns(cols_to_remove) but still blindly slices the first n
entries of root_soln, lp.objective, lp.lower, lp.upper into PDLP
(original_root_soln_x and mps_model.set_* calls), which is only correct if all
slack indices in new_slacks are appended (>= n); add an explicit invariant check
or build a proper old→new column map: either assert all new_slacks[j] >= n
before proceeding (and document the invariant), or construct a mapping from
original column index → compacted index after remove_columns and use it to remap
original_root_soln_x, the objective vector, lower/upper bound arrays, and any
branching candidate indices so the CSR matrix and these vectors stay aligned
(refer to cols_to_remove, A_no_slacks, original_root_soln_x,
mps_model.set_objective_coefficients, set_variable_lower_bounds,
set_variable_upper_bounds).

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 801-864: This early-return path fails to notify the
solver-bookkeeping that the remaining resized-batch climbers are finished;
before returning, iterate over
current_termination_strategy_.get_terminations_status().size() and call
sb_view_.mark_solved(...) for each climber using
climber_strategies_[i].original_index (and the appropriate termination/solution
metadata you have just copied into batch_solution_to_return_), mirroring the
behavior used at the later return (line 865) so in-progress Dual Simplex tasks
can stop/skip those chunks.

---

Duplicate comments:
In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`:
- Around line 303-305: Defaulting generate_batch_primal_dual_solution to false
makes batch solves opt‑in and breaks callers that expect populated
primal/dual/reduced-cost buffers; change the bool
generate_batch_primal_dual_solution default to true (so batch solution
collection is enabled by default/opt‑out) and update the adjacent comment to
reflect "opt-out" semantics so existing tests and code that slice these buffers
continue to work without modification.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 416-425: The batch_pdlp_warm_cache_t struct can retain stale
warm-start data after the LP context changes because there is no reset path; add
a member method reset() on batch_pdlp_warm_cache_t that clears initial_primal
and initial_dual (release/resize to 0 on batch_pdlp_handle.get_stream()), sets
step_size and primal_weight back to NaN, pdlp_iteration to -1,
percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and
then call this reset() from the pseudo_costs_t code path that
reinitializes/changes the LP context (where the context-reset currently occurs)
so any time cuts/slacks/original-LP are changed the warm cache is invalidated.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 38-56: The accessors (is_solved, mark_solved, subview) use asserts
and can overflow in release builds; replace the assert-based checks with runtime
bounds validation and throw std::out_of_range on failure, and avoid signed
arithmetic overflow in subview by converting indices to size_t first.
Concretely: in is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast
local_idx to size_t (e.g. size_t idx = static_cast<size_t>(local_idx)), verify
local_idx >= 0 and idx < solved.size(), and throw std::out_of_range with a clear
message instead of asserting; in subview(i_t offset, i_t count) cast both to
size_t (size_t off = static_cast<size_t>(offset), cnt =
static_cast<size_t>(count)), ensure offset and count are non-negative, check off
<= solved.size() and cnt <= solved.size() - off, then return
{solved.subspan(off, cnt)} to avoid offset+count overflow.

In `@cpp/src/pdlp/solve.cu`:
- Around line 1049-1057: The code only checks settings.shared_sb_solved.empty()
but not its length relative to new_bounds, so subspan(i, current_batch_size) can
out-of-bounds; before the batching loop validate that when
settings.shared_sb_solved is non-empty its size() (or extent()) is >=
max_batch_size (or original_new_bounds.size()), and if not, either trim
max_batch_size/current_batch_size accordingly or return/error; then inside the
loop you can safely call settings.shared_sb_solved.subspan(i,
current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you
reference settings.shared_sb_solved, original_new_bounds/new_bounds,
max_batch_size and subspan in the check and handle the shorter span case
consistently.
- Around line 968-972: The current logic lets settings.sub_batch_size override
the memory-checked limit, which can reintroduce OOMs; instead, when computing
optimal_batch_size in solve.cu (symbols: optimal_batch_size,
settings.sub_batch_size, memory_max_batch_size,
detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp
settings.sub_batch_size to the memory ceiling (e.g., use
min(settings.sub_batch_size, memory_max_batch_size) or validate and reject
values > memory_max_batch_size) so the final optimal_batch_size never exceeds
memory_max_batch_size (and still respects max_batch_size), then keep the
cuopt_assert as a final sanity check.

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 2243-2254: The fixed sleep is flaky; replace it with a
deterministic synchronization (e.g., std::promise/std::future or std::latch) so
the main thread waits until the solver thread actually enters the cooperative DS
path. Create a promise/future (or latch), pass the signal into the pdlp_thread
lambda (or have solve_lp expose a hook) and have the solver thread set the
promise when the cooperative path is live; then have the main thread wait on the
future before calling sb_view.mark_solved(i). Apply the same change to the other
occurrence around the second pdlp_thread usage.
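The `pdlp.cuh` item above, rebuilding a `std::span` explicitly from `data()` and `size()` rather than relying on an implicit `cuda::std::span` conversion, can be illustrated in isolation. `device_span` below is a minimal stand-in for `cuda::std::span` (same `data()`/`size()` surface, distinct type), and `host_span`/`to_host_span` are invented for the example.

```cpp
#include <atomic>
#include <cstddef>

template <typename T>
struct device_span {  // stand-in for cuda::std::span<T>
  T* ptr{};
  std::size_t len{};
  T* data() const { return ptr; }
  std::size_t size() const { return len; }
};

template <typename T>
struct host_span {  // stand-in for std::span<T>
  T* ptr{};
  std::size_t len{};
};

// Explicit reconstruction from the raw pointer and length: no implicit
// conversion between the two span types is assumed, which is exactly what
// the review asks the sb_view_ initialization to do.
template <typename T>
host_span<T> to_host_span(device_span<T> s)
{
  return {s.data(), s.size()};
}
```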
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 982cb111-beb8-4701-9cff-5285678a13b5

📥 Commits

Reviewing files that changed from the base of the PR and between 928a1e1 and dd07576.

📒 Files selected for processing (23)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/branch_and_bound/pseudo_costs.hpp
  • cpp/src/branch_and_bound/shared_strong_branching_context.hpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/CMakeLists.txt
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp.cuh
  • cpp/src/pdlp/pdlp_constants.hpp
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.hpp
  • cpp/src/pdlp/translate.hpp
  • cpp/src/pdlp/utilities/ping_pong_graph.cu
  • cpp/tests/linear_programming/pdlp_test.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
  • cpp/src/pdlp/pdlp_constants.hpp

Comment on lines +1044 to +1054
if (simplex_iteration_limit < 1) {
  initialize_pseudo_costs_with_estimate(original_lp,
                                        settings,
                                        root_vstatus,
                                        root_solution,
                                        basic_list,
                                        nonbasic_list,
                                        fractional,
                                        basis_factors,
                                        pc);
} else {


⚠️ Potential issue | 🟠 Major

Return after the estimate-only fallback.

When simplex_iteration_limit < 1, this branch fills pc.strong_branch_down/up from initialize_pseudo_costs_with_estimate(), but the unconditional merge later still runs with every DS status left UNSET and every PDLP value NaN. That overwrites the estimates back to NaN just before update_pseudo_costs_from_strong_branching().

Suggested fix
   if (simplex_iteration_limit < 1) {
     initialize_pseudo_costs_with_estimate(original_lp,
                                           settings,
                                           root_vstatus,
                                           root_solution,
                                           basic_list,
                                           nonbasic_list,
                                           fractional,
                                           basis_factors,
                                           pc);
+    pc.update_pseudo_costs_from_strong_branching(fractional, root_solution.x);
+    return;
   } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1044 - 1054, When
simplex_iteration_limit < 1 the code calls
initialize_pseudo_costs_with_estimate(...) which populates pc.strong_branch_down
and pc.strong_branch_up, but the function continues and later merges/overwrites
those estimates with UNSET/NaN values; to fix, add an early return (or
conditional skip of the merge/overwrite block) immediately after calling
initialize_pseudo_costs_with_estimate so the function exits with the
estimate-only pseudo-costs intact and does not proceed to the later
merge/overwrite logic before update_pseudo_costs_from_strong_branching().

Comment on lines +1587 to +1595
if (rb_mode != 2) {
#pragma omp taskloop if (num_tasks > 1) priority(task_priority) num_tasks(num_tasks) \
shared(score_mutex, \
sb_view, \
dual_simplex_obj_down, \
dual_simplex_obj_up, \
dual_simplex_status_down, \
dual_simplex_status_up, \
unreliable_list)

⚠️ Potential issue | 🟠 Major

Fallback to Dual Simplex when PDLP-only RB gets disabled.

This gate uses rb_mode != 2, but use_pdlp can still be false in mode 2 (empty warm cache, deterministic mode, sub-MIP, too few candidates, low root hit rate). In that case the PDLP task is skipped too, so none of the unreliable candidates are actually evaluated even though the earlier logs say "using DS only".

One way to preserve the advertised fallback
-  if (rb_mode != 2) {
+  const bool run_dual_simplex = (rb_mode != 2) || !use_pdlp;
+  if (run_dual_simplex) {
 #pragma omp taskloop if (num_tasks > 1) priority(task_priority) num_tasks(num_tasks) \
   shared(score_mutex,                                                                \
            sb_view,                                                                  \
            dual_simplex_obj_down,                                                    \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1587 - 1595, The
current gate "if (rb_mode != 2)" skips the whole taskloop when rb_mode==2 even
if use_pdlp is false, so unreliable_list candidates never get evaluated; change
the condition to allow Dual Simplex fallback when PDLP is disabled by using "if
(rb_mode != 2 || !use_pdlp)" (or logically equivalent) around the
pragma/taskloop that references score_mutex, sb_view, dual_simplex_obj_down/up,
dual_simplex_status_down/up and unreliable_list, and update any related log text
so it correctly reflects that DS will run when PDLP is unavailable.

omp_atomic_t<i_t> num_strong_branches_completed = 0;
omp_atomic_t<int64_t> strong_branching_lp_iter = 0;

batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache;

⚠️ Potential issue | 🟠 Major

Don't store the warm-start cache as unsynchronized shared worker state.

pseudo_costs_t already uses omp_atomic_t and omp_mutex_t for data that workers update concurrently, but pdlp_warm_cache is a plain mutable member. If two workers hit reliability/strong branching at once, they can race on the cached device buffers, scalars, and the populated flag, cross-contaminating warm starts between nodes. Make the cache worker-local or guard it with a dedicated mutex.

As per coding guidelines "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.hpp` at line 541, The pdlp_warm_cache
member on pseudo_costs_t is mutable shared state and can race when multiple
workers perform reliability/strong branching; make the cache worker-local or add
synchronization: either move the batch_pdlp_warm_cache_t<i_t, f_t>
pdlp_warm_cache out of the shared pseudo_costs_t (create a stack/local instance
inside the branching worker routine) or add a dedicated omp_mutex_t (e.g.,
pdlp_warm_cache_mutex) and wrap every access/update to pdlp_warm_cache in a
lock/unlock pair (similar to existing omp_atomic_t/omp_mutex_t usage) to prevent
cross-contamination of device buffers, scalars, and populated state. Ensure all
functions that read or write pdlp_warm_cache (reliability/strong branching
paths) use the new mutex or the local per-worker instance.

Comment on lines +194 to +197
i_t
mip_batch_pdlp_strong_branching; // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch PDLP only
i_t mip_batch_pdlp_reliability_branching; // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch
// PDLP only

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '100,112p' cpp/src/dual_simplex/simplex_solver_settings.hpp
echo
sed -n '1,40p' cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
echo
sed -n '1,40p' cpp/src/mip_heuristics/diversity/lns/rins.cu

Repository: NVIDIA/cuopt

Length of output: 3347


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the full file to understand structure
echo "=== File structure and class definition ==="
wc -l cpp/src/dual_simplex/simplex_solver_settings.hpp

echo -e "\n=== Constructor and initializer list (lines 40-130) ==="
sed -n '40,130p' cpp/src/dual_simplex/simplex_solver_settings.hpp

echo -e "\n=== Field declarations around lines 180-210 ==="
sed -n '180,210p' cpp/src/dual_simplex/simplex_solver_settings.hpp

echo -e "\n=== Check for in-class initializers (search for = in field declarations) ==="
rg 'mip_batch_pdlp_reliability_branching|mip_batch_pdlp_strong_branching' cpp/src/dual_simplex/simplex_solver_settings.hpp -A 1 -B 1

echo -e "\n=== Check other constructors ==="
rg 'simplex_solver_settings_t\s*\(' cpp/src/dual_simplex/simplex_solver_settings.hpp

Repository: NVIDIA/cuopt

Length of output: 5723


mip_batch_pdlp_reliability_branching is not initialized on default construction.

The constructor initializer list includes mip_batch_pdlp_strong_branching(0) but omits mip_batch_pdlp_reliability_branching, and the field declaration has no in-class initializer. Default-constructed instances will have an indeterminate value, violating the guideline to "validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving." Add mip_batch_pdlp_reliability_branching(0), to the constructor initializer list.

🔧 Proposed fix
       cut_change_threshold(1e-3),
       cut_min_orthogonality(0.5),
       mip_batch_pdlp_strong_branching(0),
+      mip_batch_pdlp_reliability_branching(0),
       strong_branching_simplex_iteration_limit(-1),
       random_seed(0),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/simplex_solver_settings.hpp` around lines 194 - 197, The
member mip_batch_pdlp_reliability_branching is not initialized in the class
constructor, leaving it with indeterminate value; update the constructor
initializer list (next to mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.

${CMAKE_CURRENT_SOURCE_DIR}/presolve/third_party_presolve.cpp
${CMAKE_CURRENT_SOURCE_DIR}/presolve/gf2_presolve.cpp
${CMAKE_CURRENT_SOURCE_DIR}/solution/solution.cu
${CMAKE_CURRENT_SOURCE_DIR}/presolve/conflict_graph/clique_table.cu

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: confirm whether clique_table LP-only inclusion pulls MIP-coupled symbols.

set -euo pipefail

echo "== Locate relevant files =="
fd -a 'CMakeLists\.txt$' | rg 'cpp/src/mip_heuristics/CMakeLists\.txt'
fd -a 'clique_table\.cu$'
fd -a 'utils\.cuh$'
fd -a 'problem\.cuh$'

echo
echo "== CMake source set selection =="
rg -n -C3 'MIP_LP_NECESSARY_FILES|MIP_NON_LP_FILES|BUILD_LP_ONLY|clique_table\.cu' cpp/src/mip_heuristics/CMakeLists.txt

echo
echo "== Include chain from clique_table.cu =="
rg -n -C2 '^#include' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu

echo
echo "== MIP/problem coupling markers inside utils.cuh =="
rg -n -C2 'problem_t|mixed_integer|branch|cut|mip' cpp/src/mip_heuristics/utils.cuh

Repository: NVIDIA/cuopt

Length of output: 4157


🏁 Script executed:

# Check where problem_t is defined and understand its scope
rg -n 'template.*problem_t|struct problem_t|class problem_t' cpp/src/mip_heuristics/problem/problem.cuh | head -20

Repository: NVIDIA/cuopt

Length of output: 76


🏁 Script executed:

# Check what problem_t is actually used for in utils.cuh to understand coupling severity
rg -n 'problem_t' cpp/src/mip_heuristics/utils.cuh | head -30

Repository: NVIDIA/cuopt

Length of output: 441


🏁 Script executed:

# Verify whether clique_table.cu functionality is actually required for LP-only builds
# by checking if there are any references to it in LP-only code paths
rg -r 'clique_table' cpp/src --type-list | head -20

Repository: NVIDIA/cuopt

Length of output: 445


🏁 Script executed:

# Get the full problem_t class definition to understand its scope
sed -n '56,150p' cpp/src/mip_heuristics/problem/problem.cuh

Repository: NVIDIA/cuopt

Length of output: 5038


🏁 Script executed:

# Find all files that reference clique_table (functions or includes)
rg 'clique_table' cpp/src --type cpp --type cu

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Check if problem.cu (which is in MIP_LP_NECESSARY_FILES) actually uses problem_t in LP-only context
head -50 cpp/src/mip_heuristics/problem/problem.cu

Repository: NVIDIA/cuopt

Length of output: 1548


🏁 Script executed:

# Search for references to clique_table functions/files more carefully
rg 'clique_table' cpp/src --type cpp

Repository: NVIDIA/cuopt

Length of output: 3657


🏁 Script executed:

# Check problem.cuh for clique_table member and where it's initialized
rg -n 'clique_table' cpp/src/mip_heuristics/problem/problem.cuh

Repository: NVIDIA/cuopt

Length of output: 143


🏁 Script executed:

# Search for clique_table_t definition
rg -n 'class clique_table_t|struct clique_table_t|template.*clique_table_t' cpp/src

Repository: NVIDIA/cuopt

Length of output: 449


🏁 Script executed:

# Search for where clique_table member is initialized/used in problem code
rg -n 'clique_table\s*=' cpp/src/mip_heuristics/problem/ -A2 -B2

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check if clique_table building/initialization is in problem.cu or elsewhere
rg -n 'clique_table_t<' cpp/src/mip_heuristics/problem/problem.cu

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Verify: does clique_table.cu define functions that are called from LP-safe code?
rg -n '^(namespace|template|void|bool|auto|int|std::)' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu | head -30

Repository: NVIDIA/cuopt

Length of output: 1651


🏁 Script executed:

# Check if clique_table building is triggered from problem.cu initialization
rg -n 'make_clique|build.*clique|clique.*build' cpp/src/mip_heuristics/

Repository: NVIDIA/cuopt

Length of output: 492


🏁 Script executed:

# Find all callers of build_clique_table
rg -n 'build_clique_table' cpp/src --type cpp -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check if build_clique_table is called from problem.cu or problem-related code
rg -n 'build_clique_table' cpp/src/mip_heuristics/problem/

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Verify the full include chain: does clique_table.cu actually get compiled in LP-only?
# Check if there's any conditional compilation guards
rg -n 'BUILD_LP_ONLY|#if|#ifdef' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu | head -20

Repository: NVIDIA/cuopt

Length of output: 506


🏁 Script executed:

# Find all references to clique_table in problem.cu
rg -n 'clique_table' cpp/src/mip_heuristics/problem/problem.cu

Repository: NVIDIA/cuopt

Length of output: 221


🏁 Script executed:

# Check problem.cu to see if it initializes or uses the clique_table member
sed -n '1,100p' cpp/src/mip_heuristics/problem/problem.cu | rg -n 'clique'

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Get summary: who actually calls build_clique_table?
rg -r 'build_clique_table\(' cpp/src --type cpp

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Final confirmation: verify that clique_table functions are truly never called in LP-only context
# by checking all instantiations and calls
rg -n 'clique_table_t<' cpp/src/cuts/cuts.cpp cpp/src/branch_and_bound/branch_and_bound.cpp | head -5

Repository: NVIDIA/cuopt

Length of output: 531


🏁 Script executed:

# Double-check: are there any LP-only modules that might indirectly call clique_table?
# Check what problem.cu actually does with clique_table
rg -n -A5 'clique_table' cpp/src/mip_heuristics/problem/problem.cu | head -20

Repository: NVIDIA/cuopt

Length of output: 912


Move clique_table.cu from MIP_LP_NECESSARY_FILES to MIP_NON_LP_FILES

clique_table.cu is MIP-exclusive functionality (only used by branch-and-bound and cutting plane generation), yet it includes mip_heuristics/utils.cuh, which brings MIP-coupled types and headers into the LP-only build path unnecessarily. The file should be in MIP_NON_LP_FILES instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/CMakeLists.txt` at line 17, Remove clique_table.cu
from the MIP_LP_NECESSARY_FILES list and add it to the MIP_NON_LP_FILES list in
the CMakeLists.txt so that clique_table.cu (which includes
mip_heuristics/utils.cuh) is built only as MIP-non-LP code; specifically, edit
the lines that construct the MIP_LP_NECESSARY_FILES and MIP_NON_LP_FILES
variables to stop listing clique_table.cu in the former and include it in the
latter, and verify any targets or install/grouping logic that consumes those
variables still references the file from the new list.

Comment on lines +142 to +143
dual_simplex::shared_strong_branching_context_view_t<i_t, f_t> sb_view_{
settings_.shared_sb_solved};

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== shared_sb_solved declaration ==="
rg -n -C2 --type=cpp 'shared_sb_solved' cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp

echo "=== shared_strong_branching_context_view_t span type ==="
rg -n -C3 --type=cpp 'shared_strong_branching_context_view_t|std::span<|cuda::std::span<' cpp/src/branch_and_bound/shared_strong_branching_context.hpp

echo "=== sb_view_ initialization site ==="
rg -n -C2 --type=cpp 'sb_view_|shared_sb_solved' cpp/src/pdlp/pdlp.cuh

Repository: NVIDIA/cuopt

Length of output: 1441


Use explicit span conversion to ensure cuda::std::span → std::span compatibility.

The constructor at lines 142–143 receives settings_.shared_sb_solved (type cuda::std::span<std::atomic<int>>) but expects std::span<std::atomic<int>>. These are distinct types from different namespaces. Avoid relying on implicit conversion; rewrite as:

Suggested fix
dual_simplex::shared_strong_branching_context_view_t<i_t, f_t> sb_view_{
    std::span<std::atomic<int>>(settings_.shared_sb_solved.data(), settings_.shared_sb_solved.size())};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cuh` around lines 142 - 143, The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.

Comment on lines +897 to +903
total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
total_memory += problem.get_n_constraints() * trial_batch_size * sizeof(f_t);
total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);

// Add a 50% overhead to make sure we have enough memory considering other parts of the solver may
// allocate at the same time
total_memory *= 1.5;
// Add a 70% overhead to make sure we have enough memory considering other parts of the solver may
// need memory later while the batch PDLP is running
total_memory *= 1.7;

⚠️ Potential issue | 🟠 Major

Keep the batch memory estimator consistent with optional solution collection.

The estimator always charges three full output buffers, but the actual allocations later in this function are skipped when generate_batch_primal_dual_solution is false. On large strong-branching batches this can force unnecessarily tiny sub-batches, or even return NumericalError for a batch that would fit without solution collection.

Also applies to: 1004-1012

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 897 - 903, The batch memory estimator
always accounts for three full output buffers but the actual allocations for the
third (solution collection) buffer are skipped when
generate_batch_primal_dual_solution is false; update the total_memory
calculation in solve.cu so it conditionally charges the third output buffer only
when generate_batch_primal_dual_solution is true (adjust the same logic around
the corresponding block later in the function where allocations are made; see
references to total_memory, trial_batch_size, problem.get_n_variables(),
problem.get_n_constraints(), and generate_batch_primal_dual_solution and apply
the same fix for the duplicate estimator at the other occurrence mentioned in
the comment).
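The shape of the fix can be sketched as a standalone estimator. All names here (estimate_batch_bytes, n_variables, n_constraints, collect_solutions) are illustrative, not the real cuOpt signatures; the point is that the solution-output buffers are only charged when collection is actually enabled, so the estimate matches the allocations performed later.

```cpp
#include <cstddef>

// Hypothetical sketch: estimate device memory for one trial batch, charging
// the primal/dual solution copies only when solution collection is on.
std::size_t estimate_batch_bytes(std::size_t n_variables,
                                 std::size_t n_constraints,
                                 std::size_t batch_size,
                                 bool collect_solutions)
{
  std::size_t total = 0;
  // Working iterate is always needed, regardless of collection.
  total += n_variables * batch_size * sizeof(double);
  if (collect_solutions) {
    // Primal and dual solution copies are only allocated on request,
    // so only charge them here when the flag is set.
    total += n_variables * batch_size * sizeof(double);
    total += n_constraints * batch_size * sizeof(double);
  }
  // Headroom factors for concurrent allocations elsewhere in the solver,
  // mirroring the 1.5x and 1.7x multipliers in the original estimator.
  return static_cast<std::size_t>(total * 1.5 * 1.7);
}
```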

Comment on lines 62 to 88
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size)
{
cuopt_expects(initial_step_size > f_t(0),
error_type_t::ValidationError,
"Initial step size must be greater than 0");
cuopt_expects(!std::isinf(initial_step_size),
error_type_t::ValidationError,
"Initial step size must be finite");
cuopt_expects(!std::isnan(initial_step_size),
error_type_t::ValidationError,
"Initial step size must be a number");
initial_step_size_ = std::make_optional(initial_step_size);
}

template <typename i_t, typename f_t>
void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight)
{
cuopt_expects(initial_primal_weight > f_t(0),
error_type_t::ValidationError,
"Initial primal weight must be greater than 0");
cuopt_expects(!std::isinf(initial_primal_weight),
error_type_t::ValidationError,
"Initial primal weight must be finite");
cuopt_expects(!std::isnan(initial_primal_weight),
error_type_t::ValidationError,
"Initial primal weight must be a number");
initial_primal_weight_ = std::make_optional(initial_primal_weight);

⚠️ Potential issue | 🟠 Major

The bulk warm-start path still bypasses these new numeric guards.

set_pdlp_warm_start_data(...) later in this file writes the warm-start initial_primal_weight and initial_step_size directly at Lines 316-317, so callers using that overload can still inject <=0, inf, or nan values. Please reuse the same validation there so both entry points enforce the same solver-state invariant. As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🔧 Suggested direction
   cuopt_expects(last_restart_duality_gap_dual_solution != nullptr,
                 error_type_t::ValidationError,
                 "last_restart_duality_gap_dual_solution cannot be null");
+  cuopt_expects(initial_primal_weight > f_t(0) &&
+                  !std::isinf(initial_primal_weight) &&
+                  !std::isnan(initial_primal_weight),
+                error_type_t::ValidationError,
+                "Initial primal weight must be finite and greater than 0");
+  cuopt_expects(initial_step_size > f_t(0) &&
+                  !std::isinf(initial_step_size) &&
+                  !std::isnan(initial_step_size),
+                error_type_t::ValidationError,
+                "Initial step size must be finite and greater than 0");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solver_settings.cu` around lines 62 - 88, The bulk warm-start
path in set_pdlp_warm_start_data is writing initial_primal_weight and
initial_step_size directly and bypasses the numeric guards; update
set_pdlp_warm_start_data to validate those values the same way as
set_initial_primal_weight and set_initial_step_size (either by calling those two
setters from set_pdlp_warm_start_data or by applying the same cuopt_expects
checks: >0, !isinf, !isnan) so the warm-start overload enforces the same
invariants before assigning initial_primal_weight_ and initial_step_size_.


#include <cuopt/linear_programming/optimization_problem_interface.hpp>

#include <mip_heuristics/problem/problem.cuh>

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if problem.cuh types are used in translate.hpp and find potential compilation issues

echo "=== 1. Search for usage of mip_heuristics namespace/types in translate.hpp ==="
rg -n "mip_heuristics::" cpp/src/pdlp/translate.hpp || echo "No direct mip_heuristics namespace usage found"

echo -e "\n=== 2. Find all files that include translate.hpp ==="
rg -n '#include.*translate\.hpp' --type=cpp --type=cu -C2

echo -e "\n=== 3. Check if translate.hpp is included in non-CUDA files (.cpp) ==="
rg -l '#include.*translate\.hpp' --type=cpp -g '!*.cu' || echo "No .cpp files include translate.hpp"

echo -e "\n=== 4. Search for problem_t usage to verify source ==="
rg -n "detail::problem_t" cpp/src/pdlp/translate.hpp

echo -e "\n=== 5. Check what problem.cuh exports that might be needed ==="
rg -n "^(class|struct|enum|namespace|template)" cpp/src/mip_heuristics/problem/problem.cuh | head -20

Repository: NVIDIA/cuopt

Length of output: 994


Remove unused CUDA device header include.

Line 12 adds #include <mip_heuristics/problem/problem.cuh>, but verification shows no symbols from this include are used in translate.hpp. Functions use detail::problem_t from optimization_problem_interface.hpp (line 10), not from problem.cuh. Additionally, no files currently include translate.hpp, so there is no compilation impact at present.

Remove this unused include.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/translate.hpp` at line 12, Remove the unused CUDA device header
include from translate.hpp: delete the line including
<mip_heuristics/problem/problem.cuh> because translate.hpp only references
detail::problem_t from optimization_problem_interface.hpp and does not use any
symbols from problem.cuh; after removal, confirm translate.hpp still compiles
standalone and that no other code relies on that include being present.

@Kh4ster
Contributor Author

Kh4ster commented Apr 3, 2026

/ok to test bb50ad1


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 780-798: Add a single upfront cuopt_assert before using
climber_strategies_[i].original_index to ensure the shared strong-branching span
matches the original batch width: when sb_view_.is_valid() is true, assert that
sb_view_ (or shared_sb_solved/span) size/width equals the original batch width
used to generate climber_strategies_ (so original_index is in-bounds for
shared_strong_branching_context_view_t). Place this check immediately inside the
if (sb_view_.is_valid()) block guarding the loop that accesses
climber_strategies_ (and add the same guard at the other occurrences noted
around the blocks using original_index at the locations corresponding to lines
~849-850, ~866-869, and ~928-929).
- Around line 789-790: When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17941937-9cfb-4973-a40a-f3ce01b5a538

📥 Commits

Reviewing files that changed from the base of the PR and between dd07576 and bb50ad1.

📒 Files selected for processing (1)
  • cpp/src/pdlp/pdlp.cu

Comment on lines +780 to +798
if (sb_view_.is_valid()) {
for (size_t i = 0; i < climber_strategies_.size(); ++i) {
// If PDLP has solved it to optimality we want to keep it and resolved both solvers having
// solved the problem later
if (current_termination_strategy_.is_done(
current_termination_strategy_.get_termination_status(i)))
continue;
const i_t local_idx = climber_strategies_[i].original_index;
if (sb_view_.is_solved(local_idx)) {
current_termination_strategy_.set_termination_status(
i, pdlp_termination_status_t::ConcurrentLimit);
#ifdef BATCH_VERBOSE_MODE
std::cout << "[COOP SB] DS already solved climber " << i << " (original_index " << local_idx
<< "), synced to ConcurrentLimit at step " << internal_solver_iterations_
<< std::endl;
#endif
}
}
}

⚠️ Potential issue | 🟠 Major

Validate shared_sb_solved against the original batch size before using original_index.

sb_view_.is_valid() only proves the span is non-empty. These loads/stores use climber_strategies_[i].original_index, which stays in the original batch numbering after compaction. If the caller passes a shorter span, the assert inside shared_strong_branching_context_view_t disappears in release builds and this becomes an out-of-bounds atomic access. Please add a single upfront cuopt_assert that the shared span matches the original batch width.

Suggested guard
+if (sb_view_.is_valid()) {
+  cuopt_assert(sb_view_.solved.size() == settings_.new_bounds.size(),
+               "shared_sb_solved must match the original batch size");
+}
+
 if (sb_view_.is_valid()) {
   for (size_t i = 0; i < climber_strategies_.size(); ++i) {

As per coding guidelines, "Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations".

Also applies to: 849-850, 866-869, 928-929

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 780 - 798, Add a single upfront
cuopt_assert before using climber_strategies_[i].original_index to ensure the
shared strong-branching span matches the original batch width: when
sb_view_.is_valid() is true, assert that sb_view_ (or shared_sb_solved/span)
size/width equals the original batch width used to generate climber_strategies_
(so original_index is in-bounds for shared_strong_branching_context_view_t).
Place this check immediately inside the if (sb_view_.is_valid()) block guarding
the loop that accesses climber_strategies_ (and add the same guard at the other
occurrences noted around the blocks using original_index at the locations
corresponding to lines ~849-850, ~866-869, and ~928-929).

Comment on lines +789 to +790
current_termination_strategy_.set_termination_status(
i, pdlp_termination_status_t::ConcurrentLimit);

⚠️ Potential issue | 🟠 Major

Don't serialize PDLP iterates as the result for ConcurrentLimit climbers.

Once a climber is flipped to pdlp_termination_status_t::ConcurrentLimit, this method still fills its primal/dual/reduced-cost buffers and step metadata from pdhg_solver_ / PDLP state. That means the status says “externally finished”, but the payload is just PDLP’s last iterate, and solved_by is only written on the PDLP branch. Please either plumb the external solver result through here or skip populating per-climber solution payloads for ConcurrentLimit.

As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results".

Also applies to: 818-849, 872-878, 897-928

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 789 - 790, When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
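The guard being asked for can be sketched minimally. Everything here is hypothetical scaffolding (term_status, climber_result, fill_results stand in for the real PDLP types); it only shows the invariant: a climber flipped to ConcurrentLimit never receives PDLP's last iterate as its payload, so status and payload stay consistent.

```cpp
#include <vector>

// Illustrative termination statuses; ConcurrentLimit marks a climber that
// the external solver (dual simplex) finished first.
enum class term_status { Unset, Optimal, ConcurrentLimit };

struct climber_result {
  term_status status = term_status::Unset;
  std::vector<double> primal;  // only meaningful when PDLP produced it
  bool filled_by_pdlp = false; // stand-in for the solved_by field
};

// Sketch of the guard: skip serializing PDLP's iterate into climbers that
// were flipped to ConcurrentLimit, instead of overwriting their payload.
void fill_results(std::vector<climber_result>& climbers,
                  const std::vector<double>& pdlp_iterate)
{
  for (auto& c : climbers) {
    if (c.status == term_status::ConcurrentLimit) continue;  // the guard
    c.primal         = pdlp_iterate;
    c.filled_by_pdlp = true;
  }
}
```

The other option the review mentions, plumbing the external solver's result into the per-climber payload, would replace the `continue` with a copy from that result structure.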


Labels

improvement Improves an existing functionality mip non-breaking Introduces a non-breaking change P1 pdlp

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants