Race batch PDLP and dual simplex in strong branching / reliability branching#994
Kh4ster wants to merge 74 commits into `release/26.04` from …
Conversation
…+ fail safe if batch pdlp fails
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
/ok to test de8fac6
chris-maes left a comment:
LGTM. I mostly checked the interface code. I did not review the deep changes to support reliability branching in batch PDLP as closely.
/ok to test 8f0e2f6
Still waiting for one last thing before merging
/ok to test 329c255
/ok to test aecae93
/ok to test 0d5ae1c
@coderabbitai review
✅ Actions performed: Review triggered.
/ok to test 7504340
/ok to test a7f79fa
/ok to test dd07576
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
547-584: ⚠️ Potential issue | 🟠 Major
Assert or remap the slack-column suffix invariant.
`remove_columns(cols_to_remove)` treats `new_slacks` as arbitrary column indices, but lines 560-584 still feed the first `n` entries of `root_soln`, `objective`, `lower`, and `upper` into PDLP. That only stays aligned if every slack index already lives in `[n, lp.num_cols)`. Otherwise the compacted matrix, the vectors, and the branching indices no longer describe the same columns.
Minimal guard if appended slacks are the intended invariant:

```diff
 cuopt::mps_parser::mps_data_model_t<i_t, f_t> mps_model;
 int m = lp.num_rows;
 int n = lp.num_cols - new_slacks.size();
 original_root_soln_x.resize(n);
+for (i_t slack : new_slacks) {
+  assert(slack >= n &&
+         "simplex_problem_to_mps_data_model assumes slack columns are appended");
+}
```

If that invariant is not guaranteed, this helper needs a real old→new column map and the callers need to remap the candidate indices as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 547 - 584, The code removes columns via A_no_slacks.remove_columns(cols_to_remove) but still blindly slices the first n entries of root_soln, lp.objective, lp.lower, lp.upper into PDLP (original_root_soln_x and mps_model.set_* calls), which is only correct if all slack indices in new_slacks are appended (>= n); add an explicit invariant check or build a proper old→new column map: either assert all new_slacks[j] >= n before proceeding (and document the invariant), or construct a mapping from original column index → compacted index after remove_columns and use it to remap original_root_soln_x, the objective vector, lower/upper bound arrays, and any branching candidate indices so the CSR matrix and these vectors stay aligned (refer to cols_to_remove, A_no_slacks, original_root_soln_x, mps_model.set_objective_coefficients, set_variable_lower_bounds, set_variable_upper_bounds).
cpp/src/pdlp/pdlp.cu (1)
801-864: ⚠️ Potential issue | 🟠 Major
Mark the last resized-batch climbers as solved before this return.
Unlike line 865, this early-return branch never calls `sb_view_.mark_solved(...)` for the climbers still left in the batch. Any Dual Simplex task that has not started those branches yet will keep solving them even though PDLP already finished them, so the final chunk cannot be stolen/skipped.
Suggested fix:

```diff
 if (original_batch_size != climber_strategies_.size()) {
 #ifdef BATCH_VERBOSE_MODE
   std::cout << "Original batch size was " << original_batch_size << " but is now "
             << climber_strategies_.size() << std::endl;
 #endif
@@
     if (current_termination_strategy_.get_termination_status(i) !=
         pdlp_termination_status_t::ConcurrentLimit) {
       batch_solution_to_return_
         .get_additional_termination_informations()[climber_strategies_[i].original_index]
         .solved_by = method_t::PDLP;
     }
   }
+  if (sb_view_.is_valid()) {
+    for (size_t i = 0; i < climber_strategies_.size(); ++i) {
+      sb_view_.mark_solved(climber_strategies_[i].original_index);
+    }
+  }
   current_termination_strategy_.fill_gpu_terms_stats(total_pdlp_iterations_);
   RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_));
   current_termination_strategy_.convert_gpu_terms_stats_to_host(
     batch_solution_to_return_.get_additional_termination_informations());
   return optimization_problem_solution_t<i_t, f_t>{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 801 - 864, This early-return path fails to notify the solver-bookkeeping that the remaining resized-batch climbers are finished; before returning, iterate over current_termination_strategy_.get_terminations_status().size() and call sb_view_.mark_solved(...) for each climber using climber_strategies_[i].original_index (and the appropriate termination/solution metadata you have just copied into batch_solution_to_return_), mirroring the behavior used at the later return (line 865) so in-progress Dual Simplex tasks can stop/skip those chunks.
♻️ Duplicate comments (6)
cpp/src/branch_and_bound/pseudo_costs.hpp (1)
416-425: ⚠️ Potential issue | 🟠 Major
Invalidate the PDLP warm-start cache when the LP context changes.
Once `populated` flips true, this struct has no reset path, so reusing the same `pseudo_costs_t` after cuts/slacks/original-LP changes can replay stale warm-start vectors and scalars into a different batch LP. Add a `reset()` helper here and call it from the context-reset path.
Suggested fix:

```diff
 struct batch_pdlp_warm_cache_t {
   const raft::handle_t batch_pdlp_handle{};
   rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
   rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
   f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
   f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
   i_t pdlp_iteration{-1};
   f_t percent_solved_by_batch_pdlp_at_root{f_t(0.0)};
   bool populated{false};
+
+  void reset()
+  {
+    initial_primal.resize(0, batch_pdlp_handle.get_stream());
+    initial_dual.resize(0, batch_pdlp_handle.get_stream());
+    step_size     = std::numeric_limits<f_t>::signaling_NaN();
+    primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+    pdlp_iteration = -1;
+    percent_solved_by_batch_pdlp_at_root = f_t(0.0);
+    populated = false;
+  }
 };
```

As per coding guidelines "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 416 - 425, The batch_pdlp_warm_cache_t struct can retain stale warm-start data after the LP context changes because there is no reset path; add a member method reset() on batch_pdlp_warm_cache_t that clears initial_primal and initial_dual (release/resize to 0 on batch_pdlp_handle.get_stream()), sets step_size and primal_weight back to NaN, pdlp_iteration to -1, percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and then call this reset() from the pseudo_costs_t code path that reinitializes/changes the LP context (where the context-reset currently occurs) so any time cuts/slacks/original-LP are changed the warm cache is invalidated.
cpp/src/pdlp/solve.cu (2)
1049-1057: ⚠️ Potential issue | 🟠 Major
Validate `shared_sb_solved` before taking subspans.
The only guard here is `!empty()`. If the caller passes fewer solved flags than `new_bounds.size()`, `subspan(i, current_batch_size)` walks past the end of the coordination array and corrupts cooperative termination state. Check the span length once before the loop.
Suggested fix:

```diff
+cuopt_assert(settings.shared_sb_solved.empty() ||
+               settings.shared_sb_solved.size() >= max_batch_size,
+             "shared_sb_solved must cover every batch entry");
 for (size_t i = 0; i < max_batch_size; i += optimal_batch_size) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solve.cu` around lines 1049 - 1057, The code only checks settings.shared_sb_solved.empty() but not its length relative to new_bounds, so subspan(i, current_batch_size) can out-of-bounds; before the batching loop validate that when settings.shared_sb_solved is non-empty its size() (or extent()) is >= max_batch_size (or original_new_bounds.size()), and if not, either trim max_batch_size/current_batch_size accordingly or return/error; then inside the loop you can safely call settings.shared_sb_solved.subspan(i, current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you reference settings.shared_sb_solved, original_new_bounds/new_bounds, max_batch_size and subspan in the check and handle the shorter span case consistently.
968-972: ⚠️ Potential issue | 🟠 Major
Don't let `sub_batch_size` bypass the memory ceiling.
`memory_max_batch_size` is computed specifically to avoid OOM, but a positive `settings.sub_batch_size` overwrites it unchecked. A caller can request a sub-batch larger than the memory-checked limit and reintroduce the allocation failure this code just avoided.
Suggested fix:

```diff
 size_t optimal_batch_size =
   use_optimal_batch_size ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                          : max_batch_size;
-if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+if (settings.sub_batch_size > 0) {
+  const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+  cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+               "sub_batch_size exceeds the memory-checked batch size");
+  optimal_batch_size = requested_sub_batch_size;
+}
```

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solve.cu` around lines 968 - 972, The current logic lets settings.sub_batch_size override the memory-checked limit, which can reintroduce OOMs; instead, when computing optimal_batch_size in solve.cu (symbols: optimal_batch_size, settings.sub_batch_size, memory_max_batch_size, detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp settings.sub_batch_size to the memory ceiling (e.g., use min(settings.sub_batch_size, memory_max_batch_size) or validate and reject values > memory_max_batch_size) so the final optimal_batch_size never exceeds memory_max_batch_size (and still respects max_batch_size), then keep the cuopt_assert as a final sanity check.
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
303-305: ⚠️ Potential issue | 🔴 Critical
Keep batch solution collection opt-out, not opt-in.
With this default at `false`, generic batch LP solves now return empty primal/dual/reduced-cost buffers unless every caller remembers to opt in. Existing batch paths in `cpp/tests/linear_programming/pdlp_test.cu` still slice those buffers unconditionally, so this regresses the non-MIP batch API.
Suggested fix:

```diff
-// By default to save memory and speed we don't store and copy each climber's primal and dual
-// solutions We only retrieve termination statistics and the objective values
-bool generate_batch_primal_dual_solution{false};
+// Preserve the historical batch behavior and collect per-climber solutions unless callers opt out.
+bool generate_batch_primal_dual_solution{true};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp` around lines 303 - 305, Defaulting generate_batch_primal_dual_solution to false makes batch solves opt‑in and breaks callers that expect populated primal/dual/reduced-cost buffers; change the bool generate_batch_primal_dual_solution default to true (so batch solution collection is enabled by default/opt‑out) and update the adjacent comment to reflect "opt-out" semantics so existing tests and code that slice these buffers continue to work without modification.
cpp/src/branch_and_bound/shared_strong_branching_context.hpp (1)
38-56: ⚠️ Potential issue | 🟠 Major
Make the view helpers release-safe and overflow-safe.
These accessors still rely on `assert`, so a bad index becomes unchecked UB in release builds. `subview()` also computes `offset + count` in `i_t` before the cast, which can overflow; validate the casted sizes separately and use those converted indices for `operator[]`/`subspan`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp` around lines 38 - 56, The accessors (is_solved, mark_solved, subview) use asserts and can overflow in release builds; replace the assert-based checks with runtime bounds validation and throw std::out_of_range on failure, and avoid signed arithmetic overflow in subview by converting indices to size_t first. Concretely: in is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast local_idx to size_t (e.g. size_t idx = static_cast<size_t>(local_idx)), verify local_idx >= 0 and idx < solved.size(), and throw std::out_of_range with a clear message instead of asserting; in subview(i_t offset, i_t count) cast both to size_t (size_t off = static_cast<size_t>(offset), cnt = static_cast<size_t>(count)), ensure offset and count are non-negative, check off <= solved.size() and cnt <= solved.size() - off, then return {solved.subspan(off, cnt)} to avoid offset+count overflow.
cpp/tests/linear_programming/pdlp_test.cu (1)
2243-2254: ⚠️ Potential issue | 🟠 Major
Replace the fixed 200 ms sleep with deterministic synchronization.
On afiro-sized batches PDLP can finish before the delay. That makes these tests flaky and can skip the `ConcurrentLimit` path entirely. Use a latch/promise/barrier that the solver thread signals once the cooperative path is live, then apply the external marks.
Also applies to: 2309-2320
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 2243 - 2254, The fixed sleep is flaky; replace it with a deterministic synchronization (e.g., std::promise/std::future or std::latch) so the main thread waits until the solver thread actually enters the cooperative DS path. Create a promise/future (or latch), pass the signal into the pdlp_thread lambda (or have solve_lp expose a hook) and have the solver thread set the promise when the cooperative path is live; then have the main thread wait on the future before calling sb_view.mark_solved(i). Apply the same change to the other occurrence around the second pdlp_thread usage.
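The handshake suggested for the test can be sketched with the standard library alone. Everything below is hypothetical scaffolding (the real fix would signal from inside the solver's cooperative path before the test applies its external `sb_view.mark_solved(i)` calls), but it shows the promise/future shape that replaces the fixed sleep:

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical shape of the suggested fix: the solver thread sets a promise
// once the cooperative path is live, and the test thread waits on the future
// instead of sleeping for a fixed 200 ms.
inline int run_with_handshake()
{
  std::promise<void> cooperative_path_live;
  std::future<void> ready = cooperative_path_live.get_future();
  int marks_applied = 0;

  std::thread solver([&] {
    // ... solver setup would happen here ...
    cooperative_path_live.set_value();  // signal: safe to apply external marks
    // ... solver keeps running ...
  });

  ready.wait();       // deterministic: no fixed sleep, no race on "too early"
  marks_applied = 3;  // stand-in for marking three candidates solved

  solver.join();
  return marks_applied;
}
```

Unlike a sleep, `ready.wait()` cannot fire early on a fast machine or waste time on a slow one, so the `ConcurrentLimit` path is exercised every run.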
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the code calls
initialize_pseudo_costs_with_estimate(...) which populates pc.strong_branch_down
and pc.strong_branch_up, but the function continues and later merges/overwrites
those estimates with UNSET/NaN values; to fix, add an early return (or
conditional skip of the merge/overwrite block) immediately after calling
initialize_pseudo_costs_with_estimate so the function exits with the
estimate-only pseudo-costs intact and does not proceed to the later
merge/overwrite logic before update_pseudo_costs_from_strong_branching().
- Around line 1587-1595: The current gate "if (rb_mode != 2)" skips the whole
taskloop when rb_mode==2 even if use_pdlp is false, so unreliable_list
candidates never get evaluated; change the condition to allow Dual Simplex
fallback when PDLP is disabled by using "if (rb_mode != 2 || !use_pdlp)" (or
logically equivalent) around the pragma/taskloop that references score_mutex,
sb_view, dual_simplex_obj_down/up, dual_simplex_status_down/up and
unreliable_list, and update any related log text so it correctly reflects that
DS will run when PDLP is unavailable.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Line 541: The pdlp_warm_cache member on pseudo_costs_t is mutable shared state
and can race when multiple workers perform reliability/strong branching; make
the cache worker-local or add synchronization: either move the
batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache out of the shared
pseudo_costs_t (create a stack/local instance inside the branching worker
routine) or add a dedicated omp_mutex_t (e.g., pdlp_warm_cache_mutex) and wrap
every access/update to pdlp_warm_cache in a lock/unlock pair (similar to
existing omp_atomic_t/omp_mutex_t usage) to prevent cross-contamination of
device buffers, scalars, and populated state. Ensure all functions that read or
write pdlp_warm_cache (reliability/strong branching paths) use the new mutex or
the local per-worker instance.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 194-197: The member mip_batch_pdlp_reliability_branching is not
initialized in the class constructor, leaving it with indeterminate value;
update the constructor initializer list (next to
mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.
In `@cpp/src/mip_heuristics/CMakeLists.txt`:
- Line 17: Remove clique_table.cu from the MIP_LP_NECESSARY_FILES list and add
it to the MIP_NON_LP_FILES list in the CMakeLists.txt so that clique_table.cu
(which includes mip_heuristics/utils.cuh) is built only as MIP-non-LP code;
specifically, edit the lines that construct the MIP_LP_NECESSARY_FILES and
MIP_NON_LP_FILES variables to stop listing clique_table.cu in the former and
include it in the latter, and verify any targets or install/grouping logic that
consumes those variables still references the file from the new list.
In `@cpp/src/pdlp/pdlp.cuh`:
- Around line 142-143: The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.
In `@cpp/src/pdlp/solve.cu`:
- Around line 897-903: The batch memory estimator always accounts for three full
output buffers but the actual allocations for the third (solution collection)
buffer are skipped when generate_batch_primal_dual_solution is false; update the
total_memory calculation in solve.cu so it conditionally charges the third
output buffer only when generate_batch_primal_dual_solution is true (adjust the
same logic around the corresponding block later in the function where
allocations are made; see references to total_memory, trial_batch_size,
problem.get_n_variables(), problem.get_n_constraints(), and
generate_batch_primal_dual_solution and apply the same fix for the duplicate
estimator at the other occurrence mentioned in the comment).
In `@cpp/src/pdlp/solver_settings.cu`:
- Around line 62-88: The bulk warm-start path in set_pdlp_warm_start_data is
writing initial_primal_weight and initial_step_size directly and bypasses the
numeric guards; update set_pdlp_warm_start_data to validate those values the
same way as set_initial_primal_weight and set_initial_step_size (either by
calling those two setters from set_pdlp_warm_start_data or by applying the same
cuopt_expects checks: >0, !isinf, !isnan) so the warm-start overload enforces
the same invariants before assigning initial_primal_weight_ and
initial_step_size_.
In `@cpp/src/pdlp/translate.hpp`:
- Line 12: Remove the unused CUDA device header include from translate.hpp:
delete the line including <mip_heuristics/problem/problem.cuh> because
translate.hpp only references detail::problem_t from
optimization_problem_interface.hpp and does not use any symbols from
problem.cuh; after removal, confirm translate.hpp still compiles standalone and
that no other code relies on that include being present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 982cb111-beb8-4701-9cff-5285678a13b5
📒 Files selected for processing (23)
- cpp/include/cuopt/linear_programming/constants.h
- cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/branch_and_bound/branch_and_bound.cpp
- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/branch_and_bound/pseudo_costs.hpp
- cpp/src/branch_and_bound/shared_strong_branching_context.hpp
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/CMakeLists.txt
- cpp/src/mip_heuristics/solver.cu
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp.cuh
- cpp/src/pdlp/pdlp_constants.hpp
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.hpp
- cpp/src/pdlp/translate.hpp
- cpp/src/pdlp/utilities/ping_pong_graph.cu
- cpp/tests/linear_programming/pdlp_test.cu
- python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
- cpp/src/pdlp/pdlp_constants.hpp
```cpp
if (simplex_iteration_limit < 1) {
  initialize_pseudo_costs_with_estimate(original_lp,
                                        settings,
                                        root_vstatus,
                                        root_solution,
                                        basic_list,
                                        nonbasic_list,
                                        fractional,
                                        basis_factors,
                                        pc);
} else {
```
Return after the estimate-only fallback.
When simplex_iteration_limit < 1, this branch fills pc.strong_branch_down/up from initialize_pseudo_costs_with_estimate(), but the unconditional merge later still runs with every DS status left UNSET and every PDLP value NaN. That overwrites the estimates back to NaN just before update_pseudo_costs_from_strong_branching().
Suggested fix
```diff
   if (simplex_iteration_limit < 1) {
     initialize_pseudo_costs_with_estimate(original_lp,
                                           settings,
                                           root_vstatus,
                                           root_solution,
                                           basic_list,
                                           nonbasic_list,
                                           fractional,
                                           basis_factors,
                                           pc);
+    pc.update_pseudo_costs_from_strong_branching(fractional, root_solution.x);
+    return;
   } else {
```
```cpp
if (rb_mode != 2) {
#pragma omp taskloop if (num_tasks > 1) priority(task_priority) num_tasks(num_tasks) \
  shared(score_mutex,              \
         sb_view,                  \
         dual_simplex_obj_down,    \
         dual_simplex_obj_up,      \
         dual_simplex_status_down, \
         dual_simplex_status_up,   \
         unreliable_list)
```
Fallback to Dual Simplex when PDLP-only RB gets disabled.
This gate uses rb_mode != 2, but use_pdlp can still be false in mode 2 (empty warm cache, deterministic mode, sub-MIP, too few candidates, low root hit rate). In that case the PDLP task is skipped too, so none of the unreliable candidates are actually evaluated even though the earlier logs say "using DS only".
One way to preserve the advertised fallback
```diff
-  if (rb_mode != 2) {
+  const bool run_dual_simplex = (rb_mode != 2) || !use_pdlp;
+  if (run_dual_simplex) {
 #pragma omp taskloop if (num_tasks > 1) priority(task_priority) num_tasks(num_tasks) \
   shared(score_mutex,           \
          sb_view,               \
          dual_simplex_obj_down, \
```
```cpp
omp_atomic_t<i_t> num_strong_branches_completed = 0;
omp_atomic_t<int64_t> strong_branching_lp_iter  = 0;

batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache;
```
Don't store the warm-start cache as unsynchronized shared worker state.
pseudo_costs_t already uses omp_atomic_t and omp_mutex_t for data that workers update concurrently, but pdlp_warm_cache is a plain mutable member. If two workers hit reliability/strong branching at once, they can race on the cached device buffers, scalars, and populated, cross-contaminating warm starts between nodes. Make the cache worker-local or guard it with a dedicated mutex.
As per coding guidelines "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state".
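The mutex option can be sketched with `std::mutex` standing in for the project's `omp_mutex_t`; the fields below are a simplified, hypothetical subset of `batch_pdlp_warm_cache_t`, shown only to illustrate the guarded-access pattern the comment asks for:

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for the warm-start cache state.
struct warm_cache {
  std::vector<double> initial_primal;
  bool populated{false};
};

// Every read or write of the shared cache goes through one lock, so two
// branching workers can no longer interleave partial updates.
struct guarded_warm_cache {
  warm_cache cache;
  std::mutex mtx;

  void store(std::vector<double> primal)
  {
    std::lock_guard<std::mutex> lock(mtx);
    cache.initial_primal = std::move(primal);
    cache.populated      = true;
  }

  // Returns an empty vector when no warm start is available yet.
  std::vector<double> load()
  {
    std::lock_guard<std::mutex> lock(mtx);
    return cache.populated ? cache.initial_primal : std::vector<double>{};
  }
};
```

The alternative the comment mentions — a per-worker cache instance on the stack of the branching routine — avoids the lock entirely at the cost of losing warm-start sharing between workers.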
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp` at line 541, The pdlp_warm_cache
member on pseudo_costs_t is mutable shared state and can race when multiple
workers perform reliability/strong branching; make the cache worker-local or add
synchronization: either move the batch_pdlp_warm_cache_t<i_t, f_t>
pdlp_warm_cache out of the shared pseudo_costs_t (create a stack/local instance
inside the branching worker routine) or add a dedicated omp_mutex_t (e.g.,
pdlp_warm_cache_mutex) and wrap every access/update to pdlp_warm_cache in a
lock/unlock pair (similar to existing omp_atomic_t/omp_mutex_t usage) to prevent
cross-contamination of device buffers, scalars, and populated state. Ensure all
functions that read or write pdlp_warm_cache (reliability/strong branching
paths) use the new mutex or the local per-worker instance.
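One way to realize the suggested mutex option can be sketched with a plain `std::mutex`; every type and member here is an illustrative stand-in (the real cache holds device buffers and scalars, and this is not the actual `batch_pdlp_warm_cache_t` interface):

```cpp
#include <mutex>
#include <vector>

// Stand-in for the cached warm-start state (device buffers in the real code).
struct warm_cache_t {
  std::vector<double> primal;
  bool populated = false;
};

// Sketch: pair the shared cache with a dedicated mutex and touch it only
// under a lock, so concurrent branching workers cannot cross-contaminate
// warm starts between nodes.
struct pseudo_costs_sketch_t {
  warm_cache_t pdlp_warm_cache;
  std::mutex pdlp_warm_cache_mutex;  // guards every access to the cache

  void store_warm_start(const std::vector<double>& primal) {
    std::lock_guard<std::mutex> lock(pdlp_warm_cache_mutex);
    pdlp_warm_cache.primal    = primal;
    pdlp_warm_cache.populated = true;
  }

  bool load_warm_start(std::vector<double>& out) {
    std::lock_guard<std::mutex> lock(pdlp_warm_cache_mutex);
    if (!pdlp_warm_cache.populated) return false;
    out = pdlp_warm_cache.primal;
    return true;
  }
};
```

The alternative in the comment — a worker-local cache instance created inside the branching routine — avoids the lock entirely at the cost of losing warm starts across workers.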
i_t mip_batch_pdlp_strong_branching;       // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch PDLP only
i_t mip_batch_pdlp_reliability_branching;  // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch PDLP only
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '100,112p' cpp/src/dual_simplex/simplex_solver_settings.hpp
echo
sed -n '1,40p' cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
echo
sed -n '1,40p' cpp/src/mip_heuristics/diversity/lns/rins.cu
Repository: NVIDIA/cuopt
Length of output: 3347
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get the full file to understand structure
echo "=== File structure and class definition ==="
wc -l cpp/src/dual_simplex/simplex_solver_settings.hpp
echo -e "\n=== Constructor and initializer list (lines 40-130) ==="
sed -n '40,130p' cpp/src/dual_simplex/simplex_solver_settings.hpp
echo -e "\n=== Field declarations around lines 180-210 ==="
sed -n '180,210p' cpp/src/dual_simplex/simplex_solver_settings.hpp
echo -e "\n=== Check for in-class initializers (search for = in field declarations) ==="
rg 'mip_batch_pdlp_reliability_branching|mip_batch_pdlp_strong_branching' cpp/src/dual_simplex/simplex_solver_settings.hpp -A 1 -B 1
echo -e "\n=== Check other constructors ==="
rg 'simplex_solver_settings_t\s*\(' cpp/src/dual_simplex/simplex_solver_settings.hpp
Repository: NVIDIA/cuopt
Length of output: 5723
mip_batch_pdlp_reliability_branching is not initialized on default construction.
The constructor initializer list includes mip_batch_pdlp_strong_branching(0) but omits mip_batch_pdlp_reliability_branching, and the field declaration has no in-class initializer. Default-constructed instances will have an indeterminate value, violating the guideline to "validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving." Add mip_batch_pdlp_reliability_branching(0), to the constructor initializer list.
🔧 Proposed fix
cut_change_threshold(1e-3),
cut_min_orthogonality(0.5),
mip_batch_pdlp_strong_branching(0),
+ mip_batch_pdlp_reliability_branching(0),
strong_branching_simplex_iteration_limit(-1),
random_seed(0),

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp` around lines 194 - 197, The
member mip_batch_pdlp_reliability_branching is not initialized in the class
constructor, leaving it with indeterminate value; update the constructor
initializer list (next to mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.
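The hazard here is ordinary C++ initializer-list semantics: a built-in member omitted from the list is left indeterminate. A minimal stand-in struct (only the two flag names are taken from the diff) shows the fixed form:

```cpp
// Sketch of the corrected constructor: every built-in member is listed
// explicitly, so default construction yields a defined state.
struct settings_sketch_t {
  int mip_batch_pdlp_strong_branching;
  int mip_batch_pdlp_reliability_branching;

  settings_sketch_t()
    : mip_batch_pdlp_strong_branching(0),
      mip_batch_pdlp_reliability_branching(0)  // previously missing -> indeterminate value
  {
  }
};
```

An in-class initializer (`i_t mip_batch_pdlp_reliability_branching = 0;`) would achieve the same guarantee if the surrounding code prefers that style.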
${CMAKE_CURRENT_SOURCE_DIR}/presolve/third_party_presolve.cpp
${CMAKE_CURRENT_SOURCE_DIR}/presolve/gf2_presolve.cpp
${CMAKE_CURRENT_SOURCE_DIR}/solution/solution.cu
${CMAKE_CURRENT_SOURCE_DIR}/presolve/conflict_graph/clique_table.cu
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: confirm whether clique_table LP-only inclusion pulls MIP-coupled symbols.
set -euo pipefail
echo "== Locate relevant files =="
fd -a 'CMakeLists\.txt$' | rg 'cpp/src/mip_heuristics/CMakeLists\.txt'
fd -a 'clique_table\.cu$'
fd -a 'utils\.cuh$'
fd -a 'problem\.cuh$'
echo
echo "== CMake source set selection =="
rg -n -C3 'MIP_LP_NECESSARY_FILES|MIP_NON_LP_FILES|BUILD_LP_ONLY|clique_table\.cu' cpp/src/mip_heuristics/CMakeLists.txt
echo
echo "== Include chain from clique_table.cu =="
rg -n -C2 '^#include' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu
echo
echo "== MIP/problem coupling markers inside utils.cuh =="
rg -n -C2 'problem_t|mixed_integer|branch|cut|mip' cpp/src/mip_heuristics/utils.cuh
Repository: NVIDIA/cuopt
Length of output: 4157
🏁 Script executed:
# Check where problem_t is defined and understand its scope
rg -n 'template.*problem_t|struct problem_t|class problem_t' cpp/src/mip_heuristics/problem/problem.cuh | head -20
Repository: NVIDIA/cuopt
Length of output: 76
🏁 Script executed:
# Check what problem_t is actually used for in utils.cuh to understand coupling severity
rg -n 'problem_t' cpp/src/mip_heuristics/utils.cuh | head -30
Repository: NVIDIA/cuopt
Length of output: 441
🏁 Script executed:
# Verify whether clique_table.cu functionality is actually required for LP-only builds
# by checking if there are any references to it in LP-only code paths
rg -r 'clique_table' cpp/src --type-list | head -20
Repository: NVIDIA/cuopt
Length of output: 445
🏁 Script executed:
# Get the full problem_t class definition to understand its scope
sed -n '56,150p' cpp/src/mip_heuristics/problem/problem.cuh
Repository: NVIDIA/cuopt
Length of output: 5038
🏁 Script executed:
# Find all files that reference clique_table (functions or includes)
rg 'clique_table' cpp/src --type cpp --type cu
Repository: NVIDIA/cuopt
Length of output: 82
🏁 Script executed:
# Check if problem.cu (which is in MIP_LP_NECESSARY_FILES) actually uses problem_t in LP-only context
head -50 cpp/src/mip_heuristics/problem/problem.cu
Repository: NVIDIA/cuopt
Length of output: 1548
🏁 Script executed:
# Search for references to clique_table functions/files more carefully
rg 'clique_table' cpp/src --type cpp
Repository: NVIDIA/cuopt
Length of output: 3657
🏁 Script executed:
# Check problem.cuh for clique_table member and where it's initialized
rg -n 'clique_table' cpp/src/mip_heuristics/problem/problem.cuh
Repository: NVIDIA/cuopt
Length of output: 143
🏁 Script executed:
# Search for clique_table_t definition
rg -n 'class clique_table_t|struct clique_table_t|template.*clique_table_t' cpp/src
Repository: NVIDIA/cuopt
Length of output: 449
🏁 Script executed:
# Search for where clique_table member is initialized/used in problem code
rg -n 'clique_table\s*=' cpp/src/mip_heuristics/problem/ -A2 -B2
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Check if clique_table building/initialization is in problem.cu or elsewhere
rg -n 'clique_table_t<' cpp/src/mip_heuristics/problem/problem.cu
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Verify: does clique_table.cu define functions that are called from LP-safe code?
rg -n '^(namespace|template|void|bool|auto|int|std::)' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu | head -30
Repository: NVIDIA/cuopt
Length of output: 1651
🏁 Script executed:
# Check if clique_table building is triggered from problem.cu initialization
rg -n 'make_clique|build.*clique|clique.*build' cpp/src/mip_heuristics/
Repository: NVIDIA/cuopt
Length of output: 492
🏁 Script executed:
# Find all callers of build_clique_table
rg -n 'build_clique_table' cpp/src --type cpp -B2 -A2
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Check if build_clique_table is called from problem.cu or problem-related code
rg -n 'build_clique_table' cpp/src/mip_heuristics/problem/
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Verify the full include chain: does clique_table.cu actually get compiled in LP-only?
# Check if there's any conditional compilation guards
rg -n 'BUILD_LP_ONLY|#if|#ifdef' cpp/src/mip_heuristics/presolve/conflict_graph/clique_table.cu | head -20
Repository: NVIDIA/cuopt
Length of output: 506
🏁 Script executed:
# Find all references to clique_table in problem.cu
rg -n 'clique_table' cpp/src/mip_heuristics/problem/problem.cu
Repository: NVIDIA/cuopt
Length of output: 221
🏁 Script executed:
# Check problem.cu to see if it initializes or uses the clique_table member
sed -n '1,100p' cpp/src/mip_heuristics/problem/problem.cu | rg -n 'clique'
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Get summary: who actually calls build_clique_table?
rg -r 'build_clique_table\(' cpp/src --type cpp
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Final confirmation: verify that clique_table functions are truly never called in LP-only context
# by checking all instantiations and calls
rg -n 'clique_table_t<' cpp/src/cuts/cuts.cpp cpp/src/branch_and_bound/branch_and_bound.cpp | head -5
Repository: NVIDIA/cuopt
Length of output: 531
🏁 Script executed:
# Double-check: are there any LP-only modules that might indirectly call clique_table?
# Check what problem.cu actually does with clique_table
rg -n -A5 'clique_table' cpp/src/mip_heuristics/problem/problem.cu | head -20
Repository: NVIDIA/cuopt
Length of output: 912
Move clique_table.cu from MIP_LP_NECESSARY_FILES to MIP_NON_LP_FILES
clique_table.cu is MIP-exclusive functionality (only used by branch-and-bound and cutting plane generation), yet it includes mip_heuristics/utils.cuh, which brings MIP-coupled types and headers into the LP-only build path unnecessarily. The file should be in MIP_NON_LP_FILES instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/CMakeLists.txt` at line 17, Remove clique_table.cu
from the MIP_LP_NECESSARY_FILES list and add it to the MIP_NON_LP_FILES list in
the CMakeLists.txt so that clique_table.cu (which includes
mip_heuristics/utils.cuh) is built only as MIP-non-LP code; specifically, edit
the lines that construct the MIP_LP_NECESSARY_FILES and MIP_NON_LP_FILES
variables to stop listing clique_table.cu in the former and include it in the
latter, and verify any targets or install/grouping logic that consumes those
variables still references the file from the new list.
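The suggested move can be sketched directly in CMake. Variable names follow the review; the file lists are abbreviated and the `BUILD_LP_ONLY` wiring is assumed for illustration, not copied from the repository:

```cmake
# Sketch only: keep clique_table.cu out of the LP-only build by listing it
# with the MIP-exclusive sources instead of the LP-necessary ones.
set(MIP_LP_NECESSARY_FILES
    ${CMAKE_CURRENT_SOURCE_DIR}/presolve/third_party_presolve.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/solution/solution.cu)

set(MIP_NON_LP_FILES
    ${CMAKE_CURRENT_SOURCE_DIR}/presolve/conflict_graph/clique_table.cu)

# Hypothetical gate: MIP-only sources are compiled only in full builds.
if(NOT BUILD_LP_ONLY)
  list(APPEND MIP_LP_NECESSARY_FILES ${MIP_NON_LP_FILES})
endif()
```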
dual_simplex::shared_strong_branching_context_view_t<i_t, f_t> sb_view_{
  settings_.shared_sb_solved};
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== shared_sb_solved declaration ==="
rg -n -C2 --type=cpp 'shared_sb_solved' cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
echo "=== shared_strong_branching_context_view_t span type ==="
rg -n -C3 --type=cpp 'shared_strong_branching_context_view_t|std::span<|cuda::std::span<' cpp/src/branch_and_bound/shared_strong_branching_context.hpp
echo "=== sb_view_ initialization site ==="
rg -n -C2 --type=cpp 'sb_view_|shared_sb_solved' cpp/src/pdlp/pdlp.cuh
Repository: NVIDIA/cuopt
Length of output: 1441
Use explicit span conversion to ensure cuda::std::span → std::span compatibility.
The constructor at lines 142–143 receives settings_.shared_sb_solved (type cuda::std::span<std::atomic<int>>) but expects std::span<std::atomic<int>>. These are distinct types from different namespaces. Avoid relying on implicit conversion; rewrite as:
Suggested fix
dual_simplex::shared_strong_branching_context_view_t<i_t, f_t> sb_view_{
  std::span<std::atomic<int>>(settings_.shared_sb_solved.data(), settings_.shared_sb_solved.size())};

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/pdlp.cuh` around lines 142 - 143, The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.
  total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
  total_memory += problem.get_n_constraints() * trial_batch_size * sizeof(f_t);
  total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);

- // Add a 50% overhead to make sure we have enough memory considering other parts of the solver may
- // allocate at the same time
- total_memory *= 1.5;
+ // Add a 70% overhead to make sure we have enough memory considering other parts of the solver may
+ // need memory later while the batch PDLP is running
+ total_memory *= 1.7;
Keep the batch memory estimator consistent with optional solution collection.
The estimator always charges three full output buffers, but the actual allocations later in this function are skipped when generate_batch_primal_dual_solution is false. On large strong-branching batches that can force unnecessarily tiny sub-batches or even return NumericalError for a batch that would fit without solution collection.
Also applies to: 1004-1012
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/solve.cu` around lines 897 - 903, The batch memory estimator
always accounts for three full output buffers but the actual allocations for the
third (solution collection) buffer are skipped when
generate_batch_primal_dual_solution is false; update the total_memory
calculation in solve.cu so it conditionally charges the third output buffer only
when generate_batch_primal_dual_solution is true (adjust the same logic around
the corresponding block later in the function where allocations are made; see
references to total_memory, trial_batch_size, problem.get_n_variables(),
problem.get_n_constraints(), and generate_batch_primal_dual_solution and apply
the same fix for the duplicate estimator at the other occurrence mentioned in
the comment).
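The shape of the fix can be sketched independently of the solver. Function and parameter names below are illustrative; only the 1.7 headroom factor comes from the diff, and the flag mirrors `generate_batch_primal_dual_solution`:

```cpp
#include <cstddef>

// Sketch: charge the solution-collection buffer only when it will actually
// be allocated, so the estimator matches the later allocations.
std::size_t estimate_batch_memory(std::size_t n_vars,
                                  std::size_t n_cons,
                                  std::size_t batch,
                                  bool collect_solutions) {
  std::size_t total = 0;
  total += n_vars * batch * sizeof(double);  // primal iterate buffer
  total += n_cons * batch * sizeof(double);  // dual iterate buffer
  if (collect_solutions) {
    total += n_vars * batch * sizeof(double);  // optional solution-collection buffer
  }
  // Headroom for memory other parts of the solver may need concurrently.
  return static_cast<std::size_t>(total * 1.7);
}
```

With collection disabled the estimate drops by a full `n_vars * batch` buffer, which is what lets larger sub-batches fit.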
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size)
{
  cuopt_expects(initial_step_size > f_t(0),
                error_type_t::ValidationError,
                "Initial step size must be greater than 0");
  cuopt_expects(!std::isinf(initial_step_size),
                error_type_t::ValidationError,
                "Initial step size must be finite");
  cuopt_expects(!std::isnan(initial_step_size),
                error_type_t::ValidationError,
                "Initial step size must be a number");
  initial_step_size_ = std::make_optional(initial_step_size);
}

template <typename i_t, typename f_t>
void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight)
{
  cuopt_expects(initial_primal_weight > f_t(0),
                error_type_t::ValidationError,
                "Initial primal weight must be greater than 0");
  cuopt_expects(!std::isinf(initial_primal_weight),
                error_type_t::ValidationError,
                "Initial primal weight must be finite");
  cuopt_expects(!std::isnan(initial_primal_weight),
                error_type_t::ValidationError,
                "Initial primal weight must be a number");
  initial_primal_weight_ = std::make_optional(initial_primal_weight);
The bulk warm-start path still bypasses these new numeric guards.
set_pdlp_warm_start_data(...) later in this file writes the warm-start initial_primal_weight and initial_step_size directly at Lines 316-317, so callers using that overload can still inject <=0, inf, or nan values. Please reuse the same validation there so both entry points enforce the same solver-state invariant. As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".
🔧 Suggested direction
cuopt_expects(last_restart_duality_gap_dual_solution != nullptr,
error_type_t::ValidationError,
"last_restart_duality_gap_dual_solution cannot be null");
+ cuopt_expects(initial_primal_weight > f_t(0) &&
+ !std::isinf(initial_primal_weight) &&
+ !std::isnan(initial_primal_weight),
+ error_type_t::ValidationError,
+ "Initial primal weight must be finite and greater than 0");
+ cuopt_expects(initial_step_size > f_t(0) &&
+ !std::isinf(initial_step_size) &&
+ !std::isnan(initial_step_size),
+ error_type_t::ValidationError,
+                 "Initial step size must be finite and greater than 0");

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/solver_settings.cu` around lines 62 - 88, The bulk warm-start
path in set_pdlp_warm_start_data is writing initial_primal_weight and
initial_step_size directly and bypasses the numeric guards; update
set_pdlp_warm_start_data to validate those values the same way as
set_initial_primal_weight and set_initial_step_size (either by calling those two
setters from set_pdlp_warm_start_data or by applying the same cuopt_expects
checks: >0, !isinf, !isnan) so the warm-start overload enforces the same
invariants before assigning initial_primal_weight_ and initial_step_size_.
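The cleanest way to keep both entry points in sync is to route the bulk path through one shared guard. A minimal sketch, with a plain exception standing in for `cuopt_expects` and all names illustrative:

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Shared numeric guard: stand-in for the three cuopt_expects checks.
void check_positive_finite(double v, const char* what) {
  if (!(v > 0.0) || std::isinf(v) || std::isnan(v)) {
    throw std::invalid_argument(std::string(what) +
                                " must be finite and greater than 0");
  }
}

struct pdlp_settings_sketch_t {
  double initial_step_size     = 0.0;
  double initial_primal_weight = 0.0;

  void set_initial_step_size(double v) {
    check_positive_finite(v, "Initial step size");
    initial_step_size = v;
  }
  void set_initial_primal_weight(double v) {
    check_positive_finite(v, "Initial primal weight");
    initial_primal_weight = v;
  }
  // The warm-start overload routes through the same setters instead of
  // assigning members directly, so both paths enforce the same invariant.
  void set_warm_start(double step, double weight) {
    set_initial_step_size(step);
    set_initial_primal_weight(weight);
  }
};
```

Note that `!(v > 0.0)` also rejects NaN (every comparison with NaN is false), so the explicit `isnan` check is belt-and-braces, matching the original setters.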
#include <cuopt/linear_programming/optimization_problem_interface.hpp>

#include <mip_heuristics/problem/problem.cuh>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if problem.cuh types are used in translate.hpp and find potential compilation issues
echo "=== 1. Search for usage of mip_heuristics namespace/types in translate.hpp ==="
rg -n "mip_heuristics::" cpp/src/pdlp/translate.hpp || echo "No direct mip_heuristics namespace usage found"
echo -e "\n=== 2. Find all files that include translate.hpp ==="
rg -n '#include.*translate\.hpp' --type=cpp --type=cu -C2
echo -e "\n=== 3. Check if translate.hpp is included in non-CUDA files (.cpp) ==="
rg -l '#include.*translate\.hpp' --type=cpp -g '!*.cu' || echo "No .cpp files include translate.hpp"
echo -e "\n=== 4. Search for problem_t usage to verify source ==="
rg -n "detail::problem_t" cpp/src/pdlp/translate.hpp
echo -e "\n=== 5. Check what problem.cuh exports that might be needed ==="
rg -n "^(class|struct|enum|namespace|template)" cpp/src/mip_heuristics/problem/problem.cuh | head -20
Repository: NVIDIA/cuopt
Length of output: 994
Remove unused CUDA device header include.
Line 12 adds #include <mip_heuristics/problem/problem.cuh>, but verification shows no symbols from this include are used in translate.hpp. Functions use detail::problem_t from optimization_problem_interface.hpp (line 10), not from problem.cuh. Additionally, no files currently include translate.hpp, so there is no compilation impact at present.
Remove this unused include.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/translate.hpp` at line 12, Remove the unused CUDA device header
include from translate.hpp: delete the line including
<mip_heuristics/problem/problem.cuh> because translate.hpp only references
detail::problem_t from optimization_problem_interface.hpp and does not use any
symbols from problem.cuh; after removal, confirm translate.hpp still compiles
standalone and that no other code relies on that include being present.
/ok to test bb50ad1
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 780-798: Add a single upfront cuopt_assert before using
climber_strategies_[i].original_index to ensure the shared strong-branching span
matches the original batch width: when sb_view_.is_valid() is true, assert that
sb_view_ (or shared_sb_solved/span) size/width equals the original batch width
used to generate climber_strategies_ (so original_index is in-bounds for
shared_strong_branching_context_view_t). Place this check immediately inside the
if (sb_view_.is_valid()) block guarding the loop that accesses
climber_strategies_ (and add the same guard at the other occurrences noted
around the blocks using original_index at the locations corresponding to lines
~849-850, ~866-869, and ~928-929).
- Around line 789-790: When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 17941937-9cfb-4973-a40a-f3ce01b5a538
📒 Files selected for processing (1)
cpp/src/pdlp/pdlp.cu
if (sb_view_.is_valid()) {
  for (size_t i = 0; i < climber_strategies_.size(); ++i) {
    // If PDLP has solved it to optimality we want to keep it and resolved both solvers having
    // solved the problem later
    if (current_termination_strategy_.is_done(
          current_termination_strategy_.get_termination_status(i)))
      continue;
    const i_t local_idx = climber_strategies_[i].original_index;
    if (sb_view_.is_solved(local_idx)) {
      current_termination_strategy_.set_termination_status(
        i, pdlp_termination_status_t::ConcurrentLimit);
#ifdef BATCH_VERBOSE_MODE
      std::cout << "[COOP SB] DS already solved climber " << i << " (original_index " << local_idx
                << "), synced to ConcurrentLimit at step " << internal_solver_iterations_
                << std::endl;
#endif
    }
  }
}
Validate shared_sb_solved against the original batch size before using original_index.
sb_view_.is_valid() only proves the span is non-empty. These loads/stores use climber_strategies_[i].original_index, which stays in the original batch numbering after compaction. If the caller passes a shorter span, the assert inside shared_strong_branching_context_view_t disappears in release builds and this becomes an out-of-bounds atomic access. Please add a single upfront cuopt_assert that the shared span matches the original batch width.
Suggested guard
+if (sb_view_.is_valid()) {
+ cuopt_assert(sb_view_.solved.size() == settings_.new_bounds.size(),
+ "shared_sb_solved must match the original batch size");
+}
+
if (sb_view_.is_valid()) {
  for (size_t i = 0; i < climber_strategies_.size(); ++i) {

As per coding guidelines, "Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations".
Also applies to: 849-850, 866-869, 928-929
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/pdlp.cu` around lines 780 - 798, Add a single upfront
cuopt_assert before using climber_strategies_[i].original_index to ensure the
shared strong-branching span matches the original batch width: when
sb_view_.is_valid() is true, assert that sb_view_ (or shared_sb_solved/span)
size/width equals the original batch width used to generate climber_strategies_
(so original_index is in-bounds for shared_strong_branching_context_view_t).
Place this check immediately inside the if (sb_view_.is_valid()) block guarding
the loop that accesses climber_strategies_ (and add the same guard at the other
occurrences noted around the blocks using original_index at the locations
corresponding to lines ~849-850, ~866-869, and ~928-929).
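The invariant behind the suggested assert can be phrased as a standalone predicate. Everything below is an illustrative stand-in — plain `int` flags instead of `std::atomic<int>`, a vector instead of the shared span, and a boolean return where the solver would use `cuopt_assert`:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a compacted climber strategy: original_index keeps the
// original (pre-compaction) batch numbering.
struct climber_strategy_sketch_t {
  std::size_t original_index;
};

// Sketch of the upfront guard: the shared "solved" span must have the
// original batch width, otherwise original_index can run out of bounds.
bool sb_span_matches_batch(const std::vector<int>& shared_solved,
                           const std::vector<climber_strategy_sketch_t>& strategies,
                           std::size_t original_batch_size) {
  if (shared_solved.size() != original_batch_size) return false;
  for (const auto& s : strategies) {
    if (s.original_index >= shared_solved.size()) return false;
  }
  return true;
}
```

The point of checking once up front is that the per-access assert inside the view compiles away in release builds, while a single batch-width check keeps every later indexed atomic access in bounds.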
current_termination_strategy_.set_termination_status(
  i, pdlp_termination_status_t::ConcurrentLimit);
Don't serialize PDLP iterates as the result for ConcurrentLimit climbers.
Once a climber is flipped to pdlp_termination_status_t::ConcurrentLimit, this method still fills its primal/dual/reduced-cost buffers and step metadata from pdhg_solver_ / PDLP state. That means the status says “externally finished”, but the payload is just PDLP’s last iterate, and solved_by is only written on the PDLP branch. Please either plumb the external solver result through here or skip populating per-climber solution payloads for ConcurrentLimit.
As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results".
Also applies to: 818-849, 872-878, 897-928
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/pdlp.cu` around lines 789 - 790, When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
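A reduced sketch of the suggested guard, with booleans standing in for the real primal/dual buffers; the enum value name matches the review, while the result struct and `finalize_climber` helper are illustrative:

```cpp
enum class pdlp_termination_status_t { Optimal, ConcurrentLimit, IterationLimit };

struct climber_result_sketch_t {
  pdlp_termination_status_t status = pdlp_termination_status_t::IterationLimit;
  bool payload_filled = false;  // stands in for primal/dual/reduced-cost buffers
  int solved_by       = -1;     // 0 = PDLP, 1 = external solver (illustrative)
};

// Sketch: for climbers an external solver already finished, credit that
// solver and leave the payload alone instead of serializing PDLP's last
// iterate as if it were the final answer.
void finalize_climber(climber_result_sketch_t& r) {
  if (r.status == pdlp_termination_status_t::ConcurrentLimit) {
    r.solved_by = 1;
    return;
  }
  r.payload_filled = true;  // copy PDLP iterates only for PDLP-owned results
  r.solved_by      = 0;
}
```

The alternative mentioned in the comment — copying the external solver's result into the payload — keeps the same invariant: status, payload, and `solved_by` always describe the same solver.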
This PR enables the following regarding batch PDLP:
Currently we still keep BPDLP off by default, both at the root and in reliability branching.