Add exception handling for pdlp in concurrent mode#966

Open
Iroy30 wants to merge 9 commits into NVIDIA:main from Iroy30:add_exception_handling_pdlp

Conversation

@Iroy30
Member

@Iroy30 Iroy30 commented Mar 17, 2026

Description

Fixes failures that occur when PDLP throws an exception in concurrent mode. Why PDLP throws in the first place still needs to be investigated.

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@Iroy30 Iroy30 requested a review from a team as a code owner March 17, 2026 20:41
@Iroy30 Iroy30 requested review from chris-maes and rg20 March 17, 2026 20:41
@copy-pr-bot
copy-pr-bot bot commented Mar 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

The change adds exception handling scaffolding to the concurrent PDLP runner. It wraps the main execution in try/catch, captures any exception in a std::exception_ptr variable, and signals concurrent_halt on failure, while preserving existing behavior when no exception occurs.
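
To make the walkthrough concrete, the pattern it describes can be sketched in isolation. This is a minimal illustration, not the code in solve.cu: `branch_result`, `run_branch_guarded`, and the `std::atomic<int>` halt flag are hypothetical stand-ins (the PR itself uses `pdlp_exception` and `settings_pdlp.concurrent_halt`), and running the solver on a `std::thread` is an assumption made only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for the outcome of one solver branch.
struct branch_result {
  bool solved = false;
  std::exception_ptr error;  // non-null if the branch threw
};

// Runs `solver` on its own thread. If it throws, the exception is captured
// instead of escaping the thread (which would call std::terminate), and the
// shared halt flag is raised so that sibling solvers can stop early.
inline branch_result run_branch_guarded(const std::function<void()>& solver,
                                        std::atomic<int>& concurrent_halt)
{
  assert(static_cast<bool>(solver));  // an empty std::function cannot be run
  branch_result result;
  std::thread worker([&] {
    try {
      solver();
      result.solved = true;
    } catch (...) {
      result.error = std::current_exception();  // keep it for the caller
      concurrent_halt.store(1);                 // tell the other branch to stop
    }
  });
  worker.join();  // join before reading `result` on the calling thread
  return result;
}
```

Because the exception is stored rather than rethrown, the caller can decide later whether to log it, map it to a termination status, or rethrow it on the calling thread.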

Changes

Cohort / File(s) | Summary
Exception Handling for Concurrent PDLP: cpp/src/pdlp/solve.cu | Introduces try/catch block around main PDLP execution to capture exceptions, store them in pdlp_exception, and set settings_pdlp.concurrent_halt = 1 on failure. Includes TODO note about rethrowing exceptions in concurrent mode and commented-out rethrow logic. Preserves existing behavior when no exception occurs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check | ✅ Passed | The pull request description clearly explains the problem being addressed: fixing failures when PDLP throws exceptions in concurrent mode, with an acknowledgment that the root cause still needs investigation.
Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding exception handling for PDLP in concurrent mode, which aligns with the changeset modifications to solve.cu.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/pdlp/solve.cu (1)

1190-1192: TODO lacks issue tracking; commented-out code should be cleaned up.

The TODO indicates a known intermittent issue but provides no way to track resolution. Additionally, commented-out code with printf statements should not remain in production code.

Recommendations:

  1. Create a tracked issue and reference it in the comment (e.g., // TODO(#123): ...)
  2. Either remove the commented-out rethrow entirely, or conditionally enable it behind a debug flag if needed for investigation
♻️ Suggested cleanup
-  // TODO: Active Issue: PDLP throws an Exception interminttently.
-  // if (pdlp_exception) { printf("Rethrowing PDLP exception from concurrent mode\n");
-  // std::rethrow_exception(pdlp_exception); }
+  // TODO(`#ISSUE_NUMBER`): PDLP throws exceptions intermittently in concurrent mode.
+  // Once root cause is fixed, rethrow: std::rethrow_exception(pdlp_exception);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 1190 - 1192, Remove the dead
commented-out rethrow and replace the vague TODO with a tracked issue reference
and optional debug guard: create a repository issue for the intermittent PDLP
exception and update the comment to include the issue ID (e.g., "//
TODO(#<issue>): Intermittent PDLP exception observed when running concurrent
mode"), and either delete the commented block that references
pdlp_exception/printf or move that logic behind a compile-time/debug flag (e.g.,
PDLP_DEBUG_RETHROW) so the rethrow of pdlp_exception can be enabled only for
debugging; locate the commented code around pdlp_exception in solve.cu and apply
the change accordingly.
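
The debug-guard option in the recommendation above can be sketched as follows. This is only an illustration under stated assumptions: `maybe_rethrow` is a hypothetical helper that does not exist in the repository, and `PDLP_DEBUG_RETHROW` is the example flag name floated in the review, not an existing build option.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Rethrows a captured exception only when compiled with -DPDLP_DEBUG_RETHROW.
// In a default build the exception is suppressed and the function returns
// true to report that something was captured; it returns false when there
// was no exception at all.
inline bool maybe_rethrow(const std::exception_ptr& pdlp_exception)
{
  if (!pdlp_exception) { return false; }
#ifdef PDLP_DEBUG_RETHROW
  std::rethrow_exception(pdlp_exception);  // debug builds: surface the failure
#else
  return true;  // default builds: swallow it, but let the caller know
#endif
}
```

Keeping the rethrow behind a compile-time flag leaves the default build behavior unchanged while giving anyone investigating the intermittent failure a way to see the original exception.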
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 1175-1183: The catch block around run_pdlp currently stores the
exception in pdlp_exception but discards it, losing diagnostics; update the
catch in solve.cu to extract and surface the exception message (from
pdlp_exception or std::current_exception()) — e.g., convert to a string via
std::rethrow_exception / try-catch and log it with the existing logging
facilities and/or set a more informative termination status on sol_pdlp (instead
of only pdlp_termination_status_t::NumericalError) before setting
*settings_pdlp.concurrent_halt = 1; ensure run_pdlp, pdlp_exception, sol_pdlp,
and settings_pdlp.concurrent_halt are referenced so callers receive meaningful
error info.
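
The inline comment's suggestion to surface the exception message can be sketched with standard-library calls only. `describe_exception` is a hypothetical helper name; how its result would be routed to the project's logging facilities or to the termination status on sol_pdlp is left open here.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Turns a captured std::exception_ptr into a printable message by
// rethrowing it inside a local try/catch.
inline std::string describe_exception(const std::exception_ptr& eptr)
{
  if (!eptr) { return "no exception"; }
  try {
    std::rethrow_exception(eptr);
  } catch (const std::exception& e) {
    return e.what();  // message of any std::exception-derived type
  } catch (...) {
    return "unknown non-std exception";  // e.g. a thrown int or custom type
  }
}
```

A catch block that already holds the exception in pdlp_exception could pass it to a helper like this and log the returned string before setting the halt flag.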

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36088b3c-2460-43f0-af07-139729eea9b0

📥 Commits

Reviewing files that changed from the base of the PR and between 0daf1e1 and 3fcf5fb.

📒 Files selected for processing (1)
  • cpp/src/pdlp/solve.cu

@Iroy30 Iroy30 added the bug (Something isn't working), non-breaking (Introduces a non-breaking change), and pdlp labels Mar 17, 2026
@anandhkb anandhkb added this to the 26.04 milestone Mar 17, 2026
@Iroy30 Iroy30 changed the title from "Add exception handling pdlp" to "Add exception handling for pdlp in concurrent mode" Mar 17, 2026
Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change), pdlp

2 participants