Open
Labels
awaiting response: This expects a response from maintainer or contributor depending on who requested in last comment.
bug: Something isn't working
Description
Describe the bug
When run_concurrent is called, run_pdlp intermittently throws an exception, which causes the barrier thread to terminate before it is joined. PR #966 handles the exception to avoid a crash, but the root cause of the exception still needs to be investigated. A snippet of the caught exception is below:
===================================================================================================== FAILURES ======================================================================================================
___________________________________________________________________________________ test_incumbent_get_callback[/mip/swath1.mps] ____________________________________________________________________________________
file_name = '/mip/swath1.mps'
@pytest.mark.parametrize(
"file_name",
[
("/mip/swath1.mps"),
("/mip/neos5-free-bound.mps"),
],
)
def test_incumbent_get_callback(file_name):
> _run_incumbent_solver_callback(file_name, include_set_callback=False)
python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py:112:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py:87: in _run_incumbent_solver_callback
solution = solver.Solve(data_model_obj, settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/raid/iroy/miniforge3/envs/py313/lib/python3.13/site-packages/cuopt/utilities/exception_handler.py:48: in func
raise e
/raid/iroy/miniforge3/envs/py313/lib/python3.13/site-packages/cuopt/utilities/exception_handler.py:24: in func
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
/raid/iroy/miniforge3/envs/py313/lib/python3.13/site-packages/cuopt/linear_programming/solver/solver.py:98: in Solve
s = solver_wrapper.Solve(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E RuntimeError: CUDA error encountered at: file=/home/nfs/iroy/cuopt-1/cpp/src/pdlp/utilities/ping_pong_graph.cu line=56: call='cudaStreamEndCapture(stream_view_.value(), &even_graph)', Reason=cudaErrorStreamCaptureInvalidated:operation failed due to a previous error during capture
E Obtained 49 stack frames
E #1 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so(+0x2b66b1) [0x7f31a28fc6b1]
E #2 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::ping_pong_graph_t<int>::end_capture(int) +0xa9e [0x7f31a2aecaee]
E #3 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::pdhg_solver_t<int, double>::compute_next_primal_dual_solution_reflected(rmm::device_uvector<double>&, rmm::device_uvector<double>&, bool) +0x4cc [0x7f31a29e045c]
E #4 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::pdhg_solver_t<int, double>::take_step(rmm::device_uvector<double>&, rmm::device_uvector<double>&, int, bool, int, bool) +0x8b [0x7f31a29e370b]
E #5 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::pdlp_solver_t<int, double>::run_solver(cuopt::timer_t const&) +0xbdc [0x7f31a29ca19c]
E #6 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so(+0x3265a2) [0x7f31a296c5a2]
E #7 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::optimization_problem_solution_t<int, double> cuopt::linear_programming::run_pdlp<int, double>(cuopt::linear_programming::detail::problem_t<int, double>&, cuopt::linear_programming::pdlp_solver_settings_t<int, double> const&, cuopt::timer_t const&, bool) +0xcd [0x7f31a297093d]
E #8 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::optimization_problem_solution_t<int, double> cuopt::linear_programming::run_concurrent<int, double>(cuopt::linear_programming::detail::problem_t<int, double>&, cuopt::linear_programming::pdlp_solver_settings_t<int, double> const&, cuopt::timer_t const&, bool) +0x323 [0x7f31a2972203]
E #9 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::optimization_problem_solution_t<int, double> cuopt::linear_programming::solve_lp_with_method<int, double>(cuopt::linear_programming::detail::problem_t<int, double>&, cuopt::linear_programming::pdlp_solver_settings_t<int, double> const&, cuopt::timer_t const&, bool) +0x35 [0x7f31a2973285]
E #10 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::diversity_manager_t<int, double>::run_solver() +0x1184 [0x7f31a2d83844]
E #11 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::detail::mip_solver_t<int, double>::run_solver() +0x1f8b [0x7f31a2d7272b]
E #12 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::mip_solution_t<int, double> cuopt::linear_programming::run_mip<int, double>(cuopt::linear_programming::detail::problem_t<int, double>&, cuopt::linear_programming::mip_solver_settings_t<int, double> const&, cuopt::timer_t&) +0x1186 [0x7f31a2d64d86]
E #13 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::linear_programming::mip_solution_t<int, double> cuopt::linear_programming::solve_mip<int, double>(cuopt::linear_programming::optimization_problem_t<int, double>&, cuopt::linear_programming::mip_solver_settings_t<int, double> const&) +0xcec [0x7f31a2d6645c]
E #14 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: std::unique_ptr<cuopt::linear_programming::mip_solution_interface_t<int, double>, std::default_delete<cuopt::linear_programming::mip_solution_interface_t<int, double> > > cuopt::linear_programming::solve_mip<int, double>(cuopt::linear_programming::optimization_problem_interface_t<int, double>*, cuopt::linear_programming::mip_solver_settings_t<int, double> const&) +0x176 [0x7f31a2d6a8d6]
E #15 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::cython::call_solve_mip(cuopt::linear_programming::optimization_problem_interface_t<int, double>*, cuopt::linear_programming::mip_solver_settings_t<int, double>&) +0x61 [0x7f31a2aedb51]
E #16 in /raid/iroy/miniforge3/envs/py313/lib/libcuopt.so: cuopt::cython::call_solve(cuopt::mps_parser::data_model_view_t<int, double>*, cuopt::linear_programming::solver_settings_t<int, double>*, unsigned int, bool) +0x7e3 [0x7f31a2aee843]
E #17 in /raid/iroy/miniforge3/envs/py313/lib/python3.13/site-packages/cuopt/linear_programming/solver/solver_wrapper.cpython-313-x86_64-linux-gnu.so(+0x529ff) [0x7f31a88de9ff]
E #18 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: PyObject_Vectorcall +0x2e [0x557521595e6e]
E #19 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x9245 [0x5575215ad375]
E #20 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x27b2e7) [0x5575216672e7]
E #21 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2cac98) [0x5575216b6c98]
E #22 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyObject_MakeTpCall +0x27c [0x557521593c5c]
E #23 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x9245 [0x5575215ad375]
E #24 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x27b2e7) [0x5575216672e7]
E #25 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2cac98) [0x5575216b6c98]
E #26 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x28a699) [0x557521676699]
E #27 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x3df7 [0x5575215a7f27]
E #28 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x27b2e7) [0x5575216672e7]
E #29 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2cac98) [0x5575216b6c98]
E #30 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyObject_MakeTpCall +0x27c [0x557521593c5c]
E #31 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x9245 [0x5575215ad375]
E #32 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x27b2e7) [0x5575216672e7]
E #33 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2cac98) [0x5575216b6c98]
E #34 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyObject_MakeTpCall +0x27c [0x557521593c5c]
E #35 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x9245 [0x5575215ad375]
E #36 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x27b2e7) [0x5575216672e7]
E #37 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2cac98) [0x5575216b6c98]
E #38 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyObject_MakeTpCall +0x27c [0x557521593c5c]
E #39 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: _PyEval_EvalFrameDefault +0x9245 [0x5575215ad375]
E #40 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: PyEval_EvalCode +0x9f [0x55752166903f]
E #41 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2bc5a3) [0x5575216a85a3]
E #42 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2b96ac) [0x5575216a56ac]
E #43 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2b64b6) [0x5575216a24b6]
E #44 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2b6173) [0x5575216a2173]
E #45 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x2b5f2c) [0x5575216a1f2c]
E #46 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: Py_RunMain +0x3b4 [0x5575216a08e4]
E #47 in /raid/iroy/miniforge3/envs/py313/bin/python3.13: Py_BytesMain +0x37 [0x557521654947]
E #48 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0xf3 [0x7f31b6eb8083]
E #49 in /raid/iroy/miniforge3/envs/py313/bin/python3.13(+0x267cdd) [0x557521653cdd]
cuopt/linear_programming/solver/solver_wrapper.pyx:519: RuntimeError
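To illustrate the failure mode described above (an exception on one solver path ending the run before the other thread is joined), here is a minimal Python sketch of one generic way to contain such an exception. It is not cuOpt code and does not describe the actual change in PR #966; `run_concurrent_sketch`, `primary`, and `secondary` are hypothetical names standing in for the concurrent driver, the PDLP path, and the barrier path.

```python
import threading


def run_concurrent_sketch(primary, secondary):
    """Run `secondary` on a worker thread and `primary` on the calling thread.

    Hypothetical stand-in for a concurrent solve: an exception raised by
    `primary` is recorded instead of propagating, and the worker thread is
    always joined before returning.
    """
    outcome = {"primary": None, "primary_error": None, "secondary": None}

    def worker():
        # Stand-in for the second solver running concurrently.
        outcome["secondary"] = secondary()

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        outcome["primary"] = primary()
    except RuntimeError as exc:
        # Record the failure so the caller can decide what to do with it.
        outcome["primary_error"] = exc
    finally:
        # Reached on both the success and the failure path.
        thread.join()
    return outcome
```

With this shape, a failure on the primary path still leaves the secondary result available to the caller. It only contains the symptom; the `cudaErrorStreamCaptureInvalidated` error reported by `cudaStreamEndCapture` would still need its own investigation.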
Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.
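No standalone reproducer is attached. Because the failure is intermittent, one possible way to observe it is to rerun the failing test from the traceback several times and count the failures. The sketch below assumes it is run from the repository root with cuOpt and pytest installed; `count_failures` and `CUOPT_TEST_CMD` are hypothetical helper names, and the test path and test name are taken from the traceback above.

```python
import subprocess
import sys

# Assumed invocation: the test file and test name come from the traceback;
# the working directory is assumed to be the repository root.
CUOPT_TEST_CMD = [
    sys.executable, "-m", "pytest", "-q",
    "python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py",
    "-k", "test_incumbent_get_callback",
]


def count_failures(cmd, attempts):
    """Run `cmd` `attempts` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(attempts):
        completed = subprocess.run(cmd, capture_output=True, text=True)
        if completed.returncode != 0:
            failures += 1
    return failures
```

Calling `count_failures(CUOPT_TEST_CMD, attempts=20)` would then report how many of 20 runs failed. The attempt count is arbitrary, and a result of zero does not show that the problem is fixed.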
Expected behavior
A clear and concise description of what you expected to happen.
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of cuOpt install: [conda, Docker, or from source]
- If method of install is [Docker], provide docker pull & docker run commands used
Additional context
Add any other context about the problem here.