Draft
125 commits
99fe0a0
Try 1d block for pack/unpack
msimberg Oct 24, 2025
527d590
Add dumb nccl implementation
msimberg Oct 15, 2025
78879bb
Add back cuda event class
msimberg Oct 24, 2025
f314a1c
Add TODO for nccl in cmake
msimberg Oct 24, 2025
ab0dfd0
Clean up nccl parts
msimberg Oct 24, 2025
4b5833f
Small fix to stream syncing with nccl
msimberg Oct 24, 2025
ee1b851
Update test to disable cpu exchange with nccl
msimberg Oct 24, 2025
1add6ed
Don't wait for streams to finish unpacking
msimberg Nov 3, 2025
6d89616
Add dependency on default stream before starting packing
msimberg Nov 3, 2025
714f7b3
hacky async mpi changes
msimberg Nov 25, 2025
06dec72
Made some notes for me and a plan forward.
philip-paul-mueller Nov 26, 2025
9ef0f07
It seems a bit overkill, but let's see if it works, or even compile.
philip-paul-mueller Nov 26, 2025
336117a
Removed the stream member from teh comunication object, not sure if i…
philip-paul-mueller Nov 26, 2025
e1f918f
Run the code formater.
philip-paul-mueller Nov 26, 2025
d7ff9a3
Fixed smaller things.
philip-paul-mueller Nov 26, 2025
6df7291
I have to ask somebody about that.
philip-paul-mueller Nov 26, 2025
098a3f3
Update the description.
philip-paul-mueller Nov 26, 2025
186965d
Merge branch 'async-mpi-2' into nccl-2
msimberg Nov 26, 2025
2744fec
Add FindNCCL.cmake
msimberg Nov 26, 2025
3ad6b17
First version of the event pool.
philip-paul-mueller Nov 27, 2025
e4f35d7
Forgot to update something.
philip-paul-mueller Nov 27, 2025
7b05329
Updated some things.
philip-paul-mueller Nov 27, 2025
d2066a6
Applied some changes after discussing them with Mikael, but I think t…
philip-paul-mueller Nov 28, 2025
633b17c
Fixed some bugs, but I am not sure if it compiles and is correct.
philip-paul-mueller Nov 28, 2025
618e7eb
Update communication object for nccl integration
msimberg Dec 3, 2025
b7f573d
Update oomph
msimberg Dec 3, 2025
1a65634
Small update.
philip-paul-mueller Dec 9, 2025
dfd7065
The python interface now accepts streams.
philip-paul-mueller Dec 9, 2025
e95665a
Applied the formatter.
philip-paul-mueller Dec 9, 2025
30ed9dc
Fixed some issues, this should probably be enough.
philip-paul-mueller Dec 10, 2025
315c647
Added streams to the code.
Dec 10, 2025
2efacb1
Made it such that one can switch between default and non default stream.
Dec 10, 2025
c07f7f5
Made the test 'GPU aware' not realy some pieces are missing.
Dec 10, 2025
791725f
This should make it work on GPU.
philip-paul-mueller Dec 10, 2025
a87178d
Forgot to update the 'schedule_wait' call.
philip-paul-mueller Dec 10, 2025
63c05f9
Fixed a bug in strides computation, in default case.
philip-paul-mueller Dec 10, 2025
9b0f17a
Fixed another issue.
philip-paul-mueller Dec 10, 2025
f1e72d0
Modified the checking a bit.
philip-paul-mueller Dec 11, 2025
9a0d35f
This should be more GPU aware.
philip-paul-mueller Dec 11, 2025
c04ab31
Made more verbose error messages.
philip-paul-mueller Dec 11, 2025
0612e95
Why does this bug always happens to me?
philip-paul-mueller Dec 11, 2025
8300c59
This is so strange.
philip-paul-mueller Dec 11, 2025
57620a7
More asserts, but I am not sure if they help.
philip-paul-mueller Dec 11, 2025
f3c5b71
Forgot to change something.
philip-paul-mueller Dec 11, 2025
7d7fee7
No longer allow conversion of the event to a cuda event.
philip-paul-mueller Dec 11, 2025
a0fbe02
Formating.
philip-paul-mueller Dec 12, 2025
ad23590
This should solve the issue.
philip-paul-mueller Dec 12, 2025
0f1e15d
Fixing starnge CuPy error.
philip-paul-mueller Dec 12, 2025
4085d95
Merge remote-tracking branch 'fabian/format' into phimuell__async-mpi-2
philip-paul-mueller Dec 18, 2025
03d5d11
Applied the formating.
philip-paul-mueller Dec 18, 2025
0c0e88a
Added the vector interface.
philip-paul-mueller Dec 18, 2025
0c0933e
Updated the test, let's see if it works now.
philip-paul-mueller Dec 18, 2025
aede6ab
Update oomph
msimberg Dec 18, 2025
7d47080
Applied Hannes suggestions from ICON4Py.
philip-paul-mueller Dec 18, 2025
44739bc
Named the arguments in the python interface.
philip-paul-mueller Dec 19, 2025
99214d4
Merge remote-tracking branch 'fabian/format' into phimuell__async-mpi-2
philip-paul-mueller Dec 19, 2025
0679d57
Applied formating.
philip-paul-mueller Dec 19, 2025
5fec2d1
Merge remote-tracking branch 'ghex/master' into phimuell__async-mpi-2
philip-paul-mueller Dec 19, 2025
38cb29b
Update oomph
msimberg Dec 22, 2025
bacc1a4
Remove debug print
msimberg Dec 22, 2025
fe958f2
Minor cleanup
msimberg Dec 22, 2025
0fc74be
Merge commit 'f7273c2a232a0bb37cf869c9ee33c688387cf41b' into nccl-2
msimberg Dec 22, 2025
49b9bf7
Format files
msimberg Dec 22, 2025
34fc8d0
Update oomph
msimberg Dec 22, 2025
a51de83
Merge remote-tracking branch 'philip-paul-mueller/phimuell__async-mpi…
msimberg Dec 22, 2025
0614d9b
Update oomph
msimberg Dec 22, 2025
dd3257b
Merge remote-tracking branch 'origin/master' into nccl-2
msimberg Dec 22, 2025
83a1c7e
Remove NCCL macros
msimberg Dec 22, 2025
f112e4f
Sync with master
msimberg Dec 22, 2025
fa924db
Merge branch 'correct_c_strides_in_python_interface' into phimuell__a…
philip-paul-mueller Dec 23, 2025
0b4f867
Applied some suggestions not all.
philip-paul-mueller Dec 23, 2025
2714e59
Merge branch 'correct_c_strides_in_python_interface' into phimuell__a…
philip-paul-mueller Dec 23, 2025
55763e9
Fixed something in the bindings.
philip-paul-mueller Dec 23, 2025
f7e98ef
Separated the `cuda_event` and the `event_pool` into their own header.
philip-paul-mueller Dec 23, 2025
d84c46d
Need to install pre-commit.
philip-paul-mueller Dec 23, 2025
7333e74
Updated runtime header.
philip-paul-mueller Dec 23, 2025
c954a89
Made some status function accessable.
philip-paul-mueller Dec 23, 2025
ea6ba3c
Forgot to include a header.
philip-paul-mueller Dec 23, 2025
db27b1e
Why do I forgot that all the time.
philip-paul-mueller Dec 23, 2025
cb8a9b2
`hip` seems to want an argument there.
philip-paul-mueller Dec 23, 2025
73d0bb3
I was sure that I got all.
philip-paul-mueller Dec 23, 2025
473ebd0
Let's try that.
philip-paul-mueller Dec 23, 2025
6f28cdd
Updated the tests a bit.
philip-paul-mueller Dec 23, 2025
c57b942
Small modifications.
philip-paul-mueller Dec 23, 2025
644c7f8
Must be present.
philip-paul-mueller Dec 23, 2025
d716815
Added more checks also on the Python bindings.
philip-paul-mueller Dec 23, 2025
3347efd
The inplace version odes not work, with the changes.
Dec 23, 2025
5ddf5f6
Added a note about the failing test.
Dec 23, 2025
96d28b3
The `schedule_*()` functions no longer fall back to normal operations…
philip-paul-mueller Dec 24, 2025
3a3219d
small fixup.
philip-paul-mueller Dec 24, 2025
f7fe13b
This is what I should have done.
philip-paul-mueller Dec 24, 2025
f974c7b
It was already there.
philip-paul-mueller Dec 24, 2025
da91403
Applied Mikael's comments.
philip-paul-mueller Jan 6, 2026
78486e5
Forgot to apply the formating.
philip-paul-mueller Jan 6, 2026
82fbb3b
Forgot some stuff.
philip-paul-mueller Jan 6, 2026
9cc59bd
Why do I forgot them all the time.
philip-paul-mueller Jan 6, 2026
e4277d5
I think I should start compiling it locally.
philip-paul-mueller Jan 6, 2026
15e7c05
Update oomph
msimberg Jan 6, 2026
0d91ac9
Merge remote-tracking branch 'philip-paul-mueller/phimuell__async-mpi…
msimberg Jan 6, 2026
b9f4d55
Make cmake configuration error out if GPU support isn't enabled when …
msimberg Jan 6, 2026
26f4f3e
Remove unused functions in packer
msimberg Jan 6, 2026
baf0dc6
Remove dummy callback todos
msimberg Jan 6, 2026
3f7709f
Small updates
msimberg Jan 6, 2026
d1b41ea
Small fixes to async mode
msimberg Jan 6, 2026
173abd7
Clean up communication object exchange implementations
msimberg Jan 6, 2026
d8c8631
Refactor packing/unpacking etc.
msimberg Jan 7, 2026
9e695c3
More pack/unpack cleanup
msimberg Jan 7, 2026
6367449
Format files
msimberg Jan 7, 2026
606e4d0
Un-disable a test with NCCL
msimberg Jan 7, 2026
ac9f1c1
Minor cleanup
msimberg Jan 7, 2026
4a491fd
Update oomph
msimberg Jan 7, 2026
68751a5
Applied Mikaels comments.
philip-paul-mueller Jan 8, 2026
d50db32
Forgot to update the descripton.
philip-paul-mueller Jan 8, 2026
51e20d2
This function should be public.
philip-paul-mueller Jan 8, 2026
34141c4
Forgot to update the bindings.
philip-paul-mueller Jan 8, 2026
7825d17
Make `compelete_schedule_exchange()` again a private function.
philip-paul-mueller Jan 8, 2026
ffbb066
Update oomph
msimberg Jan 8, 2026
5dee034
Minor formatting, unused variable warnings etc.
msimberg Jan 8, 2026
da0f77c
Fix compilation with hip
msimberg Jan 8, 2026
289a761
Formatting
msimberg Jan 8, 2026
fdcd4ac
Formatting
msimberg Jan 8, 2026
4aaaadf
Merge remote-tracking branch 'philip-paul-mueller/phimuell__async-mpi…
msimberg Jan 8, 2026
993393d
Remove wrong assertion
msimberg Jan 8, 2026
7459c63
Update some tests for NCCL
msimberg Jan 8, 2026
5a65526
Disable more tests with NCCL
msimberg Jan 8, 2026

2 changes: 1 addition & 1 deletion .github/workflows/CI.yml
@@ -157,7 +157,7 @@ jobs:
 -DGHEX_GPU_TYPE=${{ matrix.config.gpu_type }}

 - name: Build
-run: cmake --build build --parallel 4
+run: cmake --build build --parallel 4 --verbose

 - if: ${{ matrix.config.run == 'ON' }}
 name: Execute tests
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -172,6 +172,8 @@ if(GHEX_USE_BUNDLED_OOMPH)
 set_target_properties(oomph_libfabric PROPERTIES INSTALL_RPATH "${rpath_origin}")
 elseif (GHEX_TRANSPORT_BACKEND STREQUAL "UCX")
 set_target_properties(oomph_ucx PROPERTIES INSTALL_RPATH "${rpath_origin}")
+elseif (GHEX_TRANSPORT_BACKEND STREQUAL "NCCL")
+set_target_properties(oomph_nccl PROPERTIES INSTALL_RPATH "${rpath_origin}")
 else()
 set_target_properties(oomph_mpi PROPERTIES INSTALL_RPATH "${rpath_origin}")
 endif()
111 changes: 109 additions & 2 deletions bindings/python/src/_pyghex/unstructured/communication_object.cpp
@@ -8,12 +8,17 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 #include <cstdint>
+#include <sstream>

 #include <gridtools/common/for_each.hpp>

 #include <ghex/buffer_info.hpp>
 #include <ghex/unstructured/pattern.hpp>

+#ifdef GHEX_CUDACC
+#include <ghex/device/cuda/runtime.hpp>
+#endif
+
 #include <context_shim.hpp>
 #include <register_class.hpp>
 #include <unstructured/field_descriptor.hpp>
@@ -23,6 +28,60 @@ namespace pyghex
 {
 namespace unstructured
 {
+namespace
+{
+#if defined(GHEX_CUDACC)
+cudaStream_t
+extract_cuda_stream(pybind11::object python_stream)
+{
+static_assert(std::is_pointer<cudaStream_t>::value);
+if (python_stream.is_none())
+{
+// NOTE: This is very C++ like, maybe remove and consider as an error?
+return static_cast<cudaStream_t>(nullptr);
+}
+else
+{
+if (pybind11::hasattr(python_stream, "__cuda_stream__"))
+{
+// CUDA stream protocol: https://nvidia.github.io/cuda-python/cuda-core/latest/interoperability.html#cuda-stream-protocol
+pybind11::tuple cuda_stream_protocol =
+pybind11::getattr(python_stream, "__cuda_stream__")();
+if (cuda_stream_protocol.size() != 2)
+{
+std::stringstream error;
+error << "Expected a tuple of length 2, but got one with length "
+<< cuda_stream_protocol.size();
+throw pybind11::type_error(error.str());
+}
+
+const auto protocol_version = cuda_stream_protocol[0].cast<std::size_t>();
+if (protocol_version != 0)
+{
+std::stringstream error;
+error << "Expected `__cuda_stream__` protocol version 0, but got "
+<< protocol_version;
+throw pybind11::type_error(error.str());
+}
+
+const auto stream_address = cuda_stream_protocol[1].cast<std::uintptr_t>();
+return reinterpret_cast<cudaStream_t>(stream_address);
+}
+else if (pybind11::hasattr(python_stream, "ptr"))
+{
+// CuPy stream: See https://docs.cupy.dev/en/latest/reference/generated/cupy.cuda.Stream.html#cupy-cuda-stream
+std::uintptr_t stream_address = python_stream.attr("ptr").cast<std::uintptr_t>();
+return reinterpret_cast<cudaStream_t>(stream_address);
+}
+// TODO: Find out how to extract the typename, i.e. `type(python_stream).__name__`.
+std::stringstream error;
+error << "Failed to convert the stream object into a CUDA stream.";
+throw pybind11::type_error(error.str());
+}
+}
+#endif
+} // namespace
+
 void
 register_communication_object(pybind11::module& m)
 {
@@ -41,7 +100,15 @@ register_communication_object(pybind11::module& m)
 auto _communication_object = register_class<type>(m);
 auto _handle = register_class<handle>(m);

-_handle.def("wait", &handle::wait)
+_handle
+.def("wait", &handle::wait)
+#if defined(GHEX_CUDACC)
+.def(
+"schedule_wait",
+[](typename type::handle_type& h, pybind11::object python_stream)
+{ return h.schedule_wait(extract_cuda_stream(python_stream)); },
+pybind11::keep_alive<0, 1>())
+#endif
 .def("is_ready", &handle::is_ready)
 .def("progress", &handle::progress);

@@ -67,7 +134,47 @@ register_communication_object(pybind11::module& m)
 "exchange",
 [](type& co, buffer_info_type& b0, buffer_info_type& b1,
 buffer_info_type& b2) { return co.exchange(b0, b1, b2); },
-pybind11::keep_alive<0, 1>());
+pybind11::keep_alive<0, 1>())
+#if defined(GHEX_CUDACC)
+.def(
+"schedule_exchange",
+[](type& co, pybind11::object python_stream,
+std::vector<buffer_info_type> b) {
+return co.schedule_exchange(extract_cuda_stream(python_stream),
+b.begin(), b.end());
+},
+pybind11::keep_alive<0, 1>(), pybind11::arg("stream"),
+pybind11::arg("patterns"))
+.def(
+"schedule_exchange",
+[](type& co, pybind11::object python_stream, buffer_info_type& b)
+{ return co.schedule_exchange(extract_cuda_stream(python_stream), b); },
+pybind11::keep_alive<0, 1>(), pybind11::arg("stream"),
+pybind11::arg("b"))
+.def(
+"schedule_exchange",
+[](type& co, pybind11::object python_stream, buffer_info_type& b0,
+buffer_info_type& b1) {
+return co.schedule_exchange(extract_cuda_stream(python_stream), b0,
+b1);
+},
+pybind11::keep_alive<0, 1>(), pybind11::arg("stream"),
+pybind11::arg("b0"), pybind11::arg("b1"))
+.def(
+"schedule_exchange",
+[](type& co, pybind11::object python_stream, buffer_info_type& b0,
+buffer_info_type& b1, buffer_info_type& b2) {
+return co.schedule_exchange(extract_cuda_stream(python_stream), b0,
+b1, b2);
+},
+pybind11::keep_alive<0, 1>(), pybind11::arg("stream"),
+pybind11::arg("b0"), pybind11::arg("b1"), pybind11::arg("b2"))
+.def("complete_schedule_exchange",
+[](type& co) -> void { return co.complete_schedule_exchange(); })
+.def("has_scheduled_exchange",
+[](type& co) -> bool { return co.has_scheduled_exchange(); })
+#endif // end scheduled exchange
+;
 });

 m.def(
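`extract_cuda_stream` tries three kinds of input in order:

- Python `None`, which maps to the default (null) stream.
- An object implementing the CUDA stream protocol. Its `__cuda_stream__()` must return a `(version, address)` tuple of length 2 with version 0.
- An object with a CuPy-style `ptr` attribute.

Anything else is rejected. The sketch below reproduces that dispatch without Python or CUDA so the accept/reject paths can be checked in isolation. It makes the following assumptions:

- `python_stream_model`, `stream_handle` and `extract_stream` are hypothetical names that do not exist in GHEX.
- The Python object is reduced to optional fields.
- `cudaStream_t` is replaced by an opaque pointer type.
- `std::invalid_argument` stands in for `pybind11::type_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <vector>

// Stand-in for cudaStream_t (an opaque pointer type) so the sketch builds without CUDA.
struct opaque_stream;
using stream_handle = opaque_stream*;

// Hypothetical model of the Python `stream` argument: each member mirrors one of the
// things the binding probes for, checked in the same order as in extract_cuda_stream.
struct python_stream_model
{
    bool is_none = false;                                   // Python `None`
    std::optional<std::vector<std::uintptr_t>> cuda_stream; // result of `__cuda_stream__()`
    std::optional<std::uintptr_t> ptr;                      // CuPy-style `.ptr`
};

stream_handle
extract_stream(const python_stream_model& s)
{
    // `None` selects the default stream, represented by a null handle.
    if (s.is_none) return nullptr;

    if (s.cuda_stream)
    {
        const auto& protocol = *s.cuda_stream;
        if (protocol.size() != 2)
        {
            std::stringstream error;
            error << "Expected a tuple of length 2, but got one with length " << protocol.size();
            throw std::invalid_argument(error.str());
        }
        // Only protocol version 0 is understood; any other version is rejected.
        if (protocol[0] != 0)
        {
            std::stringstream error;
            error << "Expected `__cuda_stream__` protocol version 0, but got " << protocol[0];
            throw std::invalid_argument(error.str());
        }
        return reinterpret_cast<stream_handle>(protocol[1]);
    }

    // Fallback: an integer address exposed as `ptr` (as CuPy streams do).
    if (s.ptr) return reinterpret_cast<stream_handle>(*s.ptr);

    throw std::invalid_argument("Failed to convert the stream object into a CUDA stream.");
}
```

This layout keeps the rejection paths (wrong tuple length, unknown protocol version, unsupported object) testable separately from the pybind11 attribute lookups.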
4 changes: 2 additions & 2 deletions bindings/python/src/_pyghex/unstructured/field_descriptor.cpp
@@ -58,7 +58,7 @@ struct buffer_info_accessor<ghex::gpu>
 void* ptr = reinterpret_cast<void*>(
 info["data"].cast<pybind11::tuple>()[0].cast<pybind11::ssize_t>());

-// create buffer protocol format and itemsize from typestr
+// Create buffer protocol format and itemsize from typestr
 pybind11::function memory_view = pybind11::module::import("builtins").attr("memoryview");
 pybind11::function np_array = pybind11::module::import("numpy").attr("array");
 pybind11::buffer empty_buffer =
@@ -214,7 +214,7 @@ register_field_descriptor(pybind11::module& m)
 " dimension expected the stride to be "
 << sizeof(T) << " but got " << info.strides[0] << ".";
 throw pybind11::type_error(error.str());
-};
+}
 }
 }
 std::size_t levels = (info.ndim == 1) ? 1u : (std::size_t)info.shape[1];

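The second hunk shows only the tail of the stride validation in `register_field_descriptor`. It appears to require the stride of the first dimension to equal `sizeof(T)`, then takes the number of levels as 1 for a one-dimensional buffer and `shape[1]` otherwise.

The following is a minimal sketch of just those two visible steps. It makes these assumptions:

- `buffer_view` and `checked_levels` are hypothetical names.
- `buffer_view` is a simplified stand-in for `pybind11::buffer_info`, with strides in bytes as in the buffer protocol.
- The beginning of the original check lies outside the hunk and is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for pybind11::buffer_info.
struct buffer_view
{
    std::size_t ndim;
    std::vector<std::ptrdiff_t> shape;
    std::vector<std::ptrdiff_t> strides; // in bytes, one entry per dimension
};

// Rejects a buffer whose first-dimension stride is not sizeof(T), then returns the
// number of levels. Assumes ndim is 1 or 2 and shape/strides hold ndim entries.
template<typename T>
std::size_t
checked_levels(const buffer_view& info)
{
    if (info.strides[0] != static_cast<std::ptrdiff_t>(sizeof(T)))
    {
        std::stringstream error;
        error << "Expected the stride of the first dimension to be " << sizeof(T)
              << " but got " << info.strides[0] << ".";
        throw std::invalid_argument(error.str());
    }
    // One level for a 1-D buffer, otherwise the extent of the second dimension.
    return (info.ndim == 1) ? 1u : static_cast<std::size_t>(info.shape[1]);
}
```

For a 2-D `double` buffer of shape `(10, 4)`, the check accepts strides `(8, 80)`, where the first dimension is contiguous. It rejects the C-ordered strides `(32, 8)`.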
22 changes: 20 additions & 2 deletions cmake/ghex_external_dependencies.cmake
@@ -43,8 +43,8 @@ endif()
 # ---------------------------------------------------------------------
 # oomph setup
 # ---------------------------------------------------------------------
-set(GHEX_TRANSPORT_BACKEND "MPI" CACHE STRING "Choose the backend type: MPI | UCX | LIBFABRIC")
-set_property(CACHE GHEX_TRANSPORT_BACKEND PROPERTY STRINGS "MPI" "UCX" "LIBFABRIC")
+set(GHEX_TRANSPORT_BACKEND "MPI" CACHE STRING "Choose the backend type: MPI | UCX | LIBFABRIC | NCCL")
+set_property(CACHE GHEX_TRANSPORT_BACKEND PROPERTY STRINGS "MPI" "UCX" "LIBFABRIC" "NCCL")
 cmake_dependent_option(GHEX_USE_BUNDLED_OOMPH "Use bundled oomph." ON "GHEX_USE_BUNDLED_LIBS" OFF)
 if(GHEX_USE_BUNDLED_OOMPH)
 set(OOMPH_GIT_SUBMODULE OFF CACHE BOOL "")
@@ -53,6 +53,11 @@ if(GHEX_USE_BUNDLED_OOMPH)
 set(OOMPH_WITH_LIBFABRIC ON CACHE BOOL "Build with LIBFABRIC backend")
 elseif(GHEX_TRANSPORT_BACKEND STREQUAL "UCX")
 set(OOMPH_WITH_UCX ON CACHE BOOL "Build with UCX backend")
+elseif(GHEX_TRANSPORT_BACKEND STREQUAL "NCCL")
+set(OOMPH_WITH_NCCL ON CACHE BOOL "Build with NCCL backend")
+if(NOT GHEX_USE_GPU)
+message(FATAL_ERROR "GHEX_TRANSPORT_BACKEND=NCCL requires GHEX_USE_GPU=ON but GHEX_USE_GPU=OFF")
+endif()
 endif()
 if(GHEX_USE_GPU)
 set(HWMALLOC_ENABLE_DEVICE ON CACHE BOOL "True if GPU support shall be enabled")
@@ -70,6 +75,9 @@ if(GHEX_USE_BUNDLED_OOMPH)
 if(TARGET oomph_ucx)
 add_library(oomph::oomph_ucx ALIAS oomph_ucx)
 endif()
+if(TARGET oomph_nccl)
+add_library(oomph::oomph_nccl ALIAS oomph_nccl)
+endif()
 if(TARGET oomph_libfabric)
 add_library(oomph::oomph_libfabric ALIAS oomph_libfabric)
 endif()
@@ -82,6 +90,8 @@ function(ghex_link_to_oomph target)
 target_link_libraries(${target} PRIVATE oomph::oomph_libfabric)
 elseif (GHEX_TRANSPORT_BACKEND STREQUAL "UCX")
 target_link_libraries(${target} PRIVATE oomph::oomph_ucx)
+elseif (GHEX_TRANSPORT_BACKEND STREQUAL "NCCL")
+target_link_libraries(${target} PRIVATE oomph::oomph_nccl)
 else()
 target_link_libraries(${target} PRIVATE oomph::oomph_mpi)
 endif()
@@ -94,6 +104,14 @@ if (GHEX_USE_XPMEM)
 find_package(XPMEM REQUIRED)
 endif()

+
+# ---------------------------------------------------------------------
+# nccl setup
+# ---------------------------------------------------------------------
+if(GHEX_USE_NCCL)
+find_package(NCCL REQUIRED)
+endif()
+
 # ---------------------------------------------------------------------
 # parmetis setup
 # ---------------------------------------------------------------------
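Taken together, the CMake changes make `NCCL` a fourth value of `GHEX_TRANSPORT_BACKEND`:

- `ghex_link_to_oomph` links `oomph::oomph_nccl` for it.
- Any value not matched by a branch still falls through to `oomph::oomph_mpi`.
- The bundled-oomph block stops configuration when NCCL is chosen without `GHEX_USE_GPU`.

The function below restates that decision table in C++ purely for illustration. It makes these assumptions:

- `oomph_target_for` is a hypothetical name.
- The first branch assumes that the condition elided above the `ghex_link_to_oomph` hunk tests `LIBFABRIC`.
- The GPU check is folded in here, although in the diff it only runs when `GHEX_USE_BUNDLED_OOMPH` is on.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative restatement of the backend -> oomph target selection; not part of GHEX.
std::string
oomph_target_for(const std::string& backend, bool use_gpu)
{
    if (backend == "LIBFABRIC") return "oomph::oomph_libfabric"; // assumed elided condition
    if (backend == "UCX") return "oomph::oomph_ucx";
    if (backend == "NCCL")
    {
        // Mirrors the FATAL_ERROR in the bundled-oomph block.
        if (!use_gpu)
            throw std::invalid_argument(
                "GHEX_TRANSPORT_BACKEND=NCCL requires GHEX_USE_GPU=ON but GHEX_USE_GPU=OFF");
        return "oomph::oomph_nccl";
    }
    // Any other value, including the default "MPI", falls through to the MPI target.
    return "oomph::oomph_mpi";
}
```

Seen this way, a misspelled backend name would quietly select MPI at link time. This is why `ghex_link_to_oomph` stays consistent only with values that the earlier `if`/`elseif` chain in the bundled-oomph block also recognises.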