Multi GPU with NVSHMEM
On systems with InfiniBand networks and CUDA 11 (or later), QUDA can use NVSHMEM for communication to reduce communication overheads and improve scaling.
Note that this works on top of QMP/MPI and does not replace them.
No changes are required in applications that use QUDA.
NVSHMEM™ is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams.
To learn more, see https://developer.nvidia.com/nvshmem.
For QUDA, device-side communication can significantly reduce the overheads of CPU and GPU synchronization and improve the overlap of computation and communication. This reduces latencies and improves strong scaling.
More details can be found in: https://developer.nvidia.com/gtc/2020/video/s21673-vid
To build with NVSHMEM in addition to MPI/QMP, you also need NVSHMEM (version 2 or later) installed. It can be:
- Already installed as a module or system-wide by your system administrator. In this case, just be sure to set the environment variable NVSHMEM_HOME to the install directory.
- Installed by yourself (see below).
- Built by QUDA as part of its own build (experimental, limited support).
You can get NVSHMEM from the NVSHMEM Download page.
Detailed instructions on how to build and install NVSHMEM are available in the installation guide.
We recommend following the instructions there and using the default build settings.
- Make sure to leave NVSHMEM_MPI_SUPPORT=1 (the default) enabled in the build process.
- Check with your system administrator whether GDRCopy is available on your system and where it is installed, so that you can set GDRCOPY_HOME.
After you have built NVSHMEM, set the environment variable NVSHMEM_HOME to the installation location.
For example, with OpenMPI, CUDA, and GDRCopy installed in their respective directories under /usr/local, the build command looks like:
CUDA_HOME=/usr/local/cuda GDRCOPY_HOME=/usr/local/gdrcopy MPI_HOME=/usr/local/openmpi NVSHMEM_MPI_SUPPORT=1 NVSHMEM_PREFIX=/usr/local/nvshmem make -j4 install
export NVSHMEM_HOME=/usr/local/nvshmem
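Before running the make command, it can help to confirm that the directories it refers to actually exist. Below is a minimal shell sketch; check_build_env is a hypothetical helper, not part of NVSHMEM or QUDA, and it treats GDRCOPY_HOME as required only because the example command above sets it.

```shell
# Hypothetical pre-flight check (not part of NVSHMEM or QUDA): verify that the
# directory variables used by the NVSHMEM make command are set and exist.
check_build_env() {
  local var dir status=0
  for var in CUDA_HOME GDRCOPY_HOME MPI_HOME; do
    # Read the variable whose name is in $var (names are fixed, so eval is safe).
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "error: $var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "error: $var=$dir is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}

# Example use, with the three variables exported beforehand:
#   check_build_env && NVSHMEM_MPI_SUPPORT=1 NVSHMEM_PREFIX=/usr/local/nvshmem make -j4 install
```

Reporting every problem before returning, rather than stopping at the first one, lets you fix all missing paths in one pass.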
For Summit-specific information, check the Summit section below.
To build QUDA with NVSHMEM, we assume that NVSHMEM has already been installed, either by you or by your system administrator. To enable NVSHMEM in the QUDA build, configure with:
cmake -DQUDA_NVSHMEM=ON -DQUDA_MPI=ON [...]
or, if you use QMP:
cmake -DQUDA_NVSHMEM=ON -DQUDA_QMP=ON [...]
CMake will try to pre-populate QUDA_NVSHMEM_HOME with the value of the environment variable NVSHMEM_HOME. If you have not set NVSHMEM_HOME, or if this fails for any reason, you can also pass
-DQUDA_NVSHMEM_HOME=/path/to/nvshmem_install
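The configure step can be wrapped so that QUDA_NVSHMEM_HOME is always passed explicitly and an obviously wrong NVSHMEM_HOME is caught early. This is a sketch under assumptions: configure_quda_nvshmem and the CMAKE override are hypothetical, and the check for include/nvshmem.h assumes the usual NVSHMEM install layout.

```shell
# Hypothetical wrapper (not part of QUDA): configure an NVSHMEM-enabled QUDA
# build. The first argument selects the communication layer (mpi or qmp); any
# remaining arguments are passed through to cmake unchanged.
configure_quda_nvshmem() {
  local comm_flag
  case "${1:-}" in
    mpi) comm_flag="-DQUDA_MPI=ON" ;;
    qmp) comm_flag="-DQUDA_QMP=ON" ;;
    *)
      echo "usage: configure_quda_nvshmem mpi|qmp [cmake args...]" >&2
      return 1 ;;
  esac
  shift  # safe: $1 was mpi or qmp, so at least one argument exists

  # Assumes the usual NVSHMEM install layout with headers in include/.
  if [ ! -f "${NVSHMEM_HOME:-}/include/nvshmem.h" ]; then
    echo "error: NVSHMEM_HOME='${NVSHMEM_HOME:-}' has no include/nvshmem.h" >&2
    return 1
  fi

  # Pass QUDA_NVSHMEM_HOME explicitly rather than relying on CMake picking up
  # NVSHMEM_HOME; CMAKE may be overridden for a dry run.
  "${CMAKE:-cmake}" -DQUDA_NVSHMEM=ON "$comm_flag" \
    "-DQUDA_NVSHMEM_HOME=${NVSHMEM_HOME}" "$@"
}

# Example: configure_quda_nvshmem mpi /path/to/quda/source
```

For a dry run, set CMAKE to a command or function that only prints its arguments.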
NVSHMEM communication is enabled by default, and the QUDA autotuner will use it as it sees fit; no further action is required. However, we recommend following the best practices for RDMA performance described in Maximizing GDR performance to make sure you use proper binding of CPUs and HCAs.
NVSHMEM is usually smart enough to pick the correct HCA without explicit binding. Note that NVSHMEM does not respect MPI/UCX environment variables for HCA binding; use NVSHMEM_ENABLE_NIC_PE_MAPPING and NVSHMEM_HCA_PE_MAPPING instead. See NVSHMEM environment variables for details.
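As an illustration, the helper below builds a value for NVSHMEM_HCA_PE_MAPPING. It assumes entries of the form hca_name:port:count separated by commas, as we read the NVSHMEM environment variable documentation (please confirm against your NVSHMEM version), so that mlx5_0:1:2,mlx5_1:1:2 would send the first two PEs on a node to port 1 of mlx5_0 and the next two to port 1 of mlx5_1. The function name hca_pe_mapping and the device names are hypothetical.

```shell
# Hypothetical helper (not part of NVSHMEM): build an NVSHMEM_HCA_PE_MAPPING
# value in which each listed HCA, on port 1, serves the same number of PEs.
# Usage: hca_pe_mapping PES_PER_HCA HCA [HCA...]
hca_pe_mapping() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hca_pe_mapping PES_PER_HCA HCA [HCA...]" >&2
    return 1
  fi
  local pes_per_hca="$1" hca mapping=""
  shift
  for hca in "$@"; do
    # Entries have the form hca_name:port:count and are comma-separated.
    mapping="${mapping:+$mapping,}${hca}:1:${pes_per_hca}"
  done
  printf '%s\n' "$mapping"
}

# Example (device names are placeholders; list yours with ibv_devices):
#   export NVSHMEM_ENABLE_NIC_PE_MAPPING=1
#   export NVSHMEM_HCA_PE_MAPPING="$(hca_pe_mapping 2 mlx5_0 mlx5_1)"
```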
For troubleshooting NVSHMEM issues, we also recommend the FAQ.
At runtime, you can opt out of using NVSHMEM by setting the environment variable QUDA_ENABLE_NVSHMEM=0 (the default is equivalent to 1). This disables all NVSHMEM Dslash policies, so QUDA relies purely on CUDA IPC and MPI message exchange, as supported by the system being used.
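To compare performance with and without NVSHMEM, one option is a small wrapper that sets QUDA_ENABLE_NVSHMEM for a single command. This is a minimal sketch; run_with_nvshmem and ./my_quda_app are hypothetical names, and depending on your launcher you may need to forward the variable to all ranks explicitly (for example with Open MPI's mpirun -x).

```shell
# Hypothetical helper (not part of QUDA): run a command with QUDA's NVSHMEM
# support switched on (1) or off (0) via QUDA_ENABLE_NVSHMEM.
run_with_nvshmem() {
  if [ "$#" -lt 2 ] || { [ "$1" != 0 ] && [ "$1" != 1 ]; }; then
    echo "usage: run_with_nvshmem 0|1 command [args...]" >&2
    return 1
  fi
  local setting="$1"
  shift
  # The prefix assignment exports the variable to this one command only.
  QUDA_ENABLE_NVSHMEM="$setting" "$@"
}

# Example A/B comparison (application name is hypothetical):
#   for s in 1 0; do
#     run_with_nvshmem "$s" mpirun -np 4 -x QUDA_ENABLE_NVSHMEM ./my_quda_app
#   done
```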
When running on Summit, which uses a standard MPI installation, we recommend building and installing your own version of NVSHMEM from source. The challenge on Summit is finding the locations of GDRCopy, MPI, etc., to pass to the NVSHMEM make command as described above. These instructions have been tested with the following modules:
$ module list
Currently Loaded Modules:
1) lsf-tools/2.0 3) darshan-runtime/3.3.0-lite 5) cuda/11.0.3 7) spectrum-mpi/10.4.0.3-20210112 9) cmake/3.20.2 11) nsight-compute/2021.2.1
2) hsi/5.0.2.p5 4) DefApps 6) gcc/9.3.0 8) git/2.31.1 10) nsight-systems/2021.3.1.54 12) gdrcopy/2.2
The most relevant modules are cuda/11.0.3, gcc/9.3.0, spectrum-mpi/10.4.0.3-20210112, and gdrcopy/2.2, though other versions of GCC will likely work as well. Other versions of CUDA are not officially supported on Summit, so we do not consider them here.
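To reproduce this environment, a sketch like the following reports which of the relevant modules are not yet loaded. It assumes Lmod records loaded modules in the colon-separated LOADEDMODULES variable; missing_modules and required_modules are hypothetical names.

```shell
# Hypothetical helper (not part of QUDA or Lmod): print which of the modules
# listed above are not currently loaded, based on LOADEDMODULES.
required_modules="cuda/11.0.3 gcc/9.3.0 spectrum-mpi/10.4.0.3-20210112 gdrcopy/2.2"

missing_modules() {
  local m
  for m in $required_modules; do
    case ":${LOADEDMODULES:-}:" in
      *":$m:"*) ;;              # already loaded
      *) printf '%s\n' "$m" ;;  # not loaded yet
    esac
  done
}

# Example:
#   missing="$(missing_modules)"; [ -n "$missing" ] && module load $missing
```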
With these modules loaded, the environment variables passed to make for NVSHMEM are (as of May 25, 2022):
- CUDA_HOME=/sw/summit/cuda/11.0.3 (parsed from the output of which nvcc)
- GDRCOPY_HOME=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/gdrcopy-2.2-xk2w6ftqfas57fuzgcxcc7p5pebgthth (parsed from the output of echo $LD_LIBRARY_PATH)
- MPI_HOME=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-6depextb6p6ulrvmehgtbskbmcsyhtdi (parsed from the output of which mpicc)
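The parsing described above can be scripted roughly as follows. This is a sketch under assumptions: prefix_from_tool and prefix_from_ld_path are hypothetical helpers; the first assumes the tool sits in a bin/ directory directly under the install prefix (true for the nvcc and mpicc paths above, but not for wrappers installed elsewhere), and the second assumes GDRCopy's library directory appears in LD_LIBRARY_PATH under a path containing gdrcopy.

```shell
# Hypothetical helpers (not part of NVSHMEM or QUDA) that mimic how the
# values above were found.

# Install prefix from a tool on PATH: /prefix/bin/tool -> /prefix
prefix_from_tool() {
  local tool_path
  tool_path="$(command -v "$1")" || return 1
  dirname "$(dirname "$tool_path")"
}

# Install prefix from the first LD_LIBRARY_PATH entry containing a pattern,
# with a trailing /lib or /lib64 removed.
prefix_from_ld_path() {
  local pattern="$1" entry IFS=':'
  for entry in ${LD_LIBRARY_PATH:-}; do
    case "$entry" in
      *"$pattern"*)
        entry="${entry%/}"
        entry="${entry%/lib64}"
        entry="${entry%/lib}"
        printf '%s\n' "$entry"
        return 0 ;;
    esac
  done
  return 1
}

# Example, after loading the modules listed above:
#   export CUDA_HOME="$(prefix_from_tool nvcc)"
#   export MPI_HOME="$(prefix_from_tool mpicc)"
#   export GDRCOPY_HOME="$(prefix_from_ld_path gdrcopy)"
```

Always compare the derived values against the expected module paths before building, since either assumption can fail on a differently organized system.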
NVSHMEM can then be built and installed locally using the make command above, for example:
export NVSHMEM_HOME=/my/path/to/nvshmem
CUDA_HOME=[..] GDRCOPY_HOME=[...] MPI_HOME=[...] NVSHMEM_MPI_SUPPORT=1 NVSHMEM_PREFIX=$NVSHMEM_HOME make -j4 install
Be sure to add the lib directory of the NVSHMEM install to your LD_LIBRARY_PATH environment variable, either in your default environment or in your LSF submit script (or both), for example via:
export LD_LIBRARY_PATH="${NVSHMEM_HOME}/lib:${LD_LIBRARY_PATH}"
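If the same setup may be sourced more than once, or LD_LIBRARY_PATH may start out empty, a guarded version can be preferable. The sketch below (prepend_ld_path is a hypothetical helper) adds the directory only if it is missing and avoids leaving an empty entry, which glibc's dynamic loader treats as the current working directory.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if it is not
# already present, without leaving a trailing colon when the variable is empty.
prepend_ld_path() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;  # already present: leave unchanged
    *)
      LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
      export LD_LIBRARY_PATH ;;
  esac
}

# Example, in ~/.bashrc or an LSF submit script:
#   prepend_ld_path "${NVSHMEM_HOME}/lib"
```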