Install NVSHMEM

Important notices

This project is neither sponsored nor supported by NVIDIA.

Use of NVIDIA NVSHMEM is governed by the terms of the NVSHMEM Software License Agreement.

Prerequisites

  1. GDRCopy (v2.4 or later recommended) is a low-latency GPU memory copy library built on NVIDIA GPUDirect RDMA technology. It includes a kernel module, so installation requires root privileges on the host.

  2. Hardware requirements
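Before starting, it can help to confirm the prerequisites are in place. A quick sanity check, assuming `nvidia-smi` and `ibv_devinfo` (from rdma-core) are installed on the host:

```shell
# GPUs visible to the NVIDIA driver
nvidia-smi

# RDMA-capable NICs present; ports should ideally report PORT_ACTIVE
ibv_devinfo | grep -i state
```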

Installation procedure

1. Install GDRCopy

GDRCopy requires kernel module installation on the host system. Complete these steps on the bare-metal host before container deployment:

Build and installation

wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.4.4.tar.gz
tar xzf v2.4.4.tar.gz
cd gdrcopy-2.4.4/
make -j$(nproc)
sudo make prefix=/opt/gdrcopy install

Kernel module installation

After compiling, install the packages appropriate for your Linux distribution.
For example, on Ubuntu 22.04 with CUDA 12.3:

cd packages
CUDA=/path/to/cuda ./build-deb-packages.sh
sudo dpkg -i gdrdrv-dkms_2.4.4_amd64.Ubuntu22_04.deb \
             libgdrapi_2.4.4_amd64.Ubuntu22_04.deb \
             gdrcopy-tests_2.4.4_amd64.Ubuntu22_04+cuda12.3.deb \
             gdrcopy_2.4.4_amd64.Ubuntu22_04.deb
sudo ./insmod.sh  # Load kernel modules on the bare-metal system

Container environment notes

For containerized environments:

  1. Host: keep kernel modules loaded (gdrdrv)
  2. Container: install DEB packages without rebuilding modules:
    sudo dpkg -i gdrcopy_2.4.4_amd64.Ubuntu22_04.deb \
                 libgdrapi_2.4.4_amd64.Ubuntu22_04.deb \
                 gdrcopy-tests_2.4.4_amd64.Ubuntu22_04+cuda12.3.deb
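Inside the container you can confirm that the host's gdrdrv module is actually usable. A minimal check, assuming the device node created by gdrdrv has been exposed to the container:

```shell
# gdrdrv creates /dev/gdrdrv on the host; the container must be able to see it
# (e.g. via --device=/dev/gdrdrv or a privileged run)
ls -l /dev/gdrdrv
```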

Verification

gdrcopy_copybw  # Should show bandwidth test results

2. Acquiring NVSHMEM source code

Download NVSHMEM v3.1.7 from the NVIDIA NVSHMEM Archive.

3. Apply our custom patch

Navigate to your NVSHMEM source directory and apply our provided patch:

git apply /path/to/deep_ep/dir/third-party/nvshmem.patch
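Before applying, it can be worth dry-running the patch; `git apply --check` reports any conflicts without modifying the tree (this works even though the extracted source archive is not a git repository):

```shell
cd /path/to/nvshmem_src
# Dry run: prints nothing on success, errors on conflict
git apply --check /path/to/deep_ep/dir/third-party/nvshmem.patch
```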

4. Configure NVIDIA driver

Enable IBGDA by modifying /etc/modprobe.d/nvidia.conf:

options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

Update kernel configuration:

sudo update-initramfs -u
sudo reboot
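After the reboot, you can check that the driver picked up the new options; the loaded module's parameters are exposed under /proc:

```shell
# Both values should reflect the settings from /etc/modprobe.d/nvidia.conf
grep -E 'EnableStreamMemOPs|RegistryDwords' /proc/driver/nvidia/params
```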

For more detailed configurations, please refer to the NVSHMEM Installation Guide.

5. Build and installation

The following example demonstrates building NVSHMEM with IBGDA support:

CUDA_HOME=/path/to/cuda \
GDRCOPY_HOME=/path/to/gdrcopy \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/path/to/your/dir/to/install

cd build
make -j$(nproc)
make install
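A quick way to confirm the install landed where expected, assuming the install prefix used above:

```shell
PREFIX=/path/to/your/dir/to/install
ls "$PREFIX"/lib/libnvshmem*    # NVSHMEM libraries
ls "$PREFIX"/bin/nvshmem-info   # CLI used in the verification step below
```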

Post-installation configuration

Set environment variables in your shell configuration:

export NVSHMEM_DIR=/path/to/your/dir/to/install  # Use for DeepEP installation
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib:$LD_LIBRARY_PATH"
export PATH="${NVSHMEM_DIR}/bin:$PATH"

Verification

nvshmem-info -a  # Should display NVSHMEM version, build options, and device details
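As a further smoke test, a minimal NVSHMEM program can be compiled against the fresh install. This sketch assumes `nvcc` is on PATH and uses only the core init/query API; exact link flags can vary by NVSHMEM version:

```shell
cat > hello_nvshmem.cu <<'EOF'
#include <stdio.h>
#include <nvshmem.h>

int main(void) {
    nvshmem_init();                  // Initialize the NVSHMEM runtime
    int mype = nvshmem_my_pe();      // This processing element's index
    int npes = nvshmem_n_pes();      // Total number of processing elements
    printf("PE %d of %d\n", mype, npes);
    nvshmem_finalize();
    return 0;
}
EOF

nvcc -rdc=true -I"${NVSHMEM_DIR}/include" -L"${NVSHMEM_DIR}/lib" \
     hello_nvshmem.cu -o hello_nvshmem -lnvshmem -lcuda

# Launch with the bundled hydra launcher (2 PEs as an example)
"${NVSHMEM_DIR}/bin/nvshmrun" -n 2 ./hello_nvshmem
```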