Merge pull request #6 from NVIDIA/gh-needs-open
GH200 systems require open GPU kernel module driver
mikemckiernan authored Jan 31, 2024
2 parents 0bc66c9 + e0139d4 commit 2158d65
Showing 4 changed files with 21 additions and 6 deletions.
4 changes: 2 additions & 2 deletions gpu-operator/gpu-operator-rdma.rst
@@ -35,7 +35,7 @@ new kernel module ``nvidia-peermem`` is included in the standard NVIDIA driver i
kernel module provides Mellanox Infiniband-based HCAs direct peer-to-peer read and write access to the GPU's memory.

Starting with v23.9.1 of the Operator, the Operator uses GDS driver version 2.17.5 or newer.
-This version and higher is only supported with the NVIDIA open kernel driver.
+This version and higher is only supported with the NVIDIA Open GPU Kernel module driver.
The sample commands for installing the Operator include the ``--set useOpenKernelModules=true``
command-line argument for Helm.

@@ -386,7 +386,7 @@ The following section is applicable to the following configurations and describe
* Kubernetes on bare metal and on vSphere VMs with GPU passthrough and vGPU.

Starting with v22.9.1, the GPU Operator provides an option to load the ``nvidia-fs`` kernel module during the bootstrap of the NVIDIA driver daemonset.
-Starting with v23.9.1, the GPU Operator deploys a version of GDS that requires using the NVIDIA open kernel driver.
+Starting with v23.9.1, the GPU Operator deploys a version of GDS that requires using the NVIDIA Open GPU Kernel module driver.

The following sample command applies to clusters that use the Network Operator to install the MLNX_OFED drivers.

2 changes: 1 addition & 1 deletion gpu-operator/life-cycle-policy.rst
@@ -159,7 +159,7 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
.. _gds-open-kernel:

:sup:`1`
-This release of the GDS driver requires that you use the NVIDIA open kernel driver for the GPUs.
+This release of the GDS driver requires that you use the NVIDIA Open GPU Kernel module driver for the GPUs.
Refer to :doc:`gpu-operator-rdma` for more information.

.. note::
16 changes: 14 additions & 2 deletions gpu-operator/platform-support.rst
@@ -41,19 +41,31 @@ Supported NVIDIA Data Center GPUs and Systems

The following NVIDIA data center GPUs are supported on x86 based platforms:

+.. _open-kern-module: #requires-open-kernel-module
+.. |open-kern-module| replace:: :sup:`1`
+
.. tab-set::

.. tab-item:: GH-series Products


.. list-table::
:header-rows: 1

* - Product
- Architecture

-* - NVIDIA GH200
+* - NVIDIA GH200 |open-kern-module|_
- NVIDIA Grace Hopper

+.. _requires-open-kernel-module:
+
+:sup:`1`
+NVIDIA GH200 systems require the NVIDIA Open GPU Kernel module driver.
+You can install the open kernel modules by specifying the ``driver.useOpenKernelModules=true``
+argument to the ``helm`` command.
+Refer to :ref:`chart customization options` for more information.

.. tab-item:: A, H and L-series Products
:selected:

@@ -466,7 +478,7 @@ Supported operating systems and NVIDIA GPU Drivers with GPUDirect Storage.
.. note::

Version v2.17.5 and higher of the NVIDIA GPUDirect Storage kernel driver, ``nvidia-fs``,
-requires the NVIDIA open kernel modules.
+requires the NVIDIA Open GPU Kernel module driver.
You can install the open kernel modules by specifying the ``driver.useOpenKernelModules=true``
argument to the ``helm`` command.
Refer to :ref:`chart customization options` for more information.
5 changes: 4 additions & 1 deletion gpu-operator/release-notes.rst
@@ -50,15 +50,18 @@ New Features

- Run Ubuntu 22.04 and an NVIDIA Linux kernel, such as one provided with a ``linux-nvidia-<x.x>`` package.
- Add ``init_on_alloc=0`` and ``memhp_default_state=online_movable`` as Linux kernel boot parameters.
+- Run the NVIDIA Open GPU Kernel module driver.

-* Added support for configuring the driver container to use the NVIDIA open kernel modules.
+* Added support for configuring the driver container to use the NVIDIA Open GPU Kernel module driver.
Support is limited to installation using the runfile installer.
Support for precompiled driver containers with open kernel modules is not available.

For clusters that use GPUDirect Storage (GDS), beginning with CUDA toolkit 12.2.2 and
the NVIDIA GPUDirect Storage kernel driver version v2.17.5, are only supported
with the open kernel modules.

+NVIDIA GH200 Grace Hopper Superchip systems are only supported with the open kernel modules.

- Refer to :ref:`gpu-operator-helm-chart-options` for information about setting
``useOpenKernelModules`` if you manage the driver containers with the NVIDIA cluster policy custom resource definition.
- Refer to :doc:`gpu-driver-configuration` for information about setting ``spec.useOpenKernelModules``