The DigitalOcean DCGM-Exporter is a thin wrapper around the DCGM-Exporter.
The following functionality is added:
- Configuration of a default set of DCGM fields to be monitored. Additional fields can be configured using the `--collectors` flag (as with dcgm-exporter).
- Forwarding of the collected metrics to the DigitalOcean endpoint, which is accessible from within the droplet at the static IP `169.254.169.254`.
- Requirement of a standalone DCGM installation with `nv-hostengine` serving on `localhost:5555`. This is to avoid conflicts with existing `dcgm-exporter` installations.
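
As a rough illustration of configuring additional fields, the sketch below writes a collectors file in the CSV layout used by the upstream dcgm-exporter (DCGM field, Prometheus metric type, help message). It assumes the DigitalOcean wrapper accepts the same file format, which the text above suggests but does not spell out. The file name `extra-collectors.csv` and the two fields are illustrative choices, not defaults of this project.

```python
import csv
import tempfile
from pathlib import Path

# Extra fields to collect: (DCGM field, Prometheus metric type, help message).
# Picked for illustration only; use fields your GPUs actually support.
EXTRA_FIELDS = [
    ("DCGM_FI_DEV_SM_CLOCK", "gauge", "SM clock frequency (in MHz)."),
    ("DCGM_FI_DEV_GPU_TEMP", "gauge", "GPU temperature (in C)."),
]


def write_collectors_file(path, fields):
    """Write one 'field,type,help' row per DCGM field and return the path."""
    with open(path, "w", newline="") as fh:
        # Lines starting with '#' are treated as comments by upstream dcgm-exporter.
        fh.write("# DCGM FIELD, Prometheus metric type, help message\n")
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerows(fields)
    return Path(path)


# Hypothetical output location; any readable path works.
collectors_path = write_collectors_file(
    Path(tempfile.gettempdir()) / "extra-collectors.csv", EXTRA_FIELDS
)
print(collectors_path.read_text())
```

The resulting path would then be passed to the exporter via `--collectors`.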
The exporter exposes a `/metrics` endpoint serving the collected Prometheus metrics on port `9401`.
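
To check what the endpoint serves, a minimal sketch like the following can fetch and parse the output. It assumes the exporter is reachable at `http://localhost:9401/metrics` and that samples carry no trailing timestamp. The helper names `fetch_metrics` and `parse_metrics` and the sample payload are illustrative and not part of this project.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition into (series, value) pairs.

    Assumes each sample line is '<name>{<labels>} <value>' without a
    trailing timestamp; comment lines (# HELP, # TYPE) are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space on the line.
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


def fetch_metrics(url="http://localhost:9401/metrics", timeout=5):
    """Fetch the raw metrics text from a running exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Illustrative payload, not real exporter output.
SAMPLE = """# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0"} 35
"""

for series, value in parse_metrics(SAMPLE):
    print(series, value)
```

On a droplet with the exporter running, `parse_metrics(fetch_metrics())` would return the live samples instead of the illustrative payload.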
Requires DCGM and NVIDIA drivers to be installed.
Please see the installation documentation.
To build the DigitalOcean DCGM-Exporter manually, please see here.
Please note that there can only be one DCGM installation on a host. This includes an embedded
DCGM process started by the NVIDIA dcgm-exporter.
Hence, to run the DigitalOcean dcgm-exporter next to the NVIDIA dcgm-exporter,
- create a standalone installation of DCGM.
- configure the NVIDIA dcgm-exporter to connect to the remote `nv-hostengine` serving on `localhost:5555` (via the flag `-r localhost:5555`).
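
Before starting either exporter, it can help to confirm that something is listening on `localhost:5555`. The sketch below is one way to do that with a plain TCP connect; the helper `is_listening` is a hypothetical name, and a successful connect only shows that the port is open, not that the listener is actually `nv-hostengine`.

```python
import socket


def is_listening(host="localhost", port=5555, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


# Command line for the NVIDIA dcgm-exporter, using the -r flag described above.
NVIDIA_EXPORTER_CMD = ["dcgm-exporter", "-r", "localhost:5555"]

if is_listening():
    print("Port 5555 is open; the NVIDIA exporter can be started with:",
          " ".join(NVIDIA_EXPORTER_CMD))
else:
    print("Nothing is listening on localhost:5555; "
          "start the standalone nv-hostengine first.")
```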
The DigitalOcean DCGM-Exporter connects to an `nv-hostengine` process serving on `localhost:5555`.
The DigitalOcean DCGM-Exporter is a thin wrapper around the DCGM-Exporter. While this allows it to reuse existing functionality, it also restricts the DigitalOcean DCGM-Exporter to the boundaries set up by the DCGM-Exporter code. Specifically, the variables required for mocking hardware (GPUs, NVSwitches, ...) are not exported.
As a result, this project does not contain test cases covering `dcgm-exporter` functionality that requires real hardware.