README.md: 70 additions & 67 deletions
@@ -68,12 +68,20 @@ unique set of homogenous nodes:
   `free --mebi` total * `openhpc_ram_multiplier`.
 * `ram_multiplier`: Optional. An override for the top-level definition
   `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
-* `gres_autodetect`: Optional. The [auto detection mechanism](https://slurm.schedmd.com/gres.conf.html#OPT_AutoDetect) to use for the generic resources. Note: you must still define the `gres` dictionary (see below) but you only need to define the `conf` key. See the [GRES autodetection](#gres-autodetection) section below.
-* `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict should define:
-  - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
-  - `file`: Omit if `gres_autodetect` is set. A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
-
-    Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.
+* `gres_autodetect`: Optional. The [hardware autodetection mechanism](https://slurm.schedmd.com/gres.conf.html#OPT_AutoDetect)
+  to use for [generic resources](https://slurm.schedmd.com/gres.html).
+  **NB:** A value of `'off'` (the default) must be quoted to avoid YAML
+  conversion to `false`.
+* `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html).
+  Not required if using `nvml` GRES autodetection. Keys/values in dicts are:
+  - `conf`: A string defining the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1)
+    in the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`.
+  - `file`: A string defining device path(s) as per [File](https://slurm.schedmd.com/gres.conf.html#OPT_File),
+    e.g. `/dev/nvidia[0-1]`. Not required if using any GRES autodetection.
+
+  Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) is
+  automatically set from the defined GRES or GRES autodetection. See [GRES Configuration](#gres-configuration)
+  for more discussion.
 * `features`: Optional. List of [Features](https://slurm.schedmd.com/slurm.conf.html#OPT_Features) strings.
 * `node_params`: Optional. Mapping of additional parameters and values for
   **NB:** This should be quoted to avoid Ansible conversions.
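As a practical aside (not part of the role or the diff above), the device paths used for a `gres` entry's `file` key can usually be found by listing the NVIDIA device nodes on a compute node; the glob below is only illustrative:

```sh
# List per-GPU device files, e.g. /dev/nvidia0 and /dev/nvidia1,
# which can then be written as /dev/nvidia[0-1] in the `file` key
ls /dev/nvidia[0-9]*
```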
@@ -278,7 +290,7 @@ cluster-control
 
 This example shows how partitions can span multiple types of compute node.
 
-This example inventory describes three types of compute node (login and
+Assume an inventory containing two types of compute node (login and
 control nodes are omitted for brevity):
 
 ```ini
@@ -293,17 +305,12 @@ cluster-general-1
 # large memory nodes
 cluster-largemem-0
 cluster-largemem-1
-
-[hpc_gpu]
-# GPU nodes
-cluster-a100-0
-cluster-a100-1
 ...
 ```
 
-Firstly the `openhpc_nodegroups` is set to capture these inventory groups and
-apply any node-level parameters - in this case the `largemem` nodes have
-2x cores reserved for some reason, and GRES is configured for the GPU nodes:
+Firstly `openhpc_nodegroups` maps to these inventory groups and applies any
+node-level parameters - in this case the `largemem` nodes have 2x cores
+reserved for some reason:
 
 ```yaml
 openhpc_cluster_name: hpc
@@ -312,104 +319,100 @@ openhpc_nodegroups:
   - name: large
     node_params:
       CoreSpecCount: 2
-  - name: gpu
-    gres:
-      - conf: gpu:A100:2
-        file: /dev/nvidia[0-1]
 ```
-or if using the NVML `gres_autodetect` mechanism (NOTE: this requires recompilation of the Slurm binaries to link against the [NVIDIA Management Library](#gres-autodetection)):
 
-```yaml
-openhpc_cluster_name: hpc
-openhpc_nodegroups:
-  - name: general
-  - name: large
-    node_params:
-      CoreSpecCount: 2
-  - name: gpu
-    gres_autodetect: nvml
-    gres:
-      - conf: gpu:A100:2
-```
-Now two partitions can be configured - a default one with a short timelimit and
-no large memory nodes for testing jobs, and another with all hardware and longer
-job runtime for "production" jobs:
+Now two partitions can be configured using `openhpc_partitions`: a default
+partition for testing jobs with a short timelimit and no large memory nodes,
+and another partition with all hardware and longer job runtime for "production"
+jobs:
 
 ```yaml
 openhpc_partitions:
   - name: test
     nodegroups:
       - general
-      - gpu
     maxtime: '1:0:0' # 1 hour
     default: 'YES'
   - name: general
     nodegroups:
       - general
       - large
-      - gpu
     maxtime: '2-0' # 2 days
     default: 'NO'
 ```
 Users will select the partition using the `--partition` argument and request nodes
-with appropriate memory or GPUs using the `--mem` and `--gres` or `--gpus*`
-options for `sbatch` or `srun`.
+with appropriate memory using the `--mem` option for `sbatch` or `srun`.
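For instance, a submission to the "general" partition from the example above requesting extra memory might look like the following sketch (the memory value and `jobscript.sh` are placeholders):

```sh
# Illustrative: run a batch script on the production partition with 500 GiB of memory
sbatch --partition=general --mem=500G jobscript.sh
```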
 
-Finally here some additional configuration must be provided for GRES:
-```yaml
-openhpc_config:
-  GresTypes:
-    - gpu
-```
+## GRES Configuration
 
-## GRES autodetection
+### Autodetection
 
-Some autodetection mechanisms require recompilation of the slurm packages to
-link against external libraries. Examples are shown in the sections below.
+Some autodetection mechanisms require recompilation of Slurm packages to link
+against external libraries. Examples are shown in the sections below.
 
-### Recompiling slurm binaries against the [NVIDIA Management libray](https://developer.nvidia.com/management-library-nvml)
+#### Recompiling Slurm binaries against the [NVIDIA Management Library](https://developer.nvidia.com/management-library-nvml)
 
-This will allow you to use `gres_autodetect: nvml` in your `nodegroup`
-definitions.
+This allows using `openhpc_gres_autodetect: nvml` or `openhpc_nodegroup.gres_autodetect: nvml`.
 
 First, [install the complete CUDA toolkit from NVIDIA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
 You can then recompile the Slurm packages from the source RPMs as follows:
 
 ```sh
 dnf download --source slurm-slurmd-ohpc
-
 rpm -i slurm-ohpc-*.src.rpm
-
 cd /root/rpmbuild/SPECS
-
 dnf builddep slurm.spec
-
 rpmbuild -bb -D "_with_nvml --with-nvml=/usr/local/cuda-12.8/targets/x86_64-linux/" slurm.spec | tee /tmp/build.txt
 ```
 
 NOTE: This will need to be adapted for the version of CUDA installed (12.8 is used in the example).
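The exact path passed to `--with-nvml` depends on where the toolkit was installed; one illustrative way to locate it on the build host is:

```sh
# List installed CUDA toolkit target directories to find the path for --with-nvml
ls -d /usr/local/cuda-*/targets/x86_64-linux
```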
 
-The RPMs will be created in ` /root/rpmbuild/RPMS/x86_64/`. The method to distribute these RPMs to
-each compute node is out of scope of this document. You can either use a custom package repository
-or simply install them manually on each node with Ansible.
+The RPMs will be created in `/root/rpmbuild/RPMS/x86_64/`. The method to distribute these RPMs to
+each compute node is out of scope of this document.
 
-#### Configuration example
+## GRES configuration examples
 
-A configuration snippet is shown below:
+For NVIDIA GPUs, `nvml` GRES autodetection can be used. This requires:
+- The relevant GPU nodes to have the `nvidia-smi` binary installed (see the check sketched below)
+- Slurm to be compiled against the NVIDIA management library as above
+
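A quick, illustrative way to confirm the first requirement on a GPU node is to list the GPUs the driver can see:

```sh
# Confirm the NVIDIA driver utilities are installed and list detected GPUs
nvidia-smi -L
```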
+Autodetection can then be enabled either for all nodegroups:
 
 ```yaml
-openhpc_cluster_name: hpc
+openhpc_gres_autodetect: nvml
+```
+
+or for individual nodegroups, e.g.:
+```yaml
+openhpc_nodegroups:
+  - name: example
+    gres_autodetect: nvml
+  ...
+```
+
+In either case no additional configuration of GRES is required. Any nodegroups
+with NVIDIA GPUs will automatically get `gpu` GRES defined for all GPUs found.
+GPUs within a node do not need to be the same model, but nodes in a nodegroup
+must be homogeneous. GRES types are set to the autodetected model names, e.g. `H100`.
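Once GRES are defined, whether via autodetection or explicitly, jobs request them with the usual Slurm options; for example (the GRES type name is illustrative):

```sh
# Illustrative: request two GPUs of the autodetected type and show what was allocated
srun --gres=gpu:H100:2 nvidia-smi -L
```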
+
+For `nvml` GRES autodetection, per-nodegroup `gres_autodetect` and/or `gres` keys
+can still be provided. These can be used to disable or override the default
+autodetection method, or to allow checking autodetected resources against
+expectations as described by the [gres.conf documentation](https://slurm.schedmd.com/gres.conf.html).
+
+Without any autodetection, a GRES configuration for NVIDIA GPUs might be:
+
+```
 openhpc_nodegroups:
   - name: general
-  - name: large
-    node_params:
-      CoreSpecCount: 2
   - name: gpu
-    gres_autodetect: nvml
     gres:
-      - conf: gpu:A100:2
+      - conf: gpu:H200:2
+        file: /dev/nvidia[0-1]
 ```
-for additional context refer to the GPU example in [Multiple Nodegroups](#multiple-nodegroups).
 
+Note that the `nvml` autodetection is special in this role. Other autodetection
+mechanisms, e.g. `nvidia` or `rsmi`, allow the `gres.file:` specification to be
+omitted but still require `gres.conf:` to be defined.
 
 <b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [↩](#accounting_storage)
defaults/main.yml: 12 additions & 0 deletions
@@ -12,6 +12,7 @@ openhpc_packages:
 openhpc_resume_timeout: 300
 openhpc_retry_delay: 10
 openhpc_job_maxtime: '60-0' # quote this to avoid ansible converting some formats to seconds, which is interpreted as minutes by Slurm
+openhpc_gres_autodetect: 'off'
 openhpc_default_config:
   # This only defines values which are not Slurm defaults
   SlurmctldHost: "{{ openhpc_slurm_control_host }}{% if openhpc_slurm_control_host_address is defined %}({{ openhpc_slurm_control_host_address }}){% endif %}"
@@ -40,6 +41,7 @@ openhpc_default_config:
   PropagateResourceLimitsExcept: MEMLOCK
   Epilog: /etc/slurm/slurm.epilog.clean
   ReturnToService: 2
+  GresTypes: "{{ ohpc_gres_types if ohpc_gres_types != '' else 'omit' }}"
 openhpc_cgroup_default_config:
   ConstrainCores: "yes"
   ConstrainDevices: "yes"
@@ -48,6 +50,16 @@ openhpc_cgroup_default_config:
 
 openhpc_config: {}
 openhpc_cgroup_config: {}
+ohpc_gres_types: >-
+  {{
+    (
+      ['gpu'] if openhpc_gres_autodetect == 'nvml' else [] +