* nodegroups using nodesets - doesn't handle empty nodegroups
* cope with empty nodegroups/partitions
* make gres work again
* make node/partition parameters more greppable
* use features to simplify nodeset configuration
* add nodegroup.features
* add validation
* document nodegroup.features to README
* add better examples in README
* tidy up README
* fix validate task path
* fix lint error
* default partitions to nodegroups to make CI easier
* update molecule tests for openhpc_nodegroups
* remove checks from runtime now validation defined
* fix NodeName= lines missing newlines between them when multiple hostlists within a node group
* remove tests for extra_nodes
* allow missing inventory groups (as per docs) when validating nodegroups
* only run validation once
* remove test14 from CI - extra_nodes feature removed
* update complex test for new group/partition variables
* rename openhpc_partitions.groups -> openhpc_partitions.nodegroups for clarity
* output NodeName hostlists on single line to improve large BM scheduler perf
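
The `openhpc_partitions.groups` -> `openhpc_partitions.nodegroups` rename above changes how partitions reference node groups. As a minimal illustrative sketch only (the names here are examples; the full documentation is in the README diff below):

```yaml
openhpc_partitions:
  - name: general
    nodegroups:     # this key was previously named `groups`
      - general
      - gpu
```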
README.md (+152, -60)
@@ -50,32 +50,53 @@ each list element:

### slurm.conf

-`openhpc_slurm_partitions`: Optional. List of one or more slurm partitions, default `[]`. Each partition may contain the following values:
-* `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here.
-  Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object:
-    * `name`: The name of the nodes within this group.
-    * `cluster_name`: Optional. An override for the top-level definition `openhpc_cluster_name`.
-    * `extra_nodes`: Optional. A list of additional node definitions, e.g. for nodes in this group/partition not controlled by this role. Each item should be a dict, with keys/values as per the ["NODE CONFIGURATION"](https://slurm.schedmd.com/slurm.conf.html#lbAE) docs for slurm.conf. Note the key `NodeName` must be first.
-    * `ram_mb`: Optional. The physical RAM available in each node of this group ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`) in MiB. This is set using ansible facts if not defined, equivalent to `free --mebi` total * `openhpc_ram_multiplier`.
-    * `ram_multiplier`: Optional. An override for the top-level definition `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
+`openhpc_nodegroups`: Optional, default `[]`. List of mappings, each defining a
+unique set of homogeneous nodes:
+* `name`: Required. Name of node group.
+* `ram_mb`: Optional. The physical RAM available in each node of this group
+  in MiB. This is set using ansible facts if not defined, equivalent to
+  `free --mebi` total * `openhpc_ram_multiplier`.
+* `ram_multiplier`: Optional. An override for the top-level definition
+  `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
* `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict must define:
    - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
    - `file`: A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.

  Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.
-
-* `default`: Optional. A boolean flag for whether this partition is the default. Valid settings are `YES` and `NO`.
-* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is given by `openhpc_job_maxtime`. The value should be quoted to avoid Ansible conversions.
-* `partition_params`: Optional. Mapping of additional parameters and values for [partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).
-
-For each group (if used) or partition any nodes in an ansible inventory group `<cluster_name>_<group_name>` will be added to the group/partition. Note that:
-- Nodes may have arbitrary hostnames but these should be lowercase to avoid a mismatch between inventory and actual hostname.
-- Nodes in a group are assumed to be homogenous in terms of processor and memory.
-- An inventory group may be empty or missing, but if it is not then the play must contain at least one node from it (used to set processor information).
-
-`openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days). See [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for format. The value should be quoted to avoid Ansible conversions.
+* `features`: Optional. List of [Features](https://slurm.schedmd.com/slurm.conf.html#OPT_Features) strings.
+* `node_params`: Optional. Mapping of additional parameters and values for
+  the node-level configuration generated for this group (see the NODE
+  CONFIGURATION section of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html)).
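
The nodegroup attributes above combine as in the following sketch. It is illustrative only, not a default or recommended configuration; the values are arbitrary, and `GresTypes` must also be set in `openhpc_config` when `gres` is used:

```yaml
openhpc_nodegroups:
  - name: gpu
    ram_mb: 512000            # optional override of the fact-derived RealMemory
    features:                 # optional Feature strings, usable with job constraints
      - a100
    node_params:              # optional extra node-level slurm.conf parameters
      CoreSpecCount: 2
    gres:                     # generic resources for these nodes
      - conf: gpu:A100:2
        file: /dev/nvidia[0-1]
```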
@@ ... @@
To deploy, create a playbook which looks like this:

-    ---
-    - hosts:
-      - cluster_login
-      - cluster_control
-      - cluster_batch
-      become: yes
-      roles:
-        - role: openhpc
-          openhpc_enable:
-            control: "{{ inventory_hostname in groups['cluster_control'] }}"
-            batch: "{{ inventory_hostname in groups['cluster_batch'] }}"
-            runtime: true
-          openhpc_slurm_service_enabled: true
-          openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
-          openhpc_slurm_partitions:
-            - name: "compute"
-          openhpc_cluster_name: openhpc
-          openhpc_packages: []
-    ...
+[hpc_control]
+cluster-control
+```

+```yaml
+# playbook.yml
+---
+- hosts: all
+  become: yes
+  tasks:
+    - import_role:
+        name: stackhpc.openhpc
+      vars:
+        openhpc_cluster_name: hpc
+        openhpc_enable:
+          control: "{{ inventory_hostname in groups['cluster_control'] }}"
+          batch: "{{ inventory_hostname in groups['cluster_compute'] }}"
+          runtime: true
+        openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
+        openhpc_nodegroups:
+          - name: compute
+        openhpc_partitions:
+          - name: compute
+```

---
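
Nodes are added to a nodegroup via an Ansible inventory group named `<openhpc_cluster_name>_<nodegroup name>` (as the removed documentation above describes and the multiple-nodegroup example below illustrates), so the `compute` nodegroup in this playbook corresponds to an inventory group such as the following hypothetical fragment, shown here in YAML inventory format:

```yaml
# Hypothetical YAML-format inventory fragment; the README examples use INI format.
all:
  children:
    hpc_compute:
      hosts:
        cluster-compute-0:
        cluster-compute-1:
```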

+### Multiple nodegroups
+
+This example shows how partitions can span multiple types of compute node.
+
+The following inventory describes three types of compute node (login and
+control nodes are omitted for brevity):
+
+```ini
+# inventory/hosts:
+...
+[hpc_general]
+# standard compute nodes
+cluster-general-0
+cluster-general-1
+
+[hpc_large]
+# large memory nodes
+cluster-largemem-0
+cluster-largemem-1
+
+[hpc_gpu]
+# GPU nodes
+cluster-a100-0
+cluster-a100-1
+...
+```
+
+First, `openhpc_nodegroups` is set to capture these inventory groups and to
+apply any node-level parameters - in this case the `largemem` nodes have two
+cores reserved for system use via `CoreSpecCount`, and GRES is configured for
+the GPU nodes:
+
+```yaml
+openhpc_cluster_name: hpc
+openhpc_nodegroups:
+  - name: general
+  - name: large
+    node_params:
+      CoreSpecCount: 2
+  - name: gpu
+    gres:
+      - conf: gpu:A100:2
+        file: /dev/nvidia[0-1]
+```
+
+Now two partitions can be configured - a default one with a short time limit
+and no large-memory nodes, intended for testing jobs, and another with all
+hardware and a longer job runtime for "production" jobs:
+
+```yaml
+openhpc_partitions:
+  - name: test
+    nodegroups:
+      - general
+      - gpu
+    maxtime: '1:0:0'  # 1 hour
+    default: 'YES'
+  - name: general
+    nodegroups:
+      - general
+      - large
+      - gpu
+    maxtime: '2-0'  # 2 days
+    default: 'NO'
+```
+
+Users select a partition using the `--partition` argument and request nodes
+with appropriate memory or GPUs using the `--mem` and `--gres` or `--gpus*`
+options to `sbatch` or `srun`.
+
+Finally, some additional configuration must be provided for GRES:
+
+```yaml
+openhpc_config:
+  GresTypes:
+    - gpu
+```

<b id="slurm_ver_footnote">1</b> Slurm 20.11 removed `accounting_storage/filetxt` as an option. This version of Slurm was introduced in OpenHPC v2.1 but the OpenHPC repos are common to all OpenHPC v2.x releases. [↩](#accounting_storage)