2 | 2 |
3 | 3 | # stackhpc.openhpc |
4 | 4 |
5 | | -This Ansible role is used to install the necessary packages to have a fully functional OpenHPC cluster. |
| 5 | +This Ansible role installs packages and performs configuration to provide a fully functional OpenHPC cluster. It can also be used to drain and resume nodes. |
| 6 | + |
| 7 | +As a role it must be used from a playbook, for which a simple example is given below. This approach means it is fully modular, with no assumptions about available networks or any cluster features except for some hostname conventions. Any desired cluster filesystem or other required functionality may be freely integrated using additional Ansible roles or other approaches.
6 | 8 |
7 | 9 | Role Variables |
8 | 10 | -------------- |
9 | 11 |
10 | | -`openhpc_slurm_service_enabled`: checks whether `openhpc_slurm_service` is enabled |
11 | | - |
12 | | -`openhpc_slurm_service`: name of the slurm service e.g. `slurmd` |
| 12 | +`openhpc_slurm_service_enabled`: boolean, whether to enable the appropriate slurm service (slurmd/slurmctld) |
13 | 13 |
14 | 14 | `openhpc_slurm_control_host`: ansible host name of the controller, e.g. `"{{ groups['cluster_control'] | first }}"`
15 | 15 |
16 | 16 | `openhpc_slurm_partitions`: list of one or more slurm partitions. Each partition may contain the following values: |
17 | 17 | * `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here. |
18 | | - Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object. |
| 18 | + Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object: |
19 | 19 | * `name`: The name of the nodes within this group. |
20 | 20 | * `cluster_name`: Optional. An override for the top-level definition `openhpc_cluster_name`. |
21 | 21 | * `num_nodes`: Nodes within the group are assumed to number `0:num_nodes-1`. |
22 | | - * `ram_mb`: Optional. The physical RAM available in each server of this group. |
23 | | - Compute node hostnames are assumed to take the form: `cluster_name-group_name-{0..num_nodes-1}` |
24 | | -* `default`: Optional. A boolean flag for whether this partion. Valid settings are `YES` and `NO`. |
25 | | -* `maxtime`: Optional. A partition-specific time limit in hours, minutes and seconds. The default value is |
26 | | - `openhpc_job_maxtime`, which defaults to `24:00:00`. |
| 22 | + * `ram_mb`: Optional. The physical RAM available in each server of this group ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`). |
| 23 | + |
| 24 | + For each group (if used) or partition there must be an ansible inventory group `cluster_name-group_name`. The compute nodes in this group must have hostnames in the form `cluster_name-group_name-{0..num_nodes-1}`. |
| 25 | + |
| 26 | +* `default`: Optional. A boolean flag for whether this partition is the default. Valid settings are `YES` and `NO`.
| 27 | +* `maxtime`: Optional. A partition-specific time limit in hours, minutes and seconds ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`). The default value is |
| 28 | + given by `openhpc_job_maxtime`. |
27 | 29 |
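For example, a partitions definition covering both styles might look like the sketch below. All cluster, partition and group names, node counts and memory sizes here are purely illustrative, and `openhpc_cluster_name: openhpc` is assumed:

```yaml
openhpc_slurm_partitions:
  # A partition built from two node groups. Following the convention above this
  # requires inventory groups "openhpc-small" and "openhpc-big" containing hosts
  # named openhpc-small-{0..3} and openhpc-big-{0..1} respectively.
  - name: "compute"
    groups:
      - name: "small"
        num_nodes: 4
        ram_mb: 128000
      - name: "big"
        num_nodes: 2
        ram_mb: 512000
    default: "YES"        # make this the default partition
    maxtime: "48:00:00"   # quoted so YAML keeps it as a string; overrides openhpc_job_maxtime
  # A single-group partition: the group attributes live on the partition itself, so
  # the inventory group is "openhpc-gpu" with hosts openhpc-gpu-0 and openhpc-gpu-1.
  - name: "gpu"
    num_nodes: 2
```
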
28 | 30 | `openhpc_job_maxtime`: The maximum job time limit in hours, minutes and seconds. The default is `24:00:00`.
29 | 31 |
@@ -80,10 +82,7 @@ To deploy, create a playbook which looks like this: |
80 | 82 | openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}" |
81 | 83 | openhpc_slurm_partitions: |
82 | 84 | - name: "compute" |
83 | | - flavor: "compute-A" |
84 | | - image: "CentOS7.5-OpenHPC" |
85 | 85 | num_nodes: 8 |
86 | | - user: "centos" |
87 | 86 | openhpc_cluster_name: openhpc |
88 | 87 | openhpc_packages: [] |
89 | 88 | ... |
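
The playbook above assumes a matching inventory. A minimal sketch following the `cluster_name-group_name` convention described earlier is shown below; the login/control hostname is hypothetical:

```yaml
all:
  children:
    cluster_control:            # used to set openhpc_slurm_control_host
      hosts:
        openhpc-login-0:
    openhpc-compute:            # cluster_name-group_name for the "compute" partition
      hosts:
        openhpc-compute-[0:7]:  # hostnames openhpc-compute-0 ... openhpc-compute-7
```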