Commit 402ca70

Matt Pryor and sjpb authored
Changes to support explicit quota checks (#348)
* Changes to support explicit quota checks
* Add count parameter to compute flavor

---------

Co-authored-by: Steve Brasier <[email protected]>
1 parent 2d78fc9 commit 402ca70

4 files changed: 66 additions and 29 deletions

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
 ---
 
-# Set Prometheus storage retention size
-prometheus_storage_retention_size: "{{ metrics_db_maximum_size }}GB"
+# We reserve 10GB of the state volume for cluster state, the rest is for metrics
+prometheus_storage_retention_size: "{{ state_volume_size - 10 }}GB"
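
The arithmetic behind the new retention setting can be sketched outside Ansible. This is a minimal Python illustration that assumes sizes are whole gigabytes; the function name `prometheus_retention_size`, the constant `RESERVED_STATE_GB`, and the guard against too-small volumes are hypothetical additions for illustration and are not part of the commit.

```python
# Space set aside for cluster state, per the comment in the diff above.
RESERVED_STATE_GB = 10


def prometheus_retention_size(state_volume_size_gb: int) -> str:
    """Mirror the Jinja expression "{{ state_volume_size - 10 }}GB"."""
    metrics_gb = state_volume_size_gb - RESERVED_STATE_GB
    # Assumption: a non-positive retention size is treated as an error here;
    # the template itself performs no such check.
    if metrics_gb <= 0:
        raise ValueError("state volume must be larger than the reserved 10GB")
    return f"{metrics_gb}GB"
```

Under these assumptions, a 20GB state volume (the new minimum and default in the UI metadata in this commit) leaves 10GB for metrics, which matches the old default for `metrics_db_maximum_size`.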

environments/.caas/inventory/group_vars/openstack.yml

Lines changed: 0 additions & 3 deletions
@@ -16,9 +16,6 @@ terraform_project_path: "{{ playbook_dir }}/terraform"
 terraform_state: "{{ cluster_state | default('present') }}"
 cluster_ssh_user: rocky
 
-# Set the size of the state volume to metrics_db_maximum_size + 10
-state_volume_size: "{{ metrics_db_maximum_size + 10 }}"
-
 # Provision a single "standard" compute partition using the supplied
 # node count and flavor
 openhpc_slurm_partitions:

environments/.caas/ui-meta/slurm-infra-fast-volume-type.yml

Lines changed: 32 additions & 12 deletions
@@ -12,6 +12,24 @@ parameters:
     kind: cloud.ip
     immutable: true
 
+  - name: login_flavor
+    label: Login node size
+    description: The size to use for the login node.
+    kind: cloud.size
+    immutable: true
+    options:
+      min_ram: 2048
+      min_disk: 20
+
+  - name: control_flavor
+    label: Control node size
+    description: The size to use for the control node.
+    kind: cloud.size
+    immutable: true
+    options:
+      min_ram: 2048
+      min_disk: 20
+
   - name: compute_count
     label: Compute node count
     description: The number of compute nodes in the cluster.
@@ -23,16 +41,17 @@ parameters:
   - name: compute_flavor
     label: Compute node size
     description: The size to use for the compute node.
-    kind: "cloud.size"
+    kind: cloud.size
     immutable: true
     options:
+      count_parameter: compute_count
       min_ram: 2048
       min_disk: 20
 
   - name: home_volume_size
     label: Home volume size (GB)
-    description: The size of the cloud volume to use for home directories
-    kind: integer
+    description: The size of the cloud volume to use for home directories.
+    kind: cloud.volume_size
     immutable: true
     options:
       min: 10
@@ -51,19 +70,20 @@ parameters:
     options:
       checkboxLabel: Put home directories on high-performance storage?
 
-  - name: metrics_db_maximum_size
-    label: Metrics database size (GB)
+  - name: state_volume_size
+    label: State volume size (GB)
     description: |
+      The size of the state volume, used to hold and persist important files and data. Of
+      this volume, 10GB is set aside for cluster state and the remaining space is used
+      to store cluster metrics.
+
       The oldest metrics records in the [Prometheus](https://prometheus.io/) database will be
-      discarded to ensure that the database does not grow larger than this size.
-
-      **A cloud volume of this size +10GB will be created to hold and persist the metrics
-      database and important Slurm files.**
-    kind: integer
+      discarded to ensure that the database does not grow larger than this volume.
+    kind: cloud.volume_size
     immutable: true
     options:
-      min: 10
-      default: 10
+      min: 20
+      default: 20
 
   - name: cluster_run_validation
     label: Post-configuration validation

environments/.caas/ui-meta/slurm-infra.yml

Lines changed: 32 additions & 12 deletions
@@ -12,6 +12,24 @@ parameters:
     kind: cloud.ip
     immutable: true
 
+  - name: login_flavor
+    label: Login node size
+    description: The size to use for the login node.
+    kind: cloud.size
+    immutable: true
+    options:
+      min_ram: 2048
+      min_disk: 20
+
+  - name: control_flavor
+    label: Control node size
+    description: The size to use for the control node.
+    kind: cloud.size
+    immutable: true
+    options:
+      min_ram: 2048
+      min_disk: 20
+
   - name: compute_count
     label: Compute node count
     description: The number of compute nodes in the cluster.
@@ -23,34 +41,36 @@ parameters:
   - name: compute_flavor
     label: Compute node size
     description: The size to use for the compute node.
-    kind: "cloud.size"
+    kind: cloud.size
     immutable: true
     options:
+      count_parameter: compute_count
       min_ram: 2048
       min_disk: 20
 
   - name: home_volume_size
     label: Home volume size (GB)
-    description: The size of the cloud volume to use for home directories
-    kind: integer
+    description: The size of the cloud volume to use for home directories.
+    kind: cloud.volume_size
     immutable: true
     options:
       min: 10
       default: 100
 
-  - name: metrics_db_maximum_size
-    label: Metrics database size (GB)
+  - name: state_volume_size
+    label: State volume size (GB)
     description: |
+      The size of the state volume, used to hold and persist important files and data. Of
+      this volume, 10GB is set aside for cluster state and the remaining space is used
+      to store cluster metrics.
+
       The oldest metrics records in the [Prometheus](https://prometheus.io/) database will be
-      discarded to ensure that the database does not grow larger than this size.
-
-      **A cloud volume of this size +10GB will be created to hold and persist the metrics
-      database and important Slurm files.**
-    kind: integer
+      discarded to ensure that the database does not grow larger than this volume.
+    kind: cloud.volume_size
     immutable: true
     options:
-      min: 10
-      default: 10
+      min: 20
+      default: 20
 
   - name: cluster_run_validation
     label: Post-configuration validation
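
The commit title refers to explicit quota checks, and `count_parameter: compute_count` ties the compute flavor to the number of nodes requested. One way a consumer of this metadata could total up a request is sketched below in Python. This is an illustration under stated assumptions, not the actual checking logic: the `Flavor` type, the `total_request` function, the example flavor figures, and the assumption of exactly one login node and one control node are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Hypothetical stand-in for a cloud size; not a real API type."""
    vcpus: int
    ram_mb: int


def total_request(params: dict, flavors: dict) -> dict:
    """Sum the CPU, RAM and volume storage implied by the form parameters."""
    # Assumption: one login node and one control node; the compute flavor
    # is multiplied by the parameter named in count_parameter (compute_count).
    counts = {
        "login_flavor": 1,
        "control_flavor": 1,
        "compute_flavor": params["compute_count"],
    }
    vcpus = sum(flavors[params[name]].vcpus * n for name, n in counts.items())
    ram_mb = sum(flavors[params[name]].ram_mb * n for name, n in counts.items())
    # Both volume parameters are of kind cloud.volume_size, in GB.
    volume_gb = params["home_volume_size"] + params["state_volume_size"]
    return {"vcpus": vcpus, "ram_mb": ram_mb, "volume_gb": volume_gb}


# Made-up flavors and a request using the defaults from the metadata above.
example_flavors = {"small": Flavor(2, 4096), "large": Flavor(8, 16384)}
example_params = {
    "login_flavor": "small",
    "control_flavor": "small",
    "compute_flavor": "large",
    "compute_count": 3,
    "home_volume_size": 100,
    "state_volume_size": 20,
}
example_total = total_request(example_params, example_flavors)
```

The totals could then be compared against whatever limits the cloud reports for the tenancy. Typing the volume sizes as `cloud.volume_size` rather than `integer` is presumably what lets them be counted this way.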
