Perform Slurm database upgrade if necessary #670

Merged (12 commits) on May 15, 2025

@sjpb sjpb commented May 14, 2025

Integrates stackhpc/ansible-role-openhpc#186 to upgrade Slurm database if necessary. A snapshot of the state volume is taken before upgrading, timestamped with the current date/time.
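
As a rough illustration of the timestamped snapshot naming, the sketch below builds a name of the form `<volume>-<YYYYmmddTHHMMSS>`, which matches the snapshot name visible in the CI log later in this thread. The helper names `snapshot_name` and `snapshot_command` are hypothetical and this is not the role's actual implementation; the command is only assembled here, not executed.

```python
from datetime import datetime


def snapshot_name(volume: str, when: datetime) -> str:
    """Return '<volume>-<timestamp>' using a compact date/time stamp."""
    return f"{volume}-{when:%Y%m%dT%H%M%S}"


def snapshot_command(volume: str, when: datetime) -> list[str]:
    """Assemble (but do not run) the OpenStack CLI call seen in the CI log."""
    return [
        "openstack", "volume", "snapshot", "create",
        "--volume", volume,
        "--force",  # flag copied from the CI log output
        snapshot_name(volume, when),
    ]


if __name__ == "__main__":
    # Example values taken from the CI log in this thread.
    cmd = snapshot_command("slurmci-RL9-2440-state", datetime(2025, 5, 15, 9, 4, 14))
    print(" ".join(cmd))
```

Keeping the timestamp in the snapshot name means repeated upgrades produce distinct snapshots rather than name clashes.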

sjpb commented May 14, 2025

An image rebuild is required due to mysql client installs.

sjpb commented May 14, 2025

sjpb commented May 14, 2025

Caution

The CI run shows that no DB upgrade is happening, even though one is required (truncated output):

TASK [stackhpc.openhpc : Check if slurm database requires an upgrade] **********
Wednesday 14 May 2025  16:07:10 +0000 (0:00:00.883)       0:07:19.982 ********* 
skipping: [slurmci-RL9-2440-compute-0] => {
skipping: [slurmci-RL9-2440-compute-1] => {
skipping: [slurmci-RL9-2440-login-0] => {
ok: [slurmci-RL9-2440-control] => {
    "changed": false,
    "cmd": [
        "slurmdbd",
        "-u"
    ],
    "delta": "0:00:00.085638",
    "end": "2025-05-14 16:07:11.005165",
    "failed_when_result": false,
    "rc": 1,
    "start": "2025-05-14 16:07:10.919527"
}

STDOUT:

Slurm Database current version is '15' needs to be at '16'. Conversion needed.


TASK [stackhpc.openhpc : Set fact for slurm database upgrade] ******************
Wednesday 14 May 2025  16:07:11 +0000 (0:00:00.740)       0:07:20.722 ********* 
ok: [slurmci-RL9-2440-control] => {
    "ansible_facts": {
        "_openhpc_slurmdb_upgrade": false
    },
    "changed": false
}
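
For comparison with the log above, here is a minimal sketch of the decision one would expect from this output, assuming (as the output suggests) that `slurmdbd -u` exits with return code 1 and prints "Conversion needed" when an upgrade is required. The function names and the regular expression are hypothetical and are not taken from stackhpc.openhpc.

```python
import re

# Pattern modelled on the message in the CI log above (an assumption, not a documented format).
_VERSION_RE = re.compile(r"current version is '(\d+)' needs to be at '(\d+)'")


def parse_versions(stdout: str):
    """Return (current, required) database versions, or None if not reported."""
    match = _VERSION_RE.search(stdout)
    return (int(match.group(1)), int(match.group(2))) if match else None


def upgrade_required(rc: int, stdout: str) -> bool:
    """Treat rc == 1 together with the 'Conversion needed' marker as 'upgrade required'."""
    return rc == 1 and "Conversion needed" in stdout


if __name__ == "__main__":
    out = "Slurm Database current version is '15' needs to be at '16'. Conversion needed."
    print(parse_versions(out), upgrade_required(1, out))
```

With the values from the log (rc 1 and the message shown), this sketch reports that an upgrade is required, whereas the fact was set to false in this run.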

sjpb commented May 14, 2025

OK, I have pushed stackhpc.openhpc to [feat/upgrade-db a3275a7] fix upgrade logic, and will rerun CI once the runs above have cancelled.

sjpb commented May 15, 2025

Pushed again to [feat/upgrade-db 3050291] fix upgrade logic, then merged with the PR review fix, so the 4th CI run above is now at:

commit 1a55f61305c34ec9c7c504f28fb7fefaaf341da4 (HEAD -> feat/upgrade-db, origin/feat/upgrade-db)
Merge: 3050291 9d3143f
Author: Steve Brasier <[email protected]>
Date:   Thu May 15 08:41:49 2025 +0000

    Merge branch 'feat/upgrade-db' of github.com:stackhpc/ansible-role-openhpc into feat/upgrade-db
(venv) [rocky@steveb-dev stackhpc.openhpc]$ 

sjpb commented May 15, 2025

At this commit, on RL9:

  • The initial deploy skipped the "Check if slurm database requires an upgrade" step and set _openhpc_slurmdb_upgrade: false, as expected.
  • A rerun found slurmdbd active on the control node and skipped all subsequent upgrade logic, as expected.

So that looks good.
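
The two cases above can be summarised as a small decision function. This is only a sketch of the behaviour described in this comment, with hypothetical parameter names; the role itself implements this as Ansible tasks, and the conditions below are an assumption about how those tasks are gated.

```python
def should_upgrade(initial_deploy: bool, slurmdbd_active: bool, check_rc=None) -> bool:
    """Decide whether the database upgrade path should run.

    initial_deploy:  first deployment, where the version check step is skipped.
    slurmdbd_active: slurmdbd is already running, so the upgrade logic is skipped.
    check_rc:        return code of `slurmdbd -u`, assumed to be 1 when conversion is needed.
    """
    if initial_deploy or slurmdbd_active:
        # Both cases reported above: no check result is used, no upgrade happens.
        return False
    return check_rc == 1


if __name__ == "__main__":
    print(should_upgrade(initial_deploy=True, slurmdbd_active=False))
    print(should_upgrade(initial_deploy=False, slurmdbd_active=True, check_rc=1))
    print(should_upgrade(initial_deploy=False, slurmdbd_active=False, check_rc=1))
```

Only the last call, where the database already exists, slurmdbd is stopped and the check reports a conversion, would take the upgrade path in this sketch.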

sjpb commented May 15, 2025

OK, CI rerun #4 above now shows the correct behaviour on the current branch (edited for brevity):

TASK [stackhpc.openhpc : Check if slurm database requires an upgrade] **********
ok: [slurmci-RL9-2440-control] => {
    "changed": false,
    "cmd": [
        "slurmdbd",
        "-u"
    ],
    "delta": "0:00:00.045979",
    "end": "2025-05-15 09:09:24.553876",
    "failed_when_result": false,
    "rc": 1,
    "start": "2025-05-15 09:09:24.507897"
}
STDOUT:
Slurm Database current version is '15' needs to be at '16'. Conversion needed.
MSG:
non-zero return code

TASK [stackhpc.openhpc : Set fact for slurm database upgrade] ******************
Thursday 15 May 2025  09:09:24 +0000 (0:00:00.859)       0:07:41.921 ********** 
ok: [slurmci-RL9-2440-control] => {
    "ansible_facts": {
        "_openhpc_slurmdb_upgrade": true
    },
    "changed": false
}

TASK [stackhpc.openhpc : Backup Slurm database] ********************************
changed: [slurmci-RL9-2440-control -> localhost] => {
    "changed": true,
    "cmd": "openstack volume snapshot create --volume slurmci-RL9-2440-state --force slurmci-RL9-2440-state-20250515T090414",

TASK [stackhpc.openhpc : Run slurmdbd in foreground for upgrade] ***************
...
slurmdbd: debug2: accounting_storage/as_mysql: as_mysql_roll_usage: Everything rolled up
...

and then Slurm starts as normal, so that all looks good.

@sjpb sjpb marked this pull request as ready for review May 15, 2025 09:26
@sjpb sjpb requested a review from a team as a code owner May 15, 2025 09:26
@m-bull m-bull left a comment

LGTM pending the dependent PR

sjpb commented May 15, 2025

Checked that RL9 actually did upgrade at this commit.

@sjpb sjpb requested a review from m-bull May 15, 2025 13:40
@sjpb sjpb merged commit d082d32 into main May 15, 2025
7 checks passed
@sjpb sjpb deleted the feat/slurmdbd-upgrade branch May 15, 2025 13:40