Skip to content

Separate node and partition configuration #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
May 13, 2025
Merged

Separate node and partition configuration #183

merged 23 commits into from
May 13, 2025

Conversation

sjpb
Copy link
Collaborator

@sjpb sjpb commented May 7, 2025

Replaces openhpc_slurm_partitions with openhpc_nodegroups and openhpc_partitions, to make complex partition configurations correct. Previously it would have been possible to to defined nodes twice, with different parameters, and only the first occurance would have been used.

As part of these changes this:

  • Adds nodegroup.node_params to allow specifying additional Node-level parameters
  • Adds nodegroup.features to allow specifying additional Node features
  • Removes openhpc_slurm_partitions.cluster_name
  • Removes openhpc_slurm_partitions.extra_nodes

@sjpb sjpb requested a review from a team as a code owner May 7, 2025 13:17
@sjpb
Copy link
Collaborator Author

sjpb commented May 7, 2025

In the slurm appliance, it needs the default openhpc config and the tf-templated one to be something like:

openhpc_nodegroups:
  - name: compute
openhpc_partitions:
  - name: compute

but, maybe, the latter should actually be in common env as openhpc_partitions: "{{ openhpc_nodegroups }}" to give a partition per nodegroup by default, with no further config.

@jovial
Copy link
Contributor

jovial commented May 7, 2025

Does it resolve the issues with autoconfiguration?

This is definitely workable and seems better than describing properties of nodes on the partition (which could be configured inconsistently when nodes were members of multiple partitions.

Should openhpc_nodegroups actually be called openhpc_nodesets

+1 for consistency

Do we want to extend this with an alternative way of overriding the "top-level" parameters, to make all this breaking change visible in one PR (or, we just fixup release notes after - that's probably better).`

Could you give an example? Just trying to figure out exactly what you mean here.

Do we want to add a validate task which checks no nodegroups overlap hosts? Shouldn't be too hard, I think.

Always good to fail early

@sjpb
Copy link
Collaborator Author

sjpb commented May 8, 2025

Actually @jovial if we're using Features/NodeSets internally we can't let people do openhpc_nodegroups[*].params.Features b/c that'll break our templating so I will have to add an openhpc_nodegroups[*].features which is additive to the nodegroup features.

edit: now done

@sjpb sjpb force-pushed the feat/nodegroups branch from f00d351 to 8d0b617 Compare May 8, 2025 12:56
@sjpb sjpb force-pushed the feat/nodegroups branch from 8d0b617 to 6dabb2f Compare May 8, 2025 13:01
@sjpb sjpb requested a review from jovial May 8, 2025 16:42
@priteau priteau self-requested a review May 9, 2025 09:44
jovial
jovial previously approved these changes May 12, 2025
Copy link
Contributor

@jovial jovial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look sensible to me. Would be interested to hear feedback from people setting per node params and if it works for them.

@sjpb
Copy link
Collaborator Author

sjpb commented May 13, 2025

@priteau do you want to look at this for Will's comment:

Would be interested to hear feedback from people setting per node params and if it works for them.

@sjpb sjpb requested a review from jovial May 13, 2025 08:02
@sjpb sjpb merged commit 0dec9d8 into master May 13, 2025
29 checks passed
@sjpb sjpb deleted the feat/nodegroups branch May 13, 2025 08:19
sjpb added a commit that referenced this pull request May 14, 2025
@sjpb
Copy link
Collaborator Author

sjpb commented May 15, 2025

This was reverted in ed717da to permit #186 to be merged "first".

sjpb added a commit that referenced this pull request May 15, 2025
sjpb added a commit that referenced this pull request May 16, 2025
* re-add "Separate node and partition configuration (#183)"

This reverts commit ed717da.

* readme tweak for clarity
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants