
Add group.node_params to partitions/groups. #182


Merged (1 commit, May 9, 2025)

Conversation

@kbendl commented May 5, 2025

Allows Features, etc., to be added to partitions.
Also adds #comment lines for readability, highlighting the partitions/groups sections.
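
For illustration, usage might look something like this (a sketch only; it assumes node_params can be set directly on a partition entry, and the partition name and feature values are hypothetical, not from this PR):

```yaml
openhpc_slurm_partitions:
  - name: a100            # hypothetical partition
    node_params:          # the new key added by this PR
      Features: gpu,a100  # rendered onto the node definitions
```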

@kbendl kbendl requested a review from a team as a code owner May 5, 2025 18:43
@sjpb (Collaborator) commented May 7, 2025

@jovial this was my suggestion for adding arbitrary node parameters. But having seen your #181, should we actually be adding node parameters at the group-vars level rather than in partitions, to avoid the same clash issue with nodes in multiple partitions?
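
To illustrate the clash concern (a sketch with hypothetical names; the exact schema is not from this PR): if the same group appears in two partitions with different node_params, the per-node settings conflict:

```yaml
openhpc_slurm_partitions:
  - name: batch
    groups:
      - name: a100
        node_params:
          Features: gpu,a100
  - name: debug
    groups:
      - name: a100
        node_params:
          Features: gpu  # same nodes, different value: which wins?
```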

@jovial (Contributor) commented May 7, 2025

> @jovial this was my suggestion for adding arbitrary node parameters. But having seen your #181, should we actually be adding node parameters at the group-vars level rather than in partitions, to avoid the same clash issue with nodes in multiple partitions?

I personally think that using group and host vars would be a more natural way of setting per-host configuration options. I am now wondering whether that is consistent with what we currently do with gres, which is a property on the NodeName line but is defined as part of the partition definition. To switch to the host/group vars approach, I think we'd need to support something like:

```yaml
# group_vars/inventory/a100/openhpc

# fall back to the value defined on the partition if this isn't defined, for backwards compatibility
openhpc_gres:
  - conf: "gpu:A100:2"
    file: /dev/nvidia[0-1]
```
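
A minimal sketch of how that fallback might be resolved when templating the node definitions, assuming Jinja2 with a partition variable in scope (the variable names here are hypothetical):

```jinja
{# prefer the host/group var; fall back to the partition definition #}
{% set gres = openhpc_gres | default(partition.gres | default([])) %}
{% if gres %}Gres={{ gres | map(attribute='conf') | join(',') }}{% endif %}
```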

Are there any other inconsistencies like this?

So you might argue that this PR is consistent with the current approach, and that I should change mine to work similarly. As you commented on my PR, I could make mine error out if an inconsistency was detected. On the other hand, this could be a good opportunity to change course before we describe too many properties in the partition definition, if we think that using the Ansible inventory is the better approach.
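
A minimal sketch of such an error-out check as an Ansible assert task (variable names hypothetical, not from either PR):

```yaml
- name: Fail if gres is defined both on a partition and via inventory vars
  ansible.builtin.assert:
    that: not (item.gres is defined and openhpc_gres is defined)
    fail_msg: "gres for partition '{{ item.name }}' is set in both places"
  loop: "{{ openhpc_slurm_partitions }}"
```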

If using host vars/group vars, it could look something like this to the end user:

```yaml
# group_vars/inventory/a100/openhpc
openhpc_node_params:
  Features: "gpu,a100"
```
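
Assuming the role then templates these key/value pairs onto each node line, the rendering could be as simple as (a sketch, not the role's actual template):

```jinja
NodeName={{ node }} {% for k, v in openhpc_node_params.items() %}{{ k }}={{ v }} {% endfor %}
```

which, for the example above, would produce something like NodeName=a100-01 Features=gpu,a100 (hostname hypothetical).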

@sjpb (Collaborator) commented May 7, 2025

Using inventory does seem more "correct", TBH. I think the complication is that the inventory groups referenced in openhpc_slurm_partitions actually have an openhpc_cluster_name + '_' prefix, which is annoying in the appliance, where openhpc_cluster_name is usually used to distinguish staging from production. Obviously, per-node vars could be defined for inventory groups which don't have those prefixes, but currently TF doesn't auto-define those groups (I was wondering if it should, for another client where we need to set vars for OS-level things on the tf group anyway).
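
Concretely (hypothetical names): with openhpc_cluster_name: production, a group a100 referenced in openhpc_slurm_partitions maps to the inventory group production_a100, so the group vars end up tied to the cluster name and duplicated per environment:

```yaml
# group_vars/production_a100.yml  (and again as group_vars/staging_a100.yml)
openhpc_node_params:
  Features: gpu,a100
```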

@sjpb (Collaborator) commented May 7, 2025

I don't think I mind that removing gres from the partition config would be backwards-incompatible, at least (although I do generally try hard to avoid that). Between the people in this thread we've got awareness of all the appliances using it.

@sjpb sjpb changed the base branch from feat/no-ohpc to feat/no-ohpc-node_params May 9, 2025 09:29
@sjpb sjpb merged commit 64ed86c into stackhpc:feat/no-ohpc-node_params May 9, 2025
4 of 34 checks passed
@sjpb (Collaborator) commented May 9, 2025

Merging so I can fix CI

sjpb added a commit that referenced this pull request May 9, 2025
* Add group.node_params to partitions/groups. (#182)

Allows Features, etc., to be added to partitions.

* update SelectType from legacy to current default (#167)

---------

Co-authored-by: Kurt Bendl <[email protected]>