
ch4: use MPIR_pmi_allgather_group in address exchange #7392


Merged
merged 12 commits into from
Apr 30, 2025

Conversation

hzhou
Contributor

@hzhou hzhou commented Apr 21, 2025

Pull Request Description

True sessions require address exchange within a group of processes. This PR supports that by:

  • Add PMI_Barrier_group to PMI-1.2 API
  • Support PMI-1.2 in hydra
  • Add MPIR_pmi_allgather_group
    • PMI-1 uses PMI_Barrier_group
    • PMIx supports it with PMIx_Fence over an explicit set of procs
  • PMI-2 results in an error
  • Add MPIDIU_bc_exchange_node_roots to avoid duplicate code in each netmod
  • Support group level address exchange in OFI and UCX
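
The pieces above combine into the following exchange flow (pseudocode; function names are illustrative, not the actual MPICH signatures):

```
/* Pseudocode: group-scoped business-card exchange (illustrative names only) */
put(kvs, key(my_rank), my_business_card);   /* publish local address */
barrier_group(group);                       /* PMI_Barrier_group, or PMIx_Fence
                                               over the explicit set of procs */
for each rank r in group:
    addr[r] = get(kvs, key(r));             /* collect peer addresses */
```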

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou hzhou mentioned this pull request Apr 21, 2025
@hzhou hzhou force-pushed the 2504_pmi_group branch 5 times, most recently from 19acb5b to b4dabef Compare April 22, 2025 01:50
@hzhou
Contributor Author

hzhou commented Apr 22, 2025

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou hzhou force-pushed the 2504_pmi_group branch 4 times, most recently from 75eb799 to 7186b4e Compare April 24, 2025 12:03
@hzhou
Contributor Author

hzhou commented Apr 24, 2025

test:mpich/pmi

@hzhou
Contributor Author

hzhou commented Apr 24, 2025

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou
Contributor Author

hzhou commented Apr 25, 2025

test:mpich/ch4/most

@hzhou hzhou requested a review from raffenet April 28, 2025 19:25
Contributor

@raffenet raffenet left a comment


minor comments but no blockers

Comment on lines +214 to +222
/* convert the int array into a comma-separated int list */
group_str = MPL_malloc(nprocs * 8, MPL_MEM_OTHER); /* assume each integer fits in 8 chars */
char *s = group_str;
for (int i = 0; i < nprocs; i++) {
    int n = sprintf(s, "%d,", procs[i].rank);
    s += n;
}
/* overwrite the last comma */
s[-1] = '\0';


This is fairly close to MPL_strjoin, but since it starts with an int array it's not quite a fit. We could extend the join function down the road to cover this use case. Nothing to do right now.
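
As a follow-up thought, a bounds-safe variant of the snippet above might look like this (a sketch only; `int_join` is a hypothetical helper, not an existing MPL function):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: join an int array into a comma-separated list.
 * Unlike the reviewed snippet, it sizes the buffer for the worst case:
 * INT_MIN prints as 11 chars, plus one char for the separator. */
static char *int_join(const int *vals, int n)
{
    char *buf = malloc((size_t) n * 12 + 1);
    if (buf == NULL || n == 0) {
        free(buf);
        return NULL;
    }
    char *s = buf;
    for (int i = 0; i < n; i++) {
        /* prefix a comma before every element after the first */
        s += sprintf(s, "%s%d", i ? "," : "", vals[i]);
    }
    return buf;
}
```

A future MPL_strjoin extension could instead take a per-element formatter callback, avoiding the need to stringify each element up front.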

hzhou added 12 commits April 30, 2025 14:35
Skip optional attributes when the value is the default. This provides better backward compatibility, as we won't generate unrecognized attributes when the *new* optional features are not used.

In order to support new PMI features while maintaining backward compatibility, we need to track server PMI versions from the init response.

PMI_Barrier_group runs a barrier within a group of processes specified by an array of processes. Constants such as PMI_GROUP_WORLD, PMI_GROUP_NODE, and PMI_GROUP_NODE_ROOTS can be used to represent commonly used special groups.

Old servers may not support PMI 1.2 and will fail at init, so use 1.1 for backward compatibility. Servers that support newer versions should set the version in their response; the client can then gracefully handle new features (e.g. PMI_Barrier_group). Update the version check in PMI_Init.

In preparation for supporting group PMI_Barrier, we need functions to convert from a list of global ranks to local ranks.

In preparation for supporting group PMI_Barrier, we need a way to map from a list of global ranks to proxies in a pg. The rankmap maps from global rank to node_id, and we need a min_node_id offset to map from node_id to the index in pg->proxy_list.

The PMI_Barrier wire command may carry an optional "group=..." attribute, which restricts the barrier to the process group it specifies. The group string contains either a special name such as "WORLD" or "NODE", or a comma-separated list of world ranks. If the group attribute is missing, it defaults to "WORLD", which is backward compatible with previous PMI-1 versions (1.0 and 1.1).
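
For illustration, the wire command might be rendered as follows (hypothetical examples based on the description above; the exact syntax is defined by the PMI-1 wire protocol):

```
cmd=barrier_in                  <- no attribute: defaults to group=WORLD
cmd=barrier_in group=NODE       <- barrier over processes on the same node
cmd=barrier_in group=0,3,7      <- barrier over world ranks 0, 3, and 7
```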
Add an example using/testing PMI_Barrier_group.
Add support for a procs array (one that is not just PMIX_RANK_WILDCARD) using the PMI_Barrier_group wire protocol.

Add MPIR_pmi_allgather_group and MPIR_pmi_barrier_group to support allgathering values over a group of processes via PMI.

Because the groups may overlap, controlling the access window would normally require a double barrier -- one between put and get and one at the end. However, our use case is bootstrapping business card exchange. The business card is a constant for each process, so multiple allgather access windows won't corrupt its value. Thus, we skip the second barrier when a "name" is provided.
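
The access-window reasoning above can be sketched as (pseudocode; names are illustrative, not the actual MPICH API):

```
/* Pseudocode: allgather over a (possibly overlapping) group */
put(key(my_rank), my_value);
barrier_group(group);            /* all puts visible before any get */
for each rank r in group:
    value[r] = get(key(r));
if no "name" was provided:
    barrier_group(group);        /* second barrier: protects the window from
                                    the next exchange; skippable when values
                                    are per-process constants keyed by name */
```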
Use MPIR_pmi_allgather_group in MPIDI_{OFI,UCX}_comm_addr_exchange when it is supported. Fall back to MPIR_pmi_allgather when it is not, and assert that the exchange is only called on comm_world.

Also fix the wrong type of addrname in ofi/init_addrxchg.c.
@hzhou hzhou merged commit e68768e into pmodels:main Apr 30, 2025
4 checks passed
@hzhou hzhou deleted the 2504_pmi_group branch April 30, 2025 19:39