coll tuned dynamic rules file alltoall_algorithm_max_requests #12827

Merged

@burlen burlen commented Sep 26, 2024

Teach the coll tuned dynamic rules file reader to look for the alltoall_algorithm_max_requests tuning parameter. To keep the dynamic rules file format backward compatible, alltoall_algorithm_max_requests is optional. When it is not present in a rule definition, the value of the corresponding MCA variable is used instead.

Resolves #12589
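
The fallback described above can be sketched as follows. This is a minimal Python model, not the Open MPI C code: `Rule`, `resolve_max_requests`, and `mca_default` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One message-size rule (field names are illustrative)."""
    msg_size: int
    algorithm: int
    topo: int
    segsize: int
    max_requests: Optional[int] = None  # None: not given in the rules file


def resolve_max_requests(rule: Rule, mca_default: int) -> int:
    """Use the rule's own value when present, else the MCA variable's value."""
    if rule.max_requests is None:
        return mca_default
    return rule.max_requests
```

Note that an explicit 0 in the rules file is treated as a value in this model, so only a missing field triggers the fallback.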

Author

burlen commented Oct 14, 2024

@janjust @jsquyres could one of you give a review, or advise on who could?

I'm reaching out to you as you are tagged on the issue (#12589).

For context: on our cluster, OpenMPI's fixed decision infrastructure does not choose good transitions between all-to-all algorithms. I found that OpenMPI's linear_sync algorithm, with its parameter N (the maximum number of requests, alltoall_algorithm_max_requests), is the Swiss Army knife implementation: with N=1 it behaves like the pairwise algorithm, and with N=0 it behaves like the linear algorithm. On our cluster, the best performance was achieved by using linear_sync and tuning N. Unfortunately, the rules file tuning mechanism in OpenMPI does not support setting N. This patch aims to address that. My goal is to be able to distribute a rules file with our 3D DNS turbulence code that optimizes all-to-all performance on our cluster.
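
To illustrate how N bounds concurrency, here is a deliberately simplified Python model. It is an assumption-laden sketch, not the Open MPI implementation: `request_batches` is a hypothetical helper, the peer ordering is invented, and the real algorithm may refill completed requests one at a time rather than in fixed batches.

```python
def request_batches(comm_size, rank, max_requests):
    """Group a rank's peers into batches of at most max_requests exchanges.

    Assumption: max_requests <= 0 (or larger than the number of peers)
    means "all peers at once", mirroring the N=0 case described above.
    """
    peers = [(rank + step) % comm_size for step in range(1, comm_size)]
    if not peers:
        return []
    if max_requests <= 0 or max_requests > len(peers):
        window = len(peers)
    else:
        window = max_requests
    return [peers[i:i + window] for i in range(0, len(peers), window)]
```

With N=0 every peer lands in a single batch (linear-like), while N=1 yields one peer per step (pairwise-like); intermediate values trade concurrency against congestion.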

About this patch: I was not sure whether a backward-incompatible change to the rules file format would be accepted. That is why I made the new parameter optional. Old rules files will work unmodified. However, I can see pros and cons to both approaches, and I would be happy to revise to address any concerns.

@burlen burlen force-pushed the dynamic_decision_alltoall_max_requests branch from b1767b2 to 7b505b2 Compare October 26, 2024 18:27
Author

burlen commented Oct 26, 2024

No new changes. I rebased onto main so that this patch doesn't fall too far behind.

Here's the rules file that I used to test on TACC Vista's Grace-Hopper partition: vista_gh_rules.txt
I tested with this program: simple.F90.txt

Some output

mpirun -n 3 --mca coll_tuned_verbose 60 --mca coll_ucc_enable 0 --mca coll_hcoll_enable 0 --mca coll_tuned_priority 100 --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename vista_gh_rules.txt --map-by ppr:1:package:PE=72 --bind-to core ./simple 1024 0
...
3 procs 1024 KB cnt 256000 tot 3072000 B
[c609-031.vista.tacc.utexas.edu:734365] coll_tuned_dynamic_rules.c:382 - ompi_coll_tuned_get_target_method_params() Selected message rule id 7
[c609-031.vista.tacc.utexas.edu:734365] coll_tuned_alltoall_decision.c:184 - ompi_coll_tuned_alltoall_intra_do_this() Selected algorithm 4 (linear_sync) topo faninout 0 segsize 0 max requests 2
...

and again for a different message size

mpirun -n 3 --mca coll_tuned_verbose 60 --mca coll_ucc_enable 0 --mca coll_hcoll_enable 0 --mca coll_tuned_priority 100 --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename vista_gh_rules.txt --map-by ppr:1:package:PE=72 --bind-to core ./simple 2048 0
...
3 procs 2048 KB cnt 512000 tot 6144000 B
[c609-031.vista.tacc.utexas.edu:734316] coll_tuned_dynamic_rules.c:382 - ompi_coll_tuned_get_target_method_params() Selected message rule id 8
[c609-031.vista.tacc.utexas.edu:734316] coll_tuned_alltoall_decision.c:184 - ompi_coll_tuned_alltoall_intra_do_this() Selected algorithm 4 (linear_sync) topo faninout 0 segsize 0 max requests 3
...

All seems to be working. I also tested the case where max requests is not specified, by removing it from every other line in the rules file, and verified that the command-line setting --mca coll_tuned_alltoall_algorithm_max_requests is used instead.

Author

burlen commented Oct 26, 2024

The failed test in the CI run above is not related to the patch; it was caused by one of the CI systems being offline.

Member

@bosilca bosilca left a comment

My main concern here is what happens when an older version (say, one based on 4.x) reads a configuration file that contains this extra value. Will it just ignore everything until the EOL, or will it consider the file incorrect and stop reading it completely?

Review thread on ompi/mca/coll/base/coll_base_util.c (outdated, resolved)
Contributor

lrbison commented Oct 28, 2024

Wenduo and I discussed an idea to re-implement the tuning file format in JSON, especially now that we have opal/util/json. Our motivation was that for allreduce we want to use a different algorithm when the communicator is "disjoint" than when it is not, but we could not find a way to indicate this in the tuning file without breaking the existing format.

However, I haven't had time to research it enough to come up with even a strong suggestion for a JSON format. I do believe that if we could agree on a new format, the implementation to read a new JSON file could be completed in only a few days.

@burlen burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 7b505b2 to 5437cac Compare November 1, 2024 16:20
Author

burlen commented Nov 1, 2024

> My main concern here is what is happening with an older version (let's say based on the 4.x) reading a configuration file that does contain this extra value? Will it just ignore everything until the EOL or will it consider it incorrect and stop completely reading the configuration file?

I'd like this patch to be applied to both the 4.x and 5.x branches, because I use both of them depending on what the admins of the systems I run on have decided.

Most importantly, a config file that works with the older release of OpenMPI will still work going forward.

Your point is that a new config file with the extra value will not work with an older release of OpenMPI. That's a limitation of the parser. I can't go back in time and rewrite it to be version aware. However, the release version below which the feature does not work could be documented in the user guide. Would this resolve your concern?

I would also be willing to make the parser version aware. It's a relatively easy change that would prevent this issue in the future.

Author

burlen commented Nov 1, 2024

> Wenduo and I discussed an idea to re-implement a tuning file format in json

I find YAML to be a bit easier to write than JSON. It's a personal preference, though. Either way it would be a nice improvement.

Member

bosilca commented Nov 1, 2024

I went through the selection code, and if my reading (and recollection) of that code is correct, we have a problem.

The loop reading the rules expects 4 longs per rule (message size, algorithm, fanin and segment size). Assuming we have 2 correct rules from an old configuration file, here are the tokens that will exist in the configuration file:

1024 0 2 1024
100000 1 2 1024

If we try to read this configuration file with the code from this PR, we will get the wrong output because the function that reads the next token (getnext) ignores the newline. So after reading the last 1024 on the first line, isdigit will return true because 1 is indeed a digit, and the parser will read 100000 as the max requests.

We will have a similar issue if we are reading a new file with the old parser. Here I added a max request at the end of each message rule.

1024 0 2 1024 3
100000 1 2 1024 4

The old parser will read 3 as the message size for the next rule instead of simply ignoring it.
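
The misread described above can be reproduced with a small Python model. It assumes only what the comment states, namely that the old reader treats newlines like any other whitespace and consumes exactly four integers per rule; `old_parser` is a hypothetical stand-in, not the Open MPI function.

```python
def old_parser(text, num_rules):
    """Model of a whitespace-agnostic reader taking 4 integers per rule."""
    tokens = [int(tok) for tok in text.split()]  # newlines are just whitespace
    rules = []
    for i in range(num_rules):
        msg_size, alg, fanin, segsize = tokens[4 * i:4 * i + 4]
        rules.append((msg_size, alg, fanin, segsize))
    return rules


# New-format rules, each with a trailing max requests value.
new_file = "1024 0 2 1024 3\n100000 1 2 1024 4\n"
rules = old_parser(new_file, 2)
# The second rule starts at the leftover token 3, so every field shifts by one.
```

In this model the second rule comes out as message size 3 with algorithm 100000, which is the failure mode the comment warns about.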

@burlen
Copy link
Author

burlen commented Nov 2, 2024

> Assuming we have 2 correct rules from an old configuration file, here are the tokens that will exist in the configuration file:
>
> 1024 0 2 1024
> 100000 1 2 1024
>
> If we try to read this configuration file with the code from this PR, we will get the wrong output because the function that reads the next token (getnext) ignores the newline. So after reading the last 1024 on the first line, isdigit will return true because 1 is indeed a digit, and the parser will read 100000 as the max requests.

You're wrong about that; here's why. getnext reads 1024, leaving the newline in the stream. isnext_digit then sees the newline and returns false: it skips spaces and tabs when looking for the next token, but not newlines (coll_base_util.c:494), so it always stops at the end of the current line. This is how backward compatibility is ensured. I've explicitly tested it and know that it works. Would you please take another look?
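
The line-bounded lookahead described above can be modeled as follows. This is a Python sketch under stated assumptions: the method names getnext and isnext_digit come from the discussion, but their bodies here are a guess at the behavior, and `Reader` and `read_rules` are hypothetical.

```python
class Reader:
    """Character-level reader over an in-memory rules string."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getnext(self):
        """Skip any whitespace (newlines included), then read one integer."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        return int(self.text[start:self.pos])

    def isnext_digit(self):
        """Skip spaces and tabs only; a newline ends the search."""
        while self.pos < len(self.text) and self.text[self.pos] in " \t":
            self.pos += 1
        return self.pos < len(self.text) and self.text[self.pos].isdigit()


def read_rules(text, num_rules):
    """Read 4 required fields per rule plus an optional same-line fifth."""
    reader = Reader(text)
    rules = []
    for _ in range(num_rules):
        fields = [reader.getnext() for _ in range(4)]
        max_requests = reader.getnext() if reader.isnext_digit() else None
        rules.append((*fields, max_requests))
    return rules
```

Because the lookahead never crosses a newline, an old four-column file yields None for every optional field, and a file that mixes four- and five-column lines is read line by line as intended.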

Author

burlen commented Nov 2, 2024

> We will have a similar issue if we are reading a new file with the old parser. Here I added a max request at the end of each message rule.
>
> 1024 0 2 1024 3
> 100000 1 2 1024 4
>
> The old parser will read 3 as the message size for the next rule instead of simply ignoring it.

I found one way to solve this and have added a commit to the PR with the fix (2da8e6d). It adds parser support for a version identifier on the first line of the file, of the form rule-file-version-N, where N is an unsigned integer. When the identifier is present, the old parser will gracefully fall back to the fixed decision mechanism.
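
A minimal Python sketch of the version header follows. Two points are assumptions rather than facts from the patch: that a file without a header is treated as version 1, and that the old parser rejects the file because its first token is not an integer. `read_version` and `old_parser_accepts` are hypothetical names.

```python
import re


def read_version(text):
    """Return N from a first-line 'rule-file-version-N' header, else 1."""
    first_line = text.split("\n", 1)[0].strip()
    match = re.fullmatch(r"rule-file-version-(\d+)", first_line)
    return int(match.group(1)) if match else 1


def old_parser_accepts(text):
    """Model of a pre-versioning reader that expects an integer count first."""
    tokens = text.split()
    return bool(tokens) and tokens[0].isdigit()
```

Under these assumptions a versioned file is rejected outright by the old reader instead of being silently misparsed, which is what makes the fallback to fixed decisions safe.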

Author

burlen commented Nov 3, 2024

Here is a simple working example of a rules file that can be used for testing.

rule-file-version-2
1   # num of collectives
3   # Rules for all-to-all
2   # number of sets of rules
#======================
0    # comm size
1    # number of rules
# Bytes    alg - - reqs
#----------------------
0            0 0 0 0
#=====================
2    # comm size
14   # number of rules
# Bytes    alg - - reqs
#----------------------
0            0 0 0
8000         2 0 0
24000        4 0 0 2
48000        2 0 0
96000        4 0 0 4
192000       2 0 0
384000       4 0 0 6
768000       2 0 0
1536000      4 0 0 8
3072000      2 0 0
6144000      4 0 0 10
12288000     2 0 0
24576000     4 0 0 12
40960000     0 0 0
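
To make the layout of the example concrete, here is a line-based Python sketch that reads an abbreviated copy of it (only the first three rules of the second set are kept, so the rule count is 3 instead of 14). It is not how the C parser works, which reads tokens from a stream; `data_lines` and `parse_rules_file` are hypothetical names, and the default of version 1 for files without a header is an assumption.

```python
ABBREVIATED_EXAMPLE = """\
rule-file-version-2
1   # num of collectives
3   # Rules for all-to-all
2   # number of sets of rules
#======================
0    # comm size
1    # number of rules
0            0 0 0 0
#=====================
2    # comm size
3    # number of rules (abbreviated from 14)
0            0 0 0
8000         2 0 0
24000        4 0 0 2
"""


def data_lines(text):
    """Split each line into fields, dropping '#' comments and blank lines."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            yield fields


def parse_rules_file(text):
    lines = list(data_lines(text))
    version = 1  # assumed default when no header is present
    if lines and lines[0][0].startswith("rule-file-version-"):
        version = int(lines.pop(0)[0].rsplit("-", 1)[1])
    rows = iter(lines)

    def count():
        return int(next(rows)[0])

    table = {}
    for _ in range(count()):                          # number of collectives
        by_comm_size = table.setdefault(count(), {})  # collective id
        for _ in range(count()):                      # number of rule sets
            comm_size = count()
            rules = []
            for _ in range(count()):                  # rules in this set
                nums = [int(field) for field in next(rows)]
                # Only version 2+ files may carry the optional fifth field.
                reqs = nums[4] if version >= 2 and len(nums) > 4 else None
                rules.append((*nums[:4], reqs))
            by_comm_size[comm_size] = rules
    return version, table


version, table = parse_rules_file(ABBREVIATED_EXAMPLE)
```

Here `table[3][2]` holds the rules for collective 3 at communicator size 2, with None marking rules that leave max requests to the MCA variable.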

Member

@bosilca bosilca left a comment

I see that you initialized version, but I don't see it used in the code. If you protect the use of isnext_digit with a check on the version, it should work better.

Review thread on ompi/mca/coll/tuned/coll_tuned_dynamic_file.c (outdated, resolved)
@burlen burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 2da8e6d to 7884ae8 Compare November 6, 2024 00:30
Member

bosilca commented Nov 6, 2024

@burlen all looks good. Can you please add or update the copyright in the files you touched? Something like this:

* Copyright (c) 2024      NVIDIA CORPORATION. All rights reserved.

@burlen burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 7884ae8 to f302969 Compare November 7, 2024 00:11
Burlen Loring added 2 commits November 6, 2024 18:16
Teach the dynamic rules file reader to look for the
alltoall_algorithm_max_requests tuning parameter.  To keep the dynamic rules
file format backward compatible the alltoall_algorithm_max_requests is
optional. When not present in the rule definition the value of the
corresponding MCA variable is used instead.

Resolves open-mpi#12589

Signed-off-by: Burlen Loring <[email protected]>
The version identifier is optional, but when provided it must appear on the
first line and have the format `rule-file-version-N`, where N is an unsigned
integer. Older versions of the parser will fall back to the fixed decision
mechanism when this line is present. Version 1 is the original format;
version 2 adds support for the optional
coll_tuned_alltoall_algorithm_max_requests specification.

Signed-off-by: Burlen Loring <[email protected]>
@burlen burlen force-pushed the dynamic_decision_alltoall_max_requests branch from f302969 to f6387a4 Compare November 7, 2024 00:24
Author

burlen commented Nov 7, 2024

Updated the copyright notices and rebased onto main.

@bosilca bosilca merged commit 0f68484 into open-mpi:main Nov 7, 2024
15 checks passed
Successfully merging this pull request may close these issues.

coll_tuned_dynamic_rules_filename option no way to set alltoall_algorithm_max_requests from the rules file
4 participants