coll tuned dynamic rules file alltoall_algorithm_max_requests #12827

burlen · 2024-09-26T19:56:10Z

Teach the coll tuned dynamic rules file reader to look for the alltoall_algorithm_max_requests tuning parameter. To keep the dynamic rules file format backward compatible, the alltoall_algorithm_max_requests field is optional. When it is not present in a rule definition, the value of the corresponding MCA variable is used instead.
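A minimal sketch of the backward-compatible rule-line parsing described above, assuming each rule is available as a single line of text. The names `struct msg_rule` and `parse_rule_line`, and the `mca_default` argument standing in for the MCA variable's value, are hypothetical and are not Open MPI's actual reader:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one message-size rule. */
struct msg_rule {
    long msg_size;     /* message size threshold in bytes */
    long alg;          /* algorithm id */
    long faninout;     /* topology fan-in/out */
    long segsize;      /* segment size */
    long max_requests; /* alltoall_algorithm_max_requests */
};

/* Parse "msg_size alg faninout segsize [max_requests]" from one line.
 * Returns 0 on success, or -1 if fewer than four fields could be read.
 * If the optional fifth field is absent, max_requests is set to
 * mca_default, which stands in for the MCA variable's value. */
static int parse_rule_line(const char *line, long mca_default,
                           struct msg_rule *out)
{
    long v[5];
    const char *p = line;
    char *end;
    int n = 0;

    assert(line != NULL && out != NULL);
    while (n < 5) {
        errno = 0;
        long x = strtol(p, &end, 10);
        /* Stop at the first token that is not an in-range integer,
         * e.g. a trailing "# comment" or the end of the string. */
        if (end == p || errno != 0)
            break;
        v[n++] = x;
        p = end;
    }
    if (n < 4)
        return -1;

    out->msg_size = v[0];
    out->alg = v[1];
    out->faninout = v[2];
    out->segsize = v[3];
    out->max_requests = (n == 5) ? v[4] : mca_default;
    return 0;
}
```

Because the fifth field is consumed only when it is present, old four-field rules parse exactly as before.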

Resolves #12589

burlen · 2024-10-14T17:09:35Z

@janjust @jsquyres could one of you give a review, or advise on who could?

I'm reaching out to you as you are tagged on the issue (#12589).

For context: on our cluster, OpenMPI's fixed decision infrastructure does not choose good transition points between all-to-all algorithms. I found that OpenMPI's linear_sync algorithm, parameterized by N (max requests), is the Swiss Army knife implementation: with N=1 it behaves like the pairwise algorithm, and with N=0 it behaves like the linear algorithm. On our cluster, the best performance was achieved by using linear_sync and tuning N. Unfortunately, the rules file tuning mechanism in OpenMPI does not support setting N in the rules file. This patch aims to address that. My goal is to be able to distribute a rules file with our 3D DNS turbulence code that optimizes all-to-all performance on our cluster.
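To illustrate the N knob, here is a simplified model, assuming linear_sync keeps at most N peer exchanges in flight and treats N <= 0, or N larger than the number of peers, as "post all exchanges at once". `effective_window` and `num_phases` are hypothetical helpers, and `num_phases` pretends each window drains completely before the next one starts, which a real implementation need not do:

```c
#include <assert.h>

/* Hypothetical: number of peer exchanges kept in flight by a
 * linear_sync-style all-to-all on comm_size ranks. N <= 0, or N
 * larger than the number of peers, posts all peers at once
 * (linear-like); N == 1 talks to one peer at a time (pairwise-like). */
static int effective_window(int comm_size, int max_requests)
{
    assert(comm_size >= 1);
    int peers = comm_size - 1;
    if (max_requests <= 0 || max_requests > peers)
        return peers;
    return max_requests;
}

/* Number of windows needed if each window is fully drained before the
 * next is posted. Treat this as a rough model, not a cost estimate. */
static int num_phases(int comm_size, int max_requests)
{
    int peers = comm_size - 1;
    int w = effective_window(comm_size, max_requests);
    if (peers == 0)
        return 0; /* single rank: nothing to exchange */
    return (peers + w - 1) / w;
}
```

For example, on 8 ranks N=0 gives one phase of 7 exchanges (linear-like), while N=1 gives 7 single-peer phases (pairwise-like).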

About this patch: I was not sure whether a backward-incompatible change to the rules file format would be accepted, which is why I made the new parameter optional: old rules files work unmodified. That said, I can see pros and cons to both approaches, and I would be happy to revise to address any concerns.

burlen · 2024-10-26T19:47:53Z

No new changes; I rebased from main so that this patch doesn't fall too far behind.

Here's the rules file that I used to test on TACC Vista's Grace-Hopper partition:
vista_gh_rules.txt
I tested with this program
simple.F90.txt

Some output

mpirun -n 3 --mca coll_tuned_verbose 60 --mca coll_ucc_enable 0 --mca coll_hcoll_enable 0 --mca coll_tuned_priority 100 --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename vista_gh_rules.txt --map-by ppr:1:package:PE=72 --bind-to core ./simple 1024 0
...
3 procs 1024 KB cnt 256000 tot 3072000 B
[c609-031.vista.tacc.utexas.edu:734365] coll_tuned_dynamic_rules.c:382 - ompi_coll_tuned_get_target_method_params() Selected message rule id 7
[c609-031.vista.tacc.utexas.edu:734365] coll_tuned_alltoall_decision.c:184 - ompi_coll_tuned_alltoall_intra_do_this() Selected algorithm 4 (linear_sync) topo faninout 0 segsize 0 max requests 2
...

And again for a different message size:

mpirun -n 3 --mca coll_tuned_verbose 60 --mca coll_ucc_enable 0 --mca coll_hcoll_enable 0 --mca coll_tuned_priority 100 --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename vista_gh_rules.txt --map-by ppr:1:package:PE=72 --bind-to core ./simple 2048 0
...
3 procs 2048 KB cnt 512000 tot 6144000 B
[c609-031.vista.tacc.utexas.edu:734316] coll_tuned_dynamic_rules.c:382 - ompi_coll_tuned_get_target_method_params() Selected message rule id 8
[c609-031.vista.tacc.utexas.edu:734316] coll_tuned_alltoall_decision.c:184 - ompi_coll_tuned_alltoall_intra_do_this() Selected algorithm 4 (linear_sync) topo faninout 0 segsize 0 max requests 3
...

All seems to be working. I also tested the case where max requests is not specified in the rules file, by removing it from every other line, and verified that the command-line setting --mca coll_tuned_alltoall_algorithm_max_requests is used instead.

burlen · 2024-10-26T20:00:11Z

The failed test in the CI run above is not related to the patch; it is due to one of the CI systems being offline.

bosilca

My main concern here is what happens when an older version (let's say based on 4.x) reads a configuration file that does contain this extra value. Will it just ignore everything until the EOL, or will it consider the file incorrect and stop reading it completely?

ompi/mca/coll/base/coll_base_util.c

lrbison · 2024-10-28T17:07:39Z

Wenduo and I discussed an idea to re-implement the tuning file format in JSON, especially now that we have opal/util/json. Our motivation was that for allreduce we want to use a different algorithm when the communicator is "disjoint" vs. not, but we could not find a way to indicate this in the tuning file without breaking the existing format.

However I haven't had time to research it enough to come up with even a strong suggestion for a json format. I do believe if we could agree on a new format, then the implementation to read a new json file could be complete in only a few days.

burlen · 2024-11-01T16:56:03Z

My main concern here is what is happening with an older version (let's say based on the 4.x) reading a configuration file that does contain this extra value ? Will it just ignore everything until the EOL or will it consider it incorrect and stop completely reading the configuration file?

I'd like this patch to be applied to both the 4.x and 5.x branches, because I use both, depending on what the admins of the systems I run on have installed.

Most importantly, a config file that works with the older release of OpenMPI will still work going forward.

Your point is that a new config file with the extra value will not work with an older release of OpenMPI. That's a limitation of the existing parser; I can't go back in time and make it version aware. However, the first release that supports the feature could be documented in the user guide. Would that resolve your concern?

I would also be willing to make the parser version aware. It's a relatively easy change that would prevent this issue in the future.

burlen · 2024-11-01T17:18:18Z

Wenduo and I discussed an idea to re-implement a tuning file format in json

I find YAML a bit easier to write than JSON. It's personal preference, though. Either way, it would be a nice improvement.

bosilca · 2024-11-01T18:05:01Z

I went through the selection code, and if my reading (and recollection) of that code is correct, we have a problem.

The loop reading the different rules expects 4 longs per rule (message size, algorithm, fan-in and segment size). Assuming we have 2 correct rules from an old configuration file, here are the tokens that will exist in the configuration file:

1024 0 2 1024
100000 1 2 1024

If we try to read this configuration file with the code from this PR, we will get the wrong output, because the function that reads the next token (getnext) ignores the newline. So after reading the last 1024 on the first line, isdigit will return true because 1 is indeed a digit, and the parser will read 100000 as the max requests.
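The failure mode described above can be reproduced with a hypothetical newline-blind reader: `strtol` skips all whitespace, including `\n`, so a greedy check for a fifth number sees the next rule's message size. `read_rule_newline_blind` is an illustration of the concern, not Open MPI's getnext:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical newline-blind reader: strtol skips all whitespace,
 * including '\n', so line boundaries are invisible to it. Reads the
 * four required fields and then greedily takes a fifth numeric token.
 * Returns the number of fields read and advances *cursor. */
static int read_rule_newline_blind(const char **cursor, long fields[5])
{
    char *end;
    int n = 0;

    assert(cursor != NULL && *cursor != NULL);
    while (n < 5) {
        long v = strtol(*cursor, &end, 10);
        if (end == *cursor)
            break; /* no numeric token left */
        fields[n++] = v;
        *cursor = end;
    }
    return n;
}
```

On the two old-format rules above, the first call returns five fields with 100000 as the fifth, leaving only `1 2 1024` for the next rule.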

We will have a similar issue if we are reading a new file with the old parser. Here I added a max request at the end of each message rule.

1024 0 2 1024 3
100000 1 2 1024 4

The old parser will read 3 as the message size for the next rule instead of simply ignoring it.
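Likewise, a hypothetical fixed four-field reader, standing in for the old parser, drifts out of alignment as soon as one rule carries a fifth value. `read_rule_fixed4` is an illustrative name only:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-width reader modeled on the old format: every
 * rule is exactly four whitespace-separated integers, and newlines
 * carry no meaning. Returns 1 if a full rule was read, 0 otherwise. */
static int read_rule_fixed4(const char **cursor, long fields[4])
{
    char *end;

    assert(cursor != NULL && *cursor != NULL);
    for (int i = 0; i < 4; i++) {
        long v = strtol(*cursor, &end, 10);
        if (end == *cursor)
            return 0; /* ran out of numbers */
        fields[i] = v;
        *cursor = end;
    }
    return 1;
}
```

Reading `1024 0 2 1024 3` followed by `100000 1 2 1024 4`, the second call yields a message size of 3 and an algorithm of 100000.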

burlen · 2024-11-02T00:30:51Z

Assuming we have 2 correct rules from an old configuration file here are the tokens that will exists in the configuration file:
1024 0 2 1024
100000 1 2 1024
If we try to read this configuration file with the code from this PR, we will get the wrong output because the function to read the next token (getnext) ignores the newline. So after reading the last 1024 on the first line, the isdigit will return true because 1 is indeed a digit, so the parser will read 100000 as the max requests.

You're wrong about that; here's why. getnext reads 1024, leaving the newline in the stream, so isnext_digit sees the newline and returns false. isnext_digit skips spaces and tabs when looking for the next token, but not newlines (coll_base_util.c:494), so it always stops at the end of the current line. This is how backward compatibility is preserved. I've explicitly tested it and know that it works. Would you please take another look?
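A sketch of the line-aware check described here, assuming a `FILE *` stream and ignoring the comment handling the real reader performs. `next_is_digit_on_line` and `read_rule` are hypothetical names; the key point is that `fscanf("%ld")` leaves the newline in the stream and the peek refuses to step over it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: skip spaces and tabs (but NOT newlines) and
 * report whether the next character on the current line is a digit.
 * The peeked character is pushed back so the caller can still read it. */
static int next_is_digit_on_line(FILE *fp)
{
    int c;

    assert(fp != NULL);
    do {
        c = fgetc(fp);
    } while (c == ' ' || c == '\t');
    if (c == EOF)
        return 0;
    ungetc(c, fp);
    return isdigit(c) != 0;
}

/* Read one rule: four required fields, plus max_requests only if
 * another number follows on the same line. Returns 1 on success,
 * 0 on EOF or malformed input. */
static int read_rule(FILE *fp, long mca_default, long fields[5])
{
    for (int i = 0; i < 4; i++) {
        /* %ld skips leading whitespace, including '\n' */
        if (fscanf(fp, "%ld", &fields[i]) != 1)
            return 0;
    }
    if (next_is_digit_on_line(fp)) {
        if (fscanf(fp, "%ld", &fields[4]) != 1)
            return 0;
    } else {
        fields[4] = mca_default; /* field absent: use the MCA value */
    }
    return 1;
}
```

Fed the two old-format rules from the example above followed by `24000 4 0 0 2`, the first two rules get the default and the third gets 2.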

burlen · 2024-11-02T21:13:02Z

We will have a similar issue if we are reading a new file with the old parser. Here I added a max request at the end of each message rule.
1024 0 2 1024 3
100000 1 2 1024 4
The old parser will read 3 as the message size for the next rule instead of simply ignoring it.

I found one way to solve this and have added a commit to the PR with the fix (2da8e6d). It adds parser support for a version identifier, rule-file-version-N, where N is an unsigned integer, on the first line of the file. The old parser will gracefully fall back to the fixed decision mechanism when it is present.
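A minimal sketch of recognizing the optional version tag, assuming the first line has already been read into a string. `parse_rule_file_version` is a hypothetical name, and treating a missing tag as version 1 follows the description of version 1 as the original format:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check for the optional first-line tag
 * "rule-file-version-N". Returns N when the tag is present, 1 when it
 * is absent (original format), or -1 when the tag is malformed. */
static int parse_rule_file_version(const char *first_line)
{
    static const char prefix[] = "rule-file-version-";
    const size_t plen = sizeof(prefix) - 1;
    const char *p;
    char *end;
    unsigned long n;

    assert(first_line != NULL);
    if (strncmp(first_line, prefix, plen) != 0)
        return 1; /* no tag: original (version 1) format */

    p = first_line + plen;
    if (!isdigit((unsigned char)*p))
        return -1; /* strtoul alone would also accept a sign or spaces */
    n = strtoul(p, &end, 10);
    /* allow only trailing whitespace, including the newline */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0' || n > INT_MAX)
        return -1;
    return (int)n;
}
```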

burlen · 2024-11-03T01:52:03Z

Here is a simple working example of a rules file that can be used for testing:

rule-file-version-2
1   # num of collectives
3   # Rules for all-to-all
2   # number of sets of rules
#======================
0    # comm size
1    # number of rules
# Bytes    alg - - reqs
#----------------------
0            0 0 0 0
#=====================
2    # comm size
14   # number of rules
# Bytes    alg - - reqs
#----------------------
0            0 0 0
8000         2 0 0
24000        4 0 0 2
48000        2 0 0
96000        4 0 0 4
192000       2 0 0
384000       4 0 0 6
768000       2 0 0
1536000      4 0 0 8
3072000      2 0 0
6144000      4 0 0 10
12288000     2 0 0
24576000     4 0 0 12
40960000     0 0 0
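As a sketch of how the comm-size-2 rule set above might be used once parsed, assume the last rule whose byte threshold does not exceed the message size is selected, and use -1 as a hypothetical "not specified" marker for the optional reqs column (fan-in/out and segment size are omitted for brevity). `struct rule`, `select_rule`, and `effective_max_requests` are illustrative names only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parsed rule; max_requests < 0 means "not given in the
 * file, use the MCA variable". */
struct rule {
    long bytes;
    int alg;
    long max_requests;
};

/* The comm-size-2 rule set from the example file above. */
static const struct rule example_rules[] = {
    {0, 0, -1},        {8000, 2, -1},     {24000, 4, 2},
    {48000, 2, -1},    {96000, 4, 4},     {192000, 2, -1},
    {384000, 4, 6},    {768000, 2, -1},   {1536000, 4, 8},
    {3072000, 2, -1},  {6144000, 4, 10},  {12288000, 2, -1},
    {24576000, 4, 12}, {40960000, 0, -1},
};

/* Assumed policy: rules are sorted by ascending threshold and the last
 * one whose threshold does not exceed the message size applies. */
static const struct rule *select_rule(const struct rule *rules, size_t n,
                                      long bytes)
{
    const struct rule *best = NULL;

    assert(rules != NULL);
    for (size_t i = 0; i < n; i++) {
        if (rules[i].bytes > bytes)
            break;
        best = &rules[i];
    }
    return best;
}

/* Fall back to the MCA value when the rule omitted the reqs column. */
static long effective_max_requests(const struct rule *r, long mca_value)
{
    return r->max_requests < 0 ? mca_value : r->max_requests;
}
```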

bosilca

I see that you initialized version, but I don't see it used in the code. If you protect the use of isnext_digit with a check for the version, it should work better.
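A minimal sketch of the suggested guard, assuming `rest` points just past the four required fields of the current line. `max_requests_for_line` is a hypothetical helper; it only looks for the optional field when the file declared version 2 or later:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical: return the optional max_requests value for one rule,
 * or mca_default when the field is absent or the file format predates
 * it. Only spaces and tabs are skipped, so a newline ends the search. */
static long max_requests_for_line(int file_version, const char *rest,
                                  long mca_default)
{
    assert(rest != NULL);
    if (file_version < 2)
        return mca_default; /* version 1 files never carry the field */
    while (*rest == ' ' || *rest == '\t')
        rest++;
    if (!isdigit((unsigned char)*rest))
        return mca_default; /* end of line, comment, etc. */
    return strtol(rest, NULL, 10);
}
```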

ompi/mca/coll/tuned/coll_tuned_dynamic_file.c

bosilca · 2024-11-06T13:44:29Z

@burlen all looks good. Can you please add or update the copyright notice in the files you touched? Something like this:

* Copyright (c) 2024      NVIDIA CORPORATION. All rights reserved.

Teach the dynamic rules file reader to look for the alltoall_algorithm_max_requests tuning parameter. To keep the dynamic rules file format backward compatible, the alltoall_algorithm_max_requests field is optional. When it is not present in the rule definition, the value of the corresponding MCA variable is used instead.

Resolves open-mpi#12589

Signed-off-by: Burlen Loring <[email protected]>

The version identifier is optional, but when provided it must have the following format and must appear on the first line: `rule-file-version-N`, where N is an unsigned integer. Older versions of the parser will fall back to the fixed decision mechanism when this line is present. Version 1 is the original format; version 2 adds support for the optional coll_tuned_alltoall_algorithm_max_requests specification.

Signed-off-by: Burlen Loring <[email protected]>

burlen · 2024-11-07T00:24:43Z

Updated the copyright notices and rebased from main.

github-actions bot added the Target: main label Sep 26, 2024

jsquyres assigned bosilca and abouteiller Oct 14, 2024

burlen force-pushed the dynamic_decision_alltoall_max_requests branch from b1767b2 to 7b505b2 on October 26, 2024 18:27

bosilca reviewed Oct 28, 2024


ompi/mca/coll/base/coll_base_util.c (outdated)

burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 7b505b2 to 5437cac on November 1, 2024 16:20

bosilca approved these changes Nov 1, 2024


bosilca reviewed Nov 5, 2024


ompi/mca/coll/tuned/coll_tuned_dynamic_file.c (outdated)

burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 2da8e6d to 7884ae8 on November 6, 2024 00:30

burlen force-pushed the dynamic_decision_alltoall_max_requests branch from 7884ae8 to f302969 on November 7, 2024 00:11

Burlen Loring added 2 commits November 6, 2024 18:16

burlen force-pushed the dynamic_decision_alltoall_max_requests branch from f302969 to f6387a4 on November 7, 2024 00:24

bosilca merged commit 0f68484 into open-mpi:main Nov 7, 2024
15 checks passed
