-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inaccurate comment in parallel_state
.
#473
Conversation
I'm not sure if this is the right approach. Could you please take a look? Thanks! @feifeibear |
@@ -338,12 +337,12 @@ def initialize_model_parallel( | |||
dp_degree (2) * cfg_degree (2) * sp_degree (2) * pp_degree (2) = 16. | |||
The present function will create 2 data parallel-groups, | |||
The present function will create 8 data-parallel groups, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original description is correct. It is 2 DP groups. The degree means the number of processes in a group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the CFG, PP, and SP groups also two-degree? Should I update those as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your confusion. The dp degree is special here. It means the number of groups.
The degree means the nccl communication group size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from xfuser.core.distributed.utils import RankGenerator
rank = RankGenerator(tp=1, dp=2, cfg=2, sp=2, pp=2, order="tp-sp-pp-cfg-dp")
print("dp :", rank.get_ranks("dp"))
print("cfg:", rank.get_ranks("cfg"))
print("sp :", rank.get_ranks("sp"))
print("pp :", rank.get_ranks("pp"))
# dp : [[0, 8], [1, 9], [2, 10], [3, 11], [4, 12], [5, 13], [6, 14], [7, 15]]
# cfg: [[0, 4], [1, 5], [2, 6], [3, 7], [8, 12], [9, 13], [10, 14], [11, 15]]
# sp : [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15]]
# pp : [[0, 2], [1, 3], [4, 6], [5, 7], [8, 10], [9, 11], [12, 14], [13, 15]]
The code above produces ranks arranged as [[0, 8], [1, 9], [2, 10], [3, 11], [4, 12], [5, 13], [6, 14], [7, 15]]
, but the current comment suggests a different arrangement: [g0, g1, g2, g3, g4, g5, g6, g7], [g8, g9, g10, g11, g12, g13, g14, g15]
. I'm not sure which is correct. If it's the former arrangement, does that still mean we have 8 groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code is correct. Would you like to update the MR w.r.t the real outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused. If the dp degree refers to the number of groups, the code's output won't match the current comment. If the dp degree refers to the NCCL group size, then the current MR will be sufficient. Which part do I need to revise: the description above, the output below, or both?
Fixes: #472.