Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inaccurate comment in parallel_state. #473

Merged
merged 1 commit into from
Mar 9, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 7 additions & 8 deletions xfuser/core/distributed/parallel_state.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,13 @@
import torch
import torch.distributed
import xfuser.envs as envs
import os
from xfuser.logger import init_logger
from .group_coordinator import (
GroupCoordinator,
PipelineGroupCoordinator,
SequenceParallelGroupCoordinator,
)
from .utils import RankGenerator, generate_masked_orthogonal_rank_groups
from .utils import RankGenerator

env_info = envs.PACKAGES_CHECKER.get_packages_info()
HAS_LONG_CTX_ATTN = env_info["has_long_ctx_attn"]
Expand Down Expand Up @@ -281,7 +280,7 @@ def init_model_parallel_group(
local_rank=local_rank,
torch_distributed_backend=backend,
)

def init_dit_group(
dit_parallel_size: int,
backend: str,
Expand All @@ -290,7 +289,7 @@ def init_dit_group(
_DIT = torch.distributed.new_group(
ranks=list(range(dit_parallel_size)), backend=backend
)

def get_dit_group():
assert _DIT is not None, "DIT group is not initialized"
return _DIT
Expand Down Expand Up @@ -338,12 +337,12 @@ def initialize_model_parallel(

dp_degree (2) * cfg_degree (2) * sp_degree (2) * pp_degree (2) = 16.

The present function will create 2 data parallel-groups,
The present function will create 8 data-parallel groups,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original description is correct. It is 2 DP groups. The degree means the number of processes in a group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the CFG, PP, and SP groups also two-degree? Should I update those as well?

Copy link
Collaborator

@feifeibear feifeibear Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your confusion. The dp degree is special here. It means the number of groups.

The degree means the nccl communication group size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from xfuser.core.distributed.utils import RankGenerator

rank = RankGenerator(tp=1, dp=2, cfg=2, sp=2, pp=2, order="tp-sp-pp-cfg-dp")
print("dp :", rank.get_ranks("dp"))
print("cfg:", rank.get_ranks("cfg"))
print("sp :", rank.get_ranks("sp"))
print("pp :", rank.get_ranks("pp"))

# dp : [[0, 8], [1, 9], [2, 10], [3, 11], [4, 12], [5, 13], [6, 14], [7, 15]]
# cfg: [[0, 4], [1, 5], [2, 6], [3, 7], [8, 12], [9, 13], [10, 14], [11, 15]]
# sp : [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15]]
# pp : [[0, 2], [1, 3], [4, 6], [5, 7], [8, 10], [9, 11], [12, 14], [13, 15]]

The code above produces ranks arranged as [[0, 8], [1, 9], [2, 10], [3, 11], [4, 12], [5, 13], [6, 14], [7, 15]], but the current comment suggests a different arrangement: [g0, g1, g2, g3, g4, g5, g6, g7], [g8, g9, g10, g11, g12, g13, g14, g15]. I'm not sure which is correct. If it's the former arrangement, does that still mean we have 8 groups?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is correct. Would you like to update the MR w.r.t the real outputs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused. If the dp degree refers to the number of groups, the code's output won't match the current comment. If the dp degree refers to the NCCL group size, then the current MR will be sufficient. Which part do I need to revise: the description above, the output below, or both?

8 CFG group, 8 pipeline-parallel group, and
8 sequence-parallel groups:
2 data-parallel groups:
[g0, g1, g2, g3, g4, g5, g6, g7],
[g8, g9, g10, g11, g12, g13, g14, g15]
8 data-parallel groups:
[g0, g8], [g1, g9], [g2, g10], [g3, g11],
[g4, g12], [g5, g13], [g6, g14], [g7, g15]
8 CFG-parallel groups:
[g0, g4], [g1, g5], [g2, g6], [g3, g7],
[g8, g12], [g9, g13], [g10, g14], [g11, g15]
Expand Down