TL/MLX5: generate schedule for zcopy allgather #1059

Open: wants to merge 1 commit into master

MamziB (Collaborator) commented Dec 18, 2024

This PR optimizes allgather communication with a pipelined multicast scheme that has configurable concurrency and a configurable prepost bucket size for receive buffers. It adds a new function, ucc_tl_mlx5_mcast_prepare_zero_copy_allgather, which computes and sets up the pipelined schedule for zero-copy allgather: it validates the parameters, allocates the schedule, and registers the receive buffers.
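To illustrate what "pipelined schedule with configurable concurrency" can mean, here is a minimal, self-contained sketch. It is not the code in this PR: sched_step_t and build_pipelined_schedule are hypothetical names, and the rule that the prepost bucket must hold at least one receive buffer per concurrent root is an assumption made for illustration. It only shows the general idea of splitting the team's roots into steps of at most `concurrency` multicasting ranks, after validating the parameters and allocating the schedule.

```c
#include <assert.h>
#include <stdlib.h>

/* HYPOTHETICAL schedule entry (not a UCC data structure): the block of
 * ranks whose buffers are multicast during one pipeline step. */
typedef struct {
    int first_root; /* first rank that multicasts in this step   */
    int num_roots;  /* number of ranks multicasting concurrently */
} sched_step_t;

/* HYPOTHETICAL helper, not ucc_tl_mlx5_mcast_prepare_zero_copy_allgather.
 * Splits team_size roots into steps of at most `concurrency` roots each.
 * ASSUMPTION: each receiver must prepost at least one recv buffer per
 * concurrent root, so prepost_bucket_size < concurrency is rejected.
 * Returns the number of steps and stores a malloc'ed array in *out,
 * or returns -1 on invalid parameters or allocation failure. */
static int build_pipelined_schedule(int team_size, int concurrency,
                                    int prepost_bucket_size,
                                    sched_step_t **out)
{
    int           nsteps, step, remaining;
    sched_step_t *sched;

    /* Parameter validation */
    if (out == NULL || team_size < 1 || concurrency < 1 ||
        concurrency > team_size || prepost_bucket_size < concurrency) {
        return -1;
    }

    /* Ceiling division: the last step may have fewer roots */
    nsteps = (team_size + concurrency - 1) / concurrency;
    sched  = malloc((size_t)nsteps * sizeof(*sched));
    if (sched == NULL) {
        return -1;
    }

    for (step = 0; step < nsteps; step++) {
        sched[step].first_root = step * concurrency;
        remaining              = team_size - sched[step].first_root;
        sched[step].num_roots  = remaining < concurrency ? remaining
                                                         : concurrency;
        /* Each step covers a non-empty, in-range block of roots */
        assert(sched[step].num_roots > 0);
        assert(sched[step].first_root + sched[step].num_roots <= team_size);
    }

    *out = sched;
    return nsteps;
}
```

For example, a team of 10 ranks with concurrency 4 and a bucket of 8 yields three steps, with roots 0-3, 4-7 and 8-9; the caller frees the returned array. Bounding concurrency per step limits how many receive buffers each rank must have preposted at once, which is presumably why the PR makes both values configurable.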

MamziB self-assigned this Dec 18, 2024
janjust requested a review from nsarka December 19, 2024 16:34
samnordmann (Collaborator) left a comment

Thanks for the PR! I only left some minor comments.

Do I understand correctly that this PR doesn't implement any new behavior in the collective itself, but only the setup phase of a collective that will be implemented later?

Review threads (resolved):
src/components/tl/mlx5/tl_mlx5.c
src/components/tl/mlx5/mcast/tl_mlx5_mcast_allgather.c (9 threads, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (outdated)
MamziB (Collaborator, Author) commented Jan 30, 2025

@samnordmann @nsarka Thanks for the comments! Please let me know if you have any more. I added a new commit.
Sam, can you please let me know your thoughts about #1059 (comment)?

MamziB force-pushed the mamzi/mcast-allgather-3-part-2 branch 2 times, most recently from 1c7ff90 to 2156eac, February 4, 2025 21:17
MamziB requested a review from samnordmann February 5, 2025 18:15
MamziB force-pushed the mamzi/mcast-allgather-3-part-2 branch from 2156eac to e83dc20, February 6, 2025 20:50
MamziB (Collaborator, Author) commented Feb 6, 2025

@samnordmann Thanks for the new comments, I have addressed them.

3 participants