TL/MLX5: generate schedule for zcopy allgather #1059
base: master
Thanks for the PR! I only left some minor comments.
Do I understand correctly that this PR doesn't implement any new behavior in the collective itself, but only the setup phase of a collective that will be implemented later?
@samnordmann @nsarka Thanks for the comments! Please let me know if you have any more. I added a new commit.
@samnordmann Thanks for the new comments, I have addressed them.
The PR focuses on optimizing allgather communication by leveraging a pipelined multicast mechanism with configurable concurrency and prepost bucket sizes for receive buffers. It implements a new function, `ucc_tl_mlx5_mcast_prepare_zero_copy_allgather`, which calculates and sets up a pipelined schedule for the zero-copy allgather. The function validates parameters, allocates schedules, and also registers the receive buffers.
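The description does not show the scheduling logic itself. As a rough illustration of what a pipelined allgather schedule with a concurrency limit and a prepost bucket size could look like, here is a minimal C sketch. All names (`sched_t`, `sched_step_t`, `build_schedule`, `free_schedule`) and the rule that `concurrency` must not exceed `bucket_size` are hypothetical assumptions for illustration; this is not the TL/MLX5 implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical schedule types for illustration only; these are NOT the
 * TL/MLX5 data structures used by ucc_tl_mlx5_mcast_prepare_zero_copy_allgather. */
typedef struct {
    int first_root; /* first rank whose data is multicast in this step */
    int n_roots;    /* number of ranks multicasting concurrently in this step */
    int prepost;    /* 1 if receivers prepost a new bucket of recv slots first */
} sched_step_t;

typedef struct {
    int           n_steps;
    sched_step_t *steps;
} sched_t;

/* Builds a pipelined schedule: ranks multicast in groups of `concurrency`,
 * and receive slots are preposted `bucket_size` at a time.
 * Returns 0 on success, -1 on invalid parameters or allocation failure. */
static int build_schedule(int team_size, int concurrency, int bucket_size,
                          sched_t *out)
{
    /* Parameter validation. Assumed constraint: the concurrent messages of
     * one step must fit into a single preposted bucket. */
    if (team_size <= 0 || concurrency <= 0 || bucket_size <= 0 ||
        concurrency > team_size || concurrency > bucket_size) {
        return -1;
    }

    int n_steps = (team_size + concurrency - 1) / concurrency; /* ceil division */
    sched_step_t *steps = calloc((size_t)n_steps, sizeof(*steps)); /* prepost = 0 */
    if (steps == NULL) {
        return -1;
    }

    int posted = 0; /* receive slots currently preposted and not yet consumed */
    for (int s = 0; s < n_steps; s++) {
        int first = s * concurrency;
        int left  = team_size - first;
        int n     = left < concurrency ? left : concurrency;

        steps[s].first_root = first;
        steps[s].n_roots    = n;
        if (posted < n) {
            /* Not enough slots left for this step: prepost another bucket. */
            steps[s].prepost = 1;
            posted += bucket_size;
        }
        assert(posted >= n); /* holds because concurrency <= bucket_size */
        posted -= n;         /* each incoming message consumes one slot */
    }

    out->n_steps = n_steps;
    out->steps   = steps;
    return 0;
}

/* Releases the step array allocated by build_schedule. */
static void free_schedule(sched_t *sched)
{
    free(sched->steps);
    sched->steps   = NULL;
    sched->n_steps = 0;
}
```

For example, `build_schedule(8, 3, 4, &s)` yields three steps (roots 0-2, 3-5, and 6-7), and receivers prepost a new bucket only before the first two steps, because two slots are left over for the final step. Splitting validation, allocation, and step computation this way mirrors the order the PR description gives (validate parameters, allocate schedules, then set up receives), but the concrete rules here are assumptions.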