

@wangvsa
Collaborator

@wangvsa wangvsa commented May 29, 2025

Will add more details tomorrow.

This is a big PR with many changes.

  1. Implemented the Margo (Mercury) DTL. Margo supports the CXI driver, libfabric, and UCX. We can use "ofi+verbs" for InfiniBand, "ofi+cxi" for Slingshot, and "ofi+tcp" for everything else.
  2. Tested on Corona, Tioga, and Tuolumne.
  3. Fixed many minor bugs.
  4. Changed many debug outputs from the DYAD_LOG_INFO level to DYAD_LOG_DEBUG.
  5. Changed some CMake options to make them more intuitive and user-friendly.
  6. Included a dyad executable, so users can simply run "dyad start" and "dyad stop" to start and stop the DYAD service.
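The protocol strings in item 1 are what select the Mercury transport underneath Margo. A minimal sketch of that mapping follows, written in Python for brevity; the helper margo_protocol_for() and the interconnect names it accepts are hypothetical, and DYAD's actual selection code is not shown in this PR description.

```python
# Hypothetical mapping from interconnect to the Mercury/Margo protocol
# strings named in the PR description ("ofi+verbs", "ofi+cxi", "ofi+tcp").
MARGO_PROTOCOLS = {
    "infiniband": "ofi+verbs",  # libfabric verbs provider
    "slingshot": "ofi+cxi",     # libfabric CXI provider (HPE Slingshot)
}

DEFAULT_PROTOCOL = "ofi+tcp"    # fallback for everything else


def margo_protocol_for(interconnect: str) -> str:
    """Return the protocol string to hand to Margo for a given interconnect."""
    return MARGO_PROTOCOLS.get(interconnect.strip().lower(), DEFAULT_PROTOCOL)


if __name__ == "__main__":
    for net in ("InfiniBand", "Slingshot", "Ethernet"):
        print(f"{net:>10} -> {margo_protocol_for(net)}")
```

In C, a string like this would typically be passed as the address argument of margo_init(), e.g. margo_init("ofi+cxi", MARGO_SERVER_MODE, 0, 0); how DYAD wires it in is not described here.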

wangvsa added 22 commits March 3, 2025 19:45
Signed-off-by: Chen Wang <[email protected]>
@wangvsa
Collaborator Author

wangvsa commented May 30, 2025

TODO:

  1. PUSH vs PULL?
  2. Runtime network layer (cxi, tcp, ucx, verbs) selection.
  3. Debug information should include client/service information (e.g., ranks, node IDs).
  4. Margo-based service: KVS and RPC (do RPC first).
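For TODO items 2 and 3, a minimal sketch of run-time network selection and a rank/node-tagged debug prefix might look like the following. The environment variable DYAD_MARGO_PROTOCOL and the helpers select_protocol() and log_prefix() are hypothetical, not part of DYAD. UCX is handled only by passing a full "plugin+protocol" string through unchanged, since the exact UCX protocol name is not given in this thread.

```python
import os
import socket

# Hypothetical environment variable; the PR does not name one.
PROTOCOL_ENV_VAR = "DYAD_MARGO_PROTOCOL"

# Short names from the TODO list mapped to Mercury protocol strings.
# UCX has no short name here: pass a full "plugin+protocol" string instead.
SHORT_NAMES = {
    "cxi": "ofi+cxi",
    "tcp": "ofi+tcp",
    "verbs": "ofi+verbs",
}


def select_protocol(env=None, default="ofi+tcp"):
    """Pick the network layer at run time instead of at build time."""
    env = os.environ if env is None else env
    value = env.get(PROTOCOL_ENV_VAR, "").strip().lower()
    if not value:
        return default
    if value in SHORT_NAMES:
        return SHORT_NAMES[value]
    if "+" in value:  # already a full Mercury protocol string
        return value
    raise ValueError(f"unknown network layer {value!r} in {PROTOCOL_ENV_VAR}")


def log_prefix(role, rank, node=None):
    """Prefix for debug lines so client and service messages can be told apart."""
    node = socket.gethostname() if node is None else node
    return f"[DYAD {role} rank={rank} node={node}]"


if __name__ == "__main__":
    print(select_protocol({PROTOCOL_ENV_VAR: "cxi"}))
    print(log_prefix("service", 0, node="node001"))
```

Reading the choice at start-up would let a single build be pointed at different interconnects without reconfiguring CMake, which is the gap TODO item 2 describes.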

@JaeseungYeom
Contributor

Can you take a look at the test errors?

wangvsa added 29 commits June 11, 2025 10:42
@JaeseungYeom JaeseungYeom merged commit 1e3b48d into main Jun 11, 2025
3 checks passed
Labels

enhancement New feature or request

3 participants