
Duplicate Messages on Matrix? #392

@KIwabuchi


When I run on the Matrix cluster, I receive duplicate YGM messages when the messages carry std::vector data.
I don't see this error on Dane. Do you have any idea what might be causing it, or how to fix it?

I used 2 nodes and 8 ranks.
The error does not occur if I use only one node or a small vector size (e.g., 4).
For the compiler, I simply loaded the GCC 13 module (ml gcc/13.3.1-magic).
I tried both the master and dev-v0.9 branches.

Here is a minimal reproducer:

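```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <unordered_set>
#include <vector>
#include <ygm/comm.hpp>

int main(int argc, char** argv) {
  ygm::comm comm(&argc, &argv);

  // Per-rank set of IDs received so far. It is static, so the
  // capture-less receiver lambda below can use it directly.
  static std::unordered_set<uint64_t> table;
  comm.cf_barrier();

  constexpr int vec_size = 128;  // no error with small sizes such as 4
  std::vector<float> data(vec_size, 0.0f);

  // Each rank sends chunk_size messages with globally unique IDs.
  constexpr uint64_t chunk_size = 1 << 20;
  for (uint64_t i = 0; i < chunk_size; ++i) {
    const uint64_t id = i + comm.rank() * chunk_size;

    // Runs on the destination rank. The first (unused) argument is
    // supplied by YGM; the vector payload is not inspected.
    auto receiver = [](auto, const uint64_t id,
                       const std::vector<float>& data) {
      // insert() returns {iterator, false} if the ID was already present.
      if (!table.insert(id).second) {
        std::cerr << "Duplicate ID " << id << std::endl;
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
      }
    };
    comm.async(id % comm.size(), receiver, id, data);
  }
  comm.barrier();  // returns once all async messages have been processed

  std::cout << "All done " << comm.rank() << std::endl;
  comm.cf_barrier();

  return 0;
}
```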

Thanks!
Keita
