On the Matrix cluster, I get duplicate YGM messages when sending messages that carry std::vector data.
I don't see the same error on Dane. Do you have any ideas on what might be causing this, or how to fix it?
I used 2 nodes and 8 ranks.
I don't get the error if I use only one node or a small vector size (such as 4).
As for the compiler, I just loaded GCC 13 (ml gcc/13.3.1-magic).
I tried the master and dev-v0.9 branches.
Here is example code that reproduces the problem:

    #include <mpi.h>

    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    #include <unordered_set>
    #include <vector>

    #include <ygm/comm.hpp>

    int main(int argc, char** argv) {
      ygm::comm comm(&argc, &argv);

      // IDs received by this rank; static so the captureless lambda can use it.
      static std::unordered_set<uint64_t> table;
      comm.cf_barrier();

      constexpr int vec_size = 128;
      std::vector<float> data(vec_size, 0.0f);

      constexpr uint64_t chunk_size = 1 << 20;
      for (uint64_t i = 0; i < chunk_size; ++i) {
        // Each rank sends IDs from its own disjoint range of chunk_size values.
        const uint64_t id = i + comm.rank() * chunk_size;
        auto receiver = [](auto, const uint64_t id,
                           const std::vector<float>& data) {
          // Every ID is sent exactly once, so seeing one twice is an error.
          if (table.contains(id)) {
            std::cerr << "Duplicate ID " << id << std::endl;
            MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
          }
          table.insert(id);
        };
        comm.async(id % comm.size(), receiver, id, data);
      }
      comm.barrier();

      std::cout << "All done " << comm.rank() << std::endl;
      comm.cf_barrier();
      return 0;
    }

Thanks!
Keita
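
P.S. As a sanity check on the test itself, the ID scheme above can be replayed without YGM or MPI. The sketch below is only an illustration: `count_duplicate_ids` is a hypothetical helper, not part of YGM, and it assumes every message is delivered exactly once. It loops over the sending ranks in a single process, computes the same `id` and destination as the reproducer, and counts how many IDs would land in a receiver's table twice.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of YGM): replays the reproducer's ID scheme
// in one process and returns how many IDs a receiver would see more than once,
// assuming each message is delivered exactly once.
std::uint64_t count_duplicate_ids(int num_ranks, std::uint64_t chunk_size) {
  // One table per destination rank, mirroring the static table in the reproducer.
  std::vector<std::unordered_set<std::uint64_t>> tables(
      static_cast<std::size_t>(num_ranks));
  std::uint64_t duplicates = 0;

  for (int rank = 0; rank < num_ranks; ++rank) {
    for (std::uint64_t i = 0; i < chunk_size; ++i) {
      // Same ID and destination computation as in the reproducer.
      const std::uint64_t id =
          i + static_cast<std::uint64_t>(rank) * chunk_size;
      const auto dest = static_cast<std::size_t>(
          id % static_cast<std::uint64_t>(num_ranks));

      // insert().second is false when the ID is already in the table.
      if (!tables[dest].insert(id).second) {
        ++duplicates;
      }
    }
  }
  return duplicates;
}
```

Calling `count_duplicate_ids(8, 1 << 20)` corresponds to the 8-rank run. Since each rank owns a disjoint ID range, the count should be zero for any rank count and chunk size, which suggests the duplicates are not produced by the test's own ID generation.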