Hi, first I would like to thank the contributors for providing such an elegant and easy-to-use library for profiling MPI programs.
My problem:
I built an MPI cluster of up to 8 devices (Linux Ubuntu 20.04) on a LAN, following the MPI tutorial. I want to use Caliper to profile my applications across multiple devices. Before that, I wrote a simple hello-world program to test whether it works.
The code is below:
#include <mpi.h>
#include <stdio.h>

#include <caliper/cali.h>
#include <caliper/cali-manager.h>

int main(int argc, char** argv) {
    // Set up the Caliper ConfigManager before initializing MPI
    cali::ConfigManager mgr;
    mgr.add("runtime-report,event-trace(output=trace.cali)");

    // Initialize the MPI environment
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        // Error - MPI does not provide the needed threading level
        fprintf(stderr, "xxx MPI does not provide needed thread support!\n");
        return -1;
    }
    // MPI_Init(&argc, &argv);

    mgr.start();
    // ...

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    // CALI_MARK_BEGIN("iemann_slice_precompute");
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    // CALI_MARK_END("iemann_slice_precompute");

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Flush Caliper output, then finalize the MPI environment
    mgr.flush();
    mgr.stop();
    MPI_Finalize();

    return 0;
}
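For reference, I build the program roughly like this (the Caliper install prefix is specific to my machines; adjust as needed):

```shell
# Hypothetical build line -- CALIPER_DIR stands for the local Caliper
# install prefix and will differ on other systems.
mpicxx -o hello hello.cpp \
    -I${CALIPER_DIR}/include -L${CALIPER_DIR}/lib -lcaliper
```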
The program works perfectly with multiple processes on a single device:
sky@nx01:~/cloud$ mpirun -np 2 ./hello
Hello world from processor nx01, rank 0 out of 2 processors
Hello world from processor nx01, rank 1 out of 2 processors
Path Min time/rank Max time/rank Avg time/rank Time %
MPI_Comm_dup 0.000952 0.001182 0.001067 13.165525
MPI_Get_processor_name 0.000133 0.000193 0.000163 2.011228
Function Count (min) Count (max) Time (min) Time (max) Time (avg) Time %
9 13 0.040653 0.040994 0.040823 92.516799
MPI_Comm_dup 2 2 0.001527 0.002249 0.001888 4.278705
MPI_Recv 4 4 0.000935 0.000935 0.000935 1.059478
MPI_Comm_free 1 1 0.000170 0.000287 0.000228 0.517841
MPI_Get_processor_name 1 1 0.000170 0.000285 0.000228 0.515575
MPI_Send 4 4 0.000421 0.000421 0.000421 0.477048
MPI_Finalize 1 1 0.000069 0.000134 0.000102 0.230026
MPI_Probe 2 2 0.000186 0.000186 0.000186 0.210762
MPI_Get_count 2 2 0.000171 0.000171 0.000171 0.193766
When I run it across two devices (nodes), the program does not return normally and gets stuck somewhere:
sky@nx01:~/cloud$ mpirun -np 2 --host nx01,nx02 ./hello
Hello world from processor nx02, rank 1 out of 2 processors
Hello world from processor nx01, rank 0 out of 2 processors
Path Min time/rank Max time/rank Avg time/rank Time %
MPI_Comm_dup 0.003007 0.003007 0.003007 29.905520
MPI_Get_processor_name 0.000132 0.000132 0.000132 1.312780
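If it helps with debugging, I can re-run with more verbose Caliper logging and with one config option at a time, e.g. (CALI_LOG_VERBOSITY is Caliper's log-level variable; -x is Open MPI's flag for exporting environment variables to remote ranks):

```shell
# Raise Caliper's log verbosity to see which service stalls on the
# remote node (-x exports the variable; Open MPI syntax).
CALI_LOG_VERBOSITY=2 mpirun -np 2 --host nx01,nx02 \
    -x CALI_LOG_VERBOSITY ./hello

# Also try each config option alone, to see whether runtime-report
# or event-trace(output=trace.cali) is the one that hangs.
```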
Has anybody encountered the same issue or figured out where the bug is located?
Thanks a lot for any answers.