Description
Background information
I can successfully run a ~60k-replica hello_c example across 1024 hosts with 60 slots per host (61,440 processes).
In the same environment, I am attempting to run a larger hello_c example across 1024 hosts with 128 slots per host (131,072 processes).
What version of Open MPI are you using? Describe how Open MPI was installed.
I have Open MPI version 4.0.5 installed in my Docker images.
It was installed from the openmpi-4.0.5.tar.gz release tarball.
Release date: Aug 26, 2020
Please describe the system on which you are running
* Operating system/version:
  * Native: centos:7 with kernel version 3.10.0-957.el7.x86_64
  * Docker: 19.03.9, build 9d988398e7
* Computer hardware: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
* Network type: Docker Swarm user-defined overlay network
Details of the problem
After setting up the cluster, the following command is executed:
mpirun -n 131072 \
    --hostfile ./hostfile \
    --mca mpi_yield_when_idle 1 \
    --mca hwloc_base_binding_policy none \
    --mca mpi_oversubscribe true \
    --mca btl tcp,self,vader \
    --mca btl_tcp_if_include x.x.0.0/16 \
    --mca oob_tcp_if_include x.x.0.0/16 \
    --mca opal_net_private_ipv4 x.x.0.0/16 \
    --mca orte_tmpdir_base /openmpi/tmp \
    --mca opal_event_include epoll \
    --mca event_libevent2022_event_include epoll \
    --mca opal_set_max_sys_limits 1 \
    --mca oob_tcp_listen_mode listen_thread \
    --mca pmix_base_async_modex true \
    --mca orte_keep_fqdn_hostnames true \
    --mca orte_hostname_cutoff 2000 \
    --mca orte_enable_recovery true \
    /opt/exec/hello_c.o
The following error message is received:
> A received msg header indicates a size that is too large:
>   Requested size: 25836785
>   Size limit: 16777216
> If you believe this msg is legitimate, please increase the
> max msg size via the ptl_base_max_msg_size parameter.
However, the Open MPI FAQ lists the ptl framework as deprecated/outdated here:
https://www.open-mpi.org/faq/?category=tuning#frameworks
How can I increase the maximum message size? I believe the message is legitimate.
Happy to provide any additional information as required.
Thank you.