
Received msg header indicates a size that is too large - ptl_base_max_msg_size #8278

Open
@fm-dewal

Background information

I can successfully run the hello_c example with ~60k replicas (processes) across 1024 hosts with 60 slots per host.

In the same environment, running hello_c with 131,072 replicas (1024 hosts × 128 slots per host) fails with the error shown below.

What version of Open MPI are you using? Describe how Open MPI was installed

Open MPI 4.0.5 (release date: Aug 26, 2020) is installed in my Docker images.
It was built from the openmpi-4.0.5.tar.gz release tarball.

Please describe the system on which you are running

* Operating system/version:
  * Native: CentOS 7, kernel 3.10.0-957.el7.x86_64
  * Docker: 19.03.9, build 9d988398e7
* Computer hardware: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70 GHz
* Network type: Docker Swarm, user-defined overlay network


Details of the problem

After setting up the cluster, the following command is executed:

mpirun -n 131072 \
  --hostfile ./hostfile \
  --mca mpi_yield_when_idle 1 \
  --mca hwloc_base_binding_policy none \
  --mca mpi_oversubscribe true \
  --mca btl tcp,self,vader \
  --mca btl_tcp_if_include x.x.0.0/16 \
  --mca oob_tcp_if_include x.x.0.0/16 \
  --mca opal_net_private_ipv4 x.x.0.0/16 \
  --mca orte_tmpdir_base /openmpi/tmp \
  --mca opal_event_include epoll \
  --mca event_libevent2022_event_include epoll \
  --mca opal_set_max_sys_limits 1 \
  --mca oob_tcp_listen_mode listen_thread \
  --mca pmix_base_async_modex true \
  --mca orte_keep_fqdn_hostnames true \
  --mca orte_hostname_cutoff 2000 \
  --mca orte_enable_recovery true \
  /opt/exec/hello_c.o

The following error message is received:
> A received msg header indicates a size that is too large:
>   Requested size: 25836785
>   Size limit: 16777216
> If you believe this msg is legitimate, please increase the
> max msg size via the ptl_base_max_msg_size parameter.
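
For scale, the two numbers in the error can be compared directly. The limit is exactly 16 MiB, and the requested size exceeds it by roughly half. The last two lines are only a rough consistency check under an assumption not stated in the error: that the oversized message grows linearly with the number of processes. Under that assumption, the ~60k run (taken here as 1024 × 60 = 61,440 processes) would have stayed under the limit.

```latex
\begin{aligned}
\text{limit}     &= 16\,777\,216\ \text{B} = 2^{24}\ \text{B} = 16\ \text{MiB} \\
\text{requested} &= 25\,836\,785\ \text{B} \approx 24.64\ \text{MiB} \approx 1.54 \times \text{limit} \\
% assumption: message size scales linearly with the process count
\frac{25\,836\,785\ \text{B}}{131\,072\ \text{procs}} &\approx 197.1\ \text{B/proc} \\
61\,440 \times 197.1\ \text{B} &\approx 12.1 \times 10^{6}\ \text{B} \approx 11.55\ \text{MiB} < 16\ \text{MiB}
\end{aligned}
```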

The Open MPI FAQ lists the ptl framework as deprecated/outdated:
https://www.open-mpi.org/faq/?category=tuning#frameworks

How can I increase the maximum message size? I believe the message is legitimate.
Happy to provide any additional information as required.
Thank you.
