
openmpi v4 launch error messages #13293

Open
@puneet336


Hi Team,
We installed Open MPI 4.1.6 and Open MPI v5 using EasyBuild scripts on a VM-based cluster running RHEL 9.5.
https://docs.easybuild.io/version-specific/supported-software/o/OpenMPI/
Each server in the cluster has the following three interfaces:

NAME           TYPE      DEVICE
System ens192  ethernet  ens192
System ens161  ethernet  ens161
System ens256  ethernet  ens256

When we run Open MPI v4, multi-node runs complete, but we see many error messages before and after the expected output.

[user@server1 mpi]$ mpirun --version
mpirun (Open MPI) 4.1.6
Report bugs to http://www.open-mpi.org/community/help/
[suser@server1 mpi]$  mpirun -np 4 ./a.out
<error messages>
Hello world from processor server1 rank 1 out of 4 processors
Hello world from processor server2, rank 3 out of 4 processors
Hello world from processor server3 rank 2 out of 4 processors
Hello world from processor server4 rank 0 out of 4 processors
<error messages>

Based on the error messages, I see two categories of issues.
Issue 1)

server1:rank1: PSM3 can't open nic unit: -1 (err=23)
[1749130772.508535285] server1:rank0.a.out: Unable to create UDP socket for ens161: Address family not supported by protocol
[1749130772.508552855] server1:rank0.a.out: Unable to initialize sockets NIC /sys/class/net/ens161 (unit 0:0)
[1749130772.510515510] server1:rank0.a.out: Unable to create UDP socket for ens192: Address family not supported by protocol
[1749130772.510530731] server1:rank0.a.out: Unable to initialize sockets NIC /sys/class/net/ens192 (unit 1:0)
[1749130772.512386528] server1:rank0.a.out: Unable to create UDP socket for ens256: Address family not supported by protocol
[1749130772.512401122] server1:rank0.a.out: Unable to initialize sockets NIC /sys/class/net/ens256 (unit 2:0)
server1:rank0: PSM3 can't open nic unit: -1 (err=23)
[1749130772.542387747] server2:rank2.a.out: Unable to create UDP socket for ens161: Address family not supported by protocol
[1749130772.542512255] server2:rank2.a.out: Unable to initialize sockets NIC /sys/class/net/ens161 (unit 0:0)
[1749130772.543819831] server2:rank3.a.out: Unable to create UDP socket for ens161: Address family not supported by protocol
[1749130772.543838127] server2:rank3.a.out: Unable to initialize sockets NIC /sys/class/net/ens161 (unit 0:0)
[1749130772.544649294] server2:rank2.a.out: Unable to create UDP socket for ens192: Address family not supported by protocol
[1749130772.544700905] server2:rank2.a.out: Unable to initialize sockets NIC /sys/class/net/ens192 (unit 1:0)
[1749130772.545667705] server2:rank3.a.out: Unable to create UDP socket for ens192: Address family not supported by protocol
[1749130772.545685370] server2:rank3.a.out: Unable to initialize sockets NIC /sys/class/net/ens192 (unit 1:0)
[1749130772.546689089] server2:rank2.a.out: Unable to create UDP socket for ens256: Address family not supported by protocol
[1749130772.546706079] server2:rank2.a.out: Unable to initialize sockets NIC /sys/class/net/ens256 (unit 2:0)
server2:rank2: PSM3 can't open nic unit: -1 (err=23)
[1749130772.547558723] server2:rank3.a.out: Unable to create UDP socket for ens256: Address family not supported by protocol
[1749130772.547571518] server2:rank3.a.out: Unable to initialize sockets NIC /sys/class/net/ens256 (unit 2:0)
server2:rank3: PSM3 can't open nic unit: -1 (err=23)
[1749130772.549656387] server2:rank2.a.out: Unable to create UDP socket for ens161: Address family not supported by protocol
[1749130772.549732373] server2:rank2.a.out: Unable to initialize sockets NIC /sys/class/net/ens161 (unit 0:0)
server2:rank2: PSM3 can't open nic unit: 0 (err=23)
[1749130772.550315258] server2:rank3.a.out: Unable to create UDP socket for ens161: Address family not supported by protocol
[1749130772.550334655] server2:rank3.a.out: Unable to initialize sockets NIC /sys/class/net/ens161 (unit 0:0)
server2:rank3: PSM3 can't open nic unit: 0 (err=23)
[1749130772.552612768] server2:rank2.a.out: Unable to create UDP socket for ens192: Address family not supported by protocol
[1749130772.552688696] server2:rank2.a.out: Unable to initialize sockets NIC /sys/class/net/ens192 (unit 1:0)
server2:rank2: PSM3 can't open nic unit: 1 (err=23)
[1749130772.552796054] server2:rank3.a.out: Unable to create UDP socket for ens192: Address family not supported by protocol
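
The text "Address family not supported by protocol" is the Linux description of the EAFNOSUPPORT error. A quick way to see which socket address families a node accepts is to try creating a UDP socket for each one. The following is a minimal sketch, not a reproduction of what PSM3 does internally: the log does not say which address family was requested, so checking AF_INET and AF_INET6 is an assumption, and the helper name `can_create_udp_socket` is hypothetical.

```python
import errno
import socket


def can_create_udp_socket(family):
    """Try to create a UDP socket for `family`; return (ok, detail)."""
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError as exc:
        # Map the errno to its symbolic name, e.g. EAFNOSUPPORT.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return True, "ok"


if __name__ == "__main__":
    # Assumption: these two families are the relevant candidates.
    for name, family in (("AF_INET", socket.AF_INET),
                         ("AF_INET6", socket.AF_INET6)):
        ok, detail = can_create_udp_socket(family)
        print(f"{name}: {'ok' if ok else 'failed'} ({detail})")
```

Running this on each of the servers would show whether one of the two families is unavailable at the OS level. It does not by itself confirm that this is the cause of the PSM3 messages.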

Issue 2) At the end of the run, I see the following message:

Hello world from processor server1, rank 0 out of 4 processors
[server1:3799223] PMIX ERROR: PMIX_ERR_NO_PERMISSIONS in file dstore_base.c at line 238
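
To see how often each message occurs per host, the output can be tallied with a short script. This is a minimal sketch that assumes the exact message formats quoted in this report; the function name `summarize` and the grouping keys are illustrative choices, not part of Open MPI, PSM3, or PMIx.

```python
import re
from collections import Counter

# Patterns assume the message formats quoted in this report.
UDP_RE = re.compile(
    r"(\S+):rank(\d+)\.\S+: Unable to create UDP socket for (\S+): (.+)$"
)
NIC_RE = re.compile(
    r"(\S+):rank(\d+): PSM3 can't open nic unit: (-?\d+) \(err=(\d+)\)"
)
PMIX_RE = re.compile(
    r"\[(\S+):(\d+)\] PMIX ERROR: (\S+) in file (\S+) at line (\d+)"
)


def summarize(lines):
    """Count UDP socket, PSM3 NIC, and PMIx errors found in `lines`."""
    udp, nic, pmix = Counter(), Counter(), Counter()
    for raw in lines:
        line = raw.strip()
        # Each pattern is searched independently, so a line that holds
        # two fused messages still contributes to both counters.
        m = UDP_RE.search(line)
        if m:
            host, _rank, iface, reason = m.groups()
            udp[(host, iface, reason)] += 1
        m = NIC_RE.search(line)
        if m:
            nic[(m.group(1), int(m.group(3)))] += 1
        m = PMIX_RE.search(line)
        if m:
            pmix[(m.group(1), m.group(3))] += 1
    return udp, nic, pmix
```

For example, `summarize(open("ompiv4_error.txt"))` would return three counters keyed by (host, interface, reason), (host, NIC unit), and (host, PMIx error name) respectively.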

I am attaching the complete stdout: ompiv4_error.txt
Please let me know if any further information is needed from my side.
