@agimenog agimenog commented Aug 4, 2025

Hi,

I am relocating this PR here; it follows on from the previous one, EESSI/software-layer#1085.

I tested the functionality and confirmed that the MPI injection now also works for OpenMPI/5.0.7-GCC-14.2.0.

We added some warnings and instructions to follow, since newer MPI implementations include libcuda.so, which can be difficult to locate depending on the system.

Also, where would be the best place to document how to use the script?

Regards,
Arturo.

@ocaisa
Member

ocaisa commented Aug 5, 2025

This needs CI testing for two scenarios:

  • Injection of an EESSI OpenMPI as the override
  • Injection of an OS OpenMPI as an override

Both cases should verify that the expected libraries are being picked up, and should run an OSU benchmark to confirm that the executables do not crash.
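One way the "expected libraries are being picked up" check could be scripted in CI is sketched below. It is only an illustration under stated assumptions: the function name `libmpi_resolves_under` is hypothetical, the input is assumed to be standard `ldd` output, and the expected prefix is whatever resolved directory the test expects the injected libraries to live in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): read `ldd`-style output on stdin and
# succeed only if at least one libmpi* entry is present and every such entry
# resolves to a path under the expected prefix.
libmpi_resolves_under() {
    local expected_prefix="${1%/}"
    local found=0 line path
    while IFS= read -r line; do
        # Only look at lines such as "libmpi.so.40 => /some/path/libmpi.so.40 (0x...)"
        case "${line}" in
            *libmpi*"=>"*) ;;
            *) continue ;;
        esac
        path="${line#*=> }"     # drop everything up to and including "=> "
        path="${path%% (*}"     # drop the trailing load address
        found=1
        case "${path}" in
            "${expected_prefix}"/*) ;;
            *) echo "unexpected MPI library: ${path}" >&2; return 1 ;;
        esac
    done
    # Fail if no libmpi entry was seen at all
    [ "${found}" -eq 1 ]
}
```

A CI step might then run something like `ldd ./osu_latency | libmpi_resolves_under "${expected_dir}"` before launching the benchmark itself, where `expected_dir` is an assumed variable holding the injection target.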

@ocaisa
Member

ocaisa commented Aug 26, 2025

@agimenog That test passes, well done!

Now add an additional test that injects OpenMPI from the host operating system (installed via apt install openmpi-bin).

if [ -n "$(ls -A ${host_injection_mpi_path})" ]; then
    echo "MPI was already injected"
    if ${FORCE}; then
        echo "Forcing new MPI injection"
Member

In this situation I would move the existing directory to a timestamped copy: mv ${host_injection_mpi_path} ${host_injection_mpi_path}_<timestamp>
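A minimal sketch of that suggestion follows. The function name `backup_injection_dir` and the timestamp format are assumptions for illustration and are not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: if the injection directory exists and is non-empty,
# move it aside to "<dir>_<timestamp>" and recreate an empty directory.
# Prints the backup path when a backup was made.
backup_injection_dir() {
    local dir="${1%/}"
    local stamp
    stamp="$(date +%Y%m%d_%H%M%S)"   # assumed timestamp format
    # Nothing to back up if the directory is missing or empty
    if [ ! -d "${dir}" ] || [ -z "$(ls -A "${dir}")" ]; then
        return 0
    fi
    mv "${dir}" "${dir}_${stamp}"
    mkdir -p "${dir}"
    echo "${dir}_${stamp}"
}
```

Called as `backup_injection_dir "${host_injection_mpi_path}"` inside the forced branch, this keeps the previous injection available for rollback instead of overwriting it.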

Author

The changes for this are already committed.

@agimenog
Author

Hi @ocaisa,

I will work on that. Just to clarify, should it go in a separate .yml file or in the same one?

@ocaisa
Member

ocaisa commented Aug 26, 2025

> Hi @ocaisa,
>
> I will work on that. Just to clarify, should it go in a separate .yml file or in the same one?

The same one, just as another step.

mkdir ${temp_inject_path}

# Get all library files from openmpi dir
find ${MPI_PATH} -type f -name "*.so*" -exec cp {} ${temp_inject_path} \;
Member

Hmm, this is probably not going to play nicely with an OS-installed version of OpenMPI, as its libraries may well sit in the same directory as all the other libraries on the system. We should search only for the libraries that make up the compatibility set (see https://docs.open-mpi.org/en/v5.0.x/version-numbering.html#shared-library-version-number):

libmpi
libmpi_mpifh
libmpi_usempi_tkr
libmpi_usempi_ignore_tkr
libmpi_usempif08
libmpi_cxx
libmpi_java
liboshmem
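As a rough sketch, the copy step could be restricted to exactly the names listed above. The function name `copy_openmpi_libs` is hypothetical, and copying symlinks as symlinks (`cp -P`) is an assumption that differs from the `-type f` filter in the current diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only the listed Open MPI libraries from a source
# tree into a destination directory, leaving unrelated system libraries alone.
copy_openmpi_libs() {
    local src="$1" dest="$2"
    local names=(libmpi libmpi_mpifh libmpi_usempi_tkr libmpi_usempi_ignore_tkr
                 libmpi_usempif08 libmpi_cxx libmpi_java liboshmem)
    local name
    mkdir -p "${dest}"
    for name in "${names[@]}"; do
        # Matches e.g. libmpi.so and libmpi.so.40.40.7, but not libmpi_cxx.so.*
        # or libmpich.so.*; symlinks are copied as symlinks (assumption).
        find "${src}" \( -type f -o -type l \) \
            \( -name "${name}.so" -o -name "${name}.so.*" \) \
            -exec cp -P {} "${dest}" \;
    done
}
```

For example, `copy_openmpi_libs "${MPI_PATH}" "${temp_inject_path}"` would stand in for the single `find ... -name "*.so*"` call under review.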

Member

I also wonder what will happen when it tries to load a provider...

Member

I think we also need libmca*.so* and mca_*.so (see https://docs.open-mpi.org/en/main/mca.html)

Author

Thanks for the documentation! I also think this will fail, as the script was originally designed to inject all libraries from a single directory.

I will take a look at that.

Member

The provider issue is very tricky. I expect the providers would need to be in the same location relative to the libraries.
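If the relative location does matter, one option is to copy the MCA component files (mca_*.so) while preserving their path relative to the library directory. The sketch below assumes exactly that; the function name `copy_mca_components` is hypothetical, and whether Open MPI then finds the components at run time is untested here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every mca_*.so found under a library directory into
# a destination directory, keeping each file's path relative to that directory.
copy_mca_components() {
    local libdir="${1%/}" dest="${2%/}"
    local file rel
    find "${libdir}" -type f -name 'mca_*.so' | while IFS= read -r file; do
        rel="${file#"${libdir}"/}"               # path relative to libdir
        mkdir -p "${dest}/$(dirname "${rel}")"   # recreate the sub-directory
        cp "${file}" "${dest}/${rel}"
    done
}
```

With a source layout such as `<libdir>/openmpi/mca_btl_tcp.so`, the copy ends up at `<dest>/openmpi/mca_btl_tcp.so`.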

Member

I have no idea how to inspect or test that
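One possible way to inspect this, offered only as a sketch, is to trace the dynamic loader. It assumes a glibc system, where setting `LD_DEBUG=libs` makes the loader print lines of the form `calling init: <path>` to stderr for each shared object it initialises. The filter name `list_loaded_mca_components` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read dynamic-loader trace output on stdin and print the
# unique paths of all MCA components (mca_*.so) that were initialised.
# Assumes glibc's LD_DEBUG=libs line format: "<pid>:  calling init: <path>".
list_loaded_mca_components() {
    sed -n 's|.*calling init: \(.*/mca_[^/]*\.so\)$|\1|p' | sort -u
}
```

A test step could then run, for example, `LD_DEBUG=libs mpirun -np 2 ./osu_latency 2>&1 | list_loaded_mca_components` and check that the printed paths point into the injected directory rather than the original installation. If components are compiled into the main libraries instead of being built as separate files, this would print nothing.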
