Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fortran/mpi_f08: interfaces are missing for device buffer (NVHPC extension) #7250

Open
ShatrovOA opened this issue Dec 30, 2024 · 26 comments · May be fixed by #7297
Open

fortran/mpi_f08: interfaces are missing for device buffer (NVHPC extension) #7250

ShatrovOA opened this issue Dec 30, 2024 · 26 comments · May be fixed by #7297
Assignees

Comments

@ShatrovOA
Copy link

mpich 4.2.3 compiled with NVIDIA HPC SDK compilers (nvc, nvc++, nvfortran) with cuda support

Unable to use mpi_f08 module with device pointers.
Sample

program main
#if defined(USE_MPI)
use mpi
#define TYPE_MPI_REQUEST integer
#else
use mpi_f08
#define TYPE_MPI_REQUEST type(MPI_Request)
#endif
  implicit none
  TYPE_MPI_REQUEST :: request
  real, device, allocatable :: buf(:)
  integer :: mpi_ierr

  call MPI_Isend(buf( 1 ), 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, request, mpi_ierr)
  
end program main

mpifort mpich.F90 -cuda
NVFORTRAN-S-0155-Could not resolve generic procedure mpi_isend (mpich.F90: 14)
0 inform, 0 warnings, 1 severes, 0 fatal for main

mpifort mpich.F90 -cuda -DUSE_MPI
SUCCESS

@raffenet
Copy link
Contributor

raffenet commented Feb 5, 2025

I think MPICH would only use IGNORE_TKR in mpi_f08 if support for TS 29113 wasn't available, though right now MPICH requires TS 29113 support to build mpi_f08. Is this actually a compiler issue that device buffers are treated differently than system ones in this case? Maybe @jeffhammond knows something that I don't.

@jeffhammond
Copy link
Member

IGNORE_TKR is only relevant to MPI and MPI_F08 when type(*), dimension(..) is not used in the interface.

Please try without device to determine if that's a load-bearing part of the interface match.

@raffenet
Copy link
Contributor

raffenet commented Feb 6, 2025

Yes, removing device causes the test to compile successfully in my experience.

@jeffhammond
Copy link
Member

I verified that the issue is device and have asked the NVHPC Fortran folks about a solution.

@jeffhammond
Copy link
Member

The compiler team says we support ignore_tkr (d) to ignore the device attribute.

@raffenet how feasible is it to add this to the binding generator for MPI_F08 when CUDA support is enabled?

@hzhou
Copy link
Contributor

hzhou commented Feb 6, 2025

The compiler team says we support ignore_tkr (d) to ignore the device attribute.

@raffenet how feasible is it to add this to the binding generator for MPI_F08 when CUDA support is enabled?

We don't use ignore_tkr in MPI_F08. Instead, we declare the buf as

TYPE(*), DIMENSION(..), INTENT(in), ASYNCHRONOUS :: buf

I think we need add a separate interface with

TYPE(*), DIMENSION(..), INTENT(in), DEVICE, ASYNCHRONOUS :: buf

Is the DEVICE attribute part of fortran standard? My only concern is whether it will break legacy Fortran compilers.

@hzhou
Copy link
Contributor

hzhou commented Feb 6, 2025

Is the DEVICE attribute part of fortran standard? My only concern is whether it will break legacy Fortran compilers.

I guess we should configure test it.

@hzhou hzhou self-assigned this Feb 6, 2025
@jeffhammond
Copy link
Member

DEVICE is an NVHPC-specific extension for CUDA Fortran.

@hzhou hzhou changed the title mpi_f08 interfaces are missing IGNORE_TKR fortran/mpi_f08: interfaces are missing for device buffer (NVHPC extension) Feb 6, 2025
@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Whipped up #7296. Need find a machine to test it.

@raffenet
Copy link
Contributor

raffenet commented Feb 7, 2025

I used JLSE gpu_a100 node with module load nvhpc/nvhpc/24.1 when trying the user program.

@ShatrovOA
Copy link
Author

Hello everyone!

I believe that adding interfaces with device attribute is not a way to solve this issue. The problem is - device is not the only possible attribute that brings CUDA Fortran. There are others:

  • managed
  • pinned
  • unified

https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#variable-qualifiers

My idea is that MPI should just ignore all of these attributes

@jeffhammond
Copy link
Member

That isn't necessary. device is the only attribute that triggers this compiler error. The other 3 you list do not, because they are valid host memory.

for a in device managed unified pinned ; do /opt/mpich/nvhpc/bin/mpifort -cuda -DATTRIBUTE=$a nvhpc.F90 >& /dev/null || echo ${a} causes the problem ; done ; cat nvhpc.F90 
device causes the problem
program main
#if defined(USE_MPI)
use mpi
#define TYPE_MPI_REQUEST integer
#else
use mpi_f08
#define TYPE_MPI_REQUEST type(MPI_Request)
#endif
  implicit none
  TYPE_MPI_REQUEST :: request
  real, ATTRIBUTE, allocatable :: buf(:)
  !real, allocatable :: buf(:)
  integer :: mpi_ierr

  call MPI_Isend(buf( 1 ), 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, request, mpi_ierr)
  
end program main

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

The configure check for this test:

configure:58755: checking whether device attribute is available
configure:58771: nvfortran -c    conftest.f90 >&5
NVFORTRAN-S-0034-Syntax error at or near device (conftest.f90: 8)
NVFORTRAN-S-0134-Illegal attribute - conflict with intent (conftest.f90: 8)
  0 inform,   0 warnings,   2 severes, 0 fatal for main
configure:58771: $? = 2
configure: failed program was:
|
|         program main
|             INTERFACE A
|                 SUBROUTINE A1(buf)
|                 TYPE(*), DIMENSION(..), INTENT(in) :: buf
|                 END SUBROUTINE
|                 SUBROUTINE A2(buf)
|                 TYPE(*), DIMENSION(..), DEVICE, INTENT(in) :: buf
|                 END SUBROUTINE
|             END INTERFACE
|         end
|

It is not recognizing the following:

TYPE(*), DIMENSION(..), DEVICE, INTENT(in) :: buf

So the question is what is the compiler looking for to match -

real, device, allocatable :: buf(:)

As it is complaining NVFORTRAN-S-0155-Could not resolve generic procedure mpi_isend (mpich.F90: 14)

Our current interfaces are -

     INTERFACE MPI_Isend
        SUBROUTINE MPI_Isend_f08ts(buf, count, datatype, dest, tag, comm, request, ierror)
            USE :: mpi_f08_types, ONLY : MPI_Datatype, MPI_Comm, MPI_Request
            IMPLICIT NONE
            TYPE(*), DIMENSION(..), INTENT(in), ASYNCHRONOUS :: buf
            INTEGER, INTENT(in) :: count
            TYPE(MPI_Datatype), INTENT(in) :: datatype
            INTEGER, INTENT(in) :: dest
            INTEGER, INTENT(in) :: tag
            TYPE(MPI_Comm), INTENT(in) :: comm
            TYPE(MPI_Request), INTENT(out) :: request
            INTEGER, OPTIONAL, INTENT(out) :: ierror
        END SUBROUTINE MPI_Isend_f08ts

        SUBROUTINE MPI_Isend_c_f08ts(buf, count, datatype, dest, tag, comm, request, ierror)
            USE :: mpi_f08_types, ONLY : MPI_Datatype, MPI_Comm, MPI_Request
            USE :: mpi_f08_compile_constants, ONLY : MPI_COUNT_KIND
            IMPLICIT NONE
            TYPE(*), DIMENSION(..), INTENT(in), ASYNCHRONOUS :: buf
            INTEGER(KIND=MPI_COUNT_KIND), INTENT(in) :: count
            TYPE(MPI_Datatype), INTENT(in) :: datatype
            INTEGER, INTENT(in) :: dest
            INTEGER, INTENT(in) :: tag
            TYPE(MPI_Comm), INTENT(in) :: comm
            TYPE(MPI_Request), INTENT(out) :: request
            INTEGER, OPTIONAL, INTENT(out) :: ierror
        END SUBROUTINE MPI_Isend_c_f08ts
    END INTERFACE MPI_Isend

We can add more interfaces, but what is nvfortran looking for? @jeffhammond

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Let me try adding the -cuda to FCFLAGS...

EDIT: That compiles.

@jeffhammond
Copy link
Member

Yeah, you can't use device without -cuda.

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Looks like the device attribute is conflict to asynchronous?

NVFORTRAN-S-0134-Illegal attribute - conflict with asynchronous (src/binding/fortran/use_mpi_f08/pmpi_f08.f90: 284)
NVFORTRAN-S-0134-Illegal attribute - conflict with asynchronous (src/binding/fortran/use_mpi_f08/pmpi_f08.f90: 301)
NVFORTRAN-S-0134-Illegal attribute - conflict with asynchronous (src/binding/fortran/use_mpi_f08/pmpi_f08.f90: 416)
...

, where the line in question is

            TYPE(*), DIMENSION(..), DEVICE, INTENT(in), ASYNCHRONOUS :: sendbuf

@ShatrovOA
Copy link
Author

I may have found another argument to making compiler ignore device attribute instead of adding separate interface subroutine.
Since HPC-SDK can be shipped with two different cuda versions when you compile with -cuda you should usually specify desired CUDA Runtime to link
Simple program like

program main
  real, device, allocatable :: buf(:)

end program main

compiled with -gpu=cuda11.8 and -gpu=cuda12.4 links different libraries:
cuda11.8

 /opt/hpc_sdk/Linux_x86_64/24.5/compilers/lib/libcudafor_118.so
 /opt/hpc_sdk/Linux_x86_64/24.5/cuda/11.8/lib64/libcudart.so.11.0

cuda12.4

 /opt/hpc_sdk/Linux_x86_64/24.5/compilers/lib/libcudafor_120.so
 /opt/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/lib64/libcudart.so.12

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

@ShatrovOA How that makes compiler ignore the device attribute?

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Anyway, adding interfaces seems working. However, it is CRAZY! For functions with single choice buffer (e.g. MPI_Send), it doubles the interfaces. And for functions with two choice buffers (e.g. MPI_Reduce), it quadruples the interfaces (host/host, host/device, device/host, device/device). And consider we already doubled the interfaces with large-count APIs. Also in order to jump from Fortran to C (the whole what C-interop is about) creates more interfaces which also multiplies depend on feature mismatches between Fortran and C. I really think all these modern FORTRAN stuff adds nothing but hassles, pure harms to the language users.

This is what happens when language committee are a bunch of people that doesn't really use the language actively but just pursuing utopia. They are proud to claim modern Fortran is more advanced than C++, but not seeing that they are way above the cloud. // end rant

@jeffhammond
Copy link
Member

Using ignore_tkr(d) doesn't add any interfaces.

The interaction of CUDA Fortran with MPI_F08 isn't standardized by anybody, and the issues here are a bug/feature of the NVHPC Fortran compiler. There's a good reason for the compiler to check the interfaces and enforce device consistency, and a good reason to not do so. The solution from the NVHPC compiler team is ignore_tkr(d), which is a lot safer than requiring users to set a compiler flag correctly all the time.

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Yeah, but ignore_tkr is not standard, rather a hack. If we are okay to hack, we don't need mpi_f08.

@jeffhammond
Copy link
Member

Read https://www.open-mpi.org/papers/euro-pvmmpi-2005-fortran/euro-pvm-mpi-2005-fortran.pdf to see why the MPI_F08 interface design is necessary and good. The previous MPI design was not standard Fortran and relied on ignore_tkr or compilers automatically ignoring type checking altogether.

@jeffhammond
Copy link
Member

The other solution is to give up datatypes and make Fortran base types part of the function name the way OpenSHMEM did.

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

The other solution is to give up datatypes and make Fortran base types part of the function name the way OpenSHMEM did.

Yeah, I would vote for this. Fortran binding should adapt to Fortran rather than standardize the interface with C MPI.

@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Using ignore_tkr(d) doesn't add any interfaces.

The interaction of CUDA Fortran with MPI_F08 isn't standardized by anybody, and the issues here are a bug/feature of the NVHPC Fortran compiler. There's a good reason for the compiler to check the interfaces and enforce device consistency, and a good reason to not do so. The solution from the NVHPC compiler team is ignore_tkr(d), which is a lot safer than requiring users to set a compiler flag correctly all the time.

I'll try this solution.

@hzhou hzhou linked a pull request Feb 7, 2025 that will close this issue
4 tasks
@hzhou
Copy link
Contributor

hzhou commented Feb 7, 2025

Ok, ignore_tkr(d) works. I'll update the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants