Add ACC directives to Fortran #27

mnlevy1981 · 2025-02-26T18:15:41Z

This PR only updates swm_fortran_kernels.F90 (and adds a mechanism for building with nvfortran instead of gfortran).

This is probably the worst way to control what compiler is used for building the model, but on the other hand it's a simple way to control what compiler is used for building the model

Also added the -acc=gpu flag to the Makefile. This might already be the default: adding it didn't change performance, while -noacc slowed things down.

mnlevy1981 · 2025-02-26T18:18:24Z

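Timing improvements using a compute node on Derecho (with a single GPU, where applicable). Total time drops from about 2.55 s to 0.61 s (roughly 4.2x), and loops 100, 200, and 300 speed up by roughly 2x, 11x, and 9x respectively.

swm_fortran built with nvfortran (this executable does not use OpenACC at all):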

 cycle number 4000 total computer time 2.552587 time per cycle 0.000638
 time and megaflops for loop 100 0.842869 7464.336043
 time and megaflops for loop 200 0.951119 7166.028746
 time and megaflops for loop 300 0.735940 5343.047880

swm_fortran_driver (with OpenACC):

 cycle number 4000 total computer time 0.612509 time per cycle 0.000153
 time and megaflops for loop 100 0.422081 14905.802665
 time and megaflops for loop 200 0.086040 79215.535142
 time and megaflops for loop 300 0.081181 48437.204598

When using nvfortran, it's helpful to have both executables for comparison

mnlevy1981 · 2025-02-26T18:20:49Z

I'll mark this ready for review once I've done a little more optimization of the ACC directives in the driver (a single copyin / copyout, more ACC directives around periodicity, etc.).
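To illustrate what "single copyin / copyout" could look like, here is a minimal sketch, not the code in this PR. The program name, array names, and sizes are hypothetical. It wraps the whole time loop in one structured data region and keeps the periodic boundary update on the device. Without OpenACC enabled the directives are plain comments, so the same file also builds and runs serially with gfortran.

```fortran
program single_data_region
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 8, n = 8, itmax = 4   ! hypothetical sizes
  real(dp) :: p(m+1,n+1), pnew(m+1,n+1)
  integer :: i, j, ncycle

  p = 1.0_dp
  pnew = 0.0_dp

  ! One transfer in each direction for the whole time loop:
  ! p is copied in here and copied out at "end data";
  ! pnew lives only on the device.
  !$acc data copy(p) create(pnew)
  do ncycle = 1, itmax

    ! Interior update (stand-in for the real kernels)
    !$acc parallel loop collapse(2) present(p, pnew)
    do j = 1, n
      do i = 1, m
        pnew(i,j) = p(i,j) + 1.0_dp
      end do
    end do

    ! Periodic continuation, also on the device
    !$acc parallel loop present(pnew)
    do j = 1, n
      pnew(m+1,j) = pnew(1,j)
    end do
    !$acc parallel loop present(pnew)
    do i = 1, m+1
      pnew(i,n+1) = pnew(i,1)
    end do

    ! Advance in time without leaving the device
    !$acc parallel loop collapse(2) present(p, pnew)
    do j = 1, n+1
      do i = 1, m+1
        p(i,j) = pnew(i,j)
      end do
    end do
  end do
  !$acc end data

  ! Every point gains 1.0 per cycle, starting from 1.0
  if (any(abs(p - real(itmax + 1, dp)) > 1.0e-12_dp)) error stop "unexpected result"
  print *, 'p(1,1) after ', itmax, ' cycles: ', p(1,1)
end program single_data_region
```

The design choice here is that host and device copies are only synchronized at the boundaries of the data region; anything the host needs mid-loop (for example diagnostics output) would need an explicit `!$acc update self(...)`.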

mnlevy1981 · 2025-02-26T23:36:05Z

I added ACC directives around initialization, and that had a noticeable benefit. I also tried simply moving the ACC copyin and copyout directives from the kernels to the driver, and that degraded performance. Specifically:

diff --git a/swm_fortran/swm_fortran_driver.F90 b/swm_fortran/swm_fortran_driver.F90
index f159393..740f279 100644
--- a/swm_fortran/swm_fortran_driver.F90
+++ b/swm_fortran/swm_fortran_driver.F90
@@ -160,7 +160,9 @@ Program SWM_Fortran_Driver
   do ncycle=1,ITMAX

     call cpu_time(c1)
+    !$acc enter data copyin(p,u,v,fsdx,fsdy,cu,cv,h,z)
     call UpdateIntermediateVariablesKernel(fsdx,fsdy,p,u,v,cu,cv,h,z)
+    !$acc exit data copyout(cu,cv,z,h)
     call cpu_time(c2)
     t100 = t100 + (c2 - c1)

@@ -190,7 +192,9 @@ Program SWM_Fortran_Driver
     tdtsdy = tdt / dy

     call cpu_time(c1)
+    !$acc enter data copyin(tdtsdx,tdtsdy,tdts8,cu,cv,z,h,pold,uold,vold,pnew,unew,vnew)
     call UpdateNewVariablesKernel(tdtsdx,tdtsdy,tdts8,pold,uold,vold,cu,cv,h,z,pnew,unew,vnew)
+    !$acc exit data copyout(unew,vnew,pnew)
     call cpu_time(c2)
     t200 = t200 + (c2-c1)

diff --git a/swm_fortran/swm_fortran_kernels.F90 b/swm_fortran/swm_fortran_kernels.F90
index cff1d69..45ee999 100644
--- a/swm_fortran/swm_fortran_kernels.F90
+++ b/swm_fortran/swm_fortran_kernels.F90
@@ -12,7 +12,6 @@ subroutine UpdateIntermediateVariablesKernel(fsdx,fsdy,p,u,v,cu,cv,h,z)

     integer :: i,j

-    !$acc enter data copyin(p,u,v,fsdx,fsdy,cu,cv,h,z)
     !$acc parallel loop collapse(2) present(p,u,v,fsdx,fsdy)
     do j=1,size(cu,2)-1
       do i=1,size(cu,1)-1
@@ -24,7 +23,6 @@ subroutine UpdateIntermediateVariablesKernel(fsdx,fsdy,p,u,v,cu,cv,h,z)
                                   v(i,j+1) * v(i,j+1) + v(i,j) * v(i,j))
       end do
     end do
-    !$acc exit data copyout(cu,cv,z,h)

   end subroutine UpdateIntermediateVariablesKernel

@@ -36,7 +34,6 @@ subroutine UpdateNewVariablesKernel(tdtsdx,tdtsdy,tdts8,pold,uold,vold,cu,cv,h,z

     integer :: i,j

-    !$acc enter data copyin(tdtsdx,tdtsdy,tdts8,cu,cv,z,h,pold,uold,vold,pnew,unew,vnew)
     !$acc parallel loop collapse(2) present(tdtsdx,tdtsdy,tdts8,cu,cv,z,h,pold,uold,vold)
     do j=1,size(unew,2)-1
       do i=1,size(unew,1)-1
@@ -49,7 +46,6 @@ subroutine UpdateNewVariablesKernel(tdtsdx,tdtsdy,tdts8,pold,uold,vold,cu,cv,h,z
         pnew(i,j) = pold(i,j) - tdtsdx * (cu(i+1,j) - cu(i,j)) - tdtsdy * (cv(i,j+1) - cv(i,j))
       end do
     end do
-    !$acc exit data copyout(unew,vnew,pnew)

   end subroutine UpdateNewVariablesKernel

This change results in ~50% longer runtime. Is there a directive to say "this function will always be called with data on the GPU"? Or is there something else that I'm missing?
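For reference, OpenACC's `present` clause (or `default(present)` on a compute construct) is the usual way to assert that arrays are already on the device when a routine is called. The sketch below is a hypothetical, self-contained example rather than the SWM code: `demo_kernels`, `scale_kernel`, and the array names are made up for illustration. The driver moves data once, outside the time loop, and the kernel only asserts presence. Whether this pattern removes the slowdown reported above has not been tested here.

```fortran
module demo_kernels
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical kernel: b = alpha * a
  subroutine scale_kernel(alpha, a, b)
    real(dp), intent(in)  :: alpha
    real(dp), intent(in)  :: a(:,:)
    real(dp), intent(out) :: b(:,:)
    integer :: i, j

    ! default(present): arrays referenced in the loop are asserted to be
    ! on the device already, so no transfer is generated here.
    ! The scalar alpha is firstprivate by default.
    !$acc parallel loop collapse(2) default(present)
    do j = 1, size(b,2)
      do i = 1, size(b,1)
        b(i,j) = alpha * a(i,j)
      end do
    end do
  end subroutine scale_kernel
end module demo_kernels

program demo_driver
  use demo_kernels
  implicit none
  integer, parameter :: n = 16, itmax = 3   ! hypothetical sizes
  real(dp) :: a(n,n), b(n,n)
  integer :: ncycle

  a = 1.0_dp
  b = 0.0_dp

  ! Move data to the device once, before the time loop
  !$acc enter data copyin(a) create(b)
  do ncycle = 1, itmax
    call scale_kernel(2.0_dp, a, b)   ! b = 2*a
    call scale_kernel(1.0_dp, b, a)   ! a = b
  end do
  ! Bring back only what the host needs, then release device memory
  !$acc exit data copyout(a) delete(b)

  ! a doubles every cycle: 1 -> 2 -> 4 -> 8
  if (any(abs(a - 2.0_dp**itmax) > 1.0e-12_dp)) error stop "unexpected result"
  print *, 'a(1,1) = ', a(1,1)
end program demo_driver
```

If a kernel using `present` is called with data that is not on the device, the OpenACC runtime reports an error instead of silently copying, which makes missing transfers easy to spot.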

mnlevy1981 · 2025-02-26T23:36:44Z

@johnmauff I should have tagged you in that last comment

Offload more initialization and periodicity updates to GPU; also added nvfortran-noacc target to Makefile to build without acc

mnlevy1981 added 2 commits February 26, 2025 10:21

Add nvfortran target to Makefile

40fd5ce

This is probably the worst way to control what compiler is used for building the model, but on the other hand it's a simple way to control what compiler is used for building the model

Add OpenACC directives to swm_fortran_kernels.F90

c8a25c7

Also added -acc=gpu flag to Makefile (though this might be the default? it didn't change performance, while -noacc slowed things down)

Build both swm_fortran and swm_fortran_driver

7549a93

When using nvfortran, it's helpful to have both executables for comparison

mnlevy1981 marked this pull request as draft February 26, 2025 18:20

mnlevy1981 added 2 commits February 26, 2025 14:52

Add OpenACC directives around initialization

f0f1ad2

Single copyin for initializing arrays

6b7ed88

Move some ACC directives into driver

17e2c93

Offload more initialization and periodicity updates to GPU; also added nvfortran-noacc target to Makefile to build without acc

mnlevy1981 marked this pull request as ready for review February 27, 2025 18:36

johnmauff merged commit 2f3a84b into NCAR:main Feb 27, 2025
1 check passed
