Refactor of limit vel function for nicer layout for GPU compute by JorgeG94 · Pull Request #1023 · NOAA-GFDL/MOM6

JorgeG94 · 2026-01-21T22:01:49Z

@marshallward sorry for taking so long to open this!

This PR modifies vertvisc_limit_vel by refactoring the single CFL test and velocity truncation jki loop into two kji loops for each stage.

Much like before, the first loop tests for CFL violations and logs the smallest velocity magnitude for each ij column. The second loop independently checks for and applies potential truncations in the compute domain. The third write_[uv]_accel loop is preserved, with a new global control flag. The pattern is repeated for u and v.

This split was required to eliminate some performance bottlenecks observed in the GPU vert_friction implementation.
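As a rough illustration of the two-stage structure described above, here is a minimal, self-contained Fortran sketch that separates the CFL test and the truncation into two kji loops. It is not the MOM6 code: the grid sizes, `dt`, `dx`, the simplified `CFL = |u| dt / dx` formula, and the rescaling used for the truncation are all assumptions made for the example.

```fortran
program limit_vel_sketch
  implicit none
  ! Hypothetical sizes and parameters, chosen only for illustration.
  integer, parameter :: ni = 4, nj = 3, nk = 5
  real, parameter :: dt = 100.0, dx = 1000.0        ! Time step [s] and grid spacing [m]
  real, parameter :: CFL_report = 0.5, CFL_trunc = 0.5
  real :: u(ni,nj,nk)        ! Velocity [m s-1]
  real :: vel_report(ni,nj)  ! Smallest violating speed in each column [m s-1]
  real :: CFL
  integer :: i, j, k

  u(:,:,:) = 1.0
  u(2,2,3) = 8.0 ; u(2,2,4) = -6.0   ! Two violations in the same column
  vel_report(:,:) = huge(1.0)

  ! Stage 1: test for CFL violations and log the smallest violating speed per column.
  do k = 1, nk ; do j = 1, nj ; do i = 1, ni
    CFL = abs(u(i,j,k)) * dt / dx
    if (CFL > CFL_report) vel_report(i,j) = min(vel_report(i,j), abs(u(i,j,k)))
  enddo ; enddo ; enddo

  ! Stage 2: independently recompute the CFL and rescale any velocity that exceeds the limit.
  do k = 1, nk ; do j = 1, nj ; do i = 1, ni
    CFL = abs(u(i,j,k)) * dt / dx
    if (CFL > CFL_trunc) u(i,j,k) = u(i,j,k) * (CFL_trunc / CFL)
  enddo ; enddo ; enddo

  ! Self-checks for the sketch.
  if (abs(vel_report(2,2) - 6.0) > 1.0e-5) error stop "unexpected vel_report"
  if (maxval(abs(u)) * dt / dx > CFL_trunc + 1.0e-5) error stop "truncation failed"
  print *, "vel_report(2,2) =", vel_report(2,2), " max |u| =", maxval(abs(u))
end program limit_vel_sketch
```

Both nests are perfectly nested with `i` innermost, so neighbouring iterations touch contiguous memory in Fortran's column-major layout. The cost is that the CFL is computed twice, which the sketch accepts in exchange for two simple, independent loops.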

* `trunc_any_array` is promoted to edge-domain in both directions `(SZIB_(G), SZJB_(G), ...)`
* Some i/j indexing errors (e.g. `h(i+1,j)` -> `h(i,j+1)`) (Mostly i- code in the j- block)
* Removed `!$omp parallel do` directives (NOTE: maybe we undo this one?)
* Style fixes
  * Index case fixes (e.g. `I` -> `i`)
  * `end do` -> `enddo`, `end if` -> `endif`
  * Operator/keyword spacings
  * Trailing whitespace
* Removed unused CFL_based_trunc from vertvisc_CS

Replaced the older thickness H_report conditions with the newer ones based on compute domain and using CS%h_[uv]. Also moved the write_u_accel block to mirror dev/gfdl. No idea if there are performance implications but we can check it out.

Revisions to vertvisc_limit_vel PR to dev/gfdl

src/parameterizations/vertical/MOM_vert_friction.F90

Hallberg-NOAA

There are a couple of minor stylistic things that should be corrected here, and there needs to be a proper description of the changes in the PR message, but once these are in place this should be ready to go.

Please also note that the 12 commits in this PR will be squashed into a single commit, unless there is a compelling case to preserve each of the intermediate commits.

JorgeG94 · 2026-01-22T01:14:57Z

> There are a couple of minor stylistic things that should be corrected here, and there needs to be a proper description of the changes in the PR message, but once these are in place this should be ready to go.
>
> Please also note that the 12 commits in this PR will be squashed into a single commit, unless there is a compelling case to preserve each of the intermediate commits.

I thought Marshall and I got all of them, sorry about that.

I don't mind the commits being squashed. Will address!

marshallward · 2026-01-23T16:50:02Z

This PR modifies vertvisc_limit_vel by refactoring the single CFL test and velocity truncation jki loop into two kji loops for each stage.

Much like before, the first loop tests for CFL violations and logs the smallest velocity magnitude for each ij column. The second loop independently checks for and applies potential truncations in the compute domain. The third write_[uv]_accel loop is preserved, with a new global control flag. The pattern is repeated for u and v.

This split was required to eliminate some performance bottlenecks observed in the GPU vert_friction implementation.

marshallward

@JorgeG94 While reviewing this again, I noticed that the trunc_any variable was dropped, causing the second truncation loop to always be executed. This could reduce the performance.

I believe an `if (CFL > CS%CFL_trunc) trunc_any = .true.` can be added before or after the `if (CFL > CS%CFL_report)` block, and these second kji blocks could be wrapped with `if (trunc_any)` tests.

The CI has also detected some trailing whitespace which needs to be removed.

I've also noted a few very minor style changes as inline comments.

And of course, thanks very much for figuring out a way to keep this working on both CPU and GPU!
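The `trunc_any` suggestion in this review can be sketched as follows. This is a simplified stand-in rather than the actual `vertvisc_limit_vel` code: the routine name `limit_vel`, the `ran_stage2` diagnostic, the `CFL = |u| dt / dx` formula and the rescaling are hypothetical and exist only to show the second loop being skipped when no point exceeds the truncation limit.

```fortran
program trunc_any_sketch
  implicit none
  ! Hypothetical sizes and parameters, chosen only for illustration.
  integer, parameter :: ni = 4, nj = 3, nk = 5
  real, parameter :: dt = 100.0, dx = 1000.0   ! Time step [s] and grid spacing [m]
  real, parameter :: CFL_trunc = 0.5
  real :: u(ni,nj,nk)
  logical :: ran_stage2

  ! A stable field: the truncation loop should be skipped entirely.
  u(:,:,:) = 1.0
  call limit_vel(u, ran_stage2)
  if (ran_stage2) error stop "stage 2 should be skipped for a stable field"

  ! One violating point: the truncation loop should now run.
  u(3,1,2) = 9.0
  call limit_vel(u, ran_stage2)
  if (.not. ran_stage2) error stop "stage 2 should run after a violation"
  if (abs(u(3,1,2) - 5.0) > 1.0e-4) error stop "unexpected truncated value"
  print *, "truncated u(3,1,2) =", u(3,1,2)

contains

  subroutine limit_vel(u, ran_stage2)
    real, intent(inout) :: u(:,:,:)      ! Velocity [m s-1]
    logical, intent(out) :: ran_stage2   ! Diagnostic for this sketch only
    real :: CFL
    logical :: trunc_any
    integer :: i, j, k

    trunc_any = .false.   ! Must be reset on every call
    ran_stage2 = .false.

    ! First kji loop: the CFL is already available here, so record any violation.
    do k = 1, nk ; do j = 1, nj ; do i = 1, ni
      CFL = abs(u(i,j,k)) * dt / dx
      if (CFL > CFL_trunc) trunc_any = .true.
    enddo ; enddo ; enddo

    ! Second kji loop: only entered when at least one point needs truncating.
    if (trunc_any) then
      ran_stage2 = .true.
      do k = 1, nk ; do j = 1, nj ; do i = 1, ni
        CFL = abs(u(i,j,k)) * dt / dx
        if (CFL > CFL_trunc) u(i,j,k) = u(i,j,k) * (CFL_trunc / CFL)
      enddo ; enddo ; enddo
    endif
  end subroutine limit_vel

end program trunc_any_sketch
```

In a mostly stable run the guard turns the second pass into a single logical test, which is the saving the review comment is after.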

src/parameterizations/vertical/MOM_vert_friction.F90


marshallward · 2026-01-28T13:40:29Z

Some of my suggestions seemed to cause a serious (10x) degradation of runtime on the GPU. Or, at the least, my implementation had some issues. So this needs some investigation before moving forward.

marshallward · 2026-02-02T18:46:30Z

Apologies, my previous comment about performance can be ignored. I can no longer reproduce the problem. It was likely a compiler flag difference.

JorgeG94 · 2026-02-02T20:30:40Z

> Apologies, my previous comment about performance can be ignored. I can no longer reproduce the problem. It was likely a compiler flag difference.

With that put to rest, I think everything else has been addressed?

marshallward · 2026-02-02T20:54:00Z

There are still a few outstanding issues.

- `trunc_any` still needs to be restored. The second kji loop of each direction is always run, even though the CFL is already computed and testable in the first kji loop. I would imagine this is CPU-expensive (and maybe even GPU-expensive).
- In the current form, the two `if (CS%u_trunc_file)` blocks are sequential, and I don't see any reason why they should not be merged to a single if-block. Once merged, the scope of `do_any_write` becomes local to this if-block, so I would move the initialization inside this block. Similarly for `CS%v_trunc_file`.

  There may be an argument for moving the `write_[uv]_accel` blocks outside, but I'd suggest addressing it at a later time.
- There are still a few minor style diffs that need to be fixed:
  - Indentation of L2667
  - `if(do_any_write)` -> `if (do_any_write)`


- truncation moved inside the main big if since they are sequential, limit the scope of the write any variable
- fix indentation

marshallward · 2026-02-02T21:17:20Z

I think everything that I can see has been addressed, but I think it would also be good for someone other than me to review this.

Hallberg-NOAA · 2026-02-04T00:07:36Z

src/parameterizations/vertical/MOM_vert_friction.F90

-        endif
-      enddo ; enddo
+    do_any_write = .false.



`trunc_any` is never initialized to false, meaning that its (arbitrary) initialization value can determine whether the loops starting at lines 2404 and 2671 are called when there are in fact no truncations. The line `trunc_any = .false.` should be added at about line 2579 and then again at line 2645.

Alternately, trunc_any could perhaps be replaced with the new variable do_any_write where it is used and then eliminated.

JorgeG94 · Feb 4, 2026

Good catch, will fix, thanks so much.

marshallward · Feb 4, 2026

@Hallberg-NOAA There are different conditions for `do_any_write` (`CFL > CS%CFL_report`) and `trunc_any` (`CFL > CS%CFL_trunc`).

The `CFL > CS%CFL_trunc` test is ultimately repeated in the second loop, but that loop is now wrapped in a `do_any_write` / `CS%CFL_report` test. If any `CFL > CS%CFL_report` test is true and all `CFL > CS%CFL_trunc` tests are false, then you will run the loop and recompute CFL for nothing.

Admittedly, these do default to the same value, and even if they did not, this case is fairly unlikely, but it could have consequences in longer mostly-stable runs with intermittent CFL violations.

The current version of the code removes the `trunc_any` test. Unless I'm missing something, it makes more sense to me to preserve this test.

Hallberg-NOAA · Feb 5, 2026

Yes, @marshallward, you are absolutely correct. We do need to retain both `do_any_write` and `trunc_any`, and both need to be initialized properly.
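To make the distinction in this thread concrete, the sketch below keeps the two flags separate and resets both on every call. The routine `scan_cfl`, the one-dimensional array of precomputed CFL values, and the deliberately different thresholds are assumptions for illustration (the thread notes that the real parameters default to the same value).

```fortran
program two_flags_sketch
  implicit none
  ! Hypothetical thresholds, set apart on purpose to expose the intermediate case.
  real, parameter :: CFL_report = 0.5, CFL_trunc = 0.8
  real :: CFL_col(4)
  logical :: do_any_write, trunc_any

  ! Reportable but not truncatable: only do_any_write should be set.
  CFL_col = [0.1, 0.6, 0.2, 0.3]
  call scan_cfl(CFL_col, do_any_write, trunc_any)
  if (.not. do_any_write) error stop "expected a report"
  if (trunc_any) error stop "no truncation expected"

  ! Above both thresholds: both flags should be set.
  CFL_col = [0.1, 0.9, 0.2, 0.3]
  call scan_cfl(CFL_col, do_any_write, trunc_any)
  if (.not. (do_any_write .and. trunc_any)) error stop "expected both flags"

  ! Stable again: both flags must come back false, which relies on the reset.
  CFL_col = [0.1, 0.1, 0.2, 0.3]
  call scan_cfl(CFL_col, do_any_write, trunc_any)
  if (do_any_write .or. trunc_any) error stop "flags were not reset"
  print *, "all flag checks passed"

contains

  subroutine scan_cfl(CFL, do_any_write, trunc_any)
    real, intent(in) :: CFL(:)            ! Precomputed CFL numbers [nondim]
    logical, intent(out) :: do_any_write  ! True if any point exceeds CFL_report
    logical, intent(out) :: trunc_any     ! True if any point exceeds CFL_trunc
    integer :: n

    ! Initialize both flags so stale or arbitrary values cannot trigger later loops.
    do_any_write = .false.
    trunc_any = .false.

    do n = 1, size(CFL)
      if (CFL(n) > CFL_report) do_any_write = .true.
      if (CFL(n) > CFL_trunc) trunc_any = .true.
    enddo
  end subroutine scan_cfl

end program two_flags_sketch
```

The first case is the one raised above: a report is warranted, but a truncation loop guarded only by `do_any_write` would run and recompute the CFL without changing anything.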

JorgeG94 · 2026-03-12T20:08:35Z

sorry it seems I just never committed the last fix!


JorgeG94 and others added 12 commits January 8, 2026 19:14

port refactored limit vel

86e6276

Update thickness check conditions

83a3184

Replaced the older thickness H_report conditions with the newer ones based on compute domain and using CS%h_[uv]. Also moved the write_u_accel block to mirror dev/gfdl. No idea if there are performance implications but we can check it out.

Merge pull request #1 from marshallward/refactor_limit_vel_amend

e43a871

Revisions to vertvisc_limit_vel PR to dev/gfdl

loop ordering

41627ce

any to avoid loop

514f6a8

Merge branch 'NOAA-GFDL:dev/gfdl' into refactor_limit_vel

c61607e

do any write instead of any to avoid copy of array unless necessary

4f5d649

Merge branch 'NOAA-GFDL:dev/gfdl' into refactor_limit_vel

b6cb9ce

recompute instead of using an array, should be cheaper

7ff58d9

remove undefined variable

745a22e

Merge branch 'NOAA-GFDL:dev/gfdl' into refactor_limit_vel

fde0757

Hallberg-NOAA reviewed Jan 22, 2026

src/parameterizations/vertical/MOM_vert_friction.F90

Hallberg-NOAA reviewed Jan 22, 2026

src/parameterizations/vertical/MOM_vert_friction.F90

Hallberg-NOAA requested changes Jan 22, 2026

marshallward requested changes Jan 23, 2026

JorgeG94 added 4 commits January 26, 2026 16:43

address review comments

0317c2e

delete trailing space

f6b5c49

Merge branch 'refactor_limit_vel' of github.com:JorgeG94/MOM6 into re…

88e52a7

…factor_limit_vel

fix?

2c348c7

Merge branch 'dev/gfdl' into refactor_limit_vel

9ee43de

Merge branch 'dev/gfdl' into refactor_limit_vel

12ec2d3

JorgeG94 added 2 commits February 2, 2026 14:55

Merge branch 'refactor_limit_vel' of github.com:JorgeG94/MOM6 into re…

37e9ee1

…factor_limit_vel

address review comments:

4cf2b6c

- truncation moved inside the main big if since they are sequential, limit the scope of the write any variable
- fix indentation

Hallberg-NOAA reviewed Feb 4, 2026


address review comments: delete trunc_any in favour of do_any_write

b5eccfd

Hallberg-NOAA added the refactor label (Code cleanup with no changes in functionality or results) Feb 13, 2026

JorgeG94 added 4 commits February 16, 2026 07:46

Merge branch 'NOAA-GFDL:dev/gfdl' into refactor_limit_vel

e0cf981

revert removal of trunc_any after review comment

3714881

trunc any initialized to false with do_any_write

c419fbf

Merge branch 'dev/gfdl' into refactor_limit_vel

1b12031

JorgeG94 added 2 commits March 13, 2026 07:35

trunc_any set to true if CFL>CS%CFL_trunc in a separate if

60a92b5

Merge branch 'refactor_limit_vel' of github.com:JorgeG94/MOM6 into re…

1118fe8

…factor_limit_vel
