Skip to content

Feature/adjoint flow #1526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 87 commits into from
May 8, 2025
Merged

Feature/adjoint flow #1526

merged 87 commits into from
May 8, 2025

Conversation

rkarur
Copy link
Contributor

@rkarur rkarur commented Dec 15, 2024

Implemented the adjoint fermion flow algorithm (both the serial and hierarchical versions detailed in Appendix E of https://arxiv.org/pdf/1302.5246) and wrote a test that tests both the adjoint and regular (forward) gradient flow, in which the equivalence of the serial and hierarchical methods is demonstrated, as well as a numerical verification that the forward and adjoint fermionic flows are indeed adjoint methods of one another (Identity E.4 in https://arxiv.org/pdf/1302.5246)

rkarur and others added 30 commits October 25, 2024 13:32
initialize dslash_type_precondition
fix CMake to use QUDA_DIRAC_LAPLACE instead of old QUDA_LAPLACE
…ds. Revert prior change to color_spinor_util.in.cu.
@rkarur
Copy link
Contributor Author

rkarur commented Apr 17, 2025

refined errors tests

@cpviolator
Copy link
Member

@maddyscientist This looks pretty goof from my perspective, if you give it the once over for @rkarur we can merge and I can refactor into classes as a separate PR.

@weinbe2
Copy link
Contributor

weinbe2 commented May 7, 2025

The test executable ./su3_fermion_test doesn't run out of the box with no arguments passed in:

> mpirun -np 1 ./su3_fermion_test
[...]
WARNING: Using device memory pool allocator
WARNING: Using pinned memory pool allocator
cublasCreated successfully
ERROR: Smear type 1 not supported - only Wilson or Symanzik supported (rank 0, host ipp1-2318.nvidia.com, su3_fermion_test.cpp:147 in int main(int, char**)())
       last kernel called was (name=cudaMemcpyDeviceToHost,volume=bytes=47775744,aux=copy,gauge_field.cpp,1024)
[...]

Smear type "1" is stout smearing, which is the default specified in the command_line_params files. Should we change the default there, or manually override it in ./su3_fermion_test?

@weinbe2
Copy link
Contributor

weinbe2 commented May 7, 2025

This also fails an error check after manually specifying wilson-type smearing:

> mpirun -np 1 ./su3_fermion_test --su3-smear-type wilson --verbosity verbose
[...]
Fractional error of (<check,adj_check> - <check,fwd_check>.conj()) = 2.82863e-14
adjoint safe/hier match precision passed!
ERROR: fractional error precision failed
 (rank 0, host ipp1-2318.nvidia.com, su3_fermion_test.cpp:279 in int main(int, char**)())
       last kernel called was (name=cudaMemcpyDefault,volume=bytes=63700992,aux=copy,color_spinor_field.cpp,498)

Symanzik has the same issue.

@weinbe2
Copy link
Contributor

weinbe2 commented May 7, 2025

Further, it looks like setting the precision with --prec double vs --prec single doesn't do anything---it seems like this is intentional?

Lines 75 to 80 of the test executable:

  QudaGaugeParam gauge_param = newQudaGaugeParam();
  if (prec_sloppy == QUDA_INVALID_PRECISION) {
    prec = QUDA_DOUBLE_PRECISION;
    prec_sloppy = prec;
  }
  if (link_recon_sloppy == QUDA_RECONSTRUCT_INVALID) link_recon_sloppy = link_recon;

If there's a reason for this that's fine, but there should be a warningQuda noting so, and maybe some other warning or error deeper down if there's a fundamental concern about running this in single precision.

@maddyscientist
Copy link
Member

Further, it looks like setting the precision with --prec double vs --prec single doesn't do anything---it seems like this is intentional?

Lines 75 to 80 of the test executable:

  QudaGaugeParam gauge_param = newQudaGaugeParam();
  if (prec_sloppy == QUDA_INVALID_PRECISION) {
    prec = QUDA_DOUBLE_PRECISION;
    prec_sloppy = prec;
  }
  if (link_recon_sloppy == QUDA_RECONSTRUCT_INVALID) link_recon_sloppy = link_recon;

If there's a reason for this that's fine, but there should be a warningQuda noting so, and maybe some other warning or error deeper down if there's a fundamental concern about running this in single precision.

This was a bug: fixed in my cleanup.

@maddyscientist
Copy link
Member

This also fails an error check after manually specifying wilson-type smearing:

> mpirun -np 1 ./su3_fermion_test --su3-smear-type wilson --verbosity verbose
[...]
Fractional error of (<check,adj_check> - <check,fwd_check>.conj()) = 2.82863e-14
adjoint safe/hier match precision passed!
ERROR: fractional error precision failed
 (rank 0, host ipp1-2318.nvidia.com, su3_fermion_test.cpp:279 in int main(int, char**)())
       last kernel called was (name=cudaMemcpyDefault,volume=bytes=63700992,aux=copy,color_spinor_field.cpp,498)

Symanzik has the same issue.

This was due to requesting too tight a tolerance in double precision. Fixed.

@maddyscientist
Copy link
Member

The test executable ./su3_fermion_test doesn't run out of the box with no arguments passed in:

> mpirun -np 1 ./su3_fermion_test
[...]
WARNING: Using device memory pool allocator
WARNING: Using pinned memory pool allocator
cublasCreated successfully
ERROR: Smear type 1 not supported - only Wilson or Symanzik supported (rank 0, host ipp1-2318.nvidia.com, su3_fermion_test.cpp:147 in int main(int, char**)())
       last kernel called was (name=cudaMemcpyDeviceToHost,volume=bytes=47775744,aux=copy,gauge_field.cpp,1024)
[...]

Smear type "1" is stout smearing, which is the default specified in the command_line_params files. Should we change the default there, or manually override it in ./su3_fermion_test?

Fixed in my cleanup: the default smear type is Stout (which is inherited from pre-existing state). If the user doesn't specify, then Wilson-flow is now set (and a message printed to this effect).

@maddyscientist
Copy link
Member

This is great addition @rkarur, thank you.

I have cleaned up the test a bit, and it now uses google test when the --enable-testing true flag is passed to the exe. Also, I've added the test to the ctest list, so it will be run in our CI. Can you check what I've done to make sure I've not done anything untoward?

I'm approving this PR now, but before we merge we should wait for @weinbe2's sign off.

Copy link
Contributor

@weinbe2 weinbe2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. Thanks @rkarur and @maddyscientist !

@weinbe2 weinbe2 merged commit a76d22e into develop May 8, 2025
6 checks passed
@weinbe2 weinbe2 deleted the feature/adjoint-flow branch May 8, 2025 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants