Skip to content

Feature/DD #1447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 163 commits into from
May 19, 2025
Merged

Feature/DD #1447

merged 163 commits into from
May 19, 2025

Conversation

sbacchio
Copy link
Member

@sbacchio sbacchio commented Mar 19, 2024

This is a first PR towards enabling domain decomposition (DD) features in QUDA.
The goal of this PR is to enable a red-black decomposition for the Dirac operator.

Remarks

  • We first consider the blocks to exactly fit in the local lattice. Generalization is left to a future PR.
  • We focus only on the application of the Dirac operator. Optimization of BLAS functions is left to a future PR.

Design

Under domain decomposition, the Dirac operator assumes a block structure, e.g.

$$ D = \begin{bmatrix} D_{rr} & D_{rb} \\ D_{br} & D_{bb} \end{bmatrix} $$

Thus we need a simple strategy to implement all possible operators and their application to a field.
Our strategy is to add the specifications about the domain decomposition directly to the vector (spinor) field.
E.g. the application of $D_{rb}$, i.e. $y = D_{rb} x = P_r D P_b x$, can be expressed by setting the input vector as "black", i.e. $x_b = P_b x$, and the output vector as "red", i.e. $y_r = P_r y$. A pseudocode would look like this:

x.dd_red_active();
y.dd_black_active();
applyD(y,x);

Then, we make the application of D DD-aware and only apply it to the in/out active points.

Summary of changes:

  • DDParam is added as a property of lattice fields (used only for ColoSpinorFields at the moment).
  • ... TODO

TODO list:

  • Add more comments and function documentation
  • Add checks for parameters, e.g. the block size should divide the local lattice size
  • Add calculation of first block parity Use global coordinates
  • Test application of individual pieces, i.e. D_rr, D_rb, D_br, D_bb
  • Test all operators
  • Test that performance without DD is not affected (i.e. compared to current develop)
  • Test MR solver and usage in MG as smoother
  • Properly disable comms when not needed (e.g. block fits local lattice)
  • Improve performance by unrolling threads block-wise
  • Disable usage of DD for all PC Mat
  • ...

Current issues:

  • Tests are failing if the block size is odd in the x direction for the PC operator, e.g. dslash_test --xdim 6 --ydim 4 --zdim 4 --tdim 4 --dd-block-size 3 2 2 2 --dd-red-black=true --test 0. It works if not PC, e.g. --test 2 or if it is odd in any other direction. Now it is not allowed to have odd block size in the x-direction.
  • Tests are failing if export QUDA_REORDER_LOCATION=CPU is set (dslash_test failing with QUDA_REORDER_LOCATION=CPU #1466).
  • Tests are failing for MatPC with Caught signal 8 (Floating point exception: integer divide by zero) . See e.g. dslash_test --xdim 8 --ydim 8 --zdim 8 --tdim 8 --dd-red-black=true --test 1 Not testing for PC operators

@sbacchio sbacchio requested a review from maddyscientist May 8, 2024 07:34
@weinbe2
Copy link
Contributor

weinbe2 commented Apr 24, 2025

I'd like to better understand what's going on with the #ifdef SIGNATURE_ONLY pattern across many of the dslash files; at a high level I can sort of see what it's accomplishing, reducing code duplication, but it's messy from a software engineering perspective and I feel like there must be a more C++-y way to accomplish it. Thoughts @maddyscientist ?

@weinbe2
Copy link
Contributor

weinbe2 commented Apr 24, 2025

I just completed a visual review, and needless to say this is very impressive work---I left some QUDA style convention comments, and a few questions. I think the most important piece of feedback I have is about SIGNATURE_ONLY; I'd really like something neater than that.

My thoughts are all orthogonal to doing actual code testing, so I'll carry on with that next.

@weinbe2
Copy link
Contributor

weinbe2 commented Apr 28, 2025

FYI @pittlerf re:SIGNATURE_ONLY, @maddyscientist is going to look into it briefly and take care of it if there's an easy fix, otherwise we'll merge this and deal with it later---I don't want to postpone merging this PR much longer :)

…avoid mixing them up with color spinor copy filenames
@weinbe2
Copy link
Contributor

weinbe2 commented Apr 29, 2025

I just got done with the boolean op replacement, done by hand instead of with clang-tidy because I figured out very quickly that it was going to be much more complicated than I initially thought. I also elected to rename include/kernels/color_spinor_project_dd.cuh to *domain_decomp.cuh because I kept confusing it with the _dd, _ds, etc files for copying between precisions and I wanted to disambiguate that.

Copy link
Member

@maddyscientist maddyscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all this great work. Approving this PR.

Copy link
Contributor

@weinbe2 weinbe2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work, approved!

@maddyscientist maddyscientist merged commit 96d7a14 into develop May 19, 2025
6 checks passed
@maddyscientist maddyscientist deleted the feature/DD branch May 19, 2025 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants