Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/mrhs solvers #1489

Open
wants to merge 82 commits into
base: develop
Choose a base branch
from
Open

Feature/mrhs solvers #1489

wants to merge 82 commits into from

Conversation

maddyscientist
Copy link
Member

@maddyscientist maddyscientist commented Sep 5, 2024

This is PR is a biggie:

  • Adds MRHS solvers to QUDA (at present CG, MR, SD, GCR, CA-GCR) are all implemented
    • Reliable updates performed using 0th RHS
    • Do not flag convergence until all RHS are converged
  • Adds MRHS support to multigrid
    • Batched null space finding implemented, exposed with new parameter QudaMultigridParam::n_vec_batch
    • MRHS supported in actual solves as well
  • Expose MRHS solvers using invertMultiSrcQuda interface
    • This is compatible with the split-grid interface, batching is used across the number of sources per sub grid
  • Explicit breaking of the interface
    • QudaInvertParam::true_res and QudaInvertParam::true_res_hq are now arrays
  • Batched deflation implemented
    • Batch eigenvalue deflation confirmed working using staggered fermions
    • Batch singular-value deflation confirmed working as a coarse grid deflator
  • Eigenvalue computation is now batch, to take advantage of MRHS
  • All Dirac::prepare and Dirac::reconstruct functions are now MRHS optimized
  • Chronological solver is now MRHS optimized
  • Cast from cvector<T> to T is now explicit instead of implicit
    • This makes it much easier to catch bugs when updating code to be MRHS
  • Tensor-core 3xFP16 DslashCoarse is now robust to underflow
  • Miscellaneous cleanup and additions to aid all of the above
  • Fixes some earlier bugs introduced in prior MRHS PRs
  • Add MMA instantiations for 32 -> 64 coarsening
  • Since QUDA is now threaded, update to using MPI_THREAD_FUNNELED
  • Augmentations to the power monitoring

Things left to do

  • Add real MRHS support to all solvers (or at least to a few outstanding ones: CGNE, CGNR, BiCGStab, etc.)
  • Verify NVSHMEM MRHS operators are all working as expected
  • Update QUDA version number due to breakage of interface (QUDA 2.0?)
  • Fix reporting of true residual per RHS when running split grid

… size to use when generating null-space vectors. This is the parameter used to enable MRHS null space generation. Updated the null-space generation to work on vectors of this width
…-> scalar cast operator explicit instead of implicit
… issues at compile time, the scalar -> vector<scalar> cast operator is now explicit. Apply necessary changes to ensure that stuff doesn't break with this change
…of invertQuda to new function solve which is MRHS aware
… also now called by the invertMultiSrcQuda interface
lib/inv_cg_quda.cpp Outdated Show resolved Hide resolved
lib/inv_gcr_quda.cpp Outdated Show resolved Hide resolved
lib/inv_gcr_quda.cpp Outdated Show resolved Hide resolved
@@ -1426,8 +1426,9 @@ void qudaInvertMsrc(int external_precision, int quda_precision, double mass, Qud

// return the number of iterations taken by the inverter
*num_iters = invertParam.iter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe MILC expects the aggregate number of iterations as if it wasn't a batch/block solve, but I forget. I'll also scope out how it expects the residuals as part of the work I'm doing on the MILC side and sort this out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include/invert_quda.h Outdated Show resolved Hide resolved

// break-out check if we have reached the limit of the precision
if (r2 > r2_old) {
resIncrease++;
resIncreaseTotal++;
warningQuda(
"CA-CG: new reliable residual norm %e is greater than previous reliable residual norm %e (total #inc %i)",
sqrt(r2), sqrt(r2_old), resIncreaseTotal);
sqrt(r2[0]), sqrt(r2_old[9]), resIncreaseTotal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r2_old[0] not r2_old[9] :)

return true;
}

bool Solver::convergenceHQ(double, double hq2, double, double hq_tol)
bool Solver::convergenceHQ(cvector<double> &hq2, cvector<double> &hq_tol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a big "gotcha" that happens here: in, for example, inv_cg_quda.cpp, this routine is called as:

convergenceHQ(hq_res, param.tol_hq)

Where param.tol_hq is just a double. Within convergenceHQ (and elsewhere), this is blindly converted to a cvector<double> where the 0th element is param.tol_hq and then the rest are uninitialized. This gives garbage for the other values, messing up convergence.

I think the only way to catch this is to make a conversion from a type T to cvector<T> require an explicit cast, which I think you've imposed on other types as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants