Refactor/expansion of MG setup methods #1283

weinbe2 · 2022-05-27T17:00:24Z

This PR refactors and expands the number of methods by which near-null vectors can be generated in QUDA. Due to the nature of the refactor and cleanup, this PR is interface breaking, but in principle in a future-proof way -- near-null vector generation methods are now specified via an enum, QudaNullVectorSetupType, so it is straightforward to add more options in a non-breaking fashion. This PR also codifies the existing behavior that, if an input near-null vector is specified (via --mg-load-vec from the command line, for ex), it is loaded and all other options are ignored.

The full list of methods now includes:

Inverse iterations (still the default)
Chebyshev filter a la arXiv:2103.05034, P. Boyle and A. Yamaguchi
Eigenvectors (as before)
Free field vectors (as before, and only for sanity tests)
Restricting as many near null vectors as possible from the finer level, then generating more from another method (that is --- if level 1 is Nc = 24, and level 2 is Nc = 32, then for level 2 all 24 near-null vectors from level 1 are restricted then 8 more are generated)

This PR also supports "polishing" near-null vectors generated by other methods with more iterations of inverse iterations.

The incomplete test vector support in QUDA has been mostly removed as it requires a fuller refactor that is outside of the scope of this PR, though it is on the to-do list in the future as it is a demonstrably successful approach.

Command line arguments

The core command line argument, which has been repurposed, is

--mg-setup-type [level] [method], where [method] can be inverse-iterations (default), chebyshev-filter, eigenvectors, test-vectors, restrict-fine, and free-field (where test-vectors gracefully errorQudas out)

Inverse iterations

No options for inverse iterations have been changed; --mg-setup-tol, --mg-setup-maxiter, etc, all behave as expected.

Eigenvectors

No options for eigenvectors have been changed.

Chebyshev Filter

The Chebyshev filter has a flexible set of parameters describing generating a set of near-null vectors related to the initial low-pass filter, the number of starting vectors, and subsequent generation from a low-passed starting vector.

--mg-setup-filter-startup-vectors - the number of random starting vectors, default 1. As some examples, if the number of near-null vectors for level 1 is 24, --mg-setup-filter-startup-vectors 1 1 corresponds to one starting vector with 24 near-nulls generated; [...] 1 3 corresponds to three starting vectors with 3 near-nulls generated from each, etc. In cases like [...] 1 5, where 5 doesn't divide into 24, 5 near-null vectors are generated from the first four starting vectors, and 4 from the last --- 4 * 4 + 4 = 24.
--mg-setup-filter-startup-iterations - number of iterations for the initial low-pass filter, default 1000.
--mg-setup-filter-startup-rescale-frequency - an empirical feature; since the norm of a vector could overflow (or individual values thereof), the vector can be renormalized with some frequency in a way that preserves the Chebyshev recursion; empirical default 50.
--mg-setup-filter-lambda-max - upper bound for the Chebyshev filters, default power iterations to guess an upper bound
--mg-setup-filter-lambda-min - lower bound to use for the initial low pass filter, modes smaller than that value are enhanced. Default 1.
--mg-setup-filter-iterations-between-vectors - number of iterations between subsequent near-null vectors after the initial low-pass filter. As an example, if startup iterations is 1000, and the number of iterations between vectors is 150, a near-null vector is generated after 1000 (initial) matrix applications, then at 1150, 1300, 1450... . Default 150.

Restriction

There are no special flags for restriction in and of itself, but:

--mg-setup-restrict-remaining-type [level] [method] can be used to specify which method to use to generate the "remaining" near null vectors if fine nvec != coarse nvec. Parameters for the remainder method are taken from the flags for each method.

"Polishing" near-null vectors

"Polishing" near-null vectors with inverse iterations is enabled by specifying a non-zero number of polish iterations via

--mg-setup-maxiter-inverse-iterations-polish [level] [numbers] , where the default [numbers] is 0, corresponding to no polishing. Parameters for polishing are taken from the flags for inverse iteration setup.

Reference commands

A base command where we use inverse iterations for level 1 and a custom method for level 2 (specified by SETUP_FLAGS_LEVEL2) is:

CONFIG=[...]
SETUP_FLAGS_LEVEL2=[...]
./invert_test \
  --prec double --prec-sloppy single --prec-null half --prec-precondition half \
  --recon 12 --recon-sloppy 8 --recon-precondition 8 \
  --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge $CONFIG \
  --dslash-type wilson --kappa 0.1394265 --tol 1e-10 --nsrc 1 --niter 20 \
  --verbosity verbose --solve-type direct-pc --solution-type mat --inv-type gcr \
  --inv-multigrid true --mg-levels 3 \
  --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
  --mg-block-size 1 2 2 2 2 --mg-nvec 1 32 \
  --mg-setup-tol 0 5e-7 --mg-setup-inv 0 cgnr $SETUP_FLAGS_LEVEL2 \
  --mg-use-mma true \
  --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct-pc --mg-nu-pre 0 0 --mg-nu-post 0 4 \
  --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
  --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
  --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
  --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose

Where we will fill in SETUP_FLAGS_LEVEL2 for different options.

Inverse Iterations

A standard setup is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Where the --mg-setup-type flag is optional as inverse iterations are the default

Eigenvectors

A reference setup without polynomial acceleration is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 eigenvectors --mg-eig-type 1 trlm --mg-eig-use-dagger 1 false --mg-eig-use-normop 1 true --mg-eig-use-poly-acc 1 false --mg-eig-n-ev 1 36 --mg-eig-n-kr 1 64 --mg-eig-tol 1 1e-5"

Chebyshev filter

A reference setup where 4 base vectors are used -> 8 near-null vectors are generated from each base vector, the minimum of the low pass filter is 1.0, a 500 iteration low pass filter with rescaling every 50 iterations is used, and there are 100 iterations between subsequent near-nulls is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 chebyshev-filter --mg-setup-filter-startup-vectors 1 4 --mg-setup-filter-startup-iterations 1 500 --mg-setup-filter-startup-rescale-frequency 1 50 --mg-setup-filter-lambda-min 1 1.0 --mg-setup-filter-iterations-between-vectors 1 50"

Restriction

A reference setup with restriction, then using inverse iterations for the remaining 8 vectors (32 on level 2 minus 24 on level 1) is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 restrict-fine --mg-setup-restrict-remaining-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

--mg-setup-restrict-remaining-type can be changed appropriately, grabbing other reference flags as appropriate.

Polishing with inverse iterations

As an example, the parameters for a Chebyshev filter can be included, and then they can be polished for 50 iterations via adding:

SETUP_FLAGS_LEVEL2="${SETUP_FLAGS_LEVEL2} --mg-setup-maxiter-inverse-iterations-refinement 1 50 --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Outstanding work

Exposing these options in the MILC interface
Copy plus pasting all of this to a wiki page: https://github.com/lattice/quda/wiki/Multigrid-Solver
Interface testing/updating with Chroma, ...?
clang-format

…variables, level 2 (critical path for now) only

…per-level basis; chebyshev filter parameters are still only setable via the command line

…, generate leftover near nulls when restricting some fine near null vectors

…arse

… restricting finer near-nulls

tests/utils/host_utils.h

tests/utils/host_utils.cpp

lib/transfer.cpp

lib/inv_cg_quda.cpp

lib/eigensolve_quda.cpp

lib/check_params.h

lib/multigrid.cpp

maddyscientist · 2022-06-02T00:15:06Z

Completed an initial visual review of this PR. Looks like a good contribution. Left a few comments, and I'll test it on some clover multigrid shortly.

kostrzewa · 2022-06-15T13:30:42Z

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

weinbe2 · 2022-06-15T20:38:50Z

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

No problem! There are certainly plenty of higher-priority things bouncing around, on all of our plates. Whenever you can get to it is fine and appreciated.

weinbe2 · 2022-07-06T19:09:02Z

FYI, one of the recent merges of develop slightly broke this branch, will fix it this week.

maddyscientist

Finished visual review of this PR. Looks mostly good to my eye with some relatively small tweaks needed.

maddyscientist · 2022-09-14T20:10:12Z

include/multigrid.h

@@ -495,6 +532,14 @@ namespace quda {

      return (param.level == 0 || kd_nearnull_gen);
    }
+
+    /**
+      @brief Return if we're on the coarsest grid right now


Use@return here

maddyscientist · 2022-09-14T20:10:50Z

include/multigrid.h

@@ -449,7 +449,7 @@ namespace quda {

    /**
       @brief Load the null space vectors in from file
-       @param B Loaded null-space vectors (pre-allocated)
+       @param B Load null-space vectors to here


Can you add [in]/[out] tags to all the doxgyen you've touched in this file?

maddyscientist · 2022-09-14T20:15:06Z

include/quda.h

+    /** Number of iterations between null vectors generated from each starting vector */
+    int filter_iterations_between_vectors[QUDA_MAX_MG_LEVEL];
+
+    /** Conservative estimate of largest eigenvalue of operator used for Chebyshev filter setup */


Is the doxygen correct here: min is largest e-value and max is lower bound?

maddyscientist · 2022-09-14T20:19:09Z

lib/eigensolve_quda.cpp


      sigma_old = sigma;
    }
-    blas::copy(out, *tmp2);
+    blas::copy(out, tmp_2);


Just to note that this copy can be replaced with a swap. I've already applied this optimization to the feature/multi-rhs, so it's perhaps moot.

maddyscientist · 2022-09-14T20:22:06Z

tests/utils/command_line_params.h

+extern quda::mgarray<double> filter_lambda_min;
+extern quda::mgarray<double> filter_lambda_max;
+
+


delete extra space

maddyscientist · 2022-09-14T20:56:22Z

lib/multigrid.cpp

+    // Prepare to do the Cholesky decomposition for a thin-QR
+    std::vector<Complex> Vdagv_(num_vec * num_vec);
+
+    // outstanding bugfix


what's the issue here?

maddyscientist · 2022-09-14T21:00:09Z

lib/multigrid.cpp

+
+          // Initializing to random vectors 
+          if (!refresh) {
+            int num_initialize = param.mg_global.filter_startup_vectors[param.level];


make this unsigned?

maddyscientist · 2022-09-14T21:01:06Z

lib/multigrid.cpp

-          if (sqrt(nrm2) > 1e-16) ax(1.0/sqrt(nrm2), *B[i]);// i/<i,i>
-          else errorQuda("\nCannot normalize %u vector (nrm=%e)\n", i, sqrt(nrm2));
-        }
+      if (getVerbosity() >= QUDA_VERBOSE) {


replace these four lines with two lines of logQuda

maddyscientist · 2022-09-14T21:01:18Z

lib/multigrid.cpp

+      (*solve)(*out, *in);
+      diracSmoother->reconstruct(x, b, QUDA_MAT_SOLUTION);
+
+      if (getVerbosity() >= QUDA_VERBOSE) printfQuda("Solution = %g\n", norm2(x));


maddyscientist · 2022-09-14T21:49:25Z

lib/multigrid.cpp

    csParam.create = QUDA_ZERO_FIELD_CREATE;
    // This is the vector precision used by matResidual
    csParam.setPrecision(param.mg_global.invert_param->cuda_prec_sloppy, QUDA_INVALID_PRECISION, true);

    for (int i = 0; i < n_conv; i++) B_evecs.push_back(new ColorSpinorField(csParam));

-    // before entering the eigen solver, let's free the B vectors to save some memory
-    ColorSpinorParam bParam(*param.B[0]);
-    for (int i = 0; i < (int)param.B.size(); i++) delete param.B[i];


Did this optimization to reduce memory get deleted intentionally?

include/quda.h

kostrzewa · 2022-09-21T15:09:02Z

@weinbe2 Seems to work fine for me with with the changes to our interface that I implemented in etmc/tmLQCD#548 to preserve the status quo. I will keep track of this PR and make any adjustments that may become necessary due to ongoing changes.

kostrzewa

No issues from my side, haven't tested anything beyond our status quo, however.

weinbe2 · 2022-10-25T17:21:04Z

For future reference, do not merge, it appears an issue creeped in not in develop. For my own convenience, a command that caused an issue was

LOCAL_DIM="32 32 32 16"
GRIDSIZE="1 1 1 2"
OUTER_OPERATOR=direct-pc
GAUGE_PATH=./l32t32b7p0
MASS=0.00945
TADPOLE_COEFF=1.0
TOL=1e-5
FIRST_AGGREGATE="4 4 4 4"
SECOND_AGGREGATE="4 4 4 4"
COARSEST_NVEC=128
COARSEST_NEV=160
COARSEST_NKR=192
COARSEST_POLY_DEG=32
COARSEST_AMIN=0.85
COARSEST_AMAX=0.0

export QUDA_RESOURCE_PATH=.
export QUDA_TUNE_VERSION_CHECK=0

mpirun -np 2 --allow-run-as-root \
                        ./staggered_invert_test --inv-multigrid true --dim ${LOCAL_DIM} --gridsize ${GRIDSIZE} --verbosity verbose --nsrc 1 \
                        --dslash-type asqtad --solve-type ${OUTER_OPERATOR} --solution-type mat --compute-fat-long true \
                        --load-gauge ${GAUGE_PATH} --mass ${MASS} --tadpole-coeff ${TADPOLE_COEFF} --tol ${TOL} \
                        --prec single --prec-sloppy single --prec-precondition single --prec-null single --mg-use-mma false \
                        --recon 18 --recon-sloppy 18 --recon-precondition 18 \
                        --mg-verbosity 2 verbose --mg-verbosity 3 verbose \
                        --mg-save-vec 1 blah --mg-save-vec 2 blah \
                        --mg-levels 4 --mg-coarse-solve-type 0 ${OUTER_OPERATOR} --mg-staggered-coarsen-type kd-optimized \
                        --mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
                        --mg-smoother-solve-type 0 ${OUTER_OPERATOR} --mg-smoother 0 ca-gcr --mg-nu-pre 0 0 --mg-nu-post 0 8 --mg-smoother-tol 0 1e-10  \
                        --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 5e-2 \
                        --mg-coarse-solver-maxiter 1 16 --mg-coarse-solver 1 gcr \
                        --mg-setup-inv 1 cgnr --mg-setup-maxiter 1 1000 --mg-setup-tol 1 1e-5 \
                        --mg-block-size 1 ${FIRST_AGGREGATE} --mg-nvec 1 64 --mg-n-block-ortho 1 2 \
                        --mg-smoother-solve-type 1 direct --mg-smoother 1 ca-gcr --mg-nu-pre 1 0 --mg-nu-post 1 2 --mg-smoother-tol 1 1e-10 \
                        --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.25 \
                        --mg-coarse-solver-maxiter 2 16   --mg-coarse-solver 2 gcr \
                        --mg-setup-inv 2 cgnr --mg-setup-maxiter 2 1000 --mg-setup-tol 2 1e-5 \
                        --mg-block-size 2 ${SECOND_AGGREGATE} --mg-nvec 2 96 --mg-n-block-ortho 2 2 \
                        --mg-smoother-solve-type 2 direct-pc --mg-smoother 2 ca-gcr --mg-nu-pre 2 0 --mg-nu-post 2 2 --mg-smoother-tol 2 1e-10 \
                        --mg-coarse-solver 3 ca-gcr --mg-coarse-solver-ca-basis-size 3 16 --mg-coarse-solver-maxiter 3 16 \
                        --mg-nvec 3 ${COARSEST_NVEC} --mg-eig-n-ev 3 ${COARSEST_NEV} --mg-eig-n-kr 3 ${COARSEST_NKR} --mg-eig-tol 3 1e-4 \
                        --mg-eig 3 true --mg-eig-type 3 trlm --mg-eig-use-dagger 3 false --mg-eig-use-normop 3 true \
                        --mg-eig-use-poly-acc 3 true --mg-eig-poly-deg 3 ${COARSEST_POLY_DEG} --mg-eig-amin 3 ${COARSEST_AMIN} --mg-eig-amax 3 ${COARSEST_AMAX} \
                        --mg-eig-max-restarts 3 1000

and the error that popped up is

MG level 2 (GPU): Xinv = 1.574662e+04
MG level 2 (GPU): Yhat[0] = 6.719429e+01 (1.147926e-01 < 4.053768e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[1] = 6.493005e+01 (1.500221e-01 < 5.319621e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[2] = 6.315730e+01 (1.395274e-01 < 5.026488e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[3] = 2.209952e+02 (3.106113e-01 < 8.176686e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[4] = 4.554047e+01 (1.680291e-01 < 6.054319e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[5] = 4.207957e+01 (1.500221e-01 < 5.088072e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[6] = 4.217662e+01 (1.335199e-01 < 4.538731e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[7] = 1.093643e+02 (3.106113e-01 < 7.678801e-02 x 3.218868e+00)
MG level 2 (GPU): ....done computing Yhat field
MG level 2 (GPU): Finished creating the preconditioned coarse op
MG level 2 (GPU): Coarse Dirac operator done
MG level 2 (GPU): Creating smoother
MG level 2 (GPU): Creating a CA-GCR solver
MG level 2 (GPU): Smoother done
MG level 2 (GPU): Checking 0 = (1 - P P^\dagger) v_k for 96 vectors
MG level 2 (GPU): Vector 0: norms v_k = 1.000000e+00 P^\dagger v_k = 6.917310e-01 (1 - P P^\dagger) v_k = 1.202923e+00, L2 relative deviation = 1.096779e+00
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 0, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 1, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
QMP m0,n2@luna-0196 error: abort: 1
QMP m1,n2@luna-0196 error: abort: 1

sm_80, fast build, QUDA_PRECISION=12

weinbe2 added 16 commits April 28, 2022 16:49

Initial hacky support for Chebyshev setup: controlled by environment …

84c7c7f

…variables, level 2 (critical path for now) only

Added solver.hpp include

da23108

Separate flags for level 1 and 2

4dce64e

Big commit refactoring near-null setup, exposing multiple types on a …

7f659a8

…per-level basis; chebyshev filter parameters are still only setable via the command line

Threaded in command line support for Chebyshev filter setup

cef28f5

Various cleanup

f954cec

Split out test vector setup, disabled for the time being.

4b3d7b6

Merge branch 'develop' into feature/cheby-mg-setup

05762b3

Added (untested) support to polish near-nulls with inverse iterations…

ae55523

…, generate leftover near nulls when restricting some fine near null vectors

Merge branch 'develop' into feature/cheby-mg-setup

4539dad

Need temporary vector for restricting half prec fine near-nulls -> co…

3b300ee

…arse

Partial progress to using eigensolver to generate extra vectors after…

5e109fe

… restricting finer near-nulls

Mixed restrict/eigensolver setup is now working.

0014953

Fixed type typo on chebyshev filter bounds

462854d

Merge branch 'develop' into feature/cheby-mg-setup

70dafdd

Merge branch 'develop' into feature/cheby-mg-setup

5f282c0

weinbe2 requested a review from a team as a code owner May 27, 2022 17:00

weinbe2 requested review from cpviolator, kostrzewa, maddyscientist and bjoo May 27, 2022 17:10

weinbe2 added bug feature clean-up labels May 27, 2022

Merge branch 'develop' into feature/cheby-mg-setup

e8a8677