Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/expansion of MG setup methods #1283

Open
wants to merge 25 commits into
base: develop
Choose a base branch
from

Conversation

weinbe2
Copy link
Contributor

@weinbe2 weinbe2 commented May 27, 2022

This PR refactors and expands the number of methods by which near-null vectors can be generated in QUDA. Due to the nature of the refactor and cleanup, this PR is interface breaking, but in principle in a future-proof way -- near-null vector generation methods are now specified via an enum, QudaNullVectorSetupType, so it is straightforward to add more options in a non-breaking fashion. This PR also codifies the existing behavior that, if an input near-null vector is specified (via --mg-load-vec from the command line, for ex), it is loaded and all other options are ignored.

The full list of methods now includes:

  • Inverse iterations (still the default)
  • Chebyshev filter a la arXiv:2103.05034, P. Boyle and A. Yamaguchi
  • Eigenvectors (as before)
  • Free field vectors (as before, and only for sanity tests)
  • Restricting as many near null vectors as possible from the finer level, then generating more from another method (that is --- if level 1 is Nc = 24, and level 2 is Nc = 32, then for level 2 all 24 near-null vectors from level 1 are restricted then 8 more are generated)

This PR also supports "polishing" near-null vectors generated by other methods with more iterations of inverse iterations.

The incomplete test vector support in QUDA has been mostly removed as it requires a fuller refactor that is outside of the scope of this PR, though it is on the to-do list in the future as it is a demonstrably successful approach.

Command line arguments

The core command line argument, which has been repurposed, is

  • --mg-setup-type [level] [method], where [method] can be inverse-iterations (default), chebyshev-filter, eigenvectors, test-vectors, restrict-fine, and free-field (where test-vectors gracefully errorQudas out)

Inverse iterations

No options for inverse iterations have been changed; --mg-setup-tol, --mg-setup-maxiter, etc, all behave as expected.

Eigenvectors

No options for eigenvectors have been changed.

Chebyshev Filter

The Chebyshev filter has a flexible set of parameters describing generating a set of near-null vectors related to the initial low-pass filter, the number of starting vectors, and subsequent generation from a low-passed starting vector.

  • --mg-setup-filter-startup-vectors - the number of random starting vectors, default 1. As some examples, if the number of near-null vectors for level 1 is 24, --mg-setup-filter-startup-vectors 1 1 corresponds to one starting vector with 24 near-nulls generated; [...] 1 3 corresponds to three starting vectors with 3 near-nulls generated from each, etc. In cases like [...] 1 5, where 5 doesn't divide into 24, 5 near-null vectors are generated from the first four starting vectors, and 4 from the last --- 4 * 4 + 4 = 24.
  • --mg-setup-filter-startup-iterations - number of iterations for the initial low-pass filter, default 1000.
  • --mg-setup-filter-startup-rescale-frequency - an empirical feature; since the norm of a vector could overflow (or individual values thereof), the vector can be renormalized with some frequency in a way that preserves the Chebyshev recursion; empirical default 50.
  • --mg-setup-filter-lambda-max - upper bound for the Chebyshev filters, default power iterations to guess an upper bound
  • --mg-setup-filter-lambda-min - lower bound to use for the initial low pass filter, modes smaller than that value are enhanced. Default 1.
  • --mg-setup-filter-iterations-between-vectors - number of iterations between subsequent near-null vectors after the initial low-pass filter. As an example, if startup iterations is 1000, and the number of iterations between vectors is 150, a near-null vector is generated after 1000 (initial) matrix applications, then at 1150, 1300, 1450... . Default 150.

Restriction

There are no special flags for restriction in and of itself, but:

  • --mg-setup-restrict-remaining-type [level] [method] can be used to specify which method to use to generate the "remaining" near null vectors if fine nvec != coarse nvec. Parameters for the remainder method are taken from the flags for each method.

"Polishing" near-null vectors

"Polishing" near-null vectors with inverse iterations is enabled by specifying a non-zero number of polish iterations via

  • --mg-setup-maxiter-inverse-iterations-polish [level] [numbers] , where the default [numbers] is 0, corresponding to no polishing. Parameters for polishing are taken from the flags for inverse iteration setup.

Reference commands

A base command where we use inverse iterations for level 1 and a custom method for level 2 (specified by SETUP_FLAGS_LEVEL2) is:

CONFIG=[...]
SETUP_FLAGS_LEVEL2=[...]
./invert_test \
  --prec double --prec-sloppy single --prec-null half --prec-precondition half \
  --recon 12 --recon-sloppy 8 --recon-precondition 8 \
  --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge $CONFIG \
  --dslash-type wilson --kappa 0.1394265 --tol 1e-10 --nsrc 1 --niter 20 \
  --verbosity verbose --solve-type direct-pc --solution-type mat --inv-type gcr \
  --inv-multigrid true --mg-levels 3 \
  --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
  --mg-block-size 1 2 2 2 2 --mg-nvec 1 32 \
  --mg-setup-tol 0 5e-7 --mg-setup-inv 0 cgnr $SETUP_FLAGS_LEVEL2 \
  --mg-use-mma true \
  --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct-pc --mg-nu-pre 0 0 --mg-nu-post 0 4 \
  --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
  --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
  --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
  --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose

Where we will fill in SETUP_FLAGS_LEVEL2 for different options.

Inverse Iterations

A standard setup is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Where the --mg-setup-type flag is optional as inverse iterations are the default

Eigenvectors

A reference setup without polynomial acceleration is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 eigenvectors --mg-eig-type 1 trlm --mg-eig-use-dagger 1 false --mg-eig-use-normop 1 true --mg-eig-use-poly-acc 1 false --mg-eig-n-ev 1 36 --mg-eig-n-kr 1 64 --mg-eig-tol 1 1e-5"

Chebyshev filter

A reference setup where 4 base vectors are used -> 8 near-null vectors are generated from each base vector, the minimum of the low pass filter is 1.0, a 500 iteration low pass filter with rescaling every 50 iterations is used, and there are 100 iterations between subsequent near-nulls is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 chebyshev-filter --mg-setup-filter-startup-vectors 1 4 --mg-setup-filter-startup-iterations 1 500 --mg-setup-filter-startup-rescale-frequency 1 50 --mg-setup-filter-lambda-min 1 1.0 --mg-setup-filter-iterations-between-vectors 1 50"

Restriction

A reference setup with restriction, then using inverse iterations for the remaining 8 vectors (32 on level 2 minus 24 on level 1) is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 restrict-fine --mg-setup-restrict-remaining-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

--mg-setup-restrict-remaining-type can be changed appropriately, grabbing other reference flags as appropriate.

Polishing with inverse iterations

As an example, the parameters for a Chebyshev filter can be included, and then they can be polished for 50 iterations via adding:

SETUP_FLAGS_LEVEL2="${SETUP_FLAGS_LEVEL2} --mg-setup-maxiter-inverse-iterations-refinement 1 50 --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Outstanding work

lib/multigrid.cpp Outdated Show resolved Hide resolved
lib/multigrid.cpp Outdated Show resolved Hide resolved
lib/multigrid.cpp Outdated Show resolved Hide resolved
lib/multigrid.cpp Outdated Show resolved Hide resolved
@maddyscientist
Copy link
Member

Completed an initial visual review of this PR. Looks like a good contribution. Left a few comments, and I'll test it on some clover multigrid shortly.

@kostrzewa
Copy link
Member

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

@weinbe2
Copy link
Contributor Author

weinbe2 commented Jun 15, 2022

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

No problem! There are certainly plenty of higher-priority things bouncing around, on all of our plates. Whenever you can get to it is fine and appreciated.

@weinbe2 weinbe2 requested a review from a team as a code owner July 5, 2022 21:53
@weinbe2
Copy link
Contributor Author

weinbe2 commented Jul 6, 2022

FYI, one of the recent merges of develop slightly broke this branch, will fix it this week.

Copy link
Member

@maddyscientist maddyscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished visual review of this PR. Looks mostly good to my eye with some relatively small tweaks needed.

@@ -495,6 +532,14 @@ namespace quda {

return (param.level == 0 || kd_nearnull_gen);
}

/**
@brief Return if we're on the coarsest grid right now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use@return here

@@ -449,7 +449,7 @@ namespace quda {

/**
@brief Load the null space vectors in from file
@param B Loaded null-space vectors (pre-allocated)
@param B Load null-space vectors to here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add [in]/[out] tags to all the doxgyen you've touched in this file?

/** Number of iterations between null vectors generated from each starting vector */
int filter_iterations_between_vectors[QUDA_MAX_MG_LEVEL];

/** Conservative estimate of largest eigenvalue of operator used for Chebyshev filter setup */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the doxygen correct here: min is largest e-value and max is lower bound?


sigma_old = sigma;
}
blas::copy(out, *tmp2);
blas::copy(out, tmp_2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that this copy can be replaced with a swap. I've already applied this optimization to the feature/multi-rhs, so it's perhaps moot.

extern quda::mgarray<double> filter_lambda_min;
extern quda::mgarray<double> filter_lambda_max;


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete extra space

// Prepare to do the Cholesky decomposition for a thin-QR
std::vector<Complex> Vdagv_(num_vec * num_vec);

// outstanding bugfix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the issue here?


// Initializing to random vectors
if (!refresh) {
int num_initialize = param.mg_global.filter_startup_vectors[param.level];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this unsigned?

if (sqrt(nrm2) > 1e-16) ax(1.0/sqrt(nrm2), *B[i]);// i/<i,i>
else errorQuda("\nCannot normalize %u vector (nrm=%e)\n", i, sqrt(nrm2));
}
if (getVerbosity() >= QUDA_VERBOSE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace these four lines with two lines of logQuda

(*solve)(*out, *in);
diracSmoother->reconstruct(x, b, QUDA_MAT_SOLUTION);

if (getVerbosity() >= QUDA_VERBOSE) printfQuda("Solution = %g\n", norm2(x));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logQuda

csParam.create = QUDA_ZERO_FIELD_CREATE;
// This is the vector precision used by matResidual
csParam.setPrecision(param.mg_global.invert_param->cuda_prec_sloppy, QUDA_INVALID_PRECISION, true);

for (int i = 0; i < n_conv; i++) B_evecs.push_back(new ColorSpinorField(csParam));

// before entering the eigen solver, let's free the B vectors to save some memory
ColorSpinorParam bParam(*param.B[0]);
for (int i = 0; i < (int)param.B.size(); i++) delete param.B[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this optimization to reduce memory get deleted intentionally?

@kostrzewa
Copy link
Member

@weinbe2 Seems to work fine for me with with the changes to our interface that I implemented in etmc/tmLQCD#548 to preserve the status quo. I will keep track of this PR and make any adjustments that may become necessary due to ongoing changes.

Copy link
Member

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues from my side, haven't tested anything beyond our status quo, however.

@weinbe2
Copy link
Contributor Author

weinbe2 commented Oct 25, 2022

For future reference, do not merge, it appears an issue creeped in not in develop. For my own convenience, a command that caused an issue was

LOCAL_DIM="32 32 32 16"
GRIDSIZE="1 1 1 2"
OUTER_OPERATOR=direct-pc
GAUGE_PATH=./l32t32b7p0
MASS=0.00945
TADPOLE_COEFF=1.0
TOL=1e-5
FIRST_AGGREGATE="4 4 4 4"
SECOND_AGGREGATE="4 4 4 4"
COARSEST_NVEC=128
COARSEST_NEV=160
COARSEST_NKR=192
COARSEST_POLY_DEG=32
COARSEST_AMIN=0.85
COARSEST_AMAX=0.0

export QUDA_RESOURCE_PATH=.
export QUDA_TUNE_VERSION_CHECK=0

mpirun -np 2 --allow-run-as-root \
                        ./staggered_invert_test --inv-multigrid true --dim ${LOCAL_DIM} --gridsize ${GRIDSIZE} --verbosity verbose --nsrc 1 \
                        --dslash-type asqtad --solve-type ${OUTER_OPERATOR} --solution-type mat --compute-fat-long true \
                        --load-gauge ${GAUGE_PATH} --mass ${MASS} --tadpole-coeff ${TADPOLE_COEFF} --tol ${TOL} \
                        --prec single --prec-sloppy single --prec-precondition single --prec-null single --mg-use-mma false \
                        --recon 18 --recon-sloppy 18 --recon-precondition 18 \
                        --mg-verbosity 2 verbose --mg-verbosity 3 verbose \
                        --mg-save-vec 1 blah --mg-save-vec 2 blah \
                        --mg-levels 4 --mg-coarse-solve-type 0 ${OUTER_OPERATOR} --mg-staggered-coarsen-type kd-optimized \
                        --mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
                        --mg-smoother-solve-type 0 ${OUTER_OPERATOR} --mg-smoother 0 ca-gcr --mg-nu-pre 0 0 --mg-nu-post 0 8 --mg-smoother-tol 0 1e-10  \
                        --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 5e-2 \
                        --mg-coarse-solver-maxiter 1 16 --mg-coarse-solver 1 gcr \
                        --mg-setup-inv 1 cgnr --mg-setup-maxiter 1 1000 --mg-setup-tol 1 1e-5 \
                        --mg-block-size 1 ${FIRST_AGGREGATE} --mg-nvec 1 64 --mg-n-block-ortho 1 2 \
                        --mg-smoother-solve-type 1 direct --mg-smoother 1 ca-gcr --mg-nu-pre 1 0 --mg-nu-post 1 2 --mg-smoother-tol 1 1e-10 \
                        --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.25 \
                        --mg-coarse-solver-maxiter 2 16   --mg-coarse-solver 2 gcr \
                        --mg-setup-inv 2 cgnr --mg-setup-maxiter 2 1000 --mg-setup-tol 2 1e-5 \
                        --mg-block-size 2 ${SECOND_AGGREGATE} --mg-nvec 2 96 --mg-n-block-ortho 2 2 \
                        --mg-smoother-solve-type 2 direct-pc --mg-smoother 2 ca-gcr --mg-nu-pre 2 0 --mg-nu-post 2 2 --mg-smoother-tol 2 1e-10 \
                        --mg-coarse-solver 3 ca-gcr --mg-coarse-solver-ca-basis-size 3 16 --mg-coarse-solver-maxiter 3 16 \
                        --mg-nvec 3 ${COARSEST_NVEC} --mg-eig-n-ev 3 ${COARSEST_NEV} --mg-eig-n-kr 3 ${COARSEST_NKR} --mg-eig-tol 3 1e-4 \
                        --mg-eig 3 true --mg-eig-type 3 trlm --mg-eig-use-dagger 3 false --mg-eig-use-normop 3 true \
                        --mg-eig-use-poly-acc 3 true --mg-eig-poly-deg 3 ${COARSEST_POLY_DEG} --mg-eig-amin 3 ${COARSEST_AMIN} --mg-eig-amax 3 ${COARSEST_AMAX} \
                        --mg-eig-max-restarts 3 1000

and the error that popped up is

MG level 2 (GPU): Xinv = 1.574662e+04
MG level 2 (GPU): Yhat[0] = 6.719429e+01 (1.147926e-01 < 4.053768e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[1] = 6.493005e+01 (1.500221e-01 < 5.319621e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[2] = 6.315730e+01 (1.395274e-01 < 5.026488e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[3] = 2.209952e+02 (3.106113e-01 < 8.176686e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[4] = 4.554047e+01 (1.680291e-01 < 6.054319e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[5] = 4.207957e+01 (1.500221e-01 < 5.088072e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[6] = 4.217662e+01 (1.335199e-01 < 4.538731e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[7] = 1.093643e+02 (3.106113e-01 < 7.678801e-02 x 3.218868e+00)
MG level 2 (GPU): ....done computing Yhat field
MG level 2 (GPU): Finished creating the preconditioned coarse op
MG level 2 (GPU): Coarse Dirac operator done
MG level 2 (GPU): Creating smoother
MG level 2 (GPU): Creating a CA-GCR solver
MG level 2 (GPU): Smoother done
MG level 2 (GPU): Checking 0 = (1 - P P^\dagger) v_k for 96 vectors
MG level 2 (GPU): Vector 0: norms v_k = 1.000000e+00 P^\dagger v_k = 6.917310e-01 (1 - P P^\dagger) v_k = 1.202923e+00, L2 relative deviation = 1.096779e+00
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 0, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 1, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
QMP m0,n2@luna-0196 error: abort: 1
QMP m1,n2@luna-0196 error: abort: 1

sm_80, fast build, QUDA_PRECISION=12

@weinbe2 weinbe2 mentioned this pull request Sep 5, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants