MMA-optimized batched MRHS MG #1540

maddyscientist · 2025-03-11T17:32:42Z

This PR improves the use of MMA with batched multigrid, resulting in optimal MRHS multigrid performance:

- Make the MMA-ordered version of the null-space vectors persistent, so they no longer need to be reordered on the fly
- Full coarse-grid correction can now be done in MMA order, minimizing the reordering overhead
  - This is enabled on a per-level basis using QudaMultigridParam::collapse_mrhs (--mg-collapse-mrhs from the command line)
  - Doing so collapses the MRHS solve to a single solve on the coarse level in question
  - Enabling this requires that both the transfer operation into that grid and the dslash operator for that grid are deployed using MMA
- Improve memory utilization: if the pre- and post-smoothers are identical, they can alias, so they need not be allocated separately
- Various boilerplate improvements to facilitate the above, e.g.,
  - input and/or output for transfer operators can be collapsed
  - new method vector_ref<ColorSpinorField>::size_actual() to allow us to query the number of RHS in a collapsed system
- Improve usability of FieldTmp
  - Temporaries can now be references, which helps with composability
- BlockTranspose input field set can now be in half precision
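
To make the idea of a "collapsed" system concrete, here is a minimal sketch in plain C++. It is not QUDA code: `CollapsedField`, `collapse` and `expand` are hypothetical names, and the interleaved layout is only an assumed stand-in for the real MMA ordering. It shows a batch of right-hand sides packed into one super-system that reports a size of 1 while still exposing the true number of RHS, which is the role the description gives to `size_actual()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a collapsed MRHS field (hypothetical; not QUDA's ColorSpinorField).
struct CollapsedField {
  std::size_t n_rhs = 0;    // number of right-hand sides packed inside
  std::size_t length = 0;   // elements per right-hand side
  std::vector<double> data; // assumed interleaved layout: data[i * n_rhs + r]

  std::size_t size() const { return 1; }            // one super-system
  std::size_t size_actual() const { return n_rhs; } // RHS it represents
};

// Pack a batch of equally sized vectors into a single interleaved field.
CollapsedField collapse(const std::vector<std::vector<double>> &batch)
{
  assert(!batch.empty());
  CollapsedField out;
  out.n_rhs = batch.size();
  out.length = batch[0].size();
  out.data.resize(out.n_rhs * out.length);
  for (std::size_t r = 0; r < out.n_rhs; r++) {
    assert(batch[r].size() == out.length);
    for (std::size_t i = 0; i < out.length; i++) out.data[i * out.n_rhs + r] = batch[r][i];
  }
  return out;
}

// Inverse of collapse: recover the batch form from the interleaved field.
std::vector<std::vector<double>> expand(const CollapsedField &in)
{
  std::vector<std::vector<double>> batch(in.n_rhs, std::vector<double>(in.length));
  for (std::size_t r = 0; r < in.n_rhs; r++)
    for (std::size_t i = 0; i < in.length; i++) batch[r][i] = in.data[i * in.n_rhs + r];
  return batch;
}
```

If the reordering is done once on entry and once on exit, every operator in between can work on the packed data directly, which is the saving the description is after.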

…ew parameter QudaMultigridParam::collapse_mrhs enables this on a per level basis, with MMA enablement for both dslash and transfer operator (on the finer level) required; deflation is presently handled by expanding the collapsed space to the batch form, and then collapsing again post deflation. This removes all the BlockTranspose operations outside of the initial coarsening and final prolongation (deflation excepting).
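
The per-level requirement stated above can be sketched as a small consistency check. This is an illustration only: `LevelConfig`, `dslash_mma`, `transfer_mma` and `collapse_is_valid` are hypothetical names, not fields or functions of QudaMultigridParam, and the rule that the finest level cannot be collapsed is an assumption based on there being no transfer into it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-level settings; only collapse_mrhs is named in the PR text.
struct LevelConfig {
  bool collapse_mrhs = false; // request a collapsed solve on this level
  bool dslash_mma = false;    // this level's dslash is applied with MMA
  bool transfer_mma = false;  // transfer from this level to the next coarser one uses MMA
};

// Returns true if the collapse request on `level` is consistent with the
// MMA settings of that level's dslash and of the finer level's transfer.
bool collapse_is_valid(const std::vector<LevelConfig> &levels, std::size_t level)
{
  assert(level < levels.size());
  if (!levels[level].collapse_mrhs) return true; // nothing requested, nothing to check
  if (level == 0) return false;                  // assumed: no transfer into the finest level
  return levels[level].dslash_mma && levels[level - 1].transfer_mma;
}
```

Checking the transfer flag on `level - 1` reflects the "(on the finer level)" wording: the operator that maps into a coarse grid is assumed to be owned by the level above it.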

hummingtree

Nice clean up to make MMA/MG more accessible! Just have a small request for comments.

lib/multigrid.in.hpp

maddyscientist added 17 commits February 19, 2025 08:41

Optimize MMA MG: make a persistent copy of V in MMA order the transfe…

e158e8c

…r operators; delay enablement of MMA in transfer until after coarse operator is computed to avoid double storage of V in MMA order; only store V in native or MMA order to reduce memory

Solver class now owns copies of the parameters compute_true_res, retu…

c704ef9

…rn_residual, use_init_guess. These can now be updated using the Solver::update_param interface

If pre smoother and post smoother are identical, we can reuse the for…

1beaafa

…mer for the latter to reduce memory consumption

Add ColorSpinorField::is_reference() method

e12a0f2

Add support to FieldTmp for creating temporaries that are references.…

db0ceaa

… This is helpful for uniform object creation.

Add Dirac::is_mma_enabled for querying if a Dirac operator can be app…

a8c8ede

…lied using tensor cores

Build up of the framework to allow us to turn a MRHS system into a si…

ac30902

…ngle MMA-ordered super-system: prolongator, restrictor and dslash coarse will now not reorder if the input / output fields are already in the correct order

Implement color_spinor_copy as a simple copy if input and output are …

89a6199

…compatible

Add some functions for checking if a ColorSpinorField is compatible w…

fa2b177

…ith a ColorSpinorField; deploy this to check if we need to nuke any preexisting allocations when resizing std::vector<ColorSpinorField>

Move PreconditionedSolver::operator() implementation to solver.cpp

cb9ade0

Vector version of getFieldTmp can now optionally create a vector of r…

01326f2

…eferences around the input (useful for creating a uniform code path between native and MMA ordered solvers)

Fix minor bug with ColorSpinorField::move

b95a3c3

Improve verbosity of quda_ptr error printing

2b4e3af

Coarse grid argument for prolongator and restrictor now can be pre-co…

93cd741

…llapsed

Fix some bugs

fabe2c4

Add half precision support for input vector set for BlockTranspose - …

9ea3163

…this fixes the verify routine with MMA transfer routines when using half precision

maddyscientist added the bug, feature, and optimization labels Mar 11, 2025

maddyscientist added this to the QUDA 2.0 milestone Mar 11, 2025

maddyscientist assigned weinbe2 Mar 11, 2025

maddyscientist requested a review from a team as a code owner March 11, 2025 17:32

maddyscientist assigned hummingtree Mar 11, 2025

maddyscientist changed the title ~~Feature/mg mma misc~~ MMA-optimized batched MRHS MG Mar 11, 2025

maddyscientist added 2 commits March 11, 2025 12:38

Fix CI warning

2bb7f5d

Merge branch 'develop' of github.com:lattice/quda into feature/mg-mma…

57859a0

…-misc

hummingtree approved these changes Apr 23, 2025

lib/multigrid.in.hpp (outdated, resolved)

maddyscientist added 2 commits April 23, 2025 16:35

Add doxygen and clean up some function names to make their intent cle…

d123db4

…arer. Remove unused function.

Merge branch 'develop' of github.com:lattice/quda into feature/mg-mma…

e827e90

…-misc

maddyscientist added 3 commits April 23, 2025 16:37

Apply clang-format

0fae783

When tuning is disabled, use tunecache_notune.tsv when writing out th…

df5db2c

…e tune parameters to avoid accidental overwriting of a real tunecache (closes #1558)

Remove complex.h from domain_wall_dslash_reference.cpp (#1485)

a58c21e

This was referenced Apr 26, 2025

Hotfix/ccomplex #1485

Merged

If tuning is off and resource path is available, the current tunecache.tsv will be overwrited #1558

Closed
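
The fix for #1558 amounts to choosing a different output file when tuning is disabled. A minimal sketch of that choice follows; `tunecache_path` is a hypothetical helper written for illustration and is not QUDA's implementation. Only the two file names are taken from the commit message and the issue title.

```cpp
#include <string>

// Hypothetical helper: pick the file that tune parameters are written to.
// With tuning disabled, a separate file is used so that a real tunecache
// in the same resource path is never overwritten.
std::string tunecache_path(const std::string &resource_path, bool tuning_enabled)
{
  const std::string name = tuning_enabled ? "tunecache.tsv" : "tunecache_notune.tsv";
  return resource_path + "/" + name;
}
```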

weinbe2 and others added 8 commits April 28, 2025 12:26

Added a clearer error message about keeping MMA disabled for non-aggr…

35e4645

…egation transfer operators

invert_test and staggered_invert_test can now use a Nsrc_tile size th…

0609901

…at is not a divisor of Nsrc

Merge branch 'develop' into feature/mg-mma-misc

e77b64d

Threaded support for collapse_mrhs into the MILC HISQ MG interface

3bc3ba0

Lift the restriction that aggregate_size_cb % aggregate_per_block != 0.

10bac89

Merge branch 'feature/mg-mma-misc' of github.com:lattice/quda into fe…

3f99c59

…ature/mg-mma-misc

Merge branch 'develop' of github.com:lattice/quda into feature/mg-mma…

9985843

…-misc

Fix bad merge

1a6a2a3
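
One of the commits above lets invert_test and staggered_invert_test use a tile size that does not divide the number of sources. A common way to handle this is to let the final tile hold the remainder, sketched below. `tile_sizes` is a hypothetical function, and whether the tests split the sources in exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Split n_src right-hand sides into tiles of at most n_tile each.
// When n_tile does not divide n_src, the last tile is smaller.
std::vector<int> tile_sizes(int n_src, int n_tile)
{
  assert(n_src >= 0 && n_tile > 0);
  std::vector<int> sizes;
  for (int start = 0; start < n_src; start += n_tile) sizes.push_back(std::min(n_tile, n_src - start));
  return sizes;
}
```

For example, 10 sources with a tile size of 4 would be processed as tiles of 4, 4 and 2.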
