maddyscientist edited this page Nov 24, 2021 · 2 revisions

The MILC NERSC RHMC benchmark runs a 2+1+1 flavor HISQ-improved staggered fermion simulation. There are four benchmarks (small, medium, large and x-large), each 16x larger in lattice volume than the previous one. We can thus strong scale by running the same benchmark on different process counts, or weak scale by running different benchmarks with the same local volume per process, e.g., running the large benchmark on 16x more GPUs than the medium benchmark.

| Benchmark | Volume      |
|-----------|-------------|
| Small     | 18^3 x 36   |
| Medium    | 36^3 x 72   |
| Large     | 72^3 x 144  |
| X-Large   | 144^3 x 288 |
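
The 16x volume ratio and the weak-scaling rule can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and the helper names `sites` and `weak_scaled_gpus` are hypothetical and are not part of MILC or QUDA.

```python
# Lattice dimensions (spatial extent, temporal extent) taken from the table above.
BENCHMARKS = {
    "small": (18, 36),
    "medium": (36, 72),
    "large": (72, 144),
    "x-large": (144, 288),
}


def sites(name):
    """Total number of lattice sites, spatial^3 x temporal."""
    spatial, temporal = BENCHMARKS[name]
    return spatial**3 * temporal


def weak_scaled_gpus(base, base_gpus, target):
    """GPU count for `target` that keeps the local volume per GPU of `base`."""
    return base_gpus * sites(target) // sites(base)


if __name__ == "__main__":
    for small, big in (("small", "medium"), ("medium", "large"), ("large", "x-large")):
        print(f"{big} / {small} volume ratio: {sites(big) // sites(small)}")
    # Medium on 8 GPUs has the same local volume as large on this many GPUs:
    print("large GPU count:", weak_scaled_gpus("medium", 8, "large"))
```

Doubling every dimension of a four-dimensional lattice multiplies the volume by 2^4 = 16, which is where the factor between successive benchmarks comes from.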

When running MILC with QUDA, the following routines are offloaded to QUDA:

  • Multi-shift CG solver
  • CG solver
  • Gauge Force
  • Fermion Force
  • Gauge update
  • Reunitarization

Monte Carlo Algorithm Details

  • The two-flavor determinant contribution is preconditioned by the strange quark.
  • All fermionic contributions are included using RHMC.
  • A two-level time integration is used, with a second-order minimum-norm (Omelyan) integrator employed on both levels. The gauge force is applied on the fine time scale, with all fermionic contributions on the coarse time scale.
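
The nested integration scheme described above can be sketched on a toy problem. This is a minimal illustration, not the MILC or QUDA implementation: the field is a single real number rather than SU(3) links, `fine_force` and `coarse_force` are hypothetical stand-ins for the gauge and fermion forces, the step counts are arbitrary, and the value of `LAMBDA` is the commonly quoted minimum-norm parameter, which the benchmark may or may not use.

```python
# Commonly quoted parameter of the second-order minimum-norm (Omelyan) scheme.
# Assumption: the actual benchmark may use a different value.
LAMBDA = 0.1931833275037836


def omelyan_step(q, p, h, force, inner):
    """One step: kick(lambda h), inner(h/2), kick((1-2 lambda) h), inner(h/2), kick(lambda h)."""
    p = p + LAMBDA * h * force(q)
    q, p = inner(q, p, 0.5 * h)
    p = p + (1.0 - 2.0 * LAMBDA) * h * force(q)
    q, p = inner(q, p, 0.5 * h)
    p = p + LAMBDA * h * force(q)
    return q, p


def drift(q, p, h):
    """Innermost update: move the field along the momentum."""
    return q + h * p, p


def two_level_trajectory(q, p, tau, n_coarse, n_fine, coarse_force, fine_force):
    """Integrate for time tau with coarse_force outside and fine_force inside."""

    def fine_level(q, p, t):
        dt = t / n_fine
        for _ in range(n_fine):
            q, p = omelyan_step(q, p, dt, fine_force, drift)
        return q, p

    h = tau / n_coarse
    for _ in range(n_coarse):
        q, p = omelyan_step(q, p, h, coarse_force, fine_level)
    return q, p


# Toy Hamiltonian H = p^2/2 + q^2/2 + 0.025 q^4 (hypothetical, for illustration only).
def fine_force(q):
    return -q  # stand-in for the gauge force


def coarse_force(q):
    return -0.1 * q**3  # stand-in for the fermion force


def hamiltonian(q, p):
    return 0.5 * p * p + 0.5 * q * q + 0.025 * q**4


if __name__ == "__main__":
    q0, p0 = 1.0, 0.5
    q1, p1 = two_level_trajectory(q0, p0, 1.0, 10, 5, coarse_force, fine_force)
    print("energy violation:", hamiltonian(q1, p1) - hamiltonian(q0, p0))
```

Each coarse step evaluates the coarse force three times and the fine force many more times, which reflects the usual motivation for placing the cheaper force on the finer time scale.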

Solver Details

  • All fermionic contributions in the RHMC utilize a mixed-precision multi-shift CG algorithm, where the multi-shift solver is run in double-single precision, with per-shift refinement applied in double-half precision.
  • The solves required as part of inline measurement at the end of each trajectory are performed using mixed-precision (double-half) CG.
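
The idea behind mixed-precision refinement can be sketched as follows: the true residual is accumulated in double precision, while the correction is obtained from a CG solve whose vectors are stored in lower precision. This is a simplified sketch under several assumptions. It uses single rather than half precision for the inner solve, emulated by rounding with `ctypes.c_float`; the operator is a hypothetical one-dimensional stand-in for the staggered normal operator; it shows only a single-system refinement loop, not the multi-shift solver; and none of the function names correspond to QUDA's API.

```python
import ctypes
import math


def to_single(v):
    """Round each entry to IEEE single precision (emulates low-precision storage)."""
    return [ctypes.c_float(x).value for x in v]


def apply_op(x, mass2):
    """Toy symmetric positive-definite operator: periodic 1-D Laplacian plus mass2."""
    n = len(x)
    return [(2.0 + mass2) * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg(b, mass2, tol, max_iter, store):
    """Plain CG from a zero guess; `store` rounds every vector after it is updated."""
    x = [0.0] * len(b)
    r = store(b)
    p = list(r)
    rr = dot(r, r)
    target = tol * tol * rr
    for _ in range(max_iter):
        if rr <= target:
            break
        ap = store(apply_op(p, mass2))
        alpha = rr / dot(p, ap)
        x = store([xi + alpha * pi for xi, pi in zip(x, p)])
        r = store([ri - alpha * api for ri, api in zip(r, ap)])
        rr_new = dot(r, r)
        p = store([ri + (rr_new / rr) * pi for ri, pi in zip(r, p)])
        rr = rr_new
    return x


def mixed_precision_solve(b, mass2, tol=1e-12, inner_tol=1e-3, max_outer=50):
    """Defect correction: double-precision residual, low-precision inner CG."""
    x = [0.0] * len(b)
    bnorm = math.sqrt(dot(b, b))
    for outer in range(max_outer):
        r = [bi - axi for bi, axi in zip(b, apply_op(x, mass2))]  # true residual
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, outer
        e = cg(r, mass2, inner_tol, 200, to_single)  # low-precision correction
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_outer


if __name__ == "__main__":
    rhs = [math.sin(i + 1.0) for i in range(32)]
    solution, n_outer = mixed_precision_solve(rhs, 0.1)
    print("outer refinement steps:", n_outer)
```

Because the residual is always recomputed in double precision, the final accuracy is set by the outer loop, while most of the arithmetic happens on low-precision data.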

Medium

The medium benchmark is suitable for scaling up to 16 GPUs.

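| Machine | Nodes | MPI processes | GPU            | #GPUs | Time (s) |
|---------|-------|---------------|----------------|-------|----------|
| Selene  | 1     | 1             | NVIDIA A100-80 | 1     | 2260     |
| Selene  | 1     | 2             | NVIDIA A100-80 | 2     | 1319     |
| Selene  | 1     | 4             | NVIDIA A100-80 | 4     | 700      |
| Selene  | 1     | 8             | NVIDIA A100-80 | 8     | 394      |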

Large

The large benchmark is suitable for scaling up to 512 GPUs.

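| Machine | Nodes | MPI processes | GPU            | #GPUs | Time (s) |
|---------|-------|---------------|----------------|-------|----------|
| Selene  | 4     | 32            | NVIDIA A100-80 | 32    | 1913     |
| Selene  | 8     | 64            | NVIDIA A100-80 | 64    | 1015     |
| Selene  | 16    | 128           | NVIDIA A100-80 | 128   | 651      |
| Selene  | 32    | 256           | NVIDIA A100-80 | 256   | 433      |
| Selene  | 64    | 512           | NVIDIA A100-80 | 512   | 320      |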
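
The timings in the Medium and Large tables can be converted into strong-scaling speedup and parallel efficiency relative to the smallest GPU count measured for each benchmark. The helper below is a hypothetical convenience script, not part of the benchmark itself; the numbers are copied from the tables.

```python
def strong_scaling(timings):
    """Return (gpus, speedup, efficiency) rows relative to the smallest GPU count."""
    base_gpus = min(timings)
    base_time = timings[base_gpus]
    rows = []
    for gpus in sorted(timings):
        speedup = base_time / timings[gpus]
        efficiency = speedup * base_gpus / gpus
        rows.append((gpus, speedup, efficiency))
    return rows


# GPU count -> wall-clock time in seconds, from the tables above.
MEDIUM = {1: 2260, 2: 1319, 4: 700, 8: 394}
LARGE = {32: 1913, 64: 1015, 128: 651, 256: 433, 512: 320}

if __name__ == "__main__":
    for name, timings in (("medium", MEDIUM), ("large", LARGE)):
        for gpus, speedup, eff in strong_scaling(timings):
            print(f"{name:6s} {gpus:4d} GPUs  speedup {speedup:5.2f}  efficiency {eff:4.0%}")
```

On these numbers, the medium benchmark retains roughly 72% parallel efficiency at 8 GPUs relative to 1 GPU, and the large benchmark roughly 37% at 512 GPUs relative to 32 GPUs.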