# Useful nvidia-smi queries
To get the total memory used by all compute jobs on the GPU:

```bash
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1
```

(You can then paste the output into a Python variable `x`, sum it with `sum(map(int, x.split()))`, and convert from MiB to GiB.)
If you only want a rough idea, you can use `numfmt` (something like `--from=iec --to=si --suffix=B`).
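For instance, here is a rough sketch along those lines, assuming GNU coreutils `numfmt` is available (the exact pipeline is an illustration, not taken from this page):

```bash
# Sum the per-process used_memory values (reported in MiB; "nounits" strips
# the trailing " MiB") and let numfmt render the IEC total as an SI figure.
nvidia-smi --query-compute-apps=used_memory --format=csv,noheader,nounits \
  | awk 'BEGIN { total = 0 } { total += $1 } END { print total "M" }' \
  | numfmt --from=iec --to=si --suffix=B
```

The `M` suffix printed by awk tells `numfmt --from=iec` that the total is in MiB, and `--to=si --suffix=B` turns it into something like `4.3GB`.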
```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())))'
```

or, to convert to GiB:
```bash
GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
```

As a `.bashrc` function:
```bash
function gpu_memory_usage {
    GPU_CORE_SIZES=$(nvidia-smi --query-compute-apps=used_memory --format=csv,noheader | cut -d " " -f 1)
    python -c 'print(sum(map(int, """'"$GPU_CORE_SIZES"'""".split())) / (2**10), end="GiB\n")'
}
```

which you can then loop over while your jobs run to get a sense of the distribution over time:
```bash
for x in {1..10000}; do clear; gpu_memory_usage; sleep 1; done
```

- See also nvtop (Installing nvtop)
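Returning to the loop above: if you would rather keep a record than watch it scroll past, a minimal variation (assuming the `gpu_memory_usage` function is defined; the `gpu_mem.log` filename is just an example) appends one timestamped reading per second to a file you can inspect or plot afterwards:

```bash
# Log one timestamped GiB reading per second to gpu_mem.log so the
# distribution over time can be examined after the jobs finish.
while true; do
    printf '%s,%s\n' "$(date +%s)" "$(gpu_memory_usage)" >> gpu_mem.log
    sleep 1
done
```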