In this tutorial, we practice fine-tuning a large language model. We will use a selection of techniques that allow us to train models that would not otherwise fit in GPU memory:
- gradient accumulation
- reduced precision
- activation checkpointing
- CPU offload
- parameter-efficient fine-tuning
- distributed training across multiple GPUs
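To build intuition for the first of these techniques, here is a minimal framework-free sketch of gradient accumulation. It uses a hypothetical 1-D linear model with MSE loss (not code from the lab) to show that averaging the gradients of several micro-batches before the optimizer step reproduces the full-batch gradient, which is why accumulation lets us simulate a large batch with less GPU memory:

```python
# Hypothetical sketch of gradient accumulation, using a toy
# 1-D linear model y_hat = w * x with MSE loss (not the lab's code).

def grad(w, xs, ys):
    # dL/dw for L = mean((w*x - y)^2) over the given batch
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Full-batch gradient: what we would compute if the whole batch fit in memory
g_full = grad(w, xs, ys)

# Gradient accumulation: split into two equal micro-batches and
# average their gradients before taking a single optimizer step
micro_batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
g_accum = sum(grad(w, mx, my) for mx, my in micro_batches) / len(micro_batches)

# The accumulated gradient matches the full-batch gradient
assert abs(g_full - g_accum) < 1e-9
```

In a real training loop the same idea appears as calling `backward()` on each micro-batch (which sums gradients in place) and stepping the optimizer only every N micro-batches; only one micro-batch's activations need to be in memory at a time.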
Follow along at Large-scale model training on Chameleon.
This lab has two parts:
- single/: single-GPU large-model training, requires an A100 80GB or H100 GPU
- multi/: multi-GPU large-model training, requires 4x A100 80GB or 4x H100 GPUs
This material is based upon work supported by the National Science Foundation under Grant No. 2230079.