diff --git a/docs/guides/checkpointing.md b/docs/guides/checkpointing.md
index a5e4b3502..623582824 100644
--- a/docs/guides/checkpointing.md
+++ b/docs/guides/checkpointing.md
@@ -228,13 +228,14 @@ uv run torchrun --nproc-per-node=2 examples/llm_finetune/finetune.py --step_sche
 
 ## Saving Checkpoints When Using Docker
 
-When training inside a Docker container (see [Installation Guide](installation.md)), any files written to the container's filesystem are lost when the container exits (especially with `--rm`). To keep your checkpoints, **bind-mount a host directory** to the checkpoint path before starting the container:
+When training inside a Docker container (see [Installation Guide](installation.md)), any files written to the container's filesystem are lost when the container exits (especially with `--rm`). To keep your checkpoints, you must **bind-mount a host directory** to the checkpoint path before starting the container:
 
 ```bash
 docker run --gpus all -it --rm \
   --shm-size=8g \
   -v "$(pwd)"/checkpoints:/opt/Automodel/checkpoints \
   nvcr.io/nvidia/nemo-automodel:25.11.00
+```
 
 You can also set a custom checkpoint directory via the YAML config or CLI override:
 ```yaml
diff --git a/docs/guides/llm/finetune.md b/docs/guides/llm/finetune.md
index 3ca81700e..a5ad7fe23 100644
--- a/docs/guides/llm/finetune.md
+++ b/docs/guides/llm/finetune.md
@@ -282,6 +282,10 @@ In these scenarios, you can pass `is_meta_device: true` in the model config. The
 ## Run the Fine-Tune Recipe
 Assuming the above `yaml` is saved in a file named `sft_guide.yaml` (or `peft_guide.yaml` if you want to do PEFT), you can run the fine-tuning workflow either using the AutoModel CLI or by directly invoking the recipe Python script.
 
+:::{note}
+**Fine-tuning in Docker.** When you run inside the NeMo AutoModel container, checkpoints are lost when the container exits unless you save them on the host. Bind-mount a host directory to your checkpoint path (see [Install with NeMo Docker Container](../installation.md#install-with-nemo-docker-container)) and set `checkpoint.checkpoint_dir` to the mount point inside the container (for example, `/opt/Automodel/checkpoints`). For full details, see [Saving Checkpoints When Using Docker](../checkpointing.md#saving-checkpoints-when-using-docker).
+:::
+
 ### AutoModel CLI
 
 When NeMo AutoModel is installed on your system, it includes the `automodel` CLI program that you