3 changes: 2 additions & 1 deletion docs/guides/checkpointing.md
@@ -228,13 +228,14 @@ uv run torchrun --nproc-per-node=2 examples/llm_finetune/finetune.py --step_sche

## Saving Checkpoints When Using Docker

When training inside a Docker container (see [Installation Guide](installation.md)), any files written to the container's filesystem are lost when the container exits (especially with `--rm`). To keep your checkpoints, **bind-mount a host directory** to the checkpoint path before starting the container:
When training inside a Docker container (see [Installation Guide](installation.md)), any files written to the container's filesystem are lost when the container is removed (which happens automatically on exit with `--rm`). To keep your checkpoints, you must **bind-mount a host directory** to the checkpoint path before starting the container:

```bash
docker run --gpus all -it --rm \
--shm-size=8g \
-v "$(pwd)"/checkpoints:/opt/Automodel/checkpoints \
nvcr.io/nvidia/nemo-automodel:25.11.00
```
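
If the host directory given to `-v` does not exist yet, the Docker daemon creates it, typically owned by root, which can make the saved checkpoints awkward to manage from your own account. The sketch below shows one way to create and check the directory beforehand. The `prepare_ckpt_dir` helper is hypothetical and not part of NeMo AutoModel; the default path simply mirrors the `"$(pwd)"/checkpoints` used in the command above.

```shell
# Hypothetical helper (an assumption, not shipped with NeMo AutoModel):
# create the host-side checkpoint directory and confirm it is writable
# before it is bind-mounted into the container.
prepare_ckpt_dir() {
    # Default to ./checkpoints, matching the -v example above.
    dir="${1:-$(pwd)/checkpoints}"

    # Create the directory as the current user; a no-op if it already exists.
    mkdir -p "$dir" || return 1

    # Fail early if the current user cannot write to it.
    if [ ! -w "$dir" ]; then
        echo "checkpoint directory is not writable: $dir" >&2
        return 1
    fi

    # Print the path so callers can reuse it in the -v argument.
    printf '%s\n' "$dir"
}
```

One possible use is `CKPT_DIR="$(prepare_ckpt_dir)"` followed by the `docker run` command above with `-v "$CKPT_DIR":/opt/Automodel/checkpoints`.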

You can also set a custom checkpoint directory via the YAML config or CLI override:
```yaml
4 changes: 4 additions & 0 deletions docs/guides/llm/finetune.md
@@ -282,6 +282,10 @@ In these scenarios, you can pass `is_meta_device: true` in the model config. The
## Run the Fine-Tune Recipe
Assuming the above `yaml` is saved in a file named `sft_guide.yaml` (or `peft_guide.yaml` if you want to do PEFT), you can run the fine-tuning workflow either using the AutoModel CLI or by directly invoking the recipe Python script.

:::{note}
**Fine-tuning in Docker.** When you run inside the NeMo AutoModel container, checkpoints are lost when the container exits unless you save them on the host. Use a bind-mount for your checkpoint directory (see [Install with NeMo Docker Container](../installation.md#install-with-nemo-docker-container)) and set `checkpoint.checkpoint_dir` to that path. Full details: [Saving Checkpoints When Using Docker](../checkpointing.md#saving-checkpoints-when-using-docker).
:::

### AutoModel CLI

When NeMo AutoModel is installed on your system, it includes the `automodel` CLI program that you