chore: update submodules (#187)
Co-authored-by: ydcjeff <[email protected]>
github-actions[bot] and ydcjeff authored Sep 27, 2023
1 parent 62fdc34 commit 26820af
Showing 3 changed files with 6 additions and 6 deletions.
6 changes: 3 additions & 3 deletions src/tutorials/intermediate/01-cifar10-distributed.md
@@ -31,7 +31,7 @@ The type of distributed training we will use is called data parallelism in which
>
> -- <cite>[Distributed Deep Learning 101: Introduction](https://towardsdatascience.com/distributed-deep-learning-101-introduction-ebfc1bcd59d9)</cite>
-PyTorch provides a [torch.nn.parallel.DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) API for this task however the implementation that supports different backends + configurations is tedious. In this example, we will see how to can enable data distributed training which is adaptable to various backends in just a few lines of code alongwith:
+PyTorch provides a [torch.nn.parallel.DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) API for this task however the implementation that supports different backends + configurations is tedious. In this example, we will see how to enable data distributed training which is adaptable to various backends in just a few lines of code alongwith:
* Computing training and validation metrics
* Setup logging (and connecting with ClearML)
* Saving the best model weights
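To illustrate the points listed above, here is a minimal, self-contained sketch of the backend-agnostic pattern the tutorial describes, using `ignite.distributed` (`idist`) helpers; the dataset, model, and config values are placeholders, not the tutorial's own code.

```python
# A minimal sketch of backend-agnostic data-distributed training with
# ignite.distributed; dataset, model, and config are illustrative placeholders.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

import ignite.distributed as idist


def training(local_rank, config):
    # auto_dataloader adds a DistributedSampler automatically when a backend is active
    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    loader = idist.auto_dataloader(dataset, batch_size=config["batch_size"], shuffle=True)
    # auto_model moves the model to the right device and wraps it in DDP if needed
    model = idist.auto_model(nn.Linear(10, 1))
    for x, y in loader:
        x, y = x.to(idist.device()), y.to(idist.device())
        loss = nn.functional.mse_loss(model(x), y)
        print(f"rank {idist.get_rank()}: loss={loss.item():.4f}")
        break


# idist.Parallel handles gloo/nccl/xla/horovod (or no backend at all) transparently
with idist.Parallel(backend=None) as parallel:
    parallel.run(training, {"batch_size": 16})
```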
@@ -229,7 +229,7 @@ def get_model(config):

### Optimizer

-Then we can setup the optimizer using hyperameters from `config` and pass it through [`auto_optim()`](https://pytorch.org/ignite/generated/ignite.distributed.auto.auto_optim.html#ignite.distributed.auto.auto_optim).
+Then we can setup the optimizer using hyperparameters from `config` and pass it through [`auto_optim()`](https://pytorch.org/ignite/generated/ignite.distributed.auto.auto_optim.html#ignite.distributed.auto.auto_optim).
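A hedged sketch of the pattern this sentence describes: build the optimizer from config values and adapt it with `auto_optim()`. The `learning_rate` and `momentum` keys are illustrative assumptions, not necessarily the tutorial's.

```python
# Sketch only: optimizer built from config hyperparameters, then adapted by auto_optim.
import torch.optim as optim
import ignite.distributed as idist


def get_optimizer(model, config):
    optimizer = optim.SGD(
        model.parameters(),
        lr=config["learning_rate"],
        momentum=config["momentum"],
    )
    # auto_optim returns the optimizer adapted to the active backend (e.g. XLA, Horovod)
    return idist.auto_optim(optimizer)
```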


@@ -651,7 +651,7 @@ torchrun --nproc_per_node=2 main.py run --backend="nccl"



-### Run with internal spawining (`torch.multiprocessing.spawn`)
+### Run with internal spawning (`torch.multiprocessing.spawn`)

```
python -u main.py run --backend="nccl" --nproc_per_node=2
```
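For reference, internal spawning can be sketched roughly as below with `idist.Parallel`, which uses `torch.multiprocessing.spawn` when `nproc_per_node` is given and reuses already-launched processes under `torchrun`. This `main.py`-style entry point is illustrative, not the tutorial's actual file.

```python
# A sketch of an entry point that spawns workers internally via idist.Parallel.
import ignite.distributed as idist


def training(local_rank, config):
    print(f"process {idist.get_rank()}/{idist.get_world_size()} on {idist.device()}")


def run(backend=None, nproc_per_node=None, **config):
    # With nproc_per_node set, idist.Parallel spawns that many worker processes.
    with idist.Parallel(backend=backend, nproc_per_node=nproc_per_node) as parallel:
        parallel.run(training, config)


if __name__ == "__main__":
    run(backend="gloo", nproc_per_node=2)
```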
@@ -40,7 +40,7 @@ This notebook uses Models, Dataset and Tokenizers from Huggingface, hence they c
## Common Configuration
We maintain a config dictionary which can be extended or changed to store parameters required during training. We can refer back to this code when we will use these parameters later.

-In this example we are using ``t5-small``, which has 60M parameters. The way t5 models work is they taske an input with the a task-specific prefix. This prefix (like "Translate English to German") will let our model know which task it needs to perform. For more details refer to the original paper [here](https://arxiv.org/abs/1910.10683).
+In this example we are using ``t5-small``, which has 60M parameters. The way t5 models work is they take an input with a task-specific prefix. This prefix (like "Translate English to German") will let our model know which task it needs to perform. For more details refer to the original paper [here](https://arxiv.org/abs/1910.10683).


Here we train on less number of iterations per step and on a limited dataset, this can be modified using the ``train_dataset_length`` and ``epoch_length`` config.
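An illustrative config dictionary along these lines might look as follows; apart from ``train_dataset_length``, ``epoch_length``, and ``accumulation_steps`` named in the surrounding text, the keys and values here are assumptions rather than the tutorial's exact settings.

```python
# Sketch of a common config dictionary for the t5-small translation example.
config = {
    "model": "t5-small",                        # ~60M parameter checkpoint
    "prefix": "translate English to German: ",  # task-specific prefix prepended to inputs
    "train_dataset_length": 5000,               # cap on the number of training samples
    "epoch_length": 500,                        # iterations per epoch
    "batch_size": 16,
    "learning_rate": 1e-4,
    "accumulation_steps": 4,                    # used for gradient accumulation later on
}
```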
@@ -254,7 +254,7 @@ The forward pass is wrapped in the autocast context manager for mixed precision
Gradient accumulation is implemented as batch size of 1 would lead to noisy updates otherwise. Check the ``accumulation_steps`` variable in config to define the number of steps to accumulate the gradient.
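A generic, device-agnostic sketch of the train step described here (autocast for mixed precision plus gradient accumulation); the tiny linear model and MSE loss are stand-ins for the tutorial's seq2seq model and loss.

```python
# Sketch of a train step combining autocast and gradient accumulation.
import torch
from torch.cuda.amp import autocast, GradScaler
from ignite.engine import Engine

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = torch.nn.Linear(10, 1).to(device)   # stand-in for the seq2seq model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler(enabled=use_amp)
accumulation_steps = 4                      # `accumulation_steps` from the config


def train_step(engine, batch):
    model.train()
    x, y = (t.to(device) for t in batch)
    with autocast(enabled=use_amp):         # mixed-precision forward pass
        loss = torch.nn.functional.mse_loss(model(x), y) / accumulation_steps
    scaler.scale(loss).backward()
    # Step and zero the optimizer only every `accumulation_steps` iterations
    if engine.state.iteration % accumulation_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
    return loss.item()


trainer = Engine(train_step)  # the tutorial's actual engine setup may differ
```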

#### Trainer Handlers
-Handlers can be defined and attached directly to the trainer engine. Here we also make use of a special function : `setup_common_training_handlers` which has a lot of the commonly used, useful handlers (like `save_every_iters`, `clear_cuda_cache` etc) already defined. To know more about this function, refer to the docs [here](https://pytorch.org/ignite/contrib/engines.html#ignite.contrib.engines.common.setup_common_training_handlers).
+Handlers can be defined and attached directly to the trainer engine. Here we also make use of a special function : `setup_common_training_handlers` which has a lot of the commonly used, useful handlers (like `save_every_iters`, `clear_cuda_cache`, etc) already defined. To know more about this function, refer to the docs [here](https://pytorch.org/ignite/contrib/engines.html#ignite.contrib.engines.common.setup_common_training_handlers).
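A hedged sketch of attaching these common handlers; the trainer, model, and keyword values below are placeholders rather than the tutorial's exact configuration.

```python
# Sketch: attach the common training handlers to a trainer engine.
import torch
from ignite.engine import Engine
from ignite.contrib.engines import common

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
trainer = Engine(lambda engine, batch: None)  # stand-in train step

common.setup_common_training_handlers(
    trainer,
    to_save={"model": model, "optimizer": optimizer},
    save_every_iters=1000,    # periodic checkpointing, as named in the text above
    output_path="./checkpoints",
    with_pbars=True,          # progress bar over iterations (requires tqdm)
    clear_cuda_cache=True,    # empty the CUDA cache at the end of each epoch
)
```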


