Commit 6bcc677

Merge branch 'master' into qlora
janpf authored Jul 15, 2024
2 parents 3d4e785 + 08b45e9 commit 6bcc677
Showing 2 changed files with 2 additions and 2 deletions.
@@ -111,7 +111,7 @@ torch.Size([1536])
torch.Size([9984])
```

-I.e. the size of the embedding increases the mode layers we use (but ONLY if layer_mean is set to False, otherwise the length is always the same).
+I.e. the size of the embedding increases the more layers we use (but ONLY if layer_mean is set to False, otherwise the length is always the same).

(pooling)=
### Pooling operation
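The corrected sentence above describes Flair's layer handling: with layer_mean=False, the vectors from the selected transformer layers are concatenated, so the embedding length grows with the number of layers; with layer_mean=True they are averaged and the length stays fixed. A minimal sketch of that behavior (the model name and layer selection are illustrative, not from this commit):

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# concatenate the last four hidden layers instead of averaging them
embeddings = TransformerWordEmbeddings('bert-base-uncased',
                                       layers='-1,-2,-3,-4',
                                       layer_mean=False)

sentence = Sentence('Berlin is a city.')
embeddings.embed(sentence)

# 4 layers x 768 dims = 3072; with layer_mean=True the length would stay 768
print(sentence[0].embedding.shape)
```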
@@ -252,7 +252,7 @@ trainer.train('resources/taggers/example-upos',
max_epochs=10)
```

-This will launch a "standard training run" with SGD as optimizer. By default, the learning rate is annealed against the development score: if fo 3 epochs there is no improvement on the dev split, the learning rate is halved. If this happens too often, the learning rate will fall below a minimal threshold and training stops early.
+This will launch a "standard training run" with SGD as optimizer. By default, the learning rate is annealed against the development score: if for 3 epochs there is no improvement on the dev split, the learning rate is halved. If this happens too often, the learning rate will fall below a minimal threshold and training stops early.

The max_epochs parameter is set to a small number in this script to make it run fast, but normally you should use a much higher value (150 or 200).

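For context on the annealing scheme the fixed sentence describes: in Flair's classic training setup, patience sets how many epochs without dev improvement trigger annealing, anneal_factor does the halving, and min_learning_rate is the early-stop threshold. A sketch with the relevant knobs spelled out (parameter values are illustrative; `tagger` and `corpus` are assumed to be set up as in the surrounding tutorial, and keyword names can vary across Flair versions):

```python
from flair.trainers import ModelTrainer

# `tagger` and `corpus` are assumed from earlier in the tutorial
trainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-upos',
              learning_rate=0.1,         # initial SGD learning rate
              anneal_factor=0.5,         # halve the learning rate...
              patience=3,                # ...after 3 epochs with no dev improvement
              min_learning_rate=0.0001,  # stop early once the rate falls below this
              max_epochs=150)            # use 150-200 for a real run
```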