
Commit b908cda

chore: update submodules (#188)
Co-authored-by: ydcjeff <[email protected]>
1 parent 26820af commit b908cda

File tree: 3 files changed, +5 -5 lines changed


src/tutorials/advanced/01-collective-communication.md

Lines changed: 1 addition & 1 deletion

@@ -160,7 +160,7 @@ idist.spawn(backend="gloo", fn=broadcast_example, args=(), nproc_per_node=3)
 Rank 1, After performing broadcast: hello from rank 0
 
 
-For a real world use case, let's assume you need to gather the predicted and actual values from all the processes on rank 0 for computing a metric and avoiding a memory error. You can do do this by first using `all_gather()`, then computing the metric and finally using `broadcast()` to share the result with all processes. `src` below refers to the rank of the source process.
+For a real world use case, let's assume you need to gather the predicted and actual values from all the processes on rank 0 for computing a metric and avoiding a memory error. You can do this by first using `all_gather()`, then computing the metric and finally using `broadcast()` to share the result with all processes. `src` below refers to the rank of the source process.
 
 
 ```python
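As a rough sketch of the pattern that paragraph describes (a minimal illustration with made-up predictions and targets, not the tutorial's own code block), the workflow with `ignite.distributed` could look like this:

```python
# Minimal sketch: gather predictions/targets on every process, compute the
# metric on rank 0 only, then broadcast the result to all processes.
import torch
import ignite.distributed as idist


def compute_metric_example(local_rank):
    rank = idist.get_rank()

    # Hypothetical per-process outputs standing in for real model predictions.
    y_pred = torch.rand(10)
    y_true = torch.randint(0, 2, (10,)).float()

    # all_gather() collects the tensors from all processes.
    all_pred = idist.all_gather(y_pred)
    all_true = idist.all_gather(y_true)

    if rank == 0:
        # Compute the metric only on rank 0 (a simple MSE as a placeholder).
        metric = torch.mean((all_pred - all_true) ** 2)
    else:
        metric = torch.tensor(0.0)

    # broadcast() shares rank 0's result with everyone; src is the source rank.
    metric = idist.broadcast(metric, src=0)
    print(f"Rank {rank}, metric: {metric.item():.4f}")


idist.spawn(backend="gloo", fn=compute_metric_example, args=(), nproc_per_node=3)
```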

src/tutorials/intermediate/03-reinforcement-learning.md

Lines changed: 3 additions & 3 deletions

@@ -83,7 +83,7 @@ from pyvirtualdisplay import Display
 
 
 ## Configurable Parameters
 
-We will use there values later in the tutorial at appropriate places.
+We will use these values later in the tutorial at appropriate places.
 
 
 ```python
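For orientation only, such configurable values in a REINFORCE-on-CartPole tutorial might look like the sketch below; every name and number here is hypothetical and not taken from this commit:

```python
# Hypothetical configuration values; names and numbers are illustrative only.
seed_val = 543          # random seed for reproducibility
gamma = 0.99            # discount factor for future rewards
log_interval = 100      # how often to log the running reward
max_episodes = 10000    # upper bound on training episodes
render = True           # whether to capture/render episode frames
```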
@@ -133,7 +133,7 @@ env = wrap_env(env)
 
 
 ## Model
 
-We are going to utilize the reinforce algorithm in which our agent will use episode samples from starting state to goal state directly from the environment. Our model has two linear layers with 4 in features and 2 out features for 4 state variables and 2 actions respectively. We also define an action buffer as `saved_log_probs` and a rewards one. We also have an intermediate ReLU layer through which the outputs of the 1st layer are passed to receive the score for each action taken. Finally, we return a list of probabilities for each of these actions.
+We are going to utilize the reinforce algorithm in which our agent will use episode samples from starting state to goal state directly from the environment. Our model has two linear layers with 4 in features and 2 out features for 4 state variables and 2 actions respectively. We also define an action buffer as `saved_log_probs` and `rewards`. We also have an intermediate ReLU layer through which the outputs of the 1st layer are passed to receive the score for each action taken. Finally, we return a list of probabilities for each of these actions.
 
 
 
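A minimal sketch of a policy module matching that description is given below; the hidden width of 128 and the layer names are assumptions borrowed from the classic PyTorch REINFORCE example rather than read from this diff:

```python
import torch.nn as nn
import torch.nn.functional as F


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.affine1 = nn.Linear(4, 128)   # 4 in features: the 4 state variables
        self.affine2 = nn.Linear(128, 2)   # 2 out features: the 2 possible actions
        self.saved_log_probs = []          # action (log-probability) buffer
        self.rewards = []                  # rewards buffer

    def forward(self, x):
        x = F.relu(self.affine1(x))             # intermediate ReLU layer
        action_scores = self.affine2(x)         # score for each action
        return F.softmax(action_scores, dim=1)  # probabilities for each action
```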

@@ -261,7 +261,7 @@ def reset_environment_state():
     trainer.state.ep_reward = 0
 ```
 
-When an episode finishes, we update the running reward and perform backpropogation by calling `finish_episode()`.
+When an episode finishes, we update the running reward and perform backpropagation by calling `finish_episode()`.
 
 
 ```python
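As a rough illustration of what that step involves (not the tutorial's exact code), a REINFORCE-style `finish_episode()` computes discounted returns, builds the policy-gradient loss from the saved log probabilities, and backpropagates; `policy`, `optimizer`, `gamma`, the `update_model()` helper, and the running-reward smoothing constants below are assumptions:

```python
import torch

eps = 1e-8  # guards against division by zero when normalizing returns


def finish_episode(policy, optimizer, gamma):
    # Compute discounted returns for the finished episode, newest to oldest.
    R = 0.0
    returns = []
    for r in reversed(policy.rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + eps)

    # Policy-gradient loss: -log_prob * return, summed over the episode.
    policy_loss = torch.stack(
        [-log_prob * R for log_prob, R in zip(policy.saved_log_probs, returns)]
    ).sum()

    # Backpropagation step.
    optimizer.zero_grad()
    policy_loss.backward()
    optimizer.step()

    # Clear the episode buffers for the next episode.
    del policy.rewards[:]
    del policy.saved_log_probs[:]


def update_model(trainer, policy, optimizer, gamma):
    # Exponential moving average of the per-episode reward (smoothing constants assumed).
    trainer.state.running_reward = (
        0.05 * trainer.state.ep_reward + 0.95 * trainer.state.running_reward
    )
    finish_episode(policy, optimizer, gamma)
```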
