
Commit b908cda

chore: update submodules (#188)
Co-authored-by: ydcjeff <[email protected]>
1 parent 26820af commit b908cda

File tree: 3 files changed, +5 -5 lines changed


src/tutorials/advanced/01-collective-communication.md

Lines changed: 1 addition & 1 deletion

@@ -160,7 +160,7 @@ idist.spawn(backend="gloo", fn=broadcast_example, args=(), nproc_per_node=3)
 Rank 1, After performing broadcast: hello from rank 0
 
 
-For a real world use case, let's assume you need to gather the predicted and actual values from all the processes on rank 0 for computing a metric and avoiding a memory error. You can do do this by first using `all_gather()`, then computing the metric and finally using `broadcast()` to share the result with all processes. `src` below refers to the rank of the source process.
+For a real world use case, let's assume you need to gather the predicted and actual values from all the processes on rank 0 for computing a metric and avoiding a memory error. You can do this by first using `all_gather()`, then computing the metric and finally using `broadcast()` to share the result with all processes. `src` below refers to the rank of the source process.
 
 
 ```python
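As a rough sketch of the pattern that paragraph describes (a minimal illustration with made-up predictions and targets, not the tutorial's own code block), the workflow with `ignite.distributed` could look like this:

```python
# Minimal sketch: gather predictions/targets on every process, compute the
# metric on rank 0 only, then broadcast the result to all processes.
import torch
import ignite.distributed as idist


def compute_metric_example(local_rank):
    rank = idist.get_rank()

    # Hypothetical per-process outputs standing in for real model predictions.
    y_pred = torch.rand(10)
    y_true = torch.randint(0, 2, (10,)).float()

    # all_gather() collects the tensors from all processes.
    all_pred = idist.all_gather(y_pred)
    all_true = idist.all_gather(y_true)

    if rank == 0:
        # Compute the metric only on rank 0 (a simple MSE as a placeholder).
        metric = torch.mean((all_pred - all_true) ** 2)
    else:
        metric = torch.tensor(0.0)

    # broadcast() shares rank 0's result with everyone; src is the source rank.
    metric = idist.broadcast(metric, src=0)
    print(f"Rank {rank}, metric: {metric.item():.4f}")


idist.spawn(backend="gloo", fn=compute_metric_example, args=(), nproc_per_node=3)
```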

src/tutorials/intermediate/03-reinforcement-learning.md

Lines changed: 3 additions & 3 deletions

@@ -83,7 +83,7 @@ from pyvirtualdisplay import Display
 
 
 ## Configurable Parameters
 
-We will use there values later in the tutorial at appropriate places.
+We will use these values later in the tutorial at appropriate places.
 
 
 ```python
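For orientation only, such configurable values in a REINFORCE-on-CartPole tutorial might look like the sketch below; every name and number here is hypothetical and not taken from this commit:

```python
# Hypothetical configuration values; names and numbers are illustrative only.
seed_val = 543          # random seed for reproducibility
gamma = 0.99            # discount factor for future rewards
log_interval = 100      # how often to log the running reward
max_episodes = 10000    # upper bound on training episodes
render = True           # whether to capture/render episode frames
```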
@@ -133,7 +133,7 @@ env = wrap_env(env)
 
 
 ## Model
 
-We are going to utilize the reinforce algorithm in which our agent will use episode samples from starting state to goal state directly from the environment. Our model has two linear layers with 4 in features and 2 out features for 4 state variables and 2 actions respectively. We also define an action buffer as `saved_log_probs` and a rewards one. We also have an intermediate ReLU layer through which the outputs of the 1st layer are passed to receive the score for each action taken. Finally, we return a list of probabilities for each of these actions.
+We are going to utilize the reinforce algorithm in which our agent will use episode samples from starting state to goal state directly from the environment. Our model has two linear layers with 4 in features and 2 out features for 4 state variables and 2 actions respectively. We also define an action buffer as `saved_log_probs` and `rewards`. We also have an intermediate ReLU layer through which the outputs of the 1st layer are passed to receive the score for each action taken. Finally, we return a list of probabilities for each of these actions.
 
 
 
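A minimal sketch of a policy module matching that description is given below; the hidden width of 128 and the layer names are assumptions borrowed from the classic PyTorch REINFORCE example rather than read from this diff:

```python
import torch.nn as nn
import torch.nn.functional as F


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.affine1 = nn.Linear(4, 128)   # 4 in features: the 4 state variables
        self.affine2 = nn.Linear(128, 2)   # 2 out features: the 2 possible actions
        self.saved_log_probs = []          # action (log-probability) buffer
        self.rewards = []                  # rewards buffer

    def forward(self, x):
        x = F.relu(self.affine1(x))             # intermediate ReLU layer
        action_scores = self.affine2(x)         # score for each action
        return F.softmax(action_scores, dim=1)  # probabilities for each action
```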

@@ -261,7 +261,7 @@ def reset_environment_state():
     trainer.state.ep_reward = 0
 ```
 
-When an episode finishes, we update the running reward and perform backpropogation by calling `finish_episode()`.
+When an episode finishes, we update the running reward and perform backpropagation by calling `finish_episode()`.
 
 
 ```python
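As a rough illustration of what that step involves (not the tutorial's exact code), a REINFORCE-style `finish_episode()` computes discounted returns, builds the policy-gradient loss from the saved log probabilities, and backpropagates; `policy`, `optimizer`, `gamma`, the `update_model()` helper, and the running-reward smoothing constants below are assumptions:

```python
import torch

eps = 1e-8  # guards against division by zero when normalizing returns


def finish_episode(policy, optimizer, gamma):
    # Compute discounted returns for the finished episode, newest to oldest.
    R = 0.0
    returns = []
    for r in reversed(policy.rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + eps)

    # Policy-gradient loss: -log_prob * return, summed over the episode.
    policy_loss = torch.stack(
        [-log_prob * R for log_prob, R in zip(policy.saved_log_probs, returns)]
    ).sum()

    # Backpropagation step.
    optimizer.zero_grad()
    policy_loss.backward()
    optimizer.step()

    # Clear the episode buffers for the next episode.
    del policy.rewards[:]
    del policy.saved_log_probs[:]


def update_model(trainer, policy, optimizer, gamma):
    # Exponential moving average of the per-episode reward (smoothing constants assumed).
    trainer.state.running_reward = (
        0.05 * trainer.state.ep_reward + 0.95 * trainer.state.running_reward
    )
    finish_episode(policy, optimizer, gamma)
```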
