
Commit d607276

fix: Update numerical outputs, adjust PyTorch code examples, and correct typos across learning content files.
1 parent d6502ad commit d607276

File tree

6 files changed: +22 −22 lines changed


public/content/learn/attention-mechanism/applying-attention-weights/applying-attention-weights-content.md

Lines changed: 4 additions & 4 deletions
@@ -72,9 +72,9 @@ Think of values as the "payload" - the actual content we'll extract.
 output = attn_weights @ V
 
 print(output)
-# tensor([[2.2000, 3.2000],
-#         [2.8000, 3.8000],
-#         [2.6000, 3.6000]])
+# tensor([[2.4000, 3.4000],
+#         [3.2000, 4.2000],
+#         [2.8000, 3.8000]])
 ```
 
 **Shape transformation:**
@@ -96,7 +96,7 @@ Position 0 output:
 = [2.4, 3.4]
 ```
 
-**PyTorch output:** [2.2, 3.2] (small difference due to rounding in display)
+**PyTorch output:** [2.4, 3.4] (matches perfectly!)
 
 **What happened:**
 - Position 0 mostly retrieves from V[0] (weight 0.5)
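To check the corrected numbers by hand, here is a minimal standalone sketch. The `attn_weights` and `V` values below are assumptions (this hunk does not show them), chosen so the matrix product reproduces the updated printed output:

```python
import torch

# Assumed illustrative values, not taken from the lesson file:
# each row of attn_weights sums to 1; V holds the "payload" vectors.
attn_weights = torch.tensor([[0.5, 0.3, 0.2],
                             [0.2, 0.5, 0.3],
                             [0.4, 0.3, 0.3]])
V = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

output = attn_weights @ V  # (3, 3) @ (3, 2) -> (3, 2)
print(output)
# tensor([[2.4000, 3.4000],
#         [3.2000, 4.2000],
#         [2.8000, 3.8000]])
```

Row 0 works out to 0.5·[1, 2] + 0.3·[3, 4] + 0.2·[5, 6] = [2.4, 3.4], matching the corrected PyTorch output above.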

public/content/learn/attention-mechanism/calculating-attention-scores/calculating-attention-scores-content.md

Lines changed: 3 additions & 3 deletions
@@ -202,16 +202,16 @@ print(attn_weights)
 **After softmax (each row sums to 1):**
 ```
          Pos0   Pos1   Pos2
-Query0 [0.576, 0.212, 0.212]  ← Mostly attends to position 0
-Query1 [0.212, 0.576, 0.212]  ← Mostly attends to position 1
+Query0 [0.506, 0.186, 0.308]  ← Mostly attends to position 0
+Query1 [0.186, 0.506, 0.308]  ← Mostly attends to position 1
 Query2 [0.333, 0.333, 0.333]  ← Attends equally to all
 ```
 
 ### Understanding the Result
 
 **Position 0:**
 - Query matched Key0 best (score 2.0 before scaling)
-- After softmax: 57.6% attention to position 0
+- After softmax: 50.6% attention to position 0
 
 **Position 2:**
 - Query matched all keys equally (scores all 1.0)
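A quick way to sanity-check the corrected softmax rows is to compute softmax(scores / √d_k) directly. The raw scores and d_k below are assumptions consistent with the notes above ("score 2.0 before scaling", "scores all 1.0"); they reproduce the printed weights only approximately, since the displayed values are rounded.

```python
import torch
import torch.nn.functional as F

# Assumed raw scores (Q @ K.T): Query0 matches Key0 best,
# Query2 matches every key equally.
scores = torch.tensor([[2.0, 0.0, 1.0],
                       [0.0, 2.0, 1.0],
                       [1.0, 1.0, 1.0]])

d_k = 4  # assumed key dimension, so the scale factor sqrt(d_k) is 2
attn_weights = F.softmax(scores / d_k ** 0.5, dim=-1)

print(attn_weights)              # roughly the table above: ~[0.51, 0.19, 0.31] on row 0
print(attn_weights.sum(dim=-1))  # every row sums to 1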

public/content/learn/attention-mechanism/multi-head-attention/multi-head-attention-content.md

Lines changed: 5 additions & 5 deletions
@@ -64,7 +64,7 @@ import torch
 import torch.nn as nn
 
 # Single-head attention: One attention pattern
-single_head = nn.MultiheadAttention(embed_dim=512, num_heads=1)
+single_head = nn.MultiheadAttention(embed_dim=512, num_heads=1, batch_first=True)
 ```
 
 **With 1 head:**
@@ -74,7 +74,7 @@ single_head = nn.MultiheadAttention(embed_dim=512, num_heads=1)
 
 ```python
 # Multi-head attention: 8 parallel attention patterns!
-multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8)
+multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
 ```
 
 **With 8 heads:**
@@ -84,13 +84,13 @@ multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8)
 
 ```python
 # Test both
-x = torch.randn(10, 32, 512)  # (seq_len=10, batch=32, embed_dim=512)
+x = torch.randn(32, 10, 512)  # (batch=32, seq_len=10, embed_dim=512)
 
 single_output, _ = single_head(x, x, x)
 multi_output, _ = multi_head(x, x, x)
 
-print(f"Single head output: {single_output.shape}")  # torch.Size([10, 32, 512])
-print(f"Multi-head output: {multi_output.shape}")    # torch.Size([10, 32, 512])
+print(f"Single head output: {single_output.shape}")  # torch.Size([32, 10, 512])
+print(f"Multi-head output: {multi_output.shape}")    # torch.Size([32, 10, 512])
 ```
 
 **Same output shape!** But multi-head is more expressive.
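One practical note on the switch to `batch_first=True`: the layer also returns an attention-weight tensor, and with per-head weights you can see the 8 separate patterns. This sketch assumes a recent PyTorch release where `forward` accepts `average_attn_weights`.

```python
import torch
import torch.nn as nn

multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(32, 10, 512)  # (batch=32, seq_len=10, embed_dim=512)

# Each head works in a 512 // 8 = 64-dimensional slice of the embedding.
out, weights = multi_head(x, x, x, average_attn_weights=False)

print(out.shape)      # torch.Size([32, 10, 512])
print(weights.shape)  # torch.Size([32, 8, 10, 10]) - one 10x10 attention map per head
```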

public/content/learn/math/functions/functions-content.md

Lines changed: 6 additions & 6 deletions
@@ -53,7 +53,7 @@ For x = -1:
 
 f(-1) = 2(-1) + 3 = -2 + 3 = 1
 
-Now image a function that takes in "Cat sat on a" and returns "mat" - that function would be a lot more difficult to create, but neural networks (LLMs) can learn it.
+Now imagine a function that takes in "The cat sat on a" and returns "mat" - that function would be a lot more difficult to create, but neural networks (LLMs) can learn it.
 
 ### Example 2: Quadratic Function f(x) = x² + 2x + 1
 
@@ -109,7 +109,7 @@ Previous quadratic function will always give 9 if x=2 and nothing else.
 
 ## Code Examples
 
-Our 2 functions coded in python, if you are unfamiliar with python you can skip the code, next module will focus on python.
+Our 2 functions coded in Python, if you are unfamiliar with Python you can skip the code, next module will focus on Python.
 
 ```python
 # Linear function: f(x) = 2x + 3
@@ -348,23 +348,23 @@ def cosine_function(x):
 
 ![Trigonometric Functions](/content/learn/math/functions/trigonometric-functions.png)
 
-This is used in Rotory Positional Embeddings (RoPE) - LLM is using it to know the order of words (tokens) in the text.
+This is used in Rotary Positional Embeddings (RoPE) - LLM is using it to know the order of words (tokens) in the text.
 
 
 
 
 
 
 
 
-Functions are using in neural networks a lot: forward propagation, backward propagation, attention, activation functions, gradients, and many more.
+Functions are used in neural networks a lot: forward propagation, backward propagation, attention, activation functions, gradients, and many more.
 
 You don't need to learn them yet, just check them out.
 
 ### 1. Sigmoid Function
 
 ![Sigmoid Formula](/content/learn/math/functions/sigmoid-formula.png)
 
-**e** is a famous constant (Euler's number) used in math everywhere, it's value is approximately 2.718
+**e** is a famous constant (Euler's number) used in math everywhere, its value is approximately 2.718
 
 **f(x) = 1 / (1 + e^(-x))**
 
@@ -379,7 +379,7 @@ def sigmoid_derivative(x):
 
 ![Sigmoid Function and Derivative](/content/learn/math/functions/sigmoid-function-derivative.png)
 
-We will learn derivativers in the next lesson, but I included the images here - derivative tells you how fast the function is changing - you see that when sigmoid function is growing fastest (in the middle), the derivative value is spiking.
+We will learn derivatives in the next lesson, but I included the images here - derivative tells you how fast the function is changing - you see that when sigmoid function is growing fastest (in the middle), the derivative value is spiking.
 
 Just look at the slope of the function, if it's big (changing fast), the derivative will be big.
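For readers who want to try the sigmoid and its derivative from this hunk without opening the lesson file, here is a small self-contained sketch; it may differ from the lesson's own `sigmoid_derivative` implementation.

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); squashes any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    # Slope of the sigmoid at x; largest around x = 0, where the curve rises fastest
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0))             # 0.5
print(sigmoid_derivative(0))  # 0.25 - the "spike" in the middle of the derivative plot
print(sigmoid_derivative(5))  # ~0.0066 - the curve is nearly flat far from 0
```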

public/content/learn/neuron-from-scratch/making-a-prediction/making-a-prediction-content.md

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ input_data = torch.tensor([[1.0, 2.0]])  # New data point
 prediction = neuron(input_data)
 
 print(prediction)
-# tensor([[0.8176]]) ← Prediction!
+# tensor([[0.8581]]) ← Prediction!
 ```
 
 **Manual calculation:**
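The corrected value 0.8581 is sigmoid(1.8). The neuron's weights and bias are not shown in this hunk, so the parameters below are purely hypothetical, chosen to illustrate the manual check:

```python
import torch

# Hypothetical parameters that happen to reproduce the printed number:
w = torch.tensor([0.5, 0.6])
b = torch.tensor(0.1)
x = torch.tensor([1.0, 2.0])  # the new data point from the lesson

z = torch.dot(w, x) + b   # 0.5*1.0 + 0.6*2.0 + 0.1 = 1.8
print(torch.sigmoid(z))   # tensor(0.8581)
```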

public/content/learn/neuron-from-scratch/the-linear-step/the-linear-step-content.md

Lines changed: 3 additions & 3 deletions
@@ -77,7 +77,7 @@ z = torch.dot(w, x) + b
 # OR: z = (w * x).sum() + b
 
 print(z)
-# tensor(1.1000)
+# tensor(1.4000)
 ```
 
 **Manual calculation:**
@@ -367,12 +367,12 @@ with torch.no_grad():
 # Predict price
 predicted_price = price_neuron(house_features)
 print(predicted_price)
-# tensor([[540000.]]) ← $540,000 prediction
+# tensor([[590000.]]) ← $590,000 prediction
 
 # Manual calculation:
 # 2000×200 + 3×50000 + 10×(-1000) + 50000
 # = 400,000 + 150,000 - 10,000 + 50,000
-# = 590,000 (close to our result!)
+# = 590,000 (perfect match!)
 ```
 
 **What the weights learned:**
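The house-price correction can be verified straight from the manual calculation in the hunk: the weights 200, 50,000 and -1,000 and the bias 50,000 are stated there. The feature order below (square footage, bedrooms, age) is an assumption about what the lesson's `house_features` holds.

```python
import torch

w = torch.tensor([200.0, 50000.0, -1000.0])          # weights from the manual calculation
b = torch.tensor(50000.0)                            # bias from the manual calculation
house_features = torch.tensor([2000.0, 3.0, 10.0])   # assumed order: sqft, bedrooms, age

z = torch.dot(w, house_features) + b
print(z)  # tensor(590000.) - matches the corrected $590,000 prediction exactly
```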
