Lin1007 · dani-capellan · Jan 3, 2021
diff --git a/1.3 Shallow neural networks.md b/1.3 Shallow neural networks.md
@@ -87,20 +87,20 @@ For example, if X has 3 training examples, each exmaple with 2 values.
 
   - $a = tanh(z) = \frac{e^z-e^{-z}}{e^z+e^{-z}}$
   - Range: a [-1,1]
-  - Almost always work better than sigmoid because the value between -1 and 1, the activation is close to have mean 0 (The effect is similar to centering yhe data)
+  - Almost always work better than sigmoid because the value between -1 and 1, the activation is close to have mean 0 (The effect is similar to centering the data)
   - If z is very lare or z is very small, the slope of the gradient is almost 0, so this can slow down gradient descent.
 
 - ReLu
 
   - $a = max(0,z)$
   - Derivative is almost 0 when z is negative and 1 when z is positive
-  - Due to the derivative property, it can be faster than tank
+  - Due to the derivative property, it can be faster than tanh
 
-- Leacy ReLu
+- Leaky ReLu
 
 Rules of thumb:
 
-- If output is 0,1 value (binary classification) ->sigmoid
+- If output is 0,1 value (binary classification) -> sigmoid
 - If dont know which to use: -> ReLu
 
 ## Why non linear activation function?
@@ -211,4 +211,4 @@ The general methodology to build a Neural Network is to:
    predictions = (A2>0.5)
    ```
 
-
+