diff --git a/1.3 Shallow neural networks.md b/1.3 Shallow neural networks.md
index 752f5e3..63af8ec 100644
--- a/1.3 Shallow neural networks.md
+++ b/1.3 Shallow neural networks.md
@@ -87,20 +87,20 @@ For example, if X has 3 training examples, each exmaple with 2 values.
   - $a = tanh(z) = \frac{e^z-e^{-z}}{e^z+e^{-z}}$
   - Range: a [-1,1]
-  - Almost always work better than sigmoid because the value between -1 and 1, the activation is close to have mean 0 (The effect is similar to centering yhe data)
+  - Almost always work better than sigmoid because the value between -1 and 1, the activation is close to have mean 0 (The effect is similar to centering the data)
   - If z is very lare or z is very small, the slope of the gradient is almost 0, so this can slow down gradient descent.
 
 - ReLu
   - $a = max(0,z)$
   - Derivative is almost 0 when z is negative and 1 when z is positive
-  - Due to the derivative property, it can be faster than tank
+  - Due to the derivative property, it can be faster than tanh
 
-- Leacy ReLu
+- Leaky ReLu
 
 Rules of thumb:
 
-- If output is 0,1 value (binary classification) ->sigmoid
+- If output is 0,1 value (binary classification) -> sigmoid
 - If dont know which to use: -> ReLu
 
 ## Why non linear activation function?
@@ -211,4 +211,4 @@ The general methodology to build a Neural Network is to:
 	predictions = (A2>0.5)
 ```
 
-
\ No newline at end of file
+
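
For reference, a minimal numpy sketch of the activation functions compared in the patched section, illustrating the derivative behaviour the notes describe. Helper names such as `tanh_grad` and the 0.01 leak slope are illustrative assumptions, not part of the notes or of this patch.

```python
import numpy as np

def sigmoid(z):
    # Range (0, 1); used on the output layer for binary classification.
    return 1.0 / (1.0 + np.exp(-z))

def tanh_grad(z):
    # Slope of tanh is 1 - tanh(z)^2: almost 0 when |z| is very large,
    # which is why saturation can slow down gradient descent.
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    # a = max(0, z); derivative is 0 for z < 0 and 1 for z > 0.
    return np.maximum(0.0, z)

def leaky_relu(z, leak=0.01):
    # Like ReLU, but keeps a small slope for z < 0.
    return np.where(z > 0, z, leak * z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tanh_grad(z))              # near 0 at both extremes (saturation)
print(relu(z), leaky_relu(z))

# Rule of thumb from the notes: sigmoid only on the binary output layer,
# ReLU for hidden layers when unsure. Thresholding mirrors A2 > 0.5.
predictions = (sigmoid(z) > 0.5)
print(predictions)
```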