ANN Series 5 - Sigmoid Activation Function - First order derivative of sigmoid and tanh
Linear Activation Functions
- For classification problems, a linear activation such as the identity function can only produce a linear separating hyperplane. A single-layer ANN with a linear activation can therefore only produce a straight-line decision boundary.
- A model composed of only linear functions, no matter how many layers it has, cannot learn non-linear boundaries because the composition of linear functions is itself a linear function. However, the introduction of non-linear activation functions between layers enables neural networks to learn non-linear separators.
- They are suitable for regression tasks, especially in the output layer for problems expecting a continuous output range.
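The claim that stacked linear layers collapse into a single linear function can be checked with a small sketch. The two one-dimensional "layers" and their weights below are hypothetical values chosen only for illustration:

```python
# Hypothetical 1-D linear "layer" with identity activation: y = w*x + b
def layer(w, b):
    return lambda x: w * x + b

layer1 = layer(2.0, 1.0)    # h = 2x + 1
layer2 = layer(-3.0, 0.5)   # y = -3h + 0.5

def stacked(x):
    """Two linear layers applied one after the other."""
    return layer2(layer1(x))

# Expanding by hand: -3(2x + 1) + 0.5 = -6x - 2.5, a single linear layer.
collapsed = layer(-6.0, -2.5)

for x in [-2.0, 0.0, 1.5, 10.0]:
    assert abs(stacked(x) - collapsed(x)) < 1e-12
    print(x, stacked(x), collapsed(x))
```

Adding more linear layers only changes the final slope and intercept, so the decision boundary stays linear.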
Sigmoid Functions
Sigmoid functions are mathematical functions with an S-shaped curve (sigmoid curve).
Types of Sigmoid functions
1. Logistic function
Output between 0 and 1
f(x) = 1/(1+e^(-x))
2. Hyperbolic tangent
Output between -1 and 1
f(x) = (e^x-e^(-x))/(e^x+e^(-x))
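As a minimal sketch, the two sigmoid functions can be written directly from their definitions using only Python's standard `math` module. The function names `logistic` and `tanh` are chosen here for illustration, and the naive formulas can overflow for inputs of very large magnitude:

```python
import math

def logistic(x):
    """Logistic function: output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: output lies between -1 and 1."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"x={x:+.1f}  logistic={logistic(x):.4f}  tanh={tanh(x):.4f}")
```

In practice `math.tanh` gives the same values as the hand-written version and is more robust for large inputs.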
Effect of w and b on logistic function
The graph of the logistic function is an S-shaped curve. Plotting the logistic function of wx + b for different weights (w) and biases (b) shows how these parameters affect its shape. Each curve represents a sigmoid function with a specific combination of weight and bias:
Weight (w):
Determines the slope of the sigmoid curve. A larger weight makes the transition from 0 to 1 steeper, indicating that the function becomes more sensitive to changes in the input (x). This is evident as we compare the curves for (w=0.5), (w=1), and (w=2), where the steepness increases with the weight.
Bias (b):
Shifts the curve left or right along the (x)-axis. A positive bias shifts the curve to the left, meaning the sigmoid function reaches the transition region (where it goes from near 0 to near 1) at lower input values. Conversely, a negative bias shifts the curve to the right. This effect is shown by the different positions of the curves for biases (b=-2), (b=0), and (b=2).
By adjusting the weight and bias, one can control the sensitivity and the threshold at which the sigmoid function activates, making these parameters crucial for training neural networks to fit various data patterns.
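The effect of w and b can also be checked numerically. The sketch below assumes the neuron computes the logistic function of w*x + b; the helper names `neuron` and `slope_at_midpoint` are hypothetical. Under that assumption the output crosses 0.5 at x = -b/w, and the slope at that point grows with w:

```python
import math

def neuron(x, w, b):
    """Logistic function applied to the weighted input w*x + b (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def slope_at_midpoint(w, b, h=1e-6):
    """Central-difference estimate of the slope where the output equals 0.5."""
    x0 = (0.0 - b) / w
    return (neuron(x0 + h, w, b) - neuron(x0 - h, w, b)) / (2 * h)

# Larger weights give a steeper transition.
for w in (0.5, 1.0, 2.0):
    print(f"w={w}: slope at midpoint is about {slope_at_midpoint(w, 0.0):.4f}")

# With w = 1, a positive bias moves the midpoint left, a negative bias right.
for b in (-2.0, 0.0, 2.0):
    midpoint = (0.0 - b) / 1.0
    print(f"b={b:+.1f}: output crosses 0.5 at x = {midpoint:+.1f}")
```

The slope at the midpoint works out to w/4, which is why doubling the weight doubles the steepness of the curve.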
Python Code:
The Python code used to produce the graph can be found here: ml-course/sigmoid.ipynb at main · lovelynrose/ml-course (github.com)
First Order Derivative of the Logistic Function
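One way to obtain the derivative, sketched here from the logistic definition f(x) = 1/(1+e^(-x)) using the chain rule, leads to the identity f'(x) = f(x)(1-f(x)):

```latex
\begin{aligned}
f(x)  &= \left(1 + e^{-x}\right)^{-1} \\
f'(x) &= -\left(1 + e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right)
       = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
      &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
      &= f(x)\,\bigl(1 - f(x)\bigr),
\qquad \text{since } 1 - f(x) = \frac{e^{-x}}{1 + e^{-x}} .
\end{aligned}
```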
First Order Derivative of the tanh Function
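A sketch of the derivation, applying the quotient rule to the tanh definition (e^x-e^(-x))/(e^x+e^(-x)), leads to the identity tanh'(x) = 1-tanh(x)^2:

```latex
\begin{aligned}
\tanh(x)  &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \\
\tanh'(x) &= \frac{\left(e^{x} + e^{-x}\right)\left(e^{x} + e^{-x}\right)
                 - \left(e^{x} - e^{-x}\right)\left(e^{x} - e^{-x}\right)}
                {\left(e^{x} + e^{-x}\right)^{2}} \\
          &= 1 - \left(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\right)^{2} \\
          &= 1 - \tanh^{2}(x) .
\end{aligned}
```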
Effect on Gradient Descent
The gradient descent algorithm requires the first-order derivative of the activation function to update the weights. The first-order derivative of each sigmoid function can be written in terms of the function itself. This means that once you have computed the function's value for a given input x, you can compute its derivative from that value with a simple multiplication or subtraction, without evaluating any exponentials again.
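To illustrate how the forward-pass value is reused, here is a minimal sketch of one gradient descent step for a single logistic neuron. The squared-error loss, the learning rate `lr`, the function name `gd_step`, and the training values are all assumptions made for this example:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def gd_step(w, b, x, target, lr=0.1):
    """One gradient descent step for a single neuron a = logistic(w*x + b).

    Assumes the loss 0.5 * (a - target)**2 (an illustrative choice).
    """
    a = logistic(w * x + b)       # forward pass
    da_dz = a * (1.0 - a)         # derivative reuses the forward value a
    dloss_da = a - target
    grad_w = dloss_da * da_dz * x
    grad_b = dloss_da * da_dz
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical training example: push the output for x = 1.0 toward 0.9.
w, b = 0.5, 0.0
for _ in range(1000):
    w, b = gd_step(w, b, x=1.0, target=0.9)
print("output after training:", logistic(w * 1.0 + b))
```

Note that `da_dz` needs no extra call to `math.exp`; it is built entirely from the activation `a` already computed in the forward pass.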
Example
For x=0.5, the values of the logistic (sigmoid) function and the hyperbolic tangent (tanh) function are:
Logistic (Sigmoid) value: 0.6224593312018546
Hyperbolic Tangent (tanh) value: 0.46211715726000974
For x=0.5, the first-order derivatives of the logistic (sigmoid) function and the hyperbolic tangent (tanh) function are:
Derivative of the Logistic (Sigmoid) function:
0.2350037122015945
We can derive the same answer by substituting in:
f'(x) = f(x)(1-f(x))
= 0.6224593312018546 (1- 0.6224593312018546 )
= 0.2350037122015945
Derivative of the Hyperbolic Tangent (tanh) function:
0.7864477329659274
We get the same answer by substituting in the formula:
tanh'(x) = 1-tanh(x)^2
= 1 - 0.46211715726000974^2
= 0.7864477329659274
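The worked example can be reproduced with a short script. This is a sketch using only the standard `math` module; the central-difference helper `central_diff` is a hypothetical name added here as an independent numerical check on the two identities:

```python
import math

x = 0.5

# Function values
f = 1.0 / (1.0 + math.exp(-x))   # logistic
t = math.tanh(x)                 # hyperbolic tangent

# Derivatives computed from the function values alone
d_logistic = f * (1.0 - f)
d_tanh = 1.0 - t ** 2

def central_diff(fn, x, h=1e-6):
    """Numerical derivative estimate, used only to cross-check the formulas."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

num_logistic = central_diff(lambda v: 1.0 / (1.0 + math.exp(-v)), x)
num_tanh = central_diff(math.tanh, x)

print("logistic:", f, "derivative:", d_logistic, "numerical:", num_logistic)
print("tanh:    ", t, "derivative:", d_tanh, "numerical:", num_tanh)
```

The formula-based and numerical derivatives should agree to several decimal places, with small differences coming from the finite step size h.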