ANN Series 5 - Sigmoid Activation Function - First order derivative of sigmoid and tanh
Linear Activation Functions
- For classification problems, linear activation functions such as the identity function produce a separating boundary that is a linear hyperplane. A single-layer ANN with a linear activation can therefore only produce a linear decision boundary (a line in two dimensions).
- A model composed of only linear functions, no matter how many layers it has, cannot learn non-linear boundaries because the composition of linear functions is itself a linear function (see the short sketch after this list). However, the introduction of non-linear activation functions between layers enables neural networks to learn non-linear separators.
- They are suitable for regression tasks, especially in the output layer for problems expecting a continuous output range.
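A minimal NumPy sketch of why stacking linear layers buys no extra expressive power (the layer sizes and random weights here are purely illustrative):

import numpy as np

# Two stacked layers with identity (linear) activation: y = W2 @ (W1 @ x + b1) + b2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layer_output = W2 @ (W1 @ x + b1) + b2

# The same mapping collapses into a single linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer_output = W @ x + b

print(np.allclose(two_layer_output, one_layer_output))  # True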
Sigmoid Functions
Sigmoid functions are mathematical functions with an S-shaped curve (the sigmoid curve). The two commonly used types are listed below; a short code sketch of both follows the list.
Types of Sigmoid functions
1. Logistic function
Output between 0 and 1: f(x) = 1 / (1 + e^(-x))
2. Hyperbolic tangent
Output between -1 and 1: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
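A minimal sketch of both functions in plain NumPy (the function names here are illustrative; in practice np.tanh and library-provided sigmoids are normally used):

import numpy as np

def logistic(x):
    # Logistic function: output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: output in (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(logistic(0.0))  # 0.5
print(tanh(0.0))      # 0.0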
Effect of w and b on logistic function
When we use the logistic function with a weight w and bias b, i.e. f(x) = 1 / (1 + e^(-(wx + b))), it gives an S-shaped curve. The graph produced by the notebook linked below demonstrates how different weights (w) and biases (b) affect the shape of the sigmoid function. Each curve represents a sigmoid function with a specific combination of weight and bias:
Weight (w):
Determines the slope of the sigmoid curve. A larger weight makes the transition from 0 to 1 steeper, indicating that the function becomes more sensitive to changes in the input (x). This is evident as we compare the curves for (w=0.5), (w=1), and (w=2), where the steepness increases with the weight.
Bias (b):
Shifts the curve left or right along the (x)-axis. A positive bias shifts the curve to the left, meaning the sigmoid function reaches the transition region (where it goes from near 0 to near 1) at lower input values. Conversely, a negative bias shifts the curve to the right. This effect is shown by the different positions of the curves for biases (b=-2), (b=0), and (b=2).
By adjusting the weight and bias, one can control the sensitivity and the threshold at which the sigmoid function activates, making these parameters crucial for training neural networks to fit various data patterns.
Python Code:
The Python code used to produce the graph can be found here: ml-course/sigmoid.ipynb at main · lovelynrose/ml-course (github.com)
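A minimal sketch along the lines of that notebook (the exact contents of the linked file may differ) that reproduces the effect of w and b described above:

import numpy as np
import matplotlib.pyplot as plt

def logistic(x, w=1.0, b=0.0):
    # Parameterised logistic function: 1 / (1 + e^-(w*x + b))
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

x = np.linspace(-10, 10, 400)

# Vary the weight: a larger w gives a steeper transition from 0 to 1
for w in (0.5, 1, 2):
    plt.plot(x, logistic(x, w=w), label=f"w={w}, b=0")

# Vary the bias: positive b shifts the curve left, negative b shifts it right
for b in (-2, 2):
    plt.plot(x, logistic(x, b=b), linestyle="--", label=f"w=1, b={b}")

plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Effect of w and b on the logistic function")
plt.legend()
plt.show()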
First Order Derivative of the Logistic Function
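The derivative follows from the quotient (or chain) rule and, conveniently, can be written back in terms of f(x) itself:

f(x) = 1 / (1 + e^(-x))
f'(x) = e^(-x) / (1 + e^(-x))^2
= [1 / (1 + e^(-x))] × [e^(-x) / (1 + e^(-x))]
= f(x)(1 - f(x))

since e^(-x) / (1 + e^(-x)) = 1 - 1 / (1 + e^(-x)) = 1 - f(x).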
First Order Derivative of the tanh Function
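Similarly, the derivative of tanh can be written in terms of tanh itself (again by the quotient rule):

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
tanh'(x) = [(e^x + e^(-x))^2 - (e^x - e^(-x))^2] / (e^x + e^(-x))^2
= 1 - tanh(x)^2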
Effect on Gradient Descent
The gradient descent algorithm requires the first-order derivative of the activation function to perform the weight updates. We see that the first-order derivatives of the sigmoid functions can be written in terms of the function values themselves. This means that once you have computed the sigmoid function's value for any given input x, you can easily compute its derivative without going through the differentiation process again.
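A minimal illustration of this shortcut inside a gradient-descent loop, using a single logistic neuron with a squared-error loss (the loss, learning rate, and starting values are illustrative assumptions, not taken from this post):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_true = 0.5, 1.0      # one training example
w, b, lr = 0.1, 0.0, 0.5  # initial weight, bias and learning rate

for step in range(100):
    a = sigmoid(w * x + b)           # forward pass: the activation is cached
    d_loss_d_a = 2 * (a - y_true)    # derivative of the squared error
    d_a_d_z = a * (1 - a)            # sigmoid derivative reuses the cached value a
    w -= lr * d_loss_d_a * d_a_d_z * x   # weight update
    b -= lr * d_loss_d_a * d_a_d_z       # bias update

print(w, b, sigmoid(w * x + b))  # the output moves towards y_true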
Example
For x = 0.5, the values of the logistic (sigmoid) function and the hyperbolic tangent (tanh) function are:
Logistic (Sigmoid) value: 0.6224593312018546
Hyperbolic Tangent (tanh) value: 0.46211715726000974
For x=0.5, the first-order derivatives of the logistic (sigmoid) function and the hyperbolic tangent (tanh) function are:
Derivative of the Logistic (Sigmoid) function:
0.2350037122015945
We can derive the same answer by substituting in:
f'(x) = f(x)(1 - f(x))
= 0.6224593312018546 (1 - 0.6224593312018546)
= 0.2350037122015945
Derivative of the Hyperbolic Tangent (tanh) function: 0.7864477329659274
We get the same answer by substituting in the formula:
tanh'(x) = 1 - tanh(x)^2
= 1 - 0.46211715726000974^2
= 0.7864477329659274
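These numbers can be reproduced with a few lines of Python:

import math

x = 0.5
sig = 1 / (1 + math.exp(-x))
t = math.tanh(x)

print(sig)              # 0.6224593312018546
print(t)                # 0.46211715726000974
print(sig * (1 - sig))  # 0.2350037122015945
print(1 - t ** 2)       # 0.7864477329659274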