Softmax Function - First Derivative

Logits

Logits are the raw, unnormalized outputs produced by a machine learning model, before any normalization maps them into the form we ultimately want (such as probabilities).

For example, consider an ANN that classifies inputs into 3 classes. The output layer may have 3 output neurons with a linear activation function; their raw outputs are the logits. Note that this is a vector of 3 real values.
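As a small illustration (the weights, bias, and input below are made-up values, not from any trained model):

```python
import numpy as np

# Hypothetical last layer of a 3-class classifier with 4 input features.
W = np.array([[ 0.2, -0.5,  0.1,  0.7],
              [ 1.5,  1.3,  2.1, -0.9],
              [ 0.0,  0.2,  0.4,  0.3]])   # shape: (3 classes, 4 features)
b = np.array([0.1, -0.3, 0.2])             # one bias per output neuron
x = np.array([1.0, 0.5, -1.2, 0.8])        # a single input example

logits = W @ x + b    # linear activation: a vector of 3 raw, unnormalized scores
print(logits)
```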

Softmax function

This function is used when we want to interpret the output of a model as probabilities over the classes, which is especially useful in multi-class classification. It is a squashing function: it maps the logits to values in [0, 1] that sum to 1 across all classes, so each output is the probability of choosing that class given the logits of all the classes. The predicted class is the one with the highest probability.
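Concretely, for a vector of logits z = (z_1, ..., z_K), the softmax of class i is

```latex
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```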

It is commonly used as the last layer of an ANN for multi-class classification. The node with the highest probability represents the chosen class.
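A minimal NumPy sketch of the function follows. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick; it leaves the result unchanged because softmax is invariant to shifting all logits by a constant.

```python
import numpy as np

def softmax(z):
    """Softmax over a vector of logits z."""
    shifted = z - np.max(z)     # shift for numerical stability; exp() of large
                                # logits would otherwise overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)                    # approximately [0.659, 0.242, 0.099]
print(np.argmax(probs))         # 0 -- the chosen class
```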

Probability Distribution

A probability distribution satisfies two properties, namely,

1. Non-negative 

The use of the exponential in the softmax function ensures that negative values cannot occur; in fact, every output is a probability between 0 and 1.

2. Normalized

The sum of the probabilities of the individual classes must be 1. The denominator in the softmax function ensures this by summing over all the classes.

Together, these properties ensure that the softmax output is a probability distribution over the classes for a given input.
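Both properties are easy to check numerically with the softmax sketch above (the logits here are arbitrary values, chosen to include a negative one):

```python
import numpy as np

z = np.array([-3.0, 0.5, 4.2])       # arbitrary logits, including a negative value
p = softmax(z)                       # softmax as defined in the earlier sketch

assert np.all(p >= 0)                # 1. non-negative: exp() is always positive
assert np.isclose(np.sum(p), 1.0)    # 2. normalized: the denominator forces the sum to 1
print(p, p.sum())
```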

The following PDF shows the softmax function along with its first derivative.
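For reference, here is a brief sketch of that derivative. Writing \sigma_i = \sigma(z)_i and applying the quotient rule to the formula above gives two cases, which combine into a single expression using the Kronecker delta \delta_{ij}:

```latex
\frac{\partial \sigma_i}{\partial z_i} = \sigma_i \,(1 - \sigma_i),
\qquad
\frac{\partial \sigma_i}{\partial z_j} = -\,\sigma_i \,\sigma_j \quad (i \neq j),
\qquad
\text{i.e.}\quad
\frac{\partial \sigma_i}{\partial z_j} = \sigma_i \,(\delta_{ij} - \sigma_j).
```

The same Jacobian can be written in one line of NumPy (a sketch reusing the softmax function defined earlier):

```python
import numpy as np

def softmax_jacobian(z):
    """Jacobian of softmax: J[i, j] = s_i * (delta_ij - s_j)."""
    s = softmax(z)                        # softmax as defined in the earlier sketch
    return np.diag(s) - np.outer(s, s)    # diag(s) - s s^T

J = softmax_jacobian(np.array([2.0, 1.0, 0.1]))
print(J)   # each row sums to 0, since the probabilities always sum to 1
```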


