Softmax Function - First Derivative
Logits
Logits are the raw outputs produced by a machine learning model, before they are normalized into the expected form (for example, probabilities).
For example, consider an ANN for classifying inputs into 3 classes. The output layer may have 3 output neurons with a linear activation function. The output of this layer is the logits: a vector of 3 real values.
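As a minimal sketch of the example above (the weights here are hypothetical, randomly generated for illustration), a linear output layer with 3 neurons produces a logit vector of 3 unnormalized real values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # input features
W = rng.normal(size=(3, 4))     # weights of the 3 output neurons
b = np.zeros(3)                 # biases

# Linear activation: the raw output of this layer is the logits,
# a vector of 3 real values (not yet probabilities).
logits = W @ x + b
print(logits)
```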
Softmax function
This function is used when we want to interpret the output of a model as probabilities for the various classes, which is specifically useful in a multi-class classification problem. It is a squashing function: it maps the logits into the interval [0, 1] so that they sum to 1 across all classes. For each class, it computes the probability of choosing that class from its logit relative to the logits of all the classes. The final predicted class is the one with the highest probability.
This is commonly used in ANN as the last layer of a multi-class classification problem. The node with the highest probability represents the chosen class.
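The behaviour described above can be sketched in a few lines. The definition used here is the standard one, softmax(z)_i = exp(z_i) / Σ_j exp(z_j); subtracting max(z) before exponentiating is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Squash a logit vector into probabilities in [0, 1] summing to 1.

    Subtracting max(z) avoids overflow in exp() without changing
    the result, since the shift cancels in the ratio.
    """
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # example logits for 3 classes
probs = softmax(logits)
print(probs)               # each value lies in [0, 1]
print(probs.sum())         # sums to 1
print(np.argmax(probs))    # index of the predicted (highest-probability) class
```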
Probability Distribution
A probability distribution satisfies two properties, namely,
1. Non-negative
The use of the exponential in the softmax function ensures that negative values cannot occur: every term e^z is strictly positive. In fact, after normalization the values are probabilities between 0 and 1.
2. Normalized
The sum of the probabilities of the individual classes must be 1. The denominator in the softmax function ensures this by taking a summation over all the classes.
Together, these two properties ensure that the softmax output is a probability distribution over the classes for a given input.
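The two properties can be checked directly, even on logits containing large or negative values (the `softmax` helper below is a sketch of the standard numerically stable form, not from the original post):

```python
import numpy as np

def softmax(z):
    # Stable softmax: shift by max(z) before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Arbitrary logits, including a negative and a large value.
z = np.array([-3.0, 0.5, 10.0])
p = softmax(z)

assert np.all(p >= 0)             # 1. non-negative
assert np.isclose(p.sum(), 1.0)   # 2. normalized
```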
The following PDF shows the softmax function along with its first derivative.
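For reference, the standard result (with δ_ij the Kronecker delta) is that the first derivative of the softmax forms a Jacobian whose entries are:

```latex
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}},
\qquad
\frac{\partial s_i}{\partial z_j} = s_i\,(\delta_{ij} - s_j)
```

That is, the diagonal entries are s_i(1 - s_i) and the off-diagonal entries are -s_i s_j.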