Logistic Regression
Logistic regression is a statistical method for binary classification, with the two classes taken as Y = {0, 1}. For a dataset X, we are trying to estimate
P(Y=1|X)
Logistic Regression Workflow
Linear Combination (logit)
The logit is calculated first as shown in the blog on Softmax. This is a linear combination of the input features given by
logit = t = w0 + w1 x1 + w2 x2 + ... + wn xn
This value lies anywhere in (-∞, +∞). It also equals the log odds of P(Y=1|X).
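As a quick sketch, the logit is just a dot product plus an intercept. The weights and features below are made-up values for illustration, not taken from the post:

```python
import numpy as np

# Hypothetical model parameters and one input row (illustrative values)
w0 = -1.5                       # intercept
w = np.array([0.8, -0.4, 2.0])  # coefficients w1..w3
x = np.array([1.0, 3.0, 0.5])   # features x1..x3

# logit t = w0 + w1*x1 + w2*x2 + w3*x3, unbounded in (-inf, +inf)
t = w0 + np.dot(w, x)  # ≈ -0.9
```

A negative logit corresponds to odds below 1, i.e. P(Y=1|X) below 0.5.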
Odds
The odds of an event occurring is the ratio between the probability of the event occurring to the probability of the event not occurring.
Odds = P(Y=1|X) / (1 - P(Y=1|X))
Log Odds
This is the natural logarithm of the odds. In logistic regression, the log odds of P(Y=1|X) equals the logit t. The proof can be found in the pdf below.
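Numerically, the odds and log odds look like this; the probability 0.8 is an illustrative value, not from the post:

```python
import math

# Illustrative probability P(Y=1|X)
p = 0.8

# Odds: ratio of the event occurring to it not occurring
odds = p / (1 - p)         # ≈ 4.0: occurring is four times as likely as not

# Log odds: natural log of the odds; in logistic regression this is the logit t
log_odds = math.log(odds)  # ≈ 1.386
```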
Sigmoid Transformation
The logit t is passed through a sigmoid function, sigmoid(t) = 1 / (1 + e^(-t)), to obtain a value between 0 and 1, which is interpreted as the probability of class Y=1, i.e. P(Y=1|X). In other words, the logistic function converts the log odds into a probability.
The following pdf gives the mathematical formulation of the logistic regression.
Interpretation of the coefficients in terms of odds ratios (AI-generated portion)
The interpretation of the coefficients in terms of odds ratios is what makes logistic regression particularly insightful for understanding how various predictors influence the likelihood of different outcomes.
1. Positive Coefficient (w > 0):
When a coefficient w is positive, its exponentiated form e^w is greater than 1. This means that for every one-unit increase in the predictor variable, the odds of the outcome occurring (with the outcome represented as Y=1) are multiplied by e^w, indicating a positive association between the predictor and the outcome. The greater the value of e^w, the stronger the positive association.
2. Negative Coefficient (w < 0):
If a coefficient w is negative, e^w will be less than 1. This indicates that for every one-unit increase in the predictor variable, the odds of the outcome are multiplied by a value less than 1, effectively decreasing. Hence, there is a negative association between the predictor and the outcome; an increase in the predictor is associated with a decrease in the odds of the outcome.
3. Zero Coefficient (w = 0):
When a coefficient w is zero, e^w equals 1. In this case the predictor has no effect on the odds of the outcome; there is no association between the predictor variable and the likelihood of the outcome occurring.