Regression Terminology: Residual
Residuals:
In the context of linear regression, residuals are the differences between the observed values of the target variable and the values predicted by the model. They represent the error in the predictions.
Consider the dataset with one independent variable x and a dependent variable t = sin(2πx), given in lecture xxx.
x | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
t=sin(2πx) | 0 | 0.5878 | 0.9511 | 0.9511 | 0.5878 | 0 | -0.5878 | -0.9511 | -0.9511 | -0.5878 | 0 |
Here the actual target is t. Suppose the model's prediction for x = 0.2 is y = 0.8; then the residual is
(t-y)
resulting in
(0.9511 - 0.8) = 0.1511
In the same way, we can calculate the residual for every given x, as in the sketch below.
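As a minimal sketch (assuming NumPy is available), the snippet below computes the residual t - y for every x in the table above. The degree-3 polynomial fit is only a stand-in model for illustration; it is not the model used in the course notebook.

```python
import numpy as np

# Dataset from the table above: x from 0 to 1 in steps of 0.1, t = sin(2*pi*x)
x = np.linspace(0, 1, 11)
t = np.sin(2 * np.pi * x)

# Stand-in model: a degree-3 polynomial fit (purely illustrative;
# any model's predictions y would be treated the same way)
coeffs = np.polyfit(x, t, deg=3)
y = np.polyval(coeffs, x)

# Residual = observed target minus predicted value
residuals = t - y
for xi, ti, yi, ri in zip(x, t, y, residuals):
    print(f"x={xi:.1f}  t={ti:+.4f}  y={yi:+.4f}  residual={ri:+.4f}")
```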
Assumption of Normality of Residuals:
One of the key assumptions in linear regression is that these residuals are normally distributed. This assumption is particularly important for making inferences, like hypothesis testing and constructing confidence intervals. If the residuals are normally distributed, it implies that the model's errors are random and not biased.
For the polynomial curve fitting problem in lesson xx, we plot the distribution of the residuals for 100 points and see that it approximately follows a normal distribution.
When residuals are normally distributed, it often indicates that the model is appropriate for the data. If the residuals show a non-normal distribution, it might suggest issues with the model, such as omitted variables, incorrect functional form, or outliers.
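One simple way to check this assumption is to inspect a histogram of the residuals or run a formal test such as Shapiro-Wilk. The sketch below assumes synthetic data (100 noisy samples of sin(2πx)) and a degree-5 polynomial fit; these are illustrative choices, not the exact setup from the linked notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative data (assumed, not the course notebook):
# 100 noisy samples of sin(2*pi*x)
x = rng.uniform(0, 1, 100)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 100)

# Fit a polynomial model and compute its residuals
coeffs = np.polyfit(x, t, deg=5)
residuals = t - np.polyval(coeffs, x)

# Shapiro-Wilk test: a large p-value means we cannot reject
# the hypothesis that the residuals are normally distributed
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk statistic = {stat:.4f}, p-value = {p_value:.4f}")
```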
Notebook: ml-course/residual.ipynb at main · lovelynrose/ml-course (github.com)