Regression 12 - Bias-Variance Tradeoff

We create models by approximating the function that fits a training set. Common approximations include linear models (y = W.X) and polynomial models (given in the notes).

How do we validate our model approximation?

A good model should neither underfit (as overly simple models do) nor overfit (as overly complex models do).

Bias

Bias indicates whether a model has a tendency to underfit. To measure bias, we train models on different training datasets using k-fold cross-validation and measure how much the models' predicted values differ from the true value. If the models underfit, they will all be similar, and the mean of their predicted values will deviate substantially from the true value; that is, they have high bias. A model that fits well will have low bias.

Variance

On the other hand, variance measures whether a model has a tendency to overfit. It checks how much the models created in k-fold cross-validation differ from one another. If they overfit, their predictions will vary widely from fold to fold, i.e., the variance will be high. This typically happens with complex models.

Youtube Link

The video below covers the bias-variance tradeoff and how to ensure a good model:


Bias-Variance Decomposition in Regression Tasks

The mathematical equation that combines bias and variance into the total expected error for regression models is often represented as follows:

Expected Prediction Error = (Bias^2) + Variance + Irreducible Error
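Written out for squared-error loss, with f the true function, f̂ the model learned from a random training set, and σ² the noise variance (a standard form of the decomposition; the expectation is taken over training sets and noise):

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^{2}\right]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}}_{\text{Bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\right]}_{\text{Variance}}
  + \underbrace{\sigma^{2}}_{\text{Irreducible error}},
  \qquad y = f(x) + \varepsilon,\quad \mathbb{E}[\varepsilon] = 0,\quad \operatorname{Var}(\varepsilon) = \sigma^{2}
```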

The following PDF explains the bias-variance decomposition, that is, the derivation of the equation above. Check the last page for the properties and formulae used in the derivation.




Observations from the Equation

1. Increasing bias raises the total expected error by making the model too simple to capture the underlying patterns in the data (underfitting). Increasing variance raises the error by making the model overly sensitive to fluctuations in the training data (overfitting), so it fails to generalize to new data.

2. Reducing bias and variance together lowers the total expected error, but there is a trade-off: reducing bias too much may increase variance, and vice versa. The goal is to find a balance where the sum of squared bias, variance, and irreducible error is minimized, giving the best performance on unseen data.

High Variance:

A model with high variance pays too much attention to the training data, to the extent that it captures the noise in the dataset as well as the underlying pattern. This phenomenon leads to overfitting, where the model performs well on the training data but poorly on unseen data, due to its failure to generalize from the training data to broader patterns.

Measuring Variance

Use Resampling Techniques: Apply techniques like cross-validation or bootstrapping. By repeatedly training the model on different subsets of the data and making predictions on a held-out set, you can measure how much the predictions vary for the same observation across the different trained models.

Variance Estimation: For each point in the validation set, calculate the variance of the predictions from models trained on different subsets. The average of these variances across all points gives an estimate of the model's variance.
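A minimal sketch of this bootstrap procedure with scikit-learn (the synthetic sin(x) dataset, the choice of an unpruned decision tree as a deliberately high-variance model, and the 50 resampling rounds are all illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: y = sin(x) plus noise (illustrative).
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Train the same model class on many bootstrap resamples of the
# training set and collect its predictions on the held-out set.
n_rounds = 50
preds = np.empty((n_rounds, len(X_val)))
for i in range(n_rounds):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
    model = DecisionTreeRegressor()  # unpruned tree: high variance
    model.fit(X_train[idx], y_train[idx])
    preds[i] = model.predict(X_val)

# Variance of the predictions at each validation point, then averaged.
print(f"Estimated variance: {preds.var(axis=0).mean():.4f}")
```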

High Bias:

A model with high bias oversimplifies the problem, potentially ignoring relevant features or assuming an overly simplistic relationship between the features and the target outcome. This can result from the model's inability to capture the complexity of the underlying pattern in the data, leading to underfitting. Consequently, a high-bias model performs poorly not only on unseen data but often also on the training data, as it fails to capture the essential relationships within it.

Measuring Bias

Estimate with a Single Model: 

Train your model on the dataset and then make predictions on a validation set. The bias for a regression task can be approximated by calculating the mean difference (error) between the predicted values and the actual values in the validation set. For classification, you might look at the systematic tendency of the model to prefer certain classes over others.

Comparison with Known Benchmarks: 

If the true function or the expected outputs are known (as in synthetic datasets), you can directly compare the model's predictions against these true values.
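A sketch of bias estimation on such a synthetic benchmark, averaging a deliberately simple model's predictions over bootstrap resamples and comparing the average to the known true function (sin(x) as the true function and all other choices below are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
true_f = np.sin  # the known true function of the synthetic benchmark

X = rng.uniform(0, 6, size=(300, 1))
y = true_f(X[:, 0]) + rng.normal(0, 0.3, size=300)

# Average the predictions of a straight-line model (high bias for a
# sine curve) over many resampled training sets.
n_rounds = 50
X_grid = np.linspace(0, 6, 100).reshape(-1, 1)
preds = np.empty((n_rounds, len(X_grid)))
for i in range(n_rounds):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    preds[i] = LinearRegression().fit(X[idx], y[idx]).predict(X_grid)

# Squared bias: squared gap between the mean prediction and the truth.
bias_sq = ((preds.mean(axis=0) - true_f(X_grid[:, 0])) ** 2).mean()
print(f"Estimated squared bias: {bias_sq:.4f}")
```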

In theory, an ideal model should have low bias and low variance.

Reality:

However, in practice it is impossible to achieve very low bias and very low variance simultaneously. There is a trade-off between the two:

Reducing bias often leads to increasing variance (more complex models can fit the training data better but might overfit).

Reducing variance often leads to increasing bias (simpler models are less prone to overfitting but might underfit the true pattern).

Techniques to manage and improve the bias-variance trade-off in machine learning models:

 1. Cross-Validation

- Description: A technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice.

- Impact: Helps in determining the model that has the right balance between bias and variance, by evaluating its performance on unseen data.
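As a sketch, scikit-learn's cross_val_score runs this evaluation in one call (the ridge model and synthetic dataset are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# 5-fold CV: each fold serves once as the unseen evaluation set.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print(f"CV MSE: {-scores.mean():.2f} (+/- {scores.std():.2f})")
```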

 2. Regularization (L1, L2, Elastic Net)

- Description: Adds a penalty on the size of coefficients. L1 regularization (Lasso) can lead to sparse models with fewer parameters, L2 regularization (Ridge) reduces the magnitude of coefficients, and Elastic Net combines both.

- Impact: Helps to prevent overfitting by penalizing large weights, thus reducing variance without substantially increasing bias.
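A sketch comparing the three penalties on a sparse synthetic problem (the alpha=1.0 strength and dataset parameters are illustrative; in practice alpha is tuned, e.g. by cross-validation):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Many features, few of them informative: a setting where penalties help.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

for model in (Lasso(alpha=1.0), Ridge(alpha=1.0),
              ElasticNet(alpha=1.0, l1_ratio=0.5)):
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{type(model).__name__}: CV MSE = {mse:.2f}")
```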

 3. Pruning (for Decision Trees)

- Description: Reduces the size of decision trees by removing sections of the tree that provide little power in predicting target variables.

- Impact: Reduces the complexity of the final classifier, which lowers variance and helps to minimize overfitting.
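A sketch using scikit-learn's cost-complexity pruning, where larger ccp_alpha values prune more aggressively (the alpha grid below is an illustrative assumption; candidate values can be derived from cost_complexity_pruning_path):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)

# Larger ccp_alpha removes more subtrees, trading variance for bias.
for alpha in (0.0, 1.0, 10.0, 100.0):
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    mse = -cross_val_score(tree, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"ccp_alpha={alpha}: CV MSE = {mse:.2f}")
```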

 4. Feature Selection

- Description: Involves selecting the most useful features to train on among the existing features. Methods include filter, wrapper, and embedded approaches.

- Impact: Reduces the dimensionality of the data, which can decrease variance and improve model interpretability, albeit sometimes at the cost of a slight increase in bias.
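A sketch of a filter method, scoring each feature's relationship with the target and keeping the top k (the dataset and the values of k are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 40 features, only 5 informative: most features just add variance.
X, y = make_regression(n_samples=200, n_features=40, n_informative=5,
                       noise=10.0, random_state=0)

for k in (5, 10, 40):  # k=40 keeps everything, for comparison
    pipe = make_pipeline(SelectKBest(f_regression, k=k), LinearRegression())
    mse = -cross_val_score(pipe, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"k={k}: CV MSE = {mse:.2f}")
```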

 5. Ensembling Techniques

- Description: Methods like Bagging (Bootstrap Aggregating) and Boosting reduce variance and bias, respectively. Random Forest is an example of bagging, while Gradient Boosting Machines (GBM) are an example of boosting.

- Impact: By combining multiple models, ensembling techniques can often achieve better performance than any single model, balancing bias and variance.
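A sketch comparing a single tree against a bagging ensemble (random forest) and a boosting ensemble (the dataset and the default hyperparameters are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=0)

for model in (DecisionTreeRegressor(random_state=0),        # single model
              RandomForestRegressor(random_state=0),        # bagging
              GradientBoostingRegressor(random_state=0)):   # boosting
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{type(model).__name__}: CV MSE = {mse:.2f}")
```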

 6. Model Complexity

- Description: Adjusting the complexity of the model, such as the depth of a decision tree or the number of layers/neurons in a neural network.

- Impact: A simpler model (with fewer parameters) tends to have higher bias and lower variance, while a more complex model tends to have lower bias and higher variance. Tuning model complexity can help find a good trade-off.
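A sketch sweeping one such complexity knob, the degree of a polynomial regression fit to noisy sin(x) data (all values below are illustrative): low degrees underfit, very high degrees overfit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

# Degree controls complexity: 1 underfits a sine, 15 chases the noise.
for degree in (1, 3, 5, 15):
    pipe = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(pipe, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}: CV MSE = {mse:.3f}")
```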

 7. Training with More Data

- Description: Increasing the size of the training dataset can help reduce variance without increasing bias.

- Impact: With more data, the model can learn the signal better, reducing the chance that it will capture noise as if it were a true signal.
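This effect can be seen with a learning curve; a sketch using scikit-learn's learning_curve (the high-variance tree model and the dataset sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=15.0, random_state=0)

# Validation error of a high-variance model as the training set grows.
sizes, _, val_scores = learning_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_mean_squared_error")

for n, s in zip(sizes, val_scores.mean(axis=1)):
    print(f"n_train={n}: validation MSE = {-s:.2f}")
```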

 8. Bayesian Methods and Techniques

- Description: Incorporating prior knowledge through Bayesian approaches can help in managing the trade-off.

- Impact: By considering prior distributions and the likelihood, Bayesian methods can effectively balance bias and variance, especially in cases with limited data.
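A sketch contrasting ordinary least squares with Bayesian ridge regression in a small-data, many-features setting where the prior acts as a stabilizer (the dataset shape is an illustrative assumption):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import cross_val_score

# Only 40 samples for 30 features: plain least squares overfits here,
# while the Bayesian prior shrinks the coefficients.
X, y = make_regression(n_samples=40, n_features=30, noise=10.0, random_state=0)

for model in (LinearRegression(), BayesianRidge()):
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{type(model).__name__}: CV MSE = {mse:.2f}")
```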

Each of these techniques has its context where it is more effective, and often, the best approach involves combining several methods to achieve the optimal balance between bias and variance for a specific problem.

