What is overfitting? How can overfitting be detected? Explain how to solve the problem of overfitting.

Overfitting

  • Overfitting means the model has a high accuracy score on training data but a low score on test data. An overfit model has memorized the data set it was trained on and is unable to generalize what it learned to an unseen data set, which is why an overfit model yields very poor test accuracy. This typically happens when the model is highly complex, i.e., when the number of input features or feature combinations is large, giving the model excessive flexibility.
  • Overfitting happens when the algorithm used to build a prediction model is very complex and has over-learned the patterns in the training data. Overfitting is an error that arises from sensitivity to small fluctuations in the training set: it can cause an algorithm to model the random noise in the training data rather than the intended relationship. In classification, overfitting happens when an algorithm is strongly influenced by the specifics of the training data and learns patterns that are noisy, not generalizable, and limited to the training data set.

For example, as shown in the figure below, a model is trained to classify between circles and crosses. Unlike the underfitting case, this time the model learns the training data too well: it even classifies the noise in the data by forming an excessively complex decision boundary (right).
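To make the training-versus-test gap concrete, here is a minimal sketch of an overfit model. It assumes scikit-learn and a synthetic noisy dataset (neither specified in the text above): an unconstrained decision tree memorizes the training set almost perfectly but scores noticeably worse on held-out data.

```python
# Sketch of overfitting: an unconstrained decision tree on noisy data.
# Library choice (scikit-learn) and dataset parameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy, two-class dataset (circles vs. crosses, conceptually).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# No depth limit: the tree is free to carve out a region for every noisy point.
overfit_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("train accuracy:", overfit_model.score(X_train, y_train))  # close to 1.0
print("test accuracy: ", overfit_model.score(X_test, y_test))    # noticeably lower
```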


Detecting an overfit model:

The signals to look for when determining whether a model is overfitting are similar to those used to detect underfitting. They are listed below:

1. Training and Validation Loss: As already mentioned, it is important to measure the loss of the model during both training and validation. A very low training loss combined with a high validation loss signals that the model is overfitting.

2. Overly Complex Prediction Graph: If a graph is plotted showing the data points and the fitted curve, and the curve is far more convoluted than the simplest solution that fits the data points appropriately, then the model is overfitting. Likewise, if every single training example is classified correctly only by forming a very complex decision boundary, there is a good chance that the model is overfitting. A small sketch illustrating both checks follows below.
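As referenced above, one way to apply both checks is to track training and validation loss while model complexity grows. The sketch below is a hedged illustration: the decision-tree depth sweep and the use of log loss as the monitored quantity are assumptions chosen for demonstration, not prescribed by the text. The overfitting signature is a training loss that keeps falling while the validation loss starts to rise.

```python
# Sketch of detecting overfitting by comparing training and validation loss
# as model complexity increases. The depth sweep and loss choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:          # None = unlimited depth
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_loss = log_loss(y_train, model.predict_proba(X_train))
    val_loss = log_loss(y_val, model.predict_proba(X_val))
    # Overfitting signature: train loss keeps dropping while validation loss climbs.
    print(f"max_depth={depth}: train loss {train_loss:.3f}, validation loss {val_loss:.3f}")
```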

Fixing an overfit model:

If the model is overfitting, the developer can take the following steps to recover from the overfitting state:

1. Early Stopping during Training: Allowing the model to train for too many iterations can lead to overfitting, so it is necessary to stop training once the model has started to overfit. This is done by monitoring the validation loss and stopping training when that loss has not decreased for a given number of iterations (the patience). A minimal early-stopping sketch appears after this list.

2. Train with more data: Often, the available training data is small relative to the model's complexity. To get the model to fit appropriately, it is therefore often advisable to increase the size of the training dataset.

3. Train a less complex model: As mentioned earlier, the main cause of overfitting is excessive model complexity relative to the complexity of the dataset, so it is advisable to reduce the model's complexity in order to avoid overfitting.

4. Remove features: In contrast to the advice for underfitting, a model with too many features tends to overfit. Reducing the number of unnecessary or irrelevant features therefore often leads to a better, more generalized model. Deep learning models, which learn their own feature representations, are usually less affected by this. A feature-selection sketch appears after this list.

5. Regularization: Regularization artificially simplifies the model without giving up the flexibility it gains from higher complexity. As the regularization strength increases, the effective model complexity decreases, which helps prevent overfitting. A regularization sketch appears after this list.

6. Ensembling: Ensembling is a machine learning technique that combines the predictions of multiple separate models; by drawing on the strengths of several models, it reduces the errors that any single model would make. Among the many ensembling methods, two of the most commonly used are bagging and boosting. Boosting attempts to improve the predictive flexibility of simple models: it trains weak learners in sequence and combines them into one strong learner. Bagging attempts to reduce the chance of overfitting complex models: it trains many strong learners in parallel and then combines them to "smooth out" their predictions. Sketches of both appear after this list.
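For fix 1 (early stopping), here is a minimal sketch assuming an SGDClassifier trained incrementally with partial_fit and a simple patience counter; the model, patience value, and monitored loss are illustrative choices, not prescribed above.

```python
# Early-stopping sketch: stop training once validation loss stops improving
# for `patience` consecutive epochs. Model and hyperparameters are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)  # logistic loss, supports predict_proba
classes = np.unique(y_train)

best_val_loss, patience, bad_epochs = np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_train, y_train, classes=classes)   # one pass over the data
    val_loss = log_loss(y_val, model.predict_proba(X_val))
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:                              # validation loss has stalled
        print(f"early stop at epoch {epoch}, best validation loss {best_val_loss:.3f}")
        break
```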
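For fix 4 (remove features), a small sketch of dropping uninformative features with univariate selection; the choice of SelectKBest with an ANOVA F-test and k=10 is an assumption made for illustration.

```python
# Feature-removal sketch: keep only the k features most associated with the target.
# SelectKBest / f_classif and k=10 are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Many features, only a few of which are informative.
X, y = make_classification(n_samples=500, n_features=100, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Selection is fitted inside the pipeline, so it only ever sees training data.
model = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy with 10 selected features:", model.score(X_test, y_test))
```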
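For fix 5 (regularization), a sketch of L2 regularization using scikit-learn's LogisticRegression, where a smaller C means a stronger penalty; the particular C values compared are arbitrary assumptions.

```python
# Regularization sketch: the same model with weak vs. strong L2 penalties.
# In scikit-learn's LogisticRegression, C is the inverse regularization strength.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=50, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for C in [100.0, 1.0, 0.01]:                       # smaller C -> stronger regularization
    model = LogisticRegression(C=C, penalty="l2", max_iter=1000).fit(X_train, y_train)
    print(f"C={C}: train {model.score(X_train, y_train):.3f}, "
          f"test {model.score(X_test, y_test):.3f}")
```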
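Finally, for fix 6 (ensembling), a sketch contrasting bagging (many strong, unpruned trees trained in parallel and combined) with boosting (shallow weak learners trained in sequence); the scikit-learn estimators and settings shown are illustrative assumptions.

```python
# Ensembling sketch: bagging of deep (strong) trees vs. boosting of shallow (weak) trees.
# Estimator choices and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Bagging: strong learners (the default base estimator is an unpruned decision tree)
# trained in parallel on bootstrap samples, then combined to smooth out predictions.
bagging = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Boosting: weak learners (depth-1 stumps) trained in sequence, each correcting
# the errors of the ensemble built so far.
boosting = GradientBoostingClassifier(max_depth=1, n_estimators=200,
                                      random_state=0).fit(X_train, y_train)

print("bagging test accuracy: ", bagging.score(X_test, y_test))
print("boosting test accuracy:", boosting.score(X_test, y_test))
```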

