Welcome to Overfitting and Underfitting. In this video, you will learn about underfitting and overfitting models, using polynomial regression as an example, and how those concepts relate to the bias-variance trade-off. You’ll also learn how to select the best polynomial order and how to identify the problems that arise from selecting the wrong polynomial order.

Let’s start with a visual that demonstrates the ideas of overfitting, underfitting, and the ideal balance for a regression model. The first model is underfitting because it misses the clear curvature in the relationship between the predictor and the response. This model can neither fit the training data nor generalize to new data, so an underfit model will have poor performance even on the training data. Underfitting is often not discussed at length because it is easy to detect given a good performance metric; the remedy is to move on and try alternative machine learning algorithms.

The second model is overfitting. This overly complicated model can predict its own training data very well – but only because it is capturing much of the specific random behavior in this dataset. The model corresponds too closely to the training data and thereby fails to generalize to test data. It isn’t hard to imagine that a new dataset, generated by the same process but exhibiting different random behavior, would be poorly predicted even by this complicated model.

The last model appears to be the preferred one to generalize and make predictions from, because it captures the systematic trend in the predictor/response relationship. Across these three models you can see high bias resulting in an oversimplified model (that is, underfitting); high variance resulting in an overcomplicated model (that is, overfitting); and, lastly, the right balance between bias and variance. However, there is a dilemma: you want to avoid overfitting because it gives too much predictive power to specific quirks in your data.
And you want to avoid underfitting because you will ignore important general features in your data. How do you balance the two? This is known as the bias-variance trade-off. High bias corresponds to underfitting (your predictions are too vague to account for general patterns that do exist in the sample), and high variance corresponds to overfitting (your predictions are so specific that they reflect only your particular sample). In summary, the bias-variance trade-off is about balancing, and finding the sweet spot between, errors due to bias and errors due to variance.

Let's look at a plot of the mean squared error for the training and testing sets at different model complexities. The training error decreases steadily as the complexity of the model grows. The testing error also decreases at first, reaches its minimum at the best model complexity, and then begins to increase. The region on the left, where both training and testing errors are high, is the region of high bias. The region on the right, where testing error is high but training error is low, is the region of high variance. In other words, an underfit model will have high training and high testing error, while an overfit model will have extremely low training error but high testing error. In the end, you want to be in the sweet spot in the middle.

To demonstrate these concepts, let’s use the cars dataset in R to predict the distance (or dist) it takes for cars to stop given their speed. This example starts with the simplest polynomial model, one of degree 0, which predicts the distance using a single value: the average distance over the whole dataset. In R, you can use ggplot() to visualize this simple model with geom_hline(), setting the horizontal line’s y-intercept to the mean of the distances. On the right is the plot created from the code: the x-axis is the speed, the y-axis is the distance, and the simple model is the red line.
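A minimal sketch of this degree-0 model, using R's built-in cars dataset and assuming the ggplot2 package is installed:

```r
library(ggplot2)

# Degree-0 "model": predict every stopping distance with the overall mean,
# drawn as a horizontal red line over the observed points.
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_hline(yintercept = mean(cars$dist), color = "red")
```

Because the prediction ignores speed entirely, the red line is flat at mean(cars$dist), which is what makes the underfitting in the next paragraph so easy to see.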
From this graph, you can see that the model is underfitting: for all speeds less than 10, the predicted stopping distance is much higher than in reality, and for speeds greater than 20, the predicted stopping distance is much lower.

There are several ways to prevent underfitting in the context of linear regression. First, you can increase the model complexity. For example, instead of using a linear function – that is, a polynomial of degree 1 – you can use a polynomial of higher degree, or you can switch from a linear to a non-linear model. Another option is to add more features. Your model may be underfitting because the training data is too simple: it may lack the features that would let the model detect the relevant patterns and make accurate predictions. You can also try different models, such as regression trees or random forests.

Coming back to the cars dataset in R, you can now try a more complicated model: a polynomial of degree 8. With ggplot(), you can use the geom_smooth() function to fit and plot a model. The method “lm” specifies a linear model, and the formula uses poly() to introduce higher-degree polynomial terms, in this case up to degree 8. Looking at the graph created on the right, the model in red looks like it is overfitting: it is bending to fit the points in the top right. If this model received new speeds, it may not be able to predict accurate distances.

There are a few ways to prevent overfitting. One way is to reduce the model complexity; for example, if the model is a high-degree polynomial, you can decrease the degree. Another option is to collect more data, although this is often expensive and not always possible. A third option is to perform cross-validation, which is a powerful preventative measure against overfitting. Finally, regularization is another way to prevent overfitting; regularization refers to a broad range of techniques for artificially forcing your model to be simpler.
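The degree-8 model described above might be sketched as follows, again assuming ggplot2 is available:

```r
library(ggplot2)

# Degree-8 polynomial fit: flexible enough to chase random noise in the
# sample, which is what produces the overfit wiggles in the plot.
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 8),
              se = FALSE, color = "red")
```

Here poly(x, 8) expands the single predictor into orthogonal polynomial terms up to degree 8 before the linear model is fit.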
Going back to the example with the cars dataset, you can reduce the complexity of the model. In the previous overfitting example, a polynomial of degree 8 was used; instead, you can use a polynomial of degree 1 – a simple linear regression model. In R, you can set the formula to y ~ x. Now, from the plot on the right, you can see that the model generalizes better to the data. In this example, we demonstrated how you can address overfitting and underfitting by changing the model complexity.

In this video, you learned that the bias-variance trade-off is about finding the right balance between overfitting, which gives too much predictive power to specific quirks in your data, and underfitting, which ignores important general features in your data. You can prevent overfitting and underfitting by changing the model complexity and by employing other strategies.
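For completeness, a sketch of the reduced-complexity model with the formula set to y ~ x, assuming ggplot2 is installed:

```r
library(ggplot2)

# Degree-1 (simple linear) fit: a straight line that captures the
# overall trend without chasing individual points.
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "red")
```

Compared with the degree-8 fit, this line ignores the noise in the sample, which is why it generalizes more readily to new speeds.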