Very intuitive answer! Just to add to it, there are good theoretical reasons why this must be the case:
1. Definition of the model methodology
We are using OLS to fit the model. If we have a model with p variables and consider adding one more, then in the very worst case we can recover the original fit by setting the p original coefficients to their previous values and the coefficient on the new variable to 0. So at the very least the larger model fits the training data as well as the smaller one did; the sum of squared residuals can only stay the same or fall. (As the notes point out, adding variables this way does not make the model "better" overall; it only means the in-sample fit cannot get worse.) This can also be seen from the relevant formulae.
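A quick way to see this in practice: fit two nested OLS models on a built-in data set and compare their residual sums of squares. This is just an illustrative sketch using `mtcars` (the choice of `mpg ~ hp` and the extra variable `wt` is arbitrary); the inequality holds for any added regressor.

```r
# RSS cannot increase when a regressor is added: the smaller model is a
# special case of the larger one (extra coefficient set to 0).
rss <- function(fit) sum(residuals(fit)^2)

small <- lm(mpg ~ hp, data = mtcars)
large <- lm(mpg ~ hp + wt, data = mtcars)  # same model plus one extra variable

rss(small)
rss(large)  # always less than or equal to rss(small)
```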
2. Understanding the extremes
Overfitting is not caused only by variables with "real" explanatory power. If you start with a simple linear regression model with one variable and, say, 32 observations, then adding 31 white-noise variables will produce a perfect fit (all residuals equal to 0). With the intercept, the original slope and the noise coefficients, the model has at least as many free parameters as observations, so fitting it effectively boils down to solving 32 simultaneous equations exactly, and every fitted value matches its observation. (One of the 33 coefficients is then redundant, and R will report it as NA.)
You can see this yourself with some R code using one of the built-in data sets. The overfitted model has a sum of squared residuals of 0. You can vary the number of white-noise parameters to watch the residuals shrink as the total number of coefficients reaches the number of observations.
This model is (not surprisingly) useless for anything other than the data points it was fitted to; it would not be useful for prediction. Hence the problems with overfitting.
# check the data
head(mtcars)
# simple benchmark model
mpg_hp <- lm(mpg ~ hp, data = mtcars)
# benchmark: mean squared residual
mean(mpg_hp$residuals^2)
# ANOVA table for the benchmark model
anova(mpg_hp)
# overfitted model (change params to see the effect)
set.seed(1)  # for reproducibility
params <- 31
# generate white-noise regressors
x <- matrix(rnorm(params * length(mtcars$mpg)), ncol = params)
# fit the overfitted model
mpg_hp_overfit <- lm(mtcars$mpg ~ mtcars$hp + x)
# mean squared residual is now 0 (up to floating point)
mean(mpg_hp_overfit$residuals^2)
anova(mpg_hp_overfit)
summary(mpg_hp_overfit)
Last edited by a moderator: Jan 8, 2019