On page 17 of Chapter 21, I read the sentence: "In a normal linear regression model, as we include more variables, the proportion of the variance in the dependent variable that is explained cannot decrease." What do they mean by this statement?

I think it just means that as you include more variables in your regression model, each new variable will (naturally) pick up some of the marginal effects specific to it, helping to explain the response variable. So it is only by removing variables that the unexplained variance can potentially increase.
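To make this concrete, here is a small sketch in R (the predictor choices from the built-in mtcars data set are arbitrary, and the white-noise column is made up purely for illustration) showing that R-squared never decreases as regressors are added to a nested sequence of models, even when the added regressor is useless:

```r
# Fit a sequence of nested models on the built-in mtcars data.
set.seed(1)
fit1 <- lm(mpg ~ hp, data = mtcars)
fit2 <- lm(mpg ~ hp + wt, data = mtcars)         # add a real predictor
noise <- rnorm(nrow(mtcars))                     # pure white noise
fit3 <- lm(mpg ~ hp + wt + noise, data = mtcars) # add a useless predictor

r2 <- c(summary(fit1)$r.squared,
        summary(fit2)$r.squared,
        summary(fit3)$r.squared)
print(r2)                      # a non-decreasing sequence
stopifnot(all(diff(r2) >= 0))  # R-squared never went down
```

Each larger model can always reproduce the smaller one by setting the extra coefficient to 0, which is exactly why the sequence cannot decrease.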

Very intuitive answer! Just to add to it, there are good theoretical reasons why this must be the case:

1. Definition of the model methodology

We are using OLS to fit the model. If we have an initial model with p variables and consider adding one more, then in the very worst case we can recover the original model by setting the p original parameters equal to their previous values and the parameter for the new variable to 0. So the fit is at the very least as good as the previous one; the only option is for it to get better (in the sense used in the notes: adding more variables this way does not mean an overall "better" model). This can also be seen by considering the relevant formulae.

2. Understanding the extremes

It does not take variables with any "real" explanatory power to cause overfitting problems. If you start with a simple linear regression model with one variable and, say, 32 observations, then adding 30 white-noise variables gives a model with 32 parameters in total (the intercept, the slope for the original variable, and 30 noise slopes). Fitting it effectively boils down to solving a system of 32 simultaneous equations in 32 unknowns, so every fitted value matches its observation exactly and all residuals are 0. You can see this yourself with some R code using one of the built-in data sets. The overfitted model will have a sum of squared residuals of 0, and you can vary the number of noise parameters to watch the fit change as the total approaches 32. Not surprisingly, this model is useless for anything other than the given data points; it would be no good for prediction. Hence the problems with overfitting.

# check data
head(mtcars)

# quick linear model
mpg_hp <- lm(mpg ~ hp, data = mtcars)

# some benchmark measurement
mean((mpg_hp$residuals)^2)

# also anova
anova(mpg_hp)

# test overfitted model (can change parameter input)
params <- 30

# generate white noise
x <- matrix(rnorm(params * length(mtcars$mpg)), ncol = params)

# fit model
mpg_hp_overfit <- lm(mtcars$mpg ~ mtcars$hp + x)

# test benchmark and anova
mean((mpg_hp_overfit$residuals)^2)
anova(mpg_hp_overfit)
summary(mpg_hp_overfit)