This is actually a very interesting point. Maximising the variance here is exactly the same thing as minimising the projection error, and it boils down to Pythagoras's theorem.
Try to follow the below by drawing it out on a piece of paper:
Each point of data 'x' is some distance 'c' from the centre of all your data 'M'. If we project our point x onto a principal component, we will end up at a point 'p' on that principal component. This will be some distance 'a' from M, and it will be distance 'b' from the original point x. If you draw these out you'll see that M, x and p all form a right-angled triangle. Therefore c^2=a^2+b^2.
Distance 'c' is fixed by the data; it can't change if we change the principal component. But if we change the direction of our principal component then we change the position of p, so the values a and b will change. The value of a^2 (averaged over all the data points) is exactly the variance of the projected points, and b is the projection error, since it's the distance from x to p - i.e. how close the projected point is to the original data point. If we want to change our principal component direction to increase a, then we have to decrease b at the same time, because c^2=a^2+b^2 and the value of c cannot change. Hence maximising a is the same as minimising b, i.e. maximising the variance of the projected points is the same as minimising their projection error.
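If you want to see this numerically, here's a minimal sketch in Python/NumPy (the data set, direction vectors and the `decompose` helper are all made up for illustration): for any unit direction you project onto, the mean squared projection length a^2 plus the mean squared residual b^2 always equals the fixed total c^2, so pushing one up pushes the other down.

```python
import numpy as np

# Illustrative 2-D data set (not from the post).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
M = X.mean(axis=0)
Xc = X - M  # each row is the vector from the centre M to a point x

def decompose(direction):
    """Project onto a unit direction; return mean a^2, mean b^2, mean c^2."""
    u = direction / np.linalg.norm(direction)
    a = Xc @ u                                        # distance from M to p along u
    b = np.linalg.norm(Xc - np.outer(a, u), axis=1)   # distance from x to p
    c = np.linalg.norm(Xc, axis=1)                    # distance from M to x (fixed)
    return (a**2).mean(), (b**2).mean(), (c**2).mean()

for d in [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]:
    a2, b2, c2 = decompose(d)
    assert np.isclose(a2 + b2, c2)  # Pythagoras: c^2 = a^2 + b^2, per point and on average
    print(f"direction {d}: variance={a2:.3f}, error={b2:.3f}, total={c2:.3f}")
```

Whatever direction you try, the printed total stays the same; only the split between variance and error moves, which is exactly the trade-off described above.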
If you struggled to follow the above, it is because of the lack of pictures. The top answer on this Stack Exchange post is the best explanation of PCA I have ever read:
https://stats.stackexchange.com/que...l-component-analysis-eigenvectors-eigenvalues and I would thoroughly recommend it. It covers the above and more, including an intuitive understanding of what PCA is trying to achieve in general.