PCA

Discussion in 'CS1' started by Molly, Feb 17, 2023.

  1. Molly

    Molly Ton up Member

    Hi all,

    Just reading up on PCA. We choose the components to be uncorrelated linear combinations of the variables that maximise the variance. I'm not sure of the significance of maximising the variance here: how does that ensure that we haven't sacrificed too much information?

    Thanks
     
  2. CapitalActuary

    CapitalActuary Ton up Member

    This is actually a very interesting point. Maximising the variance here is exactly the same thing as minimising the projection error, and it boils down to Pythagoras's theorem.

    Try to follow the below by drawing it out on a piece of paper:

    Each data point 'x' is some distance 'c' from the centre (mean) of all your data, 'M'. If we project x onto a principal component (a line through M), we end up at a point 'p' on that principal component. This will be some distance 'a' from M and some distance 'b' from the original point x. If you draw these out you'll see that M, x and p form a right-angled triangle, with the right angle at p (because p is the perpendicular projection of x onto the line). Therefore c^2 = a^2 + b^2.
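
    A minimal Python sketch of the triangle above, assuming 2-D points and made-up values for M, x and the direction of the component (all hypothetical, purely for illustration). It projects x onto the line through M and prints c^2 next to a^2 + b^2, plus the dot product of (x - p) with the line's direction, which should be (numerically) zero if the angle at p is a right angle.

```python
import math

def project(x, m, u):
    """Project point x onto the line through m with unit direction u.
    Returns (p, a, b, c): the projected point and the three side lengths."""
    d = [xi - mi for xi, mi in zip(x, m)]       # vector from M to x
    t = sum(di * ui for di, ui in zip(d, u))     # signed length of M -> p along u
    p = [mi + t * ui for mi, ui in zip(m, u)]    # projected point on the line
    a = abs(t)                                   # distance M -> p
    b = math.dist(x, p)                          # distance x -> p (projection error)
    c = math.dist(x, m)                          # distance M -> x (fixed by the data)
    return p, a, b, c

# Hypothetical example values: centre M, data point x, component at 30 degrees
m = [1.0, 2.0]
x = [4.0, 3.0]
theta = math.radians(30)
u = [math.cos(theta), math.sin(theta)]           # unit direction of the component

p, a, b, c = project(x, m, u)
print("c^2 =", c ** 2, "  a^2 + b^2 =", a ** 2 + b ** 2)
# Should be (numerically) zero: x - p is perpendicular to the component
print("(x - p) . u =", sum((xk - pk) * uk for xk, pk, uk in zip(x, p, u)))
```

    Changing theta moves p and changes a and b individually, but c (and hence a^2 + b^2) stays the same.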

    Distance 'c' is fixed by the data: it can't change if we change the principal component. But if we change the direction of our principal component, p moves, so the values of a and b change. Here a^2 measures how far the projected point p is from the centre M; averaged over all the data points, a^2 is exactly the variance of the projected points. The value b is the projection error, since it's the distance from x to p, i.e. how close the projected point is to the original data point. For every point, a^2 + b^2 = c^2 with c fixed, so adding this up over all the data points, the total of the a^2 values plus the total of the b^2 values is the same whichever direction we choose. If we change our principal component direction to increase the total of the a^2 values, the total of the b^2 values must decrease by exactly the same amount. Hence maximising the variance of the projected points is the same as minimising their total squared projection error.
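
    A second sketch, again in Python with a small made-up 2-D data set (hypothetical, purely for illustration), checks the summed version of the argument. For each candidate direction through M it computes the mean of a^2 (the variance of the projected points, using divisor n), the mean of b^2 (the mean squared projection error) and the mean of c^2, which does not depend on the direction. A brute-force sweep over angles is used only to make the trade-off visible; in practice the first principal component comes from an eigendecomposition (or SVD) of the covariance matrix rather than a search.

```python
import math

# Hypothetical 2-D data, purely for illustration
data = [(1.0, 0.5), (2.0, 1.0), (3.0, 2.5), (4.0, 3.0), (5.0, 4.5), (6.0, 5.0)]
n = len(data)
mx = sum(x for x, _ in data) / n   # centre M of the data
my = sum(y for _, y in data) / n

def spread_and_error(theta):
    """Line through M at angle theta: return (mean a^2, mean b^2, mean c^2)."""
    ux, uy = math.cos(theta), math.sin(theta)
    sum_a2 = sum_b2 = sum_c2 = 0.0
    for x, y in data:
        dx, dy = x - mx, y - my            # vector from M to the point x
        t = dx * ux + dy * uy              # signed position of p along the line
        rx, ry = dx - t * ux, dy - t * uy  # vector from p back to x
        sum_a2 += t * t                    # a^2: squared distance M -> p
        sum_b2 += rx * rx + ry * ry        # b^2: squared projection error x -> p
        sum_c2 += dx * dx + dy * dy        # c^2: fixed by the data
    return sum_a2 / n, sum_b2 / n, sum_c2 / n

# Every direction in 1-degree steps (theta and theta + 180 degrees give the same line)
angles = [math.radians(k) for k in range(180)]
best_var = max(angles, key=lambda th: spread_and_error(th)[0])
best_err = min(angles, key=lambda th: spread_and_error(th)[1])

for th in (0.0, math.radians(90), best_var):
    var, err, total = spread_and_error(th)
    print(f"{math.degrees(th):6.1f} deg  variance={var:.4f}  "
          f"error={err:.4f}  sum={var + err:.4f}  mean c^2={total:.4f}")
print("max-variance direction:", round(math.degrees(best_var)), "deg")
print("min-error direction:   ", round(math.degrees(best_err)), "deg")
```

    The "sum" column should match "mean c^2" on every row, and the two reported directions should coincide. Using divisor n - 1 instead of n would rescale the variance but not change which direction wins.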

    If you struggled to follow the above, it is probably because of the lack of pictures. The top answer on this Cross Validated (stats.stackexchange) post is the best explanation of PCA I have ever read: https://stats.stackexchange.com/que...l-component-analysis-eigenvectors-eigenvalues and I would thoroughly recommend it. It covers the above and more, including an intuitive picture of what PCA is trying to achieve in general.
     
  3. Molly

    Molly Ton up Member

    Thank you so so much for this, that's very interesting and helpful!! Thank you! :)
     
  4. John Lee

    John Lee ActEd Tutor Staff Member

    This was very cool. Thanks for sharing.
     
