Chapter 16 - GLM

Discussion in 'SP8' started by Cheng, Jul 22, 2012.

  1. Cheng

    Cheng Member

    Hi again,

    I have some questions on this chapter and hopefully someone can help me with them. I'm pretty confused about the material.

    1) The PDF for the Tweedie distribution looks nasty. Are we expected to know how to show that this distribution is part of the exponential family, i.e. to find a(.), b(.), c(.) etc?

    2) The last paragraph of page 34 says 'if this interaction term is insignificant in the model then we would conclude that the effect of policyholder age is the same for every level of randomgroup and that policyholder age is consistent throughout the whole data'.

    Does that mean that 'age' is not a significant factor in the model? What does it mean for a factor to be consistent throughout the whole data?

    3) Page 35 says that the deviance residual measures the distance between the actual observation and the fitted value, while the raw residual shows the difference between the actual and GLM expected values.
    It sounds like deviance residuals and raw residuals are the same. What is the difference between the two?

    4) For residual plots, can I conclude that:
    i) if the residual plot is symmetrical about the x-axis but its spread is not constant across the range of fitted values (like the second and third graphs), then the distribution is appropriate (i.e. we have chosen the right member of the exponential family) but the parameters are inappropriate;
    ii) if the residual plot is not symmetrical about the x-axis, this indicates that I've chosen an inappropriate distribution from the exponential family?

    5) I don't really get the difference between complete and marginal interaction. Can you provide another example?

    6) Page 49: why is it the case that when one removes the factor with the most exposure, the standard errors associated with the other parameter estimates are minimised?

    7) I don't get the example on aliasing in section 5.4. Can you point me to another example which shows how aliasing works?

    Sorry for having so many questions in one go, but hopefully someone can help me with these.

    Thanks in advance!
     
  2. Pede

    Pede Member

  3. Cheng

    Cheng Member

    Thanks for the paper. It really helps make things clearer =)
     
  4. indexo

    indexo Member

    Hi,

    Can someone help to elaborate on:
    1. the example under 'near aliasing' on page 49: why would the model estimate a large negative parameter for unknown number of doors and a very large positive parameter for unknown colour?
    2. the example on aliasing in section 5.4 on page 50: I had difficulty understanding how it works, so any explanation would be very helpful.
    Thanks!
     
  5. Ian Senator

    Ian Senator ActEd Tutor Staff Member

    To be honest, I don't think you should spend too much time worrying about the statistical detail behind the Core Reading here. The material gives only brief coverage, to indicate the sort of issues that can occur, and to date the examiners have only asked for a summary. I recommend having a look at the September 2013 exam (ST8), where aliasing was examined explicitly, and you'll see what I mean. If you're really keen to know more, you could read McCullagh and Nelder's coverage in their book 'Generalized Linear Models', but this goes far beyond what you need for SP8.
     
  6. padasala

    padasala Ton up Member

    Basically, the GLM's job is to provide relativities for the factors and to explain the data based on the inputs that have been provided.

    In the near-aliasing example, if you fit the model without removing unknown door and unknown color, the software package will not remove those levels for you. This is unlike the extrinsic aliasing case, where the software automatically removes the extrinsically aliased levels.

    This results in the model assigning relativities to those levels. The problem in this particular case is that, say, 99.9% of the exposure is in unknown/unknown and 0.1% is in unknown/blue. The model will then assign a large positive value to unknown/unknown and a large negative value to unknown/blue to counter it (because this information is basically garbage to the GLM).

    So you get a model with insane parameter values, which makes the model very difficult to interpret.
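
    A minimal sketch of the effect described above, not taken from the Core Reading. It assumes made-up data (499 records with both factors known, 500 with both unknown, and a single record with doors unknown but colour known) and uses ordinary least squares, i.e. a normal GLM with identity link, as a stand-in for whatever GLM the software would fit. All variable names and claim amounts are hypothetical.

```python
import numpy as np

# Hypothetical record counts: the two "unknown" indicators differ on one record only.
n_known, n_both_unknown, n_odd = 499, 500, 1

doors_unknown = np.r_[np.zeros(n_known), np.ones(n_both_unknown), np.ones(n_odd)]
colour_unknown = np.r_[np.zeros(n_known), np.ones(n_both_unknown), np.zeros(n_odd)]

# Design matrix: intercept, doors-unknown indicator, colour-unknown indicator.
X = np.column_stack([np.ones_like(doors_unknown), doors_unknown, colour_unknown])

# Made-up responses: every record costs 100 except the single odd record, which costs 400.
y = np.r_[np.full(n_known, 100.0), np.full(n_both_unknown, 100.0), np.full(n_odd, 400.0)]

# Least-squares fit (normal errors, identity link).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# (X'X)^-1 is the parameter covariance matrix up to the scale factor sigma^2.
cov_unscaled = np.linalg.inv(X.T @ X)
se_unscaled = np.sqrt(np.diag(cov_unscaled))
corr = cov_unscaled[1, 2] / (se_unscaled[1] * se_unscaled[2])

print("intercept, doors-unknown, colour-unknown:", np.round(beta, 3))
print("unscaled standard errors:", np.round(se_unscaled, 3))
print("correlation between the two 'unknown' parameters:", round(float(corr), 4))
```

    With these numbers the fit comes out at roughly 100 for the intercept, +300 for unknown doors and -300 for unknown colour. The two large parameters cancel for the 500 unknown/unknown records and are driven entirely by the one odd record, which is why their estimates are almost perfectly negatively correlated and have much larger standard errors than the intercept. Which of the two ends up positive depends on the data.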
     
