
Chapter 16 - GLM

Cheng

Hi again,

I have some questions on this chapter and hope someone can help me with them. I'm pretty confused by the material.

1) The PDF for the Tweedie distribution looks nasty. Are we expected to know how to show that this distribution is a member of the exponential family, i.e. to find a(.), b(.), c(.) etc?

2) The last paragraph of page 34 says 'if this interaction term is insignificant in the model then we would conclude that the effect of policyholder age is the same for every level of randomgroup and that policyholder age is consistent throughout the whole data'

Does that mean that 'age' is not a significant factor in the model? What does it mean for a factor to be consistent throughout the whole data?

3) Page 35 says that the deviance residual measures the distance between the actual observation and the fitted value, while the raw residual shows the difference between the actual and GLM expected values.
It sounds like deviance residuals and raw residuals are the same thing. What is the difference between the two?

4) For residual plots, can I conclude that:
i) if the residual plot is symmetrical about the x-axis but its spread is not constant across the range of fitted values (like the second and third graphs), then the distribution is appropriate (ie we have chosen the right member of the exponential family) but the parameters are inappropriate
ii) if the residual plot is not symmetrical about the x-axis, this indicates that I've chosen an inappropriate distribution from the exponential family?

5) I don't really get the difference between complete and marginal interaction. Can you provide another example?

6) Page 49: why is it the case that when one removes the factor with the most exposure, the standard errors associated with the other parameter estimates are minimised?

7) I don't get the example on aliasing in section 5.4. Can you point me to another example which shows how aliasing works?

Sorry for having so many questions in one go, but hopefully someone can help me with these.

Thanks in advance!
 
Thanks for the paper. It really helps make things clearer =)
 
Hi,

Can someone help to elaborate on:
1. the example under 'near aliasing' on page 49: why would the model estimate a large negative parameter for unknown number of doors and a very large positive parameter for unknown colour?
2. the example on aliasing in section 5.4, page 50. I had difficulty understanding how it works, so any explanation would be very helpful.
Thanks!
 
To be honest, I don't think you should spend too much time worrying about the statistical detail behind the Core Reading here. The material only gives brief coverage to indicate the sort of issues that can occur, and to date the examiners have only asked for a summary. I recommend having a look at the Sept 2013 exam (ST8), where aliasing was examined explicitly, and you'll see what I mean. If you're really keen to know more, you could read McCullagh and Nelder's coverage in their book 'Generalized Linear Models', but this goes far beyond what you need for SP8.
 
Basically, the GLM's job is to assign relativities to the levels of each factor so as to explain the data, based on the inputs that have been provided.

In the GLM example, if you fit the model without removing unknown doors and unknown colour, the problem is that the software package will not remove those levels for you (unlike in the extrinsic aliasing case, where the software automatically removes the extrinsically aliased levels).

This results in the model assigning relativities to both levels. The problem is that in this particular case, say, 99.9% of the exposure is in unknown/unknown and 0.1% is in unknown/blue. This means that the model will assign a large positive value to unknown/unknown and a large negative value to unknown/blue to counter the large positive value (because this information is basically garbage as far as the GLM is concerned).

So you get a model with insane parameter values, which makes it very difficult to interpret.
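
If it helps, here is a rough numerical sketch of the near-aliasing point. It is not taken from the Core Reading: the cell counts are made up, the model has only an intercept plus two indicator variables (unknown doors, unknown colour), and all observations are given unit weight, so the standard errors are just the square roots of the diagonal of (X'X)^-1 up to a scale factor. All function and variable names are my own.

```python
import numpy as np

def design(cell_counts):
    """Build a design matrix with columns [intercept, doors_unknown, colour_unknown]
    from a dict {(doors_unknown, colour_unknown): number_of_rows}."""
    blocks = [np.tile([1.0, d, c], (n, 1)) for (d, c), n in cell_counts.items()]
    return np.vstack(blocks)

def se_and_corr(X):
    """Return unscaled standard errors and the correlation between the
    two indicator parameter estimates, assuming unit weights."""
    cov = np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    corr = cov[1, 2] / (se[1] * se[2])
    return se, corr

# Hypothetical near-aliased portfolio: colour is unknown only when doors is
# unknown, and just 2 rows have unknown doors with a known colour.
near = {(0, 0): 8000, (1, 0): 2, (1, 1): 1998}

# Hypothetical comparison portfolio where the two "unknown" levels overlap
# far less, so each parameter can be estimated separately.
balanced = {(0, 0): 7000, (1, 0): 1000, (0, 1): 1000, (1, 1): 1000}

se_near, corr_near = se_and_corr(design(near))
se_bal, corr_bal = se_and_corr(design(balanced))

print("near-aliased SEs (doors, colour):", se_near[1:], "corr:", corr_near)
print("balanced SEs     (doors, colour):", se_bal[1:], "corr:", corr_bal)
```

In the near-aliased case the two standard errors come out many times larger than in the comparison case, and the two estimates are almost perfectly negatively correlated. In other words, the data cannot separate the two effects: only their sum is well determined, so the fit can land on a large value of one sign for one level and a large value of the opposite sign for the other.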
 