Generalised Linear Models revision booklet - practice question

Discussion in 'CT6' started by MindFull, Aug 31, 2010.

  1. MindFull

    MindFull Ton up Member

    Hi All,

    Just had a query re: the covariates and factors. The practice question has covariates of the size of a house (4 categories), the area of the house (urban or rural) and the number of claims made. When writing out the linear predictor, only beta is used for the factor area (instead of beta-i). I was just wondering why that was. I thought that area had two parameters, urban and rural, so shouldn't it have a subscript... any help?

    Thanks.
     
  2. John Lee

    John Lee ActEd Tutor Staff Member

    I agree it's a little confusing - you could put a j subscript and that would be marked correct also.

    When you add a new covariate you always lose one parameter as we are unable to estimate it.

    For example suppose we had size and area and the constants for each of the categories were:

    Area Rural: 80
    Area Urban: 120

    Size 1: 50
    Size 2: 100
    Size 3: 200
    Size 4: 500

    Then we would get the following totals for the following data:

    R1: 130 U1: 170
    R2: 180 U2: 220
    R3: 280 U3: 320
    R4: 580 U4: 620

    Now if you gave this data to someone and asked them to estimate the constants they wouldn't be able to individually identify all of them.

    R1-R4 would tell you that the sizes go up by 50, 100, 300
    As would U1-U4

    Comparing R1 and U1 would tell you that Urban is 40 more than Rural.
    As would R2 and U2, etc.

    But we wouldn't be able to work out exactly how R and 1 were split to make the 130 and so on (the equations are linearly dependent).
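To see the linear dependence concretely, here is a minimal Python sketch using the constants from the example above. The helper `totals`, the alternative dictionaries `area_alt` and `size_alt`, and the shift of 10 are made up purely for illustration: moving any fixed amount from the size constants to the area constants leaves every total unchanged, so the data alone cannot pin down the split.

```python
from itertools import product

# Constants from the example above.
area = {"R": 80, "U": 120}
size = {1: 50, 2: 100, 3: 200, 4: 500}

def totals(area_consts, size_consts):
    """Return the total for every area/size cell, e.g. 'R1' -> 130."""
    return {f"{a}{s}": area_consts[a] + size_consts[s]
            for a, s in product(area_consts, size_consts)}

# Hypothetical alternative: move an arbitrary 10 from every size
# constant onto every area constant.
shift = 10
area_alt = {a: v + shift for a, v in area.items()}
size_alt = {s: v - shift for s, v in size.items()}

observed = totals(area, size)
print(observed)
# The two different sets of constants produce identical totals.
print(totals(area_alt, size_alt) == observed)
```

Any other value of `shift` works just as well, which is the sense in which one of the constants cannot be estimated.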

    So what we do is set one of the parameters equal to zero (effectively we absorb it into the other constants). So suppose we set Rural to zero. Then we would get:

    Area Rural: 0
    Area Urban: 40

    Size 1: 130
    Size 2: 180
    Size 3: 280
    Size 4: 580

    This would give exactly the same answers as before:

    R1: 130 U1: 170
    R2: 180 U2: 220
    R3: 280 U3: 320
    R4: 580 U4: 620

    Hence, we only have one parameter for area, which is called beta and is included if you are in an urban area but not if you are in a rural area. The subscript would help us identify that but would perhaps mislead us into thinking there are 2 parameters there. Hence it was omitted. But once again, please be assured you would get the marks if you did put it on.
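As a rough sketch of how the single beta enters the linear predictor, the snippet below uses the constants obtained after setting Rural to zero. The names `alpha`, `beta` and `linear_predictor` are illustrative only, and this computes the linear predictor alone: whatever link function the actual practice question uses is not applied here.

```python
# Constants after setting Rural to zero (taken from the post above).
alpha = {1: 130, 2: 180, 3: 280, 4: 580}  # one constant per size category
beta = 40                                 # single area parameter (Urban relative to Rural)

def linear_predictor(size_cat, urban):
    """Return alpha_i, plus beta only when the house is in an urban area."""
    return alpha[size_cat] + (beta if urban else 0)

print(linear_predictor(3, urban=False))  # size 3, rural: alpha_3 only
print(linear_predictor(3, urban=True))   # size 3, urban: alpha_3 + beta
```

For a rural house the beta term simply drops out, so only the size constant is used.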

    Clear as mud?
     
  3. MindFull

    MindFull Ton up Member

    I think I've gotten it a little better now. So for instance, if the question asks us to find the expected value of the number of claims from a policyholder who has a house of size 3 and who doesn't live in an urban area, then we'd just use the estimate that we got for "alpha - 3", and forget about the beta?

    One more thing. Why is it that when you add a new covariate you always lose one parameter as we are unable to estimate it?

    Thanks so much.
     
    Last edited: Sep 1, 2010
  4. John Lee

    John Lee ActEd Tutor Staff Member

    Yes, as long as you are using the *new* alpha 3 result (and not the old one from before you added the rural/urban covariate).

    Because the equations are linearly dependent - and so there will never be enough information to separate out two of the constants.
     
    Last edited: Apr 5, 2011
  5. RyuVI

    RyuVI Member

    Please excuse me for butting in Jem/Mr Lee, I haven't actually read the thread but I just wanted to add that I was stuck on this question myself a few years back and Julie's response was a great help at the time:

    http://www.acted.co.uk/forums/showthread.php?t=802

    (Have forgotten it all now though of course!)
     
  6. MindFull

    MindFull Ton up Member

    Thanks a lot everyone. :)
     
  7. Simon C

    Simon C Member

    Further queries on linear predictors

    Hi

    John and Julie's explanations below about how to specify models for the linear predictor in parameterised form are very helpful but I'm still struggling with a few aspects of this topic! I'd be very grateful for help with these queries:

    1) The Core Reading on Ch10 p21 of the current ActEd notes states that the parameterised form of the linear predictor for age * sex would be ai + Bix. Why would it not be ai + Bx + Bix?

    2) Looking at the solution to question 10.12, where the same variable appears twice in the linear predictor, how many non-zero parameters would this model have? I understand how we'd lose a parameter each time we add a factor to a model, but am not sure what happens when we add either the same variable more than once (as in 10.12) or a further new variable.

    3) The model for two factors, sex * vehicle group, is given as having a parameterised form ai + Bj + gij. However in practice wouldn't we only be able to estimate a model of the form aij which can take four non-zero parameter values? This seems to be the principle that Q&A 3.17 follows.

    Thanks
    Simon
     
  8. John Lee

    John Lee ActEd Tutor Staff Member

    1) It could be (although the Bi is not the same in each version), but the Core Reading one is simpler:

    ai + Bx + Bix = ai + (B + Bi)x = ai + B'ix

    2) I don't blame you - it is a bit confusing! I make it 42 parameters.

    Bunching them together:

    (ai + cj) + (bi + dj)x

    In each of the brackets we have 2 + (20-1) = 21 parameters.

    Or from the original formula:

    age * (sex + vehicle) = 2 × (2 + 19) = 42
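The count can be checked with a short Python sketch. It assumes 2 levels for sex and 20 vehicle groups, as implied by the arithmetic above; the helper `additive_params` is a made-up name, not anything from the course notes.

```python
def additive_params(levels):
    """Parameters for a sum of factors: every level of the first factor,
    then one fewer than the number of levels for each later factor."""
    first, *rest = levels
    return first + sum(n - 1 for n in rest)

# Level counts implied by the arithmetic above (assumed, not from the question itself).
sex_levels, vehicle_levels = 2, 20

per_bracket = additive_params([sex_levels, vehicle_levels])  # 2 + (20 - 1)
total = 2 * per_bracket  # one bracket for the intercept, one for the age slope
print(per_bracket, total)
```

Multiplying by age doubles the count because each bracket appears once as an intercept and once as a coefficient of x.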

    3) Yes, essentially we need a parameter for every combination of gender and vehicle group. Personally I like aij, which gives the final value for each combination. However, some people prefer to write it as ai + Bj + gij, where gij is the interaction effect only.
     
  9. Simon C

    Simon C Member

    Thanks very much John. I think I'm getting there though am still not finding this topic very intuitive!

    Re my first question below, in practice would we actually be able to estimate separately the Bx and Bix effects? I'm assuming we'd only be able to estimate the overall B'ix (which would make it more logical why the Core Reading gives the model ai + B'ix).

    To see if I have got the hang of this, if we changed the model in my second question below to age * sex * vehicle type, would we have a model of the form aij + bijx where each of aij and bij had 40 different possible values reflecting the 2 * 20 combinations of sex and vehicle type i.e. 80 non-zero parameters in total?

    Thanks again for your help.
     
  10. John Lee

    John Lee ActEd Tutor Staff Member

    No we can't individually estimate them.

    As an example here are 5 results:

    Male age 20 = 2090
    Male age 30 = 3110
    Female age 20 = 1065
    Female age 30 = 1585
    Female age 50 = 2625

    Can you individually estimate the 5 parameters in ai + (b + bi)x?
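One way to work through the five results is sketched below in Python. The helper `fit_line` and the trial values of b are illustrative assumptions. The data fix an intercept and a total slope for each sex, but any choice of the common slope b can be offset by the sex-specific bi, so b and bi cannot be separated.

```python
# The five results from the post above, as (age, value) pairs.
results = {
    "Male": [(20, 2090), (30, 3110)],
    "Female": [(20, 1065), (30, 1585), (50, 2625)],
}

def fit_line(points):
    """Intercept and slope through the first two points, checked against the rest."""
    (x1, y1), (x2, y2) = points[:2]
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    # Every remaining point must lie on the same straight line.
    assert all(abs(intercept + slope * x - y) < 1e-9 for x, y in points)
    return intercept, slope

# Identifiable quantities: ai and the total slope (b + bi) for each sex.
fitted = {sex: fit_line(pts) for sex, pts in results.items()}
print(fitted)

# Arbitrary trial values of the common slope b: each gives valid bi values
# with the same total slope, so the data cannot choose between them.
for b in (0, 52, 100):
    b_i = {sex: slope - b for sex, (_, slope) in fitted.items()}
    print(b, b_i)
```

The fit gives an intercept of 50 and a total slope of 102 for males, and 25 and 52 for females: four identifiable numbers, not five.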

    And spot on with the 80 parameters!
     
