Help! GLM - Linear predictors

Discussion in 'CT6' started by RyuVI, Apr 9, 2007.

  1. RyuVI

    I'd be really grateful if one of the CT6 tutors (John Lee, Julie Lewis) or a student could help explain how linear predictors are determined, especially how the operators * (star), + (plus) and . (dot) work.

    I'll list the practice question from the revision booklet, if that's OK, as I can't even do this (simple?) example, let alone attempt any past paper questions:

    An insurer is trying to model the number of claims on household insurance policies. The model involves the covariates size of house (4 categories), area (urban or rural), and the number of claims in the last 5 years.

    (i) Write down the number of parameters and the linear predictor for the following models:

    a) size of house
    b) size of house + area
    c) size of house + area + no. of claims
    d) size of house * area + no. of claims
    e) size of house * area * no. of claims


    Many thanks to anyone who can explain how it works.
     
  2. Julie Lewis

    Size of house is a factor with 4 categories, so the model that just uses the covariate "size of house" is of the form a_i, i=1,2,3,4. It has 4 parameters.

    Now let's consider adding in area (that's what the + means). Area is a factor with 2 categories. Whenever we ADD AN EXTRA factor with n categories, we ADD an extra n-1 non-zero parameters to our model. The model is of the form:

    a_i + b_j

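    where j = 1, 2. But only one of the b's is non-zero (the other is set to 0, so that area acts as the baseline). So we now have 4 + 1 = 5 parameters. In fact, we could just write this model as a_i + b, with b included only for the non-baseline area, if we wanted.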
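    A minimal Python sketch of the "n-1 extra parameters" rule for model b). It assumes "urban" is the baseline area, so b_urban is fixed at 0 and only b_rural is free; the parameter values are hypothetical, chosen only for illustration.

```python
# Model b): size of house + area, linear predictor a_i + b_j.
# Assumption: "urban" is the baseline area, so b_urban is fixed at 0
# and b_rural is the only free area parameter. Values are hypothetical.
a = {1: 0.10, 2: 0.25, 3: 0.40, 4: 0.55}   # one a_i per size category
b = {"urban": 0.0, "rural": -0.20}          # b_urban = 0 (baseline)

def eta_b(size, area):
    """Linear predictor a_i + b_j for a policy with the given size and area."""
    return a[size] + b[area]

# 4 a's, plus n - 1 = 2 - 1 = 1 free b.
n_params = len(a) + (len(b) - 1)
print(n_params)           # 5
print(eta_b(3, "rural"))  # a_3 + b_rural
```

    Fixing one b at zero is what stops a_i + b_j from having a redundant parameter: without it, adding a constant to every a and subtracting it from both b's would give the same predictor.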

    No. of claims is a variable (i.e. it takes a numerical value). So when we add this into our model we get:

    a_i + b_j + cx

    where x = no of claims. We've added in another parameter c, and we now have 6 altogether.
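    The same sketch extended to model c). The point it shows is that x enters through a single slope c, however many different values the number of claims can take. All values are hypothetical, and "urban" is again assumed to be the baseline area.

```python
# Model c): size of house + area + no. of claims, linear predictor a_i + b_j + c*x.
# Hypothetical values; "urban" is again the baseline area (b_urban = 0).
a = {1: 0.10, 2: 0.25, 3: 0.40, 4: 0.55}   # a_i, one per size category
b = {"urban": 0.0, "rural": -0.20}          # b_j, only b_rural is free
c = 0.30                                     # single slope on x

def eta_c(size, area, x):
    """x = no. of claims in the last 5 years, used as a number, not a category."""
    return a[size] + b[area] + c * x

n_params = len(a) + (len(b) - 1) + 1   # 4 + 1 + 1
print(n_params)  # 6
```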

    The * notation is just shorthand. Size of house * area means the same as

    size of house + area + size.area

    where the dot denotes the interaction between the two covariates. Since size corresponds to a subscript of i and area corresponds to a subscript of j, the interaction term is going to be of the form d_ij. So we could write the model size * area + no. of claims as:

    a_i + b_j + cx + d_ij
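    Expanding * into + and . terms is mechanical: every non-empty combination of the starred covariates appears exactly once. A short Python sketch (the function name expand_star and the short labels size, area, claims are hypothetical):

```python
from itertools import combinations

def expand_star(*covariates):
    """Write A * B * ... out as main effects plus dot interactions."""
    terms = []
    for k in range(1, len(covariates) + 1):   # k = order of the term
        for combo in combinations(covariates, k):
            terms.append(".".join(combo))
    return " + ".join(terms)

print(expand_star("size", "area"))
# size + area + size.area
print(expand_star("size", "area", "claims"))
# size + area + claims + size.area + size.claims + area.claims + size.area.claims
```

    The three-way expansion lists the seven terms that model e) contains.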

    However, this is usually written in the simpler form of

    e_ij + cx

    The e_ij is a term that takes account of size and area. There are 4 times 2 = 8 different e's and 1 c, so the total number of parameters is now 9.
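    To see why e_ij + cx has the same number of parameters as a_i + b_j + cx + d_ij, here is a Python sketch that collapses a_i + b_j + d_ij into one e_ij per (size, area) cell. It assumes the same baseline convention as before: b_urban = 0, and d_ij = 0 whenever i is size 1 or j is urban. All values are hypothetical.

```python
# Assumption: baseline constraints b_urban = 0 and d_ij = 0 when i = 1 or j = "urban".
SIZES = [1, 2, 3, 4]
AREAS = ["urban", "rural"]

a = {1: 0.10, 2: 0.25, 3: 0.40, 4: 0.55}        # hypothetical a_i
b = {"urban": 0.0, "rural": -0.20}               # hypothetical b_j
d = {(i, j): 0.0 for i in SIZES for j in AREAS}  # interaction d_ij, mostly 0
d.update({(2, "rural"): 0.05, (3, "rural"): -0.10, (4, "rural"): 0.15})

# One combined term e_ij = a_i + b_j + d_ij for each of the 4 x 2 cells.
e = {(i, j): a[i] + b[j] + d[(i, j)] for i in SIZES for j in AREAS}

# Free parameters in the expanded form: 4 a's + (2-1) b's + (4-1)(2-1) d's.
free_expanded = len(SIZES) + (len(AREAS) - 1) + (len(SIZES) - 1) * (len(AREAS) - 1)
print(free_expanded, len(e))  # 8 8
print(free_expanded + 1)      # 9, once c is added
```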

    Actually, the * notation makes things easy here: you can treat it like a multiplication sign to get the number of parameters (4 times 2 = 8 for size * area).
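    The multiplication shortcut can be checked with a small Python parameter counter. It assumes the model has a constant term and that baseline-level parameters are set to zero (the convention behind "n-1 extra parameters"), so each term adds (n - 1) per factor and 1 per variable. The table LEVELS and the function names are hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical covariate table: number of categories for each factor,
# None for a numerical variable such as no. of claims.
LEVELS = {"size": 4, "area": 2, "claims": None}

def expand(model):
    """Expand starred groups into main effects and dot interactions.

    model is a list of groups joined by '+'; each group is starred together,
    e.g. [["size", "area"], ["claims"]] means size * area + claims.
    """
    terms = set()
    for group in model:
        for k in range(1, len(group) + 1):
            terms.update(frozenset(c) for c in combinations(group, k))
    return terms

def term_params(term):
    """Extra parameters for one term: (n - 1) per factor, 1 per variable."""
    return prod(1 if LEVELS[v] is None else LEVELS[v] - 1 for v in term)

def n_params(model):
    """Total parameters: 1 for the constant plus the extras from every term."""
    return 1 + sum(term_params(t) for t in expand(model))

MODELS = {
    "a": [["size"]],
    "b": [["size"], ["area"]],
    "c": [["size"], ["area"], ["claims"]],
    "d": [["size", "area"], ["claims"]],
    "e": [["size", "area", "claims"]],
}
for name, model in MODELS.items():
    print(name, n_params(model))
# a 4
# b 5
# c 6
# d 9
# e 16
```

    The counts agree with the shortcut: size * area gives 1 + 3 + 1 + 3 = 8 = 4 times 2, and a variable counts as 2 when starred (constant plus slope, as in a + bx), so size * area * claims gives 4 times 2 times 2 = 16.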

    Finally, size * area * no. of claims ...

    Remember that no. of claims is a variable. If this were the only term included in the model, it would be of the form

    a+bx

    which has 2 parameters.

    Starring it with size and area gives a model of the form:

    a_ij + b_ij x

    There are 8 a's and 8 b's. So we have 16 parameters altogether.
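    A Python sketch of the linear predictor for model e), where each of the 4 x 2 (size, area) cells gets its own intercept a_ij and its own slope b_ij on the number of claims. The formulas used to fill in the parameter values are hypothetical placeholders; in practice these 16 values would be estimated from data.

```python
SIZES = [1, 2, 3, 4]
AREAS = ["urban", "rural"]

# One intercept a_ij and one slope b_ij for every (size, area) cell.
# Hypothetical values only.
a = {(i, j): 0.1 * i + (0.0 if j == "urban" else -0.2) for i in SIZES for j in AREAS}
b = {(i, j): 0.05 * i for i in SIZES for j in AREAS}

def eta_e(size, area, x):
    """Linear predictor a_ij + b_ij * x, with x = no. of claims."""
    return a[(size, area)] + b[(size, area)] * x

print(len(a) + len(b))  # 16
```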
     
  3. RyuVI

    Hi Julie, thanks a lot for your response... I've surprised myself, as I think I actually get it now!

    I didn't know the bit about adding a factor with n categories and it was really throwing me off... thanks :D
     
  4. hi5

    Thumbs up.
    Thanks
     
