Help! GLM - Linear predictors

RyuVI · Apr 9, 2007

I'd be really grateful if one of the CT6 tutors (John Lee, Julie Lewis) or a student could help explain to me how linear predictors are determined, especially how the operators * (star), + (plus) and . (dot) work.

I'll list the practice question from the revision booklet, if that's OK, as I can't even do this (simple?) example, let alone attempt any past paper questions:

An insurer is trying to model the no. of claims on household insurance policies. The model involves the covariates size of house (4 categories), area (urban or rural) and the no. of claims in the last 5 years.

(i) Write down the no. of parameters and the linear predictor for the following models:

a) size of house
b) size of house + area
c) size of house + area + no. of claims
d) size of house * area + no. of claims
e) size of house * area * no. of claims

Many thanks to anyone who can explain how it works.

Julie Lewis · Apr 18, 2007

Size of house is a factor with 4 categories, so the model that just uses the covariate "size of house" is of the form a_i, i=1,2,3,4. It has 4 parameters.

Now let's consider adding in area (that's what the + means). Area is a factor with 2 categories. Whenever we ADD AN EXTRA factor with n categories, we ADD an extra n-1 non-zero parameters to our model. The model is of the form:

a_i + b_j

where j = 1, 2. But only one of the b's is non-zero (the other is set to 0), so we now have 5 parameters. In fact, we could just write this model as a_i + b if we wanted, with b applying only to the area whose b_j is non-zero.
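To see why only one of the b's can be non-zero, here is a minimal sketch in Python, assuming NumPy is available (the names are hypothetical, chosen for illustration). It builds the design matrix for size + area with a 0/1 column for every a_i and every b_j, one row per combination of size and area. The four size columns add up to a column of 1s, and so do the two area columns, so the six columns are linearly dependent and the matrix only has rank 5. Dropping one area column (i.e. fixing that b at 0) leaves 5 columns and loses nothing.

```python
import itertools

import numpy as np

N_SIZES = 4  # size of house: 4 categories
N_AREAS = 2  # area: urban or rural


def one_hot(index: int, n: int) -> list[float]:
    """Return a 0/1 indicator row of length n with a 1 in position `index`."""
    row = [0.0] * n
    row[index] = 1.0
    return row


# One row per (size, area) combination, with a column for every a_i and every b_j.
rows = [
    one_hot(i, N_SIZES) + one_hot(j, N_AREAS)
    for i, j in itertools.product(range(N_SIZES), range(N_AREAS))
]
X_full = np.array(rows)

# Dropping the first area column is the same as fixing b_1 = 0.
X_constrained = X_full[:, [0, 1, 2, 3, 5]]

print(X_full.shape[1], np.linalg.matrix_rank(X_full))                # 6 columns, rank 5
print(X_constrained.shape[1], np.linalg.matrix_rank(X_constrained))  # 5 columns, rank 5
```

The rank is the number of parameters the data can actually pin down, which is why the model has 5 parameters rather than 6.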

No. of claims is a variable (i.e. it takes a numerical value), so when we add it into our model we get:

a_i +b_j +cx

where x = no. of claims. We've added another parameter, c, so we now have 6 altogether.
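As a minimal sketch of how this linear predictor is evaluated for a single policy, the Python function below computes a_i + b_j + cx. The parameter values are made up purely for illustration (they are not fitted to any data), and the first b is fixed at 0 as one way of making only one of the b's non-zero, so there are 4 + 1 + 1 = 6 free parameters.

```python
# Hypothetical parameter values, for illustration only (not fitted to any data).
a = [0.10, 0.25, 0.40, 0.60]  # a_i: one per size-of-house category
b = [0.0, -0.30]              # b_j: the first is fixed at 0, so only 1 is free
c = 0.15                      # c: coefficient of x = no. of claims in the last 5 years


def linear_predictor(size: int, area: int, claims: int) -> float:
    """Return a_i + b_j + c*x for one policy.

    Categories are numbered from 0 (Python indexing): size 0..3, area 0..1.
    """
    return a[size] + b[area] + c * claims


n_free_parameters = len(a) + (len(b) - 1) + 1  # 4 a's + 1 free b + c
print(n_free_parameters)  # 6
print(linear_predictor(size=2, area=1, claims=3))  # about 0.55 (0.40 - 0.30 + 3 * 0.15)
```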

The * notation is just shorthand. Size of house * area means the same as

size of house + area + size.area

where the dot denotes the interaction between the two covariates. Since size corresponds to a subscript of i and area corresponds to a subscript of j, the interaction term is going to be of the form d_ij. So we could write the model size*area + no. of claims as:

a_i + b_j +cx +d_ij

However, this is usually written in the simpler form of

e_ij + cx

The e_ij is a term that takes account of size and area. There are 4 times 2 = 8 different e's and 1 c, so the total number of parameters is now 9.
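To check that a_i + b_j + d_ij and e_ij describe the same model, here is a minimal Python sketch. It assumes one particular way of removing the redundant parameters (the first b, and any d involving the first size or the first area, are fixed at 0; this is an assumed convention, and other constraints work equally well). It converts 8 arbitrary e's into a's, b's and d's and back again, and counts what is left free: 4 a's, 1 b and 3 d's, i.e. 8 parameters for size and area, the same as the number of e's.

```python
import random

N_SIZES, N_AREAS = 4, 2


def split_e(e):
    """Rewrite e[i][j] as a[i] + b[j] + d[i][j], with b[0] = 0 and d[i][0] = d[0][j] = 0."""
    a = [e[i][0] for i in range(N_SIZES)]
    b = [e[0][j] - e[0][0] for j in range(N_AREAS)]
    d = [[e[i][j] - a[i] - b[j] for j in range(N_AREAS)] for i in range(N_SIZES)]
    return a, b, d


def combine(a, b, d):
    """Rebuild e[i][j] = a[i] + b[j] + d[i][j]."""
    return [[a[i] + b[j] + d[i][j] for j in range(N_AREAS)] for i in range(N_SIZES)]


# Arbitrary e's (hypothetical values): one per (size, area) cell, 4 x 2 = 8 of them.
random.seed(1)
e = [[random.uniform(-1, 1) for _ in range(N_AREAS)] for _ in range(N_SIZES)]
a, b, d = split_e(e)

# Parameters not forced to 0 by the constraints: 4 a's, 1 b, (4-1)(2-1) = 3 d's.
n_free = len(a) + (N_AREAS - 1) + (N_SIZES - 1) * (N_AREAS - 1)

e_back = combine(a, b, d)
max_error = max(abs(e_back[i][j] - e[i][j]) for i in range(N_SIZES) for j in range(N_AREAS))
print(n_free, max_error < 1e-12)  # 8 True
```

Adding c for the no. of claims then gives the 9 parameters.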

Actually, the * notation makes things easy here: you treat it like a multiplication sign to get the number of parameters (4 × 2 = 8 for size*area, plus 1 for the no. of claims term).
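Why the multiplication rule works for size * area can be seen by counting the terms of size + area + size.area one at a time: the 4 a's, plus n - 1 = 1 extra parameter for area (the rule for adding a factor), plus (4 - 1)(2 - 1) = 3 for the interaction. This is a worked count assuming the usual convention that an interaction between factors with m and n categories adds (m - 1)(n - 1) parameters:

```latex
\underbrace{4}_{a_i} + \underbrace{(2-1)}_{b_j} + \underbrace{(4-1)(2-1)}_{d_{ij}}
  = 4 + 1 + 3 = 8 = 4 \times 2,
\qquad\text{and in general}\qquad
m + (n-1) + (m-1)(n-1) = mn .
```

Adding 1 for c then gives the 9 parameters of size * area + no. of claims.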

Finally, size * area * no. of claims...

Remember that no. of claims is a variable. If this were the only term included in the model, it would be of the form

a+bx

which has 2 parameters.

Starring it with size and area gives a model of the form:

a_ij + b_ij x

There are 8 a's and 8 b's. So we have 16 parameters altogether.
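The counting rules in this thread can be collected into a small Python sketch (the function names and formula syntax are hypothetical, not taken from any package). It expands each * into all of its main effects and interactions, then counts 1 for the intercept, n - 1 for each factor with n categories, 1 for each variable, and the product of those for each interaction. With this convention it reproduces the counts for all five models in the question.

```python
from itertools import combinations
from math import prod

# Number of categories for each factor; None marks a variable (a numerical value).
COVARIATES = {"size": 4, "area": 2, "claims": None}


def expand(formula: str) -> set[frozenset[str]]:
    """Expand a formula using +, * and . into its set of model terms.

    'A*B' means A + B + A.B; 'A.B' is the interaction on its own.
    A part mixing * and . (such as 'A*B.C') is not handled.
    """
    terms = set()
    for part in formula.split("+"):
        part = part.strip()
        if "*" in part:
            names = [name.strip() for name in part.split("*")]
            for k in range(1, len(names) + 1):
                terms.update(frozenset(combo) for combo in combinations(names, k))
        else:
            terms.add(frozenset(name.strip() for name in part.split(".")))
    return terms


def n_parameters(formula: str) -> int:
    """Intercept + (n - 1) per factor or 1 per variable, multiplied within interactions."""
    total = 1  # intercept
    for term in expand(formula):
        total += prod(
            1 if COVARIATES[name] is None else COVARIATES[name] - 1 for name in term
        )
    return total


for formula in [
    "size",
    "size + area",
    "size + area + claims",
    "size * area + claims",
    "size * area * claims",
]:
    print(f"{formula:24} {n_parameters(formula)}")
# prints 4, 5, 6, 9 and 16 for models a) to e)
```

Because the terms are stored in a set, writing a term twice (e.g. size + size*area) does not double-count it.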

RyuVI · Apr 18, 2007

Hi Julie, thanks a lot for your response... I've surprised myself, as I think I actually get it now!

I didn't know the bit about adding a factor with n categories, and it was really throwing me off... Thanks!

hi5 · Apr 18, 2007

Julie Lewis said:
Size of house is a factor........16 parameters altogether.

Thumbs up.
Thanks
