
Help! GLM - Linear predictors

RyuVI

I'd be really grateful if one of the CT6 tutors (John Lee, Julie Lewis) or a student could help explain to me how linear predictors are determined, especially how the operators * (star), + (plus) and . (dot) work.

I'll list the practice question from the revision booklet, if that's OK, as I can't even do this (simple?) example, let alone attempt any past paper questions:

An insurer is trying to model the number of claims on household insurance policies. The model involves covariates of size of house (4 categories), area (urban or rural), and the number of claims in the last 5 years.

(i) Write down the no. of parameters and the linear predictor for the following models:

a) size of house
b) size of house + area
c) size of house + area + no. of claims
d) size of house * area + no. of claims
e) size of house * area * no. of claims


Many thanks to anyone who can explain how it works.
 
Size of house is a factor with 4 categories, so the model that just uses the covariate "size of house" has a linear predictor of the form a_i, i = 1, 2, 3, 4. It has 4 parameters, one for each category.

Now let's consider adding in area (that's what the + means). Area is a factor with 2 categories. Whenever we ADD AN EXTRA factor with n categories, we ADD an extra n-1 non-zero parameters to our model. The model is of the form:

a_i + b_j

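where j = 1, 2. But only one of the b's is non-zero: the other is set to zero, so that its category acts as the baseline and is already accounted for by the a_i. So we now have 4 + 1 = 5 parameters. In fact, we could just write this model as a_i + b if we wanted, with b included only for the non-baseline area.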

No of claims is a variable (i.e. it takes a numerical value). So when we add this into our model we get:

a_i + b_j + cx

where x = no of claims. We've added in another parameter c, and we now have 6 altogether.
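
As a rough check on models (a) to (c), the counting rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary COVARIATES and the function names are made up for this example, and it assumes one category of each added factor is treated as the baseline.

```python
# Hypothetical description of the covariates in the question:
# a factor maps to its number of categories, a numerical variable maps to None.
COVARIATES = {"size": 4, "area": 2, "claims": None}


def added_parameters(name):
    """Parameters that one main effect adds to a model that already has a baseline."""
    categories = COVARIATES[name]
    if categories is None:
        return 1               # numerical variable: a single slope, like c in cx
    return categories - 1      # factor: one category is absorbed into the baseline


def count_additive(names):
    """Parameter count for a model that joins main effects with '+' only."""
    baseline = 1               # one parameter for the reference combination of categories
    return baseline + sum(added_parameters(n) for n in names)


for model in (["size"], ["size", "area"], ["size", "area", "claims"]):
    print(" + ".join(model), "->", count_additive(model))
```

This prints 4, 5 and 6 for the three models, matching a_i, a_i + b_j and a_i + b_j + cx.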

The * notation is just shorthand. Size of house * area means the same as

size of house + area + size.area

where the dot denotes the interaction between the two covariates. Since size corresponds to a subscript of i and area corresponds to a subscript of j, the interaction term is going to be of the form d_ij. So we could write the model size*area + no of claims as:

a_i + b_j + cx + d_ij

However, this is usually written in the simpler form of

e_ij + cx

The e_ij is a term that takes account of size and area. There are 4 times 2 = 8 different e's and 1 c, so the total number of parameters is now 9.

Actually, the * notation makes things easy here: to get the number of parameters, you treat it like a multiplication sign. Size has 4 categories and area has 2, so size*area needs 4 x 2 = 8 parameters.
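
To see the shorthand at work, here is a small Python sketch that expands a starred term into its main effects and interactions and then counts the non-zero parameters. The names (COVARIATES, expand_star and so on) are invented for this example, and the count assumes baseline coding, where each factor has one category set to zero. The total agrees with counting 8 e's and 1 c.

```python
from itertools import combinations
from math import prod

# Hypothetical description of the covariates in the question:
# a factor maps to its number of categories, a numerical variable maps to None.
COVARIATES = {"size": 4, "area": 2, "claims": None}


def expand_star(names):
    """Expand A*B*... into main effects plus every interaction: A, B, A.B, ..."""
    return [combo
            for r in range(1, len(names) + 1)
            for combo in combinations(names, r)]


def term_parameters(term):
    """Non-zero parameters for one term: (categories - 1) per factor, 1 per variable."""
    return prod(1 if COVARIATES[n] is None else COVARIATES[n] - 1 for n in term)


def count_parameters(terms):
    """One baseline parameter plus the non-zero parameters of every term."""
    return 1 + sum(term_parameters(t) for t in terms)


size_area = expand_star(["size", "area"])
print([".".join(t) for t in size_area])   # ['size', 'area', 'size.area']

model_d = size_area + [("claims",)]       # size*area + no of claims
print(count_parameters(model_d))          # 9
```

Here the baseline, size, area and size.area contribute 1 + 3 + 1 + 3 = 8 parameters, which is the same 8 as the e_ij, and the slope on no of claims brings the total to 9.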

Finally, size * area * no of claims ...

Remember that no of claims is a variable. If this were the only term included in the model, it would be of the form

a+bx

which has 2 parameters.

Starring it with size and area gives a model of the form:

a_ij + b_ij x

There are 8 a's and 8 b's. So we have 16 parameters altogether.
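
The 16 parameters can also be listed explicitly. The short Python sketch below just writes out the symbols for each combination of size (i = 1 to 4) and area (j = 1 to 2). The variable and function names are made up for illustration, and no numerical values are attached to the parameters.

```python
from itertools import product

SIZES = range(1, 5)   # i = 1, 2, 3, 4 (size of house)
AREAS = range(1, 3)   # j = 1, 2 (urban or rural)

cells = list(product(SIZES, AREAS))          # the 8 (i, j) combinations

intercepts = [f"a_{i}{j}" for i, j in cells]  # one intercept per combination
slopes = [f"b_{i}{j}" for i, j in cells]      # one slope on x per combination


def linear_predictor(i, j):
    """Symbolic linear predictor for a policy in size category i and area j."""
    return f"a_{i}{j} + b_{i}{j} x"


print(len(intercepts), len(slopes), len(intercepts) + len(slopes))  # 8 8 16
print(linear_predictor(3, 2))                                       # a_32 + b_32 x
```

So each of the 8 size and area combinations gets its own straight line in x, with its own intercept and its own slope.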
 
Hi Julie, thanks a lot for your response... I've surprised myself as I think I actually get it now!

I didn't know the bit about adding a factor with n categories and it was really throwing me off... thanks :D
 