Chapter 12: Section 7.4 Cumm Devs

Discussion in 'CT4' started by jensen, Mar 20, 2009.

  1. jensen

    jensen Member

    In the first paragraph, "Purpose":

"...This test can detect overall goodness of fit... where the fit is not good, this may be due to heterogeneity or duplicates."

    What does it mean by heterogeneity or duplicates in this context?
     
  2. didster

    didster Member

    Don't have the notes to see the context of your quote, but heterogeneity is where you have data for persons who are not "typical". Generally we prefer homogeneous groups of data, where each member of the group has the same characteristics and, in this case, the same underlying mortality. Heterogeneity is where there is different mortality, eg smokers/non-smokers.

    I imagine that duplicates means duplicate entries in the data. Say someone purchased two (or more) policies; then the data may have two sets of exposed to risk (and deaths) for the one person.

    When modelling, and the fit is not good you should think:
    • Is it the model?
    • Is it the data?

    We usually focus on the first by trying other graduations, but sometimes the problem may be the second (and you can think about removing data problems, eg duplicates, splitting into homogeneous groups, etc).
     
    Last edited by a moderator: Mar 21, 2009
  3. jensen

    jensen Member

    thanks didster for answering my questions :)

    I can see how heterogeneity could affect the fit; the underlying risks are of different kinds, so there are effectively (at least) two sets of results present in the mortality experience, hence the graduation is not perfect.

    However if there are duplicates, the points will just be the same, so how does this affect the fit?
     
  4. didster

    didster Member

    The points aren't the same.
    Say you have a group with no duplicates and you calculate the q-value.
    If you add a duplicate who survived, you're adding extra exposed to risk, so the q-value will go down.
    If you add a duplicate who died, you're adding both extra exposed to risk and a death. The addition of a death is much more significant, so the q-value will go up.
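    To see the arithmetic, here is a quick sketch with made-up numbers (10 deaths out of 1,000 exposed; nothing here is from the Core Reading):

    ```python
    def q_hat(deaths, exposed):
        """Crude mortality estimate: deaths / exposed to risk."""
        return deaths / exposed

    base = q_hat(10, 1000)            # 0.0100
    with_survivor = q_hat(10, 1001)   # duplicate who survived: extra exposure only
    with_death = q_hat(11, 1001)      # duplicate who died: extra exposure AND a death

    # A surviving duplicate drags q down; a dying duplicate pushes it up,
    # because the extra death outweighs the extra unit of exposure.
    assert with_survivor < base < with_death
    ```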

    Now if you do a proper study you might have millions of lives, and for duplicates to make a difference you'll need loads of duplicates. Arguably, the duplicates should on average cancel each other out. Some of the assumptions aren't as valid any more though, eg independent lives, so you get more variation. There are other negative effects of duplication. Whether they actually make a practical difference or not depends on the degree of duplication.
    If everyone had exactly two policies it wouldn't matter. But if only people of a certain kind had multiple policies, then it may skew the results.
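    The "more variation" point can be seen in a toy simulation (my own illustrative numbers, not from the notes): if every life holds k policies, the policy-level point estimate of q is unchanged, but its true variance is about k times what you'd compute by wrongly treating the policies as independent lives.

    ```python
    import random

    random.seed(1)
    q_true, n_lives, k, trials = 0.01, 500, 2, 4000

    def estimate_once():
        # Each LIFE dies independently with prob q_true, but holds k policies,
        # so both deaths and exposure get counted k times in the policy data.
        deaths = sum(1 for _ in range(n_lives) if random.random() < q_true)
        return (k * deaths) / (k * n_lives)   # policy-level estimate of q

    ests = [estimate_once() for _ in range(trials)]
    mean = sum(ests) / trials
    emp_var = sum((e - mean) ** 2 for e in ests) / trials

    # Variance you'd assume if the k*n_lives policy records were independent:
    nominal_var = q_true * (1 - q_true) / (k * n_lives)

    # Empirically, the estimator's variance is roughly k times larger.
    print(emp_var / nominal_var)   # close to k = 2
    ```

    So uniform duplication doesn't bias the estimate, but it makes the experience look more statistically credible than it really is; non-uniform duplication (only certain kinds of people holding multiple policies) additionally skews the estimate itself.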
     
    Last edited by a moderator: Mar 21, 2009
  5. jensen

    jensen Member

    -- do you mean "up"?

    Out of curiosity, what do you normally do to remove the effect of duplicates?
     
  6. didster

    didster Member

    Yes, sorry, meant up. (Edited the post above accordingly.)

    The obvious way to remove duplicates would be to check whether there are any. This probably isn't feasible: there will be genuine cases of different people with the same name or DOB, the data sets are huge, and the data may only be available in summarised form from each insurer.

    Don't know the tricks to allow for duplicates, but you may find mention of them if you read the CMI papers. There is a bit about it in the paper by Forfar, McCutcheon and Wilkie published in the Journal of the Institute of Actuaries in 1988. It's an interesting read that covers a good bit of CT4 material, although it may be much more detailed than required for the exam.
     