CH18 - risks of using data

kiki · Mar 4, 2022

Can someone come up with some examples to help me understand the difference between the following two risks when using historical data?

- Changes in the balance of any homogeneous groups underlying the data? Does that mean a change in the mix of business?
- Heterogeneity within the group to which the assumptions are to relate?

Thank you

Helen Evans · Mar 6, 2022

Hi
Thank you for your post. Let's think about, say, a pension scheme whose active members are manual workers, office staff or executives.

Heterogeneity within the group
If we were examining mortality experience to calculate a mortality rate by age, we might create the groupings just by age (particularly if we were low on data), so within the group aged 50 there will be individuals who are workers, staff and execs. (Because we have different types of employee in the group, it is heterogeneous.) If the mix of these workers changes over time, we might well expect the group's mortality experience to change, so past data becomes less relevant.

Change in the balance of homogeneous groups
We might create groups which are more homogeneous (assuming we have sufficient data for credibility), i.e. in this example, at each age we have a separate group for workers, office staff and execs. Within each group I can now expect stability, with past data being a better reflection of future experience. However, I may want just one mortality assumption for each age, so I use the results of these analyses to calculate a composite mortality rate for active members of the scheme. I then need to be mindful that the balance of homogeneous groups may change over time: e.g. the current makeup of the scheme might be a 50/30/20 split across the employee categories, but the rate would become inappropriate if the split changed in the future to 60/35/5, and I should then recalculate the mortality rate using the revised weightings.
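As a rough illustration of the reweighting, here is a minimal Python sketch. The per-category mortality rates are made-up numbers chosen only to show the arithmetic, not taken from any table, and I am assuming the 50/30/20 and 60/35/5 splits are in the order workers / office staff / execs. The function name `composite_rate` is hypothetical too.

```python
# Hypothetical age-50 mortality rates for each homogeneous group
# (illustrative values only, not from any real mortality table).
rates = {"workers": 0.004, "office_staff": 0.003, "execs": 0.002}


def composite_rate(group_rates, mix):
    """Weighted average of the group rates, using the scheme mix as weights."""
    total_weight = sum(mix.values())
    return sum(group_rates[g] * mix[g] for g in group_rates) / total_weight


# Assumed ordering of the splits: workers / office staff / execs.
current_mix = {"workers": 50, "office_staff": 30, "execs": 20}
future_mix = {"workers": 60, "office_staff": 35, "execs": 5}

current = composite_rate(rates, current_mix)
future = composite_rate(rates, future_mix)

print(f"composite rate, current mix: {current:.5f}")
print(f"composite rate, future mix:  {future:.5f}")
print(f"relative change: {future / current - 1:.1%}")
```

With these made-up rates, none of the group rates has changed, yet the composite rate moves from 0.00330 to 0.00355 purely because the weights shifted. That is the sense in which a change in the balance of homogeneous groups makes the old composite assumption inappropriate.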

I hope this helps and your studies are going well.

Helen

Bill SD · Dec 25, 2022

Helen Evans said:
Change in the balance of homogeneous groups
We might create groups which are more homogeneous (assuming we have sufficient data for credibility), i.e. in this example, at each age we have a separate group for workers, office staff and execs. Within each group I can now expect stability, with past data being a better reflection of future experience. However, I may want just one mortality assumption for each age, so I use the results of these analyses to calculate a composite mortality rate for active members of the scheme. I then need to be mindful that the balance of homogeneous groups may change over time: e.g. the current makeup of the scheme might be a 50/30/20 split across the employee categories, but the rate would become inappropriate if the split changed in the future to 60/35/5, and I should then recalculate the mortality rate using the revised weightings.
Helen

Thanks Helen for the example. I'm still confused about why these two are separate risks: the point of both your examples is that they calculate a single age-50 mortality rate for a range of employee categories!
(The only difference is that in your second example, which I've quoted, there is sufficient data to create homogeneous groups, but ultimately a single mortality rate is still weighted from these smaller homogeneous groups and applied to a wide heterogeneous group.)

The Core Reading (chapter 18 page 9) also lists:
- future trends not being reflected sufficiently in past data
- past data may not be sufficiently up to date
What is the difference between these two risks (and if there isn't one, would they be awarded separate marks in an exam just because the core reading lists them separately)?

In the world of motor insurance (rather than benefits), would the following be examples of the different risks of using historical claims data:

- past abnormal events = the COVID lockdown, which temporarily reduced the number of claims due to reduced car use;
- significant random fluctuations = an unusually high/low number of payouts in a past period, for no apparent reason;
- future trends not being reflected sufficiently in past data = in a developing country, past data won't allow for a future increase in more expensive cars (and therefore higher sums assured and cost of repairs);
- past data may not be sufficiently up to date = past claim sizes don't reflect recent inflationary increases (affecting cost of car repairs, medical injuries etc.);
- changes in the way past data recorded - If past claims data doesn't
- change in balance of any homogeneous groups underlying the data
- other changes = increased use of public transport / lower speed limits, so expect reduced accidents

Bill SD · Dec 25, 2022

Sorry, my previous message appears a mess and the forum no longer allows me to edit it and clear it up.

Also, I've now seen that this section of Core Reading (chapter 18 page 9) is repeated (with more elaboration) in Chapter 19 pages 7-11, about the risks of using past data to set assumptions.

If easiest, please (ignore my previous post and) point me to a relevant past exam question and I'll attempt it and then look at the Examiners' Report. I still think there is duplication in the Core Reading list (and it is unclear whether similar points would each score in an exam).

CapitalActuary · Jan 2, 2023

There are definitely lists in the core reading where there are essentially duplicate points, and points which are sub-points of each other. I found it very irritating as a student.

I agree that heterogeneity within the group vs balance of homogeneous groups is basically a semantic difference. This is the same point expressed top down vs bottom up. In my opinion for practical purposes this is the same point.

However, in a more abstract way one could argue these are subtly different. If you say there is heterogeneity within a group, you don’t need to say anything about whether there are homogeneous sub-groups. E.g. I can say that people are a heterogeneous group without talking about discrete groupings you might put people into, like gender, nationality, favourite colour or first language. Whereas if you talk about the different groups of people, it automatically implies the heterogeneity of people as a group overall. (I could even name a continuous dimension along which people are heterogeneous, like height, without the use of discrete homogeneous sub-groupings.)

I think in all likelihood if they are listed separately in the core reading you’d get marks for both. It’s not your fault the core reading is unclear and lists what appear to be essentially duplicate points.

I think it would be unfair to say 1) “past data doesn’t reflect future trends” and 2) “past data might not be up to date” are duplicates though.

An example of 1) which wouldn’t be an example for 2): if I were back in 2021 forecasting inflation / consumer prices for 2022 using the previous 20 years of data, the data wouldn’t contain periods of high inflation, and would have missed the strong inflationary trend we saw in 2022. This wasn’t a case of data being out of date. It simply didn’t contain a trend which would be seen in the future.

If you had more data, stretching back to high inflation periods like the 70s, perhaps you would say your past data did reflect future trends. Then the data would have neither limitation 1) nor 2).

There are other examples, e.g. COVID market crash and recovery, where having up to date data wouldn’t have helped you and there isn’t really a comparable global pandemic no matter how far back your dataset goes. That’s an example for both 1) and 2).

If you’re looking at the trend upwards in oil futures prices in 2022 after Russia invaded Ukraine, I’d argue that we’ve seen such upward trends in oil prices before (again… the 70s). However, if you’re trying to forecast 2022/2023 oil prices and you only have data until May 2020, you’re going to be way off because your data is out of date and you’re looking at a world where demand for oil was at a very low point due to global shutdown. This would make the situation an example for 2) but not 1).

Bill SD · Jan 22, 2023

You're right, @Capital Actuary: if it gives two marks in an exam, I won't complain about any duplication.

Thanks very much for your comprehensive answer, and I especially appreciate your up-to-date examples! They make it far easier for me to appreciate the subtle difference between 1) and 2).
