Graduation September 2019 CS2A Q6

dan r · Sep 18, 2020

Hi!
I was wondering if someone could help me with the September 2019 CS2A past paper, Question 6.
For the chi-square test, we use Exc * national mortality rate as the expected deaths,
but for the cumulative deviations test, we use Exc * 'graduated' national mortality rate? Is this because it ~ Normal distribution?

Also, I was wondering how to calculate the graduated national mortality rate? When I do the cumulative deviations test, I am not getting the same answers as the Examiners' report.

thank you

dan r · Sep 18, 2020

Also, could someone help me understand where they got z bar 1 and z bar 2 for the serial correlations test, and z_x+1?

thank you!!

Andrew Martin · Sep 18, 2020

Hello

Cumulative deviations

Not sure what you mean by graduated national mortality rate - there isn't any graduation going on in this question. The expected deaths used are the same for the goodness-of-fit and cumulative deviations tests. Sorry if I'm not understanding properly, can you be a bit more specific about the issue? If you're not getting the same answers in the cumulative deviation test, can you provide your calculations?

Serial correlations

I've written out a lot of formulae in this answer. It is a real mess to write complicated formulae out on the forum, so please feel free to point out if anything looks wrong.

There appears to be an error in the Examiners' report (as of the time of writing this post). They appear to be using the exact formula, ie:

\( r_j = \frac{\sum_{i=1}^{m-j} (z_i - \bar z^{(1)})(z_{i+j} - \bar z^{(2)})}
{\sqrt{ \sum_{i=1}^{m-j} (z_i - \bar z^{(1)})^2 \sum_{i=1}^{m-j} (z_{i+j} - \bar z^{(2)})^2 }} \)

where \( \bar z^{(1)} = \frac{1}{m-j} \sum_{i=1}^{m-j} z_i \)
and \( \bar z^{(2)} = \frac{1}{m-j} \sum_{i=1}^{m-j} z_{i+j} \)

for the case where \( j = 1 \). This formula can be found on page 40 of Chapter 10 in the Course Notes.

Calculating these for this question:

\( \bar z^{(1)} = \frac{1}{m-1} \sum_{i=1}^{m-1} z_i = \frac{1}{m-1} (0.02460 + 0.00779 + \dots + (-1.02989)) = -0.62034 \)
\( \bar z^{(2)} = \frac{1}{m-1} \sum_{i=1}^{m-1} z_{i+1} = \frac{1}{m-1} (0.00779 + \dots + (-1.02989) + (-1.21842)) = -0.75845 \)

For the numerator we need \( z_1 - \bar z^{(1)} , z_2 - \bar z^{(1)}, ..., z_9 - \bar z^{(1)} \). There'll be a few sets of numbers, so I will call this set of numbers set 1:

\(
0.64494 \\
0.62812 \\
0.67433 \\
1.14825 \\
0.01533 \\
-0.81279 \\
-1.00294 \\
-0.88568 \\
-0.40956
\)

where, for example, the last number is \( z_9 - \bar z^{(1)} \), which is \( -1.02989 - (-0.62034) = -0.40956 \).

We also need \( z_2 - \bar z^{(2)} , z_3 - \bar z^{(2)}, ..., z_{10} - \bar z^{(2)} \). We'll call this set of numbers set 2:

\(
0.76624 \\
0.81245 \\
1.28636 \\
0.15344 \\
-0.67467 \\
-0.86483 \\
-0.74757 \\
-0.27144 \\
-0.45997
\)

where, for example, the last number is \( z_{10} - \bar z^{(2)} \), which is \( -1.21842 - (-0.75845) = -0.45997 \).

Next we need the product of these numbers. Specifically, we need: \( (z_1 - \bar z^{(1)})(z_{2} - \bar z^{(2)}), (z_2 - \bar z^{(1)})(z_{3} - \bar z^{(2)}), ..., (z_9 - \bar z^{(1)})(z_{10} - \bar z^{(2)}) \). We'll call this set of numbers set 3:

\(
0.49418 \\
0.51032 \\
0.86744 \\
0.17619 \\
-0.01034 \\
0.70292 \\
0.74977 \\
0.24041 \\
0.18838
\)

where, for example, the first entry is: \( (z_1 - \bar z^{(1)})(z_{2} - \bar z^{(2)}) = 0.64494 * 0.76624 = 0.49418 \).

The numerator is the sum of these, which is 3.9193. As of the time of writing, the Examiners' report incorrectly appears to have an extra row in their table and so they get 3.46565 instead. The difference is the -0.45361 in the final column of the last row.

For the denominator we need: \( \sum_{i=1}^{m-1} (z_i - \bar z^{(1)})^2 \). This is the sum of the squares of the numbers in set 1. This is: \( 0.64494^2 + \dots + (-0.40956)^2 = 5.2026 \).

For the denominator we also need: \( \sum_{i=1}^{m-1} (z_{i+1} - \bar z^{(2)})^2 \). This is the sum of the squares of the numbers in set 2. This is: \( 0.76624^2 + \dots + (-0.45997)^2 = 4.9727 \).

Again, as of the time of writing, the Examiners' report incorrectly appears to have an additional row in these calculations. The difference between the numbers here and those in the report are given by the entries in that row.

Finally, we have:

\( r_1 = 3.9193 / \sqrt{5.2026 * 4.9727} = 0.77055 \)

The observed value of the test statistic is then:

\( \sqrt{10} * r_1 = 2.4367 \)
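
As a cross-check on the arithmetic above, here is a minimal Python sketch of the exact formula for \( j = 1 \). It only uses the set 1 and set 2 figures quoted in this post (rounded to 5 decimal places), so the results should agree with the numbers above to roughly four significant figures. The variable names are just illustrative.

```python
import math

# Deviations from the two means, copied from set 1 and set 2 above.
set1 = [0.64494, 0.62812, 0.67433, 1.14825, 0.01533,
        -0.81279, -1.00294, -0.88568, -0.40956]
set2 = [0.76624, 0.81245, 1.28636, 0.15344, -0.67467,
        -0.86483, -0.74757, -0.27144, -0.45997]

m = 10  # number of ages in the investigation (60 to 69)

# Numerator: sum of the pairwise products (set 3 above).
numerator = sum(a * b for a, b in zip(set1, set2))

# Denominator: square root of the product of the two sums of squares.
sum_sq_1 = sum(a * a for a in set1)
sum_sq_2 = sum(b * b for b in set2)

r1 = numerator / math.sqrt(sum_sq_1 * sum_sq_2)
test_statistic = math.sqrt(m) * r1

print(round(numerator, 4), round(sum_sq_1, 4), round(sum_sq_2, 4))
print(round(r1, 4), round(test_statistic, 4))
```

Note that the sums run over 9 terms (m - 1), while the final scaling uses the square root of m = 10.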

Alternatively, we can use the following approximate formula instead:

\( r_j \approx \frac{ \frac{1}{m-j} \sum_{i=1}^{m-j} (z_i - \bar z)(z_{i+j} - \bar z)}
{ \frac{1}{m} \sum_{i=1}^{m} (z_i - \bar z)^2} \)

where \( \bar z = \frac{1}{m} \sum_{i=1}^{m} z_i \)

This formula is also on page 40 of Chapter 10 in the Course Notes, and it is the one given on page 34 of the Tables.

I'll leave the calculations to you on this one. You should get an observed value of the test statistic of 2.47.
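
If you want to check your working for the approximate formula, a rough Python sketch is below. It is not taken from the Examiners' report. One assumption to flag: the individual \( z_i \) values are not all listed in this post, so the sketch reconstructs \( z_1, ..., z_9 \) by adding \( \bar z^{(1)} = -0.62034 \) back onto the set 1 figures, and takes \( z_{10} = -1.21842 \) as quoted above. Rounding in those figures means the last decimal place may differ slightly from a calculation based on the raw data.

```python
import math

# Assumption: z_1..z_9 are rebuilt as (set 1 entry + z-bar(1)) from the
# rounded figures in this post; z_10 is the value quoted in the post.
z_bar_1 = -0.62034
set1 = [0.64494, 0.62812, 0.67433, 1.14825, 0.01533,
        -0.81279, -1.00294, -0.88568, -0.40956]
z = [d + z_bar_1 for d in set1] + [-1.21842]

m = len(z)            # 10 ages
z_bar = sum(z) / m    # single overall mean used by the approximate formula

# Numerator: average lag-1 cross product over the m - 1 adjacent pairs.
numerator = sum((z[i] - z_bar) * (z[i + 1] - z_bar) for i in range(m - 1)) / (m - 1)

# Denominator: average squared deviation over all m values.
denominator = sum((zi - z_bar) ** 2 for zi in z) / m

r1_approx = numerator / denominator
test_statistic = math.sqrt(m) * r1_approx

print(round(r1_approx, 4), round(test_statistic, 4))
```

The printed test statistic should land close to the 2.47 mentioned above.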

Hope this helps

Andy

dan r · Sep 24, 2020

Hi Andy, thanks so much for your reply.

Cumulative Deviations Test:
For the first part, these are my results, which is different to the results online:

first column: actual deaths
second column: expected number of deaths
the two blue points are the two test statistics: (sum of actual deaths - sum of expected deaths) / square root(sum of expected deaths)

Actual    Expected    Test statistic
10        9.9225
11        10.9742
12        11.8144
15        13.09
48        45.8011     41.232349      (subtotal)
12        14.2868
5         9.392
5         10.179
6         10.9934
8         11.4912
8         8.7625
44        65.1049     -2.616         (subtotal)

Also, is the reason that we split this into two separate parts that the z_x's are split into positive and negative results? Is there any other way of understanding why we split it without doing the chi-squared test first?

Andrew Martin · Sep 24, 2020

Hi Dan

Looks like you've used the age ranges 60-63 and 64-69 instead of 60-64 and 65-69 as per the solutions.

When doing the cumulative deviations test, we shouldn't look at the data to determine how we divide up the ages into groups. Splitting up the age range into two even halves of 5 ages is just one way to do it.
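
To see how much the choice of grouping moves the result, here is a minimal Python sketch. It uses the actual and expected deaths exactly as posted earlier in this thread, and it assumes those ten rows are ages 60 to 69 in order. The posted figures may not match the ones in the solutions, so treat the output as a check on the method rather than on the final answers. The function name `cumulative_deviation_statistic` is just illustrative.

```python
import math

# Assumption: the ten rows posted earlier in the thread are ages 60..69 in order.
ages = list(range(60, 70))
actual = [10, 11, 12, 15, 12, 5, 5, 6, 8, 8]
expected = [9.9225, 10.9742, 11.8144, 13.09, 14.2868,
            9.392, 10.179, 10.9934, 11.4912, 8.7625]


def cumulative_deviation_statistic(first_age, last_age):
    """(sum of actual - sum of expected) / sqrt(sum of expected) over an age range."""
    idx = [i for i, x in enumerate(ages) if first_age <= x <= last_age]
    total_actual = sum(actual[i] for i in idx)
    total_expected = sum(expected[i] for i in idx)
    return (total_actual - total_expected) / math.sqrt(total_expected)


# Compare the 60-63 / 64-69 split, the 60-64 / 65-69 split and the whole range.
for lo, hi in [(60, 63), (64, 69), (60, 64), (65, 69), (60, 69)]:
    print(lo, hi, round(cumulative_deviation_statistic(lo, hi), 3))
```

Each statistic would then be compared with a standard normal distribution, and the groups are fixed before looking at the signs of the deviations.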

Hope this helps

Andy

dan r · Sep 24, 2020

Hi Andy, thank you! That was the problem. Would it be okay in the exam to not split up the ages and just calculate a single result? Originally, when I tried the question myself, I got -1.795 and didn't think to split up the age groups.

Also, for the serial correlations test: thank you so much for the above, I really appreciate it. I have been seeing if I can figure it out using the alternative formula, and I just wanted to check if this is right?

average standardised deviation: z_bar = sum of the standardised deviations (z_x's) / m
denominator of r1: (1 / m) * SUM of (individual standardised deviation (z_x) - average standardised deviation from above (z_bar))^2 over all ages 60 to 69
numerator of r1: (1 / (m - 1)) * SUM of (standardised deviation (z_x) - z_bar) * (unknown - z_bar) over ages 60 to 68? Would that be right?

unknown: I am not sure how to calculate z_x+1? I know you explained it above, but would you be able to explain it in words instead please?

r1 = numerator/denominator

test statistic = square root of m * r1.

Apologies for the confusion, quite hard to explain what i mean! Hope this makes sense.

& finally, just a random question, but if they asked us to test for smoothness for this paper- would we be using the national mortality rate, or would we be using the rate of death / exposed to risk?

thank you!!

Andrew Martin · Sep 27, 2020

Hello

I don't think -1.795 is correct for the observed value of the test statistic when applying the test to the entire age range. I think it should be -2.095. With the way the question is worded, I personally can't see an issue with doing the cumulative standardised deviations on the entire age range. However, the Examiners' Report does not mention how this would be treated for marks.

For the numerator, we have:

\( \frac{1}{m-1} \sum_{i=1}^{m-1} (z_i - \bar{z})(z_{i+1} - \bar{z}) \)

Or, if you want to think about it in terms of ages:

\( \frac{1}{m-1} \sum_{x=60}^{68} (z_x - \bar{z})(z_{x+1} - \bar{z}) \)

So, for example, the first term of the sum is:

\( (z_{60} - \bar{z})(z_{61} - \bar{z}) \)

So, we're iterating over a number of ages. For each age, we put in that age as \(x \) into the terms in the sum. So, in the example above, the first term relates to \( x = 60 \) and so \( x + 1 = 61 \). So \(z_{x+1} \) in the first term of the sum is the standardised deviation for age 61.
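
If it helps to see the pairing written out, here is a small Python sketch that lists which two ages feed into each term of the sum. It contains no mortality data; the names `ages` and `pairs` are just illustrative.

```python
# Ages in the investigation: 60, 61, ..., 69 (so m = 10).
ages = list(range(60, 70))

# Each term of the sum pairs age x with age x + 1, for x = 60, ..., 68.
pairs = list(zip(ages[:-1], ages[1:]))

for x, x_plus_1 in pairs:
    print(f"term uses z_{x} and z_{x_plus_1}")

print(len(pairs), "terms in total")
```

There are m - 1 = 9 terms because age 69 has no following age to pair with.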

I'm not sure what a smoothness test would look like for this type of question - there is no graduation to test the smoothness of.

Andy
