
Chapter 14 Q 14.6 (ii) - Turning Point Test

dannyp123

Hi,

Looks like in the answers there is a cheeky continuity correction that goes unmentioned??

Few specific follow-ups which I'd really appreciate some clarity on:
1) Why is continuity correction applied?
2) Most importantly - how do I know whether I am adding 0.5 or subtracting 0.5?
3) Can someone remind me when we do a correction of 0.5*n and when it's just 0.5?

Many thanks in advance,
Dan
 
Hi Dan

To discuss continuity corrections, I will first briefly recap where they tend to crop up in the CS2 course, which is in hypothesis testing. Apologies if you are comfortable with this already, but hopefully it makes the answer more complete.

In hypothesis testing we reach a conclusion either by comparing the observed value of a test statistic to a critical value or by calculating a p-value and comparing it to the chosen significance level. The p-value is in essence the probability, assuming H0 is true, of observing the data that we did or something more extreme.

Example 1

Let's take the example of a turning point test where the total number of observations (n) is 1001 and there are 400 observed turning points. In this case, as per page 42 of the Tables, the expected number of turning points is 2/3 * (n - 2) = 2/3 * (1001 - 2) = 666 and the variance is (16*n - 29)/90, which is roughly 177.6.

Under the null hypothesis, the number of turning points T is approximately normally distributed with this mean and variance.
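
For anyone who wants to check the numbers, here is a minimal Python sketch of the mean and variance using the Tables formulas quoted above. The variable names are just illustrative choices, not anything from the course notes.

```python
import math

n = 1001  # total number of observations in Example 1

# Turning point test formulas as quoted above from the Tables
mean = 2 * (n - 2) / 3          # expected number of turning points
variance = (16 * n - 29) / 90   # variance of the number of turning points
sd = math.sqrt(variance)

print(mean, round(variance, 2), round(sd, 2))
```

This gives a mean of 666, a variance of about 177.63 and a standard deviation of about 13.33.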

In this example our p-value is 2*P(T<= 400) because we are doing a two-sided test and this represents the probability of seeing the data we did or something more extreme. Importantly the "more extreme" in this case relates to T being less than 400 because 400 is less than the mean or expected number of turning points.

Now let T' be the RV which is N(2/3 (n-2), (16n - 29)/90); then T' can be used to approximate T. However, T' is a continuous random variable and T is discrete (taking integer values). Therefore we need to think about which values of T' correspond to which values of T in their respective sample spaces. Some examples are:

1. If 399.5 <= T' < 400.5 then we would say this corresponds to T being 400

2. If 450.5 <= T' < 451.5 then we would say this corresponds to T being 451

Hence to estimate P(T <= 400) by using T' we want to calculate P(T' < 400.5). Hence we consider:

Phi[(400 + 0.5 - mu)/sigma]

(which we would then multiply by 2 to get the p-value, or else compare the standardised value to ±1.96).
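
As a rough check of Example 1, the calculation can be sketched in Python as follows. This assumes the standard library's `statistics.NormalDist` for Phi; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n, observed = 1001, 400
mu = 2 * (n - 2) / 3              # 666
sigma = sqrt((16 * n - 29) / 90)  # about 13.33

# 400 is below the mean, so "more extreme" is the lower tail and the
# correction adds 0.5: P(T <= 400) is approximated by P(T' < 400.5)
z = (observed + 0.5 - mu) / sigma
p_value = 2 * NormalDist().cdf(z)  # two-sided test, so double the tail

print(round(z, 2), p_value < 0.05)
```

Here z comes out at roughly -19.9, far outside ±1.96, so the p-value is effectively zero.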

Example 2

Consider a slightly different case where the observed number of turning points was 700. Then, because 700 is bigger than the mean of 666, our p-value would be 2*P(T >= 700). The inequality is the other way around because "more extreme" in this case means 700 or more, as 700 is bigger than the mean.

Using a similar argument to the above, to estimate P(T>=700) we calculate P(T' >= 699.5) which is:

1 - Phi[(700 - 0.5 - mu)/sigma]
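
To see how much difference the correction makes in Example 2, here is a small Python sketch that works out the standardised value with and without it. Again this is only an illustration using `statistics.NormalDist`, with variable names of my own choosing.

```python
from math import sqrt
from statistics import NormalDist

n, observed = 1001, 700
mu = 2 * (n - 2) / 3
sigma = sqrt((16 * n - 29) / 90)

# 700 is above the mean, so the correction subtracts 0.5:
# P(T >= 700) is approximated by P(T' >= 699.5)
z_corrected = (observed - 0.5 - mu) / sigma
z_uncorrected = (observed - mu) / sigma  # for comparison only

# Upper tail, doubled for the two-sided test
p_value = 2 * (1 - NormalDist().cdf(z_corrected))

print(round(z_corrected, 2), round(z_uncorrected, 2), round(p_value, 3))
```

The corrected value is about 2.51 against about 2.55 uncorrected, so the correction pulls the statistic slightly towards zero and makes the test a little more conservative. The p-value is around 0.012.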

Example 3

If the discrete random variable we are approximating with a continuous distribution doesn't take integer values but rather, say, the values ..., -400, -300, -200, -100, 0, 100, 200, 300, 400, ... then we need to reconsider which values in the sample space of the continuous random variable correspond to which values of the discrete random variable. Let's use T and T' again for the RV of interest and the continuous approximating RV. Then, for example:

1. 350 <= T' < 450 corresponds to T as 400

2. -250 <= T' < -150 corresponds to T as -200

We could estimate, for example, P(T>=300) as P(T'>250) as this represents the range for which the values of T' correspond to values of T being larger than, or equal to, 300.
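
The mapping in Example 3 can be sketched in Python as below. The function names `interval_for` and `corrected_bound` are hypothetical helpers made up for this illustration.

```python
def interval_for(value, step):
    """Range of T' that corresponds to the discrete value `value` of T."""
    return (value - step / 2, value + step / 2)


def corrected_bound(value, step, tail):
    """Boundary for T' when approximating a tail probability of T.

    tail="lower" is for P(T <= value), tail="upper" is for P(T >= value).
    """
    half = step / 2
    if tail == "lower":
        return value + half  # include the whole interval belonging to `value`
    if tail == "upper":
        return value - half
    raise ValueError("tail must be 'lower' or 'upper'")


print(interval_for(400, 100))              # (350.0, 450.0)
print(interval_for(-200, 100))             # (-250.0, -150.0)
print(corrected_bound(300, 100, "upper"))  # 250.0
```

With a step of 1 the same helper reproduces the 0.5 corrections from Examples 1 and 2, for instance `corrected_bound(400, 1, "lower")` gives 400.5.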

General rule

For calculating probabilities in hypothesis testing, the general rule is to continuity correct towards the mean by an amount which is half the step size of the possible values of the discrete random variable.
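
One way to put the general rule into code is sketched below. This is an illustration under my own assumptions rather than anything from the course: the function name and the `step` parameter are made up, and capping the shift so that it never moves past the mean is my own choice for the edge case where the observed value is within half a step of the mean.

```python
from statistics import NormalDist


def two_sided_p_value(observed, mean, sd, step=1.0):
    """Two-sided p-value using a normal approximation, continuity
    corrected towards the mean by half the step size."""
    # Move towards the mean by half a step, but never past the mean itself
    shift = min(step / 2, abs(observed - mean))
    if observed < mean:
        corrected = observed + shift
    else:
        corrected = observed - shift
    z = (corrected - mean) / sd
    # Double the tail on the same side as the observed value
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Examples 1 and 2, with the standard deviation rounded to 13.328
print(two_sided_p_value(400, 666, 13.328))
print(two_sided_p_value(700, 666, 13.328))
```

For a variable that moves in steps of 100, passing `step=100` gives the correction of 50 used in Example 3.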

Hope this helps.

Andy
 