Chapter 14 Q 14.6 (ii) - Turning Point Test

Discussion in 'CS2' started by dannyp123, Jul 12, 2019.

  1. dannyp123

    dannyp123 Keen member

    Hi,

    Looks like in the answers there is a cheeky continuity correction that goes unmentioned??

    A few specific follow-ups which I'd really appreciate some clarity on:
    1) Why is a continuity correction applied?
    2) Most importantly - how do I know whether I am adding 0.5 or subtracting 0.5?
    3) Can someone remind me when we do a correction of 0.5*n and when it's just 0.5?

    Many thanks in advance,
    Dan
     
  2. Andrew Martin

    Andrew Martin ActEd Tutor Staff Member

    Hi Dan

    To discuss continuity corrections, I will first briefly recap where they tend to crop up in the CS2 course, namely in hypothesis testing. Apologies if you are comfortable with this already, but hopefully it makes the answer more complete.

    In hypothesis testing we reach a conclusion either by comparing the observed value of a test statistic to a critical value, or by calculating a p-value and comparing it to the chosen significance level. The p-value is, in essence, the probability of observing the data that we did, or something more extreme, assuming H0 is true.

    Example 1

    Let's take the example of a turning point test where the number of observations (n) is 1001 and there are 400 observed turning points. In this case, as per page 42 of the Tables, the expected number of turning points is 2/3 * (1001 - 2) = 666 and the variance is (16*n - 29)/90, which I won't write out.

    Under the null hypothesis, the number of turning points T is approximately normally distributed with this mean and variance.
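
    As a quick check of these two quantities, here is a minimal Python sketch. The function name turning_point_moments is just an illustrative choice; the formulas are the ones quoted above from the Tables.

```python
import math


def turning_point_moments(n):
    """Mean and variance of the number of turning points under H0
    (formulas as quoted above; the function name is illustrative)."""
    mean = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return mean, variance


mean, variance = turning_point_moments(1001)
sigma = math.sqrt(variance)  # standard deviation used when standardising
print(mean, variance, sigma)
```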

    In this example our p-value is 2*P(T<= 400) because we are doing a two-sided test and this represents the probability of seeing the data we did or something more extreme. Importantly the "more extreme" in this case relates to T being less than 400 because 400 is less than the mean or expected number of turning points.

    Now let T' be the RV which is N(2/3 (n-2), (16n - 29)/90); then T' can be used to approximate T. However, T' is a continuous random variable and T is discrete (taking integer values). Therefore we need to think about which values of T' correspond to which values of T in their respective sample spaces. Some examples are:

    1. If 399.5 <= T' < 400.5 then we would say this corresponds to T being 400

    2. If 450.5 <= T' < 451.5 then we would say this corresponds to T being 451

    Hence, to estimate P(T <= 400) using T', we want to calculate P(T' < 400.5), i.e.:

    Phi[(400 + 0.5 - mu)/sigma]

    (which we would then multiply by 2 to get the p-value, or alternatively compare the standardised value to ±1.96 for a test at the 5% level).
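
    A rough Python sketch of the Example 1 calculation, using the standard library's statistics.NormalDist for Phi. The variable names are my own, and the numbers are those of the example (n = 1001, 400 observed turning points).

```python
from statistics import NormalDist

n, observed = 1001, 400
mu = 2 * (n - 2) / 3                  # expected number of turning points
sigma = ((16 * n - 29) / 90) ** 0.5   # standard deviation under H0

# Observed value is below the mean, so correct upwards (towards the mean).
z = (observed + 0.5 - mu) / sigma
p_value = 2 * NormalDist().cdf(z)     # two-sided p-value

print(z, p_value)
print("reject H0 at 5%" if abs(z) > 1.96 else "do not reject H0 at 5%")
```

    Here the standardised value is far below -1.96, so the p-value is effectively zero.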

    Example 2

    Consider a slightly different case where the observed number of turning points is 700. Because 700 is bigger than the mean of 666, our p-value would be 2*P(T >= 700). The inequality is the other way around because "more extreme" now means values of T above 700.

    Using a similar argument to the above, to estimate P(T>=700) we calculate P(T' >= 699.5) which is:

    1 - Phi[(700 - 0.5 - mu)/sigma]
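
    The same kind of sketch for Example 2, again with illustrative variable names. This time the correction is subtracted because the observed value sits above the mean.

```python
from statistics import NormalDist

n, observed = 1001, 700
mu = 2 * (n - 2) / 3
sigma = ((16 * n - 29) / 90) ** 0.5

# Observed value is above the mean, so correct downwards (towards the mean).
z_upper = (observed - 0.5 - mu) / sigma
upper_tail = 1 - NormalDist().cdf(z_upper)   # approximates P(T >= 700)
p_value = 2 * upper_tail                     # two-sided p-value

print(z_upper, p_value)
```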

    Example 3

    If the discrete random variable we are approximating with a continuous distribution doesn't take integer values but rather, say, the values ..., -400, -300, -200, -100, 0, 100, 200, 300, 400, ... then we need to reconsider which values in the sample space of the continuous random variable correspond to which values of the discrete random variable. Let's use T and T' again for the RV of interest and the continuous approximating RV. Then, for example:

    1. 350 <= T' < 450 corresponds to T as 400

    2. -250 <= T' < -150 corresponds to T as -200

    We could estimate, for example, P(T >= 300) as P(T' >= 250), as this is the range of values of T' that correspond to T being greater than or equal to 300.
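
    One way to picture this correspondence is a small helper that maps a value of T' to the value of T whose interval contains it. This is only a sketch: the function name is made up, and it assumes the possible values of T are the multiples of the step size (100 here).

```python
import math


def nearest_lattice_value(t_cont, step=100):
    """Map a continuous value to the multiple of `step` whose half-open
    interval [value - step/2, value + step/2) contains it.
    Assumes the discrete variable takes multiples of `step` (hypothetical helper)."""
    return step * math.floor(t_cont / step + 0.5)


print(nearest_lattice_value(350))    # lower end of the interval for 400
print(nearest_lattice_value(449.9))  # still inside the interval for 400
print(nearest_lattice_value(-250))   # lower end of the interval for -200
print(nearest_lattice_value(250))    # smallest T' that maps to T = 300
```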

    General rule

    For calculating probabilities in hypothesis testing, the general rule is to continuity correct towards the mean by an amount which is half the step size of the possible values of the discrete random variable.
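
    Putting the rule into one function might look like the sketch below. The function name and signature are my own invention, and the second usage line uses made-up values (mean 0, standard deviation 200, step 100) purely to illustrate a non-integer step size.

```python
from statistics import NormalDist


def two_sided_p_value(observed, mu, sigma, step=1.0):
    """Two-sided p-value from a normal approximation, continuity
    correcting towards the mean by half the step size (illustrative helper)."""
    approx = NormalDist(mu, sigma)
    if observed < mu:
        tail = approx.cdf(observed + step / 2)       # lower tail, corrected up
    elif observed > mu:
        tail = 1 - approx.cdf(observed - step / 2)   # upper tail, corrected down
    else:
        return 1.0                                   # observed equals the mean
    return min(1.0, 2 * tail)


sigma_tp = ((16 * 1001 - 29) / 90) ** 0.5
p_700 = two_sided_p_value(700, 666.0, sigma_tp)              # Example 2
p_lattice = two_sided_p_value(300, 0.0, 200.0, step=100)     # hypothetical values
print(p_700, p_lattice)
```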

    Hope this helps.

    Andy
     
