I am struggling to pin down the concept behind the Generalized Pareto Distribution as used in Question 2 (ii) of the September 2020 exam. My immediate answer was to calculate 1 - G(70), on the assumption that 1 - G(70) = 1 - P(X <= 70) = P(X > 70). But the marking scheme says it is incorrect to use G(70). Why is this? And if G(70) = 1 - (1 + (70/45))^-3 = 0.9400838 is not P(X <= 70), what is it actually telling us? I get that this is probably related to where it says "it is easy to make the mistake of calculating a probability for the threshold exceedance instead of for the underlying distance", but I don't understand what the difference is between "threshold exceedance" and "underlying distance".
Hi Brett,

Your immediate answer of using 1 - G(70) is the easiest trap to fall into when working with the GPD.

For threshold exceedances (in this case the threshold is 50), we are creating a new variable (Y, say) where Y = X - 50 | X > 50. In other words, the GPD models the excess of X above 50, where that excess is positive.

This leads to the relationship:

P(X > 70) = P(X > 50) * P(Y > 20)

Or alternatively:

P(X > 70) = P(X > 50) * (1 - G(20))

Hope that helps,
Dave
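The relationship above can be sketched in a few lines of Python. This is only an illustration, not the marking scheme's working. The parameters 45 and 3 are read off the G(70) formula in the question. The function names and the `p_over_threshold` argument are hypothetical, and the value of P(X > 50) is not quoted in this thread, so it has to be supplied by the caller.

```python
def gpd_cdf(y, beta=45.0, gamma=3.0):
    """G(y) = 1 - (1 + y/beta)^(-gamma): CDF of the excess Y = X - 50 given X > 50."""
    if y < 0:
        raise ValueError("the excess y must be non-negative")
    return 1.0 - (1.0 + y / beta) ** (-gamma)


def prob_exceeds(x, p_over_threshold, threshold=50.0, beta=45.0, gamma=3.0):
    """P(X > x) = P(X > threshold) * (1 - G(x - threshold)), valid for x >= threshold.

    p_over_threshold is P(X > threshold); it must come from elsewhere
    (the exam data), because the GPD only describes the excess.
    """
    if x < threshold:
        raise ValueError("x must be at or above the threshold")
    return p_over_threshold * (1.0 - gpd_cdf(x - threshold, beta, gamma))


# G is evaluated at the excess 70 - 50 = 20, not at 70 itself.
print(1.0 - gpd_cdf(20.0))  # P(Y > 20) = P(X > 70 | X > 50)
```

For example, `prob_exceeds(70, p_over_threshold=0.05)` would give the unconditional P(X > 70) if P(X > 50) were 0.05; that 0.05 is a placeholder chosen for illustration, not a figure taken from the exam.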
Hi Dave, thanks for the quick response. Yes, I think the important piece I was missing was that the GPD models the conditional distribution above some threshold. So by calculating G(70), what I actually calculated was P(X - 50 <= 70 | X > 50) = P(X <= 120 | X > 50)? That is, the probability that someone throws between 50 and 120m, given that they throw over 50m?
If that is the case, then 1 - G(70) = 1 - 0.9400838 ~= 0.0599, meaning the probability of throwing between 50 and 120m, given that someone throws over 50m, is approximately 6%. How does this have a higher probability than someone throwing over 70m given that they throw over 50m? Or am I getting my conditional and unconditional probabilities confused here?
You've written some of that the wrong way round, but I think I know what you meant to say. Let's look at the 3 probabilities to make sure you know what they all mean.

1 - G(70) is the probability of throwing more than 120m, given the throw is over 50m, and this is c. 6%.

The probability of throwing more than 70m given the throw was over 50m (i.e. 1 - G(20)) is c. 33%, as given in the marking scheme, and logically this is higher than the probability above, because every throw over 120m is also a throw over 70m.

The probability of throwing over 70m unconditionally is c. 1.7%, and is given by the product of the 33% probability above and the probability of throwing over 50m.
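As a rough numerical check of the three probabilities, here is a short Python sketch. It assumes the same GPD form as the question (parameters 45 and 3, threshold 50). The variable names are made up for illustration, and the implied P(X > 50) is backed out from the rounded 1.7% figure, so it is only approximate.

```python
BETA, GAMMA, THRESHOLD = 45.0, 3.0, 50.0  # taken from the G(70) formula in the question


def excess_survival(y):
    """1 - G(y) = (1 + y/BETA)^(-GAMMA): P(X > THRESHOLD + y | X > THRESHOLD)."""
    return (1.0 + y / BETA) ** (-GAMMA)


# 1 - G(70): more than 120m, given the throw is over 50m
p_over_120_given_over_50 = excess_survival(120.0 - THRESHOLD)

# 1 - G(20): more than 70m, given the throw is over 50m
p_over_70_given_over_50 = excess_survival(70.0 - THRESHOLD)

# The unconditional c. 1.7% is quoted in the reply, not computed here.
# Dividing it by 1 - G(20) gives the P(X > 50) it implies (approximate,
# because 1.7% is a rounded figure).
quoted_p_over_70 = 0.017
implied_p_over_50 = quoted_p_over_70 / p_over_70_given_over_50

print(f"P(X > 120 | X > 50) = {p_over_120_given_over_50:.4f}")
print(f"P(X > 70  | X > 50) = {p_over_70_given_over_50:.4f}")
print(f"implied P(X > 50)   = {implied_p_over_50:.4f}")
```

The conditional probability falls as the distance rises because the survival function 1 - G(y) is decreasing in y, which is why the 120m figure is the smaller of the two.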