Lift Curves and Gains Curves

Discussion in 'SP8' started by George88, Apr 20, 2014.

  1. George88

    George88 Member

    Hi,

    I am really confused about what lift curves and gains curves are actually telling me.

    On page 40 of the notes it says that the lines on the graph are actual claim frequencies, but aren't they modelled frequencies? On the previous page it says that model 1 is the more predictive of the two because it is steeper. Couldn't it be less predictive if it is overstating the relative frequency between the bands of exposure?

    The gains curve is meant to compare observed to fitted values, but the Gini coefficient is derived by comparing the cumulative values to the straight diagonal line, without giving regard to the cumulative distribution. Both seem to be showing the relative frequency, or the shape of the cdf, as opposed to the predictiveness of the model.

    Please help :)

    Thanks
     
  2. Katherine Young

    Katherine Young ActEd Tutor Staff Member

    Think of the lift curve as being not so much an illustration of goodness of fit, as an illustration of how good the model is at distinguishing between good risks and bad risks.

    Imagine you draw a graph with actual claim frequency plotted against expected claim frequency. If this is a good fit, the graph would lie along the 45 degree line.

    But if the model has not distinguished between good risks and bad risks, then all the expected claim frequencies (ie the points on the x axis) would be ranked in a different order to the actual claim frequencies.

    In that situation, all the points in the graph would be too far either to the left or the right, all jumbled up, and the graph wouldn't slope upwards.

    The opposite must therefore also be true: a graph that slopes upwards is doing a good job of telling which risks are good and which are bad, and the steeper the graph, the better the model is at distinguishing between risks.
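
    A minimal sketch of the lift curve idea, not taken from the notes. It assumes each policy has a predicted claim frequency, an observed claim count and an exposure, and that bands hold roughly equal numbers of policies (equal-exposure bands would be an alternative). The function name `lift_curve` and the toy figures are made up for illustration.

```python
import numpy as np

def lift_curve(predicted, actual, exposure, n_bands=5):
    """Band policies by predicted frequency and return the actual and
    predicted claim frequency in each band, lowest predicted band first."""
    predicted = np.asarray(predicted, dtype=float)  # predicted claims per unit exposure
    actual = np.asarray(actual, dtype=float)        # observed claim counts
    exposure = np.asarray(exposure, dtype=float)
    order = np.argsort(predicted, kind="stable")    # rank policies by the model only
    bands = np.array_split(order, n_bands)          # assumption: equal policy counts per band
    actual_freq = np.array([actual[b].sum() / exposure[b].sum() for b in bands])
    predicted_freq = np.array(
        [(predicted[b] * exposure[b]).sum() / exposure[b].sum() for b in bands]
    )
    return actual_freq, predicted_freq

# Hypothetical data: four policies with 10 units of exposure each.
claims = [1, 1, 2, 4]
exposure = [10, 10, 10, 10]

# A model that ranks the risks correctly: actual frequency rises across bands.
good_actual, _ = lift_curve([0.05, 0.10, 0.20, 0.40], claims, exposure, n_bands=2)

# A model that jumbles the ranking: actual frequency no longer rises.
bad_actual, _ = lift_curve([0.40, 0.05, 0.20, 0.10], claims, exposure, n_bands=2)

print(good_actual, bad_actual)
```

    The bands are defined by the model, but the plotted values are the actual frequencies in those bands, so an upward slope means the model's ranking is borne out by experience.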

    ====

    I tend to think of the Gains curve as follows:

    Think of the straight line as being the hypothetical dataset where every policy has the same degree of risk. In this case, policy 1 has x amount of risk, policy 2 also has x amount of risk etc, so the cumulative risk for 2 policies is 2x, the cumulative risk for 3 policies is 3x, etc. You can see that this line must be the diagonal from (0,0) to (1,1).

    Now, for our own dataset, we know that risks are not uniform. We consider the higher risk policies first. Since these are higher than average, the data rises above the diagonal. This continues until we arrive at policies with a lower than average degree of risk which means we start to sink back towards the diagonal.

    So, just as with the lift curve, the gains curve gives an illustration of how good the model is at distinguishing between good risks and bad risks. A graph which rises far above the line is a better predictor.
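
    A rough sketch of the same reasoning, assuming one unit of exposure per policy. The names `gains_curve` and `area_above_diagonal` and the toy numbers are hypothetical. The area is the quantity a Gini-type measure is built from, although the exact normalisation used in the notes is not reproduced here.

```python
import numpy as np

def gains_curve(predicted, observed):
    """Cumulative share of observed outcome against cumulative share of
    policies, taking policies from highest to lowest predicted risk."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    order = np.argsort(-predicted, kind="stable")            # highest predicted risk first
    cum = np.concatenate(([0.0], np.cumsum(observed[order])))
    x = np.arange(len(predicted) + 1) / len(predicted)       # assumes unit exposure per policy
    y = cum / cum[-1]
    return x, y

def area_above_diagonal(x, y):
    """Trapezium-rule area between the gains curve and the diagonal y = x."""
    gap = y - x
    return float(np.sum((x[1:] - x[:-1]) * (gap[1:] + gap[:-1]) / 2))

observed = [5, 3, 1, 1]

# Uniform risk: every policy contributes the same, so the curve is the diagonal.
x_u, y_u = gains_curve([0.4, 0.3, 0.2, 0.1], [1, 1, 1, 1])

# A model that ranks the risks well rises above the diagonal.
x_g, y_g = gains_curve([0.4, 0.3, 0.2, 0.1], observed)

# A model that ranks them poorly does not.
x_b, y_b = gains_curve([0.1, 0.4, 0.2, 0.3], observed)

print(area_above_diagonal(x_u, y_u),
      area_above_diagonal(x_g, y_g),
      area_above_diagonal(x_b, y_b))
```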
     
    Last edited: Apr 29, 2014
    redzer and jonathans like this.
  3. George88

    George88 Member

    Thanks Katherine
     
  4. redzer

    redzer Member

    Rather than starting a new thread, I thought I'd post here.

    This is about Sept 2015 Q10 (ii).

    The examiners' report says the following about the reference line:

    "A reference line is created by dividing the cumulative observed values evenly
    against the cumulative exposure"

    Can you explain what they mean by this?

    I much prefer your explanation of the reference line being a hypothetical dataset where every policy has the same degree of risk.

    Regards,
    R
     
  5. Hemant Rupani

    Hemant Rupani Senior Member

    Assume a dataset of n policies with corresponding predictor variables, all with equal weight, as required.
    Let the out-of-sample size be m.
    To build the gains curve:
    1. Fit a model.
    2. Get the predicted outcomes for the out-of-sample data.
    3. Sort the predictions from (2) in descending order.
    4. In the order given by (3), put the corresponding cumulative exposure on the x-axis (with equal weights this is simply 1 to m).
    5. In the same order, put the corresponding cumulative observed outcome (the real values, from before modelling) on the y-axis, so that each cumulative exposure from 1 to m is paired with its cumulative observed outcome.
    6. If the model is better than the one it is being compared with, we can expect the observed values in (5) to start high and end low, so that more area is covered under the gains curve.

    Now for the reference line, which serves as a relative measure:
    It is just the diagonal of the rectangle formed by (4) and (5).
    If you draw that rectangle and spread the y-axis total evenly across the x-axis, you get the diagonal.
    Mathematically:
    Let the total of (5) be j.
    Then the reference line is the segment y=(j/m)*x inside the rectangle.
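
    Written out, with one unit of exposure per policy, the reference line could be expressed as below. The symbols G, R and y_(i) are notation assumed here for illustration; m and j are as defined above.

```latex
% Assumed notation: y_{(i)} is the observed outcome of the policy with the
% i-th highest predicted value; m is the out-of-sample size and j the total
% observed outcome.
\begin{align*}
G(k) &= \sum_{i=1}^{k} y_{(i)}, \qquad k = 0, 1, \dots, m, \qquad G(m) = j
  && \text{(gains curve)} \\
R(k) &= k \cdot \frac{j}{m}
  && \text{(reference line: each policy contributes the average } j/m\text{)} \\
\frac{R(k)}{j} &= \frac{k}{m}
  && \text{(rescaled to the unit square: the diagonal from } (0,0) \text{ to } (1,1)\text{)}
\end{align*}
```

    Read this way, "dividing the cumulative observed values evenly against the cumulative exposure" amounts to giving every policy the same share j/m of the total, which matches the hypothetical dataset where every policy has the same degree of risk.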
     
