
Some questions about CMP CS2 CH17 to CH21

ykai

1. CMP CS2-CH20-question 20.2-(i)
Where does the notation u come from?
I can't find it in the question.

2. CMP CS2-CH21-question 21.9-(ii)
How do I decide which nodes to include in the calculation?
It is clear that each tree has 3 nodes (tests 1, 2, 3 for the first tree and tests 4, 5, 6 for the second), but the answer only uses 2 for each tree.

I think it may be as follows for tree 1:
G_1=1-(3/4)^2-(1/4)^2
G_2=1-1^2
G_3=1-(4/8)^2-(4/8)^2

G=4*G_1+0+8*G_3

I think it may be as follows for tree 2:
G_1=1-(5/6)^2-(1/6)^2
G_2=1-1^2
G_3=1-(3/6)^2-(3/6)^2

G=6*G_1+6*G_3

3. CS2 Assignment X5-5.4-(ii)
Why is a copula with lower tail dependence suitable?
Is this because a lower tail dependence copula is denser at low probabilities in the graph on CMP CS2-CH17-page29?
Does this mean that the correlation is higher when the probability is low?
 
For 2, I mean that G_2 is 0 because all the items are of the same type.

I get this from CMP CS2-CH21-page35: "So, if all the items are of the same type, the probability will be 0."
sum^K_(k=1)[p_jk(1-p_jk)]

On pages 37 and 38:
the first tree sums from node 1 to 2
the second tree sums from node 1 to 3
Shouldn't question 21.9 sum from node 1 to 4 for each tree?
I mean sum^3_(k=1)[p_jk(1-p_jk)].

For greedy splitting, we should sum from node 1 to 3 to maximise the reduction in a loss function.
To maximise (3 node formula - 4 node formula), shouldn't we use the 3 node formula, because the 4 node formula = 0?
Max [sum^3_(k=1)[p_jk(1-p_jk)]-sum^4_(k=1)[p_jk(1-p_jk)]]
 
Hi Ykai

1. Apologies, there is actually a typo in the question. The first line should read:

.... Poisson distribution with parameter mu.

The 'X = 1' is wrong. Again, apologies for the confusion here; we will get a correction issued.

2. This question is asking about the initial split point. So, for the first tree, it is considering the outcome of test 1 only, which splits the data into a node with BBBBCCCDDDD and a node with AAAB.
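The impurity of the two nodes produced by that first split can be checked with a short sketch. This is only an illustration: the function name `gini_impurity` is made up here, and combining the two nodes by weighting each by its size is an assumption, not necessarily how the course solution presents it.

```python
from collections import Counter

def gini_impurity(labels):
    """Return sum over categories of p_k * (1 - p_k) for one node."""
    n = len(labels)
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())

# The two nodes created by the first split of the first tree (test 1 only)
left = list("BBBBCCCDDDD")
right = list("AAAB")

g_left = gini_impurity(left)     # 1 - (4/11)^2 - (3/11)^2 - (4/11)^2
g_right = gini_impurity(right)   # 1 - (3/4)^2 - (1/4)^2

# Assumption: combine the nodes by weighting each by its share of the items
n_total = len(left) + len(right)
weighted = (len(left) * g_left + len(right) * g_right) / n_total

print(round(g_left, 4), round(g_right, 4), round(weighted, 4))
# prints: 0.6612 0.375 0.5848
```

A node containing only one category, such as CCC, gives an impurity of 0 under the same function.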

3. The solutions here are comparing the probabilities and saying, on the basis of this calculation, that the Clayton copula appears more suitable given the probability is higher. This is what we'd expect due to the widowhood effect / broken heart syndrome. The solution doesn't discuss tail dependency.

4. As k tends to infinity 1/k tends to 0 and 1 - 1/k tends to 1. In terms of what this means, it is saying that the worst possible purity score increases as the number of categories increases. For example, if k = 2, the worst possible purity is 1 - 1/2 = 1/2. If k = 3, the worst possible purity score is 1 - 1/3 = 2/3 and so on.
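As a quick numerical check of those values, the sketch below evaluates the impurity sum for a node split equally across K categories and compares it with 1 - 1/K. The function name `impurity` is hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def impurity(proportions):
    """Return sum of p * (1 - p) over the category proportions of a node."""
    return sum(p * (1 - p) for p in proportions)

for K in (2, 3, 4, 10):
    uniform = [Fraction(1, K)] * K          # equal share in every category
    print(K, impurity(uniform), 1 - Fraction(1, K))
# prints:
# 2 1/2 1/2
# 3 2/3 2/3
# 4 3/4 3/4
# 10 9/10 9/10
```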

Hope this helps

Andy
 
Thank you for your reply, but I still have one question about 4.
What is the relationship between the "measure of purity for the jth external node" and "1-1/k"?
How is this result obtained?
 
The measure of purity (or perhaps more accurately, impurity) for a particular node j is given by:

sum(k = 1, K) [pjk(1-pjk)]

where pjk is the proportion of individuals in node j in category k and K is the total number of categories.

This formula is effectively the probability that two items selected at random from the node are of different types. So, the higher this is, the more impure the node is (the more mixed up the node is with different categories). The lower this is, the purer the node. The lowest value possible is 0, where it is not possible to pick two items of different types (as a perfectly pure node only contains one category).

The value of 1 - 1/K is the worst possible value (highest) this measure can take, which depends on the number of categories, K.

For example, the worst possible purity for a node when there are two categories is when there is an equal number of each category in that node (so it is not pure at all, it is as mixed up as it can be). In this case, we get:

sum(k = 1, K) [pjk(1-pjk)] = 0.5 * (1-0.5) + 0.5 * (1-0.5) = 0.5.

Using the formula of 1 - 1/K for the worst case, we get 1- 1/2 = 0.5, which matches as expected.
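For general K, the same result can be sketched as a short derivation. The notation p_{jk} follows the formula above; the inequality step uses the Cauchy-Schwarz inequality, which is an addition here and not part of the course notes quoted in this thread.

```latex
\[
\sum_{k=1}^{K} p_{jk}(1 - p_{jk}) = 1 - \sum_{k=1}^{K} p_{jk}^{2}
\qquad \text{since } \sum_{k=1}^{K} p_{jk} = 1 .
\]
\[
\sum_{k=1}^{K} p_{jk}^{2} \ge \frac{1}{K}\Bigl(\sum_{k=1}^{K} p_{jk}\Bigr)^{2} = \frac{1}{K}
\quad\Longrightarrow\quad
\sum_{k=1}^{K} p_{jk}(1 - p_{jk}) \le 1 - \frac{1}{K},
\]
\[
\text{with equality when } p_{jk} = \tfrac{1}{K} \text{ for all } k:\quad
K \cdot \tfrac{1}{K}\Bigl(1 - \tfrac{1}{K}\Bigr) = 1 - \tfrac{1}{K}.
\]
```

So the measure is largest when the node is split equally across all K categories, which is the "as mixed up as it can be" case.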

Hope this helps!

Andy
 