[SOUND] Now, we come down to compare these null invariant measures. So, which one is better? We know, we can sense not all those null-invariant measures are created equal so, we want to see which one's better in all the cases. Let's examine the two variable contingency table of the transactions containing milk and coffee. Let's look at the case in this data set. So the first, you look at D1 and D2, D1 and D2, you probably can see the difference is only on the number of null transactions. But you also can see very likely, milk and coffee should get together, they should be positive. In that sense, you can see all of those five, null-invariant measures, they give equal value. In this sense, no matter how many transactions on the null part. Okay, they do not change their value. And also they are very close to one, in the sense, these are possibly getting together. When you look at D3, D3 means mc getting together is quite rare because they get along more frequent. In that sense, all their values are very close to zero. Then you look at the D4, D4 says mc getting together or mc alone, they are all like 1,000 and 1,000 and 1,000 cases. No matter how many null transactions, they actually got things right in the middle this 0.5, 0.5. Only Jaccard's 0.33, actually just in Jaccard, this one means its balance is right in the middle. Then we look at cases, it could be D5 and D6. D5, if you see this is 1,110 solved cases. So what you probably can see, is from coffee point of view, like a coffee guy may say, mc are likely getting together, because they get along as 100 cases buying coffee but not milk. 1000 cases we're buying both coffee and milk. But for milk guy, they probably say, they are very unlikely getting together, because I got 10,000 cases buying milk but not coffee. But only 1000 case buying milk and coffee. So, in that case you look at different measures. It's interesting to see, All Confidence and Jaccard they also say it's closer to zero, unlikely getting together. But Max Confidence says it's close to one, they are very likely getting together. Then we look at a Kulczynski said, I'm right in the middle, because the tug of the war on each side is ten to one. Then Cosine said, I'm a little prone to unlikely getting together. Now, we change this one even more. This is 1,000 to ten, or 1,000 to 100,000. The coffee guys said they are very, very likely getting together. But the milk guy said, they are very unlikely getting together. Now in this case, you probably can see All Confidence and Jaccard drop down to 0.01, and even cosine dropped down to 0.1. But if Max Confidence says, I am very confident they are very close to one, because they are very likely getting together. But in Kulcyzynski said, I'm still in the neutral because this is 100 to one, the other is one to 100, they have the equal ratio. So, which one do you like? So, probably we can see D4 to D6. The real case is, that differentiate that five null-invariant measures. But we probably can see, Kulcyzynski measure holds firm when in these very imbalanced cases. But the ratio is balanced on both sides, and it holds firm at 0.5. That looks interesting. But on the other hand, we also know those cases, some are very imbalanced, we may want to introduce another measure called imbalance ratio. The imbalanced ratio is introduced in the sense, the support of item set A and support of item set B, their differences play important role in this imbalance ratio computation. Then you proceed for the same cases in the last three, the Kulcyzynski vector holds firms at 0.5. But the imbalance ratio, okay, D sub four cases is zero, because they are already balanced. And D sub five cases become 0.89, they are rather imbalanced. And D sub six cases, it is very imbalanced. So imbalance ratio, really can show you how balanced the two sides are. So we feel Kulcyzynski plus imbalance ratio, these two things getting together will present a clear picture for all the three data sets, D4 through D6. Because D4 is neutral and balanced, D5 is neutral but imbalanced, and D6 is neutral but very imbalanced. Finally, we're going to show you some real data sets like a DBLP data sets, we want to look at co-author relationships. So, let's look at this table. This table we got around year 2007. We study the recent database conferences, we look at those authors, they publish papers and they co-author papers in database conferences. But you can probably see, the interesting thing is, for example you look at Hans-Peter Kreigel and Martin Pfeifle. Martin Pfeifle got 18 papers, but all of them are with with Hans-Peter Kriegel. The Hans-Peter Krieger got 146 papers, 18 was with Martin Pfeifle. Well, okay, you can see in that cases Kulcyzynski shows pretty strong value that simply says, these two authors are closely tied together in some way. But they are imbalanced as well. We can see the case of imbalanced ratio, you can either calculate the imbalance ratio is really high. So in that case, you probably can easily judge Hans-Peter, Kriegel likely to be the adviser of Martin Pfeifle. So using Kulcyzynski and imbalance ratio, we can easily see advisor-advisee relationships, and close collaborators. In one research paper on finding advisor-advisees, we are really using those measures to find them with reasonably high accuracy. So finally, we will show you a bunch of papers. These are the papers quite representative on how to judge the correlation relationship, the different measures, including their interest in measure, the non-variant ones and the measure we discussed on the Kulcyzynski. Thank you. [MUSIC]