Okay, welcome back everyone to our second video. As promised in the last one, I'm now going to give you a real world example to ground some of the abstract notions from before. Let's give an example from medical testing, but all we're going to do is use just the vocabulary of set theory. By the end of this video, we're going to be able to understand the idea of false negatives, false positives, false negative rate, false positive rate, and things like that. All this will be very, very relevant and not just to medical testing vocabulary in the future, but also to lots of other things, like machine learning. So let's jump right in, let's invent an awful condition which we're going to call VBS, for Very Bad Syndrome. This is something that we don't want to have, and something that fortunately scientists have just developed tests for. Let's suppose that there's a set of people X, X is going to be equal to the set of people, In a clinical trial. That are taking the test that tells you whether or not, or purports to tell you whether or not, you have VBS. Let's divide X up into two sets, let's say that S is going to be equal to the set of people x in X, such that x has VBS. So why are we using S, S somehow stands for sick, these are the set of people who genuinely have VBS. Let's let H, H for healthy, stand for the set of x in X such that x does not have VBS. Okay, so now, let's note that X is equal to S union H, because you either have VBS, or you don't. It's a bit of idealized world, where we have good diagnoses, so you either have it, or you don't. And notice that S intersect H, therefore, is the empty set. If you either have it or you don't, there is no one who both has it and doesn't. Okay, in some sense the whole point of medical testing is to figure out whether you're in S, or you're in H. Let's think about what the test tells us, so let's take P equal a set of people x in X, such that x tests positive for VBS. That is, they come into the lab, they take the test, and whatever marker the test has for positive, they test positive. The doctor looks at the test and says, the test says you have VBS. it's very important to realize that is a different concept than, you have VBS. All that we're saying is you test positive for VBS. And let's let N, N for negative, be the set of x in X, Such that x tests negative for VBS. So again you take the test, doctor or the clinician looks at the test. Turns red or whatever the test needs to do, and it indicates you're test negative. Doesn't mean you're in the clear, doesn't mean you're okay, it just means you've tested negative. Now again, assuming the test is deterministic, notice, again, we have that P union N is everyone, and P intersect N is no one. Just like we had S union H is everyone, and S intersect N is no one. Okay, now, in an ideal world, we would have that S = P. That is, the sick people would be the ones who test positive, the ones who test positive would be the ones who are sick, and we would have H = N. This is not always the case, let's write four intersections which allow us to talk about those discrepancies, when that's not always the case. Let's consider S intersect P, let's consider H intersect N. Let's consider S intersect N, And let's consider H intersect P. Let's think about what these are, how big we want them to be, what they mean in the real world, and so on. So first S intersect P, what does it mean to be S intersect P? First it means that you're in, S, which means you have VBS, and it means you're in P, which means you test positive. These are what are often called the true positives. So the bad news is that you have VBS, the good news is the test told you accurately that you have VBS, so perhaps you can seek treatment. The second test, the second set here, H intersect N, that means you don't have VBS because you're in H, you're healthy. At least in regards to this disease, and, you are an N which mean you test negative. These are true negatives. In some sense these are the most fortunate, the very good news is that you don't have the disease. And the somewhat good news is, the test told you, so you don't have to worry. Okay, these next two are ones which we would prefer not exist. We would really prefer that the next two sets were empty, they almost never are. So S intercept N means that you're in S, so you have the disease, but you're also in N, which means you tested negative. So these are what are called false negatives. False negative results, it's not really nice to call the people false negatives, but sometimes we do. And those are people who have the disease, but unfortunately the test told them not, so in a way, they have false hope. They don't have enough information that might help them act on the disease, act on treatment for the disease. We'd like that set of people to not be big. Often tests aren't that good, so often the site is really large. Let's talk about H intersect P, those are people who are in H, which means they do not have the disease, but they're in P, which means they test positive. As you might guess, those people are false positives. You know why these people look a little bit more fortunate than the ones in S intersect N, but they're still quite worried, or they don't have the disease, they don't know that. They tested positive and so they might worry unnecessarily, they might even get treatment which would have some negative side effects with no payoff, because they don't actually have the disease. So what's often interesting is comparing the cardinalities of all of these various sets and that will allow us to talk about mcap that's useful later in medical testing. And even generalizes to sort-of general machine learning, which you'll see much, much later in this course and in the following one. Let's think about the cardinality of S divided by the cardinality of X. So notice, this number has to be less than or equal to 1, because everyone who is in S is in X. So what is this equal to, this is equal to the proportion Of people in the study. Who do genuinely has VBS. By the way, a little side note. When you design a study, you would like it that this proportion might actually represent the proportion of people in a much larger population of VBS, a representative sample. That might be true, that might not be true, and care is needed when you design studies to think about that relation. So for example if you take the people in the study are just people in a VBS clinic. Probably this proportion is very close to 1, but it doesn't represent the true proportion out in, say. the United State of America. So it depends what you're trying to estimate. Let's consider the cardinality of H divided by the cardinality of X. This is the proportion of people in the study without, w/o, VBS. What should we get, by the way, when we add these two quantities? Think about it a second, we better get 1, because you either have it or you don't. Okay, those are interesting, but here are two far more interesting quantities. Let's think about this cardinality of S intersect P. Remember what those were, those were the true positives. Divided by the cardinality of S, so what are those? In the numerator, on top, we have the number of people who are true positives, and in the denominator we have the number of people who are sick. This is what is called the true positive rate. We'd like that to be big. A number we would like to be small is, let's look at H intersect P. The cardinality of that, divided by the cardinality of H. So in the numerator, we have the false positives, the people who are actually healthy, but the test tells them they're positive. In the denominator, we have the people who are actually healthy. This is called the false positive rate. We would like that to be as close to 0 as possible, often it isn't. And then two other quantities which might make sense. The size of S intersect N, divided by the size of S. Let's let you think about what that is. So what is that, that's the false negative rate. And again, we would like that to be as small as possible. Finally, H intersect N divided by H is the true negative. To summarize and simplify, really, violently so, all of medical testing theory into one sentence. You would like the test to be such that the true positive rate and the true negative rate are essentially 1, and so that the false positive rate and the false negative rate are essentially 0. That never, ever happens, if so, you'll win Nobel prizes, you'll make a lot of money. So what generally happens is the true positive rates, ideally, are close to 1. And the false positive rates ideally are close to 0. Later, when you start thinking about business analytics, you might start thinking about what type of false positive rates are acceptable. How large can they be, how small can they be? And it's important to realize that this generalizes far beyond medical testing. If you have something that's true, you're either sick or you're healthy, you're either green or you're blue. And you have a test that tells you whether or not the test thinks you're green or blue. You can use vocabulary like true positive, false negative. True positive, false negative, and a lot of that goes into machine learning, supervised learning, as you'll see later.