Hi, and welcome back. In this module, we're going to look more in-depth at expectation and variance, in preparation for defining covariance and correlation. Those are the big topics for this module: covariance and correlation. Before we get to that, we need two other videos. The first will be a deeper dive into the mean, variance, and standard deviation of a function of a random variable. The next video will talk about the concept of jointly distributed random variables. Then we'll move on to covariance and correlation. Let's begin.

I want to start with several motivating examples. In statistics and data science, we collect data from several random variables. One thing we frequently need to do is understand and quantify the strength of their interactions. For example, the relationship between the length of time a student studies and their score on an exam. Or the relationship between male and female life expectancy in a certain country. Or the relationship between the quantities of two different products purchased by a consumer. All of these involve two different random variables, and you can think of other examples as well. That's our goal.

Before we get there, I want to review expected value. You'll recall that the expected value of a discrete random variable is the sum, over all possible values k that the random variable can take on, of k times the probability of getting that value. In the continuous case, we get the integral from minus infinity to infinity of x times f(x) dx. The x is playing the role of the k, and the f(x) dx is playing the role of the probability that X equals k.

Now, what happens if you have a function of a random variable? What do we get in that case? We've actually used the expected value of g(X) before, and you'll see that in a minute. If X is discrete, the expected value of g(X) is the sum over k of g(k) times the probability that X equals k. If X is continuous, it's the integral from minus infinity to infinity of g(x) times the density function f(x) dx.

Now, what about the expected value of aX + b? We've actually done that already, so let's just review. If X is discrete, we get the sum over all possible values k of (ak + b) times the probability that X equals k. Using properties of summation, we can separate that into a times the sum over k of k times the probability that X equals k, plus b times the sum over k of the probability that X equals k. That last sum is all of the probability, so it's one, and the first sum is our definition of the expected value of X. What we end up with is a times the expected value of X, plus b. We've actually done this example before; this is just a review.

What about an example of such a situation? Suppose a university has 15,000 students, and X is the number of courses for which a randomly selected student is registered. These numbers are just made up. The probability mass function gives X = 1 a probability of 0.01, X = 2 a probability of 0.03, and so on; you can see that most students are registered for four or five, maybe six, classes. Now, if a student pays $500 per course and a $100 per-semester registration fee, what is the average amount a student pays each semester? Let X be the number of courses a student takes. A student pays $500 per course plus the $100 registration fee, so we want the expected value of 500X + 100.
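Before we plug in the numbers, here's a minimal sketch in Python of the discrete formula for E[g(X)]. The helper `expectation` and the small pmf are made up for illustration; they are not taken from the course materials.

```python
# A minimal sketch of E[g(X)] for a discrete random variable.
# The pmf below is a made-up toy example, not the course's table.

def expectation(pmf, g=lambda k: k):
    """Return E[g(X)] = sum over k of g(k) * P(X = k)."""
    return sum(g(k) * p for k, p in pmf.items())

pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # toy pmf: k -> P(X = k)

a, b = 500, 100
ex = expectation(pmf)                           # E[X] = 2.1 here
e_lin = expectation(pmf, lambda k: a * k + b)   # E[aX + b] from the definition

# Both print ~1150, illustrating E[aX + b] = a*E[X] + b.
print(e_lin, a * ex + b)
```

Computing E[aX + b] directly from the definition and computing a times E[X] plus b give the same number, which is exactly the linearity fact we just derived.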
We know this is going to be 500 times the expected value of X, plus 100; we just calculated that on the previous slide. Now we need to calculate the expected value of X directly. That's the sum from k = 1 to 7 of k times the probability that X = k. We've done this type of calculation before: it's one times its probability, plus two times its probability, plus three times its probability, and so on, up to seven times 0.02. When you work all that out, you get 4.57. The expected value of 500X + 100 is then 500 times 4.57, plus 100, which comes to $2,385.

Now, what about variance? How does this change when we look at the variance? We know the definition of the variance of a random variable X; whether X is discrete or continuous, it has exactly the same definition. We take X minus mu, quantity squared, and take the expectation of that. This fits our framework: here g(X) is (X − mu)², and we have done that calculation before. We also know the computational formula for the variance: the expected value of X² minus the square of the mean. These are all equivalent statements. Written out, the variance of X is the sum over all values k that X takes on of (k − mu)² times the probability that X = k, if X is discrete, and the integral from minus infinity to infinity of (x − mu)² times f(x) dx, if X is continuous.

Now, what happens with the variance of g(X)? It works exactly the same way, discrete or continuous. We get the sum over all k of g(k) minus the expected value of g(X), that whole thing squared, times the probability that X = k; this is for X discrete. If X is continuous, it looks very similar: the integral from minus infinity to infinity of g(x) minus the expected value of g(X), squared, times the density function, dx. In all of these cases, we first have to compute the expected value of g(X), and then we can put that into the variance calculation.

Let's see what happens if we use the original definition on the linear function aX + b. We've got: the variance of aX + b equals the expected value of aX + b minus the mean of aX + b, all squared. Let's expand this out. Remember that the mean of aX + b is a times the expected value of X, plus b, so inside the square we have aX + b minus a times the expected value of X, minus b, all squared. Notice the b's cancel, and we can factor out the a: we're left with a² times (X minus the expected value of X), squared. The a², being a constant, can come out in front of the expectation, so we end up with a² times the expected value of (X minus the expected value of X)². Notice this is a² times the variance of X.

Here's the question: what happened to b? Why does b not affect the variance calculation? This is worth thinking about. Recall, and we just did this, that the expected value of aX + b is a times the expected value of X, plus b. The expected value tells us something about the mean: where is the center of the data? It's a measure of the center of the data. If we multiply by a, we're scaling by a, and the b is a shift; we're shifting by b. That scales the original mean by a and then shifts it by b.
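Here's a numerical companion to that derivation, reusing the same illustrative helpers and a made-up toy pmf (again, not the course's table): applying the variance definition directly to aX + b returns the same number for every choice of b, namely a² times the variance of X.

```python
# Numerical check (toy pmf, made-up values) that the shift b drops out
# of the variance while the scale a comes out squared.

def expectation(pmf, g=lambda k: k):
    return sum(g(k) * p for k, p in pmf.items())

def variance(pmf, g=lambda k: k):
    mu = expectation(pmf, g)                             # E[g(X)]
    return expectation(pmf, lambda k: (g(k) - mu) ** 2)  # E[(g(X) - E[g(X)])^2]

pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # toy pmf: k -> P(X = k)
a = 500
var_x = variance(pmf)            # Var(X) = 0.49 for this toy pmf

for b in (0, 100, 10_000):
    # Definition applied to g(X) = aX + b: prints ~122500 every time,
    # which is a^2 * Var(X), regardless of the shift b.
    print(b, variance(pmf, lambda k: a * k + b), a ** 2 * var_x)
```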
The variance, on the other hand, measures the spread of the data. Shifting the whole distribution from a center at the expected value of X to a center at the expected value of aX + b just moves the entire set of data; it has no effect on the spread, because shifting doesn't change the spread at all. I'll just write that down: variance measures the spread of the data, and the b shifts the data but doesn't affect the spread. That's why the b doesn't show up in the variance of aX + b; it doesn't appear on the right-hand side. You want to start building some intuition about what the expected value and the variance are doing for you.

Let's go back to our example at the university, the same exact example. We found that the expected value of X is 4.57, and the expected value of 500X + 100 is $2,385. We know the variance of X is the expected value of X² minus the square of the expected value of X. We have to compute that second moment: that's the sum from k = 1 to 7 of k² times the probability that X = k, and then we subtract 4.57². Put all that in and you get about 1.2651. What does the variance of 500X + 100 give us? It's just 500² times the variance of X. When we put all that in, we get about 316,275.

Now we understand how to compute expected values and variances, particularly for linear functions of a random variable. In the next video, we'll look at what happens for functions of random variables when you have two random variables: not just an X, not just a Y, but X's and Y's together in the same problem. We'll see you then.
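One closing sketch before the next video: the whole university example checked end to end. A caution on the inputs: the lecture quotes only P(X = 1) = 0.01 and P(X = 2) = 0.03, so the remaining pmf values below are an assumption on my part, chosen because they sum to one and reproduce the stated E[X] = 4.57 and Var(X) ≈ 1.2651.

```python
# End-to-end check of the university example.
# Only P(X=1)=0.01 and P(X=2)=0.03 are quoted in the lecture; the rest of
# this pmf is assumed (it sums to 1 and matches the stated E[X] = 4.57).
pmf = {1: 0.01, 2: 0.03, 3: 0.13, 4: 0.25, 5: 0.39, 6: 0.17, 7: 0.02}

ex    = sum(k * p for k, p in pmf.items())       # E[X] = 4.57
ex2   = sum(k**2 * p for k, p in pmf.items())    # second moment E[X^2]
var_x = ex2 - ex**2                              # computational formula, ~1.2651

print(500 * ex + 100)     # E[500X + 100]   -> 2385.0 dollars
print(500**2 * var_x)     # Var(500X + 100) = 500^2 * Var(X) -> ~316275
```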