Let's talk about normal likelihoods now. So we assume that y given x is normal with mean X beta and variance sigma squared times the identity. And I just want to emphasize here that we're conditioning on X. We're not considering the variability in X, for the most part, when we talk about linear models. The likelihood of beta and sigma squared given y and X is then (2 pi sigma squared) to the negative n over 2, times exp of minus 1 over (2 sigma squared) times the norm of y minus X beta, squared. If we wanted to obtain maximum likelihood estimates, we would maximize this likelihood with respect to beta and sigma squared.

But I often find, and in many cases you can see, that since everything is a product and exponentiated, it's easier to maximize the log of the likelihood than the likelihood itself. And because the log is an increasing, monotonic, continuous function, the argument maximum is going to be the same. In this case, though, rather than maximize the log of the likelihood, I'm going to minimize minus twice the log of the likelihood, which is called the deviance. Then I don't have to worry about any constants that don't depend on sigma or beta, so I can get rid of the 2 pi part. That leaves n log sigma squared, plus the norm of y minus X beta squared divided by sigma squared. Notice there's no negative sign on the n log sigma squared term, because this is minus twice the log of the likelihood. So I'm going to minimize this quantity to obtain the maximum likelihood estimates.

If I minimize this quantity with respect to beta, notice that minimizing the norm of y minus X beta squared minimizes the expression overall, and that's just the least squares criterion. So I get beta hat equals (X transpose X) inverse X transpose y. Another way to think about that is, if I were to fix sigma squared and optimize over beta, this expression is clearly minimized at the least squares estimate; and since the least squares estimate doesn't depend on sigma squared, it must be the optimal estimate overall.

Now, for homework I'm going to ask you to then optimize this with respect to sigma squared. But be careful: take derivatives with respect to sigma squared, not sigma. It may help to replace sigma squared by theta or something, so you know you're not taking derivatives with respect to sigma. When you do that, you get that the estimate of sigma squared is just the norm of y minus X beta hat squared over n, which is, of course, e transpose e over n. So we get the biased version of the variance estimate. That's fine; maximum likelihood estimates are not guaranteed to be unbiased. It works out to be unbiased in the beta case, but it works out to be the biased version in the sigma squared case.

So hopefully what you can see is that there's a pretty deep connection in normal theory linear models between the maximum likelihood estimates and the least squares estimates. This is very useful, and the lines get sort of blurred between the two, so people often talk interchangeably about least squares and maximum likelihood for the slope coefficients in this case without being too precise. Now, there are instances where it matters. We don't have to extend linear regression by very much to get instances where things change.
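To make that connection concrete, here is a minimal numerical sketch in Python. The simulated data, seed, and variable names are purely illustrative (not from the lecture); it computes the least squares / maximum likelihood estimate of beta, then contrasts the ML variance estimate, which divides by n, with the usual unbiased estimate, which divides by n minus p.

```python
import numpy as np

# Illustrative simulated data (assumed values, not from the lecture)
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])
sigma2_true = 4.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

# Least squares / maximum likelihood estimate of beta: (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals e = y - X beta_hat
e = y - X @ beta_hat

# ML estimate of sigma^2 divides by n (biased); the usual unbiased estimate divides by n - p
sigma2_mle = e @ e / n
sigma2_unbiased = e @ e / (n - p)

# Minus twice the log likelihood (the deviance), with the 2*pi constant dropped
def deviance(beta, sigma2):
    r = y - X @ beta
    return n * np.log(sigma2) + r @ r / sigma2

print("beta hat:", beta_hat)
print("sigma^2 (ML, /n):", sigma2_mle, " sigma^2 (unbiased, /(n-p)):", sigma2_unbiased)
print("deviance at the ML estimates:", deviance(beta_hat, sigma2_mle))
```

You can check numerically that perturbing beta_hat or sigma2_mle in any direction increases the deviance, which is just the minimization we walked through above.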
So at any rate, what I want you to be able to do is some simple maximum likelihood calculations. In the book, I have an instance where I say that y given x is normal with mean X beta, but with a general variance matrix Sigma. What I'd like you to try to do for homework is figure out the maximum likelihood estimate for beta with Sigma fixed; so, find beta hat. If you can't figure it out, the solution is worked out in the notes, but try it on your own first. You get a very famous equation as an answer: the weighted least squares estimate. And then, also, if you can, try to figure out the estimate for Sigma. That's a little bit harder, but still, I think, doable. So that's a good exercise to build on what we learned today.
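If you want to check your homework answer numerically, here is a hedged sketch under the same setup, again with made-up data and a Sigma I construct purely for illustration: for a fixed, known Sigma, it computes (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, the weighted (generalized) least squares estimate that the notes derive.

```python
import numpy as np

# Illustrative data with a known, non-identity variance matrix (assumed values)
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.5])

# A fixed, known Sigma: diagonal (heteroskedastic) just to keep the example simple
Sigma = np.diag(rng.uniform(0.5, 3.0, size=n))
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), Sigma)

# Weighted (generalized) least squares: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sigma_inv = np.linalg.inv(Sigma)
beta_hat_wls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

print("weighted least squares beta hat:", beta_hat_wls)
```

In practice you would avoid forming Sigma inverse explicitly for large n, but for this exercise the direct formula makes the comparison with the ordinary least squares estimate transparent.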