The ideas of simple regression are fundamental. It is the starting point for many econometric methods and techniques. In simple regression, the focus is on two variables of interest that we denote by y and x, where the variable x is thought to be helpful to predict the other variable y. In this lecture, we'll have a look at another example and try to answer the question: do soccer teams score fewer goals in an away match without their supporters?
After the soccer match of February 15th, 2009, between Ajax Amsterdam and Feyenoord Rotterdam, the mayors of Amsterdam and Rotterdam decided that the football teams were no longer allowed to bring their own fans to the away match. So when there was a match between Feyenoord and Ajax, there were no fans of Ajax in the Rotterdam stadium, De Kuip, and there were no fans of Feyenoord in the Amsterdam stadium. This made me wonder: after this measure, does Ajax score fewer goals in the Rotterdam stadium, and at the same time, does Feyenoord score fewer goals in Amsterdam?
To answer this question, we need data from the first ever Feyenoord versus Ajax match until February 15th, 2009, and from after that date until the beginning of 2019. Before that date, there were 169 matches. After it, there are 23 matches. The number of goals of the home playing team is presented in this figure. On the x-axis, you see the match number, and the y-axis shows the number of goals. The data cover 192 matches in total. The goals of the away playing team are given in this figure.
Now, let us see if there is a difference in the number of goals for both teams before and after February 15th, 2009. To do this, you can run a simple regression where the y variable is the number of goals, and the x variable is a so-called dummy variable, which takes a value of zero for the first 169 observations and a value of one for the last 23 observations. Hence the regression model looks like this.
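In notation, the model just described can be written roughly as follows. This is a reconstruction from the spoken description, since the slide itself is not part of the transcript. The symbols y_i (goals in match i), D_i (the dummy), and epsilon_i (the error term) are notational assumptions, and the second line assumes that the error term has mean zero given the dummy.

```latex
\begin{align*}
y_i &= \alpha + \beta D_i + \varepsilon_i, \qquad
D_i = \begin{cases} 0, & i = 1, \dots, 169 \\ 1, & i = 170, \dots, 192 \end{cases} \\
\mathrm{E}[y_i \mid D_i = 0] &= \alpha, \qquad
\mathrm{E}[y_i \mid D_i = 1] = \alpha + \beta
\end{align*}
```

Read this way, Alpha is the average number of goals in the first 169 matches, and Beta is the change in that average for the last 23 matches.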
Before we run the regression, look again at the two graphs with the data on goals. It seems that the fluctuations are pretty large, so large that one may wonder if the variance of the error term is varying over time, perhaps because of different players in the team. If that variance is not constant, we call it heteroscedasticity, as opposed to homoscedasticity, a constant variance. To take account of this possibility, one can use the so-called White heteroscedasticity-consistent standard errors for b, named after the famous econometrician Halbert White. Consider again the simple regression, where Alpha is set to zero to save notation. The standard assumption is that the variance is the same for all observations, namely Sigma squared. If that is the case, then for the simple regression the variance of b equals this. However, when one suspects heteroscedasticity, that is, that the variance is Sigma squared i and hence different across the observations i, one can use this expression.
Now you may want to test whether the Beta in the regression of goals on the dummy is perhaps equal to zero. The key idea behind the simple regression model is that we assume that in reality, y and x are connected via this expression, but that in practice, we have fitted the following line. Under a range of assumptions, too long to quote here but all quite reasonable, it is possible to derive the following quite important results. First, it can be shown that b is an unbiased estimator of Beta, which means that the expected value of b is Beta. This is a great result. It means that the value of b obtained via the least-squares method is informative about the unknown presupposed link between y and x. But there is more. When it comes to making predictions, it's actually relevant to know if Beta is perhaps equal to zero. Because if it is, you had better not use x to predict y.
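The two variance expressions mentioned for the case where Alpha is set to zero are shown on slides that the transcript does not reproduce. A sketch of what they would look like in the standard textbook form is given here. The residual notation e_i and the summation limits are assumptions, and the last line is the usual White estimator, in which the unknown Sigma squared i is replaced by the squared residual.

```latex
\begin{align*}
y_i &= \beta x_i + \varepsilon_i, \qquad
b = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} \\
\operatorname{Var}(b) &= \frac{\sigma^2}{\sum_{i=1}^{N} x_i^2}
  && \text{if } \operatorname{Var}(\varepsilon_i) = \sigma^2 \text{ for all } i \\
\operatorname{Var}(b) &= \frac{\sum_{i=1}^{N} x_i^2 \sigma_i^2}{\left(\sum_{i=1}^{N} x_i^2\right)^2}
  && \text{if } \operatorname{Var}(\varepsilon_i) = \sigma_i^2 \\
\widehat{\operatorname{Var}}(b) &= \frac{\sum_{i=1}^{N} x_i^2 e_i^2}{\left(\sum_{i=1}^{N} x_i^2\right)^2},
  \qquad e_i = y_i - b x_i
\end{align*}
```

When all Sigma squared i are equal to Sigma squared, the third line collapses to the second, so the constant-variance formula is a special case.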
Even though b will always be some number, can we say that Beta is close enough to zero to call it zero? To be able to say something about the closeness of Beta to zero, while actually having only the value of b, we need a confidence interval around Beta, and we create a t-test like we did in an earlier video. If that confidence interval includes zero, we can then say that Beta is not significantly different from zero. With some extra derivations, it is possible to show that when the number of observations is sufficiently large, we have the following result, where N(0, 1) is again the standard normal distribution and where the standard error can be computed as follows. This last expression suggests that the larger N is, the more precise our estimate b is. Finally, given the standard normal distribution, an approximate 95 percent interval for b minus Beta, divided by the standard error of b, is minus two to two. Hence, in case the true Beta is zero, the following interval includes zero in 95 percent of the cases.
So here's the result for the Feyenoord and Ajax matches. For the goals scored by the home playing team, we get this outcome as estimates for Alpha and Beta, where we used the White corrected variance to compute the standard errors in parentheses. Clearly, Beta is not significantly different from zero at the five percent level, as the t-test value was equal to this. When we consider the goals scored by the away playing team, who played without their supporters in the last 23 matches, we get the following result. Note that this b is almost twice as large as the earlier b, while at the same time the estimated standard error is much smaller. The 95 percent confidence interval for the associated Beta is the following. Indeed, this latter interval does not include zero, although admittedly only marginally. What we can conclude from this is that in the period with their supporters present, the away playing team scored 1.663 goals on average.
In the shorter period without their supporters present, the number of goals is 1.663 minus 0.532, which is 1.131 on average. Of course, we cannot conclude whether this smaller number of goals is due to the decision of the mayors. So we cannot conclude anything on causality. There can be many more reasons why the scores were lower. We can only conclude that in the last 23 matches, the away playing teams scored fewer goals on average. That's all.
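To make the steps of this lecture concrete, here is a minimal sketch in Python of a regression on a dummy with White standard errors, a t-value, and an approximate 95 percent interval of plus and minus two standard errors. The goal counts are synthetic and made up for illustration; they are not the Feyenoord and Ajax data. The function name `dummy_regression_white` and the Poisson means are assumptions, and the variance used is the basic White form (often called HC0), which may differ from the exact correction used in the lecture.

```python
import numpy as np

def dummy_regression_white(y, d):
    """OLS of y on a constant and a 0/1 dummy d, with White (HC0) standard errors."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones_like(d), d])    # columns: constant, dummy
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y                     # least-squares estimates (a, b)
    resid = y - X @ coef
    meat = X.T @ (X * resid[:, None] ** 2)       # X' diag(e_i^2) X
    cov = XtX_inv @ meat @ XtX_inv               # White sandwich covariance
    se = np.sqrt(np.diag(cov))
    return coef, se

# Synthetic goal counts (assumption): 169 matches before, 23 matches after.
rng = np.random.default_rng(0)
d = np.concatenate([np.zeros(169), np.ones(23)])
y = rng.poisson(lam=np.where(d == 0, 1.7, 1.1)).astype(float)

(a, b), (se_a, se_b) = dummy_regression_white(y, d)
t_stat = b / se_b                                # t-value for Beta = 0
ci = (b - 2 * se_b, b + 2 * se_b)                # approximate 95% interval

print(f"a = {a:.3f} (se {se_a:.3f}), b = {b:.3f} (se {se_b:.3f})")
print(f"t-value = {t_stat:.2f}, interval = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"mean before = {y[d == 0].mean():.3f}, mean after = {y[d == 1].mean():.3f}")
```

With a dummy as the only regressor, a is the average of the first group and a plus b is the average of the second group. This is the same arithmetic as 1.663 minus 0.532 giving 1.131 for the away playing teams.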