Okay, let's stop and take a few minutes to summarize our objective and what we have learned so far. Our objective in risk management is to come up with a statistical model of the future distribution of the returns of our portfolio. For the sake of illustration, we are using the Wilshire 5000 index as our portfolio. We are interested in the future distribution of the return of this index, particularly in the left tail. Of course, we do not know what this future distribution really is. We can only observe the historical returns of this portfolio. So we have to make some assumptions about how the future distribution is related to the historical distribution. Then, when we analyze the historical distribution, it can tell us something about the future distribution.

Up until now, we have made two assumptions. We have not explicitly stated these assumptions, but it is now time to do so. The first assumption is that the future distribution of the log returns of the Wilshire 5000 index is the same as its historical distribution. In particular, we have seen that the historical distribution is not normally distributed; it is better described by a t-distribution. We have estimated value at risk (VaR) and expected shortfall using the t-distribution. We have also used the actual empirical distribution of the data, because we have over 9,000 daily returns. Regardless of which method we use, we are basing our estimates of VaR and expected shortfall on historical data, and then assuming that these values can be applied to the future. There is, of course, no way to tell ahead of time whether the future distribution of returns is the same as the historical distribution of returns. It may be true some of the time, but it is probably not true all of the time.

The second assumption is that we are estimating the parameters of the historical distribution without paying any attention to the ordering of the data. Let me clarify this a bit more. When we first assumed the data to be normally distributed, we estimated the two key parameters of the normal distribution: the mean and the standard deviation. But the sample mean and the sample standard deviation do not use any information about the ordering of the data. What I am saying is that if we had randomly permuted the order of the log returns, we would still get the same sample mean and the same sample standard deviation. Similarly, when we assumed the data to come from a t-distribution, we estimated the three parameters of the t-distribution. Now, you will have to take my word for it, or check it yourself with the short sketch below, that the R function we used to estimate the three parameters of the t-distribution by the principle of maximum likelihood also does not depend on the ordering of the data. By the way, do you remember what that function was? It was fitdistr, the function in the MASS package. Again, if we had randomly permuted the ordering of the data, the estimates of the three parameters of the t-distribution would not change.

Now, suppose there actually is some information in the ordering of the log returns. We may be able to use that information to tell us something about the future distribution of log returns, in addition to what we have learned so far. Unlike assumption number one, assumption number two is actually testable. So we are going to run a couple of tests now. There are many ways to test whether the ordering of the data is important.
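Here is a minimal sketch of that check, assuming the daily log returns are stored in a vector called logret (that name is used here just for illustration). It permutes the returns at random and confirms that the sample mean, the sample standard deviation, and the fitdistr estimates of the t-distribution come out the same.

```r
library(MASS)                         # for fitdistr()

set.seed(123)
# logret is assumed to hold the daily log returns of the portfolio
perm <- sample(logret)                # the same returns in a random order

# The sample mean and standard deviation ignore the ordering of the data
all.equal(mean(logret), mean(perm))   # TRUE
all.equal(sd(logret), sd(perm))       # TRUE

# So does the maximum likelihood fit of the t-distribution
fit.orig <- fitdistr(logret, "t")
fit.perm <- fitdistr(perm, "t")
round(fit.orig$estimate, 6)           # m, s, df
round(fit.perm$estimate, 6)           # same three estimates
```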
We are going to use two of them. The first test looks for the presence of serial correlation. There is a long history of research in financial markets that looks at serial correlation. To understand what serial correlation means, let's start with the correlation between two variables, say X and Y. The correlation coefficient between X and Y measures how they co-move together. We say that X and Y are positively correlated when they tend to move up and down together. To be more precise, if X is above its mean, then Y is likely to be above its mean as well. Similarly, we say that X and Y are negatively correlated when they tend to move in opposite directions. Again, to be precise, that means if X is above its mean, Y tends to be below its mean. Lastly, we say that X and Y are uncorrelated when they are neither positively nor negatively correlated.

Now, serial correlation is about how a variable X is correlated with its own past. To illustrate this point, let's consider X today and X last period. If X has positive serial correlation, then an above-average return last period is likely to be followed by an above-average return this period. Similarly, if X has negative serial correlation, then an above-average return last period is likely to be followed by a below-average return this period. Lastly, if X has no serial correlation, then an above-average return last period does not change the likelihood of an above-average or a below-average return this period.

Before we move on, let me say a few words about why serial correlation has been of interest in finance. You may have heard of the term market efficiency; perhaps you came across it in a basic finance course. The theory of market efficiency says that the observed price of an asset, such as a stock, fully reflects all information about that asset. So when new information comes in, it will change the price of the asset. Good news presumably will lead to an increase in its price; conversely, bad news will lead to a decrease in its price. Now, because all existing information has already been incorporated into the price of an asset, any new information must be a surprise. If the surprise is good news, the price of the asset will go up, and if the surprise is bad news, the price of the asset will go down. But there is no way to tell ahead of time whether the news is good or bad, so there is no way to tell whether the price will go up or down. The theory of market efficiency therefore gives rise to the so-called random walk model of prices. The random walk model simply says there is no way to predict whether the future price of an asset will go up or down relative to its current price. An implication of the random walk model is that returns have no serial correlation. So testing for serial correlation in stock returns has been viewed as a test of market efficiency.

Now, let's talk about how to test for the presence of serial correlation in our data. To do so, we need a little bit of notation. Let us denote x with a subscript t, or x sub t, to be the value of a time series on day t. We then define its autocorrelation coefficient at lag j to be the Greek letter rho with a subscript j. The autocorrelation coefficient at lag j is the correlation coefficient between the series x sub t and the series x sub t minus j. This is just like the correlation coefficient between two variables, say Y and Z, except that here it is the correlation between a variable and itself some time in the past. The lag j can take on any value. A simple way to compute this coefficient for a given lag is sketched below.
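Here is a minimal sketch of that idea, again assuming the daily log returns are in a vector called logret. It computes the correlation between the series and a lagged copy of itself. (R's built-in acf function, which we use next, uses a slightly different denominator, so its numbers will differ a little from this simple version, but the idea is the same.)

```r
# rho_lag(x, j): correlation between x_t and x_(t-j), a hand-rolled
# version of the lag-j autocorrelation coefficient
rho_lag <- function(x, j) {
  n <- length(x)
  cor(x[(j + 1):n], x[1:(n - j)])
}

rho_lag(logret, 1)   # first-order autocorrelation coefficient
rho_lag(logret, 2)   # second-order autocorrelation coefficient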
When j is zero, we have the correlation of X with itself, which must be one by definition. So rho sub zero equals one. When j is one, we call rho sub one the first-order autocorrelation coefficient. When j is two, we call rho sub two the second-order autocorrelation coefficient, and so on. It is convenient to use a function and a graph to summarize the autocorrelation coefficients for many lags. The autocorrelation function, or ACF, returns rho sub j for any given lag j, so it is natural to graph the ACF as a function of the lag j.

Here is how to create a graph of the autocorrelation function of the log returns of the Wilshire 5000 index. There is an R function, acf, which does this. It plots rho sub j for each value of j, starting with j equals zero. Normally, R will pick how many lags to show depending on the number of data points in the series. In this particular case, the acf function is showing the autocorrelation coefficients up to lag 40. I suggest you read the documentation for the acf function to get more details. Besides plotting the autocorrelation coefficients, the acf function also provides two dashed lines. These represent the 95 percent confidence band around zero for the autocorrelation coefficients. If many autocorrelation coefficients fall outside the dashed band, then there is evidence of serial correlation. If most or all of the autocorrelation coefficients fall inside the dashed band, then there is no strong evidence of serial correlation. In the case of the daily log returns of the Wilshire 5000 index, as you can see in the graph, there is no strong evidence of serial correlation.

So here is the takeaway so far. We do not find much evidence of serial correlation in the daily log returns of the Wilshire 5000 index. That means the direction of stock prices is not predictable. By the way, this is the typical finding by researchers looking at stock prices.
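For reference, here is a short sketch of the acf call described above, once more assuming the daily log returns are stored in a vector called logret; the plot title is just illustrative.

```r
# Plot the autocorrelation function of the daily log returns,
# with the 95 percent confidence band drawn as dashed lines
acf(logret, main = "ACF of daily log returns")

# To see the autocorrelation coefficients as numbers instead of a plot
acf(logret, plot = FALSE)

# lag.max controls how many lags are shown, for example:
acf(logret, lag.max = 40)
```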