I hope you'll remember that one of the issues we talked about in the discussion of unreplicated designs was the concern an experimenter might have about the effect of a severe outlier on the results. I have a little demonstration here that may ease your mind about outliers a bit. To create this scenario, I went back to our data from the resin plant experiment and took one of the treatment combinations, cd, that is, A and B at the low level and C and D at the high level. That value had been 75, and I turned it into a severe outlier by making it 375. That's a big outlier with respect to the rest of the data. I then calculated the factor effects and constructed normal and half-normal probability plots of the effect estimates, and those are the two figures you see on the slide. The normal probability plot is on the right, and the half-normal probability plot is on the left.

Let's look at the half-normal probability plot first. Remember, the line on a half-normal plot is supposed to emanate from zero, right here, so the negligible effect estimates should lie along that line and, in effect, point toward zero. Here the effects do cluster along a straight line, but notice that they don't point toward zero at all; they point somewhere quite different from zero. That's a clue, an indication, that there are outliers in the data. When you have outliers in the data, you will always see this kind of behavior on the half-normal plot. Now look at the full normal plot for the same scenario. Once again it's an odd-looking plot, but notice that the effect estimates form two clusters that essentially form straight lines. This is again a very unusual appearance, and it's an indication of one or more outliers in the system. So you'll very often see an outlier in your data reflected in this kind of behavior in either the normal plot or the half-normal plot, which makes these plots a very good check for whether you may have outliers. In some cases the outlier may be so big, so gross, that it's obvious; but even when it's not, these plots serve as useful diagnostics.

So what do we do when we have an outlier? What's the right strategy? One thing we could do is replace it with an estimate, and people do that quite a bit: eliminate that observation and replace it with some reasonable estimate. What's a reasonable estimate? I think a reasonable thing to do is to replace the outlier with a value that makes the highest-order interaction estimate equal to zero. Why would we do that? Well, what do you think about the highest-order interaction, ABCD, in the resin plant problem? It's probably negligible anyway; you don't believe it's real. So just set it equal to zero. Now you have a contrast with 16 terms: 15 of them are known, the cd observation is not, and the contrast is set equal to zero. Solve that equation for cd, and that works very well; I think it's a very reasonable approach. Another possibility is to drop that observation and analyze only the data you have, the 15 remaining runs. That disturbs the orthogonality of the design, it's no longer orthogonal, but the consequences of that are really minimal.
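To make the highest-order-interaction idea concrete, here is a minimal sketch in Python of how that imputation could be computed for a 2^4 design. This is not the course's own code: the function name impute_highest_order is an illustrative choice, and the responses in the demonstration are synthetic, not the resin plant data.

```python
# A minimal sketch, assuming a 2^4 design in standard order; this is not the
# course's code and the demo responses are synthetic, not the resin plant data.
import numpy as np

# Coded design columns in standard order (A changes fastest)
A = np.tile([-1, 1], 8)
B = np.tile(np.repeat([-1, 1], 2), 4)
C = np.tile(np.repeat([-1, 1], 4), 2)
D = np.repeat([-1, 1], 8)
ABCD = A * B * C * D                      # signs of the highest-order contrast

def impute_highest_order(y, suspect):
    """Value for run `suspect` (0-15, standard order) that forces the
    ABCD contrast of the 2^4 design to zero."""
    y = np.asarray(y, float)
    others = np.delete(np.arange(16), suspect)
    # contrast = sum_i ABCD[i] * y[i] = 0  ->  solve for the suspect response
    return -np.dot(ABCD[others], y[others]) / ABCD[suspect]

# Synthetic illustration: a main-effects-only response, then a gross outlier
# planted at the run labelled "cd" (A, B low and C, D high; index 12).
rng = np.random.default_rng(1)
y = 70 + 10 * A + 5 * C + 8 * D + rng.normal(0, 2, size=16)
y[12] += 300
print("imputed replacement for cd:", round(impute_highest_order(y, 12), 1))
```

Because the true highest-order interaction in the synthetic model is zero, the imputed value comes back close to what the run "should" have been before the outlier was planted, which is exactly the logic described above.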
The amount of correlation that it introduces among the effect estimates is so small that it's essentially negligible. Here's an example of that: our resin plant example with that outlier observation cd removed, and this is the half-normal probability plot of the effects. It looks essentially like the one we saw with all of the correct data, so removing that outlier had very, very little effect on the results.

I have one more thing I want to show you before we move on to a new topic. This is another example of an unreplicated factorial, one I call the drilling experiment. It was published years ago by Cuthbert Daniel, the individual who created the normal probability plotting method for analyzing data from unreplicated factorials. In this unreplicated factorial we're studying the advance rate of a drill, that is, how fast the drill pushes through the crust of the earth. This is an oil drilling process: A is the drill load, B is the flow rate of the drilling mud, C is the rotational speed of the drill, and D is the type of drilling mud that's used. The data are shown on the cubes in Figure 6-19, and here is the normal probability plot of the effects from this experiment. It's pretty clear that we have three large main effects, B, C, and D, and the BD and BC interactions also appear to be large.

Here is a normal probability plot of the residuals from this experiment, and it doesn't look very good. There is a straight-line portion, which makes some sense for normally distributed data, but several values lie considerably off that line, so there really isn't a nice straight line representing the residuals on that normal plot. On the right is a plot of residuals versus the predicted advance rate, residuals versus y-hat. Looking at this plot, I see a very distinct funnel-shaped appearance, and that is an indication of inequality of variance. In a sense, I suspected that might happen given a normal probability plot that looks like this one. If your underlying distribution is normal, the variance of a normal distribution is independent of the mean. But if your underlying distribution is not normal, if it's right-skewed, for instance, then as the mean of that distribution gets bigger, the variance gets bigger. So very often in non-normal, right-skewed distributions you find this problem of inequality of variance.

The way we typically think about fixing that problem is to employ a transformation: we transform the response y in an effort to stabilize the variance. Probably the most widely used class is the power family of transformations, in which the transformed variable y* equals y raised to some power lambda. For example, a lambda of one-half is a square root, a lambda of minus one is a reciprocal, and we take a lambda of zero to be the logarithm. So typically we employ one of these transformations, a square root, a log, or a reciprocal, and see whether it leads to an improvement in the modeling situation. Transformations are very useful in terms of stabilizing variance.
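As an illustration of how the power family acts on a response whose spread grows with its mean, here is a small hedged sketch in Python. The data are synthetic (gamma-distributed groups whose standard deviation is proportional to the mean), not the drilling experiment data, and the helper name power_transform is just an illustrative choice.

```python
# A minimal sketch, assuming synthetic data (not the drilling experiment):
# the power family y* = y**lambda, with lambda = 0 taken as the log.
import numpy as np

def power_transform(y, lam):
    """Power-family transformation: y**lam, with lam == 0 meaning log(y)."""
    y = np.asarray(y, float)
    return np.log(y) if lam == 0 else y ** lam

# Three groups whose standard deviation grows with the mean -- the same
# pattern that produces a funnel in a residuals-versus-fitted plot.
rng = np.random.default_rng(2)
means = [2.0, 8.0, 32.0]
groups = [rng.gamma(shape=4.0, scale=m / 4.0, size=200) for m in means]

for lam in (1, 0.5, 0, -1):
    sds = [power_transform(g, lam).std() for g in groups]
    print(f"lambda = {lam:>4}: group standard deviations ->", np.round(sds, 3))
```

For data like these, whose standard deviation is roughly proportional to the mean, the log (lambda = 0) is the member of the family that roughly equalizes the group standard deviations, which is one reason it is such a common first guess.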
If they stabilize the variance, they will typically also induce at least approximate normality in the data, and sometimes a transformation will actually simplify the underlying model.

How do we select the transformation? One way is simply empirical, trial and error: we try a couple of different transformations and see what happens. Sometimes you may have prior theoretical knowledge or experience that suggests the appropriate form of the transformation. For example, if you're working with variables like the molecular weight of a material and, say, temperature, many times we find that something like a log transformation is very appropriate for that kind of situation. There has also been work to develop an analytical method for selecting lambda, something called the Box-Cox method, which simultaneously estimates the model parameters and the transformation parameter lambda. The Box-Cox method is implemented in software, and later on, toward the last module or so of the course, there's a session that goes into the details of the Box-Cox method, so I'm going to defer discussion of it until then. For now I'm simply going to take a guess: let's see what happens with a log transformation.

So here are the effect estimates following the log transformation, and here is the normal probability plot of those effects after we've applied the log transformation to y. Take a look at this: the three main effects emerge as being important, but those two interactions that were significant before are not; there is no indication of large interaction effects. Here's the analysis of variance; all three of these factors are highly significant. And here are the normal probability plot of the residuals and the plot of residuals versus predicted following the log transformation. Compare these to what they looked like when we analyzed the original data. These are really good: there's no indication of a problem with normality and no indication of inequality of variance.

So the question is, is the log model better? I think the answer is yes, because most of the time experimenters would prefer a simpler model in a transformed metric to a complicated model in the original metric, particularly if in the original metric there are problems with the underlying assumptions. Engineers, scientists, and technical people are very used to working with data on a transformed scale, such as a log scale.

So what happened to those interactions? Well, the transformation in this case did all three things we hope for: it stabilized the variance, it induced at least approximate normality in the data, and it produced a model with a simpler form. That is really interesting, because sometimes, and this is an example of it, transformations give you some insight into the underlying mechanism driving the process. We've just seen that log y = c1 x1 + c2 x2 + c3 x3, where c1, c2, and c3 are constants. What does that suggest the underlying physical mechanism looks like? Doesn't it suggest that y is the product of a term depending on x1, a term depending on x2, and a term depending on x3? In other words, the underlying physical mechanism is multiplicative, not additive.
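To connect the Box-Cox idea and the multiplicative-mechanism insight, here is a hedged sketch, again with synthetic data rather than the drilling data: a 2^3 response is generated from a multiplicative, log-additive mechanism (y is a product of exponential terms in x1, x2, x3), scipy.stats.boxcox is used to estimate lambda from the raw response, and least-squares effect estimates are compared on the original and log scales. The coefficients and factor names are made up for illustration.

```python
# A hedged sketch with synthetic data (not the drilling experiment): generate a
# 2^3 response from a multiplicative, log-additive mechanism, let scipy's
# Box-Cox routine estimate lambda from the raw response (for data like these it
# tends to land near zero, i.e. the log), and compare least-squares effect
# estimates on the original and log scales.
import numpy as np
from scipy import stats

# 2^3 design columns in standard order for factors x1, x2, x3
x1 = np.tile([-1, 1], 4)
x2 = np.tile(np.repeat([-1, 1], 2), 2)
x3 = np.repeat([-1, 1], 4)

# Multiplicative mechanism: log y is additive in the factors
rng = np.random.default_rng(3)
y = np.exp(1.5 + 0.9 * x1 + 0.6 * x2 + 0.4 * x3 + rng.normal(0, 0.05, size=8))

_, lam_hat = stats.boxcox(y)              # maximum-likelihood estimate of lambda
print("estimated lambda:", round(lam_hat, 2))

# Effect model with main effects and two-factor interactions
X = np.column_stack([np.ones(8), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])
names = ["mean", "x1", "x2", "x3", "x1x2", "x1x3", "x2x3"]
for label, resp in [("original scale", y), ("log scale     ", np.log(y))]:
    coefs, *_ = np.linalg.lstsq(X, resp, rcond=None)
    print(label, {n: round(float(c), 2) for n, c in zip(names, coefs)})
```

On the original scale the two-factor interaction coefficients come out clearly nonzero, because interaction terms are the only way a linear model can mimic the curvature of the exponential mechanism; on the log scale they essentially vanish, leaving the simple main-effects model, which mirrors what happened in the lecture example.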
The logarithm essentially simplifies that into a more easily interpretable form. And why did the linear model on the original scale have interactions? Because the underlying mechanism is not linear; it's curved, it's nonlinear. The only way a linear model could account for that curvature in the original metric was to introduce interaction terms, because, remember, interaction is a form of curvature. So I think this is a really nice example of how a transformation can accomplish a lot of things, including providing some insight into the underlying physical mechanism that drives a particular process. Okay, thanks for watching; that's the end of this lecture.