Okay. We're back talking about 2^k factorial designs. At the end of the last lecture, I talked a bit about the general case: k factors, each at two levels, k main effects, and then the number of two-factor interactions would be k things taken two at a time, the number of three-factor interactions would be k things taken three at a time, and in general there would be one k-factor interaction. These can be rather large experiments. Five factors, for example: 2^5 would be 32 runs. Ten factors: 2^10 would be 1,024 runs. These are getting to be pretty big experiments, and people are not going to be excited about doing that many runs. So one of the things we need to talk about is how do we reduce the number of runs?

Well, one of the strategies that's widely used to do that is to run the design as an unreplicated factorial. So we have a 2^k design, but we have only one observation at each corner of the cube. Now, by the way, an unreplicated 2^k is sometimes called a single replicate of the 2^k. These terms are used interchangeably. These designs are very widely used, and the reason for that, of course, is because if you start replicating the 2^k, the designs can get pretty big, pretty quickly.

There are some risks associated with running an unreplicated 2^k. What if you have an unusual response observation at one of the corners? Does that spoil the results? Well, it turns out, not necessarily. It would have to be awfully huge for that to be a problem. Another, more insidious problem is modeling noise. Let me explain what I mean by that. Suppose you have a single factor x, as you see on the plot, and this is the true effect of that factor, and when you run an observation at some level of x down here, you get an observation at random from this band. So that band represents the noise in the system. Well, suppose you space your factor levels here and here, fairly close together, and you get the results that you see represented by the two dots in the diagram. If you fit the first-order, straight-line model to that, look at the slope of that line. The slope of that line is zero. So what would you conclude about factor x? It's not important. And yet we can see that factor x clearly does have a strong positive effect.

So what do we do? How do we prevent the signal from being overwhelmed by the noise? Spread the factor levels out; be more aggressive in the spacing of factor levels. Now, if we get the same two observations, the line we fit has a positive slope. It's not exactly the same as the true factor effect, but it's positive, and we would get a very good idea that there is an active factor x present in the system. So what this is pointing out is that if your factor levels are spaced too closely together, the effect estimate can be overwhelmed by the noise. So the answer is: be more aggressive. Now, don't do crazy things. Don't be so aggressive that you blow the plant up, or injure somebody, or damage the equipment. Don't do crazy things like that. But in general, in running these experiments, particularly at the outset when we have a lot of factors and we're trying to figure out which factors are important, you should view your mission as being the same as the mission of Captain Kirk and the crew of the Starship Enterprise. All of you Star Trek fans know exactly what the mission was. What was it? It was to boldly go where no man has gone before. So be bold, spread them out, but as I say, don't go crazy. Now, how do you analyze unreplicated designs?
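Here is a small Python sketch of that factor-spacing point, added as an illustration rather than part of the lecture; the true slope, noise level, and the two spacings are assumed values chosen just to show the behavior:

```python
import numpy as np

rng = np.random.default_rng(42)
true_slope, noise_sd = 2.0, 3.0   # assumed values for illustration

def fitted_slopes(x_low, x_high, n_sim=10_000):
    """Fit the first-order model y = b0 + b1*x to one noisy observation
    at each of two levels, n_sim times, and return the fitted slopes
    b1 = (y_high - y_low) / (x_high - x_low)."""
    y_low  = true_slope * x_low  + rng.normal(0, noise_sd, n_sim)
    y_high = true_slope * x_high + rng.normal(0, noise_sd, n_sim)
    return (y_high - y_low) / (x_high - x_low)

narrow = fitted_slopes(-0.5, 0.5)   # factor levels close together
wide   = fitted_slopes(-3.0, 3.0)   # factor levels spread apart

# With narrow spacing the noise dominates: the fitted slope is often near
# zero or even negative. With wide spacing it stays close to the truth.
print(f"narrow: mean slope {narrow.mean():.2f}, P(slope <= 0) = {(narrow <= 0).mean():.2f}")
print(f"wide:   mean slope {wide.mean():.2f}, P(slope <= 0) = {(wide <= 0).mean():.2f}")
```

With the narrow spacing the two points often suggest no effect at all; with the aggressive spacing the positive slope shows through the same noise.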
Well, because you don't have replication, you don't have a pure error estimate, or, as some people call it, an internal estimate of error. If you fit the full model, you have zero degrees of freedom for error. What do you do? Well, one solution would be to rather arbitrarily pool some of your higher-order interactions together to estimate error. For example, you might rather arbitrarily say that all the interactions beyond order two, say the three-factor and higher interactions, are not important, and just pool them together and consider them to be error. You could do that, but there is some risk, of course, that there could be a significant three-factor or higher interaction effect. There is a technique, which I'll show you, developed by Cuthbert Daniel and published in 1959, that uses normal probability plots of factor effects to judge importance, and that's a pretty reliable method. There are also some other methods discussed in the text, and I'll show you at least one of those; it's a computer-based method that is actually quite good.

Here's an example of an unreplicated design. This is a 2^4 factorial, and it was used to investigate the effect of four factors on the filtration rate of a resin. Now, this resin is a material that's going to be used to manufacture particle board and wood paneling, so it's producing what we think of as forest products. The four factors are A, temperature; B, pressure; C, mole ratio, that is, the molar ratio of formaldehyde to urea; and D, the stirring rate, and this experiment was conducted on a pilot plant. It's a 2^4 full factorial, unreplicated design, and here are the experimental results. Look at the test matrix: can you see that this is in standard order? Yes. Look at column A: minus, plus, minus, plus. Column B: two minuses, two pluses. Column C: four minuses, four pluses. Column D: eight minuses, eight pluses. This is standard order.

Look at the filtration rates. Now, these filtration rates, the responses, had to be determined offline in an analytical lab using a sample of material. But when I look at these results, I get really excited. Why do I get excited? Well, because some of the lowest values are in the low to mid 40s and the highest values are around 100. When I changed these factors, something happened. If you look at the outcome of an experiment like this and all the results are in a very narrow range, say 70 to 75, I might be suspicious that we didn't change the factors enough to be able to see something, or that there's just a lot of noise obscuring the results. But here I feel pretty confident that we've got results that will be useful.

Here is a graphical display of the resin plant experiment. This is a 2^4 factorial, and from this I think you can see why I said a single spurious observation doesn't impact you very much. Suppose you get a single spurious observation right here: you don't get a 96, you get something like 140. Well, when you calculate factor effects, remember how you do that. If you calculate the effect of A, for example, you take all the runs here, the plus runs, and all the runs here, the minus runs, and you calculate the difference in the averages of those two groups of runs. So how much influence does that one single crazy value have on the A-plus average? It only has a weight of one-eighth. So it shifts the effect estimate by only one-eighth of its discrepancy, and the regression coefficient, which is half the effect, by only one-sixteenth.
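To make that difference-of-averages calculation concrete, here is a short Python sketch added as an illustration, not part of the lecture. The sixteen response values below are stand-ins listed in standard order; check them against the filtration-rate table shown in the lecture before trusting the numbers.

```python
import numpy as np
from itertools import product

# 2^4 design in standard order: A alternates fastest, then B, then C, then D.
design = np.array([(a, b, c, d)
                   for d, c, b, a in product(*[(-1, 1)] * 4)])
A = design[:, 0]

# Stand-in responses in standard order -- verify against the table in the lecture.
y = np.array([45, 71, 48, 65, 68, 60, 80, 65,
              43, 100, 45, 104, 75, 86, 70, 96], dtype=float)

# Effect of A = (average of the eight A-plus runs) - (average of the eight A-minus runs)
effect_A = y[A == 1].mean() - y[A == -1].mean()
print(f"A effect: {effect_A:.3f}")

# Each observation has a weight of only 1/8 in its average, so bumping one
# response by +44 (say 96 -> 140) moves the A effect by just 44/8 = 5.5.
```

The comment at the end is the whole point: a single spurious value is diluted by the averaging, so it takes a very large outlier to distort an effect estimate.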
So you would have to have a really huge outlier for it to have a gigantic effect on the results. In fact, it would have to be so big that you could probably see that it's an outlier just by looking at the raw data.

Here's the table of plus and minus signs for this design. It's much bigger than it was for the 2^3, but it's constructed the same way. Columns A, B, C, and D are just the four columns of the 2^4 design, and you'll notice they're in standard order; then all of the other columns are products of other columns in the table. AB, for example, is the product of A times B, and ABD is the product of A times B times D, or you could also get that by multiplying BD times A. These columns allow you to produce the contrasts. You can generate the contrast for any effect simply by taking the signs in that column, attaching those signs to the observations, and adding. Once you have the contrast, of course, then you can get the effect estimate: simply divide the contrast by half the number of runs. If you want the regression coefficient, you divide the contrast by the total number of runs, and if you want the sum of squares, you square the contrast and divide that by the total number of runs. So if you're doing the arithmetic manually, these designs are easy to work with.
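In the same spirit as the sketch above, and again as an added illustration rather than the lecture's own calculation, here is a short Python snippet that builds the full table of plus and minus signs by multiplying columns and then applies the rules just described for an unreplicated 2^4 with N = 16 runs: effect = contrast / (N/2), regression coefficient = contrast / N, sum of squares = contrast² / N. The responses are the same stand-in values as before.

```python
import numpy as np
from itertools import combinations, product

# 2^4 design in standard order (A varies fastest), as before.
design = np.array([(a, b, c, d)
                   for d, c, b, a in product(*[(-1, 1)] * 4)])
names = ["A", "B", "C", "D"]

# Full table of +/- signs: each interaction column is the elementwise
# product of the main-effect columns it is named after (e.g. ABD = A*B*D).
columns = {}
for order in range(1, 5):
    for combo in combinations(range(4), order):
        label = "".join(names[i] for i in combo)
        columns[label] = design[:, list(combo)].prod(axis=1)

# Stand-in responses in standard order (verify against the table in the lecture).
y = np.array([45, 71, 48, 65, 68, 60, 80, 65,
              43, 100, 45, 104, 75, 86, 70, 96], dtype=float)
N = len(y)

for label, signs in columns.items():
    contrast = np.sum(signs * y)      # attach the signs to the observations and add
    effect = contrast / (N / 2)       # divide by half the number of runs
    coeff = contrast / N              # regression coefficient = effect / 2
    ss = contrast**2 / N              # sum of squares for this effect
    print(f"{label:>4}: effect {effect:7.3f}  coeff {coeff:7.3f}  SS {ss:9.2f}")
```

Those fifteen effect estimates are exactly what you would feed into Daniel's normal probability plot to judge which factors are active.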