In this video, we discuss how our methodology should change when the means we're comparing are paired, in other words, dependent. The good news: there's not much new here. We'll quickly see that we can summarize paired data in a way that lets us reuse techniques we've already learned in the course. As usual, we're going to frame our discussion around a real-world problem. 200 observations were randomly sampled from the High School and Beyond survey, and the same students took both a reading and a writing test. At first glance, how are the two distributions of reading and writing scores similar, and how are they different? It appears that the median writing score is slightly higher than the median reading score. Both distributions look fairly symmetric, but the reading scores are slightly more right-skewed, as evidenced by a median that is closer to the 25th percentile than to the 75th. The reading scores are also slightly more variable than the writing scores. That said, at first glance it is really difficult to tell whether there's a difference between reading and writing scores.

So, can the reading and writing scores for a given student be assumed to be independent of each other? Well, a student's reading score is likely not independent of their own writing score: a generally high-achieving student is likely to score highly on both tests. When two sets of observations have this special correspondence, or in other words are not independent, they are said to be paired. To analyze paired data, it is often useful to look at the difference in outcomes within each pair of observations. So here, for example, for each student we subtract their writing score from their reading score, creating a new variable, the difference between the two scores, and we calculate this difference for every student in our data set.

It's often a good idea to start by defining the parameter of interest and the point estimate. In this case, we're interested in the average difference between the reading and writing scores of all high school students, which we're going to define as mu diff, diff standing for difference and mu for the population mean. Since we don't have access to the whole population, we're going to estimate this unknown population mean with our sample statistic, the average difference between the reading and writing scores of the 200 sampled high school students, which we define as x bar diff, x bar meaning the sample mean and diff, once again, standing for the difference. If in fact there were no difference between the reading and writing scores, what would we expect the differences to be? No difference simply means zero. Taking a look at the distribution of these differences, we can indeed see that they are centered around zero. However, the average difference is not exactly equal to zero, and we're also seeing quite a bit of variability in this distribution. Therefore, it's impossible to determine whether there is a statistically significant difference between the average reading and writing scores simply by visually evaluating this plot; for that, we're going to need statistical inference tools once again. Next, let's define our hypotheses. Remember, the null hypothesis always says there's nothing going on, so given that we defined our parameter of interest as mu diff, the null hypothesis sets this value equal to zero, indicating no difference between the reading and writing scores.
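To make the summarization step concrete, here is a minimal Python sketch of reducing the two paired columns to a single column of differences. The scores below are simulated placeholders, since the actual sample isn't reproduced in this transcript; with the real High School and Beyond data, the two arrays would hold each student's actual reading and writing scores.

```python
import numpy as np

# Simulated placeholder scores for n = 200 students; with the real
# sample, these arrays would contain each student's observed
# reading and writing scores instead.
rng = np.random.default_rng(42)
read = rng.normal(52, 10, size=200)
write = rng.normal(53, 9, size=200)

# One difference per student: reading score minus writing score.
diff = read - write

x_bar_diff = diff.mean()    # point estimate of mu diff
s_diff = diff.std(ddof=1)   # sample standard deviation of the differences

print(f"mean difference: {x_bar_diff:.3f}, sd of differences: {s_diff:.3f}")
```

Once the data are collapsed into this single `diff` column, everything that follows is ordinary single-mean inference.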
And the alternative hypothesis can then be defined as mu diff not equal to zero, indicating a difference between the test scores. Things are starting to look a lot like what we've seen before: we summarized our two columns of data into a single column of differences, and we're once again setting out to do inference on a single population mean, which we're calling mu diff. Therefore, the structure is going to be exactly the same as a hypothesis test for any single mean, except that we're really doing inference on a difference of paired means. The mechanics, the conditions, and so on are all the same as when working with a single population mean.

The test statistic for this hypothesis test can be calculated as t = (x bar diff minus 0) / SE: the observed average difference of -0.545, minus the null value of 0, divided by the standard error, which is the standard deviation of the differences divided by the square root of the sample size. This yields a test statistic of -0.87. With each t-score we also need the degrees of freedom, which in this case is 200 - 1 = 199. Then we draw our curve, mark the observed difference, and shade the tail areas corresponding to the p-value. Since we have a two-sided alternative, we shade both tails. Each tail area is approximately 0.193, resulting in a total p-value of approximately 38.6%.

Making decisions based on the p-value is simple: compare the p-value to the significance level, and if it's lower, reject the null hypothesis and conclude that the data provide convincing evidence for the alternative hypothesis. However, understanding what the p-value actually means as a conditional probability usually takes a bit more practice. Here we have a series of options, and we're tasked with finding the correct interpretation of the p-value as a probability. The best approach for answering questions like these is to take the statements out of context and think about what the probabilities mean generically. A, for example, says the p-value is the probability that the average scores on the reading and writing exams are equal. This generically reads as p-value equals the probability of the null hypothesis being true. We know that is not the definition of the p-value, so A is not correct. B says the p-value is the probability that the average scores on the reading and writing exams are different. This reads as p-value equals the probability of the alternative hypothesis being true, which is also not the definition of the p-value, so B is not correct either. C says the p-value is the probability of obtaining a random sample of 200 students where the average difference between the reading and writing scores is at least 0.545 in either direction, if in fact the true average difference between the scores is zero. This generically reads as the probability of an observed or more extreme outcome, given that the null hypothesis is true, which is indeed the definition of the p-value. That's the correct option, but for completeness let's take a look at D as well. D says the p-value is the probability of incorrectly rejecting the null hypothesis if in fact the null hypothesis is true. This is actually the probability of a Type 1 error, not the definition of the p-value. In summary, we started off with two variables, the reading and writing scores of the same set of students, and we summarized them into one variable by taking the pairwise differences.
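Here is a short Python sketch of these mechanics using scipy. The standard deviation of the differences isn't quoted in this transcript, so the sketch starts from the reported test statistic of -0.87; with the raw paired scores in hand, you could instead call `scipy.stats.ttest_rel(read, write)`, which carries out the same paired test directly from the data.

```python
from scipy import stats

n = 200
t_stat = -0.87   # test statistic reported above
df = n - 1       # 199 degrees of freedom

# Two-sided p-value: twice the area in one tail of the t distribution.
p_value = 2 * stats.t.cdf(t_stat, df)
print(f"p-value: {p_value:.3f}")  # roughly 0.385, i.e. about the 38.6% above
```

Since 0.386 is much larger than any conventional significance level, we would fail to reject the null hypothesis here, consistent with the decision rule described above.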
In situations where we do inference for paired data, the null hypothesis most often sets the average difference between the two paired means equal to zero, indicating no difference between them. Paired data can arise when we have data from the same set of people, as in this case, or in pre-post studies. A weight-loss study is a good example: the post-diet weight of an individual will necessarily depend on their pre-diet weight. Other studies might take repeated measures on the same set of people; for example, you might measure the reaction times of the same people after they have slept the recommended 7.5 hours the previous night and after they have slept only two hours. We might also use paired approaches when we have different sets of subjects to begin with, but have reason to believe those subjects are not independent. Twin studies are an obvious example, as are studies on partner A and partner B in a relationship. We would design these studies as paired if we believe the individuals in the two groups are similar in certain respects, and we're evaluating their differences in other respects.