In this video we'll introduce the t-distribution and discuss its origins and mechanics. In a nutshell, the t-distribution is useful for describing the distribution of the sample mean when the population standard deviation, sigma, is unknown, which is almost always. We'll start our discussion with a review of the conditions for inference so far, as motivation for why we need this new distribution.

What purpose does a large sample serve? As long as your observations are independent and the population distribution is not extremely skewed, a large sample ensures that the sampling distribution of the mean is nearly normal and that the estimate of the standard error is reliable. Remember, we estimate the standard error of the sampling distribution as s over the square root of n, where s is the sample standard deviation. That is the best estimate we have for the unknown population standard deviation sigma. If the sample size is large enough, chances are s is indeed a good estimate for sigma, and therefore your overall standard error estimate is reliable.

But what if the sample size is small? You might be thinking, in the age of big data, why are we talking about small samples? It is true that in certain disciplines, especially when data are automatically recorded, like webpage clicks or a Twitter stream, small sample sizes might be irrelevant. However, there are certainly disciplines where this is not the case. Think, for example, about a lab experiment or a study that follows a nearly extinct mammal species. So we need methods that work well for both large and small samples.

The uncertainty of the standard error estimate is addressed by using the t-distribution. This distribution also has a bell shape, so it's unimodal and symmetric, and it looks a lot like the normal distribution. However, its tails are thicker. Comparing the normal and t-distributions visually is the best way to understand what we mean by thick tails. Notice that the peak of the t-distribution doesn't go as high as the peak of the normal distribution. In other words, the t-distribution is somewhat squished in the middle, and the additional area is added to the tails. This means that under the t-distribution, observations are more likely to fall more than two standard deviations away from the mean than under the normal distribution. As a result, confidence intervals constructed using the t-distribution will be wider, in other words more conservative, than those constructed with the normal distribution. Another way of looking at this is that the extra-thick tails help mitigate the effect of a less reliable standard error estimate, which comes from using the sample standard deviation instead of the population standard deviation in its calculation.

The t-distribution, just like the standard normal, is always centered at zero, and it has one parameter: the degrees of freedom, which determines the thickness of the tails. Remember that, in contrast, the normal distribution has two parameters, the mean and the standard deviation. What happens to the shape of the t-distribution as the degrees of freedom increase? The plot shows a series of bell curves going from light to darker shades of grey as the degrees of freedom increase, and what we can see is that as the degrees of freedom increase, the shape of the t-distribution approaches the normal distribution.
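To make the thick-tails comparison concrete, here is a minimal sketch in R; the degrees-of-freedom values of 10 and 1000 are illustrative choices, not from the video:

    # Density at the center: the t-distribution peaks lower than the normal
    dnorm(0)              # 0.3989
    dt(0, df = 10)        # 0.3891

    # Tail area more than 2 standard deviations from the mean: thicker under the t
    2 * pnorm(-2)         # 0.0455
    2 * pt(-2, df = 10)   # 0.0734

    # As the degrees of freedom grow, the t-distribution approaches the normal
    2 * pt(-2, df = 1000) # roughly 0.0458, nearly the normal tail area

Plotting dt(x, df) for increasing df alongside dnorm(x) reproduces the series of bell curves described above.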
Let's talk practicalities next. How do we actually use the t-distribution in statistical inference? The answer is simple. Use the t-distribution for inference on a single mean, or for comparing two means, when the population standard deviations are unknown, which is basically always. Calculate the t statistic just like you would calculate a Z statistic: the sample mean minus the null value, divided by the standard error of the sample mean. And find the p-value as the probability of an observed or more extreme outcome given that the null hypothesis is true, just like before, except using the t-distribution instead of the Z distribution, via R, a distribution calculator app, or a table.

First, a little bit of mechanics. We're going to calculate three probabilities. A: the probability that the absolute value of Z is greater than 2, which is 0.0455. B: the probability that the absolute value of t with 50 degrees of freedom is greater than 2, which is 0.0509. Remember, we talked about thicker tails and a higher percentage of observations falling further than 2 standard deviations away from the mean under the t-distribution. We're starting to see the effect of this with the larger tail area under the t-distribution. Let's take things a little further and decrease the degrees of freedom to 10. C: the new probability is 0.0734. In summary, as we go from the normal distribution to a t-distribution with somewhat high degrees of freedom to a t-distribution with low degrees of freedom, the probability of the test statistic being more than two standard deviations away from the mean increases. The R sketch after this section reproduces these three values.

Next, suppose you have a two-sided hypothesis test and your test statistic is 2. Under which of these scenarios would you be able to reject the null hypothesis at the 5% significance level? In the first scenario we have a p-value of 4.55%, which is indeed less than 5%, so we would reject the null hypothesis, though we should mention that this p-value is pretty close to 0.05. In the second scenario the p-value is greater than 5%, so we would fail to reject the null hypothesis, though again the p-value is pretty close to 5%. And in the last scenario we would definitely fail to reject the null hypothesis. We can see that as we get more conservative, with a t-distribution with lower degrees of freedom, we also become less likely to be able to reject the null hypothesis. We'll discuss how to calculate degrees of freedom for a particular study or data set in the following videos, but generally degrees of freedom are tied to sample size. This means that if your sample size is small, it is not as easy to reject the null hypothesis, and stronger evidence is needed in order to be able to do so.
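Here is a minimal R sketch that reproduces the three tail probabilities above and then walks through the t statistic and p-value for a hypothetical one-sample test. The sample values (n = 20, mean 51.3, standard deviation 3.4, null value 50) are made up purely for illustration; df = n - 1 is the standard choice for a single mean:

    # Tail probabilities from the comparison above
    2 * pnorm(-2)        # A: P(|Z| > 2)    = 0.0455
    2 * pt(-2, df = 50)  # B: P(|t_50| > 2) = 0.0509
    2 * pt(-2, df = 10)  # C: P(|t_10| > 2) = 0.0734

    # Hypothetical one-sample test (all numbers made up for illustration)
    n    <- 20      # sample size
    xbar <- 51.3    # sample mean
    s    <- 3.4     # sample standard deviation
    mu0  <- 50      # null value

    se <- s / sqrt(n)                 # standard error of the sample mean
    t  <- (xbar - mu0) / se           # t statistic, about 1.71
    p  <- 2 * pt(-abs(t), df = n - 1) # two-sided p-value, about 0.10
    c(t = t, p = p)

With this p-value above 5%, we would fail to reject the null hypothesis at the 5% significance level, consistent with the point above that small samples demand stronger evidence.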
Before we get to working with the t-distribution and using it in inference examples, let's pause for a moment and talk about where this distribution comes from. It actually has a peculiar name: the Student's t-distribution. This name comes from the pseudonym "Student" used by William Gosset in the papers where he developed much of the foundation for this distribution. William Gosset was the head experimental brewer at the Guinness brewing company in the early 1900s, and his main role was to experimentally brew and gradually improve a consistent and economical barrel of Guinness stout. This sometimes required working with small samples, because he might only have a few batches to try. Therefore, much of the development of the t-distribution comes from trying to make Guinness beer taste better. Since the Guinness company was worried about their trade secrets getting out, Gosset was asked to publish any work he was doing under a pseudonym, and Student was the name he chose for himself. While others, like Fisher, continued to work on the t-distribution after Gosset's foundational work, it is named after his pseudonym, Student. So next time you're having a pint of Guinness, say cheers for statistics.