Welcome to Correlation – Statistics. In this video, you’ll learn about the Pearson correlation and how you can use it to measure the strength of the correlation between two variables. One way to measure the strength of the correlation between two continuous numerical variables is by using a method called Pearson correlation. The Pearson correlation method provides you with two values: the correlation coefficient and the P-value.

So, how do you interpret these values? For the correlation coefficient, a value close to 1 implies a large positive correlation, a value close to negative 1 implies a large negative correlation, and a value close to zero implies no linear correlation between the variables. The P-value measures how certain you can be about the correlation that you calculated. Roughly speaking, a P-value less than .001 indicates strong certainty about the correlation coefficient. A value between .001 and .05 gives you moderate certainty. A value between .05 and .1 gives you weak certainty. And a P-value larger than .1 gives you no certainty of correlation at all.

So, when can you say the correlation between two variables is strong? There are two criteria you must meet. First, the correlation coefficient is close to 1 or negative 1. And second, the P-value is less than .001.

This image shows data with different correlation values. In the top row, the correlation reflects the strength and direction of a linear relationship. In the middle row, the correlation does not capture the slope of a linear relationship: each data set has a correlation of 1 or negative 1 no matter how steep the line is. In the bottom row, the correlation is 0 even though the variables have clear nonlinear relationships, because Pearson correlation only captures linear association. Note that the figure in the center of the middle row has a slope of 0; in that case, the correlation coefficient is undefined because the variance of Y is zero.

This example looks at the correlation between “DepDelay” and “ArrDelay” using the correlation function cor().
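As a minimal sketch of this step, assuming the flight data has already been loaded into a data frame called flights (a hypothetical name; the column names DepDelay and ArrDelay come from the video), the call might look like this:

```r
# Pearson correlation between departure delay and arrival delay.
# `flights` is an assumed data frame with numeric delay columns.
# `use = "complete.obs"` drops rows with missing values so that
# cor() returns a number instead of NA.
cor(flights$DepDelay, flights$ArrDelay, use = "complete.obs")
```

A result near 1, such as the 0.88 reported in the video, indicates a strong positive correlation.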
This function takes two arguments, x and y, which are the variables you want to calculate the correlation for. Both must be numeric, meaning integers or real numbers, such as 1, 2, 3, 1.5, or 6.7. Notice that the correlation coefficient is approximately 0.88; because this is close to 1, there is a strong positive correlation. To examine whether this strong correlation is statistically significant, you must also calculate the P-value of this correlation using the cor.test() function. Notice that the P-value is very small, much smaller than .001, so you can be certain that there is a strong positive correlation.

To calculate correlations on multiple variables, you can use the rcorr() function from the Hmisc package. This function computes Pearson correlation coefficients for all possible pairs of columns and returns an object containing the matrix of correlation coefficients along with a matrix of the corresponding P-values. This example uses the rcorr() function to calculate the correlations between seven variables: "ArrDelayMinutes", "DepDelayMinutes", "CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay", and "LateAircraftDelay".

Taking all variables into account, you can now create a heatmap that visualizes the correlation of each variable with every other variable. The color scheme indicates the Pearson correlation coefficient, showing the strength of the correlation between two variables. Notice the diagonal line with a dark red color. This indicates that all the values on this diagonal are highly correlated, which makes sense: the values on the diagonal are the correlations of the variables with themselves, which are always 1. This correlation heatmap gives you a good overview of how the different variables are related to one another and, most importantly, how they are related to arrival delays. This example uses the corrplot() function to plot an elegant graph of the correlation matrix.
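Putting these steps together, here is a hedged sketch, assuming the Hmisc and corrplot packages are installed and that a data frame called flights (a hypothetical name) contains the seven delay columns named in the video:

```r
# Significance test for one pair of variables: cor.test() returns the
# Pearson estimate together with its P-value.
cor.test(flights$DepDelayMinutes, flights$ArrDelayMinutes)

# Correlation matrix for all pairs of columns. Hmisc::rcorr() expects a
# numeric matrix and returns a list with $r (coefficients), $n (sample
# sizes), and $P (P-values).
library(Hmisc)
delay_cols <- c("ArrDelayMinutes", "DepDelayMinutes", "CarrierDelay",
                "WeatherDelay", "NASDelay", "SecurityDelay",
                "LateAircraftDelay")
corr_res <- rcorr(as.matrix(flights[, delay_cols]))

# Heatmap of the coefficient matrix using the corrplot package; each
# cell's color encodes the Pearson correlation between two variables.
library(corrplot)
corrplot(corr_res$r, method = "color")
```

The diagonal of corr_res$r is all 1s, which produces the dark red diagonal line described above.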
If you are interested in the code script that was used to create this heatmap, you can check out lab 3. In this video, you learned that the Pearson correlation measures the strength of the correlation between two variables. You also learned to measure the strength of a correlation by examining the correlation coefficients returned by the cor() and rcorr() functions, as well as by interpreting the P-values from cor.test().