Understanding Correlation in Quantitative Variables
Explanation of how to calculate correlation between two quantitative variables, the importance of outliers in correlation, and the impact of strength and direction on the correlation coefficient.
Do Now Describe the following relationships
Important ideas
Explanatory variable: always on the x-axis (horizontal)
Response variable: always on the y-axis (vertical)
D.O.F.S: Direction, Outliers, Form, Strength
Correlation Coefficient
The correlation r is a measure of the strength and direction of a linear relationship between two quantitative variables.
The correlation r is a value between -1 and 1 (-1 ≤ r ≤ 1).
If the relationship is negative, then r < 0. If the relationship is positive, then r > 0.
If r = 1 or r = -1, then there is a perfect linear relationship. In other words, all of the points will be exactly on a line.
If there is very little scatter from the linear form, then r is close to 1 or -1. The more scatter from the linear form, the closer r is to 0.
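To make these ranges concrete, here is a minimal Python sketch. The helper `correlation` and all three data sets are made up for illustration; the helper uses the standard z-score definition of r (sum of z-score products divided by n - 1).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical data, chosen only to illustrate the three cases
x = [1, 2, 3, 4, 5]
r_perfect_pos = correlation(x, [3, 5, 7, 9, 11])  # points exactly on y = 2x + 1
r_perfect_neg = correlation(x, [10, 8, 6, 4, 2])  # points exactly on y = 12 - 2x
r_scattered = correlation(x, [4, 1, 5, 2, 4])     # almost no linear pattern

print(r_perfect_pos, r_perfect_neg, r_scattered)
```

The first two values come out at 1 and -1 (up to floating-point rounding), and the third lands close to 0.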
Do Now Homework out!
Lesson 2.4: Calculating Correlation
Objectives
Calculate the correlation between two quantitative variables.
Apply the properties of the correlation.
Describe how outliers influence the correlation.
Calculating the Correlation
1. Find the mean x̄ and the standard deviation s_x of the explanatory variable. Calculate the z-score for the value of the explanatory variable for each individual.
2. Find the mean ȳ and the standard deviation s_y of the response variable. Calculate the z-score for the value of the response variable for each individual.
3. For each individual, multiply the z-score for the explanatory variable and the z-score for the response variable.
4. Add the z-score products and divide the sum by n - 1.
Here is the formula:
$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
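The four steps above can be sketched in Python roughly as follows. The function name `correlation` and the sample values are assumptions for illustration only; they are not the data from the lesson's table.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Follow the four steps: z-scores for x, z-scores for y, multiply, sum / (n - 1)."""
    n = len(xs)
    # Step 1: mean, standard deviation, and z-scores for the explanatory variable
    x_bar, s_x = mean(xs), stdev(xs)  # stdev uses the n - 1 divisor
    z_x = [(x - x_bar) / s_x for x in xs]
    # Step 2: the same for the response variable
    y_bar, s_y = mean(ys), stdev(ys)
    z_y = [(y - y_bar) / s_y for y in ys]
    # Step 3: multiply the two z-scores for each individual
    products = [zx * zy for zx, zy in zip(z_x, z_y)]
    # Step 4: add the products and divide by n - 1
    return sum(products) / (n - 1)

# Hypothetical foot lengths and heights in centimeters (made up for this sketch)
foot_cm = [23, 24, 25, 26, 27, 28]
height_cm = [160, 165, 168, 172, 175, 181]

r = correlation(foot_cm, height_cm)
print(round(r, 3))
```

With these made-up values r comes out very close to 1, which matches the strong positive linear pattern in the numbers.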
The table shows the foot length (in centimeters) and the height (in centimeters) for a random sample of six high school seniors. Calculate the correlation for these data.
Calculator Time!
1. Hit 2nd, then 0 (brings us to CATALOG).
2. Scroll alllllll the way down to DiagnosticOn (you only need to do this once, unless your calculator is reset or you use a different one).
3. Hit ENTER.
4. Hit ENTER again. It should say Done.
5. Enter data into L1 and L2 (STAT, then EDIT).
6. Then STAT, CALC, LinReg(a+bx). This is #8, not #4.
7. r is your correlation coefficient!
To see the scatter plot
Go to STAT PLOT (2nd, then Y=) and make sure Plot1 is on. It should also be set to scatter.
Then hit ZOOM and choose ZoomStat (#9).
If I give you homework, you can use this website at home: https://www.socscistatistics.com/tests/pearson/
You try! Find r, state the type of correlation. Confirm by looking at the scatterplot.
Properties of Correlation
Correlation makes no distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation. As you can see in the formula, reversing the roles of x and y would only change the order of the multiplication, not the product.
Likewise, the scatterplots in the figure show the same direction and strength, even though the variables are reversed in the second scatterplot.
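A quick way to check this property is to compute r twice with the variables swapped. This is only a sketch: the helper `correlation` and the measurements are hypothetical, not taken from the lesson.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical measurements in centimeters (made up for this sketch)
foot_cm = [23, 24, 25, 26, 27, 28]
height_cm = [160, 165, 168, 172, 175, 181]

r_xy = correlation(foot_cm, height_cm)  # foot length as x, height as y
r_yx = correlation(height_cm, foot_cm)  # roles reversed

# Swapping x and y only reorders each multiplication, so the results agree
print(abs(r_xy - r_yx) < 1e-12)
```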
Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. Measuring foot length and height in inches rather than centimeters does not change the correlation between foot length and height. The figure on the next page gives two scatterplots of the same six students. The graph on the left (a) uses centimeters for both measurements, and the graph on the right (b) uses inches for both measurements. The strength and direction are identical; only the scales on the axes have changed.
The correlation r has no units of measurement because we are using standardized values in the calculation and standardized values have no units.
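The unit-free behavior can be checked numerically by converting centimeters to inches and recomputing r. The sketch below assumes a hypothetical helper `correlation` and made-up measurements; only the conversion factor of 2.54 cm per inch is a fixed fact.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

CM_PER_INCH = 2.54

# Hypothetical measurements in centimeters (made up for this sketch)
foot_cm = [23, 24, 25, 26, 27, 28]
height_cm = [160, 165, 168, 172, 175, 181]

# The same individuals measured in inches
foot_in = [x / CM_PER_INCH for x in foot_cm]
height_in = [y / CM_PER_INCH for y in height_cm]

r_cm = correlation(foot_cm, height_cm)
r_in = correlation(foot_in, height_in)

# The z-scores are the same in either unit, so r matches up to rounding
print(math.isclose(r_cm, r_in, rel_tol=1e-9))
```

Dividing every value by 2.54 also divides the mean and the standard deviation by 2.54, so each z-score, and therefore r, stays the same.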
You try! The scatterplot shows the relationship between 40-yard dash times and long-jump distances. The correlation is r = -0.838.
a) What would happen to the correlation if long-jump distance was plotted on the horizontal axis and dash time was plotted on the vertical axis? Explain.
b) What would happen to the correlation if long-jump distance was measured in feet instead of inches? Explain.
c) Sabrina claims that the correlation between long-jump distance and dash time is r = -0.838 inches per second. Is this correct?
Solution
a) The correlation would still be r = -0.838 because the correlation makes no distinction between explanatory and response variables.
b) The correlation would still be r = -0.838 because the correlation doesn't change when we change the units of either variable.
c) No. The correlation doesn't have units, so including inches per second is incorrect.