Understanding Correlation in Two-Variable Data Analysis

 
Analyzing Two-Variable Data
 
Lesson 2.4
Calculating the Correlation
 
 
 
 
Calculate the correlation between two quantitative variables.
Apply the properties of the correlation.
Describe how outliers influence the correlation.
 
Calculating the Correlation
In the previous lesson, we learned that the correlation r measures the
strength and direction of the linear relationship between two
quantitative variables.
 
Do touchdowns predict wins?
Calculating the correlation

PROBLEM: The table shows the number of touchdowns and the number of wins
for the four teams in the NFC West division of the NFL during a recent
season. Calculate the correlation for these data.

Team                   Touchdowns   Wins
Arizona Cardinals          58        13
Seattle Seahawks           49        10
St. Louis Rams             31         7
San Francisco 49ers        24         5

Calculating the Correlation
 
To calculate the correlation, both variables must be quantitative. If one
or both of the variables are categorical, we can consider the
association between the two variables, but not the correlation.
 
Calculating the Correlation
 
Measuring foot length and height in inches rather than centimeters does
not change the correlation between foot length and height.
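
A quick numerical check of this property can be sketched as follows. The measurements are hypothetical, made up only for illustration, and `pearson_r` is a helper name introduced here, not something from the lesson. It computes r as the sum of z-score products divided by n - 1.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical measurements in inches (not data from the lesson).
foot_in = [9.0, 9.5, 10.0, 10.5, 11.0]
height_in = [62.0, 66.0, 65.0, 70.0, 72.0]

# Convert both variables to centimeters (1 inch = 2.54 cm).
foot_cm = [2.54 * v for v in foot_in]
height_cm = [2.54 * v for v in height_in]

r_inches = pearson_r(foot_in, height_in)
r_cm = pearson_r(foot_cm, height_cm)
print(r_inches, r_cm)  # the two values agree up to floating-point rounding
```

Changing units multiplies every deviation from the mean and the standard deviation by the same factor, so each z-score, and therefore r, stays the same.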
 
Calculating the Correlation
Weight training, third rep?
Properties of correlation

PROBLEM: The scatterplot below shows the relationship between the amount
of weight students from Gary Lang's weight training class can squat and
bench press from the example in Lesson 2.2. The correlation is r = 0.939.

[Scatterplot: bench press weight (pounds) on the horizontal axis, squat weight (pounds) on the vertical axis]
 
 
 
 
 
 
 
 
 
 
(a)  What would happen to the correlation if squat weight was plotted on the horizontal
axis and bench press weight was plotted on the vertical axis? Explain.
The correlation would still be r = 0.939, because the correlation makes
no distinction between explanatory and response variables.
 
(b) What would happen to the correlation if squat weight was measured
in kilograms instead of pounds? Explain.
The correlation would still be r = 0.939, because the correlation
doesn't change when we change the units of either variable.
 
(c) Egnarts claims that the correlation between squat weight and bench
press weight is r = 0.939 pounds. Is this correct?
No. The correlation doesn't have units, so including pounds is incorrect.
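
The answer to part (a) can be checked numerically with a small sketch. The class data are not reproduced in this lesson, so the bench press and squat weights below are hypothetical, and `pearson_r` is an illustrative helper name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical weights in pounds (not the class data from Lesson 2.2).
bench = [110, 125, 140, 160, 185]
squat = [180, 230, 250, 310, 400]

r_xy = pearson_r(bench, squat)  # bench press as x, squat as y
r_yx = pearson_r(squat, bench)  # roles swapped
print(r_xy, r_yx)  # same value up to floating-point rounding
```

Swapping the variables only changes the order in which each pair of z-scores is multiplied, which is why the result does not depend on which variable is called x.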
 
 
 
 
 
 
 
 
Calculating the Correlation
 
The formula for correlation involves the mean and standard deviation of
both variables. Because the mean and standard deviation aren’t
resistant to outliers, the correlation isn’t resistant to outliers either.
 
Consider a scatterplot in which the majority of points form a positive,
linear association.
 
An outlier that is in the same pattern as the rest of the points will
make the correlation closer to 1.

An outlier that is not in the pattern of the rest of the points will
make the correlation closer to 0, or possibly negative.
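
Both cases can be illustrated with a minimal sketch. The six base points and the two candidate outliers are hypothetical values chosen for illustration, and `pearson_r` is a helper name introduced here.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical points with a positive, roughly linear pattern.
base_x = [1, 2, 3, 4, 5, 6]
base_y = [2, 1, 4, 3, 6, 5]

r_base = pearson_r(base_x, base_y)

# Outlier that follows the pattern: far out, but along the same upward trend.
r_in_pattern = pearson_r(base_x + [12], base_y + [12])

# Outlier that breaks the pattern: large x paired with a very small y.
r_off_pattern = pearson_r(base_x + [12], base_y + [0])

print(f"base:        r = {r_base:.3f}")
print(f"in pattern:  r = {r_in_pattern:.3f}")   # closer to 1 than the base value
print(f"off pattern: r = {r_off_pattern:.3f}")  # pulled toward 0; negative for these points
```

A single added point changes both means and both standard deviations, so every z-score shifts. That is the sense in which r is not resistant to outliers.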
Smaller person, smaller I.Q.?
Outliers and Correlation

PROBLEM: The scatterplot below shows the relationship between the height
(in inches) and cumulative weighted GPA for a sample of high school
students. How do the two points in red (in the upper-right corner of the
graph) affect the correlation? Explain.

[Scatterplot: height (in inches) and cumulative weighted GPA, with two points in red in the upper-right corner]

The majority of points shows a moderate negative association. However,
because the two points in the upper-right are separated far from the rest
of the points in the x direction, they are very influential and will make
the correlation closer to 0 and possibly positive. Without these two
points, the correlation is clearly negative.
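
The reasoning in this answer can be mimicked by computing r with and without two influential points. The heights and GPAs below are hypothetical stand-ins, since the plotted data are not given, and `pearson_r` is an illustrative helper name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical main cluster: GPA tends to fall as height rises.
height = [60, 62, 64, 66, 68, 70]
gpa = [4.0, 3.8, 3.7, 3.4, 3.3, 3.0]

# Two hypothetical points far to the right (large x) with high GPAs.
far_height = [82, 84]
far_gpa = [4.3, 4.4]

r_without = pearson_r(height, gpa)
r_with = pearson_r(height + far_height, gpa + far_gpa)

print(f"without the two points: r = {r_without:.2f}")  # clearly negative
print(f"with the two points:    r = {r_with:.2f}")     # moves toward 0; positive for these made-up values
```

Points far from the mean in the x direction have large z-scores for x, so their products carry extra weight in the sum that defines r.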
LESSON APP 2.4

Archaeopteryx is an extinct beast having feathers like a bird, but teeth
and a long bony tail like a reptile. Because the known specimens differ
greatly in size, some scientists think they are different species rather
than individuals from the same species. However, if the specimens
belong to the same species and differ in size because some are
younger than others, there should be a positive linear relationship
between the lengths of a pair of bones from all individuals. An outlier
from this relationship would suggest a different species.
 
Here are data on the lengths (in centimeters) of the femur (a leg bone)
and the humerus (a bone in the upper arm) for five specimens that
preserve both bones.
 
Flying dinosaur or early bird?
 
1. Make a scatterplot using length of femur as the explanatory variable.
Do you think that all five specimens come from the same species?
Explain.

2. Find the correlation r step by step, using the formula on page 121.
Explain how your value for r matches your graph in part (1).

3. Suppose that a new fossil was discovered. If the femur is 70
centimeters and the humerus is 40 centimeters, do you think this
specimen came from the same species? Explain.

4. What effect will the new fossil have on the correlation? Explain.
 
 
 
Calculate the correlation between two quantitative variables.
Apply the properties of the correlation.
Describe how outliers influence the correlation.
 
Calculating the Correlation

This content discusses calculating the correlation between two quantitative variables, its properties, and how outliers can influence the correlation. It provides a step-by-step guide on how to calculate the correlation coefficient 'r' using z-scores. Additionally, it presents a practical problem of whether touchdowns predict wins in the NFL's NFC West division, illustrating the application of correlation analysis in real-world scenarios.


Uploaded on Jul 16, 2024 | 0 Views



Presentation Transcript


  1. Analyzing Two-Variable Data
     Lesson 2.4: Calculating the Correlation
     Statistics and Probability with Applications, 3rd Edition
     Starnes & Tabor
     Bedford Freeman Worth Publishers

  2. Calculating the Correlation
     Learning Targets
     After this lesson, you should be able to:
       Calculate the correlation between two quantitative variables.
       Apply the properties of the correlation.
       Describe how outliers influence the correlation.

  3. Calculating the Correlation
     In the previous lesson, we learned that the correlation r measures the strength and direction of the linear relationship between two quantitative variables.
     How to Calculate the Correlation r
       1. Find the mean x-bar and the standard deviation s_x of the explanatory variable. Calculate the z-score for the value of the explanatory variable for each individual.
       2. Find the mean y-bar and the standard deviation s_y of the response variable. Calculate the z-score for the value of the response variable for each individual.
       3. For each individual, multiply the z-score for the explanatory variable and the z-score for the response variable.
       4. Add the z-score products and divide the sum by n - 1.
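
The four steps on slide 3 translate almost line for line into code. What follows is a minimal sketch, not the textbook's own implementation. The function name `correlation` and the sample lists are illustrative assumptions, and the standard deviations use the n - 1 divisor.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r computed from z-scores, following the four steps."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    # Steps 1 and 2: mean and sample standard deviation of each variable.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

    # z-score of every individual on each variable.
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]

    # Step 3: multiply the paired z-scores.
    products = [zx * zy for zx, zy in zip(z_x, z_y)]

    # Step 4: add the products and divide by n - 1.
    return sum(products) / (n - 1)


# Hypothetical data, chosen only to exercise the function.
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, increasing
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly linear, decreasing
print(r_up, r_down)  # 1 and -1 up to floating-point rounding
```

The sketch assumes neither variable is constant. If every x (or every y) were identical, the standard deviation would be 0 and the z-scores would be undefined.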

  4. Do touchdowns predict wins? Calculating the correlation
     PROBLEM: The table shows the number of touchdowns and the number of wins for the four teams in the NFC West division of the NFL during a recent season. Calculate the correlation for these data.

     Team                   Touchdowns   Wins
     Arizona Cardinals          58        13
     Seattle Seahawks           49        10
     St. Louis Rams             31         7
     San Francisco 49ers        24         5

  5. Do touchdowns predict wins? Calculating the correlation
     $\bar{x} = 40.5$, $s_x = 15.7$, $\bar{y} = 8.75$, $s_y = 3.5$
     Arizona Cardinals:   $z_x = \frac{58 - 40.5}{15.7} = 1.11$,   $z_y = \frac{13 - 8.75}{3.5} = 1.21$
     Seattle Seahawks:    $z_x = \frac{49 - 40.5}{15.7} = 0.54$,   $z_y = \frac{10 - 8.75}{3.5} = 0.36$
     St. Louis Rams:      $z_x = \frac{31 - 40.5}{15.7} = -0.61$,  $z_y = \frac{7 - 8.75}{3.5} = -0.5$
     San Francisco 49ers: $z_x = \frac{24 - 40.5}{15.7} = -1.05$,  $z_y = \frac{5 - 8.75}{3.5} = -1.07$
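
The z-scores on slide 5 can be reproduced from the table on slide 4, and the last step (adding the products and dividing by n - 1) carried through. The transcript does not preserve the final value of r, so the number printed below is what this sketch computes rather than a figure quoted from the slide. Variable names are illustrative.

```python
from statistics import mean, stdev

# Data from the table on slide 4.
teams = ["Arizona Cardinals", "Seattle Seahawks", "St. Louis Rams", "San Francisco 49ers"]
touchdowns = [58, 49, 31, 24]
wins = [13, 10, 7, 5]

x_bar, s_x = mean(touchdowns), stdev(touchdowns)  # 40.5 and about 15.7
y_bar, s_y = mean(wins), stdev(wins)              # 8.75 and 3.5

products = []
for team, x, y in zip(teams, touchdowns, wins):
    z_x = (x - x_bar) / s_x
    z_y = (y - y_bar) / s_y
    products.append(z_x * z_y)
    print(f"{team:20s} z_x = {z_x:6.2f}  z_y = {z_y:6.2f}  product = {z_x * z_y:5.2f}")

# Add the z-score products and divide by n - 1.
r = sum(products) / (len(teams) - 1)
print(f"r = {r:.2f}")  # prints r = 0.99
```

A value this close to 1 indicates a very strong positive linear relationship between touchdowns and wins for these four teams.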

  6. Calculating the Correlation
     To calculate the correlation, both variables must be quantitative. If one or both of the variables are categorical, we can consider the association between the two variables, but not the correlation.
     Properties of the Correlation r
       1. Correlation makes no distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation.

  7. Calculating the Correlation
     Properties of the Correlation r
       2. Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both.
     Measuring foot length and height in inches rather than centimeters does not change the correlation between foot length and height.

  8. Calculating the Correlation
     Properties of the Correlation r
       3. The correlation r has no units of measurement because we are using standardized values in the calculation and standardized values have no units.

  9. Weight training, third rep? Properties of correlation
     PROBLEM: The scatterplot below shows the relationship between the amount of weight students from Gary Lang's weight training class can squat and bench press from the example in Lesson 2.2. The correlation is r = 0.939.
     [Scatterplot: bench press weight (pounds), 100 to 200, on the horizontal axis; squat weight (pounds), 100 to 450, on the vertical axis]
     (a) What would happen to the correlation if squat weight was plotted on the horizontal axis and bench press weight was plotted on the vertical axis? Explain.
     The correlation would still be r = 0.939, because the correlation makes no distinction between explanatory and response variables.

  10. Weight training, third rep? Properties of correlation
     (b) What would happen to the correlation if squat weight was measured in kilograms instead of pounds? Explain.
     The correlation would still be r = 0.939, because the correlation doesn't change when we change the units of either variable.
     (c) Egnarts claims that the correlation between squat weight and bench press weight is r = 0.939 pounds. Is this correct?
     No. The correlation doesn't have units, so including pounds is incorrect.

  11. Calculating the Correlation
     The formula for correlation involves the mean and standard deviation of both variables. Because the mean and standard deviation aren't resistant to outliers, the correlation isn't resistant to outliers either.
     Consider a scatterplot in which the majority of points form a positive, linear association.
     An outlier that is in the same pattern as the rest of the points will make the correlation closer to 1.
     An outlier that is not in the pattern of the rest of the points will make the correlation closer to 0, or possibly negative.

  12. Smaller person, smaller I.Q.? Outliers and Correlation
     PROBLEM: The scatterplot below shows the relationship between the height (in inches) and cumulative weighted GPA for a sample of high school students. How do the two points in red (in the upper-right corner of the graph) affect the correlation? Explain.
     The majority of points shows a moderate negative association. However, because the two points in the upper-right are separated far from the rest of the points in the x direction, they are very influential and will make the correlation closer to 0 and possibly positive. Without these two points, the correlation is clearly negative.

  13. LESSON APP 2.4 Flying dinosaur or early bird?
     Archaeopteryx is an extinct beast having feathers like a bird, but teeth and a long bony tail like a reptile. Because the known specimens differ greatly in size, some scientists think they are different species rather than individuals from the same species. However, if the specimens belong to the same species and differ in size because some are younger than others, there should be a positive linear relationship between the lengths of a pair of bones from all individuals. An outlier from this relationship would suggest a different species.
     Here are data on the lengths (in centimeters) of the femur (a leg bone) and the humerus (a bone in the upper arm) for five specimens that preserve both bones.

  14. LESSON APP 2.4 Flying dinosaur or early bird?
       1. Make a scatterplot using length of femur as the explanatory variable. Do you think that all five specimens come from the same species? Explain.
       2. Find the correlation r step by step, using the formula on page 121. Explain how your value for r matches your graph in part (1).
       3. Suppose that a new fossil was discovered. If the femur is 70 centimeters and the humerus is 40 centimeters, do you think this specimen came from the same species? Explain.
       4. What effect will the new fossil have on the correlation? Explain.


  18. Calculating the Correlation
     Learning Targets
     After this lesson, you should be able to:
       Calculate the correlation between two quantitative variables.
       Apply the properties of the correlation.
       Describe how outliers influence the correlation.
