Understanding Correlation and Causation in Data Analysis

 
Correlation
 
Learning Centre
 
Contents
 
1. Correlation and its purpose
2. Describing correlations
3. An illustrated example
4. Reporting
 
What is Correlation?
 
A correlation analysis is used to assess the strength and direction of a relationship between two variables.
A correlation coefficient is most commonly reported as Pearson’s r.
Pearson’s r can range from -1 to 1. A negative coefficient indicates a negative association, while a positive coefficient indicates a positive association.
A common misconception is that correlation equals causation. This is not the case.
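 
To make the range of Pearson's r concrete, here is a minimal sketch that computes r from its usual definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the small data lists are made up for illustration; they are not taken from the slides or from the LearnJCU data files.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not from the slides)
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up as hours go up
falling = [10, 8, 6, 4, 2]   # goes down as hours go up

print(round(pearson_r(hours, rising), 3))   # 1.0
print(round(pearson_r(hours, falling), 3))  # -1.0
```

The two perfectly linear examples sit at the ends of the range: a coefficient of 1 for a positive association and -1 for a negative one. Real data will usually fall somewhere in between.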
 
How do we describe correlations?
 
Positive Correlation
An increase in one variable is accompanied by an increase in another variable.
 
Negative Correlation
An increase in one variable is accompanied by a decrease in another variable.
 
No Correlation
The two variables do not covary.
 
Pictures taken from https://astutesolutions.com/blog/articles/causation-vs-correlation
 
Correlation ≠ Causation
 
A correlation simply looks at the strength and direction of a relationship.
For example, a study found that ice cream sales were strongly positively correlated with shark attacks. The more ice creams were sold, the more shark attacks occurred. But selling ice creams does not and cannot cause shark attacks.
Causation implies a cause-and-effect relationship: a change in Variable B is caused by a change in Variable A, or vice versa.
For example, the more I exercise, the more fatigue I feel afterwards. How tired I feel is directly affected by how much I have worked out.
 
Correlation ≠ Causation
 
If we look at the ice cream example, do ice cream sales cause shark attacks? Or do shark attacks cause more ice cream sales?
Neither is true! In fact, an increase in ice cream sales is caused by hot weather during summer, and during summer more people go to the beach to enjoy water sports. This then leads to a higher probability of shark attacks.
There is a third or hidden variable (i.e., hot summer weather) that affects both of our variables, so we can only say that ice cream sales are positively correlated with shark attacks, not that one causes the other.
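 
The hidden-variable idea can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the temperature range, the coefficients, the noise levels, and the names pearson_r and residuals are all invented for illustration and do not come from any real study. Both simulated variables depend on temperature and on nothing else, yet they end up strongly correlated with each other. Once the effect of temperature is removed from each (a partial correlation, computed here by correlating the residuals), the association largely disappears.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residuals(y, x):
    """What is left of y after removing its least-squares linear fit on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 500

# Hypothetical generating process: temperature drives BOTH variables,
# and neither variable has any direct effect on the other.
temperature = [rng.uniform(15, 35) for _ in range(n)]
ice_cream_sales = [20 * t + rng.gauss(0, 40) for t in temperature]
shark_attacks = [0.5 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson_r(ice_cream_sales, shark_attacks)

# Partial correlation: correlate what remains after removing temperature
partial_r = pearson_r(residuals(ice_cream_sales, temperature),
                      residuals(shark_attacks, temperature))

print("r (sales vs attacks):", round(raw_r, 3))
print("r after controlling for temperature:", round(partial_r, 3))
```

In this simulated setting the first value is large and positive while the second is close to zero, which is the pattern we would expect when a third variable is responsible for the association.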
 
Location of SPSS Data Files
 
Example SPSS data for practice are available on LearnJCU:
 
Log in to LearnJCU -> Organisations -> Learning Centre JCU Singapore -> Learning Centre -> Statistics and Maths -> SPSS Data for Practice
 
SPSS time!
 
Using the ice cream and shark example, let us now conduct a correlation analysis.
Go to Analyze -> Correlate -> Bivariate
Shift the 2 variables of interest to the right column
Click OK!
 
SPSS time!
 
Looking at the output, we have r = .941, and the p value of this correlation is less than .001 (p < .001).
We may classify r in a few ways:
If |r| is below .3, it is a weak correlation
If |r| is between .4 and .6, it is a moderate correlation
If |r| is above .7, it is a strong correlation
*Note that different sources will provide different benchmarks on what is a weak, moderate, or strong correlation
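 
The benchmarks above can be written as a small helper. This is a sketch only: the function names describe_strength and describe_direction are made up, and the cut-offs are copied from this slide as they stand. Since the slide does not assign a label to values between .3 and .4 or between .6 and .7, the sketch reports those as not covered instead of guessing.

```python
def describe_strength(r):
    """Label the size of r using the benchmarks quoted on this slide."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives the direction, not the strength
    if size < 0.3:
        return "weak"
    if 0.4 <= size <= 0.6:
        return "moderate"
    if size > 0.7:
        return "strong"
    # The quoted benchmarks leave .3 to .4 and .6 to .7 unassigned
    return "not covered by these benchmarks"

def describe_direction(r):
    """Label the direction of the association from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

r = 0.941  # the value from the SPSS output above
print(describe_strength(r), describe_direction(r))  # strong positive
```

Other textbooks use other cut-offs, so treat the labels as a convention to be stated in your write-up, not as a fixed rule.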
 
A Visual Representation
 
We can also plot a graph to visually inspect the data.
Go to Graphs -> Legacy Dialogs -> Scatter/Dot
Select the simple/scatter option -> Define
Move IceCreamSales under Y Axis
Move SharkAttacks under X Axis. You can swap the axes if you wish
Click OK!
 
 
 
 
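Here is our graph! If we draw a line of best fit, we would get something like this:
 
[Figure: scatterplot of IceCreamSales (Y axis) against SharkAttacks (X axis) with a line of best fit]
 
Recall that this line shows a positive correlation.
In our example, r = .941, which is very close to 1 (look at Slide 4!).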
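 
A line of best fit is usually the least-squares line. The sketch below shows how its slope and intercept can be computed by hand. The function name best_fit_line and the five data points are hypothetical; they are not the values in the LearnJCU practice file. The axes follow the slide: SharkAttacks on X and IceCreamSales on Y.

```python
def best_fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values for illustration only
shark_attacks = [1, 2, 3, 4, 5]           # X axis
ice_cream_sales = [12, 19, 31, 42, 48]    # Y axis

slope, intercept = best_fit_line(shark_attacks, ice_cream_sales)
print(round(slope, 2), round(intercept, 2))  # 9.5 1.9
```

The slope always has the same sign as r, so an upward-sloping line of best fit goes with a positive correlation and a downward-sloping line with a negative one.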
 
Reporting
 
An example write-up can be found on:
 
JCUS Learning Centre website -> Statistics and Mathematics Support
 
Questions?
learningcentre-singapore@jcu.edu.au

A correlation analysis assesses the relationship strength and direction between two variables using the Pearson correlation coefficient. It's crucial to differentiate between correlation and causation, where correlation indicates a relationship but not causation. Exploring examples like ice cream sales and shark attacks helps clarify how correlation doesn't imply causation due to hidden variables. Descriptions of correlations - no correlation, negative correlation, and positive correlation - further illustrate this distinction in data analysis.


Uploaded on Jul 18, 2024

