Regression Analysis in Social Sciences

Remember to hand in Lab Project 4 to your TA
 
Regression Example
Introduction to Statistics for the Social Sciences
SBS200 - Lecture Section 001, Fall 2019
Social Sciences Room 100
10:00 - 10:50 Mondays, Wednesdays & Fridays.
December 2
 
 Schedule of readings
 
Before next exam (December 9)
OpenStax
 
Chapters 1 – 13
(Chapter 12 is emphasized)
Please read Chapters 17 and 18 of Plous
 
Chapter 17: Social Influences
 
Chapter 18: Group Judgments and Decisions
Labs continue this week
Regression Example
 
Rory owns a small software company and employs 10 salespeople.
Rory sends his staff all over the world consulting, selling
and setting up his system. He wants to evaluate his staff in terms of
who the most (and least) productive salespeople are, and also
whether more sales calls actually result in more systems being sold.
So, he simply measures the number of sales calls made by each
salesperson and how many systems they successfully sold.
Regression Example
 
Do more sales calls result
in more sales made?
[Scatterplot: Number of sales calls made (independent variable, X axis) vs. number of systems sold (dependent variable, Y axis)]
 
Step 1: Draw scatterplot
 
Step 2: Estimate r
Regression Example
Do more sales calls result
in more sales made?
 
Step 3: Calculate r
 
Step 4: Is it a significant correlation?
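Step 3 (calculating r) can be sketched in Python with the definitional, deviation-score formula: r = Σ(X − mean X)(Y − mean Y) / √(Σ(X − mean X)² · Σ(Y − mean Y)²). The slides do not list the ten salespeople's raw scores, so the numbers below are a small hypothetical data set, not Rory's data, and the function name `pearson_r` is only illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from the definitional (deviation-score) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviation scores
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviation scores for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: sales calls (X) and systems sold (Y) for four people
calls = [1, 2, 3, 4]
sold = [2, 5, 5, 8]
print(round(pearson_r(calls, sold), 3))  # 0.949
```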
Do more sales calls result
in more sales made?
 
Step 4: Is it a significant correlation?
  n = 10, df = 8
  alpha = .05
  Observed r is larger than critical r
  (0.71 > 0.632)
  therefore we reject the null hypothesis.
  Yes, it is a significant correlation:
 r(8) = 0.71, p < .05
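The decision rule in Step 4 can be checked in a few lines of Python. One standard way to obtain a critical r is from the critical t: r_crit = t_crit / √(t_crit² + df). The value t_crit = 2.306 (two-tailed, alpha = .05, df = 8) is read from a standard t table and is an assumption of this sketch; the slides themselves only give the tabled critical r of 0.632.

```python
from math import sqrt

def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / sqrt(t_crit ** 2 + df)

def is_significant(r_observed, r_crit):
    """Two-tailed test: reject H0 when |r| exceeds the critical r."""
    return abs(r_observed) > r_crit

n = 10
df = n - 2       # degrees of freedom for a correlation
t_crit = 2.306   # assumed: t table value, two-tailed, alpha = .05, df = 8
r_crit = critical_r(t_crit, df)
print(round(r_crit, 3))              # 0.632
print(is_significant(0.71, r_crit))  # True
```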
Regression: Predicting sales
Step 1: Draw prediction line
What are we
predicting?
r = 0.71
b = 11.579 (slope)
a = 20.526 (intercept)
Draw a regression line
and regression equation
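The slope and intercept above are least-squares values. A minimal Python sketch of the usual formulas, b = Σ(X − mean X)(Y − mean Y) / Σ(X − mean X)² and a = mean Y − b · mean X, is below. Because the raw scores are not listed on the slides, it runs on a small hypothetical data set rather than reproducing b = 11.579 and a = 20.526; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x            # slope: change in Y' per one-unit change in X
    a = mean_y - b * mean_x  # intercept: the line passes through (mean X, mean Y)
    return a, b

# Hypothetical data, not Rory's salespeople
a, b = fit_line([1, 2, 3, 4], [2, 5, 5, 8])
print(round(a, 3), round(b, 3))  # 0.5 1.8
```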
 
Step 2: State the regression equation
 
Y’ = a + bx
 
Y’ = 20.526 + 11.579x
 
Step 3: Solve for some value of Y’
 
Y’ = 20.526 + 11.579(1)
 
Y’ = 32.105
If you make one sales call, you should sell 32.105 systems
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
What should you expect from a salesperson who makes 1 call?
They should sell 32.105 systems
If they sell more: overperforming
If they sell fewer: underperforming
Step 2: State the regression equation
 
Y’ = a + bx
 
Y’ = 20.526 + 11.579x
 
Step 3: Solve for some value of Y’
 
Y’ = 20.526 + 11.579(2)
 
Y’ = 43.684
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
What should you expect from a salesperson who makes 2 calls?
If you make two sales calls, you should sell 43.684 systems
They should sell 43.684 systems
If they sell more: overperforming
If they sell fewer: underperforming
Step 2: State the regression equation
 
Y’ = a + bx
 
Y’ = 20.526 + 11.579x
 
Step 3: Solve for some value of Y’
 
Y’ = 20.526 + 11.579(3)
 
Y’ = 55.263
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
What should you expect from a salesperson who makes 3 calls?
If you make three sales calls, you should sell 55.263 systems
They should sell 55.263 systems
If they sell more: overperforming
If they sell fewer: underperforming
Step 2: State the regression equation
 
Y’ = a + bx
 
Y’ = 20.526 + 11.579x
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
What should you expect from a salesperson who makes 4 calls?
 
Step 3: Solve for some value of Y’
 
Y’ = 20.526 + 11.579(4)
 
Y’ = 66.842
If you make four sales calls, you should sell 66.842 systems
They should sell 66.842 systems
If they sell more: overperforming
If they sell fewer: underperforming
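The four worked predictions above all plug a number of sales calls into the same equation, Y’ = 20.526 + 11.579x. A short Python sketch that reproduces them, using the intercept and slope from the slides (the function name `predict_sales` is illustrative):

```python
A = 20.526  # intercept a, from the slides
B = 11.579  # slope b, from the slides

def predict_sales(calls):
    """Predicted number of systems sold: Y' = a + bX."""
    return A + B * calls

for calls in range(1, 5):
    print(calls, round(predict_sales(calls), 3))
# 1 32.105
# 2 43.684
# 3 55.263
# 4 66.842
```

Each extra sales call adds b = 11.579 systems to the prediction, which is why consecutive predictions differ by that amount.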
Regression: Evaluating Staff
 
Step 1: Compare expected sales levels to actual sales levels
What should you expect from each salesperson?
They should sell Y’ systems, where Y’ depends on how many sales calls they made
If they sell more: overperforming
If they sell fewer: underperforming
Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
How did Ava do?
Ava sold 14.7 more systems than expected, taking into account how many sales calls she made: overperforming
70 - 55.3 = 14.7
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)
Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
How did Jacob do?
Jacob sold 23.684 fewer systems than expected, taking into account how many sales calls he made: underperforming
20 - 43.7 = -23.7
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)
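The residual comparisons for Ava and Jacob can be sketched in Python. The (calls, systems sold) pairs are inferred from the worked subtractions on the slides (70 - 55.3, where 55.3 is the prediction for 3 calls, and 20 - 43.7, the prediction for 2 calls); the helper names `residual` and `performance` are hypothetical.

```python
A, B = 20.526, 11.579  # intercept and slope from the slides

def residual(calls, sold):
    """Residual = actual Y minus predicted Y'."""
    return sold - (A + B * calls)

def performance(res):
    """Label a salesperson by the sign of the residual."""
    if res > 0:
        return "overperforming"
    if res < 0:
        return "underperforming"
    return "exactly as predicted"

# (calls, systems sold), inferred from the worked examples on the slides
staff = {"Ava": (3, 70), "Jacob": (2, 20)}
for name, (calls, sold) in staff.items():
    res = residual(calls, sold)
    print(name, round(res, 1), performance(res))
# Ava 14.7 overperforming
# Jacob -23.7 underperforming
```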
Regression: Evaluating Staff
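Step 1: Compare expected sales levels to actual sales levels
[Figure: scatterplot with the regression line; residuals shown for Ava (14.7) and Jacob (-23.7)]
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)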
Regression: Evaluating Staff
Step 1: Compare expected sales levels to actual sales levels
[Figure: scatterplot with the regression line; residuals shown for Ava (14.7), Emily (-6.8), Madison (7.9) and Jacob (-23.7)]
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)
How do we find the average amount of error in our prediction?
The green lines show how much “error” there is in our prediction line: how much we are wrong in our predictions.
How would we find our “average residual”?
 
Step 1: Find the error for each value (just the residuals): Y – Y’
Ava is 14.7
Emily is -6.8
Madison is 7.9
Jacob is -23.7
Residual scores
 
The average amount by which actual scores
deviate on either side of the predicted score
Big problem: $\sum (Y - Y') = 0$ (the positive and negative residuals cancel out)
 
Square the deviations
 
Step 2: Add up the squared residuals: $\sum (Y - Y')^2$
Divide by df ($n - 2$)
Square root
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)
 
How do we find the average amount of error in our prediction?
The green lines show how much “error” there is in our prediction line: how much we are wrong in our predictions.
How would we find our “average residual”?
 
Step 1: Find the error for each value (just the residuals): Y – Y’
 
Step 2: Find the average: $\dfrac{\sum (Y - Y')^2}{n - 2}$
 
Deviation scores: Diallo is 0”, Mike is -4”, Hunter is -2”, Preston is 2”
Sound familiar? These would be helpful to know by heart; please memorize these formulas.
 
Standard error of the estimate (line) = $\sqrt{\dfrac{\sum (Y - Y')^2}{n - 2}}$
 
How well does the prediction line predict the predicted variable
when using the predictor variable?
 
  Slope doesn’t give “variability” info
  Intercept doesn’t give “variability” info
  Correlation “r” does give “variability” info
  Residuals do give “variability” info
 
Standard error
of the estimate (line)
 
Standard error of the estimate:
 
  a measure of the average amount of predictive error
 
  the average amount that Y’ scores differ from Y scores
 
  roughly, the typical length of the green lines (strictly, the square root of the sum of their squared lengths divided by n – 2)
 
  Shorter green lines suggest better prediction – smaller error
 
  Longer green lines suggest worse prediction – larger error
 
  Why are green lines vertical?
   Remember, we are predicting the variable on the Y axis
   So, error would be how we are wrong about Y (vertical)
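The recipe on the preceding slides (find the residuals, notice they sum to zero, square them, add them up, divide by n – 2, take the square root) can be sketched in Python. The five data points are hypothetical, not the salespeople on the slides, and the function names are illustrative.

```python
from math import sqrt

def least_squares_residuals(xs, ys):
    """Residuals Y - Y' from the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def standard_error_of_estimate(residuals):
    """Square the residuals, add them up, divide by df = n - 2, take the root."""
    df = len(residuals) - 2
    return sqrt(sum(r ** 2 for r in residuals) / df)

# Hypothetical data (five people), not the salespeople on the slides
res = least_squares_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(abs(round(sum(res), 10)))                   # 0.0 (the "big problem")
print(round(standard_error_of_estimate(res), 3))  # 0.894
```

Squaring before summing is what keeps the positive and negative residuals from cancelling out.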
How well does the prediction
line predict the Ys from the Xs?
Residuals
 
A note about curvilinear
relationships and patterns
of the residuals
Does the regression line perfectly predict the dependent variable?
No, we are wrong sometimes.
How can we estimate how much “error” we have?
The green lines show how much “error” there is in our prediction line: how much we are wrong in our predictions.
The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score)
[Figure: residuals of 14.7 and -23.7 drawn as green lines]
 
Perfect correlation = +1.00 or -1.00
 
  Each variable perfectly predicts the other
  No scatter around the line in the scatterplot
  The dots fall exactly on a straight line
Is the regression line better than just guessing the mean of the Y variable?
How much does the information about the relationship actually help?
 
How much better does the regression line predict the observed results? $r^2$
 
Wow!
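One way to make “better than just guessing the mean” concrete is to compare the squared errors made by always predicting the mean of Y with the squared errors left over after using the regression line. For a least-squares line, the proportional reduction 1 − SS_residual / SS_total equals r². The data below are hypothetical and the function name is illustrative.

```python
def r_squared_by_error_reduction(xs, ys):
    """Return (SS_total, SS_residual, 1 - SS_residual / SS_total)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    # Error when we ignore X and always guess the mean of Y
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Error left over when we use the regression line instead
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return ss_total, ss_residual, 1 - ss_residual / ss_total

# Hypothetical data, not the salespeople on the slides
ss_tot, ss_res, r2 = r_squared_by_error_reduction([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(ss_tot, 3), round(ss_res, 3), round(r2, 3))  # 6.0 2.4 0.6
```

Here the line removes 60% of the squared error made by guessing the mean, so $r^2 = .6$.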
What is $r^2$?
$r^2$ = the proportion of the total variance in the predicted variable
       that is explained by its relationship with the predictor variable
 
Examples
If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height?
.64, because $(.8)^2 = .64$
What is $r^2$?
$r^2$ = the proportion of the total variance in the predicted variable
       that is explained by its relationship with the predictor variable
 
Examples
If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height?
.36, because (1.0 - .64) = .36
or
36%, because 100% - 64% = 36%
What is $r^2$?
$r^2$ = the proportion of the total variance in the predicted variable
       that is explained by its relationship with the predictor variable
 
Examples
If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature?
.25, because $(.5)^2 = .25$
What is $r^2$?
$r^2$ = the proportion of the total variance in the predicted variable
       that is explained by its relationship with the predictor variable
 
Examples
If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature?
.75, because (1.0 - .25) = .75
or
75%, because 100% - 25% = 75%
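The four $r^2$ examples follow one pattern: square r for the proportion accounted for, and subtract that from 1 for the proportion not accounted for. A minimal Python sketch using the two correlations from the slides (the function name is illustrative):

```python
def variance_explained(r):
    """Return (r squared, 1 - r squared): variance accounted for and not accounted for."""
    explained = r ** 2
    return explained, 1 - explained

for label, r in [("mother/daughter height", 0.8), ("ice cream/temperature", 0.5)]:
    explained, unexplained = variance_explained(r)
    print(f"{label}: r^2 = {explained:.2f}, 1 - r^2 = {unexplained:.2f}")
# mother/daughter height: r^2 = 0.64, 1 - r^2 = 0.36
# ice cream/temperature: r^2 = 0.25, 1 - r^2 = 0.75
```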
Interpreting regression equation

Prediction line: $Y' = a + b_1 X_1$ ($a$ = Y-intercept, $b_1$ = slope)

The expected cost for dinner as predicted by the number of people:
 Cost = 15.22 + 19.96 Persons

If “Persons” = 4, what is the prediction for “Cost”?
 Cost = 15.22 + 19.96 (4)
 Cost = 15.22 + 79.84 = 95.06
If People = 4, Cost will be about 95.06

If “Persons” = 1, what is the prediction for “Cost”?
 Cost = 15.22 + 19.96 (1)
 Cost = 15.22 + 19.96 = 35.18

[Figure: Cost plotted against People, with the prediction line]
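The dinner example uses the same Y’ = a + bX pattern as the sales example, with a = 15.22 and b = 19.96 from the slide. A short Python sketch reproducing both worked predictions (the function name is illustrative):

```python
INTERCEPT = 15.22  # a: predicted Cost when Persons = 0 (from the slide)
SLOPE = 19.96      # b1: added Cost for each additional person (from the slide)

def predicted_cost(persons):
    """Predicted dinner cost: Cost = 15.22 + 19.96 * Persons."""
    return INTERCEPT + SLOPE * persons

print(round(predicted_cost(4), 2))  # 95.06
print(round(predicted_cost(1), 2))  # 35.18
```

The slope is what carries most of the interpretation here: each additional person adds about 19.96 to the expected bill.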