Understanding Correlation Analysis in Statistics

 
Chapter 4
 
Class 4
 
Correlation

How strong is the linear relationship between the variables?
Correlation does not necessarily imply causality!
Coefficient of correlation, r, measures degree of association
Values range from -1 to +1

Correlation Coefficient

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
 
Coefficient of Determination, r², measures the percent of variation in y
explained by the regression on x
Values range from 0 to 1
Easy to interpret

For the Nodel Construction example:
r = .901
r² = .81
Problem 4.24

Howard Weiss, owner of a musical instrument distributorship, thinks that
demand for bass drums may be related to the number of television
appearances by the popular group Stone Temple Pilots during the previous
month. Weiss has collected the data shown in the following table:

Demand for bass drums       3   6   7   5   10   7
Number of TV appearances    3   4   7   6    8   5

A. Graph these data to see whether a linear equation might describe the
relationship between the group's television shows and bass drum sales.
B. Use the least squares regression method to derive a forecasting equation.
C. What is your estimate for bass drum sales if the Stone Temple Pilots
performed on TV nine times last month?
D. What are the correlation coefficient (r) and the coefficient of
determination (r²) for this model, and what do they mean?
Problem 4.24

(a) Graph of demand
The observations obviously do not form a straight line but do tend to
cluster about a straight line over the range shown.

(b) Least-squares regression:
Y = .676 + 1.03x

The following figure shows both the data and the resulting equation:

(c) If there are nine performances by Stone Temple Pilots, the estimated
sales are:
Y = .676 + 1.03(9) ≈ 9.93 drums, or about 10 drums
(9.93 uses the unrounded slope of about 1.0286; with the rounded 1.03 the
sum is 9.95)

(d) r = .82 is the correlation coefficient, and r² = .68 means 68% of the
variation in sales can be explained by TV appearances.
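
The figures in parts (b) through (d) can be checked with a short script. The following is a minimal Python sketch, not part of the original slides. It assumes the six observations listed for Problem 4.24 in the presentation transcript, and the helper names `least_squares` and `correlation` are hypothetical, chosen only for illustration.

```python
import math

# Observations from the Problem 4.24 table (presentation transcript).
tv_appearances = [3, 4, 7, 6, 8, 5]   # x: TV appearances in the previous month
drum_demand = [3, 6, 7, 5, 10, 7]     # y: demand for bass drums


def least_squares(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def correlation(x, y):
    """Return the coefficient of correlation r from the sums of x, y, xy, x^2, y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


a, b = least_squares(tv_appearances, drum_demand)
r = correlation(tv_appearances, drum_demand)
forecast_9 = a + b * 9  # estimated demand after nine TV appearances

print(f"Y = {a:.3f} + {b:.2f}x")
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"Forecast for 9 appearances: {forecast_9:.2f} drums")
```

Run as written, the sketch should give an intercept of about .676, a slope of about 1.03, r of about .82, r² of about .68, and a forecast of about 9.93 drums, in line with the slide values.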
Multiple Regression Analysis

If more than one independent variable is to be used in the model, linear
regression can be extended to multiple regression to accommodate several
independent variables

$\hat{y} = a + b_1 x_1 + b_2 x_2$

Computationally, this is quite complex and generally done on the computer

In the Nodel example, including interest rates in the model gives the
new equation:

$\hat{y} = 1.80 + .30x_1 - 5.0x_2$

An improved correlation coefficient of r = .96 means this model does a better
job of predicting the change in construction sales

Sales = 1.80 + .30(6) - 5.0(.12) = 3.00
Sales = $300,000
Problem 4.36

Accountants at the firm Michael Vest, CPAs, believed that several traveling
executives were submitting unusually high travel vouchers when they
returned from business trips. First, they took a sample of 200 vouchers
submitted from the past year. Then they developed the following
multiple-regression equation relating expected travel cost to number of
days on the road (x1) and distance traveled (x2) in miles:

y = $90.00 + $48.50 x1 + $.40 x2

The coefficient of correlation computed was .68

(a) If Wanda Fennell returns from a 300-mile trip that took her out of town
for 5 days, what is the expected amount she should claim as expenses?
(b) Fennell submitted a reimbursement request for $685. What should the
accountant do?
(c) Should any other variables be included? Which ones? Why?
Problem 4.36

(a) Number of days on the road X1 = 5 and distance traveled X2 = 300, then:

Y = 90 + 48.5 × 5 + 0.4 × 300 = 90 + 242.5 + 120 = 452.5

Therefore, the expected cost of the trip is $452.50.

(b) The reimbursement request is much higher than predicted by the
model. This request should probably be questioned by the accountant.

(c) A number of other variables should be included, such as:
1. the type of travel (air or car)
2. conference fees, if any
3. costs of entertaining customers
4. other transportation costs: cab, limousine, special tolls, or parking

In addition, the correlation coefficient of 0.68 is not exceptionally high. It
indicates that the model explains approximately 46% (r² = 0.68² ≈ 0.46) of
the overall variation in trip cost. This correlation coefficient would
suggest that the model is not a particularly good one.
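
The arithmetic in parts (a) through (c) can be reproduced in a few lines. This is a minimal Python sketch under stated assumptions: the coefficients, the 5-day, 300-mile trip, the $685 claim, and r = .68 come from the problem statement, while the function name `expected_travel_cost` and the variable names are hypothetical.

```python
def expected_travel_cost(days, miles):
    """Multiple-regression estimate from Problem 4.36: y = 90 + 48.50*x1 + 0.40*x2."""
    return 90.00 + 48.50 * days + 0.40 * miles


# Part (a): expected claim for a 5-day, 300-mile trip.
expected = expected_travel_cost(days=5, miles=300)

# Part (b): how far the submitted request sits above the model's estimate.
claimed = 685.00
excess = claimed - expected

# Part (c): share of variation in trip cost explained by the model.
r = 0.68
r_squared = r ** 2

print(f"Expected cost: ${expected:.2f}")
print(f"Claim exceeds estimate by: ${excess:.2f}")
print(f"r^2 = {r_squared:.2f}")
```

The gap between the claim and the estimate is what prompts the accountant's question in part (b). How large a gap should trigger a review is a judgment call that the slides do not quantify.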
 
Monitoring and Controlling Forecasts

Tracking Signal

Measures how well the forecast is predicting actual values
Ratio of running sum of forecast errors (RSFE) to mean absolute deviation (MAD)
Good tracking signal has low values
If forecasts are continually high or low, the forecast has a bias error

$$\text{Tracking signal} = \frac{\text{RSFE}}{\text{MAD}} = \frac{\sum (\text{actual demand in period } i - \text{forecast demand in period } i)}{\sum |\text{actual} - \text{forecast}| / n}$$

Tracking Signal Example

Qtr  Actual  Forecast  Error  RSFE  |Error|  Cum. |Error|  MAD   Tracking signal (RSFE/MAD)
1     90     100       -10    -10   10       10            10.0  -10/10   = -1
2     95     100        -5    -15    5       15             7.5  -15/7.5  = -2
3    115     100       +15      0   15       30            10.0    0/10   =  0
4    100     110       -10    -10   10       40            10.0  -10/10   = -1
5    125     110       +15     +5   15       55            11.0   +5/11   = +0.5
6    140     110       +30    +35   30       85            14.2  +35/14.2 = +2.5

The variation of the tracking signal between -2.0 and +2.5
is within acceptable limits
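
The running quantities in the tracking signal example can be recomputed period by period. Below is a minimal Python sketch, assuming the six quarters of actual and forecast demand from the Tracking Signal Example in the presentation transcript. The function name `tracking_signals` is hypothetical, and the sketch assumes the first forecast error is nonzero so that MAD is never zero.

```python
# Quarterly data from the Tracking Signal Example (presentation transcript).
actual = [90, 95, 115, 100, 125, 140]
forecast = [100, 100, 100, 110, 110, 110]


def tracking_signals(actual, forecast):
    """Return one (period, RSFE, MAD, tracking signal) tuple per period."""
    rsfe = 0.0                  # running sum of forecast errors
    cumulative_abs_error = 0.0  # running sum of absolute errors
    rows = []
    for period, (a, f) in enumerate(zip(actual, forecast), start=1):
        error = a - f
        rsfe += error
        cumulative_abs_error += abs(error)
        mad = cumulative_abs_error / period  # mean absolute deviation so far
        rows.append((period, rsfe, mad, rsfe / mad))
    return rows


rows = tracking_signals(actual, forecast)
for period, rsfe, mad, signal in rows:
    print(f"Qtr {period}: RSFE = {rsfe:+.0f}, MAD = {mad:.1f}, signal = {signal:+.2f}")
```

The slide rounds the signals for quarters 5 and 6 to +0.5 and +2.5. Computed without rounding MAD first, they come out at about +0.45 and +2.47.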
Problem 4.45

The following are monthly actual and forecast demand levels for May
through December for units of a product manufactured by the N. Tamimi
Pharmaceutical Company

What is the value of tracking signal as of the end of December?

Problem 4.45

$$\text{Tracking signal} = \frac{\sum_{t=1}^{n} (A_t - F_t)}{\text{MAD}}$$

So: MAD = 87/8 = 10.875
Tracking signal = 39/10.875 = 3.586

Exploring the concept of correlation in statistics: from measuring the strength of linear relationships between variables to interpreting correlation coefficients and coefficients of determination. A practical example involving bass drum sales and TV appearances by a popular group illustrates how correlation analysis can help in forecasting and decision-making.


Uploaded on Jul 11, 2024



Presentation Transcript


  1. Chapter 4 Class 4

  2. Correlation How strong is the linear relationship between the variables? Correlation does not necessarily imply causality! Coefficient of correlation, r, measures degree of association Values range from -1 to +1

  3. Correlation Coefficient $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$

  4. [Figure: four scatter plots of y against x] (a) Perfect positive correlation: r = +1 (b) Positive correlation: 0 < r < 1 (c) No correlation: r = 0 (d) Perfect negative correlation: r = -1

  5. Correlation: Coefficient of Determination, r², measures the percent of variation in y explained by the regression on x. Values range from 0 to 1. Easy to interpret. For the Nodel Construction example: r = .901, r² = .81

  6. Problem 4.24 Howard Weiss, owner of a musical instrument distributorship, thinks that demand for bass drums may be related to the number of television appearances by the popular group Stone Temple Pilots during the previous month. Weiss has collected the data shown in the following table:

     Demand for bass drums       3   6   7   5   10   7
     Number of TV appearances    3   4   7   6    8   5

     A. Graph these data to see whether a linear equation might describe the relationship between the group's television shows and bass drum sales. B. Use the least squares regression method to derive a forecasting equation. C. What is your estimate for bass drum sales if the Stone Temple Pilots performed on TV nine times last month? D. What are the correlation coefficient (r) and the coefficient of determination (r²) for this model, and what do they mean?

  7. Problem 4.24 (a) Graph of demand The observations obviously do not form a straight line but do tend to cluster about a straight line over the range shown.

  8. Problem 4.24 (b) Least-squares regression:

  9. Problem 4.24 The following figure shows both the data and the resulting equation: ?

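  10. Problem 4.24 (c) If there are nine performances by Stone Temple Pilots, the estimated sales are: Y = .676 + 1.03x, so Y = .676 + 1.03(9) ≈ 9.93 drums, or about 10 drums (9.93 uses the unrounded slope of about 1.0286; with the rounded 1.03 the sum is 9.95)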

  11. Problem 4.24 (d) r = .82 is the correlation coefficient, and r² = .68 means 68% of the variation in sales can be explained by TV appearances.

  12. Multiple Regression Analysis If more than one independent variable is to be used in the model, linear regression can be extended to multiple regression to accommodate several independent variables: $\hat{y} = a + b_1 x_1 + b_2 x_2$. Computationally, this is quite complex and generally done on the computer

  13. Multiple Regression Analysis In the Nodel example, including interest rates in the model gives the new equation: $\hat{y} = 1.80 + .30x_1 - 5.0x_2$. An improved correlation coefficient of r = .96 means this model does a better job of predicting the change in construction sales. Sales = 1.80 + .30(6) - 5.0(.12) = 3.00, so Sales = $300,000

  14. Problem 4.36 Accountants at the firm Michael Vest, CPAs, believed that several traveling executives were submitting unusually high travel vouchers when they returned from business trips. First, they took a sample of 200 vouchers submitted from the past year. Then they developed the following multiple-regression equation relating expected travel cost to number of days on the road (x1) and distance traveled (x2) in miles: y = $90.00 + $48.50 x1 + $.40 x2. The coefficient of correlation computed was .68. (a) If Wanda Fennell returns from a 300-mile trip that took her out of town for 5 days, what is the expected amount she should claim as expenses? (b) Fennell submitted a reimbursement request for $685. What should the accountant do? (c) Should any other variables be included? Which ones? Why?

  15. Problem 4.36 (a) Number of days on the road X1 = 5 and distance traveled X2 = 300, then: Y = 90 + 48.5 × 5 + 0.4 × 300 = 90 + 242.5 + 120 = 452.5. Therefore, the expected cost of the trip is $452.50. (b) The reimbursement request is much higher than predicted by the model. This request should probably be questioned by the accountant.

  16. Problem 4.36 (c) A number of other variables should be included, such as: 1. the type of travel (air or car) 2. conference fees, if any 3. costs of entertaining customers 4. other transportation costs: cab, limousine, special tolls, or parking. In addition, the correlation coefficient of 0.68 is not exceptionally high. It indicates that the model explains approximately 46% of the overall variation in trip cost. This correlation coefficient would suggest that the model is not a particularly good one.

  17. Monitoring and Controlling Forecasts Tracking Signal Measures how well the forecast is predicting actual values Ratio of running sum of forecast errors (RSFE) to mean absolute deviation (MAD) Good tracking signal has low values If forecasts are continually high or low, the forecast has a bias error

  18. Monitoring and Controlling Forecasts $\text{Tracking signal} = \frac{\text{RSFE}}{\text{MAD}} = \frac{\sum (\text{actual demand in period } i - \text{forecast demand in period } i)}{\sum |\text{actual} - \text{forecast}| / n}$

  19. Tracking Signal [Figure: tracking signal plotted over time, measured in MADs around 0, with an acceptable range between the upper and lower control limits and one point labeled "Signal exceeding limit"]

  20. Tracking Signal Example

     Qtr  Actual Demand  Forecast Demand  Error  RSFE  Absolute Forecast Error  Cumulative Absolute Forecast Error  MAD
     1     90            100              -10    -10   10                       10                                  10.0
     2     95            100               -5    -15    5                       15                                   7.5
     3    115            100              +15      0   15                       30                                  10.0
     4    100            110              -10    -10   10                       40                                  10.0
     5    125            110              +15     +5   15                       55                                  11.0
     6    140            110              +30    +35   30                       85                                  14.2

  21. Tracking Signal Example

     Qtr  Actual Demand  Forecast Demand  Error  RSFE  Absolute Forecast Error  Cumulative Absolute Forecast Error  MAD   Tracking Signal (RSFE/MAD)
     1     90            100              -10    -10   10                       10                                  10.0  -10/10   = -1
     2     95            100               -5    -15    5                       15                                   7.5  -15/7.5  = -2
     3    115            100              +15      0   15                       30                                  10.0    0/10   =  0
     4    100            110              -10    -10   10                       40                                  10.0  -10/10   = -1
     5    125            110              +15     +5   15                       55                                  11.0   +5/11   = +0.5
     6    140            110              +30    +35   30                       85                                  14.2  +35/14.2 = +2.5

     The variation of the tracking signal between -2.0 and +2.5 is within acceptable limits

  22. Problem 4.45 The following are monthly actual and forecast demand levels for May through December for units of a product manufactured by the N.Tamimi Pharmaceutical Company What is the value of tracking signal as of the end of December?

  23. Problem 4.45 $\text{Tracking signal} = \frac{\sum_{t=1}^{n} (A_t - F_t)}{\text{MAD}}$. So: MAD = 87/8 = 10.875, and tracking signal = 39/10.875 = 3.586
