Understanding the Least-Squares Regression Line in Statistics
The concept of the least-squares regression line is crucial in statistics for predicting values based on two-variable data. This regression line minimizes the sum of squared residuals, aiming to make predicted values as close as possible to actual values. By calculating the regression line using technology or summary statistics, one can make informed predictions and understand the impact of outliers on the line's accuracy.
Presentation Transcript
Analyzing Two-Variable Data
Lesson 2.6: The Least-Squares Regression Line
Statistics and Probability with Applications, 3rd Edition
Starnes & Tabor
Bedford Freeman Worth Publishers
The Least-Squares Regression Line
Learning Targets
After this lesson, you should be able to:
Calculate the equation of the least-squares regression line using technology.
Calculate the equation of the least-squares regression line using summary statistics.
Describe how outliers affect the least-squares regression line.
The Least-Squares Regression Line
A good regression line makes the residuals as small as possible, so that the predicted values are close to the actual values. For this reason, statisticians prefer the least-squares regression line.
Least-Squares Regression Line: The least-squares regression line is the line that makes the sum of the squared residuals as small as possible.
The Least-Squares Regression Line
Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals. The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.
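To make "minimizes the sum of the squared residuals" concrete, here is a minimal Python sketch. The five data points are hypothetical (invented for this illustration, not taken from the lesson), the function names are illustrative, and the two comparison lines are arbitrary choices. The closed-form slope and intercept used in `least_squares_line` are the standard ones; the lesson itself relies on technology or summary statistics instead.

```python
# Hypothetical data, made up for this sketch (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_residuals(intercept, slope, xs, ys):
    """Add up (actual y - predicted y)^2 for the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


a, b = least_squares_line(xs, ys)

# Compare the least-squares line with two other candidate lines (arbitrary choices).
candidates = {
    "least-squares": (a, b),
    "steeper": (1.0, 1.0),
    "flatter": (3.0, 0.3),
}
for name, (intercept, slope) in candidates.items():
    ssr = sum_squared_residuals(intercept, slope, xs, ys)
    print(f"{name:>13}: y-hat = {intercept:.2f} + {slope:.2f}x, SSR = {ssr:.2f}")
```

Any other line tried in place of the two arbitrary candidates should give a sum of squared residuals at least as large as the least-squares line's.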
Does taller mean faster?
Calculating the least-squares regression line
PROBLEM: The table shows the height (in ft) and the maximum speed (in mph) for a random sample of seven roller coasters from the Roller Coaster Database (http://rcdb.com/). Use technology to calculate the least-squares regression line for predicting maximum speed from height.
Height (ft)           116.5   51.8  305.0   72.2  205.0  195.0  124.6
Maximum Speed (mph)    47.0   35.0   90.0   46.6   71.0   67.0   43.5
The equation of the least-squares regression line is \(\hat{y} = 24.336 + 0.215x\), where x is the height in feet and \(\hat{y}\) is the predicted maximum speed in mph.
The Least-Squares Regression Line
It is possible to calculate the equation of the least-squares regression line using the means and standard deviations of each variable, along with their correlation.
How to calculate the least-squares regression line using summary statistics
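For reference, the standard summary-statistic formulas can be written as follows, where \(\bar{x}\) and \(\bar{y}\) are the means, \(s_x\) and \(s_y\) are the standard deviations, r is the correlation, and the line is written as \(\hat{y} = a + bx\). This is a sketch of the usual relationships rather than a transcription of the slide.

```latex
b = r \cdot \frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}, \qquad \hat{y} = a + bx
```

The slope is computed first because the intercept formula depends on it.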
How expensive is a Dodge Charger?
Calculating the least-squares regression line using summary statistics
PROBLEM: Recall the data on the miles driven and price of used Dodge Chargers from Lesson 2.3. For these cars, the mean miles driven was 54,230 miles with a standard deviation of 33,651 miles. The mean price was $17,251 with a standard deviation of $5,710. The correlation between miles driven and price was r = -0.818. Calculate the equation of the least-squares regression line for predicting the price of a used Dodge Charger from the number of miles it has been driven.
Letting x = miles driven and y = price, we calculate the slope using r = -0.818, \(s_y = 5710\), and \(s_x = 33{,}651\):
\[ \text{slope} = b = r\,\frac{s_y}{s_x} = -0.818\left(\frac{5710}{33{,}651}\right) = -0.1388 \]
Then, we find the intercept using \(\bar{x} = 54{,}230\), \(\bar{y} = 17{,}251\), and \(b = -0.1388\):
\[ y\text{ intercept} = a = \bar{y} - b\,\bar{x} = 17{,}251 - (-0.1388)(54{,}230) = 24{,}778 \]
The equation of the least-squares regression line is \(\hat{y} = 24{,}778 - 0.1388x\).
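The same arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `lsrl_from_summary` is hypothetical, and it assumes the correlation is negative (r = -0.818), since price falls as mileage rises and a five-digit intercept of about 24,778 is only consistent with a negative slope.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * s_y / s_x               # b = r * (s_y / s_x)
    intercept = y_bar - slope * x_bar   # a = y-bar - b * x-bar
    return intercept, slope


# Summary statistics from the Dodge Charger problem (x = miles driven, y = price).
# Assumption: r is negative, as explained in the lead-in.
intercept, slope = lsrl_from_summary(
    x_bar=54230, s_x=33651, y_bar=17251, s_y=5710, r=-0.818
)

print(f"predicted price = {intercept:.0f} + ({slope:.4f}) * miles")
```

The script keeps the unrounded slope when computing the intercept, so its last digits can differ slightly from a hand calculation that rounds the slope to -0.1388 first.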
Pregnancy and poverty?
Describing the effect of outliers
PROBLEM: Do states with higher incomes have more or less teen pregnancy than states with lower incomes? Data were collected on the median income and teen birth rate (per 1000 females) for all 50 states in a recent year. A scatterplot is shown below, along with the least-squares regression line. Two outliers, Vermont and Connecticut, are identified on the scatterplot.
[Scatterplot of median income and teen birth rate for the 50 states, with the least-squares regression line; the points for Vermont and Connecticut are labeled.]
Pregnancy and poverty?
Describing the effect of outliers
(a) Describe the effect Vermont has on the equation of the least-squares regression line.
Because this point is near \(\bar{x}\) but below the rest of the points, it pulls the line down a little, which decreases the y intercept but doesn't change the slope much.
(b) Describe the effect Connecticut has on the equation of the least-squares regression line.
Because the point for this state is above the line on the right, it makes the regression line less steep (less negative, with a slope closer to 0) and decreases the y intercept.
The Least-Squares Regression Line
The formulas for the slope and y intercept of the least-squares regression line involve the mean and standard deviation of both variables. Because the mean and standard deviation aren't resistant to outliers, the least-squares regression line isn't resistant to outliers either.
LESSON APP 2.6
Did the Broncos buck the trend?
In 2013, the Denver Broncos football team set the NFL record for the most points scored in a season. Below is a scatterplot showing the relationship between passing yards and points scored for all 32 NFL teams in 2013, along with the least-squares regression line.
LESSON APP 2.6
Did the Broncos buck the trend?
1. For passing yards, the mean is 3770 yards, with a standard deviation of 594 yards. For points scored, the mean is 375 points, with a standard deviation of 70 points. The correlation between passing yards and points scored is r = 0.616. Use this information to calculate the equation of the least-squares regression line.
2. The point for the Denver Broncos is highlighted in red. What effect does this point have on the equation of the least-squares regression line? Explain.
The Least-Squares Regression Line
Learning Targets
After this lesson, you should be able to:
Calculate the equation of the least-squares regression line using technology.
Calculate the equation of the least-squares regression line using summary statistics.
Describe how outliers affect the least-squares regression line.