Predicting Quality of Wine Using Linear Regression Analysis

 
CSCI 200 DATA MINING
 
Introduction to Linear Regression – Predicting
Quality of Wine
 
Predicting Quality of Wine
 
Linear regression is a simple and powerful method
to analyze data and make predictions.
Bordeaux is a region in France known for
producing wine.
Price and quality differ from year to year,
sometimes very significantly.
Bordeaux wines are widely believed to taste
better when they are older.
There is an incentive to store young wines until
they are mature.
 
Predicting Quality of Wine
 
The main issue: it is hard to determine the quality of a
wine just by tasting it while it is still young, since the taste
will change significantly by the time it is consumed.
Wine tasters and experts taste the wine and then predict
which ones will be the best later.
Question: can we model this process and make stronger
predictions?
 
Predicting Quality of Wine
 
On March 4, 1990, the New York Times
announced that Princeton Professor of
Economics Orley Ashenfelter can predict the
quality of Bordeaux wine without tasting a single
drop.
Ashenfelter's predictions have nothing to do with
assessing the aroma of the wine.
They are the results of a mathematical model.
Ashenfelter used a method called linear
regression.
 
Linear Regression
 
The method predicts an outcome variable, also called the
dependent variable.
It uses a set of independent variables.
Dependent variable: a typical price in 1990-1991
for Bordeaux wine at auction.
This approximates quality.
Independent variables: the age of the wine (older
wines are more expensive) and weather-related
information.
 
Linear Regression
 
Four independent variables:
The age of the wine
The average growing season
temperature
The harvest rain
The winter rain
 
Quality of Wine – Linear Regression
 
Professor Ashenfelter believed that his
predictions were more accurate than those of
the world's most influential wine critic,
Robert Parker.
Robert M. Parker Jr., generally regarded as
the most influential wine critic in America,
called Professor Ashenfelter's research
"ludicrous and absurd."
 
Predicting Quality of Wine - Links
 
http://www.wine-economics.org/workingpapers/AAWE_WP04.pdf
http://www.wine-economics.org/
http://www.nytimes.com/1990/03/04/us/wine-equation-puts-some-noses-out-of-joint.html
 
 
One-Variable Linear Regression
 
This method uses one independent variable to predict the
dependent variable.
Independent variable: average growing season
temperature (AGST).
Dependent variable: wine price.
The goal of linear regression is to create a predictive line
through the data.
Many different lines could be drawn to
predict wine price from average growing season
temperature.
 
Simple Prediction - Average
 
The equation for this line:
 y = 7.07
This linear regression model
would predict 7.07 regardless
of the temperature.
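
The average line can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the numbers are made up for the example; they are not the Bordeaux data behind y = 7.07.

```python
from statistics import mean

def fit_baseline(prices):
    """Return the constant prediction: the mean of the observed prices."""
    return mean(prices)

def predict_baseline(constant, agst_values):
    """Predict the same constant for every temperature."""
    return [constant for _ in agst_values]

# Made-up illustrative numbers, NOT the Bordeaux data set.
toy_agst = [15.0, 16.0, 17.0, 18.0]
toy_price = [6.0, 7.0, 7.5, 8.5]

baseline = fit_baseline(toy_price)
baseline_predictions = predict_baseline(baseline, toy_agst)
print(baseline, baseline_predictions)
```

The temperature list is passed in only to show that the prediction ignores it.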
 
Better Prediction
 
Y = 0.5*AGST - 1.25
This linear regression model would
predict a higher price when the
temperature is higher.
 
General Equation
 
Y = A*X + B – the model
X – independent variable (in our case AGST)
Y – dependent variable (in our case Price)
Using this equation we calculate the
PREDICTED values.
The model makes errors:
Y = A*X + B + E
The error term, E, is also often called a residual.
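
A minimal sketch of how predictions and residuals follow from Y = A*X + B + E, using the hand-drawn line A = 0.5, B = -1.25 from the earlier slide. The helper names are hypothetical and the data points are made up for illustration; they are not the Bordeaux observations.

```python
def predict(a, b, x_values):
    """Predicted Y for each X under the model Y = A*X + B."""
    return [a * x + b for x in x_values]

def residuals(a, b, x_values, y_values):
    """Error term E for each observation: actual Y minus predicted Y."""
    predictions = predict(a, b, x_values)
    return [y - p for y, p in zip(y_values, predictions)]

# Made-up illustrative numbers, NOT the Bordeaux data set.
toy_agst = [15.0, 16.0, 17.0, 18.0]
toy_price = [6.0, 7.0, 7.5, 8.5]

a, b = 0.5, -1.25  # the hand-drawn line from the slides
toy_predictions = predict(a, b, toy_agst)
toy_residuals = residuals(a, b, toy_agst, toy_price)
print(toy_predictions)
print(toy_residuals)
```

A positive residual means the line predicted too low a price for that observation; a negative one means it predicted too high.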
 
Y[i]=A*X[i]+B + E[i]
 
For each observation, i, we have data for the
dependent variable, Y[i], and data for the
independent variable, X[i].
Using this equation we make a prediction, A*X[i] + B.
This prediction is hopefully close to the true
outcome, Y[i].
Since the coefficients have to be the same for all
data points, i, we usually make a small error, E[i].
The best model (choice of A and B) has the
smallest error.
 
SSE – Sum of Squared Errors
 
SSE for Average Line: 10.15064
SSE for 0.5*AGST-1.25: 6.03251
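
The comparison above can be reproduced in spirit with a short sketch: square each residual and add them up. The function name is hypothetical and the data are made up for illustration, so the resulting numbers will not match the 10.15064 and 6.03251 reported for the real data.

```python
def sse(a, b, x_values, y_values):
    """Sum of squared errors of the line Y = A*X + B on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(x_values, y_values))

# Made-up illustrative numbers, NOT the Bordeaux data set.
toy_agst = [15.0, 16.0, 17.0, 18.0]
toy_price = [6.0, 7.0, 7.5, 8.5]

average_price = sum(toy_price) / len(toy_price)

# The average line is the special case A = 0, B = average price.
sse_average = sse(0.0, average_price, toy_agst, toy_price)
sse_line = sse(0.5, -1.25, toy_agst, toy_price)
print(sse_average, sse_line)
```

On this toy data the sloped line also has the smaller SSE, which mirrors the ordering on the slide.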
 
Better Measures for Regression Quality
 
Root Mean Squared Error (RMSE):
RMSE = SQRT(SSE/N) (N is the total number of data points)
R squared – R^2
R^2 compares the best model to a baseline model.
The baseline model is the model that does not use any
variables: the AVERAGE.
The baseline model predicts the average value of the
dependent variable regardless of the value of the
independent variable.
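
A minimal sketch of RMSE = SQRT(SSE/N). The slides do not state N for the wine data, so the example uses made-up numbers; the function names are hypothetical.

```python
import math

def sse(a, b, x_values, y_values):
    """Sum of squared errors of the line Y = A*X + B on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(x_values, y_values))

def rmse(a, b, x_values, y_values):
    """Root mean squared error: SQRT(SSE / N)."""
    n = len(y_values)
    return math.sqrt(sse(a, b, x_values, y_values) / n)

# Made-up illustrative numbers, NOT the Bordeaux data set.
toy_agst = [15.0, 16.0, 17.0, 18.0]
toy_price = [6.0, 7.0, 7.5, 8.5]

rmse_line = rmse(0.5, -1.25, toy_agst, toy_price)
print(rmse_line)
```

Unlike SSE, RMSE is expressed in the units of the dependent variable and does not grow just because more data points are added.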
 
R^2
 
The sum of squared errors for the baseline model is also
known as the total sum of squares, commonly referred to
as SST.
In our example: SST = 10.15
R^2 = 1 - SSE/SST
SSE >= 0, SST >= 0
SSE <= SST (Y = A*X + B; if A = 0 and B is the average, we get the baseline model)
On the data it was fitted to, a linear regression model will never be worse than the
baseline model.
R^2 = 1: perfect predictive model
R^2 = 0: no improvement over the baseline
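
As a worked example, the formula can be applied to the two SSE values reported earlier in the slides. The function and variable names are hypothetical; the numbers come from the SSE slide.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST."""
    return 1.0 - sse / sst

sst_slides = 10.15064     # SSE of the average line (the baseline), from the slides
sse_hand_line = 6.03251   # SSE of 0.5*AGST - 1.25, from the slides

r2_hand_line = r_squared(sse_hand_line, sst_slides)
r2_baseline = r_squared(sst_slides, sst_slides)
print(round(r2_hand_line, 4), r2_baseline)
```

The hand-drawn line comes out at an R^2 of roughly 0.41, and the baseline compared with itself gives exactly 0, matching the "no improvement" case.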
 
 
R^2
 
R^2 is unitless and universally interpretable
between problems.
However, it can still be hard to compare
between problems.
Good models for easy problems will have
an R^2 close to 1.
But good models for hard problems can still
have an R^2 close to zero.
 
Regression Model Result
 
The line that gives the minimum sum of squared
errors is the line that the regression model will find.
Formula for the linear regression model:
Y = 0.63509*AGST - 3.4178
R^2 = 0.43502
SSE = 5.73488
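
The slides do not say how the minimizing line is computed. One standard way for a single variable is the closed-form least-squares solution, sketched below under that assumption. The function names are hypothetical and the data are made up for illustration, so the fitted coefficients will not match 0.63509 and -3.4178. The last lines only check that the reported R^2 agrees with the reported SSE and SST.

```python
from statistics import mean

def fit_line(x_values, y_values):
    """Closed-form least squares for Y = A*X + B with one variable."""
    x_bar, y_bar = mean(x_values), mean(y_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept
    return a, b

def sse(a, b, x_values, y_values):
    """Sum of squared errors of the line Y = A*X + B on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(x_values, y_values))

# Made-up illustrative numbers, NOT the Bordeaux data set.
toy_agst = [15.0, 16.0, 17.0, 18.0]
toy_price = [6.0, 7.0, 7.5, 8.5]

a_fit, b_fit = fit_line(toy_agst, toy_price)
sse_fit = sse(a_fit, b_fit, toy_agst, toy_price)
sse_hand = sse(0.5, -1.25, toy_agst, toy_price)
print(a_fit, b_fit, sse_fit, sse_hand)

# Consistency check on the figures reported in the slides.
r2_reported = 1.0 - 5.73488 / 10.15064
print(round(r2_reported, 5))
```

On the toy data the fitted line has a smaller SSE than the hand-drawn line, as the slide's claim requires, and the reported SSE and SST reproduce the reported R^2 to the precision shown.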

Linear regression is a powerful method to analyze data and make predictions in the context of wine quality, particularly focusing on Bordeaux wines. This approach involves modeling the age of the wine, weather-related factors, and other independent variables to approximate quality and predict price trends accurately. Princeton Professor Orley Ashenfelter's use of linear regression to predict Bordeaux wine quality without tasting it has garnered attention, showcasing the effectiveness of this technique in the wine industry.

  • Linear Regression
  • Wine Quality
  • Bordeaux Wines
  • Predictive Modeling
  • Data Analysis

Uploaded on Sep 15, 2024 | 0 Views

