Predicting Quality of Wine Using Linear Regression Analysis

CSCI 200 DATA MINING

Introduction to Linear Regression – Predicting Quality of Wine

Predicting Quality of Wine

• Linear regression is a simple and powerful method for analyzing data and making predictions.
• Bordeaux is a region in France known for producing wine.
• Price and quality differ from year to year, sometimes very significantly.
• Bordeaux wines are widely believed to taste better when they are older.
• There is therefore an incentive to store young wines until they are mature.

Predicting Quality of Wine

• The main issue: it is hard to judge the quality of a wine by tasting it while it is still young, because the taste will change significantly by the time it is consumed.
• Wine tasters and experts taste the wine and then predict which ones will be the best later.
• Question: can we model this process and make stronger predictions?

Predicting Quality of Wine

• On March 4, 1990, the New York Times reported that Princeton Professor of Economics Orley Ashenfelter can predict the quality of Bordeaux wine without tasting a single drop.
• Ashenfelter's predictions have nothing to do with assessing the aroma of the wine.
• They are the results of a mathematical model.
• Ashenfelter used a method called linear regression.

Linear Regression

• The method predicts an outcome variable, also called the dependent variable.
• It uses a set of independent variables.
• Dependent variable: a typical price in 1990-1991 for Bordeaux wine at auction.
• This price approximates quality.
• Independent variables: the age of the wine (older wines are more expensive) and weather-related information.

Linear Regression

• Four independent variables:
  – The age of the wine
  – The average growing season temperature
  – The harvest rain
  – The winter rain
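Written out as an equation, a model with these four independent variables might look like the sketch below. The coefficient symbols and the variable names are assumed notation, not taken from the slides, which give no fitted values for the four-variable model.

```latex
\[
\text{Price} = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{AGST}
             + \beta_3\,\text{HarvestRain} + \beta_4\,\text{WinterRain} + \varepsilon
\]
```

Here the intercept and the four slopes are the coefficients to be estimated, and the last term is the error. The rest of the deck works with the one-variable case, which keeps only the AGST term.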

Quality of Wine – Linear Regression

• Professor Ashenfelter believed that his predictions were more accurate than those of the world's most influential wine critic, Robert Parker.
• Robert M. Parker Jr., generally regarded as the most influential wine critic in America, called Professor Ashenfelter's research "ludicrous and absurd."

Predicting Quality of Wine - Links

• http://www.wine-economics.org/workingpapers/AAWE_WP04.pdf
• http://www.wine-economics.org/
• http://www.nytimes.com/1990/03/04/us/wine-equation-puts-some-noses-out-of-joint.html

One-Variable Linear Regression

• This method uses one independent variable to predict the dependent variable.
• Independent variable: average growing season temperature (AGST).
• Dependent variable: wine price.
• The goal of linear regression is to create a predictive line through the data.
• Many different lines could be drawn to predict wine price from average growing season temperature.

Simple Prediction - Average

• The equation for this line: y = 7.07
• This model predicts 7.07, the average, regardless of the temperature.

Better Prediction

• The equation for this line: y = 0.5*AGST - 1.25
• This model predicts a higher price when the temperature is higher.
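As a rough illustration, the two candidate lines can be written as small Python functions. The function names and the AGST inputs below are hypothetical and chosen only to show the behavior; they are not values from the Bordeaux data.

```python
def predict_baseline(agst):
    """Average line from the slides: always predicts 7.07, ignoring AGST."""
    return 7.07


def predict_line(agst, a=0.5, b=-1.25):
    """Candidate line from the slides: y = 0.5*AGST - 1.25."""
    return a * agst + b


# Hypothetical temperatures, used only to exercise the two functions.
for agst in (15.0, 16.0, 17.0):
    print(agst, predict_baseline(agst), predict_line(agst))
```

The baseline returns the same number for every input, while the second line's prediction rises by 0.5 for each one-unit increase in AGST.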

General Equation

• Y = A*X + B (the model)
• X: independent variable (in our case, AGST)
• Y: dependent variable (in our case, price)
• Using this equation we calculate the predicted values.
• The model makes errors:
• Y = A*X + B + E
• The error term, E, is also often called a residual.

Y[i] = A*X[i] + B + E[i]

• For each observation i, we have data for the dependent variable, Y[i], and for the independent variable, X[i].
• Using this equation we make a prediction, A*X[i] + B.
• This prediction is hopefully close to the true outcome, Y[i].
• Because the coefficients must be the same for all data points i, the prediction usually misses by a small error, E[i].
• The best model (choice of A and B) has the smallest total error.
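A minimal sketch of the residual calculation follows, using E[i] = Y[i] - (A*X[i] + B). The three observations are made up for illustration and are not the Bordeaux data; the coefficients are those of the 0.5*AGST - 1.25 line.

```python
def residuals(a, b, xs, ys):
    """Return E[i] = Y[i] - (A*X[i] + B) for every observation i."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Hypothetical observations, NOT the Bordeaux data.
agst = [15.0, 16.0, 17.0]
price = [6.0, 7.0, 7.5]

errors = residuals(0.5, -1.25, agst, price)
print(errors)
```

A negative residual means the line predicted too high for that observation. A positive one means it predicted too low.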

SSE – Sum of Squared Errors

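• SSE is the sum of the squared residuals over all N observations: SSE = E[1]^2 + E[2]^2 + ... + E[N]^2
• SSE for the average line: 10.15064
• SSE for 0.5*AGST - 1.25: 6.03251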
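The comparison above can be sketched in Python. The data are hypothetical, so the numbers will not match the 10.15064 and 6.03251 quoted in the slides. The sketch shows only the mechanics of scoring two lines on the same observations.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors of the line y = a*x + b on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations, NOT the Bordeaux data.
agst = [15.0, 16.0, 17.0]
price = [6.0, 7.0, 7.5]

# Average line: slope 0, intercept equal to the mean price.
mean_price = sum(price) / len(price)
sse_avg = sse(0.0, mean_price, agst, price)

# Candidate line from the slides.
sse_line = sse(0.5, -1.25, agst, price)

print(sse_avg, sse_line)
```

On this toy data the sloped line has the smaller SSE, which mirrors the ordering reported in the slides for the real data.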

Better Measures for Regression Quality

• Root Mean Squared Error (RMSE): RMSE = SQRT(SSE/N), where N is the total number of data points.
• R squared (R²):
  – R² compares the best model to a baseline model.
  – The baseline model is the model that does not use any variables: the AVERAGE.
• The baseline model predicts the average value of the dependent variable regardless of the value of the independent variable.
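The RMSE formula can be sketched as follows. The SSE values are the ones quoted in the slides, but the slides do not state N, so the value 25 below is only a placeholder assumption and the printed RMSE figures should not be read as results.

```python
import math


def rmse(sse, n):
    """Root mean squared error: sqrt(SSE / N)."""
    if n <= 0:
        raise ValueError("n must be a positive number of data points")
    return math.sqrt(sse / n)


# N is NOT given in the slides; 25 is a placeholder for illustration only.
N_ASSUMED = 25

for label, sse_value in [("average line", 10.15064), ("0.5*AGST - 1.25", 6.03251)]:
    print(f"{label}: RMSE = {rmse(sse_value, N_ASSUMED):.3f}")
```

Unlike SSE, RMSE is expressed in the same units as the dependent variable and does not grow just because more data points are added.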

R²

• The sum of squared errors for the baseline model is also known as the total sum of squares, commonly referred to as SST.
• In our example: SST = 10.15
• R² = 1 - SSE/SST
• SSE >= 0, SST >= 0
• SSE <= SST (in Y = A*X + B, setting A = 0 and B to the average gives the baseline model)
• On the data it was fitted to, a linear regression model will never be worse than the baseline model.
• R² = 1: perfect predictive model
• R² = 0: no improvement over the baseline
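As a worked check, R² = 1 - SSE/SST can be computed directly from the figures in the slides. The SST is the SSE of the average line, 10.15064 (10.15 when rounded). The function name is an illustrative choice.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST, the improvement over the baseline model."""
    if sst <= 0:
        raise ValueError("sst must be positive")
    return 1.0 - sse / sst


SST = 10.15064  # SSE of the average (baseline) line, from the slides

# SSE values quoted in the slides for two candidate lines.
print(round(r_squared(6.03251, SST), 5))  # line 0.5*AGST - 1.25
print(round(r_squared(5.73488, SST), 5))  # fitted regression line
```

The second value reproduces the R² of 0.43502 reported for the fitted regression line. The 0.5*AGST - 1.25 line comes out slightly lower, at about 0.406.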

• R² is unitless and can be interpreted the same way across problems.
• However, it can still be hard to compare between problems.
• Good models for easy problems will have an R² close to 1.
• But good models for hard problems can still have an R² close to zero.

Regression Model Result

• The line that gives the minimum sum of squared errors is the line that the regression model will find.
• Formula for the linear regression model:
• Y = 0.63509*AGST - 3.4178
• R² = 0.43502
• SSE = 5.73488
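The slides do not show how the minimizing line is found. One standard way, for a single independent variable, is the closed-form least-squares solution: A = Sxy/Sxx and B = mean(Y) - A*mean(X). The sketch below assumes that approach and uses made-up points that lie exactly on y = 2x + 1, not the Bordeaux data.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the SSE of y = a*x + b (ordinary least squares)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical points on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)
```

Because the toy points are perfectly linear, the fit recovers the slope 2 and intercept 1. With real, noisy data such as the wine prices, the fitted line leaves nonzero residuals, which is why its SSE is 5.73488 rather than zero.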

