Understanding Correlated Errors in Data Analysis
Chapter 9 Correlated Errors
Learning Objectives
- Demonstrate the problem of correlated errors and its implications
- Conduct and interpret tests for correlated errors
- Correct for correlated errors using Newey and West's estimator (ex post) or generalized least squares (ex ante)
- Correct for correlated errors by adding lagged variables to the model
- Show that correlated errors can arise in clustered and spatial data as well as in time-series data
Autocorrelated Errors
[Figure: Coca-Cola stock price, 1960-2015]
Logs Make the Model Fit Better
[Figure: log of Coca-Cola stock price, 1960-2015]
Autocorrelated Errors
[Figure: log of Coca-Cola stock price, deviations from trend, 1960-2015]
The Problem in Words
You think you have more information in your data than you really do. (It can be the opposite, where you think you have less information than you really do, but this is rare.)
- OLS estimates remain unbiased, but are not BLUE
- Standard errors are underestimated
Examples:
1. Stock prices over time
2. Consumption over time
3. Income across space
The Problem Mathematically
Y_t = β0 + β1 X_{1t} + β2 X_{2t} + ε_t, where ε_t = ρ ε_{t-1} + u_t
If ρ > 0, then some of last period's error remains in this period's error: we have less new information each period than the standard error formula assumes. If ρ < 0, then we have more information than the s.e. formula assumes; this is rare!
Solutions
1. Test and fix after the fact (ex post)
2. Change the model to eliminate the correlated errors:
   i. Generalized least squares (ex-ante correction)
   ii. Add lagged variables to the model (best approach)
Testing for Autocorrelation
Autocorrelation model:
Y_t = β0 + β1 X_{1t} + β2 X_{2t} + ε_t, where ε_t = ρ ε_{t-1} + u_t
Breusch-Godfrey test:
1. Estimate the regression Y_t = b0 + b1 X_{1t} + b2 X_{2t} + e_t
2. Run the auxiliary regression of the residuals on their lag:
e_t = γ0 + γ1 X_{1t} + γ2 X_{2t} + ρ e_{t-1} + u_t
Breusch-Godfrey Test
Auxiliary regression of residuals on their lag:
e_t = γ0 + γ1 X_{1t} + γ2 X_{2t} + ρ e_{t-1} + u_t
Test statistic: BG = (T - 1) R². Critical value: χ²(1) (e.g., at 5% significance, c.v. = 3.84).
More lags can be added to the auxiliary regression:
e_t = γ0 + γ1 X_{1t} + γ2 X_{2t} + ρ1 e_{t-1} + ... + ρm e_{t-m} + u_t
Test statistic: BG = (T - m) R²; critical value: χ²(m).
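The two-step procedure above can be sketched in Python with NumPy alone. This is a minimal illustration, not the textbook's code: the helper name `breusch_godfrey` is hypothetical, it handles only a single regressor, and the simulated data are made up to show the test firing on AR(1) errors.

```python
import numpy as np

def breusch_godfrey(y, x, m=1):
    """Breusch-Godfrey test with m lags (single regressor x).
    Step 1: OLS of y on x; keep the residuals e_t.
    Step 2: auxiliary OLS of e_t on x_t and e_{t-1}, ..., e_{t-m}.
    Statistic: BG = (T - m) * R^2 of the auxiliary regression,
    compared with a chi-squared(m) critical value (3.84 at 5% for m = 1)."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Auxiliary regression on the usable sample t = m+1, ..., T
    Z = np.column_stack([np.ones(T - m), x[m:]] +
                        [e[m - j:T - j] for j in range(1, m + 1)])
    ez = e[m:]
    u = ez - Z @ np.linalg.lstsq(Z, ez, rcond=None)[0]
    r2 = 1.0 - (u @ u) / ((ez - ez.mean()) @ (ez - ez.mean()))
    return (T - m) * r2

# Simulated example: strong AR(1) errors (rho = 0.8) should be detected.
rng = np.random.default_rng(0)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.8 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
bg = breusch_godfrey(y, x)   # far above the 3.84 critical value
```

With independent errors instead of AR(1) errors, the same statistic stays near its chi-squared(1) null distribution, which is what makes the comparison to 3.84 informative.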
Example: US Consumption vs Income
[Figure: real personal consumption expenditures per capita and real disposable personal income per capita, $ per quarter, 1947-2007]
Taking Logs Straightens the Trend
[Figure: log real personal consumption expenditures per capita and log real disposable personal income per capita, natural log of $ per quarter, 1947-2007]
Regression Residuals
e_t = ln(cons_t) + 0.38 - 1.03 ln(income_t)
[Figure: regression residuals, 1947-2007]
There appears to be strong autocorrelation.
Breusch-Godfrey Test

| Variable   | Estimated Coefficient | Standard Error | t-statistic |
| e(t-1)     | 0.885                 | 0.027          | 32.70       |
| ln(Income) | 0.001                 | 0.001          | 0.56        |
| Constant   | -0.007                | 0.012          | -0.58       |

Sample size: 274. R-squared: 0.80. (T-1) x R-squared = 217.80. Critical value, χ²(1), 5%: 3.84.
We reject the null hypothesis at the 5% level: we have autocorrelation.
Newey-West Correction for Standard Errors
If CR2 and CR3 hold: V[b1] = Σ_{i=1}^{N} w_i² s²
If CR2 fails (White's method): V[b1] = Σ_{i=1}^{N} w_i² e_i²
If CR2 and CR3 fail (Newey-West): V[b1] = Σ_{t=1}^{T} w_t² e_t² + 2 Σ_{j=1}^{L} (1 - j/(L+1)) Σ_{t=j+1}^{T} w_t w_{t-j} e_t e_{t-j}
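A minimal NumPy sketch of the three variance formulas for the slope of a simple regression, writing b1 = Σ w_t y_t with w_t = (x_t - x̄)/Σ(x - x̄)². The function name, the single-regressor setup, and the simulated data are illustrative assumptions; the Newey-West sum uses the Bartlett weights (1 - j/(L+1)) as in the formula above.

```python
import numpy as np

def slope_variances(x, y, L=4):
    """Three estimates of V[b1] for y_t = b0 + b1*x_t + e_t."""
    T = len(x)
    w = (x - x.mean()) / ((x - x.mean()) ** 2).sum()   # b1 = sum(w_t * y_t)
    b1 = w @ y
    e = y - (y.mean() - b1 * x.mean()) - b1 * x        # OLS residuals

    v_classic = (w ** 2).sum() * (e @ e) / (T - 2)     # CR2 and CR3 hold
    v_white = (w ** 2 * e ** 2).sum()                  # CR2 fails (White)
    v_nw = v_white                                     # CR2 and CR3 fail
    for j in range(1, L + 1):                          # down-weighted cross terms
        cross = (w[j:] * e[j:] * w[:-j] * e[:-j]).sum()
        v_nw += 2.0 * (1.0 - j / (L + 1.0)) * cross
    return v_classic, v_white, v_nw

# Example with autocorrelated x and autocorrelated errors, where the
# classical formula understates the variability of the slope.
rng = np.random.default_rng(1)
T = 400
x = np.zeros(T)
e = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e
v_classic, v_white, v_nw = slope_variances(x, y, L=8)
```

With positively autocorrelated regressor and errors the cross terms are positive, so the Newey-West variance exceeds the White variance, which is exactly the underestimation the slides warn about.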
Changing the Model: GLS
Autocorrelation model:
Y_t = β0 + β1 X_t + ε_t, where ε_t = ρ ε_{t-1} + u_t
The error u_t satisfies CR2 and CR3.
Y_t = β0 + β1 X_t + ε_t
ρ Y_{t-1} = ρ (β0 + β1 X_{t-1} + ε_{t-1})
Subtracting:
Y_t - ρ Y_{t-1} = β0 (1 - ρ) + β1 (X_t - ρ X_{t-1}) + u_t
We have a new model: Y*_t = β*0 + β1 X*_t + u_t
Changing the Model: GLS
We have a new model Y*_t = β*0 + β1 X*_t + u_t, where Y*_t = Y_t - ρ Y_{t-1} and X*_t = X_t - ρ X_{t-1}.
But we don't know ρ. Solution: feasible GLS.
1. OLS regression: Y_t = b0 + b1 X_t + e_t
2. Error autocorrelation: r = (Σ_{t=2}^{T} e_t e_{t-1}) / (Σ_{t=1}^{T} e_t²)
3. Transform the variables: Y*_t = Y_t - r Y_{t-1}, X*_t = X_t - r X_{t-1}
4. OLS regression: Y*_t = b0^GLS + b1^GLS X*_t + e^GLS_t
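The four steps can be sketched with NumPy. This is an illustrative sketch, not the textbook's implementation: `fgls_ar1` is a hypothetical name, and the simulated series below stand in for the consumption and income data.

```python
import numpy as np

def fgls_ar1(y, x):
    """Feasible GLS for y_t = b0 + b1*x_t + eps_t with AR(1) errors."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]            # 1. OLS regression
    e = y - X @ b
    r = (e[1:] @ e[:-1]) / (e @ e)                      # 2. residual autocorrelation
    ys = y[1:] - r * y[:-1]                             # 3. quasi-difference the data
    xs = x[1:] - r * x[:-1]
    Xs = np.column_stack([np.ones(T - 1), xs])
    b_gls = np.linalg.lstsq(Xs, ys, rcond=None)[0]      # 4. OLS on transformed data
    return b_gls, r   # note: b_gls[0] estimates b0*(1 - r), as in the slides

# Simulated example: y_t = 1 + 2*x_t + eps_t, eps_t = 0.8*eps_{t-1} + u_t
rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.8 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
b_gls, r = fgls_ar1(y, x)   # r near 0.8, slope b_gls[1] near 2
```

Because the transformed model's intercept is β0(1 - ρ), the FGLS constant is not directly comparable to the OLS constant, which matches the comparison table that follows.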
OLS with Newey-West vs FGLS

| Variable    | OLS coefficient | Newey-West s.e. (L=40) | FGLS coefficient | FGLS s.e. |
| Income      | 1.030           | 0.015                  | 1.012            | 0.013     |
| Constant    | -0.379          | 0.132                  | -0.026           | 0.012     |
| Sample size | 275             |                        | 274              |           |
| r           |                 |                        | 0.88             |           |
Changing the Model: Distributed Lags
Autocorrelation model:
Y_t = β0 + β1 X_t + ε_t, where ε_t = ρ ε_{t-1} + u_t
The error u_t satisfies CR2 and CR3.
Substituting ε_{t-1} = Y_{t-1} - β0 - β1 X_{t-1}:
Y_t = β0 (1 - ρ) + ρ Y_{t-1} + β1 X_t - ρ β1 X_{t-1} + u_t
We have a new model: Y_t = β*0 + β*1 Y_{t-1} + β*2 X_t + β*3 X_{t-1} + u_t
Distributed Lag Model
Y_t = β*0 + β*1 Y_{t-1} + β*2 X_t + β*3 X_{t-1} + u_t
Interpreting the model: two tricks.
1. Drop the time subscripts to get the long run:
Y = β*0 + β*1 Y + β*2 X + β*3 X
Solving for Y: Y = β*0 / (1 - β*1) + ((β*2 + β*3) / (1 - β*1)) X
Distributed Lag Model
Y_t = β*0 + β*1 Y_{t-1} + β*2 X_t + β*3 X_{t-1} + u_t
Interpreting the model: two tricks.
1. Drop the time subscripts to get the long run:
Y = β*0 / (1 - β*1) + ((β*2 + β*3) / (1 - β*1)) X
2. Derive the error correction model. Subtract Y_{t-1} from both sides and add and subtract β*3 X_t:
Y_t - Y_{t-1} = β*0 + (β*1 - 1) Y_{t-1} + (β*2 + β*3) X_t - β*3 (X_t - X_{t-1}) + u_t
Y_t - Y_{t-1} = -(1 - β*1) [Y_{t-1} - β*0/(1 - β*1) - ((β*2 + β*3)/(1 - β*1)) X_t] - β*3 (X_t - X_{t-1}) + u_t
Distributed Lag Estimates

| Variable         | Estimated Coefficient | Standard Error | t-statistic |
| Consumption(t-1) | 0.927                 | 0.020          | 47.20       |
| Income           | 0.297                 | 0.046          | 6.44        |
| Income(t-1)      | -0.222                | 0.048          | -4.66       |
| Constant         | -0.021                | 0.012          | -1.75       |

Sample size: 274. R-squared: 0.9997.
Distributed Lag Model
Y_t = -0.021 + 0.927 Y_{t-1} + 0.297 X_t - 0.222 X_{t-1} + u_t
Interpreting the model: two tricks.
1. Drop the time subscripts to get the long run: Y = -0.29 + 1.03 X
2. Derive the error correction model:
Y_t - Y_{t-1} = -0.073 (Y_{t-1} + 0.29 - 1.03 X_t) + 0.222 (X_t - X_{t-1}) + u_t
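The two tricks reduce to a few lines of arithmetic on the estimated coefficients from the table above (the variable names here are mine):

```python
# Estimates from the distributed-lag regression
# Y_t = b0 + b1*Y_{t-1} + b2*X_t + b3*X_{t-1} + u_t
b0, b1, b2, b3 = -0.021, 0.927, 0.297, -0.222

# Trick 1: long run (drop time subscripts and solve for Y)
long_run_intercept = b0 / (1 - b1)        # -0.021 / 0.073, about -0.29
long_run_slope = (b2 + b3) / (1 - b1)     # 0.075 / 0.073, about 1.03

# Trick 2: error correction speed (share of the long-run gap closed each period)
adjustment = 1 - b1                       # 0.073, i.e. 7.3% per quarter
```

The long-run slope near 1 says a permanent 1% rise in income eventually raises consumption by about 1%, while the adjustment speed of 0.073 says roughly 7% of any remaining gap closes each quarter.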
Correlated Errors Across Space
Famous study by Brent Moulton in 1990. Data: (i) wages for 18,946 workers in the US; (ii) 14 garbage variables. Moulton regressed wages on the 14 garbage variables plus education and work experience; 6 of the 14 garbage variables were statistically significant. Why? Spatial correlation in the errors made the standard errors 3-5 times too small.
Clustering Standard Errors
Correlation over time: Newey-West
V[b1] = Σ_{t=1}^{T} w_t² e_t² + 2 Σ_{j=1}^{L} (1 - j/(L+1)) Σ_{t=j+1}^{T} w_t w_{t-j} e_t e_{t-j}
Correlation over space: clustering
V[b1] = Σ_{i=1}^{N} Σ_{j=1}^{N} w_i e_i w_j e_j · 1(i, j in same cluster)
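Because the indicator keeps only same-cluster pairs, the double sum collapses to a sum over clusters of the squared within-cluster total of w_i·e_i. A minimal sketch (the helper name and the toy numbers are hypothetical):

```python
import numpy as np

def cluster_variance(w, e, cluster):
    """V[b1] = sum over i, j of w_i e_i w_j e_j * 1(i, j in same cluster)."""
    v = 0.0
    for g in np.unique(cluster):
        within = (w[cluster == g] * e[cluster == g]).sum()
        v += within ** 2   # squared within-cluster sum of w_i * e_i
    return v

# Toy example: two clusters of two observations each.
w = np.array([1.0, 2.0, 3.0, 4.0])
e = np.array([0.5, -1.0, 2.0, 1.0])
g = np.array([0, 0, 1, 1])
v = cluster_variance(w, e, g)   # (1*0.5 + 2*(-1))**2 + (3*2 + 4*1)**2 = 102.25
```

Writing it this way makes the connection to Newey-West clear: both add cross products of w·e pairs that are allowed to be correlated, over time lags in one case and within clusters in the other.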
What We Learned
Correlated errors cause OLS to lose its "best" property and the estimated standard errors to be biased, just as heteroskedasticity does. As long as the autocorrelation is not too strong, the standard error bias can be corrected with Newey and West's heteroskedasticity- and autocorrelation-consistent estimator. Getting the model right by adding lagged variables is usually the best approach to dealing with autocorrelation in time-series data.