Analysis of Quantile Regression on LPGA Prize Winnings for 2009/2010 Seasons


This analysis uses Quantile Regression to study the prize earnings of professional golfers in the Ladies Professional Golf Association (LPGA) during the 2009 and 2010 seasons. It investigates how average driving distance, fairway accuracy, greens in regulation, putts per hole on greens reached in regulation, and sand saves relate to earnings per event. Earnings are highly skewed (the mean is more than twice the median), indicating non-normality. Multiple linear regression results are also provided; that model explains approximately 55% of the variation in earnings per event.


Uploaded on Oct 07, 2024 | 0 Views



Presentation Transcript


  1. Quantile Regression: Prize Winnings, LPGA 2009/2010 Seasons
     www.lpga.com
     Kahane, L.H. (2010). Returns to Skill in Professional Golf: A Quantile Regression Approach. International Journal of Sport Finance, Vol. 5, pp. 167-170.
     Cameron, A.C., and P.K. Trivedi (2010). Microeconometrics Using Stata, Revised Edition. Stata Press, College Station, TX.

  2. Data Description
     Ladies Professional Golf Association (LPGA) participants during 2009 and 2010 seasons
     Response Variable: Earnings per Event entered ($1000s)
     Predictor Variables:
       Average Driving Distance
       Percent of Fairways reached on Drives
       Percent of Greens reached in Regulation
       Putts per Hole on Greens reached in Regulation
       Percent of Sand Saves (2 shots to hole)

  3. Quantile Regression
     Linear Regression is used to relate the Conditional Mean to predictors.
     Quantile Regression relates specific quantiles to predictors.
       Particularly useful with non-normal data
       Makes use of different loss function than Ordinary Least Squares
       Uses linear programming to estimate
     Cumulative Distribution Function (CDF): $F(y) = \Pr(Y \le y)$
     $q$th Quantile: $F(y_q) = q$, so that $y_q = F^{-1}(q)$
     Loss Function to be minimized for the $q$th Quantile:
     $$Q(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q} q\,\lvert y_i - x_i'\beta_q \rvert \;+\; \sum_{i:\, y_i < x_i'\beta_q} (1-q)\,\lvert y_i - x_i'\beta_q \rvert$$
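The loss function on slide 3 can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names `check_loss` and `best_constant` and the `earnings` values are made up for illustration. With an intercept-only model, the constant that minimizes the loss is a sample quantile, which is why minimizing this loss targets the $q$th quantile rather than the mean.

```python
def check_loss(y, fitted, q):
    """Sum of asymmetric absolute losses for quantile q (0 < q < 1)."""
    total = 0.0
    for yi, fi in zip(y, fitted):
        resid = yi - fi
        if resid >= 0:
            total += q * resid            # on or above the fit: weight q
        else:
            total += (1 - q) * (-resid)   # below the fit: weight 1 - q
    return total


def best_constant(y, q):
    """Intercept-only fit: the loss is piecewise linear and convex in the
    constant, so a minimizer can be found among the observed values."""
    return min(sorted(set(y)), key=lambda c: check_loss(y, [c] * len(y), q))


# Made-up, right-skewed values (hypothetical, not the LPGA data).
earnings = [0.5, 1.2, 3.0, 6.7, 15.5, 34.8, 82.0]

for q in (0.25, 0.50, 0.75):
    print(q, best_constant(earnings, q))
```

Because the weights depend only on the sign of each residual, a few very large earnings values pull the fitted quantile far less than they pull a least-squares mean.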

  4. Summary Data for Earnings/Event

                               earnevent
     -------------------------------------------------------------
           Percentiles      Smallest
      1%     .1654375              0
      5%     .5611538              0
     10%     1.200667       .1654375       Obs                 289
     25%     2.991929       .2545882       Sum of Wgt.         289

     50%     6.653733                      Mean           13.44039
                             Largest       Std. Dev.      17.40237
     75%     15.45304       81.35504
     90%       34.815       81.95658       Variance       302.8425
     95%     54.85067       82.81731       Skewness       2.325991
     99%     81.95658       99.06261       Kurtosis       8.461489

     Note: The data are highly skewed:
       Mean > 2*Median
       Std. Dev. > Mean

  5. Plots of Earnings per Event Showing Skew

  6. Multiple Linear Regression

           Source |       SS       df       MS              Number of obs =     289
     -------------+------------------------------           F(  5,   283) =   68.95
            Model |  47899.6433     5  9579.92865           Prob > F      =  0.0000
         Residual |  39318.9937   283  138.936374           R-squared     =  0.5492
     -------------+------------------------------           Adj R-squared =  0.5412
            Total |   87218.637   288   302.84249           Root MSE      =  11.787

     ------------------------------------------------------------------------------
        earnevent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
            drive |   .0749854    .112027     0.67   0.504    -.1455266    .2954974
          fairway |   .0765432   .1433219     0.53   0.594    -.2055689    .3586554
            green |   1.515417   .2260856     6.70   0.000     1.070394     1.96044
     girputtshole |  -155.7758   19.96318    -7.80   0.000    -195.0709   -116.4806
        sandsvpct |   .4146478   .0919212     4.51   0.000     .2337117    .5955839
            _cons |   160.2711   49.32258     3.25   0.001      63.1854    257.3567
     ------------------------------------------------------------------------------

     The model explains approximately 55% of the variation in earnings per event
     Important Factors: Greens in Regulation (+), Putts per hole (-), Sand Save Percent (+)
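As a quick check on how the summary statistics on slide 6 follow from its ANOVA table, the Python sketch below recomputes them from the reported sums of squares and degrees of freedom. The variable names are illustrative; only the numbers come from the slide.

```python
import math

# Sums of squares and degrees of freedom as reported on slide 6.
ss_model, df_model = 47899.6433, 5
ss_resid, df_resid = 39318.9937, 283
ss_total = ss_model + ss_resid
df_total = df_model + df_resid

ms_model = ss_model / df_model   # mean square for the model
ms_resid = ss_resid / df_resid   # mean square error

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_resid
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

print(round(r_squared, 4), round(adj_r_squared, 4),
      round(f_stat, 2), round(root_mse, 3))
```

The R-squared of about 0.55 is the basis for the statement that the model explains approximately 55% of the variation in earnings per event.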

  7. Influential Observations wrt β's
     Influential Observations: $\lvert \mathrm{DFBETAS}_{j(i)} \rvert > \dfrac{2}{\sqrt{n}} = \dfrac{2}{\sqrt{289}} = 0.1176$

     Flagged cases by coefficient, as Obsnum (DFBETAS):
       drive        (_dfbeta_1):  30 (0.3509), 166 (-0.2432), 249 (-0.3423), 268 (-0.3883)
       fairway      (_dfbeta_2):  30 (0.3433), 125 (0.2899), 249 (-0.4798)
       green        (_dfbeta_3):  30 (-0.3225), 104 (-0.2795), 166 (0.2661), 238 (-0.2572), 249 (-0.2418), 256 (0.4677)
       girputtshole (_dfbeta_4):  30 (0.6184), 106 (-0.2525), 125 (-0.2833), 177 (0.2556), 211 (-0.4088), 238 (-0.5183), 268 (-0.2856)
       sandsvpct    (_dfbeta_5):  30 (0.3295), 104 (-0.3027), 163 (0.5885), 211 (-0.2842), 268 (0.7242)

     These cases are extremely influential (higher than twice the rule of thumb).
     Golfers: 30 (Michelle Ellis, 2009), 104 (Liselotte Neumann, 2009), 125 (Jiyai Shin, 2009), 166 (Paula Creamer, 2010), 211 (Cristie Kerr, 2010), 249 (Angela Park, 2010), and 268 (Jiyai Shin, 2010) appear to have high influence on several regression coefficients
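The cutoff arithmetic on slide 7 can be sketched in a few lines of Python. The four DFBETAS values are the ones the slide reports for the drive coefficient; the variable names are illustrative only.

```python
import math

n = 289
rule_of_thumb = 2 / math.sqrt(n)   # 2 / 17

# DFBETAS for the drive coefficient, keyed by observation number (slide 7).
drive_dfbetas = {30: 0.3509, 166: -0.2432, 249: -0.3423, 268: -0.3883}

# The slide lists cases whose magnitude exceeds twice the rule of thumb.
flagged = {obs: value for obs, value in drive_dfbetas.items()
           if abs(value) > 2 * rule_of_thumb}

print(round(rule_of_thumb, 4), sorted(flagged))
```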

  8. Quantile Regression
     Models the regression relation for various quantiles between the predictors and the response variable: Earnings per Event
     $$Y_i = x_i'\beta_q + \varepsilon_{iq}, \qquad x_i' = \begin{bmatrix} 1 & \mathrm{drive}_i & \mathrm{fairway}_i & \mathrm{green}_i & \mathrm{putts}_i & \mathrm{sandsav}_i \end{bmatrix}$$
     Loss Function to be minimized for the $q$th Quantile:
     $$Q(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q} q\,\lvert y_i - x_i'\beta_q \rvert \;+\; \sum_{i:\, y_i < x_i'\beta_q} (1-q)\,\lvert y_i - x_i'\beta_q \rvert$$
     Standard errors of regression coefficients are estimated by bootstrapping 400 samples
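A minimal Python sketch of the estimation idea on slide 8: the loss is minimized as a linear program, and standard errors come from bootstrapping. This is not the Stata routine behind the slides. It assumes NumPy and SciPy are available, uses `scipy.optimize.linprog`, assumes a pairs bootstrap (resampling whole observations), and runs on synthetic data. The names `quantile_regression_lp` and `bootstrap_se` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression_lp(X, y, q):
    """Minimize sum(q*u_i + (1-q)*v_i) subject to X b + u - v = y, u, v >= 0.

    u_i and v_i are the positive and negative parts of the residual, so the
    objective equals the quantile loss at coefficient vector b.
    """
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # b free, u and v >= 0
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:p]


def bootstrap_se(X, y, q, reps=400, seed=0):
    """Pairs bootstrap: resample rows with replacement, refit, take the
    standard deviation of the refitted coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((reps, X.shape[1]))
    for r in range(reps):
        idx = rng.integers(0, n, size=n)
        draws[r] = quantile_regression_lp(X[idx], y[idx], q)
    return draws.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Synthetic, right-skewed data (hypothetical, not the LPGA data).
    rng = np.random.default_rng(42)
    x = np.linspace(0.0, 10.0, 40)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + rng.exponential(scale=3.0, size=x.size)
    for q in (0.25, 0.50, 0.75):
        print(q, quantile_regression_lp(X, y, q), bootstrap_se(X, y, q, reps=100))
```

Splitting each residual into non-negative parts `u` and `v` is what turns the absolute-value loss into a linear objective. The dense identity blocks keep the sketch short but would not scale to large samples.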

  9. Quantile Regression Output (STATA)

     q25          |     Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
     -------------+----------------------------------------------------------------
     drive        |   -0.0254     0.0539   -0.4700   0.6380     -0.1314      0.0806
     fairway      |   -0.0073     0.0665   -0.1100   0.9120     -0.1382      0.1235
     green        |    0.8419     0.1492    5.6400   0.0000      0.5483      1.1355
     girputtshole |  -72.3041     9.3716   -7.7200   0.0000    -90.7510    -53.8571
     sandsvpct    |    0.1215     0.0435    2.8000   0.0060      0.0360      0.2071
     _cons        |   86.0426    18.9736    4.5300   0.0000     48.6952    123.3899

     q50          |     Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
     -------------+----------------------------------------------------------------
     drive        |    0.0528     0.0786    0.6700   0.5020     -0.1018      0.2074
     fairway      |    0.0476     0.0973    0.4900   0.6250     -0.1440      0.2392
     green        |    0.9611     0.1832    5.2500   0.0000      0.6006      1.3217
     girputtshole |  -95.3871    17.1245   -5.5700   0.0000   -129.0947    -61.6795
     sandsvpct    |    0.1432     0.0777    1.8400   0.0660     -0.0098      0.2962
     _cons        |   99.7847    43.1192    2.3100   0.0210     14.9096    184.6598

     q75          |     Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
     -------------+----------------------------------------------------------------
     drive        |    0.3131     0.1399    2.2400   0.0260      0.0377      0.5886
     fairway      |    0.0857     0.1692    0.5100   0.6130     -0.2475      0.4188
     green        |    1.1226     0.3229    3.4800   0.0010      0.4869      1.7583
     girputtshole | -127.4508    37.0845   -3.4400   0.0010   -200.4471    -54.4544
     sandsvpct    |    0.4549     0.1288    3.5300   0.0000      0.2014      0.7084
     _cons        |   76.0492    86.7001    0.8800   0.3810    -94.6097    246.7080

     Note:
     1) Driving distance is only significant among golfers at the 75th percentile
     2) Putting ability effect increases among skill levels
     3) Greens in regulation effect is fairly equal among skill levels
     4) Fairway accuracy is not significant for any skill level
     5) Sand saves are more important for golfers at the 75th percentile

  10. Tests of Equality of Coefficients Across Quantiles

     . test[q25=q50=q75]: drive
      ( 1)  [q25]drive - [q50]drive = 0
      ( 2)  [q25]drive - [q75]drive = 0
            F(  2,   283) =    3.25
                 Prob > F =    0.0403

     . test[q25=q50=q75]: fairway
      ( 1)  [q25]fairway - [q50]fairway = 0
      ( 2)  [q25]fairway - [q75]fairway = 0
            F(  2,   283) =    0.27
                 Prob > F =    0.7603

     . test[q25=q50=q75]: green
      ( 1)  [q25]green - [q50]green = 0
      ( 2)  [q25]green - [q75]green = 0
            F(  2,   283) =    0.58
                 Prob > F =    0.5600

     . test[q25=q50=q75]: girputtshole
      ( 1)  [q25]putts - [q50]putts = 0
      ( 2)  [q25]putts - [q75]putts = 0
            F(  2,   283) =    1.62
                 Prob > F =    0.1989

     . test[q25=q50=q75]: sandsvpct
      ( 1)  [q25]sandsvpct - [q50]sandsvpct = 0
      ( 2)  [q25]sandsvpct - [q75]sandsvpct = 0
            F(  2,   283) =    5.35
                 Prob > F =    0.0053

  11. Plots of Regression Coefficients by Quantile
