Quantile Regression Analysis of LPGA Prize Winnings for the 2009/2010 Seasons

Quantile Regression
Prize Winnings – LPGA 2009/2010 Seasons
www.lpga.com
Kahane, L.H. (2010). “Returns to Skill in Professional Golf: A Quantile Regression Approach,” International Journal of Sport Finance, Vol. 5, pp. 167-170.
Cameron, A.C., and P.K. Trivedi (2010). Microeconometrics Using Stata, Revised Edition, Stata Press, College Station, TX.
Data Description
Ladies Professional Golf Association (LPGA) participants during the 2009 and 2010 seasons (289 golfer-seasons)
Response Variable: Earnings per Event entered, in $1000s (earnevent)
Predictor Variables:
Average Driving Distance (drive)
Percent of Fairways Hit on Drives (fairway)
Percent of Greens Reached in Regulation (green)
Putts per Hole on Greens Reached in Regulation (girputtshole)
Percent of Sand Saves, i.e. holing out within 2 shots from a greenside bunker (sandsvpct)
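Quantile Regression
Linear Regression relates the Conditional Mean of the response to the predictors.
Quantile Regression relates specific Conditional Quantiles of the response (for example, the median) to the predictors. It is particularly useful with non-normal data.
It minimizes a different loss function than Ordinary Least Squares (asymmetrically weighted absolute errors instead of squared errors) and uses linear programming to estimate the coefficients.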
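A minimal sketch of the quantile-regression loss, assuming the standard "check" loss. This loss weights residuals at or above the fit by q and residuals below it by 1 - q. With an intercept only, minimizing the loss recovers a sample q-th quantile. The function names and the earnings values below are hypothetical and are not taken from the LPGA data.

```python
def check_loss(residual, q):
    """Check loss: weight q on residuals >= 0 and (1 - q) on negative residuals."""
    return q * residual if residual >= 0 else (q - 1) * residual


def total_loss(ys, b, q):
    """Quantile-regression objective for an intercept-only model."""
    return sum(check_loss(y - b, q) for y in ys)


def intercept_only_quantile(ys, q):
    """Minimize total_loss over the intercept b.

    The objective is piecewise linear with kinks at the data points, so a
    minimizer can always be found among the observed values; search them all.
    """
    return min(sorted(ys), key=lambda b: total_loss(ys, b, q))


if __name__ == "__main__":
    # Hypothetical right-skewed earnings (in $1000s), not the LPGA data.
    earnings = [0.2, 0.6, 1.2, 3.0, 6.7, 15.5, 34.8, 54.9, 82.0]
    for q in (0.25, 0.50, 0.75):
        print(q, intercept_only_quantile(earnings, q))
```

For these nine values the minimizers are 1.2, 6.7 and 34.8. In each case this is the smallest observation with at least a fraction q of the data at or below it. Adding predictors turns the same objective into a linear program.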
Summary Data for Earnings/Event
 
 earnevent
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1654375              0
 5%     .5611538              0
10%     1.200667       .1654375       Obs                 289
25%     2.991929       .2545882       Sum of Wgt.         289
50%     6.653733                      Mean           13.44039
                        Largest       Std. Dev.      17.40237
75%     15.45304       81.35504
90%       34.815       81.95658       Variance       302.8425
95%     54.85067       82.81731       Skewness       2.325991
99%     81.95658       99.06261       Kurtosis       8.461489
Note: The data are highly skewed (skewness = 2.33):
Mean (13.44) > 2*Median (2 × 6.65 = 13.31)
Std. Dev. (17.40) > Mean (13.44)
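The skewness checks above can be reproduced from the summary numbers. The sketch below also computes moment-based skewness (m3/m2^1.5) and kurtosis (m4/m2^2, not excess kurtosis). We assume these are the definitions behind Stata's `summarize, detail` output. Under this definition a normal distribution has kurtosis 3, so 8.46 indicates heavy tails. The function names and the toy sample are illustrative.

```python
import math


def moment_skew_kurt(xs):
    """Moment-based skewness m3/m2**1.5 and (non-excess) kurtosis m4/m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def skew_flags(mean, median, sd):
    """The two informal right-skew checks from the slide."""
    return mean > 2 * median, sd > mean


if __name__ == "__main__":
    # Values copied from the earnevent summary table.
    mean, median, sd, variance = 13.44039, 6.653733, 17.40237, 302.8425
    print(skew_flags(mean, median, sd))  # (True, True)
    print(math.isclose(math.sqrt(variance), sd, rel_tol=1e-6))  # True
    # Hypothetical right-skewed sample: one large value drives skewness up.
    print(moment_skew_kurt([0, 1, 1, 2, 3, 5, 40]))
```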
Plots of Earnings per Event – Showing Skew
Multiple Linear Regression
      Source |       SS       df       MS         Number of obs =     289
-------------+------------------------------      F(  5,   283) =   68.95
       Model |  47899.6433     5  9579.92865      Prob > F      =  0.0000
    Residual |  39318.9937   283  138.936374      R-squared     =  0.5492
-------------+------------------------------      Adj R-squared =  0.5412
       Total |   87218.637   288   302.84249      Root MSE      =  11.787
------------------------------------------------------------------------------
   earnevent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       drive |   .0749854    .112027     0.67   0.504    -.1455266    .2954974
     fairway |   .0765432   .1433219     0.53   0.594    -.2055689    .3586554
       green |   1.515417   .2260856     6.70   0.000     1.070394     1.96044
girputtshole |  -155.7758   19.96318    -7.80   0.000    -195.0709   -116.4806
   sandsvpct |   .4146478   .0919212     4.51   0.000     .2337117    .5955839
       _cons |   160.2711   49.32258     3.25   0.001      63.1854    257.3567
------------------------------------------------------------------------------
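The model explains approximately 55% of the variation in earnings per event (R-squared = 0.5492).
Significant factors (p < 0.001): Greens in Regulation (+), Putts per Hole (-), Sand Save Percent (+). Driving distance (p = 0.504) and fairway accuracy (p = 0.594) are not significant.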
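The header statistics of the regression table follow directly from its ANOVA rows. The sketch below recomputes them from the Model and Residual sums of squares. It uses the usual definitions: F = MS_model/MS_resid, R-squared = SS_model/SS_total, adjusted R-squared = 1 - (1 - R²)(n - 1)/(n - k - 1), and Root MSE = √MS_resid. The function name is our own.

```python
import math


def anova_summary(ss_model, df_model, ss_resid, df_resid):
    """Recompute the header statistics of a Stata regress table from its ANOVA rows."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid          # n - 1
    r2 = ss_model / ss_total
    adj_r2 = 1 - (1 - r2) * df_total / df_resid
    return {
        "F": ms_model / ms_resid,
        "R2": r2,
        "adj_R2": adj_r2,
        "root_MSE": math.sqrt(ms_resid),
    }


if __name__ == "__main__":
    stats = anova_summary(47899.6433, 5, 39318.9937, 283)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

To the displayed precision, the results agree with F(5, 283) = 68.95, R-squared = 0.5492, Adj R-squared = 0.5412 and Root MSE = 11.787.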
Influential Observations wrt the Regression Coefficients (DFBETAS)
These cases are extremely influential: their |DFBETAS| values are more than twice the rule-of-thumb cutoff of 2/√n = 2/√289 = 0.1176.
Golfers 30 (Michelle Ellis, 2009), 104 (Liselotte Neumann, 2009), 125 (Jiyai Shin, 2009), 166 (Paula Creamer, 2010), 211 (Cristie Kerr, 2010), 249 (Angela Park, 2010), and 268 (Jiyai Shin, 2010) appear to have high influence on several regression coefficients.
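The rule of thumb above (flag |DFBETAS| > 2/√n) can be illustrated on a one-predictor model. The sketch below does this with brute-force leave-one-out refits. It assumes the Belsley, Kuh and Welsch definition DFBETAS_j(i) = (b_j - b_j(i)) / (s_(i) · √[(X'X)⁻¹]_jj), which we believe matches what Stata's `dfbeta` reports. It covers only the slope of a hypothetical simple regression, not the five-predictor LPGA model, and the data are invented.

```python
import math


def ols_simple(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b


def residual_sd(xs, ys):
    """Residual standard deviation with n - 2 degrees of freedom."""
    a, b = ols_simple(xs, ys)
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def dfbetas_slope(xs, ys):
    """DFBETAS for the slope by brute-force leave-one-out refits.

    DFBETAS_i = (b - b_(i)) / (s_(i) * sqrt(c)), where b_(i) and s_(i) come
    from the fit without observation i and c = 1 / Sxx is the slope entry
    of (X'X)^-1 for the full data.
    """
    n = len(xs)
    _, b_full = ols_simple(xs, ys)
    xbar = sum(xs) / n
    c = 1.0 / sum((x - xbar) ** 2 for x in xs)
    out = []
    for i in range(n):
        xs_i, ys_i = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        _, b_i = ols_simple(xs_i, ys_i)
        out.append((b_full - b_i) / (residual_sd(xs_i, ys_i) * math.sqrt(c)))
    return out


def rule_of_thumb(n):
    """Common DFBETAS cutoff 2 / sqrt(n)."""
    return 2 / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical data: ten points near y = 2x plus one high-leverage outlier.
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2, 18.1, 19.9, 10.0]
    d = dfbetas_slope(xs, ys)
    cutoff = rule_of_thumb(len(xs))
    flagged = [i for i, v in enumerate(d) if abs(v) > 2 * cutoff]
    print(f"cutoff = {cutoff:.4f}; flagged at twice the cutoff: {flagged}")
    print(f"rule of thumb for n = 289: {rule_of_thumb(289):.4f}")
```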
Quantile Regression
Models the relation between the predictors and the response variable, Earnings per Event, at several quantiles (here the 25th, 50th, and 75th percentiles)
Standard errors of the regression coefficients are estimated by bootstrapping with 400 replications
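The sketch below illustrates both steps on a hypothetical one-predictor model. It relies on a property of the linear-programming solution: with p coefficients, some optimal fit passes exactly through p observations. With p = 2 it can therefore try every line through two data points and keep the one with the smallest check loss. This O(n³) search suits only tiny samples and is not how Stata solves the linear program. It then estimates the slope's standard error with a pairs bootstrap, resampling (x, y) rows with replacement and refitting each time. That mirrors the idea of the 400 bootstrap replications. The data, seed and function names are illustrative assumptions.

```python
import random
import statistics


def check_loss(r, q):
    """Weight q on residuals >= 0 and (1 - q) on negative residuals."""
    return q * r if r >= 0 else (q - 1) * r


def qr_line(xs, ys, q):
    """Exact q-th quantile regression of y on x, returning (intercept, slope).

    Some optimal solution of the linear program passes through two
    observations with distinct x, so try every such pair and keep the
    line with the smallest total check loss. O(n^3): tiny samples only.
    """
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue  # equal x values cannot define y = a + b * x
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            loss = sum(check_loss(y - a - b * x, q) for x, y in zip(xs, ys))
            if best is None or loss < best[0]:
                best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def bootstrap_slope_se(xs, ys, q, reps=400, seed=1):
    """Pairs bootstrap: resample (x, y) rows with replacement, refit, and
    return the standard deviation of the bootstrap slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample with a single x value; redraw
        by = [ys[k] for k in idx]
        slopes.append(qr_line(bx, by, q)[1])
    return statistics.stdev(slopes)


if __name__ == "__main__":
    # Hypothetical data whose spread grows with x.
    xs = list(range(1, 13))
    ys = [1.0, 1.5, 2.2, 2.4, 3.9, 3.1, 5.8, 4.2, 8.5, 5.0, 12.0, 6.1]
    for q in (0.25, 0.50, 0.75):
        a, b = qr_line(xs, ys, q)
        se = bootstrap_slope_se(xs, ys, q, reps=200)
        print(f"q={q:.2f}: intercept={a:.3f} slope={b:.3f} bootstrap SE={se:.3f}")
```

Resampling whole rows keeps each golfer's predictors and earnings together, which is the usual choice when the spread of the response changes with the predictors.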
Quantile Regression Output (STATA)
Note:
1) Driving distance is significant only among golfers at the 75th percentile
2) The effect of putting ability (putts per hole) grows in magnitude at higher skill levels
3) The greens in regulation effect is fairly similar across skill levels
4) Fairway accuracy is not significant at any skill level
5) Sand saves are more important for golfers at the 75th percentile
Tests of Equality of Coefficients Across Quantiles
. test[q25=q50=q75]: drive
( 1)  [q25]drive - [q50]drive = 0     ( 2)  [q25]drive - [q75]drive = 0
F(  2,   283) =    3.25                  Prob > F =    0.0403
. test[q25=q50=q75]: fairway
( 1)  [q25]fairway - [q50]fairway = 0    ( 2)  [q25]fairway - [q75]fairway = 0
F(  2,   283) =    0.27                  Prob > F =    0.7603
. test[q25=q50=q75]: green
( 1)  [q25]green - [q50]green = 0             ( 2)  [q25]green - [q75]green = 0
F(  2,   283) =    0.58                   Prob > F =    0.5600
. test[q25=q50=q75]: girputtshole
( 1)  [q25]girputtshole - [q50]girputtshole = 0   ( 2)  [q25]girputtshole - [q75]girputtshole = 0
F(  2,   283) =    1.62                  Prob > F =    0.1989
. test[q25=q50=q75]: sandsvpct
( 1)  [q25]sandsvpct - [q50]sandsvpct = 0     ( 2)  [q25]sandsvpct - [q75]sandsvpct = 0
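F(  2,   283) =    5.35                   Prob > F =    0.0053
At the 5% level, only the driving distance (p = 0.0403) and sand save (p = 0.0053) coefficients differ significantly across the 25th, 50th, and 75th percentiles. The apparent growth in the putting effect is not significant (p = 0.1989).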
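Each test imposes two restrictions (q25 = q50 and q25 = q75), so the statistic is referred to an F(2, 283) distribution, where 283 = 289 - 6 residual degrees of freedom. With two numerator degrees of freedom the upper-tail probability has the closed form (1 + 2F/283)^(-283/2). That lets the reported p-values be checked by hand. This sketch does not recompute the Wald statistics themselves, which need the bootstrap covariance of the coefficients across quantiles. Small discrepancies arise because the displayed F values are rounded to two decimals.

```python
def f2_pvalue(f_stat, df2):
    """Upper-tail probability P(F > f_stat) for an F(2, df2) distribution.

    With 2 numerator degrees of freedom the survival function reduces to
    (1 + 2 f / df2) ** (-df2 / 2), so no special functions are needed.
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


if __name__ == "__main__":
    # F statistics and p-values as displayed in the Stata test output.
    reported = {
        "drive": (3.25, 0.0403),
        "fairway": (0.27, 0.7603),
        "green": (0.58, 0.5600),
        "girputtshole": (1.62, 0.1989),
        "sandsvpct": (5.35, 0.0053),
    }
    for name, (f, p) in reported.items():
        print(f"{name:13s} F = {f:4.2f}  recomputed p = {f2_pvalue(f, 283):.4f}  reported p = {p:.4f}")
```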
Plots of Regression Coefficients by Quantile

This presentation uses quantile regression to study the prize earnings of professional female golfers in the Ladies Professional Golf Association (LPGA) during the 2009 and 2010 seasons. It investigates how average driving distance, fairway accuracy, greens in regulation, putts per hole on greens reached in regulation, and sand saves affect earnings per event. The earnings data are highly skewed (the mean is more than twice the median), so they are far from normal. Multiple linear regression results are also provided for comparison; that model explains about 55% of the variation in earnings per event.


Uploaded on Oct 07, 2024



Presentation Transcript



  3. Quantile Regression
Cumulative Distribution Function (CDF): $F(y) = \Pr(Y \le y)$
$q$th Quantile: $y_q = F^{-1}(q)$, so that $F(y_q) = q$
Loss function to be minimized for the $q$th quantile:
$$Q_N(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q}^{N} q\,|y_i - x_i'\beta_q| + \sum_{i:\, y_i < x_i'\beta_q}^{N} (1-q)\,|y_i - x_i'\beta_q|$$


  7. Influential Observations wrt the Regression Coefficients (DFBETAS)
Influential observations: $|DFBETAS_{j(i)}| > 2/\sqrt{n} = 2/\sqrt{289} = 0.1176$
Flagged observations by coefficient (Obsnum: DFBETAS):
drive (_dfbeta_1): 30: 0.3509; 166: -0.2432; 249: -0.3423; 268: -0.3883
fairway (_dfbeta_2): 30: 0.3433; 125: 0.2899; 249: -0.4798; 238
green (_dfbeta_3): 30, 104, 166, 249, 256
Unassigned values (fairway obs. 238 and the green column): -0.3225, -0.2795, 0.2661, -0.2572, -0.2418, 0.4677
girputtshole (_dfbeta_4): 30: 0.6184; 106: -0.2525; 125: -0.2833; 177: 0.2556; 211: -0.4088; 238: -0.5183; 268: -0.2856
sandsvpct (_dfbeta_5): 30: 0.3295; 104: -0.3027; 163: 0.5885; 211: -0.2842; 268: 0.7242

  8. Quantile Regression
Model for the $q$th quantile of Earnings per Event:
$$Y_i = x_i'\beta_q + \varepsilon_{iq}, \qquad x_i' = [\,1 \;\; drive_i \;\; fairway_i \;\; green_i \;\; putts_i \;\; sandsav_i\,]$$
Loss function to be minimized for the $q$th quantile:
$$Q_N(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q}^{N} q\,|y_i - x_i'\beta_q| + \sum_{i:\, y_i < x_i'\beta_q}^{N} (1-q)\,|y_i - x_i'\beta_q|$$

  9. Quantile Regression Output (STATA)
-------------+--------------------------------------------------------------
   earnevent |     Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
q25          |
       drive |   -0.0254     0.0539   -0.47   0.638    -0.1314     0.0806
     fairway |   -0.0073     0.0665   -0.11   0.912    -0.1382     0.1235
       green |    0.8419     0.1492    5.64   0.000     0.5483     1.1355
girputtshole |  -72.3041     9.3716   -7.72   0.000   -90.7510   -53.8571
   sandsvpct |    0.1215     0.0435    2.80   0.006     0.0360     0.2071
       _cons |   86.0426    18.9736    4.53   0.000    48.6952   123.3899
-------------+--------------------------------------------------------------
q50          |
       drive |    0.0528     0.0786    0.67   0.502    -0.1018     0.2074
     fairway |    0.0476     0.0973    0.49   0.625    -0.1440     0.2392
       green |    0.9611     0.1832    5.25   0.000     0.6006     1.3217
girputtshole |  -95.3871    17.1245   -5.57   0.000  -129.0947   -61.6795
   sandsvpct |    0.1432     0.0777    1.84   0.066    -0.0098     0.2962
       _cons |   99.7847    43.1192    2.31   0.021    14.9096   184.6598
-------------+--------------------------------------------------------------
q75          |
       drive |    0.3131     0.1399    2.24   0.026     0.0377     0.5886
     fairway |    0.0857     0.1692    0.51   0.613    -0.2475     0.4188
       green |    1.1226     0.3229    3.48   0.001     0.4869     1.7583
girputtshole | -127.4508    37.0845   -3.44   0.001  -200.4471   -54.4544
   sandsvpct |    0.4549     0.1288    3.53   0.000     0.2014     0.7084
       _cons |   76.0492    86.7001    0.88   0.381   -94.6097   246.7080
-------------+--------------------------------------------------------------
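In the quantile regression output, each t statistic is the coefficient divided by its bootstrap standard error. Each 95% interval is the coefficient plus or minus a t critical value times that standard error. The sketch below checks this for the q75 driving-distance row. The critical value 1.9684 (Student's t, 283 degrees of freedom) is an approximation we hard-code; it does not appear in the slides, although the interval limits in the output are consistent with it.

```python
# Assumed approximation to the 97.5th percentile of Student's t with 283 df.
T_CRIT_283 = 1.9684


def t_and_ci(coef, se, t_crit=T_CRIT_283):
    """Return the t statistic and the symmetric 95% confidence interval."""
    half_width = t_crit * se
    return coef / se, (coef - half_width, coef + half_width)


if __name__ == "__main__":
    # q75 driving-distance row: Coef. 0.3131, bootstrap Std. Err. 0.1399.
    t, (lo, hi) = t_and_ci(0.3131, 0.1399)
    print(f"t = {t:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

This reproduces t = 2.24 and the interval [0.0377, 0.5886] to within rounding of the displayed coefficient and standard error.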
