Enhancing Player Selection in NHL and NBA Drafts Using Model Trees
Drafting exceptional players in the NHL and NBA is crucial for team success. This study by Yejia Liu, Oliver Schulte, and Chao Li from Simon Fraser University explores the use of model trees to identify top prospects. The research addresses the challenges in player selection, the limitations of previous models, and presents an innovative approach that combines regression-based and similarity-based methods to differentiate players effectively. By analyzing key metrics and player features, this model aims to improve the accuracy of player assessments in professional sports drafts.
Presentation Transcript
Model Trees for Identifying Exceptional Players in the NHL and NBA Drafts
Yejia Liu, Oliver Schulte and Chao Li
School of Computing Science, Simon Fraser University, Vancouver, Canada
Problem Formulation: Drafting Prospects
Drafting is essential to building a successful team.
1. Entry Draft (lottery system): mistakes happen (e.g. the tanking-games issue)
  o NHL: Nikita Filatov (6th) vs. Erik Karlsson (15th)
  o NBA: Sam Bowie vs. Michael Jordan (Portland Trail Blazers)
2. Scouts: expensive in labour and hours
ECML Workshop for Sport Analytics 2018
Previous Models
Regression-based approaches
  o Generalized additive model by Schuckers, using season-by-season data from the NHL draft
  o Linear regression model by Greene, using pre-draft and rookie-year stats for the NBA
Similarity-based approaches
  o Prospect Cohort Success Model (scoring rate, height and age)
  o PECOTA system in baseball
  o Lutz: a cluster analysis of NBA players
Schuckers, M. & Statistical Sports Consulting, LLC (2016), 'Draft by Numbers: Using Data and Analytics to Improve National Hockey League (NHL) Player Selection', MIT Sloan Sports Analytics Conference.
Greene, A. (2015), 'The Success of NBA Draft Picks: Can College Career Predict NBA Winners', Master's thesis, St. Cloud State University.
Lutz, D. (2012), 'A cluster analysis of NBA players', MIT Sloan Sports Analytics Conference.
Dataset
Input data = season aggregates
Success metrics
  o NHL: number of games played in a player's first seven seasons
  o NBA: Player Efficiency Rating (PER)
NBA input data: 1985-2011 drafts (excluding players whose college stats are not available)
NHL input data: demographic + performance metrics (e.g. CSS_rank)
  o Cohort 1: 1998, 1999, 2000, 2001, 2002 drafts
  o Cohort 2: 2004, 2005, 2006, 2007, 2008 drafts
1. https://github.com/liuyejia/Model_Trees_Full_Dataset
2. https://github.com/sfu-cl-lab/Yeti-Thesis-Project/tree/master/NBA_work
Our Model Tree
Combine regression-based and similarity-based approaches
  o An ensemble of regression models
  o Present interactions between player features and player groups
  o Learn from data, no need to specify similarity metrics
  o Differentiate players from the same group
Developing the Model Tree (NHL)
  o Zero-inflation problem in the NHL draft: about half of drafted players never play in the NHL
Binary target variable: does a drafted player play at least one NHL game?
  o Logistic regression model in each leaf node
Ranking process
1) The tree assigns each player i to a unique leaf node I_i, with a logistic regression model m(I_i).
2) Use m(I_i) to compute a probability p_i = P(g_i > 0), where g_i is the number of NHL games played by player i.
3) Rank players by their chance of playing an NHL game.
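The three-step ranking process can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the split on `css_rank <= 30`, the feature names `points_per_game` and `age`, and all weights are made-up assumptions standing in for a learned tree.

```python
import math

# Hypothetical two-leaf tree. Each leaf stores (intercept, weights) of an
# illustrative logistic regression model; none of these numbers are learned.
LEAF_MODELS = {
    "top_ranked": (1.0, {"points_per_game": 1.5, "age": -0.1}),
    "other": (-1.0, {"points_per_game": 2.0, "age": -0.05}),
}


def assign_leaf(player):
    """Step 1: route a player to exactly one leaf (assumed split on css_rank)."""
    return "top_ranked" if player["css_rank"] <= 30 else "other"


def prob_plays_nhl(player):
    """Step 2: p_i = P(g_i > 0) from the logistic model of the player's leaf."""
    intercept, weights = LEAF_MODELS[assign_leaf(player)]
    z = intercept + sum(w * player[f] for f, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_players(players):
    """Step 3: sort players from most to least likely to play an NHL game."""
    return sorted(players, key=prob_plays_nhl, reverse=True)


# Made-up prospects for illustration only.
players = [
    {"name": "A", "css_rank": 12, "points_per_game": 1.2, "age": 18},
    {"name": "B", "css_rank": 85, "points_per_game": 0.4, "age": 19},
    {"name": "C", "css_rank": 40, "points_per_game": 1.0, "age": 18},
]
ranking = [p["name"] for p in rank_players(players)]
```

Because every player lands in exactly one leaf, players in the same group are still separated by that leaf's own regression weights.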
Our Model Tree (NHL)
Logistic Model Trees
  o Logistic regression model in every node
  o LogitBoost algorithm to maximize the likelihood of the training data points
  o Tree splitting based on information entropy, similar to C4.5
  o Tree pruning based on training error and a model complexity penalty
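To make the entropy-based splitting step concrete, here is a minimal sketch that scores candidate thresholds on one numeric feature by information gain. It is an assumption-laden toy: C4.5 itself normalizes the gain into a gain ratio, which is omitted here, and the `css_rank` values and labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Pick the threshold with the highest gain (needs >= 2 distinct values)."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Made-up data: scouting rank and whether the player reached the NHL (1/0).
css_rank = [5, 10, 20, 60, 90, 120]
played = [1, 1, 1, 0, 0, 0]
threshold = best_threshold(css_rank, played)
gain = information_gain(css_rank, played, threshold)
```

In this toy data the split at rank 20 separates the two classes perfectly, so the gain equals the full one bit of entropy in the labels.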
Evaluation (NHL)
| Training Data NHL Draft Years | Out of Sample Draft Years | Draft Order Spearman Rank Correlation | Tree Model Classification Accuracy | Tree Model Spearman Rank Correlation |
|---|---|---|---|---|
| 1998, 1999, 2000 | 2001 | 0.43 | 82.27% | 0.83 |
| 2001, 2002 | 2002 | 0.30 | 85.79% | 0.85 |
| 2004, 2005, 2006 | 2007 | 0.46 | 81.23% | 0.84 |
| 2007, 2008 | 2008 | 0.51 | 63.56% | 0.71 |
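For readers unfamiliar with the metric, the Spearman rank correlation reported above is the Pearson correlation of the two rank vectors. The sketch below computes it with the standard library only, using average ranks for ties (which matter here because many drafted players have zero games). The predicted probabilities and game counts are invented for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: model probabilities vs. NHL games actually played.
predicted = [0.9, 0.7, 0.4, 0.2]
games_played = [300, 0, 120, 0]
rho = spearman(predicted, games_played)
```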
Our Model Tree (NBA)
  o No zero-inflation problem in the NBA draft: over 80% of drafted players appear in the NBA
Build a tree whose leaves each contain a linear regression model.
  o Continuous target variable: predict the career PER of a drafted player
Process
1) The tree assigns each player i to a unique leaf node I_i, with a linear regression model m(I_i).
2) Use m(I_i) to compute the predicted career PER.
3) Rank players by predicted career PER.
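Our Model Tree (NBA)
M5 Regression Trees (M5P)
  o Initial tree construction based on the standard deviation of the target variable:
    $$\mathrm{SDR} = \mathrm{sd}(T) - \sum_i \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i)$$
    where T is the set of training cases reaching a node and the T_i are the subsets produced by a candidate split
  o Linear regression model in every node, fitted with standard regression methods
  o Tree pruning based on estimated error
  o Tree smoothing: the predicted value at a leaf node is adjusted by the predicted values along the path from the root to that leaf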
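The split criterion used for initial tree construction in M5 is the standard deviation reduction: the standard deviation of the target at a node minus the size-weighted standard deviations of the subsets a split produces. A minimal sketch follows; the use of the population standard deviation (`statistics.pstdev`) is an assumption, and the PER values are invented.

```python
import statistics


def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|T_i| / |T| * sd(T_i))."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted


# Made-up career PER values reaching one node of the tree.
per = [10, 12, 11, 20, 22, 21]

# A split that separates low-PER from high-PER players ...
good_split = sdr(per, [[10, 12, 11], [20, 22, 21]])
# ... versus one that mixes them.
bad_split = sdr(per, [[10, 20, 11], [12, 22, 21]])
```

The tree builder would prefer the first split, since it leaves much less variation in PER inside each child node.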
Our Model Tree (NBA) [figure]
Our Model Tree Evaluation
| Method | Pearson Correlation | Spearman Rank Correlation | RMSE |
|---|---|---|---|
| Draft Order | 0.42 | NaN | 0.39 |
| Linear Regression (baseline) | 0.45 | 0.40 | 7.14 |
| Our Model Tree | 0.55 | 0.43 | 6.16 |
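As a reminder of what two of these columns measure, the sketch below computes RMSE and the Pearson correlation between predicted and actual career PER. The four PER values are invented and unrelated to the reported results.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predictions and observed values."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Made-up career PER values for four hypothetical draftees.
actual_per = [15.0, 12.0, 20.0, 9.0]
predicted_per = [14.0, 13.0, 18.0, 11.0]

error = rmse(predicted_per, actual_per)
r = pearson(predicted_per, actual_per)
```

RMSE is in PER units and penalizes large misses, while the correlation only asks whether higher predictions go with higher outcomes, so the two can disagree about which model is better.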
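Identifying Strong and Weak Points
"The numbers are the beginning of the conversation." (Cam Lawrence, Florida Panthers)
We can leverage the weights to identify the player features that contribute the most to raising or lowering a player's ranking.
The difference in the log-odds of playing at least one game between a player i and an average player in group g is
$$\sum_{j=1}^{m} w_j \,(x_{ij} - \bar{x}_{gj})$$
where w_j is the weight of feature j in the group's logistic regression model, x_{ij} is player i's value for feature j, and \bar{x}_{gj} is the group mean of feature j.
Find the features j that contribute the most to this difference:
$$\arg\max_j \; \left| w_j \,(x_{ij} - \bar{x}_{gj}) \right|$$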
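The per-feature contribution described here is the leaf model's weight for a feature times the player's deviation from the group mean on that feature; the features with the largest absolute contribution are the player's strong or weak points. A small sketch, with invented weights, feature names, and values:

```python
def contributions(weights, player, group_mean):
    """Per-feature term w_j * (x_ij - mean_gj) of the log-odds difference."""
    return {f: w * (player[f] - group_mean[f]) for f, w in weights.items()}


def top_features(weights, player, group_mean, k=2):
    """Features with the largest absolute contribution, most important first."""
    contrib = contributions(weights, player, group_mean)
    return sorted(contrib, key=lambda f: abs(contrib[f]), reverse=True)[:k]


# Hypothetical leaf-model weights and feature values (not from the paper).
weights = {"points": 0.05, "age": -0.3, "height": 0.01}
player = {"points": 60, "age": 18, "height": 185}
group_mean = {"points": 40, "age": 19, "height": 183}

contrib = contributions(weights, player, group_mean)
total_difference = sum(contrib.values())
strongest = top_features(weights, player, group_mean, k=1)
```

A positive contribution marks a strong point relative to the group and a negative one a weak point; here the player's scoring is what lifts them above the group average.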
Case Studies [figure]
Conclusion
We introduce model trees, which
  o assign players to groups that are statistically distinct
  o build separate prediction models for separate groups
Model tree rankings correlate well with the actual career success metric.
The tree structure is interpretable for scouts and sports experts.
Model trees can be used to highlight player strong points.
Our methods are flexible enough to apply to other sports with aggregate datasets.