
Ensemble Methods and SuperLearning in R
Explore the concept of ensemble learning, in which multiple algorithms are combined to improve prediction, with a focus on the Super Learner implementation in R. The approach mirrors seeking multiple opinions before making a crucial decision, leading to more informed choices in automated decision-making applications.
Introduction to Ensemble Methods and SuperLearning in R
Jonathan Todd
November 20, 2017
Outline
Ensemble learning
Super Learning
Cross-validation
Algorithm steps
NHANES example
Ensemble Learning
"In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions, and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting several experts before making a final decision is perhaps second nature to us; yet, the extensive benefits of such a process in automated decision making applications have only recently been discovered by the computational intelligence community."
- Robi Polikar, Ensemble Based Systems in Decision Making
More Quotes
"Or to put it another way, for any two learning algorithms, there are just as many situations (appropriately weighted) in which algorithm one is superior to algorithm two as vice versa, according to any of the measures of superiority."
- David Wolpert, The Supervised Learning No-Free-Lunch Theorems
Ensemble Learning
Apply multiple algorithms or estimators to a problem, then combine these estimators to improve on the prediction of any one algorithm.
Ideally, an ensemble estimate should be as good as, or better than, any single model or estimator, as measured by mean squared prediction error (MSPE).
Different modes of inference (parametric, nonparametric, etc.) can be used for the same problem.
Super Learning is one implementation of ensemble learning and is available in R.
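To make the idea concrete, here is a toy sketch (simulated data, not from the talk): fit two different estimators, average their predictions, and compare held-out MSPE. The unweighted average is only for illustration; nothing guarantees it beats both components.

# Toy sketch: compare two estimators and their unweighted average on
# held-out data, using MSPE as the yardstick.
set.seed(1)
n <- 500
dat <- data.frame(x = runif(n, 0, 3))
dat$y <- sin(2 * dat$x) + dat$x + rnorm(n, sd = 0.3)
train <- dat[1:400, ]; test <- dat[401:500, ]

fit_lin  <- lm(y ~ x, data = train)            # simple linear model
fit_poly <- lm(y ~ poly(x, 8), data = train)   # flexible polynomial

p_lin  <- predict(fit_lin,  newdata = test)
p_poly <- predict(fit_poly, newdata = test)
p_avg  <- (p_lin + p_poly) / 2                 # naive ensemble

mspe <- function(p) mean((test$y - p)^2)
c(linear = mspe(p_lin), poly = mspe(p_poly), average = mspe(p_avg))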
Super Learning
Originally developed by Mark van der Laan at UC Berkeley.
Implemented through the SuperLearner package in R, by Eric Polley.
The modularity of R makes adding additional algorithms easy.
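A minimal sketch of the package's main entry point, on simulated data (the wrappers SL.glm, SL.randomForest, and SL.glmnet ship with the package; the latter two require the randomForest and glmnet packages to be installed):

# Minimal SuperLearner call on simulated data.
library(SuperLearner)

set.seed(23)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- 2 * X$x1 - X$x2 + rnorm(n)

sl <- SuperLearner(Y = Y, X = X, family = gaussian(),
                   SL.library = c("SL.glm", "SL.randomForest", "SL.glmnet"))
sl  # prints the cross-validated risk and ensemble weight for each algorithm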
Cross-Validation
Split the sample into V mutually exclusive blocks of (as nearly as possible) equal size. Each block serves as the validation set once, while the remaining blocks serve as the training set. Typical values for V are 5, 10, etc.
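A base-R sketch of the split:

# Split n observations into V mutually exclusive folds of nearly equal size.
set.seed(7)
n <- 103
V <- 10
fold <- sample(rep(1:V, length.out = n))  # random fold labels; sizes differ by at most 1
table(fold)
# Observations with fold == v form the validation set for fold v;
# the remaining observations form the training set.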
V-Fold Cross Validation
[Diagram: ten folds over blocks 1-10; in fold v, block v is the validation set and the remaining nine blocks form the training set.]
Algorithm for Super Learning
1. Select a set of algorithms to use in the Super Learner.
2. Fit each algorithm to each of the V training sets.
3. Using the fits from the training sets, predict values of the outcome Y for each algorithm in the corresponding validation set.
4. You now have predicted values for each participant and algorithm.
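Below is a hand-rolled sketch of these steps on simulated data, building a matrix Z of cross-validated predictions with one column per algorithm. The SuperLearner package does all of this internally; the two toy "algorithms" here are stand-ins, not the talk's library.

# Build the matrix Z of cross-validated predictions.
set.seed(7)
n <- 200; V <- 10
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$Y <- dat$x1 + 0.5 * dat$x2 + rnorm(n)
fold <- sample(rep(1:V, length.out = n))

# Each "algorithm" takes a training set and predicts on a validation set.
algos <- list(
  glm_main = function(tr, va) predict(lm(Y ~ x1 + x2, data = tr), newdata = va),
  glm_int  = function(tr, va) predict(lm(Y ~ x1 * x2, data = tr), newdata = va)
)

Z <- matrix(NA_real_, n, length(algos), dimnames = list(NULL, names(algos)))
for (v in 1:V) {
  tr <- dat[fold != v, ]
  va <- dat[fold == v, ]
  for (a in names(algos)) Z[fold == v, a] <- algos[[a]](tr, va)
}
head(Z)  # cross-validated predicted values for each participant and algorithm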
Prediction

Participant   Hemoglobin (observed)   GLM (prediction)   Random Forest (prediction)   ElasticNet (prediction)
1             14.5                    13.86464           13.97643                     13.89338
2             11.8                    13.24328           13.26701                     13.24953
3             15.0                    14.47116           13.70927                     14.14557
4             13.4                    13.15683           12.87224                     13.04519
5             12.3                    13.41766           12.87082                     13.25144
...
N             13.9                    13.94182           13.72722                     13.93876
Discrete Super Learner
The discrete Super Learner is simply the single algorithm with the lowest cross-validated MSPE. Other loss functions can also be specified.
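Continuing the sketch above, the discrete Super Learner falls out of the Z matrix in two lines:

# Cross-validated MSPE per algorithm; the discrete Super Learner is the minimizer.
cv_mspe <- colMeans((dat$Y - Z)^2)
cv_mspe
names(which.min(cv_mspe))  # the discrete Super Learner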
Improving on the Discrete Super Learner
The discrete Super Learner can be improved upon by finding the optimal weighted combination of algorithms that minimizes MSPE.
Fit a regression of the outcome Y on the set of cross-validated predicted values from each algorithm:

Y_i ≈ Σ_k β_k Z_ik   (Equation 1)

where Z_ik is the cross-validated prediction of algorithm k for participant i. (The SuperLearner package's default metalearner fits this by non-negative least squares and normalizes the weights to sum to one.)
Super Learner Prediction
Fit each algorithm to the entire dataset, generating predicted values for each algorithm. The Super Learner prediction function is then

Ŷ_SL(X) = Σ_k β_k F_k(X)

where the coefficients β_k come from the regression in Equation 1, and the predicted values F_k come from fitting each algorithm to the entire dataset.
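Continuing the sketch, the weights can be estimated with non-negative least squares, which is essentially what the package's default method.NNLS does (the nnls package supplying nnls() is an extra dependency here):

# Equation 1: regress Y on the cross-validated predictions Z, with
# non-negative weights normalized to sum to one.
library(nnls)
w <- nnls(Z, dat$Y)$x
w <- w / sum(w)

# Refit each algorithm on the entire dataset and combine with the weights:
F_full <- sapply(names(algos), function(a) algos[[a]](dat, dat))
sl_pred <- as.vector(F_full %*% w)  # Super Learner predicted values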
Choosing Algorithms
The listWrappers() command in R lists the available algorithms.
Many of the nonparametric algorithms have tuning parameters; you can adjust these to create your own algorithms.
How to choose? The best guidance is to use as many as you can within the bounds of available time and computing power.
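For example (listWrappers() and create.Learner() are both exported by the package; the alpha grid here anticipates the example on the next slide):

library(SuperLearner)
listWrappers()  # available prediction and screening wrappers

# Generate a family of elastic net wrappers over a grid of alpha values:
enet <- create.Learner("SL.glmnet", tune = list(alpha = c(0, 0.25, 0.5, 0.75, 1)))
enet$names  # e.g. "SL.glmnet_1" ... "SL.glmnet_5"; usable in SL.library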
Parameters of the Example
To keep things simple, we will attempt to predict hemoglobin using the following variables: age, sex, transferrin saturation (tsat), and iron.
Algorithms: GLM (linear regression), random forest, and elastic net regression.
We will vary the alpha parameter for elastic net regression to create new algorithms to include in the Super Learner.
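A sketch of how this setup might look in code. The data frame nhanes and its exact column names are assumptions for illustration, not the speaker's actual code; enet is the set of elastic net wrappers created above.

# Super Learner for hemoglobin, with GLM, random forest, and five
# elastic net wrappers (alpha = 0, 0.25, 0.5, 0.75, 1).
library(SuperLearner)
lib <- c("SL.glm", "SL.randomForest", enet$names)

sl <- SuperLearner(Y = nhanes$hemoglobin,
                   X = nhanes[, c("age", "sex", "tsat", "iron")],
                   family = gaussian(),
                   SL.library = lib,
                   cvControl = list(V = 10))
sl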
Super Learner Prediction Results

Algorithm                      MSPE     Relative Efficiency
Super Learner                  1.0111   ---
Discrete Super Learner         1.0451   1.03
Random Forest                  1.0451   1.03
Elastic Net, α=0 (Ridge)       1.1391   1.13
Elastic Net, α=0.25            1.1399   1.13
Elastic Net, α=1 (Lasso)       1.1400   1.13
Main terms linear regression   1.1401   1.13
Elastic Net, α=0.50            1.1402   1.13
Elastic Net, α=0.75            1.1402   1.13

Relative Efficiency = CV MSPE (algorithm) / CV MSPE (SL)
Practical Considerations
With larger datasets and large libraries of algorithms, Super Learning can be slooooooow.
It is not practical to implement in SAS, as many of the algorithms are not native to base SAS (though macros exist for some of them).
Super Learning can be used for effect estimation as well (you just need to set the exposure levels).
It is tempting to throw in a bunch of prediction algorithms (black box?).
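On the speed point, the package offers some relief; a sketch under the same assumed nhanes data and library as above (mcSuperLearner() relies on forking, so it is Unix-only; check the package documentation for your platform):

# Parallel model fitting via parallel::mclapply (Unix-alikes):
options(mc.cores = 4)
sl_mc <- mcSuperLearner(Y = nhanes$hemoglobin,
                        X = nhanes[, c("age", "sex", "tsat", "iron")],
                        family = gaussian(), SL.library = lib)

# Externally cross-validate the whole Super Learner to get honest MSPEs
# (this is how a results table like the one above can be produced):
cv_sl <- CV.SuperLearner(Y = nhanes$hemoglobin,
                         X = nhanes[, c("age", "sex", "tsat", "iron")],
                         family = gaussian(), SL.library = lib,
                         parallel = "multicore")
summary(cv_sl)  # CV risk for the SL, the discrete SL, and each algorithm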
Further Resources
CRAN SuperLearner site: https://cran.r-project.org/web/packages/SuperLearner/index.html
SuperLearner GitHub: https://github.com/ecpolley/SuperLearner
Maya Petersen/Laura Balzer course site: http://www.ucbbiostat.com/
Conclusions
Super Learning is a powerful tool for prediction.
It allows the use of a large library of candidate estimators.
No need to worry about model choice: try them all!
Thanks! jvtodd@email.unc.edu