Ensemble Methods and SuperLearning in R

Explore the concept of ensemble learning, where multiple algorithms are combined to enhance predictions, with a focus on Super Learning implementation in R. Learn how this approach mirrors seeking multiple opinions before making a crucial decision, leading to more informed choices in automated decision-making applications.

  • Ensemble Learning
  • SuperLearning
  • R Programming
  • Machine Learning
  • Data Science


Presentation Transcript


  1. Introduction to Ensemble Methods and SuperLearning in R Jonathan Todd November 20, 2017

  2. Outline: Ensemble Methods; Super Learning; Cross-Validation; Algorithm Steps; NHANES Example

  3. Ensemble Learning

  4. Ensemble Learning "In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions, and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting several experts before making a final decision is perhaps second nature to us; yet, the extensive benefits of such a process in automated decision making applications have only recently been discovered by the computational intelligence community." -Robi Polikar, Ensemble Based Systems in Decision Making

  5. More quotes "Or to put it another way, for any two learning algorithms, there are just as many situations (appropriately weighted) in which algorithm one is superior to algorithm two as vice versa, according to any of the measures of superiority." -David Wolpert, The Supervised Learning No-Free-Lunch Theorems

  6. Ensemble Learning Applying multiple algorithms or estimators to a problem, then combining these estimators to improve on the prediction of any one algorithm. Ideally, an ensemble estimate should be as good as or better than any single model or estimator (as measured by MSPE). Different modes of inference (parametric, non-parametric, etc.) can be used for the same problem. Super Learning is one implementation of ensemble learning and is available in R.

  7. Super Learning

  8. Super Learning Originally developed by Mark van der Laan at UC Berkeley. Implemented through the SuperLearner package in R, by Eric Polley. The modularity of R makes adding additional algorithms easy.

  9. Cross-Validation Split the sample into V equally sized (as nearly as possible), mutually exclusive blocks. Each block serves as the validation set once, while the remaining blocks serve as the training set. Typical values for V are 5 and 10.
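The splitting step can be sketched in base R. This is a minimal, illustrative sketch; the variable names (`fold_id`, `valid_idx`, `train_idx`) are mine, not from the SuperLearner package, which handles fold creation internally.

```r
# Assign n observations to V roughly equal, mutually exclusive folds.
set.seed(1)
n <- 100   # number of observations
V <- 10    # number of folds
fold_id <- sample(rep(1:V, length.out = n))

# For fold 1: that block is the validation set, the rest is the training set.
valid_idx <- which(fold_id == 1)
train_idx <- which(fold_id != 1)
```

Each observation appears in exactly one validation set across the V folds.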

  10. V-Fold Cross Validation [Figure: schematic of 10-fold cross-validation. The data are divided into 10 blocks; in each of the 10 folds, one block serves as the validation set and the remaining nine form the training set.]

  11. Algorithm for Super Learning Select a set of algorithms to use in the Super Learner. Fit each algorithm to each of the V training sets. Using the fits from the training sets, predict values of the outcome Y for each algorithm in the corresponding validation set. You now have predicted values for each participant and algorithm.
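The steps above can be sketched in base R with two toy candidate algorithms (a linear and a quadratic model) on simulated data; the data, learners, and the name `Z` for the matrix of cross-validated predictions are all illustrative, not from the talk's actual code.

```r
set.seed(2)
n <- 200; V <- 5
x <- runif(n)
y <- 2 * x + x^2 + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)
fold_id <- sample(rep(1:V, length.out = n))

# Z holds the cross-validated predictions: one column per algorithm.
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("linear", "quadratic")))

for (v in 1:V) {
  train <- dat[fold_id != v, ]
  valid <- dat[fold_id == v, ]
  fit1 <- lm(y ~ x, data = train)          # candidate algorithm 1
  fit2 <- lm(y ~ x + I(x^2), data = train) # candidate algorithm 2
  Z[fold_id == v, "linear"]    <- predict(fit1, newdata = valid)
  Z[fold_id == v, "quadratic"] <- predict(fit2, newdata = valid)
}
# Every observation now has a prediction from each algorithm,
# made by a fit that never saw that observation.
```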

  12. Prediction

     Participant   Hemoglobin (observed)   GLM (prediction)   Random Forest (prediction)   ElasticNet (prediction)
     1             14.5                    13.86464           13.89338                     13.97643
     2             11.8                    13.24328           13.24953                     13.26701
     3             15.0                    14.47116           14.14557                     13.70927
     4             13.4                    13.15683           13.04519                     12.87224
     5             12.3                    13.41766           13.25144                     12.87082
     ...
     N             13.9                    13.94182           13.72722                     13.93876

  13. Discrete Super Learner The discrete Super Learner is simply the single algorithm with the lowest cross-validated MSPE. Other loss functions can also be specified.
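Given a matrix of cross-validated predictions (one column per algorithm) and the observed outcomes, selecting the discrete Super Learner is a one-liner. The tiny `Z` and `y` below are made-up stand-ins just to show the selection step.

```r
y <- c(1.0, 2.0, 3.0, 4.0)
Z <- cbind(algA = c(1.1, 2.2, 2.9, 4.3),
           algB = c(0.5, 2.8, 3.6, 3.2))

# Cross-validated MSPE of each candidate algorithm.
cv_mspe <- colMeans((Z - y)^2)

# The discrete Super Learner is the algorithm with the smallest CV MSPE.
discrete_sl <- names(which.min(cv_mspe))
discrete_sl  # "algA"
```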

  14. Improving on the Discrete Super Learner The Discrete Super Learner can be improved by finding the optimal weighted combination of algorithms that minimizes MSPE. Fit a regression of the outcome Y on the set of predicted values from each algorithm.
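The weighting step can be sketched as a regression of the outcome on the columns of cross-validated predictions. Note this unconstrained `lm()` version is only to show the idea; the SuperLearner package by default constrains the weights to be non-negative and to sum to one. The small `Z` and `y` are illustrative stand-ins.

```r
y <- c(1.0, 2.0, 3.0, 4.0, 5.0)
Z <- cbind(algA = c(1.1, 2.2, 2.9, 4.3, 4.8),
           algB = c(0.5, 2.8, 3.6, 3.2, 5.5))

# Regress y on the cross-validated predictions, no intercept:
# the coefficients are the ensemble weights on each algorithm.
wfit <- lm(y ~ Z - 1)
coef(wfit)

# The ensemble prediction is the weighted combination of the columns of Z.
sl_pred <- Z %*% coef(wfit)
```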

  15. Super Learner Prediction Fit each algorithm to the entire dataset, generating predicted values for each algorithm. The Super Learner prediction function is the weighted combination of these fits, where the coefficients (weights) come from the regression in the previous step, and the predicted values come from fitting each algorithm to the entire dataset.

  16. Choosing Algorithms The listWrappers() command in R lists the available algorithms. Many of the nonparametric algorithms have tuning parameters; you can adjust these to create your own algorithms. How to choose? The best guidance is to use as many as you can within the bounds of available time and computing power.
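A sketch of both steps using the SuperLearner package: listing the available wrappers, then using create.Learner() to spawn tuned variants of an existing wrapper. The specific alpha grid below is my illustration, chosen to mirror the elastic net variants used later in the talk.

```r
library(SuperLearner)

# List the available prediction and screening wrappers.
listWrappers()

# Create elastic net variants by tuning the alpha parameter of SL.glmnet;
# this generates one new learner per alpha value.
enet <- create.Learner("SL.glmnet",
                       tune = list(alpha = c(0, 0.25, 0.5, 0.75, 1)))
enet$names  # the generated learner names, usable in SL.library
```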

  17. NHANES Example

  18. Parameters of Example To keep things simple, we will attempt to predict hemoglobin using the following variables: age, sex, transferrin saturation (tsat), and iron. Algorithms: GLM (linear regression), random forest, and elastic net regression. We will vary the alpha parameter of the elastic net to create additional algorithms to include in the Super Learner.
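A hedged sketch of what the call for this example might look like. The data frame name `nhanes` and its column names are assumptions based on the slide's variable list, not the talk's actual code; the wrapper names are real SuperLearner wrappers.

```r
library(SuperLearner)

# Candidate algorithms: linear regression, random forest, elastic net.
sl_lib <- c("SL.glm", "SL.randomForest", "SL.glmnet")

fit <- SuperLearner(Y = nhanes$hemoglobin,
                    X = nhanes[, c("age", "sex", "tsat", "iron")],
                    family = gaussian(),
                    SL.library = sl_lib,
                    cvControl = list(V = 10))

fit$cvRisk  # cross-validated MSPE for each algorithm
fit$coef    # ensemble weights on each algorithm
```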

  19. Super Learner Prediction Results

     Algorithm                      MSPE     Relative Efficiency
     Super Learner                  1.0111   ---
     Discrete Super Learner         1.0451   1.03
     Random Forest                  1.0451   1.03
     Elastic Net, α=0 (Ridge)       1.1391   1.13
     Elastic Net, α=0.25            1.1399   1.13
     Elastic Net, α=1 (Lasso)       1.1400   1.13
     Main terms linear regression   1.1401   1.13
     Elastic Net, α=0.50            1.1402   1.13
     Elastic Net, α=0.75            1.1402   1.13

     Relative Efficiency = CV MSPE (algorithm) / CV MSPE (SL)

  20. Super Learner Algorithm Results

  21. Conclusions

  22. Practical Considerations With larger datasets and large libraries of algorithms, Super Learning can be slooooooow. It is not practical to implement in SAS, as many of the algorithms are not native to base SAS (though macros exist for many of them). It can also be used for effect estimation (you just need to set the exposure levels). It is tempting to throw in a bunch of prediction algorithms (black box?).

  23. Further Resources CRAN SuperLearner site https://cran.r-project.org/web/packages/SuperLearner/index.html SuperLearner github https://github.com/ecpolley/SuperLearner Maya Petersen/Laura Balzer course site http://www.ucbbiostat.com/

  24. Conclusions Super Learning is a powerful tool for prediction. It allows the use of a large library of candidate estimators. No need to worry about model choice: try them all!

  25. Thanks! jvtodd@email.unc.edu
