The Importance of Interpretability in Machine Learning for Medicine
Interpretability in machine learning models is crucial in medicine for clear decision-making and validation by medical professionals. This article discusses the definitions, global and local explanations, tradeoffs between interpretability and accuracy, and reasons why interpretable models are essential in medical settings.
Presentation Transcript
Expert Augmented Machine Learning. Gilmer Valdes, PhD, DABR. Assistant Professor, Department of Radiation Oncology and Department of Epidemiology and Biostatistics, University of California, San Francisco
Outline 1. What is interpretability? 2. Why do we need interpretable models in medicine? 3. Expert Augmented Machine Learning.
Defining Interpretability 1. The algorithm is interpretable. 2. The model is interpretable in a global sense (e.g., decision trees). 3. The model is interpretable locally (e.g., LIME). 4. Post-hoc justifications or explanations (e.g., variable importance). The Mythos of Model Interpretability. Zachary C. Lipton. https://arxiv.org/abs/1606.03490
Local Explanations: LIME. Ribeiro et al., 2016. https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf
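For concreteness, here is a minimal sketch of how LIME produces a local explanation for a single prediction of a black-box tabular classifier, using the open-source `lime` package. The breast-cancer dataset and random forest are illustrative stand-ins, not the models from the talk.

```python
# Minimal LIME sketch: explain one prediction of a black-box classifier.
# The dataset and model here are stand-ins for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    discretize_continuous=True,
)

# Explain a single case: LIME perturbs the instance, queries the model,
# and fits a sparse linear surrogate that is faithful only locally.
exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=5)
print(exp.as_list())  # top features pushing this one prediction up or down
```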
Definition of interpretability. Definition 1: A machine learning model is globally interpretable in a medical sense if physicians are able to precisely describe, using clinical language, how the model makes predictions for every possible patient, in a way that they could contest or agree with the prediction. G. Valdes et al. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine. Nat Sci Rep 6, Article 37854. http://www.mediboostml.com This is not the only notion of interpretability.
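Under this definition, a shallow decision tree is a simple example of a globally interpretable model: the printed rules are the entire model, so a physician can trace the path any possible patient would follow and contest it. A minimal sketch, using an ordinary scikit-learn tree as a stand-in (not MediBoost itself):

```python
# Sketch of a globally interpretable model in the sense of Definition 1:
# a shallow decision tree whose entire decision logic can be read and
# contested by a physician. (Illustrative stand-in, not MediBoost.)
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The printed rules ARE the model: every possible patient follows one path.
print(export_text(tree, feature_names=list(data.feature_names)))
```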
Tradeoff between interpretability and accuracy Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. Randal S. Olson et al. https://arxiv.org/pdf/1708.05070.pdf
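As a rough, single-dataset illustration of the tradeoff that Olson et al. benchmark at scale, one can compare a readable depth-3 tree against a black-box ensemble. The dataset and models below are illustrative choices, not those of the paper:

```python
# Rough sketch of the interpretability/accuracy tradeoff on one dataset:
# compare a readable depth-3 tree with a black-box gradient-boosting model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
for name, model in [
    ("depth-3 tree (interpretable)", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("gradient boosting (black box)", GradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f}")
```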
Reasons for interpretability. The need for interpretable models in medicine arises from both practical and theoretical reasons: 1. Acceptance. 2. Known limitations of observational training data (confounders, noise, bias, etc.).
1. Acceptance Teach, R.L. and E.H. Shortliffe, An analysis of physician attitudes regarding computer-based clinical consultation systems. Computers and Biomedical Research, 1981. 14(6): p. 542-558.
2. Limitations of observational data. Example: predicting risk of dying of pneumonia for in-hospital patients. The most accurate model trained was a multi-purpose neural net, but a rule-based model trained on the same data learned the rule "asthmatic = lower risk", a pattern that would be harmful to patients and a high risk of liability if acted on. https://www.ncbi.nlm.nih.gov/pubmed/9040894
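The asthma rule is a treatment confounder: asthmatic pneumonia patients received more aggressive care, so they died less often in the observational data, and the learner concluded asthma is protective. A sketch with entirely synthetic numbers shows how such a care policy flips a learned coefficient:

```python
# Synthetic illustration of the asthma confounder (all numbers made up):
# asthma raises true risk, but asthmatics get aggressive care that lowers
# observed mortality, so a model fit on observational outcomes learns
# "asthma => lower risk".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
asthma = rng.binomial(1, 0.1, n)
aggressive_care = asthma  # policy: asthmatics are always treated aggressively
base_risk = 0.15 + 0.10 * asthma                     # asthma truly increases risk...
observed_risk = base_risk - 0.18 * aggressive_care   # ...but care more than offsets it
died = rng.binomial(1, np.clip(observed_risk, 0, 1))

model = LogisticRegression().fit(asthma.reshape(-1, 1), died)
print(f"asthma coefficient: {model.coef_[0][0]:.2f}")  # negative: looks 'protective'
```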
2. Limitations of observational data. Example: predicting risk of stroke for Emergency Department patients. Does Machine Learning Automate Moral Hazard and Error? American Economic Review: Papers & Proceedings 2017, 107(5): 476–480.
2. Limitations of observational data. A CNN reached 88% ranking performance, but it was learning the hospital type rather than the pathology. https://medium.com/@jrzech/what-are-radiological-deep-learning-models-actually-learning-f97a546c5b98
2. Limitations of observational data. Bias in Medicine. 1. Psychologically salient diseases are over-diagnosed. Cognitive Biases and Heuristics in Medical Decision Making: A Critical Review Using a Systematic Search Strategy. Medical Decision Making 35(4): 539–557. 2. Physicians are 40 percent less likely to refer female or black patients for catheterization. Effect of Race and Sex on Physicians' Recommendations for Cardiac Catheterization. New England Journal of Medicine 340: 618–626. 3. Minorities receive less aggressive cancer treatment. Racial Differences in the Treatment of Early-Stage Lung Cancer. New England Journal of Medicine 341: 1198–1205.
Context is everything
Outline 1. What is interpretability? 2. Why do we need interpretable models in medicine? 3. Expert Augmented Machine Learning.
Loss(data, expert) = Loss(data) + λ · Penalty(expert)
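The slide's formula combines a data-driven loss with a penalty for disagreeing with the experts. A minimal sketch of that idea, implementing per-rule L1 penalties by column scaling (dividing a column by w is equivalent to multiplying that coefficient's L1 penalty by w); the rules, labels, and rank differences below are synthetic stand-ins, not the EAML implementation:

```python
# Hedged sketch of the EAML-style objective: data loss plus an expert
# penalty. Rules are precomputed binary features, and each rule's L1
# penalty is scaled by its data-vs-expert rank difference.
# All inputs (rules, rank_diff) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, k = 2_000, 6
rules = rng.binomial(1, 0.3, (n, k)).astype(float)   # rule activations r_k(x)
w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(rules @ w_true))))

rank_diff = np.array([0, 4, 1, 2, 3, 0])   # |data rank - expert rank| per rule
penalty = 1.0 + rank_diff                  # heavier penalty where experts disagree

model = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
model.fit(rules / penalty, y)              # column scaling = per-rule L1 penalty
alpha = model.coef_[0] / penalty           # recover weights on the original scale
print(np.round(alpha, 3))  # rules with large rank_diff are shrunk toward zero
```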
EAML [pipeline diagram, steps 1–5]
EAML
EAML [paired variable-importance charts: data-driven model vs. physicians' answers, over PF ratio, Age, GCS, Urea, WBC, HR, Renal, Bilirubin, SBP, K, Na, CO2]
EAML [pipeline diagram, steps 1–5]
Ranking Comparison [figure slides comparing data-driven and physician rankings]
GCS: Ranking Comparison. - Intubated patients were coded as a GCS of 3 (1 on our scale). - Mortality of intubated patients with low GCS is 0.19, vs. 0.28 for patients who are not intubated but have a low GCS. - There are 6493 intubated patients vs. 1236 non-intubated patients with a GCS of 1 or 2.
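A finding like this comes from stratifying mortality by intubation status among low-GCS patients. A pandas sketch of that check, with assumed column names (`gcs_rank`, `intubated`, `died`) rather than the actual MIMIC schema:

```python
# Sketch of the stratified check behind the GCS finding; 'gcs_rank',
# 'intubated', and 'died' are assumed column names, not the MIMIC schema.
import pandas as pd

def gcs_mortality_check(df: pd.DataFrame) -> pd.DataFrame:
    """Mortality of low-GCS patients, split by intubation status."""
    low_gcs = df[df["gcs_rank"] <= 2]
    return low_gcs.groupby("intubated").agg(
        n=("died", "size"),
        mortality=("died", "mean"),
    )

# Toy data standing in for MIMIC:
toy = pd.DataFrame({
    "gcs_rank":  [1, 1, 2, 2, 1, 3],
    "intubated": [1, 1, 1, 0, 0, 0],
    "died":      [0, 1, 0, 1, 0, 0],
})
print(gcs_mortality_check(toy))
```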
Ranking Comparison [figure slide]
PaO2/FiO2: Ranking Comparison. Rule: PaO2/FiO2 > 336. - 54% of patients have a missing value for PaO2/FiO2. These values were imputed with the mean = 332.60. - 94.2% (N = 14430) of patients missing PaO2/FiO2 (N = 15332) are not intubated. - 60.35% (N = 5591) of patients with PaO2/FiO2 (N = 9265) are intubated. - The mortality ratio of intubated patients with high PaO2/FiO2 is 0.13, vs. 0.046 for non-intubated patients.
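Because missingness here is informative (patients without a PaO2/FiO2 measurement are mostly not intubated), mean imputation erases a useful signal. One common remedy, sketched below with an assumed column name, is to keep an explicit missingness indicator alongside the imputed value:

```python
# Sketch: mean imputation plus an explicit missingness flag, so a model can
# learn that a missing PaO2/FiO2 (usually a non-intubated patient) is itself
# informative. Column name 'pf_ratio' is an assumption for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({"pf_ratio": [210.0, np.nan, 450.0, np.nan, 330.0]})

df["pf_ratio_missing"] = df["pf_ratio"].isna().astype(int)     # keep the signal
df["pf_ratio"] = df["pf_ratio"].fillna(df["pf_ratio"].mean())  # then impute
print(df)
```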
EAML [variable-importance comparison, revisited]
MIMIC II vs MIMIC III
Discretizing Empirical Risk Difference [rank scale 1–5]
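To compare model output with the physicians' answers, the empirical risk difference has to be mapped onto the same 1–5 scale. A sketch using quantile bins, which is one plausible choice of cut points rather than the talk's exact procedure:

```python
# Sketch of discretizing an empirical risk difference onto the 1-5 scale
# physicians used; quintile bins are an assumption about the cut points.
import numpy as np
import pandas as pd

risk_diff = np.array([-0.21, -0.05, 0.00, 0.03, 0.08, 0.15, 0.30, -0.12])
rank = pd.qcut(risk_diff, q=5, labels=[1, 2, 3, 4, 5])
print(list(rank))
```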
EAML: Hard Version [plot: AUC (0.74–0.84) of the MIMIC2-trained model on MIMIC2 training, MIMIC2 testing, MIMIC3i testing, and MIMIC3ii testing, by subset of rules kept (rank difference <1, <2, <3, <4, All)]
EAML: Hard Version [plot: mean AUC (0.70–0.78) of the MIMIC2-trained model on MIMIC3 data vs. training N cases (200–6400), for rule subsets by rank difference (<1, <2, <3, <4, <5 = all rules)]
EAML: Hard Version [plot: mean balanced accuracy (0.64–0.74) of the MIMIC2-trained model on MIMIC3 data vs. training N cases (200–6400), for rule subsets by rank difference (<1, <2, <3, <4, <5 = all rules)]
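The "hard version" shown in these plots keeps only the rules whose data-vs-expert rank difference falls below a threshold and refits on the survivors. A minimal sketch, with synthetic rule activations and rank differences standing in for the real ones:

```python
# Sketch of the 'hard' EAML variant: keep only rules whose data-vs-expert
# rank difference is below a threshold, then refit on the surviving rules.
# 'rules' (n x k rule activations) and 'rank_diff' are assumed inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def hard_eaml_fit(rules: np.ndarray, y: np.ndarray,
                  rank_diff: np.ndarray, max_diff: int) -> LogisticRegression:
    keep = rank_diff < max_diff        # e.g. max_diff=1 keeps exact agreement only
    return LogisticRegression().fit(rules[:, keep], y)

rng = np.random.default_rng(2)
rules = rng.binomial(1, 0.3, (500, 5)).astype(float)
y = rng.binomial(1, 0.4, 500)
model = hard_eaml_fit(rules, y, rank_diff=np.array([0, 3, 1, 4, 2]), max_diff=2)
print(model.coef_)
```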
EAML. Advantages: detects confounders; builds models that are robust to changes in the variable distributions and to time decay; very accurate in expectation. Disadvantages: demands time from physicians. https://rtemis.lambdamd.org/rulefit.html https://www.pnas.org/content/117/9/4571
gilmer.valdes@ucsf.edu Supported by NIBIB of the NIH under award K08EB026500