Machine Learning for Improved Risk Stratification in Health Care

 
MACHINE LEARNING FOR IMPROVED RISK STRATIFICATION OF NCD PATIENTS IN ESTONIA
Big Data and Machine Learning in Health Care
Marvin Ploetz
Philip Docena
Ojaswi Pandey
Aakash Mohpal
23 April, 2019
 
Objectives
 
Propose an alternative, machine-learning-based approach to patient risk stratification for enhanced care management (ECM) in Estonia
Illustrate the use and applicability of machine learning to other areas of work relevant to the Estonian Health Insurance Fund (EHIF)
 
Overview
 
 
Big Data and Machine Learning in Health Care
Machine Learning Basics
Context of ECM
Research Question
Data Overview & Sample Construction
Feature Engineering
Evaluation & Modelling Choices
Results
Conclusions
 
Big Data and Machine Learning in Health Care
 
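Take advantage of massive amounts of data and provide the right intervention to the right patient at the right time
 
Personalized care to the patient
 
Potentially benefit all agents in the health care system: patient, provider, payer, management
 
Figure: data sources (vital signs, activity data, behavioral data, nutritional data, EMR, clinical notes, medical images, genome data)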
 
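Uses of Machine Learning in Health Care
Figure: big data and machine learning analytics in health care (right patient, right intervention, right time; personalized medicine; benefits for patients, providers and payers: improve care and efficiency and lower costs, proactively prevent diseases, assist diagnostics, predict disease risk, improve clinical trials, study population health, find cures for conditions)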
 
Example 1: Hip and knee replacement in the US
Osteoarthritis: a common and painful chronic condition
Often requires hip or knee replacement
More than 500,000 Medicare beneficiaries receive replacements each year
Medical costs: roughly $15,000 per surgery
Medical benefits: accrue over time, since the first months after surgery are painful and spent in disability
Therefore, a joint replacement only makes sense if the patient lives long enough to enjoy it. If the patient dies soon after, the surgery may have been futile and painful
Prediction/classification problem: Can we predict which surgeries will be futile using only data available at the time of surgery?
 
Example 1: Hip and knee replacement in the US
Data: 20% of 7.4 million beneficiaries; 98,090 had a hip or knee replacement in 2010
1.4% died within one month of surgery; 4.2% died within 1-12 months
3,305 independent variables
Train data: 65,395 observations; test data: 32,695 observations
Model to predict the riskiest patients
Traditional analysis: about averages
Big data and ML analytics: predict individual risks
 
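Example 1: Hip and knee replacement in the US
 
Predicted mortality percentile | Observed mortality rate | Futile procedures averted | Futile spending ($ millions)
1 | 43.5% | 1,984 | 30
2 | 42.2% | 3,844 | 58
5 | 35.8% | 8,061 | 121
10 | 24.2% | 10,512 | 158
20 | 15.2% | 12,317 | 185
30 | 13.6% | 16,151 | 242
 
The first column sorts the test sample by predicted-risk percentile. Among the riskiest 1% of patients, the observed mortality rate within 1-12 months after surgery was 43.5%. Reallocating these surgeries to patients of median risk (50th percentile) would have averted 1,984 futile procedures and reallocated $30m to other beneficiaries.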
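The ranking behind this table can be sketched in a few lines. Sort the test sample by predicted risk, then compute the observed outcome rate within the riskiest X%. This is a minimal sketch on synthetic data. `observed_rate_in_top` is a hypothetical helper, and the random scores stand in for a trained model's predictions. It does not reproduce the original study's model.

```python
import numpy as np


def observed_rate_in_top(pred_risk, outcome, top_pct):
    """Observed outcome rate among the top_pct % of patients ranked by predicted risk."""
    pred_risk = np.asarray(pred_risk, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    order = np.argsort(-pred_risk)  # indices sorted from riskiest to least risky
    n_top = max(1, int(round(len(order) * top_pct / 100)))
    return outcome[order[:n_top]].mean()


# Synthetic stand-in for a test sample: true risk rises with the predicted score
rng = np.random.default_rng(0)
risk = rng.uniform(size=10_000)
died = (rng.uniform(size=10_000) < 0.5 * risk).astype(int)

rates = {pct: observed_rate_in_top(risk, died, pct) for pct in (1, 2, 5, 10, 20, 30)}
for pct, rate in rates.items():
    print(f"Top {pct:>2}%: observed rate {rate:.1%}")
```

With a useful model, the observed rate in the top percentiles sits well above the sample average. That gap is what makes reallocation worthwhile.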
 
Example 2: Diagnoses of pediatric conditions
 
Apply natural language processing algorithms to extract data from EHRs
 
Extract 101.6m data points from 1.3m EHRs of pediatric patients
 
High diagnostic accuracy across multiple organ systems, comparable to the performance of experienced pediatric physicians
 
 
Example 3: Breast cancer screening
The most common form of cancer, afflicting 2.5 million patients worldwide in 2015
Need to distinguish malignant tumors from benign ones
Early detection is key
Data: 62,219 mammography findings from the Wisconsin State Cancer Reporting System
A neural-network-based algorithm performs as well as radiologists in classifying the tumors
 
Machine Learning Basics
 
 
Definition of Big Data
 
Collection of large and complex data sets that are difficult to process using common database management tools or traditional data processing applications
 
Not only about size: finding insights from complex, noisy, heterogeneous, and longitudinal data sets
This includes capturing, storing, searching, sharing and analyzing
 
Figure: volume, variety, velocity
 
Types of Machine Learning Problems
 
1) Supervised – Making predictions using labeled data (examples with known outcomes)
Classification: use data to predict which category something falls into
Examples: whether an image contains a storefront or not; whether a patient is high risk or not
Regression: use data to make predictions on a continuous scale
Examples: predict the stock price of a company; given historical data, predict tomorrow's temperature
 
2) Unsupervised – Detecting patterns in unlabeled data
Problems where we have little or no idea what the results should look like
Provide algorithms with data and ask them to look for hidden features and cluster the data in a meaningful way
Examples: identifying patterns in genomics data, separating voice from noise in audio files
 
Machine Learning Implementation
1) Collect data
2) Standardize and clean data
3) Feature engineering / data construction
4) Split data into train (80%) and test (20%) sets
5) Build the machine learning model using the train data
6) Validate model results using the test data
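The six steps map directly onto standard scikit-learn calls. The sketch below makes several assumptions. `make_classification` stands in for a cleaned, feature-engineered patient table, with about 7.5% positives to mimic the admission data. Logistic regression is an arbitrary example model. The scaler is fitted inside the pipeline so that test data never influences preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-3 (assumed done): a synthetic stand-in for the engineered feature table
X, y = make_classification(n_samples=5_000, n_features=20, weights=[0.925], random_state=0)

# Step 4: 80% train / 20% test, stratified so both parts keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 5: standardize and fit, using training data only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
model.fit(X_train, y_train)

# Step 6: validate on the held-out test data
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC-AUC: {test_auc:.3f}")
```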
 
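Assessing Model Performance: Precision and Recall
 
Confusion matrix: predicted vs. actual condition/outcome gives true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN)
Accuracy = (TP + TN) / All
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)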
 
Assessing Model Performance: Precision and Recall
 
Case I: High recall, low precision (TP = 100, FP = 45, FN = 5, TN = 80)
Accuracy = 180/230 = 78%
Precision = 100/145 = 69%
Recall = 100/105 = 95%
 
Case II: Low recall, high precision (TP = 90, FP = 5, FN = 35, TN = 100)
Accuracy = 190/230 = 83%
Precision = 90/95 = 95%
Recall = 90/125 = 72%
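The two cases can be checked by computing the three metrics directly from the confusion-matrix cells:

- Case I: TP = 100, FP = 45, FN = 5, TN = 80
- Case II: TP = 90, FP = 5, FN = 35, TN = 100

`classification_metrics` is a small illustrative helper, not a library function.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


case_1 = classification_metrics(tp=100, fp=45, fn=5, tn=80)
case_2 = classification_metrics(tp=90, fp=5, fn=35, tn=100)
for name, metrics in (("Case I", case_1), ("Case II", case_2)):
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Case II has the higher accuracy even though it misses seven times as many positives. This is why accuracy alone is a poor guide on unbalanced data.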
 
Assessing Model Performance: ROC Curve
Plots the true positive rate against the false positive rate for every classification threshold
A perfect model has a curve that passes through the upper left corner (AUC = 1)
The diagonal (red line) represents random guessing (AUC = 0.5)
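A small worked example with scikit-learn's `roc_curve` and `roc_auc_score` shows where these numbers come from. The labels and scores below are made up for illustration. A constant score carries no ranking information and reproduces the AUC = 0.5 diagonal.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels (1 = admission) and model scores for eight patients
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90])

# One (false positive rate, true positive rate) point per distinct threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)

# AUC = probability that a random positive is scored above a random negative;
# here 14 of the 16 positive/negative pairs are ranked correctly
auc = roc_auc_score(y_true, scores)

# An uninformative constant score lies on the diagonal
random_auc = roc_auc_score(y_true, np.full_like(scores, 0.5))
print(f"AUC = {auc:.3f}, uninformative AUC = {random_auc:.3f}")
```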
 
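Decision Tree: Playing Golf
 
A non-parametric supervised learning method used for classification and regression
Built in the form of a tree structure
Breaks data down into smaller and smaller subsets while incrementally building the tree
Final result is a tree with decision nodes and leaf nodes
 
Outlook | Temperature | Humidity | Windy | Play Golf
Rainy | Hot | High | False | No
Rainy | Hot | High | True | No
Overcast | Hot | High | False | Yes
Sunny | Mild | High | False | Yes
Sunny | Cool | Normal | False | Yes
Sunny | Cool | Normal | True | No
Overcast | Cool | Normal | True | Yes
Rainy | Mild | High | False | No
Rainy | Cool | Normal | False | Yes
Sunny | Mild | Normal | False | Yes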
 
Decision Tree: Playing Golf
Figure: the tree splits first on Outlook. Rainy -> No Golf; Overcast -> Golf; Sunny -> split on Windy (False -> Play Golf; True -> No Golf)
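The golf example can be reproduced approximately with scikit-learn's `DecisionTreeClassifier`. This sketch assumes the ten-row weather table from the original slide (Outlook, Temperature, Humidity, Windy, Play Golf) and one-hot encodes it, because scikit-learn trees need numeric inputs. scikit-learn grows binary trees, so the three-way Outlook split appears as "Rainy vs. not Rainy", followed by a Windy split among the non-rainy days. `max_depth=2` is an illustrative choice.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Ten-row weather table from the slide
golf = pd.DataFrame(
    [
        ("Rainy", "Hot", "High", False, "No"),
        ("Rainy", "Hot", "High", True, "No"),
        ("Overcast", "Hot", "High", False, "Yes"),
        ("Sunny", "Mild", "High", False, "Yes"),
        ("Sunny", "Cool", "Normal", False, "Yes"),
        ("Sunny", "Cool", "Normal", True, "No"),
        ("Overcast", "Cool", "Normal", True, "Yes"),
        ("Rainy", "Mild", "High", False, "No"),
        ("Rainy", "Cool", "Normal", False, "Yes"),
        ("Sunny", "Mild", "Normal", False, "Yes"),
    ],
    columns=["Outlook", "Temperature", "Humidity", "Windy", "Play"],
)

# One-hot encode every predictor (Windy is cast to text so it is encoded too)
X = pd.get_dummies(golf.drop(columns="Play").astype(str))
y = golf["Play"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```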
 
Decision Tree to Random Forest
 
A collection of decision trees whose results are aggregated into one final output
 
Each tree uses a different sub-sample of the data and a different subset of features
 
Helps reduce overfitting, mainly by reducing variance (bias stays close to that of a single deep tree)
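A quick way to see the variance reduction is to cross-validate a single fully grown tree against a forest of such trees on the same data. This is a sketch on synthetic data with illustrative hyperparameters. Note one detail: scikit-learn's `RandomForestClassifier` draws a random feature subset at each split (`max_features`), rather than once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2_000, n_features=20, n_informative=5, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # random subset of features considered at each split
    bootstrap=True,       # each tree sees a bootstrap sub-sample of the rows
    random_state=0,
)

tree_auc = cross_val_score(single_tree, X, y, cv=5, scoring="roc_auc").mean()
forest_auc = cross_val_score(forest, X, y, cv=5, scoring="roc_auc").mean()
print(f"Single tree ROC-AUC: {tree_auc:.3f}  Random forest ROC-AUC: {forest_auc:.3f}")
```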
 
 
 
Context of ECM
 
 
A Big Challenge of the Estonian Healthcare System
 
Changes in the demand for health care due to population ageing and the rise of non-communicable diseases
Chronic conditions as the driving force behind the need for better care integration
Low coverage of preventive services and a considerable share of avoidable specialist and hospital care
Opportunity to improve management of specific patient groups at the primary health care (PHC) level -> care management for empaneled patients
Predicting in which patients breaches in care coordination will occur -> risk stratification of patients
 
Risk Stratification Until Now
 
No actual prediction analysis done
Involvement of providers to gain trust/understanding
Behavioral and social criteria are key, but sparsely available -> use insider knowledge of doctors
 
1) DM/ Hypertension/ Hyperlipidemia? No -> Not eligible; Yes -> step 2
2) Min. and max. number/combination of CVD/ Respiratory/ Mental Health/ Functional Impairment? No -> Not eligible; Yes -> step 3
3) Dominant/complex condition (cancer, schizophrenia, rare disease etc.)? Yes -> Not eligible; No -> step 4
4) Review by GPs (behavioral & social factors, information not in data)? No -> Not eligible; Yes -> ECM Candidate
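Read as code, the expert algorithm is a sequence of filters, and failing any one of them means "not eligible". This is a hypothetical sketch. The field names are invented, and the minimum and maximum number of conditions are not stated in the slides, so `min_conditions=1` and `max_conditions=3` are placeholders. The output is a hard yes/no label with no probability score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields standing in for the flowchart criteria
    has_dm_htn_or_hl: bool        # diabetes, hypertension or hyperlipidemia
    n_target_conditions: int      # count of CVD, respiratory, mental health, functional impairment
    has_dominant_condition: bool  # e.g. cancer, schizophrenia, rare disease
    gp_approves: bool             # outcome of the GP review (behavioral and social factors)


def is_ecm_candidate(p: Patient, min_conditions: int = 1, max_conditions: int = 3) -> bool:
    """Apply the filters in order; any failed filter means 'not eligible'.

    min_conditions / max_conditions are placeholders, not the real EHIF limits.
    """
    if not p.has_dm_htn_or_hl:
        return False
    if not (min_conditions <= p.n_target_conditions <= max_conditions):
        return False
    if p.has_dominant_condition:
        return False
    return p.gp_approves


print(is_ecm_candidate(Patient(True, 2, False, True)))  # True
```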
 
Enhanced Care Management So Far In Estonia
 
Successful enhanced care management pilot with 15 GPs and < 1,000 patients to assess the feasibility and acceptability of enhanced care management
Commitment of the Estonian Health Insurance Fund (EHIF) to scale up the care management pilot
Model for risk stratification: clinical algorithm + provider intuition
Need for a better risk-stratification approach!?
 
Research Question
 
 
The Prediction Problem
 
Target patients: who benefits from care management? A combination of disease, social and behavioral factors…
Objective of ECM: ultimately improve health outcomes for patients with cardiovascular, respiratory, and mental disease.
What is the right proxy prediction variable in the data?
There is not one single relevant adverse event (e.g. death, hospital admission, health complication, high healthcare spending)
Some discussion on how to choose the dependent variable…
-> Unplanned hospital admissions have a large negative impact on patients' lives, are costly and relatively frequent. Some are also avoidable…
 
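Many Patients Repeatedly Have Hospitalizations
 
Figure: patients with an admission in 2011, share hospitalized again (%): one year later 22.9; two years 20.2; three years 18.8; four years 17.7; five years 16.3; six years 13.6; seven years later 9.3
 
More than 22 percent of patients need to be hospitalized again in the following year…
 
Hospitalizations account for the bulk of healthcare costs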
 
Predicting Hospital Admissions
 
Hospital admissions are the main (avoidable) adverse health event
But predicting hospitalizations is a hard problem
Social factors matter a lot; patients may have many contacts with the healthcare system or none at all…
Trade-off in choosing which hospitalizations we want to predict
Admissions due to specific conditions vs. hospitalizations in general
 
Predicting Hospital Admissions
 
 
Key question
 
Not
“What is the best algorithm for predicting hospital
admissions?”
 
But
“How can we obtain the most useful prediction of
hospital admissions for a specific purpose?”
 
Data Overview & Sample Construction
 
 
Administrative Claims Data (in Estonia)
 
Very reliable
High-quality data available since 2007/2008
Comprehensive coding requirements for providers
Reporting lag of data is on average 2 weeks
No information on clinical outcomes (e.g. test results)
Limited information on social conditions and behavioral characteristics
Need for a lot of feature engineering to create “meaningful” variables at the patient level
 
Description of Available Data
 
 
 
 
 
 
 
 
Patient Cohort Selection for the ML Analysis
 
Characteristics of Patients in the ML Sample vs. Total Population
 
Relative to the population, the ML sample is older and more likely to be female.
 
Characteristics of Patients in the ML Sample vs. Total Population
 
Most Common Chronic Conditions
 
The ML sample population is also sicker on average (i.e. the prevalence of chronic conditions is higher)
 
Characteristics of Patients in the ML Sample vs. Total Population
 
 
Feature Selection & Engineering
 
Feature Selection & Engineering
 
A series of attempts with interim feature sets to improve performance…
Final set: 141 features
 
Features Used…
 
Features Used…
 
Features Used…
 
Getting to Know the Data: Diagnosis and Admissions
 
Afib (atrial fibrillation and flutter), Chf (congestive heart failure), Htn (hypertension), and Ischemic Htd (ischemic heart disease) are strong indicators of potential admissions in the following year (2017)
Patient groups with these conditions have a non-trivial (~10%) likelihood of hospital admission
This likelihood increases to ~20-30% with one hospital admission in 2016, and to >50% with three or more admissions in 2016
 
Figure panels: single DGN; pairs of DGNs
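Conditional admission rates like these come from a simple group-by on a patient-level table. This sketch uses a made-up eight-patient extract. The column names `chf`, `admissions_2016` and `admitted_2017` are hypothetical and are not EHIF field names. Prior-year admissions are bucketed as 0, 1, 2 and 3+.

```python
import pandas as pd

# Hypothetical patient-level extract: one row per patient
patients = pd.DataFrame({
    "chf": [1, 1, 1, 0, 0, 1, 0, 1],
    "admissions_2016": [0, 1, 3, 0, 1, 0, 0, 4],
    "admitted_2017": [0, 1, 1, 0, 0, 0, 0, 1],
})

# Bucket prior-year admissions into 0, 1, 2 and 3+
patients["prior_admissions"] = pd.cut(
    patients["admissions_2016"],
    bins=[-1, 0, 1, 2, float("inf")],
    labels=["0", "1", "2", "3+"],
)

# Share of CHF patients admitted in 2017, by number of 2016 admissions
chf_rates = (
    patients[patients["chf"] == 1]
    .groupby("prior_admissions", observed=True)["admitted_2017"]
    .mean()
)
print(chf_rates)
```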
 
Evaluation & Modelling Choices
 
ML Models Selected for Evaluation
 
Selection criteria:
Algorithms are available in readily accessible, easy-to-use, comprehensive and well-tested open-source Python libraries (scikit-learn)
Algorithms and results are relatively easy to describe/explain (common algorithms)
For interpretability and model familiarity, no attempt at exploring more complex models; no deep networks
 
Included in comparison:
Decision Tree
Random Forest and Extremely Randomized Trees (ExtraTrees)
k-Nearest Neighbors*
Gaussian Naïve Bayes**
Logistic Regression (L1, L2)
SVM (RBF, polynomial)***
Multi-layer Perceptrons (1 hidden layer)
AdaBoost (Decision Tree and Random Forest)
Gradient Boosted Trees (scikit-learn GBT, not XGBoost)
Calibrated (isotonic) variations of the above classifiers
Neural Networks
 
Eventually excluded: *kNN because of execution time and memory requirements, **Naïve Bayes because of weak performance, and ***SVMs because of very slow training (but considered for the final paper)
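A scaled-down version of this comparison can be run with scikit-learn's cross-validation utilities. The sketch rests on several assumptions:

- The data are synthetic, with about 7.5% positives.
- The hyperparameters are illustrative, not the tuned values.
- kNN, Naïve Bayes, SVMs and AdaBoost are left out for brevity.
- Isotonic calibration is shown for one model only, via `CalibratedClassifierCV(..., method="isotonic")`.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the patient feature table (~7.5% positives)
X, y = make_classification(n_samples=2_000, n_features=20, weights=[0.925], random_state=1)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "logreg_l1": make_pipeline(
        StandardScaler(), LogisticRegression(penalty="l1", solver="liblinear")
    ),
    "logreg_l2": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000)),
    "mlp_1_hidden": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
    "gbt": GradientBoostingClassifier(random_state=0),
    # Isotonic calibration wrapped around one of the classifiers
    "rf_isotonic": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=100, random_state=0), method="isotonic", cv=3
    ),
}

# Stratified folds keep the ~7.5% positive rate in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: ROC-AUC {auc:.3f}")
```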
 
Evaluation metrics
 
Variable to be predicted: yes/no hospital admission in 2017
Use data from 2011-2016
We deal with an unbalanced sample (i.e. 7.5% of patients had an admission in 2017)
Appropriate metrics of model performance in an unbalanced dataset: precision, recall, ROC curve and area under the curve (AUC)
(A problem-specific custom metric that penalizes one type of error more heavily: the cost of a false positive (cost of ECM) vs. the cost of a missed positive (cost of the subsequent hospitalization))
 
Different ML models have different strengths, but the differences should not be large
 
Intuitive Interpretation of Metrics
 
Precision is the probability that a patient whom an algorithm classifies as having a hospital admission actually goes on to have one.
Recall is the probability that a patient who is going to have a hospital admission is classified as such by the algorithm.
 
Which one is more important?
It depends a lot on the application. There is a trade-off between maximizing one or the other…
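The trade-off is controlled by the classification threshold applied to a model's predicted probabilities. Lowering the threshold flags more patients, which raises recall but usually admits more false alarms. The ten probabilities below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical predicted admission probabilities for ten patients (4 actual admissions)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
proba = np.array([0.90, 0.80, 0.60, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05])

results = {}
for threshold in (0.7, 0.5, 0.25):
    y_pred = (proba >= threshold).astype(int)  # flag everyone at or above the threshold
    results[threshold] = (precision_score(y_true, y_pred), recall_score(y_true, y_pred))
    p, r = results[threshold]
    print(f"threshold {threshold:.2f}: precision {p:.2f}, recall {r:.2f}")
```

Recall rises steadily as the threshold drops (0.25, 0.50, 0.75). Precision does not move in a fixed direction, which is why the choice depends on the application.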
 
Future: Deriving a Custom Score with Cost Data
 
We can represent savings in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN):
 
$$
\begin{aligned}
\text{Savings} &= \text{cost under status quo} - \text{cost under model} \\
&= (TP + FN)\,c_t - \left[(TP + FP)\,c_p + FN\,c_t + TP\,(1 - e_p)\,c_t\right] \\
&= TP\,c_p\,(e_p k - 1) - FP\,c_p
\end{aligned}
$$
 
where
$c_p$ – per-patient annual average cost of ECM enrollment
$c_t = k\,c_p$ – the average annual cost of hospital admission(s) per patient, expressed as a multiple $k$ of $c_p$
$e_p$ – the impact of ECM enrollment on hospital admissions (decrease in probability)
 
Normalizing by the maximum attainable savings, $(TP + FN)\,c_p\,(e_p k - 1)$ (all positives identified, no false positives), converts this into a score with a maximum value of 1:
 
$$
\text{Savings coefficient} = \frac{TP\,(e_p k - 1) - FP}{(TP + FN)\,(e_p k - 1)}
$$
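Both formulas translate directly into code. The sketch keeps the long form of the savings formula so that it can be checked against the simplified expression TP·c_p·(e_p·k − 1) − FP·c_p. The cost inputs are assumptions: c_p = 1,000, k = 30 and e_p = 10% are illustrative values, and the counts reuse Case II from the precision/recall example.

```python
def savings(tp, fp, fn, c_p, k, e_p):
    """Cost under the status quo minus cost under the model."""
    c_t = k * c_p  # average annual hospital cost per admitted patient
    status_quo = (tp + fn) * c_t
    with_model = (tp + fp) * c_p + fn * c_t + tp * (1 - e_p) * c_t
    return status_quo - with_model


def savings_coefficient(tp, fp, fn, k, e_p):
    """Savings divided by the maximum attainable savings (all positives found, no FP)."""
    gain = e_p * k - 1  # net saving per true positive, in units of c_p
    return (tp * gain - fp) / ((tp + fn) * gain)


# Case II counts with hypothetical costs
print(savings(tp=90, fp=5, fn=35, c_p=1_000, k=30, e_p=0.10))   # about 175,000
print(savings_coefficient(tp=90, fp=5, fn=35, k=30, e_p=0.10))  # about 0.7
```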
 
 
 
Future: Custom Evaluation Score Based on Cost Data
 
A hypothetical exercise (not all the benefits of ECM are captured)
Hypothetical cost/savings assumptions (based on historical data and references from the literature):
ECM prevention-to-treatment cost ratio is 1:30 (i.e. k = 30)
Impact of ECM enrollment on hospital admissions (decrease in probability) is 10%, 15% or 20%
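Setting the savings expression to zero gives a quick sanity check for these scenarios:

$$
TP\,(e_p k - 1) > FP \quad\Longleftrightarrow\quad \frac{TP}{TP + FP} > \frac{1}{e_p k}
$$

Under these assumptions, savings are positive only when the model's precision exceeds 1/(e_p·k). The sketch reads the 1:30 ratio as k = 30. It is a hypothetical calculation under the slide's assumed values, not a result.

```python
K = 30  # treatment-to-prevention cost multiple, read from the assumed 1:30 ratio

break_even = {}
for e_p in (0.10, 0.15, 0.20):
    # Savings > 0  <=>  TP * (e_p * K - 1) > FP  <=>  TP / (TP + FP) > 1 / (e_p * K)
    break_even[e_p] = 1 / (e_p * K)
    print(f"e_p = {e_p:.0%}: savings require precision above {break_even[e_p]:.1%}")
```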
 
Model Implementation Approach
 
Dataset
Size: ~610k records, randomly reshuffled
Split: 80-20 split, so ~490k training records and ~120k for testing
Highly unbalanced: only 7.5% positive samples
 
Algorithms: optimized via cross-validated parameter grid search
 
Parameter grid:
Parameters and values are based on known useful combinations and trials on small sets; the parameter grid is limited to at most two parameters per model (to manage growth in execution time)
 
Cross-validation:
5-fold CV over the training set, stratified to maintain the target-variable imbalance
CV scoring metric: log loss and a custom cost-sensitive metric
 
Benchmarks:
‘Expert’ algorithm developed for the same problem/dataset (see "Risk Stratification Until Now" above)
Random selection of patients (using the prevailing positive case rate in the training set, 7.5%)
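This setup maps onto scikit-learn's `GridSearchCV` with a `StratifiedKFold` splitter and multi-metric scoring. Several parts of the sketch are assumptions:

- The data are synthetic.
- The two-parameter grid is hypothetical.
- The cost-sensitive scorer reuses the savings-coefficient formula, with assumed values k = 30 and e_p = 10%. It is wrapped with `make_scorer`, so it scores hard class predictions.
- The final model is refit on log loss, while the custom score is also recorded.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

K, E_P = 30, 0.10  # assumed cost multiple (1:30) and assumed ECM effect (10%)


def savings_coefficient_score(y_true, y_pred):
    """Cost-sensitive score: 1 = all admissions caught with no false alarms."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    gain = E_P * K - 1
    return (tp * gain - fp) / ((tp + fn) * gain)


X, y = make_classification(n_samples=2_000, n_features=20, weights=[0.925], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid with at most two parameters
param_grid = {"max_depth": [4, 8], "min_samples_leaf": [1, 20]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid,
    scoring={"log_loss": "neg_log_loss", "savings": make_scorer(savings_coefficient_score)},
    refit="log_loss",  # pick the final model by log loss; savings scores are still recorded
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```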
 
First Results
 
Precision and Recall for Log Loss Models
 
ML models outperform the benchmark on precision, but lag behind on recall
ML models have difficulty identifying all positive cases (i.e. patients with an admission). Most positive samples receive low predicted probabilities of being a positive case, a typical consequence of highly unbalanced datasets.
But classification above the 50% threshold is highly precise (few false alarms)
 
ROC and ROC-AUC for Log Loss Models
 
ROC curves are closely clustered
Optimized ROC curves are very close to each other (suggesting reasonably effective optimization)
Decision-tree-based algorithms tend to have lower AUC
 
Comparison to the expert benchmark ROC curve is not possible, as the benchmark model does not produce probability estimates
 
Summary So Far…
 
Performance of the ML models is promising, in line with known expectations (AUC close to 0.75), and beats the benchmark on precision
But a clear weakness on recall (i.e., patients with a high chance of a hospital admission are not detected consistently)
 
Results on the original dataset have room for improvement
Why the sub-par classification capacity?
 
Next Round of Results…
 
Dealing with an Unbalanced Dataset…
 
How difficult is prediction using standard ML models on the original unbalanced dataset?
Example: random forest
Quite difficult: the distributions of predicted probabilities for the two classes are not separable
All models consistently assign low estimates to positive samples, well below the 50% threshold (poor recall)
Almost total overlap (see the red and green distributions on the right)
Changing the classification threshold (default at 50%) does not help
 
Dealing with an Unbalanced Dataset…
 
Improve results via:
better features (possible, as a follow-up phase)
more complex models (possible, as a follow-up phase)
or by influencing training directly?
 
Rebalancing techniques (e.g., under-sampling the majority class) could be applied during training
The overall goal is to identify more positive cases, at an acceptable expense of false positives, subject to trade-off factors
So accurate prediction of probabilities is not the main goal
Consider some amount of rebalancing on the training set only
Retain the full set of the minority class and reduce the majority class to reach a target ratio
There is no hard rule on the single most effective rebalancing ratio, so several ratios are tried
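Random under-sampling of the majority class takes only a few lines of NumPy. The sketch keeps every positive sample and randomly keeps `ratio` negatives per positive. It should be applied to the training split only, so the test set keeps the real 7.5% positive rate. `undersample_majority` is a hypothetical helper, and the imbalanced-learn package's `RandomUnderSampler` offers similar functionality.

```python
import numpy as np


def undersample_majority(X, y, ratio, random_state=0):
    """Keep every positive sample and `ratio` randomly chosen negatives per positive."""
    rng = np.random.default_rng(random_state)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_neg = min(len(neg_idx), int(round(ratio * len(pos_idx))))
    keep_neg = rng.choice(neg_idx, size=n_neg, replace=False)
    keep = rng.permutation(np.concatenate([pos_idx, keep_neg]))  # reshuffle rows
    return X[keep], y[keep]


# Toy training set with 7.5% positives (rows 0-74 are the positives)
y_train = np.zeros(1_000, dtype=int)
y_train[:75] = 1
X_train = np.arange(1_000).reshape(-1, 1)

# Try several negative-to-positive ratios
for ratio in (1.0, 2.0, 3.0):
    X_bal, y_bal = undersample_majority(X_train, y_train, ratio)
    print(ratio, len(y_bal), round(float(y_bal.mean()), 3))
```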
 
Effect of Under-sampling
 
‘Probability’ distribution for the original and under-sampled training datasets.
Predicted probabilities are over-estimated, as expected
The model can now detect more positive samples than before (more red/positive samples above 0.50), thus improving recall in exchange for some loss of precision.
 
Results for Models Based on Resampling
 
 
Next Round of Results…
 
More complex models: Preliminary Results for Neural Networks
 
Preliminary results are from a run of a neural network algorithm.
A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process loosely inspired by the way the human brain operates.
 
More complex models: Preliminary Results for Neural Networks
 
Preliminary results are from a run of a neural network algorithm.
The model detects many more positive samples than before (more red/positive samples above 0.50)…
 
More complex models: Preliminary Results for Neural Networks II
 
…The model's recall is 69% and its precision is 13% (outperforming the expert reference model on both measures)
The resulting ROC-AUC is comparable to that of the top high-precision models presented above (i.e. 0.73)
We now have a high-precision and a high-recall model…
 
Combining different models: Ensembles
Different classifiers have different misclassification rates
Crucially, a few models misclassify samples that other models get right, so averaging over several classifiers might improve results
Create a model that ensembles multiple classifiers to reduce prediction variance
Hard voting: every individual classifier votes for a class, and the majority wins.
Soft voting: every individual classifier provides a probability that a specific data point belongs to a particular target class. The probabilities are weighted by each classifier's importance and averaged, and the class with the highest weighted average wins.
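scikit-learn's `VotingClassifier` implements both schemes. The sketch makes the following assumptions:

- The data are synthetic.
- There are three member models.
- The weights `[1, 2, 1]` are hypothetical "importance" values.

The manual `np.average` line confirms that soft voting is a weighted average of the members' predicted probabilities. Hard voting returns labels only and has no `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3_000, n_features=20, weights=[0.925], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

members = [
    ("logreg", LogisticRegression(max_iter=1_000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbt", GradientBoostingClassifier(random_state=0)),
]
weights = [1, 2, 1]  # hypothetical importance of each member

soft = VotingClassifier(members, voting="soft", weights=weights).fit(X_train, y_train)
hard = VotingClassifier(members, voting="hard").fit(X_train, y_train)

# Soft voting = weighted average of the fitted members' class probabilities
manual = np.average(
    [m.predict_proba(X_test) for m in soft.estimators_], axis=0, weights=weights
)
soft_auc = roc_auc_score(y_test, soft.predict_proba(X_test)[:, 1])
print(f"Soft-voting ROC-AUC: {soft_auc:.3f}")
print("Hard-voting labels for the first ten test rows:", hard.predict(X_test[:10]))
```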
 
Combining different models: Ensembles
 
A Soft Voting Ensemble Model
Ensembling brings no significant advantage in this sample, but there is indeed a small gain.
 
Conclusions - ML vs. the Old Approach
 
Which Patients Do the ML Models Identify?
 
Which Patients Do the ML Models Identify?
 
Comparisons – Old Approach and the Literature
 
The ML models are better than the old approach at predicting hospital admissions (but this is only one aspect/one goal of ECM)
Results are comparable to the best-performing results in the literature
-> Johns Hopkins Adjusted Clinical Groups (the leading proprietary risk stratification tool) - Haas et al.; Risk-Stratification Methods for Identifying Patients for Care Coordination; The American Journal of Managed Care (September 2013)
Predicting hospital admissions still remains a hard problem…
 
How to use ML techniques for ECM?
 
Use ML instead of the old approach, or combine them?
Both approaches have advantages and disadvantages:
Mainly interpretability vs. prediction performance
What is the objective of ECM?
How long is ECM enrollment going to last for a patient?
 
Use ML in addition to the old algorithm?
Implement ML predictions for purposes other than hospital admissions
Use dashboards as an opportunity to give GPs more frequent feedback
Move from retrospective feedback to forward-looking information sharing for better decision-making by care teams
 
Some More Observations
 
Improving the ML models: additional data on the social status and conditions of patients is key
Updating the models as new information becomes available (every 3, 6 or 12 months) can improve performance considerably
More trials and evaluations offer more chances for model improvement
 
Implementation: The analysis was carried out in Python. All code will be made available and can be adapted.
Data cleaning and preparation is a lengthy process…
Running the more advanced models takes time. The availability of multiple/scalable computing resources is key…
 
 
Other Potential ML Applications at EHIF
 
Predicting costs per patient and which patients are going to be the
high-cost patients in the next year
Predicting volumes of care services at different providers
Predicting which patients on a waiting list can benefit the most from a
given surgery (see above example)
Unsupervised machine learning: Identifying provider fraud and outlier
providers (in terms of their performance or their care provision)

Presentation Transcript


  1. MACHINE LEARNING FOR IMPROVED RISK STRATIFICATION OF NCD PATIENTS IN ESTONIA Big Data and Machine Learning in Health Care Big Data and Machine Learning in Health Care Marvin Ploetz Philip Docena Ojaswi Pandey Aakash Mohpal 23 April, 2019

  2. Objectives Propose an alternative - machine learning based - approach to patient risk stratification for ECM in Estonia Illustrate the use and applicability of machine learning to other areas of work relevant to EHIF

  3. Big Data and Machine Learning in Health Care Machine Learning Basics Context of ECM Research Question Data Overview & Sample Construction Feature Engineering Evaluation & Modelling Choices Results Conclusions Overview

  4. Big Data and Machine Learning in Health Care

  5. Big Data and Machine Learning in Health Care Big Data and Machine Learning in Health Care Vital signs Activity data Behavioral data Nutritional data EMR Clinical notes Medical images Genome data Take advantage of massive amounts of data and provide the right intervention to the right patient at the right time Patients Providers Personalized care to the patient Other Payers Potentially benefit all agents in the health care system: patient, provider, payer, management stakeholders Public health Pharma companies and drug discoveries Claims and billing Approvals and denials Population health and risk

  6. Uses of Machine Learning in Health Care Uses of Machine Learning in Health Care Improve care and efficiency, lower costs Personalized medicine Benefits Proactively prevent diseases Assist diagnostics Right patient Patients Big data and machine learning analytics in health care Right intervention Providers Improve clinical trials Predict disease risk Right time Payers Study population health Find cures for conditions

  7. Example 1: Hip and knee replacement in the US Example 1: Hip and knee replacement in the US Osteoarthritis: a common and painful chronic condition Often requires replacement of hip and knees More than 500,000 Medicare beneficiaries receive replacements each year Medical costs: roughly $15,000 per surgery Medical benefits: accrue over time, since some months after surgery is painful and spent in disability Therefore, a joint replacement only makes sense if you will live long enough to enjoy it. If you die soon after, could be futile and painful Prediction/classification problem: Can we predict which surgeries will be futile using only data available at the time of surgery?

  8. Example 1: Hip and knee replacement in the US Example 1: Hip and knee replacement in the US 3,305 independent variables Train data 65,395 observations 98,090 had hip or knee replacement in 2010 20% of 7.4 million beneficiaries Model to predict riskiest patients Test data 32,695 observations 1.4% died within one month of surgery 4.2% died within 1-12 months Traditional Analysis: About Averages Big Data and ML Analytics: Predict Individual Risks

  9. Example 1: Hip and knee replacement in the US Example 1: Hip and knee replacement in the US Predicted Mortality Percentile Observed mortality rate Futile procedures averted Futile spending ($ millions) 1 43.5% 1,984 30 2 42.2% 3,844 58 5 35.8% 8,061 121 10 24.2% 10,512 158 20 15.2% 12,317 185 30 13.6% 16,151 242 The first column sorts the test sample by risk percentiles. In the top 5th percentile riskiest population, the observed mortality rate within 1 year within 1-12 months was 43.5%. Reallocating these surgeries to those with median risk level (50th percentile) would have averted 1,984 futile procedures, and reallocated $30m to other beneficiaries.

  10. Example 2: Diagnoses of pediatric conditions Example 2: Diagnoses of pediatric conditions Apply natural language processing algorithms to extract data from EHRs Extract 101.6m data points from 1.3m EHRs of pediatric patients High diagnostic accuracy among multiple organ systems and comparable to performance of experienced pediatric physicians

  11. Example 2: Diagnoses of pediatric conditions Example 2: Diagnoses of pediatric conditions

  12. Example 3: Breast cancer screening Example 3: Breast cancer screening Most common form of cancer afflicting 2.5 million patients worldwide in 2015 Need to distinguish malignant tumors from benign ones Early detection is key Data: 62,219 mammography findings from the Wisconsin State Cancer Reporting System A Neural Network based algorithm does as well as radiologists in classifying the tumors

  13. Machine Learning Basics

  14. Definition of Big Data Definition of Big Data Collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications Volume Big Data Not only about size: finding insights from complex, noisy, heterogeneous, and longitudinal data sets This includes capturing, storing, searching, sharing and analyzing Variety Velocity

  15. Types of Machine Learning Problems Types of Machine Learning Problems 1) Supervised Making predictions using labeled/structured data Classification: use data to predict which category something falls into Examples: If an image contains a store front or not; If a patient is high risk or not Regression: use data to make predictions on a continuous scale Examples: Predict stock price of a company; given historical data, what will the temperature be tomorrow 2) Unsupervised Detecting patterns from unstructured data Problems where we have little or no idea what the results should look like Provide algorithms with data and ask to look for hidden features and cluster the data in a way it makes sense Examples: identify patterns from genomics data, separating voice from noise in audio files

  16. Machine Learning Implementation Machine Learning Implementation Standardize and clean data Build model using train data Split data in test/train Collect data Validate model results using test data Data Build Machine Learning Model Train data Model Results Data 80% Feature engineering / Data construction Data Test data 20% Data

  17. Assessing Model Performance: Precision and Recall Assessing Model Performance: Precision and Recall Actual Condition/Outcome True False Accuracy = (TP+TN)/All Precision = TP/(TP+FP) Recall = TP/(TP+FN) Condition/Outcome True True Positive (TP) False positive (FP) Predicted False False negative (FN) True negative (TN)

  18. Assessing Model Performance: Precision and Recall Assessing Model Performance: Precision and Recall Case I: High recall, low precision Case II: Low recall, high precision Actual Actual True False True False 100 TP 45 FP 90 TP 5 FP True True Predicted Predicted 5 80 TN 35 FN 100 TN False False FN Accuracy = 150/165 = 78% Precision = 100/145 = 69% Recall = 100/105 = 95% Accuracy = 190/230 = 83% Precision = 90/95 = 95% Recall = 90/125 = 72%

  19. Assessing Model Performance: ROC Curve Assessing Model Performance: ROC Curve Plot the true and false positive rate for every classification threshold A perfect model has a curve that passes through the upper left corner (AUC = 1) The diagonal (red line) represents random guessing (AUC = 0.5)

  20. Decision Tree: Playing Golf Decision Tree: Playing Golf A non-parametric supervised Outlook Temperature Humidity Windy Play Golf Rainy Hot High False No learning method used for Rainy Hot High True No classification and regression Overcast Hot High False Yes Built in the form a tree structure Sunny Mild High False Yes Sunny Cool Normal False Yes Breaks data down in smaller and Sunny Cool Normal True No smaller subsets while incrementally Overcast Cool Normal True Yes building tree Rainy Mild High False No Final result is tree with decision Rainy Cool Normal False Yes Sunny Mild Normal False Yes nodes and leaf nodes

  21. Decision Tree: Playing Golf Decision Tree: Playing Golf Outlook Rainy Overcast Sunny No Golf Golf Windy False True Play Golf No Golf

  22. Decision tree to Random Forest Decision tree to Random Forest A collection of decision trees whose results are aggregated into one final output Use different sub-samples of the data and different set of features Helps reduce overfitting, bias and variance

  23. Context of ECM

  24. A Big Challenge of the Estonian Healthcare System Changes in the demand for health care due to population ageing and rise of non-communicable diseases Chronic conditions as the driving force behind needs for better care integration Low coverage of preventive services and considerable share of avoidable specialist and hospital care Opportunity to improve management of specific patient groups at the PHC level -> care management for empaneled patients Prediction for which patients breaches in care coordination will occur -> risk-stratification of patients

  25. Risk Stratification Until Now
  1. DM / hypertension / hyperlipidemia? No -> not eligible; Yes -> step 2
  2. Min. and max. number/combination of CVD / respiratory / mental health / functional impairment? No -> not eligible; Yes -> step 3
  3. Dominant/complex condition (cancer, schizophrenia, rare disease etc.)? Yes -> not eligible; No -> step 4
  4. Review by GPs (behavioral & social factors, information not in data)? No -> not eligible; Yes -> ECM candidate
  - No actual prediction analysis done
  - Involvement of providers to gain trust/understanding
  - Behavioral and social criteria are key, but sparsely available -> use insider knowledge of doctors
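
  The current approach is a deterministic filter rather than a predictive model: there is no risk score, only yes/no gates. The sketch below encodes the flowchart under stated assumptions: the "number/combination" rule is simplified to a count, the bounds `min_conditions` and `max_conditions` are hypothetical placeholders (not the EHIF thresholds), and the dominant-condition step is read as excluding such patients. The `Patient` fields are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient summary; field names are illustrative."""
    has_dm_htn_or_hyperlipidemia: bool  # DM, hypertension or hyperlipidemia
    n_target_conditions: int            # count of CVD, respiratory, mental health, functional impairment
    has_dominant_condition: bool        # e.g. cancer, schizophrenia, rare disease
    approved_by_gp: bool                # outcome of the GP review (information not in the data)


def ecm_candidate(p: Patient, min_conditions: int = 1, max_conditions: int = 4) -> bool:
    """Rule-based eligibility; min/max bounds are placeholder assumptions."""
    if not p.has_dm_htn_or_hyperlipidemia:
        return False
    if not (min_conditions <= p.n_target_conditions <= max_conditions):
        return False
    if p.has_dominant_condition:
        return False
    return p.approved_by_gp


print(ecm_candidate(Patient(True, 2, False, True)))  # True
```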

  26. Enhanced Care Management So Far in Estonia
  - Successful enhanced care management pilot with 15 GPs and fewer than 1,000 patients to assess the feasibility and acceptability of enhanced care management
  - Commitment of the Estonian Health Insurance Fund (EHIF) to scale up the care management pilot
  - Model for risk stratification: clinical algorithm + provider intuition
  - Need for a better risk-stratification approach!?

  27. Research Question

  28. The Prediction Problem
  - Target patients: who benefits from care management? A combination of disease, social and behavioral factors
  - Objective of ECM: ultimately, improve health outcomes for patients with cardiovascular, respiratory, and mental disease
  - What is the right proxy prediction variable in the data? There is no single relevant adverse event (e.g. death, hospital admission, health complication, high healthcare spending)
  - After some discussion on how to choose the dependent variable -> unplanned hospital admissions: they have a large negative impact on patients' lives, are costly and relatively frequent, and some are avoidable

  29. Patients with an Admission in 2011: Subsequent Hospitalization Rates
  Many patients repeatedly have hospitalizations
  Percentage of patients hospitalized in 2011 who were hospitalized again:
    One year later      22.9
    Two years later     20.2
    Three years later   18.8
    Four years later    17.7
    Five years later    16.3
    Six years later     13.6
    Seven years later    9.3
  22.9 percent of patients hospitalized in 2011 were hospitalized again in the following year

  30. Average Costs (in €) in Different Types of Care in 2016
  Hospitalizations account for the bulk of healthcare costs
    Inpatient Care                    167.63
    Outpatient Care                   148.17
    Day Care                           37.72
    PHC                                22.53
    Inpatient Nursing Care             10.01
    Outpatient Rehabilitation Care      6.41
    Inpatient Rehabilitation Care       6.01
    Outpatient Nursing Care             4.94
  Series: ML Sample (N=712,104); General Population (N=1,260,630)

  31. Predicting Hospital Admissions
  - Hospital admissions are the main (avoidable) adverse health event
  - But predicting hospitalizations is a hard problem
  - Social factors matter a lot; some patients have many contacts with the healthcare system, others none at all
  - Trade-off in choosing which hospitalizations to predict: admissions due to specific conditions vs. hospitalizations in general

  32. Predicting Hospital Admissions
  Hospital admissions excluded:
    ICD-10     Chapter Title
    A00-B99    Certain infectious and parasitic diseases
    C00-D48    Neoplasms
    O00-O99    Pregnancy, childbirth and the puerperium
    P00-P96    Certain conditions originating in the perinatal period
    S00-T98    Injury, poisoning
    V01-X59    Accidents
  Key question
  - Not: What is the best algorithm for predicting hospital admissions?
  - But: How can we obtain the most useful prediction of hospital admissions for a specific purpose?
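
  The exclusion list translates into a simple filter on the admission's ICD-10 code. This is a sketch assuming each admission carries one diagnosis code (e.g. the main diagnosis) in the usual letter-plus-digits form; comparing the three-character category as a string works because ICD-10 categories sort lexicographically. The function names are illustrative.

```python
# Excluded ICD-10 ranges from this slide (inclusive, compared on 3-character categories)
EXCLUDED_RANGES = [
    ("A00", "B99"),  # Certain infectious and parasitic diseases
    ("C00", "D48"),  # Neoplasms
    ("O00", "O99"),  # Pregnancy, childbirth and the puerperium
    ("P00", "P96"),  # Certain conditions originating in the perinatal period
    ("S00", "T98"),  # Injury, poisoning
    ("V01", "X59"),  # Accidents
]


def is_excluded(icd10_code: str) -> bool:
    # Normalize "c50.9" -> "C509" and keep the category, e.g. "C50"
    category = icd10_code.strip().upper().replace(".", "")[:3]
    return any(lo <= category <= hi for lo, hi in EXCLUDED_RANGES)


def count_target_admissions(admission_codes) -> int:
    """Number of admissions that remain after applying the exclusions."""
    return sum(not is_excluded(code) for code in admission_codes)


print(count_target_admissions(["I50.0", "C50.9", "J44.1", "S72.0"]))  # 2
```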

  33. Data Overview & Sample Construction

  34. Administrative Claims Data (in Estonia)
  - Very reliable
  - High-quality data available as of 2007/2008
  - Comprehensive coding requirements for providers
  - Reporting lag of the data is 2 weeks on average
  - No information on clinical outcomes (e.g. test results)
  - Limited information on social conditions and behavioral characteristics
  - A lot of feature engineering is needed to create meaningful variables at the patient level

  35. Description of Available Data
  - Administrative: Beneficiary, Family Doctor, Patient-Year Level
  - Types of care: 1. Day Care, 2. Inpatient Care, 3. Inpatient Nursing Care, 4. Inpatient Rehabilitation Care, 5. Outpatient Care, 6. Outpatient Nursing Care, 7. Outpatient Rehabilitation Care, 8. Primary Health Care
  - Utilization, Diagnosis, Procedures (Surgical and Other)
  - Medications: Prescriptions and Filling of Prescriptions

  36. Patient Cohort Selection for the ML Analysis

  37. Characteristics of Patients in the ML Sample vs. Total Population
  Charts: age distribution (percentage of population / sample by age group, 18-24 to 85+) and gender distribution (% of women), General Pop. vs. ML Sample
  Relative to the population, the ML sample is older and more likely to be female.

  38. Characteristics of Patients in the ML Sample vs. Total Population
  Chart: average costs (in €) for prescriptions prescribed to patients in 2016, total price of prescriptions and patients' share of the total price; series: ML Sample (N=712,104) and ML Sample conditional on patients being hospitalized at least once in 2016; values shown: 375.99, 180.21, 141.58, 75.01

  Insurance type (%)   General Population   ML Sample
  1 = General          49.39                43.63
  2 = Unemployed        2.18                 2.04
  3 = Pensioner        28.94                37.59
  4 = Disabled          9.67                12.51
  5 = Welfare           0                    0.00
  6 = Widow             0.28                 0.07
  Uninsured             9.54                 4.15

  39. Top-20 Chronic Conditions
  The ML sample population is also sicker on average (i.e. the prevalence of chronic conditions is higher)
  Percentage of patients with condition (most common chronic conditions):
    Hypertension                      48.1
    Joint Arthrosis                   31.1
    Hyperlipidemia                    26.7
    Chronic Gastritis/GERD            25.0
    Congestive Heart Failure          15.7
    Neuropathies                      15.2
    Thyroid Diseases                  14.5
    Mood Disorders                    14.3
    Ischemic Heart Disease            13.8
    Cardiac Arrhythmias               13.0
    Dizziness                         11.4
    Obesity                           10.9
    Diabetes Mellitus                 10.8
    Anemia                            10.2
    Migraine                           9.75
    Hemorrhoids                        9.59
    Vision and Hearing Impairments     8.87
    COPD                               8.53
    Stroke                             8.41
    Asthma                             8.10
  Series: General Population (N=1,260,630); ML Sample (N=712,104)

  40. Characteristics of Patients in the ML Sample vs. Total Population
  Percentage of people living in a given county (%)
    County       Poverty rate (%)    General Population   ML Sample
    Harju        1 = 12.6            43.15                43.23
    Saare        2 = 12.6-15.8        2.65                 2.53
    Tartu        3 = 15.8-17.63      11.22                10.86
    Järva        3 = 15.8-17.63       2.39                 2.4
    Rapla        4 = 17.63-18.3       2.55                 2.45
    Pärnu        5 = 18.3-21.7        6.63                 6.53
    Lääne        5 = 18.3-21.7        1.62                 1.61
    Viljandi     5 = 18.3-21.7        3.75                 3.7
    Hiiu         5 = 18.3-21.7        0.78                 0.75
    Lääne-Viru   5 = 18.3-21.7        4.65                 4.61
    Jõgeva       6 = 21.7-24.7        2.38                 2.34
    Põlva        6 = 21.7-24.7        2.07                 2.06
    Võru         7 = 24.7-25.1        2.85                 2.8
    Ida-Viru     8 = 25.1-26.9       11.05                11.88
    Valga        8 = 25.1-26.9        2.25                 2.24

  41. Feature Selection & Engineering

  42. Feature Selection & Engineering
  - Series of attempts with interim feature sets to improve performance
  - Final set: 141 features

  43. Features Used
  Feature category 1: Healthcare utilization
  - Total number of hospital admissions
    - Inpatient admissions
    - Inpatient nursing admissions
    - Inpatient rehab admissions
  - Total number of hospital stay days
    - Stay days in inpatient care
    - Stay days in inpatient nursing care
    - Stay days in inpatient rehabilitation care
  - Total number of PHC visits
  - Total number of specialist visits
    - Outpatient specialist visits
    - Outpatient rehabilitation specialist visits
  - Total number of surgeries
    - Emergency surgeries
    - Surgeries that took between 1 and 3 hours
    - Respiratory surgeries
  - Whether lab tests were done
    - Cholesterol fractions
    - Cholesterol
    - Glucose
  - Total number of prescriptions
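
  Most utilization features are counts and sums rolled up from claim-level records to one row per patient. The sketch below shows that roll-up for admissions and stay days, assuming a hypothetical `InpatientClaim` layout (the actual EHIF claims schema is not shown in this deck) and claims already restricted to the feature period.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class InpatientClaim:
    """Hypothetical claim record; the real claims layout may differ."""
    patient_id: str
    care_type: str   # e.g. "inpatient", "inpatient_nursing", "inpatient_rehab"
    stay_days: int


def utilization_features(claims):
    """Roll claim-level records up to one feature dictionary per patient."""
    features = defaultdict(lambda: defaultdict(int))
    for claim in claims:
        row = features[claim.patient_id]
        row["admissions_total"] += 1                       # total number of hospital admissions
        row[f"admissions_{claim.care_type}"] += 1          # admissions by type of care
        row["stay_days_total"] += claim.stay_days          # total number of hospital stay days
        row[f"stay_days_{claim.care_type}"] += claim.stay_days
    return {pid: dict(row) for pid, row in features.items()}


claims = [
    InpatientClaim("p1", "inpatient", 5),
    InpatientClaim("p1", "inpatient_rehab", 10),
    InpatientClaim("p2", "inpatient", 2),
]
print(utilization_features(claims))
```

  Patients without any claims do not appear in the output, and a key is absent when its count is zero; downstream, missing values would be filled with 0 for everyone in the cohort.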

  44. Features Used
  Feature category 2: Health status
  - Total number of major chronic conditions
  - Any of: joint arthrosis, chronic gastritis
  - Whether a patient had a major chronic condition: hypertension, diabetes mellitus, hyperlipidemia, COPD, asthma, dementia, vision and hearing impairments
  - Prescriptions: diabetic agents, diuretics, NSAIDs, anticoagulants, antiplatelets, antihypertensives, antidepressants, narcotics
  - Total price of prescriptions (in euros)
  - Total out-of-pocket expenditures on prescriptions by patients (in euros)

  45. Features Used
  Feature category 3: Patient behavior
  - % of prescriptions picked up by patients
  Feature category 4: Socioeconomic status
  - Insurance status: 1 = General, 2 = Unemployed, 3 = Pensioner, 4 = Disabled, 5 = Welfare, 6 = Widow, Uninsured
  - Poverty rates at the county level: 1 = 12.6, 2 = 12.6-15.8, 3 = 15.8-17.63, 4 = 17.63-18.3, 5 = 18.3-21.7, 6 = 21.7-24.7, 7 = 24.7-25.1, 8 = 25.1-26.9
  Feature category 5: Quality of care received
  - Total number of family doctors utilized across time
  - Admission rates of GPs, standardized by age and gender of patients in the patient list
  - Compliance with diabetes guidelines by PHC doctor: all tests done / no tests done
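
  Two of these features are simple transformations: binning the county poverty rate into the eight categories listed on this slide, and the share of prescriptions picked up. A minimal sketch, assuming each category includes its upper bound (so 12.6 maps to category 1 and 15.8 to category 2) and that patients without prescriptions get no pick-up value; both conventions are assumptions, not taken from the slides.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds (%) of the eight county poverty-rate categories
POVERTY_UPPER_BOUNDS = [12.6, 15.8, 17.63, 18.3, 21.7, 24.7, 25.1, 26.9]


def poverty_category(rate: float) -> int:
    """Category 1-8 for a county poverty rate; upper bounds are treated as inclusive."""
    idx = bisect_left(POVERTY_UPPER_BOUNDS, rate)
    if idx == len(POVERTY_UPPER_BOUNDS):
        raise ValueError(f"poverty rate {rate} is above the highest category")
    return idx + 1


def prescription_pickup_pct(n_prescribed: int, n_picked_up: int) -> Optional[float]:
    """% of prescriptions picked up; None when nothing was prescribed."""
    if n_prescribed == 0:
        return None
    return 100 * n_picked_up / n_prescribed


print(poverty_category(17.0), prescription_pickup_pct(4, 3))  # 3 75.0
```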

  46. Getting to Know the Data: Diagnoses and Admissions
  Charts: single diagnoses (DGN); pairs of diagnoses
  - Afib (atrial fibrillation and flutter), Chf (congestive heart failure), Htn (hypertension), and Ischemic Htd (ischemic heart disease) are strong indicators of potential admissions in the following year (2017)
  - Patient groups with these conditions have a non-trivial (~10%) likelihood of hospital admission
  - This likelihood increases to ~20%-30% with one hospital admission in 2016 and to >50% with 3 or more admissions in 2016
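
  The conditional rates on this slide are grouped averages: within a diagnosis group, the share of patients admitted in 2017 for each number of 2016 admissions. A sketch with a made-up toy input (not study data), assuming one `(admissions_2016, admitted_2017)` pair per patient and the buckets 0, 1, 2 and 3+.

```python
from collections import defaultdict


def admission_rate_by_prior(patients):
    """patients: iterable of (n_admissions_2016, admitted_2017) pairs.
    Returns the share admitted in 2017 per bucket of 2016 admissions (0, 1, 2, 3+)."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [admitted, total]
    for n_prior, admitted in patients:
        bucket = "3+" if n_prior >= 3 else str(n_prior)
        counts[bucket][0] += int(admitted)
        counts[bucket][1] += 1
    return {bucket: admitted / total for bucket, (admitted, total) in counts.items()}


# Toy input for illustration only
toy = [(0, False), (0, True), (1, True), (1, False), (3, True), (5, True)]
print(admission_rate_by_prior(toy))  # {'0': 0.5, '1': 0.5, '3+': 1.0}
```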

  47. Evaluation & Modelling Choices

  48. ML Models Selected for Evaluation
  Selection criteria:
  - Algorithms are readily available in easy-to-use, comprehensive and well-tested open-source Python libraries (scikit-learn)
  - Algorithms and results are relatively easy to describe/explain (common algorithms)
  - For interpretability and model familiarity, no attempt at exploring more complex models; no deep networks
  Included in comparison:
  - Decision Tree
  - Random Forest and Extremely Randomized Trees (ExtraTrees)
  - k-Nearest Neighbors*
  - Gaussian Naïve Bayes**
  - Logistic Regression (L1, L2)
  - SVM (RBF, polynomial)***
  - Neural networks: multi-layer perceptrons (1 hidden layer)
  - AdaBoost (Decision Tree and Random Forest)
  - Gradient Boosted Trees (scikit-learn GBT, not XGBoost)
  - Calibrated (isotonic) variations of the above classifiers
  Eventually excluded: *kNN for execution time and memory requirements, **Naïve Bayes for weak performance, and ***SVMs for very slow training (but considered for the final paper)

  49. Evaluation Metrics
  - Variable to be predicted: yes/no hospital admission in 2017, using data from 2011-2016
  - We deal with an unbalanced sample (7.5% of patients had an admission in 2017)
  - Appropriate metrics of model performance on an unbalanced dataset: precision, recall, ROC curve and area under the curve (AUC)
  - (Problem-specific custom metric that penalizes one type of error more heavily: cost of a false positive, i.e. cost of ECM, vs. cost of a missed positive, i.e. cost of a subsequent hospitalization)
  - Different ML models have different strengths, but the differences should not be large
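
  Two points on this slide can be made concrete. First, at 7.5% prevalence a model that never predicts an admission is 92.5% accurate while catching no one, which is why accuracy is not used. Second, a cost-based metric picks the decision threshold that minimizes total cost. In the sketch below the unit costs `cost_fp` (one ECM enrollment) and `cost_fn` (one missed admission) and the scores are hypothetical values for illustration.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost when every patient with score >= threshold is enrolled in ECM."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += cost_fp   # false positive: ECM offered to a patient who is not admitted
        elif not flagged and label == 1:
            cost += cost_fn   # false negative: the admission is missed
    return cost


def best_threshold(y_true, scores, cost_fp, cost_fn):
    # Every distinct score is a candidate cut-off; infinity means "flag nobody"
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn))


# Accuracy is misleading at 7.5% prevalence: predict "no admission" for everyone
labels = [1] * 75 + [0] * 925
accuracy = sum(label == 0 for label in labels) / len(labels)
print(accuracy)  # 0.925, yet recall is 0 because no admission is caught

y = [1, 0, 1, 0, 0, 0]
risk = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
print(best_threshold(y, risk, cost_fp=1.0, cost_fn=10.0))  # 0.6: misses are expensive
print(best_threshold(y, risk, cost_fp=1.0, cost_fn=0.5))   # 0.9: ECM is the costlier error
```

  Changing the relative costs moves the optimal threshold, which is exactly the precision/recall trade-off expressed in money rather than in rates.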

  50. Intuitive Interpretation of Metrics
  - Precision is the probability that a patient whom an algorithm classifies as having a hospital admission will actually have one.
  - Recall is the probability that a patient who is going to have a hospital admission is classified as such by the algorithm.
  - Which one is more important? It depends a lot on the application.
  - There is a trade-off: raising one of them typically lowers the other.
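
  The two verbal definitions are conditional probabilities, and Bayes' rule links them through the prevalence \pi = P(Y = 1) and the false positive rate. This explains why precision tends to be low for a rare outcome even when recall is decent. In the worked line, \pi = 0.075 is the 7.5% admission rate reported for the sample, while recall = 0.5 and FPR = 0.1 are hypothetical values chosen for illustration.

```latex
\begin{aligned}
\text{Precision} &= P(Y = 1 \mid \hat{Y} = 1) = \frac{TP}{TP + FP}, \qquad
\text{Recall} = P(\hat{Y} = 1 \mid Y = 1) = \frac{TP}{TP + FN},\\[4pt]
\text{Precision} &= \frac{\text{Recall}\cdot\pi}{\text{Recall}\cdot\pi + \text{FPR}\cdot(1-\pi)},
\qquad \text{FPR} = P(\hat{Y} = 1 \mid Y = 0),\\[4pt]
\pi = 0.075,\ \text{Recall} = 0.5,\ \text{FPR} = 0.1
&\;\Rightarrow\; \text{Precision} = \frac{0.0375}{0.0375 + 0.0925} \approx 0.29.
\end{aligned}
```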
