ROC Curves in Multiclass Classification

ROC curves extended to multiclass classification, and how they do or do not map to the binary case
Mario Inchiosa
Microsoft AI Platform
BARUG “ROC Day”
November 10, 2020
Binary, Multiclass, Multilabel
Binary classification
Malignant vs. benign tumor
Dog vs. cat
Multiclass classification
Stage 1, 2, 3, or 4 cancer
Dog, cat, monkey, squirrel
Multilabel classification
#woman, #pool
#dog, #boy, #ball, #beach, #sky
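As a concrete sketch of how the three target encodings differ (labels below are made up for illustration), using scikit-learn's MultiLabelBinarizer for the multilabel case:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: one label per sample, two possible values (e.g. benign=0, malignant=1)
y_binary = np.array([0, 1, 1, 0])

# Multiclass: one label per sample, more than two possible values
# (e.g. dog=0, cat=1, monkey=2, squirrel=3)
y_multiclass = np.array([2, 0, 3, 1])

# Multilabel: any number of tags per sample, usually encoded as an
# indicator matrix with one column per tag
tags = [{"dog", "boy", "ball"}, {"woman", "pool"}, {"beach", "sky"}]
y_multilabel = MultiLabelBinarizer().fit_transform(tags)
print(y_multilabel)  # rows = samples, columns = tags, entries in {0, 1}
```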
Binary Classification ROC
TPR(t)
Hit rate
Probability of Detection
Probability that a randomly drawn Positive case will be classified Positive when the threshold is set to t
FPR(t)
Fall-out
Probability of False Alarm
Probability of Type I error
Probability that a randomly drawn Negative case will be classified Positive when the threshold is set to t
The ROC curve plots TPR(t) vs. FPR(t) as the threshold t varies
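A minimal sketch of these definitions with scikit-learn's roc_curve, using made-up labels and scores; each row pairs a threshold t with the resulting FPR(t) and TPR(t):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy binary problem: true labels and predicted scores (illustrative values)
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

# roc_curve sweeps the threshold t and returns the FPR(t), TPR(t) pairs
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(np.column_stack([thresholds, fpr, tpr]))

# The ROC curve is the plot of TPR(t) vs. FPR(t); AUC summarizes it
print("AUC:", roc_auc_score(y_true, y_score))
```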
Multiclass Classification ROC – One vs. Rest
TPR_i(t)
Probability that a randomly drawn Class i case will be classified as Class i when the threshold is set to t
FPR_i(t)
Probability that a randomly drawn non-Class i case will be classified as Class i when the threshold is set to t
Macro average: average TPR_i(t) and FPR_i(t) over all classes i
Weighted average: same as Macro, but weighted by class prevalence
Micro average: average over sample-class pairs
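A sketch of the three averages on the iris data, assuming scikit-learn: macro and weighted come directly from roc_auc_score with multi_class="ovr", while the micro average is obtained by pooling all sample-class pairs after binarizing the labels:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

# One-vs-rest AUC, averaged over classes i with and without prevalence weights
print("macro   :", roc_auc_score(y_te, proba, multi_class="ovr", average="macro"))
print("weighted:", roc_auc_score(y_te, proba, multi_class="ovr", average="weighted"))

# Micro average: pool all (sample, class) pairs by binarizing the labels
y_onehot = label_binarize(y_te, classes=[0, 1, 2])
print("micro   :", roc_auc_score(y_onehot.ravel(), proba.ravel()))
```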
Pros and Cons – One vs. Rest
Macro average
In problems where infrequent classes are important (e.g. when all classes are equally important), macro-averaging highlights their performance
However, if infrequent classes are not so important, macro-averaging will over-emphasize the typically low performance on an infrequent class
Good when infrequent classes are important
Weighted average
Infrequent classes will make very little contribution to the weighted average
Good when infrequent classes are not so important
Micro average
“Macro-averaging gives equal weight to each class, whereas micro-averaging gives equal weight to each per-document classification decision” [Van Asch]
The above metrics are not independent of class prevalence
They don’t reduce to the Binary Classification ROC in the 2-class case
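Illustrative, made-up per-class AUCs and prevalences showing the trade-off: with a rare, poorly separated class, the macro average drops while the weighted average barely moves:

```python
import numpy as np

# Hypothetical per-class one-vs-rest AUCs and class prevalences
# (class 2 is rare and poorly separated)
auc_per_class = np.array([0.95, 0.92, 0.70])
prevalence    = np.array([0.48, 0.48, 0.04])

macro    = auc_per_class.mean()                            # every class counts equally
weighted = np.average(auc_per_class, weights=prevalence)   # rare class barely matters
print(f"macro={macro:.3f}  weighted={weighted:.3f}")
# macro=0.857  weighted=0.926  -> the weighted average hides the weak rare class
```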
Multiclass Classification AUC – One vs. One
AUC(j|k) is the probability that a randomly drawn member of class k will have a lower estimated probability of belonging to class j than a randomly drawn member of class j
Independent of costs and prevalences (priors) 
Reduces to binary classification AUC when number of classes c=2 
Macro averaged:
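Written out, the pairwise estimates Â(j|k) are symmetrized and macro-averaged over all c(c-1)/2 unordered class pairs, giving Hand & Till's M measure:

```latex
\hat{A}(j,k) = \frac{\hat{A}(j \mid k) + \hat{A}(k \mid j)}{2},
\qquad
M = \frac{2}{c(c-1)} \sum_{j<k} \hat{A}(j,k)
```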
Hand, D.J. and Till, R.J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2), pp. 171-186.
Multiclass ROC Implementations
CRAN package: multiROC
One vs. rest
CRAN package: HandTill2001
One vs. one
Python Scikit-learn
One vs. rest
One vs. one
Azure Machine Learning
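For example, scikit-learn's roc_auc_score exposes both strategies through the multi_class argument (its one-vs-one option is the reference implementation of the Hand & Till approach cited above); a small sketch, reusing the same iris setup as the earlier one-vs-rest example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

# One vs. one (Hand & Till): macro average over all class pairs,
# insensitive to class prevalence
print("ovo macro:", roc_auc_score(y_te, proba, multi_class="ovo", average="macro"))

# One vs. rest, for comparison
print("ovr macro:", roc_auc_score(y_te, proba, multi_class="ovr", average="macro"))
```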
References
Wikipedia
o https://en.wikipedia.org/wiki/Receiver_operating_characteristic
o https://en.wikipedia.org/wiki/Multiclass_classification
Scikit-learn
o https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score
o https://scikit-learn.org/stable/modules/model_evaluation.html#roc-metrics
o https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
o Reference for one-vs-one: Hand, D.J. and Till, R.J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2), pp. 171-186.
R’s "HandTill2001" package for Hand & Till’s “M” measure that extends AUC to multiclass using One vs. One
o https://cran.r-project.org/web/packages/HandTill2001/
R’s multiROC package
o https://cran.r-project.org/web/packages/multiROC/
