Advanced Data Analysis Techniques for Imbalanced Multi-Class Classification


The SAMME.C2 algorithm addresses severely imbalanced multi-class classification problems by combining boosting techniques (AdaBoost/SAMME) with cost-sensitive learning. Through numerical experiments and performance statistics, the algorithm exhibits a trade-off: it sacrifices accuracy on majority classes in order to classify minority classes correctly and improve overall predictive power.


Uploaded on Sep 17, 2024



Presentation Transcript


  1. Advanced Data Analysis Techniques: The SAMME.C2 algorithm for severely imbalanced multi-class classification. João Silva, student no. 2017255149

  2. Introduction. Many machine learning classification problems involve imbalanced class distributions: minority classes have far fewer observations to learn from than majority classes, yet the minority classes are often the most interesting ones. The authors present a multi-class classification algorithm, SAMME.C2, specialized to handle severely imbalanced classes.

  3. Building blocks:
  - Boosting: a powerful technique that trains a sequence of weak models and combines them into a strong learner to improve predictive power.
  - AdaBoost: a class of adaptive boosting algorithms, developed primarily for classification.
  - SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function): extends AdaBoost to the multi-class setting directly, avoiding the computational inefficiency of decomposing the task into multiple binary problems.
  - Cost-sensitive learning: to further improve prediction under class imbalance, misclassification costs can be incorporated into the learning procedure. The cost values, estimated as hyperparameters, are additional inputs that attach larger penalties to the most damaging errors.
  - Ada.C2: a cost-sensitive variant of AdaBoost for binary classification.
  - SAMME.C2: combines SAMME's multi-class boosting with Ada.C2's cost-sensitive weighting.
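To make the combination concrete, here is a minimal sketch of a SAMME boosting loop with Ada.C2-style per-class cost weights. This is a simplified illustration of the idea behind SAMME.C2, not the authors' implementation: the function names, the depth-1 tree as weak learner, and the `cost` hyperparameter vector are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_c2_fit(X, y, cost, n_rounds=20):
    """Cost-sensitive SAMME sketch. `cost[k]` is the (hypothetical)
    misclassification cost attached to class k; y must be 0..K-1."""
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)          # example weights, uniform at the start
    c = cost[np.asarray(y)]          # per-example cost from its class
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (stump.predict(X) != y).astype(float)
        err = max(np.dot(w, miss) / w.sum(), 1e-10)
        if err >= 1 - 1 / K:         # weak learner no better than chance
            break
        # SAMME's multi-class alpha: the log(K-1) term keeps alpha > 0
        # whenever the learner beats random guessing.
        alpha = np.log((1 - err) / err) + np.log(K - 1)
        # Ada.C2-style update: the cost multiplies the reweighting, so
        # mistakes on costly (minority) classes grow weight faster.
        w = c * w * np.exp(alpha * miss)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def samme_predict(X, learners, alphas, K):
    """Weighted vote over the boosted weak learners."""
    votes = np.zeros((len(X), K))
    for stump, alpha in zip(learners, alphas):
        votes[np.arange(len(X)), stump.predict(X)] += alpha
    return votes.argmax(axis=1)
```

Setting all costs equal recovers plain SAMME; raising the minority-class cost shifts the ensemble's attention toward it, which is exactly the trade-off the slides discuss.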

  4. Numerical experiments. Simulated dataset:
  - Number of samples: 100,000
  - Number of features: 50
  - Number of classes: 3
  - Class ratios: 90%, 9%, 1%
  - class_sep = 1: high classification difficulty
  - class_sep = 1.5: medium classification difficulty
  - class_sep = 2: low classification difficulty
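The setup above maps directly onto scikit-learn's `make_classification`, whose `weights` and `class_sep` parameters match the slide's ratios and difficulty levels. The slides do not say which generator the authors used, so this is an assumed reconstruction; in particular, the `n_informative` split is a guess.

```python
from collections import Counter

from sklearn.datasets import make_classification

# 100,000 samples, 50 features, 3 classes with ~90% / 9% / 1% ratios.
# class_sep controls how far apart the class clusters lie:
# 1.0 -> high difficulty, 1.5 -> medium, 2.0 -> low.
X, y = make_classification(
    n_samples=100_000,
    n_features=50,
    n_informative=10,           # assumed number of informative features
    n_classes=3,
    weights=[0.90, 0.09, 0.01],
    class_sep=1.0,
    random_state=0,
)

print(Counter(y))  # roughly 90k / 9k / 1k, up to label noise
```

Note that `make_classification` flips a small fraction of labels by default (`flip_y=0.01`), so the realized class counts deviate slightly from the nominal ratios.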

  5. Performance statistics:
  - Accuracy: the proportion of all observations correctly classified. Largely uninformative for imbalanced datasets!
  - Recall: sometimes called sensitivity; for class i, the recall r_i is the proportion of observations in class i that are correctly classified. A far more meaningful measure for imbalanced classification!
  The training objective of the SAMME algorithm is to reduce the test error rate. As the difficulty of the classification task increases, SAMME.C2 must correspondingly reduce accuracy on the majority class in order to correctly classify observations in the minority class. This is an important result: when minority-class observations in severely imbalanced datasets are extremely difficult to classify, SAMME assigns nearly all observations to the majority class. To achieve higher recall on the minority class, SAMME.C2 has to sacrifice accuracy on the majority class.
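A small sketch makes the accuracy-versus-recall point concrete: on the slide's 90% / 9% / 1% mix, a degenerate classifier that always predicts the majority class looks excellent by accuracy yet has zero recall on both minority classes. The labels below are hypothetical, scaled down to 100 observations.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels mirroring a 90% / 9% / 1% class mix.
y_true = np.array([0] * 90 + [1] * 9 + [2] * 1)
y_pred = np.zeros_like(y_true)   # always predict the majority class 0

print(accuracy_score(y_true, y_pred))              # 0.9
print(recall_score(y_true, y_pred, average=None))  # [1. 0. 0.]
```

Per-class recall (r_0, r_1, r_2) = (1.0, 0.0, 0.0) exposes what the 90% accuracy hides, which is why the slides call recall the more interesting measure for imbalanced classification.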
