Robust Decision Tree Induction from Unreliable Data Sources - STAIRS 2020 Presentation


This presentation introduces a study on decision tree learning with missing data and proposes Expected Information Gain as a way to increase robustness. It covers background concepts and related work, and evaluates the approach on several datasets against several missing-value strategies. Presented at STAIRS 2020, it addresses how to handle unreliable data sources in decision tree induction.


Uploaded on Aug 14, 2024 | 0 Views



Presentation Transcript


  1. Robust Decision Tree Induction from Unreliable Data Sources. Christian Schreckenberger, Christian Bartelt, Heiner Stuckenschmidt. STAIRS 2020, 29-30 August 2020.

  2. Outline: Introduction; Background; Related Work; Expected Information Gain; Evaluation; Conclusion.

  3. Introduction. Missing data is a well-established and well-studied problem (intralearning approaches, preprocessing approaches). Our focus is on decision tree learning. Proposition: Expected Information Gain, which takes source reliability into account; sources with low reliability in the past will have low reliability in the future. The goal: increase robustness.

  4. Background. Missing Data: MCAR, MAR, MNAR; Multiple Imputation: kNN Imputation. Decision Tree: divide and conquer; split the dataset until a stopping criterion is met. C4.5 and Missing Data: calculation of the criterion; propagation of samples during training; propagation of samples during prediction.
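
The "calculation of criterion" point can be illustrated with a minimal Python sketch of the way C4.5 is commonly described as handling missing values: the gain is computed on the samples whose feature value is known and then scaled by the fraction of known values. The function names and the use of `None` for a missing value are assumptions made for illustration, and the sketch covers categorical features only; it is not code from the presentation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def c45_style_gain(values, labels):
    """Information gain of a categorical feature with missing values.

    `values` may contain None for missing entries (an assumed encoding).
    The gain is computed on the known samples only and then multiplied
    by the fraction of samples whose value is known.
    """
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    known_labels = [y for _, y in known]

    # Group the class labels by feature value.
    groups = {}
    for v, y in known:
        groups.setdefault(v, []).append(y)

    # Weighted entropy of the partitions induced by the feature.
    remainder = sum(len(g) / len(known) * entropy(g) for g in groups.values())

    known_fraction = len(known) / len(values)
    return known_fraction * (entropy(known_labels) - remainder)
```

A feature that separates the classes perfectly but is missing in half of the samples therefore receives half the gain it would have with complete data.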

  5. Related Work: propagate samples with MV down both branches [Fri76]; impute the most common value on the fly [CN89]; Lazy Decision Trees [FKY96]; surrogate splits [BFOS84]; Branch Exclusive Splits [BR18]; MIA [TJH08].

  6. Expected Information Gain. Information Gain: (formula not included in the transcript). Expected Information Gain: (formula not included in the transcript).
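
For reference, the standard textbook definition of information gain for a sample set S and a feature A can be written as below. The notation (S, A, C, S_v, p_c) is an assumption and is not taken from the slide. The Expected Information Gain formula is specific to the authors' work and should be taken from the original slides or paper.

```latex
H(S) = -\sum_{c \in C} p_c \log_2 p_c,
\qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

Here p_c is the fraction of samples in S that belong to class c, and S_v is the subset of S in which feature A takes the value v.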

  7. Learning with Expected IG

  8. Evaluation Setting. Datasets: six UCI ML Repository datasets and one synthetic dataset. Baselines: C4.5 MV strategy; mean imputation with C4.5; kNN imputation with C4.5. 5-fold cross-validation. Training data always has MV. Three scenarios for test data. The amount of missing data ranges from 5% to 95% in steps of 5%.
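
The sweep over missing-data rates and the 5-fold split could be set up roughly as in the Python sketch below. The slide does not say how values were removed, so the sketch assumes each cell is dropped independently with the given probability (an MCAR-style mechanism). The function names and the `None` encoding are hypothetical.

```python
import random


def missing_rates():
    """Missing-data rates 5%, 10%, ..., 95%, matching the range on the slide."""
    return [p / 100 for p in range(5, 100, 5)]


def inject_missing(rows, rate, rng):
    """Return a copy of `rows` in which each value is replaced by None
    with probability `rate` (assumed mechanism: independent per cell)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def five_fold_indices(n, rng):
    """Shuffle the indices 0..n-1 and deal them into 5 folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::5] for i in range(5)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
    folds = five_fold_indices(len(data), rng)
    for rate in missing_rates():
        corrupted = inject_missing(data, rate, rng)
        # A real experiment would train and evaluate on each fold here.
```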

  9. Evaluation: Prediction with Full Data

  10. Evaluation: Prediction with Missing Data

  11. Evaluation: Prediction with Imputed Data

  12. Conclusion. Discussion: most beneficial when data is also missing at prediction time; a more accurate imputation method provides better results; interdependency of features is required. Future Work: analyze the impact of the imputation method on the result; extend to further imputation methods; work on pruning methods and the stopping criterion.

  13. Q&A. Any questions? Get in contact: schreckenberger@es.uni-mannheim.de

  14. References
      [Fri76] Jerome H Friedman. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput., 26(SLAC-PUB-1573-REV):404, 1976.
      [CN89] Peter Clark and Tim Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261-283, 1989.
      [FKY96] Jerome H Friedman, Ron Kohavi, and Yeogirl Yun. Lazy decision trees. In AAAI/IAAI, Vol. 1, pages 717-724, 1996.
      [BFOS84] L Breiman, JH Friedman, R Olshen, and CJ Stone. Classification and regression trees. 1984.
      [BR18] Cedric Beaulac and Jeffrey S Rosenthal. Best: A decision tree algorithm that handles missing values. arXiv preprint arXiv:1804.10168, 2018.
      [TJH08] Beth Twala, MC Jones, and David J Hand. Good methods for coping with missing data in decision trees. Pattern Recognition Letters, 29(7):950-956, 2008.
