Measures of Model Goodness in Classification

In this learning material, metrics for classifiers are explored, focusing on the accuracy measure as a common but limited metric. The concept of accuracy is discussed alongside its drawbacks when there is uneven assignment to categories, leading to a presentation of the Kappa statistic as a more nuanced evaluation tool. Examples and illustrations aid in understanding how these metrics are computed and their implications for assessing model performance.

  • Model Goodness
  • Classification Metrics
  • Accuracy
  • Kappa
  • Classifier

Presentation Transcript


  1. Week 2 Video 2 Diagnostic Metrics, Part 1

  2. Different Methods, Different Measures. Today we'll focus on metrics for classifiers. Later this week we'll discuss metrics for regressors, and metrics for other methods will be discussed later in the course.

  3. Metrics for Classifiers

  4. Accuracy

  5. Accuracy. One of the easiest measures of model goodness is accuracy, also called agreement when measuring inter-rater reliability: accuracy = (# of agreements) / (total number of codes/assessments).
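
A minimal Python sketch of this computation (illustrative only, not code from the lecture; the labels below are made up):

```python
def accuracy(labels, predictions):
    """Accuracy (agreement): # of agreements / total number of codes/assessments."""
    agreements = sum(1 for y, p in zip(labels, predictions) if y == p)
    return agreements / len(labels)

# Hypothetical hand codes vs. detector output: 4 of the 5 assessments agree.
print(accuracy(["ON", "ON", "OFF", "ON", "OFF"],
               ["ON", "ON", "OFF", "OFF", "OFF"]))  # 0.8
```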

  6. Accuracy. There is general agreement across fields that accuracy is not a good metric.

  7. Accuracy. Let's say that my new Kindergarten Failure Detector achieves 92% accuracy. Good, right?

  8. Non-even assignment to categories. Accuracy does poorly when there is non-even assignment to categories, which is almost always the case. Imagine an extreme case: 92% of students pass Kindergarten, and my detector always says PASS. That gives an accuracy of 92%, but essentially no information.
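
A quick sketch of this extreme case, using a hypothetical cohort of 100 students (the counts are invented to match the 92% figure):

```python
# Hypothetical cohort: 92 of 100 students pass Kindergarten.
actual = ["PASS"] * 92 + ["FAIL"] * 8

# A "detector" that always says PASS still reaches 92% accuracy,
# even though it carries essentially no information.
predicted = ["PASS"] * 100
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.92
```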

  9. Kappa

  10. Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
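
Written as a small Python helper (a sketch of the formula above, not code from the lecture):

```python
def kappa(agreement, expected_agreement):
    """Cohen's Kappa: (Agreement - Expected Agreement) / (1 - Expected Agreement)."""
    return (agreement - expected_agreement) / (1 - expected_agreement)
```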

  11. Computing Kappa (Simple 2x2 example). Table (rows = data, columns = detector): Data Off-Task: Detector Off-Task 20, Detector On-Task 5; Data On-Task: Detector Off-Task 15, Detector On-Task 60.

  12. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is the percent agreement?

  13. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is the percent agreement? 80%

  14. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is Data's expected frequency for on-task?

  15. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is Data's expected frequency for on-task? 75%

  16. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is Detector's expected frequency for on-task?

  17. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is Detector's expected frequency for on-task? 65%

  18. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is the expected on-task agreement?

  19. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60. What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

  20. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60 (expected 48.75). What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

  21. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60 (expected 48.75). What are Data's and Detector's expected frequencies for off-task behavior?

  22. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60 (expected 48.75). What are Data's and Detector's expected frequencies for off-task behavior? 25% and 35%

  23. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60 (expected 48.75). What is the expected off-task agreement?

  24. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20, 5; Data On-Task: 15, 60 (expected 48.75). What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

  25. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

  26. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). What is the total expected agreement?

  27. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). What is the total expected agreement? 0.4875 + 0.0875 = 0.575

  28. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). What is kappa?

  29. Computing Kappa (Simple 2x2 example). Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). What is kappa? (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529
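
Putting the walkthrough together, here is a sketch that recomputes this example from the 2x2 table (the counts are from the slides; the variable names are my own):

```python
# Rows are the data (ground truth), columns are the detector: [off-task, on-task].
confusion = [[20, 5],    # Data Off-Task
             [15, 60]]   # Data On-Task

total = sum(sum(row) for row in confusion)               # 100
agreement = (confusion[0][0] + confusion[1][1]) / total  # (20 + 60) / 100 = 0.80

# Expected agreement: for each class, multiply the data's marginal frequency
# by the detector's marginal frequency, then sum over classes.
expected = 0.0
for i in range(2):
    data_freq = sum(confusion[i]) / total                        # 0.25, then 0.75
    detector_freq = (confusion[0][i] + confusion[1][i]) / total  # 0.35, then 0.65
    expected += data_freq * detector_freq                        # 0.0875 + 0.4875 = 0.575

kappa = (agreement - expected) / (1 - expected)
print(round(kappa, 3))  # 0.529
```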

  30. So is that any good? Table: Data Off-Task: 20 (expected 8.75), 5; Data On-Task: 15, 60 (expected 48.75). Kappa = (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529

  31. Interpreting Kappa. Kappa = 0: agreement is at chance. Kappa = 1: agreement is perfect. Kappa = -1: agreement is perfectly inverse. Kappa > 1: you messed up somewhere.

  32. Kappa < 0. This means your model is worse than chance. Very rare to see unless you're using cross-validation, where it is seen somewhat more commonly. It means your model is junk.

  33. 0 < Kappa < 1. What's a good Kappa? There is no absolute standard.

  34. 0 < Kappa < 1. For data-mined models, typically 0.3-0.5 is considered good enough to call the model better than chance and publishable. In affective computing, lower is still often OK.

  35. Why is there no standard? Because Kappa is scaled by the proportion of each category. When one class is much more prevalent, expected agreement is higher than if classes are evenly balanced.
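
To make this concrete, here is a hypothetical comparison (invented counts, not from the slides): two detectors that both agree with the data 80% of the time, but on data sets with different class balances, end up with very different Kappa values because expected agreement rises as one class dominates.

```python
def kappa_from_confusion(confusion):
    """Kappa from a square confusion matrix (rows = data, columns = detector)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    agreement = sum(confusion[i][i] for i in range(n)) / total
    expected = sum((sum(confusion[i]) / total) *
                   (sum(row[i] for row in confusion) / total)
                   for i in range(n))
    return (agreement - expected) / (1 - expected)

balanced   = [[40, 10], [10, 40]]  # classes 50/50: agreement 0.80, expected 0.50
imbalanced = [[ 5,  5], [15, 75]]  # classes 10/90: agreement 0.80, expected 0.74
print(round(kappa_from_confusion(balanced), 3))    # 0.6
print(round(kappa_from_confusion(imbalanced), 3))  # 0.231
```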

  36. Because of this, comparing Kappa values between two data sets in a principled fashion is highly difficult. It is OK to compare two Kappas, in the same data set, that have at least one variable in common. A lot of work went into statistical methods for comparing Kappa values in the 1990s, with no real consensus. Informally, you can compare two data sets if the proportions of each category are similar.

  37. Quiz. Table (rows = data, columns = detector): Data Insult: Detector Insult during Collaboration 16, Detector No Insult during Collaboration 7; Data No Insult: Detector Insult during Collaboration 8, Detector No Insult during Collaboration 19. What is kappa? A: 0.645 B: 0.502 C: 0.700 D: 0.398

  38. Quiz. Table (rows = data, columns = detector): Data Suspension: Detector Academic Suspension 1, Detector No Academic Suspension 2; Data No Suspension: Detector Academic Suspension 4, Detector No Academic Suspension 141. What is kappa? A: 0.240 B: 0.947 C: 0.959 D: 0.007

  39. Next lecture: ROC curves, A', Precision, Recall
