Learn about Decision Trees and Classification through Practical Assignments


Explore the concepts of decision trees, classification, and feature representation through practical assignments in David Kauchak's CS 158 course. Dive into examples, features, and classification models while working on assignments individually or in pairs. Stay updated with lecture notes, recordings, and office hours to enhance your understanding of machine learning algorithms.


Uploaded on Aug 30, 2024 | 0 Views


Presentation Transcript


  1. DECISION TREES David Kauchak CS 158 Spring 2022

  2. Admin
     - Assignment 1 due tomorrow (Friday)
     - Assignment 2 out soon: start ASAP! (due next Sunday)
     - Can (and are encouraged to) work in pairs
     - Slack
     - Office hours M-Th, 2:30-3:30pm, starting today (Zoom link in Sakai)

  3. Admin
     - Lecture notes posted (webpage)
     - Lecture recordings uploaded (Box; see Sakai for link)
     - Keep up with the reading
     - Videos before class
     - Class ends at 2:30

  4. Representing examples
     What is an example? How is it represented?

  5. Features
     How our algorithms actually view the data.
     Features are the questions we can ask about the examples.
     examples -> features: each example becomes a list f1, f2, f3, ..., fn

  6. Features
     How our algorithms actually view the data.
     Features are the questions we can ask about the examples.
     examples -> features:
       red, round, leaf, 3oz, ...
       green, round, no leaf, 4oz, ...
       yellow, curved, no leaf, 8oz, ...
       green, curved, no leaf, 7oz, ...

  7. Classification revisited
     examples                            label
     red, round, leaf, 3oz, ...          apple
     green, round, no leaf, 4oz, ...     apple
     yellow, curved, no leaf, 8oz, ...   banana
     green, curved, no leaf, 7oz, ...    banana
     The labeled examples are fed to the model/classifier.
     During learning/training/induction, learn a model of what distinguishes apples and bananas based on the features.
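One way to picture this representation is as a list of (features, label) pairs. The following is a minimal Python sketch; the `Example` type and its field names (`color`, `shape`, `leaf`, `weight_oz`) are illustrative assumptions, not names used in the course.

```python
from typing import NamedTuple


class Example(NamedTuple):
    # Hypothetical feature names for the fruit examples on the slide.
    color: str
    shape: str
    leaf: bool
    weight_oz: int


# Training data: each example's features paired with its label.
training_data = [
    (Example("red", "round", True, 3), "apple"),
    (Example("green", "round", False, 4), "apple"),
    (Example("yellow", "curved", False, 8), "banana"),
    (Example("green", "curved", False, 7), "banana"),
]

# A new, unlabeled example that a trained model would be asked to classify.
new_example = Example("red", "round", False, 4)

# The learning algorithm only ever sees feature values, never the fruit itself.
for features, label in training_data:
    print(tuple(features), "->", label)
```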

  8. Classification revisited
     red, round, no leaf, 4oz, ... -> model/classifier -> Apple or banana?
     The model can then classify a new example based on the features.

  9. Classification revisited
     red, round, no leaf, 4oz, ... -> model/classifier -> Apple
     Why?
     The model can then classify a new example based on the features.

  10. Classification revisited
      Training data:
        examples                            label
        red, round, leaf, 3oz, ...          apple
        green, round, no leaf, 4oz, ...     apple
        yellow, curved, no leaf, 4oz, ...   banana
        green, curved, no leaf, 5oz, ...    banana
      Test set:
        red, round, no leaf, 4oz, ...       ?

  11. Classification revisited
      Training data:
        examples                            label
        red, round, leaf, 3oz, ...          apple
        green, round, no leaf, 4oz, ...     apple
        yellow, curved, no leaf, 4oz, ...   banana
        green, curved, no leaf, 5oz, ...    banana
      Test set:
        red, round, no leaf, 4oz, ...       ?
      Learning is about generalizing from the training data.
      What does this assume about the training and test set?

  12. A sample data set
      Features: Hour, Weather, Accident, Stall. Label: Commute.

      Hour    Weather  Accident  Stall  Commute
      8 AM    Sunny    No        No     Long
      8 AM    Cloudy   No        Yes    Long
      10 AM   Sunny    No        No     Short
      9 AM    Rainy    Yes       No     Long
      9 AM    Sunny    Yes       Yes    Long
      10 AM   Sunny    No        No     Short
      10 AM   Cloudy   No        No     Short
      9 AM    Sunny    Yes       No     Long
      10 AM   Cloudy   Yes       Yes    Long
      10 AM   Rainy    No        No     Short
      8 AM    Cloudy   Yes       No     Long
      9 AM    Rainy    No        No     Short

      New examples:
      8 AM, Rainy, Yes, No?
      10 AM, Rainy, No, No?
      Can you describe a model that could be used to make decisions in general?

  13. Decision trees
      - Tree with internal nodes labeled by features
      - Branches are labeled by tests on that feature
      - Leaves labeled with classes

      Leave At
        10 AM -> Stall?
                   No  -> Short
                   Yes -> Long
        9 AM  -> Accident?
                   No  -> Short
                   Yes -> Long
        8 AM  -> Long

  14. Decision trees
      Classifying an example with the tree from slide 13:
      Leave = 8 AM, Weather = Rainy, Accident = Yes, Stall = No

  16. Decision trees
      Classifying an example with the tree from slide 13:
      Leave = 10 AM, Weather = Rainy, Accident = No, Stall = No
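Classifying with a decision tree means following, at each internal node, the branch that matches the example's value for that node's feature until a leaf is reached. The sketch below encodes the commute tree as nested dictionaries; this encoding, the `classify` helper, and the branch layout (read off the flattened slide and checked against the sample data on slide 12) are assumptions made for illustration.

```python
# Internal node: {feature name: {feature value: subtree}}. Leaf: a label string.
commute_tree = {
    "Leave At": {
        "8 AM": "Long",
        "9 AM": {"Accident?": {"No": "Short", "Yes": "Long"}},
        "10 AM": {"Stall?": {"No": "Short", "Yes": "Long"}},
    }
}


def classify(tree, example):
    """Walk from the root to a leaf, taking the branch that matches the example."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[example[feature]]
    return tree


# The two examples from the slides; keys match the node labels in the tree.
first = {"Leave At": "8 AM", "Weather": "Rainy", "Accident?": "Yes", "Stall?": "No"}
second = {"Leave At": "10 AM", "Weather": "Rainy", "Accident?": "No", "Stall?": "No"}

print(classify(commute_tree, first))   # Long
print(classify(commute_tree, second))  # Short
```

Note that Weather is never consulted: this particular tree does not test it on any path.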

  18. To ride or not to ride, that is the question

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Normal         Rainy    NO
      Road     Normal         Sunny    YES
      Trail    Mountain       Sunny    YES
      Road     Mountain       Rainy    YES
      Trail    Normal         Snowy    NO
      Road     Normal         Rainy    YES
      Road     Mountain       Snowy    YES
      Trail    Normal         Sunny    NO
      Road     Normal         Snowy    NO
      Trail    Mountain       Snowy    YES

      Build a decision tree

  19. Recursive approach
      Base case: If all data belong to the same class, create a leaf node with that label
      Otherwise:
      - calculate the score for each feature if we used it to split the data
      - pick the feature with the highest score, partition the data based on that feature's values, and call recursively on each partition

  20. Partitioning the data (data table as on slide 18)
      Split on Terrain (Trail / Road):
        Road:  ?

  22. Partitioning the data (data table as on slide 18)
      Split on Terrain (Trail / Road):
        Road:  YES: 4 NO: 1

  23. Partitioning the data (data table as on slide 18)
      Split on Terrain (Trail / Road):
        Road:  YES: 4 NO: 1
        Trail: ?

  24. Partitioning the data (data table as on slide 18)
      Split on Terrain:
        Trail: YES: 2 NO: 3
        Road:  YES: 4 NO: 1

  25. Partitioning the data (data table as on slide 18)
      Split on Terrain:
        Trail: YES: 2 NO: 3
        Road:  YES: 4 NO: 1
      Split on Unicycle:
        Normal:   ?
        Mountain: ?

  26. Partitioning the data (data table as on slide 18)
      Split on Terrain:
        Trail: YES: 2 NO: 3
        Road:  YES: 4 NO: 1
      Split on Unicycle:
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0

  27. Partitioning the data (data table as on slide 18)
      Split on Terrain:
        Trail: YES: 2 NO: 3
        Road:  YES: 4 NO: 1
      Split on Unicycle:
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0
      Split on Weather:
        Sunny: YES: 2 NO: 1
        Rainy: YES: 2 NO: 1
        Snowy: YES: 2 NO: 2
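The per-branch counts can be reproduced by grouping the rows on one feature and tallying the labels in each group. This is a minimal sketch; the tuple layout and the `label_counts` helper are illustrative assumptions, while the rows are the ten examples from slide 18.

```python
from collections import Counter, defaultdict

FEATURES = ("Terrain", "Unicycle-type", "Weather")
# Each row: (Terrain, Unicycle-type, Weather, Go-For-Ride?)
DATA = [
    ("Trail", "Normal", "Rainy", "NO"),
    ("Road", "Normal", "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road", "Mountain", "Rainy", "YES"),
    ("Trail", "Normal", "Snowy", "NO"),
    ("Road", "Normal", "Rainy", "YES"),
    ("Road", "Mountain", "Snowy", "YES"),
    ("Trail", "Normal", "Sunny", "NO"),
    ("Road", "Normal", "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]


def label_counts(data, feature_index):
    """Group rows by one feature's value and count the labels in each group."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[feature_index]][row[-1]] += 1
    return counts


for i, name in enumerate(FEATURES):
    for value, c in label_counts(DATA, i).items():
        # A Counter returns 0 for a label that never occurs in the group.
        print(f"{name} = {value}: YES: {c['YES']} NO: {c['NO']}")
```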

  28. Partitioning the data
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2

      calculate the score for each feature if we used it to split the data
      What score should we use?
      If we just stopped here, which tree would be best?
      How could we make these into decision trees?

  29. Decision trees
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2

      How could we make these into decision trees?

  30. Decision trees
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2

  31. Decision trees
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2

      Training error: the average error over the training set
      For classification, the most common error is the number of mistakes
      Training error for each of these?

  32. Decision trees
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1                             Training error: 3/10
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0                         Training error: 2/10
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2     Training error: 4/10

      Training error: the average error over the training set

  33. Training error vs. accuracy
      Terrain:   Trail: YES: 2 NO: 3    Road: YES: 4 NO: 1
      Unicycle:  Normal: YES: 2 NO: 4   Mountain: YES: 4 NO: 0
      Weather:   Sunny: YES: 2 NO: 1    Rainy: YES: 2 NO: 1    Snowy: YES: 2 NO: 2

                          Terrain  Unicycle  Weather
      Training error:     3/10     2/10      4/10
      Training accuracy:  7/10     8/10      6/10

      training error = 1 - accuracy (and vice versa)
      Training error: the average error over the training set
      Training accuracy: the average proportion correct over the training set
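These error figures follow from treating each split as a one-level tree that predicts the majority label in every branch: each example outside its branch's majority is one mistake. The sketch below recomputes them under that assumption; `training_mistakes` is a hypothetical helper name, and the rows are the data from slide 18.

```python
from collections import Counter, defaultdict

FEATURES = ("Terrain", "Unicycle-type", "Weather")
# Each row: (Terrain, Unicycle-type, Weather, Go-For-Ride?)
DATA = [
    ("Trail", "Normal", "Rainy", "NO"),
    ("Road", "Normal", "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road", "Mountain", "Rainy", "YES"),
    ("Trail", "Normal", "Snowy", "NO"),
    ("Road", "Normal", "Rainy", "YES"),
    ("Road", "Mountain", "Snowy", "YES"),
    ("Trail", "Normal", "Sunny", "NO"),
    ("Road", "Normal", "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]


def training_mistakes(data, feature_index):
    """Mistakes made by a one-level tree that splits on one feature and
    predicts the majority label in each branch."""
    groups = defaultdict(Counter)
    for row in data:
        groups[row[feature_index]][row[-1]] += 1
    # In each branch, every example outside the majority label is a mistake.
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


for i, name in enumerate(FEATURES):
    mistakes = training_mistakes(DATA, i)
    error = mistakes / len(DATA)
    print(f"{name}: error = {mistakes}/{len(DATA)}, accuracy = {1 - error:.1f}")
```

Run on the slide's data, this gives 3, 2, and 4 mistakes out of 10 for Terrain, Unicycle-type, and Weather, matching the fractions on the slide.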

  34. Recurse (data table as on slide 18)
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0

  35. Recurse
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0

      Unicycle = Normal:
      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Normal         Rainy    NO
      Road     Normal         Sunny    YES
      Trail    Normal         Snowy    NO
      Road     Normal         Rainy    YES
      Trail    Normal         Sunny    NO
      Road     Normal         Snowy    NO

      Unicycle = Mountain:
      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Mountain       Sunny    YES
      Road     Mountain       Rainy    YES
      Road     Mountain       Snowy    YES
      Trail    Mountain       Snowy    YES

  36. Recurse
      Unicycle
        Mountain: YES: 4 NO: 0

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Mountain       Sunny    YES
      Road     Mountain       Rainy    YES
      Road     Mountain       Snowy    YES
      Trail    Mountain       Snowy    YES

      What should we do?

  37. Recurse
      Unicycle
        Mountain: YES: 4 NO: 0

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Mountain       Sunny    YES
      Road     Mountain       Rainy    YES
      Road     Mountain       Snowy    YES
      Trail    Mountain       Snowy    YES

      No need to examine other features since all examples have the same label.

  38. Recurse
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0

      Unicycle = Normal:
      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Normal         Rainy    NO
      Road     Normal         Sunny    YES
      Trail    Normal         Snowy    NO
      Road     Normal         Rainy    YES
      Trail    Normal         Sunny    NO
      Road     Normal         Snowy    NO

  39. Recurse
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0

      Still two features left we can split on (Unicycle = Normal subset as on slide 38).

  40. Recurse (Unicycle = Normal subset as on slide 38)
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0
      Candidate split of the Normal subset on Terrain (Trail / Road)

  41. Recurse (Unicycle = Normal subset as on slide 38)
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0
      Candidate split of the Normal subset on Terrain:
        Trail: YES: 0 NO: 3
        Road:  YES: 2 NO: 1

  42. Recurse (Unicycle = Normal subset as on slide 38)
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0
      Candidate split of the Normal subset on Terrain:
        Trail: YES: 0 NO: 3
        Road:  YES: 2 NO: 1
      Candidate split of the Normal subset on Weather:
        Sunny: YES: 1 NO: 1
        Rainy: YES: 1 NO: 1
        Snowy: YES: 0 NO: 2

  43. Recurse (Unicycle = Normal subset as on slide 38)
      Unicycle
        Normal:   YES: 2 NO: 4
        Mountain: YES: 4 NO: 0
      Candidate split of the Normal subset on Terrain (training error 1/6):
        Trail: YES: 0 NO: 3
        Road:  YES: 2 NO: 1
      Candidate split of the Normal subset on Weather (training error 2/6):
        Sunny: YES: 1 NO: 1
        Rainy: YES: 1 NO: 1
        Snowy: YES: 0 NO: 2
      Which should we pick?

  44. Recurse
      Unicycle
        Mountain -> YES: 4 NO: 0
        Normal   -> Terrain
                      Trail -> YES: 0 NO: 3
                      Road  -> YES: 2 NO: 1

      Terrain = Road (within Unicycle = Normal):
      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Road     Normal         Sunny    YES
      Road     Normal         Rainy    YES
      Road     Normal         Snowy    NO

  45. Recurse
      Unicycle
        Mountain -> YES: 4 NO: 0
        Normal   -> Terrain
                      Trail -> YES: 0 NO: 3
                      Road  -> Weather
                                 Sunny -> YES: 1 NO: 0
                                 Rainy -> YES: 1 NO: 0
                                 Snowy -> YES: 0 NO: 1

  46. Recurse (data table as on slide 18)
      Unicycle
        Mountain -> YES: 4 NO: 0
        Normal   -> Terrain
                      Trail -> YES: 0 NO: 3
                      Road  -> Weather
                                 Sunny -> YES: 1 NO: 0
                                 Rainy -> YES: 1 NO: 0
                                 Snowy -> YES: 0 NO: 1

      Training error?
      Are we always guaranteed to get a training error of 0?

  47. Problematic data

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Normal         Rainy    NO
      Road     Normal         Sunny    YES
      Trail    Mountain       Sunny    YES
      Road     Mountain       Snowy    NO
      Trail    Normal         Snowy    NO
      Road     Normal         Rainy    YES
      Road     Mountain       Snowy    YES
      Trail    Normal         Sunny    NO
      Road     Normal         Snowy    NO
      Trail    Mountain       Snowy    YES

      When can this happen?
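In this table, two rows share the feature values Road, Mountain, Snowy but carry different labels, so no split on these features can separate them. A minimal sketch for spotting such rows follows; `conflicting_examples` is a hypothetical helper, not something named in the slides.

```python
from collections import defaultdict

# Each row: (Terrain, Unicycle-type, Weather, Go-For-Ride?), from the slide above.
PROBLEM_DATA = [
    ("Trail", "Normal", "Rainy", "NO"),
    ("Road", "Normal", "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road", "Mountain", "Snowy", "NO"),
    ("Trail", "Normal", "Snowy", "NO"),
    ("Road", "Normal", "Rainy", "YES"),
    ("Road", "Mountain", "Snowy", "YES"),
    ("Trail", "Normal", "Sunny", "NO"),
    ("Road", "Normal", "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]


def conflicting_examples(data):
    """Return feature tuples that appear with more than one distinct label."""
    labels_seen = defaultdict(set)
    for *features, label in data:
        labels_seen[tuple(features)].add(label)
    return {f: labels for f, labels in labels_seen.items() if len(labels) > 1}


for features, labels in conflicting_examples(PROBLEM_DATA).items():
    print(features, "appears with labels", sorted(labels))
```

Whichever label a tree assigns to that feature combination, it gets at least one of the two rows wrong, so the training error cannot reach 0 on this data.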

  48. Recursive approach
      Base case: If all data belong to the same class, create a leaf node with that label
      OR all the data has the same feature values
      Do we always want to go all the way to the bottom?
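The recursive approach with both base cases can be sketched in a few lines of Python. This is one possible reading, not the course's reference implementation: the score used here is training accuracy of the split (the slides leave the choice of score open), ties go to the first feature or label encountered, a leaf for identical-feature data predicts the majority label, and the nested-dict tree encoding and all function names are assumptions.

```python
from collections import Counter, defaultdict

FEATURES = ("Terrain", "Unicycle-type", "Weather")
# Each row: (Terrain, Unicycle-type, Weather, Go-For-Ride?), from slide 18.
DATA = [
    ("Trail", "Normal", "Rainy", "NO"),
    ("Road", "Normal", "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road", "Mountain", "Rainy", "YES"),
    ("Trail", "Normal", "Snowy", "NO"),
    ("Road", "Normal", "Rainy", "YES"),
    ("Road", "Mountain", "Snowy", "YES"),
    ("Trail", "Normal", "Sunny", "NO"),
    ("Road", "Normal", "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]


def majority(rows):
    """Most common label among the rows."""
    return Counter(r[-1] for r in rows).most_common(1)[0][0]


def partition(rows, i):
    """Group rows by their value for feature i."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[i]].append(r)
    return groups


def score(rows, i):
    """Training accuracy if we split on feature i and predict each group's majority."""
    correct = sum(
        Counter(r[-1] for r in g).most_common(1)[0][1]
        for g in partition(rows, i).values()
    )
    return correct / len(rows)


def build(rows, features):
    """Return a label (leaf) or {feature name: {value: subtree}} (internal node)."""
    same_label = len({r[-1] for r in rows}) == 1
    same_features = len({r[:-1] for r in rows}) == 1
    if same_label or same_features:  # the two base cases from the slide
        return majority(rows)
    best = max(features, key=lambda i: score(rows, i))
    remaining = [i for i in features if i != best]
    return {
        FEATURES[best]: {
            value: build(group, remaining)
            for value, group in partition(rows, best).items()
        }
    }


print(build(DATA, [0, 1, 2]))
```

On the slide 18 data this produces the same shape as the tree built in the Recurse slides: split on Unicycle-type, then Terrain under Normal, then Weather under Road.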

  49. What would the tree look like for

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Mountain       Rainy    YES
      Trail    Mountain       Sunny    YES
      Road     Mountain       Snowy    YES
      Road     Mountain       Sunny    YES
      Trail    Normal         Snowy    NO
      Trail    Normal         Rainy    NO
      Road     Normal         Snowy    YES
      Road     Normal         Sunny    NO
      Trail    Normal         Sunny    NO

  50. What would the tree look like for

      Terrain  Unicycle-type  Weather  Go-For-Ride?
      Trail    Mountain       Rainy    YES
      Trail    Mountain       Sunny    YES
      Road     Mountain       Snowy    YES
      Road     Mountain       Sunny    YES
      Trail    Normal         Snowy    NO
      Trail    Normal         Rainy    NO
      Road     Normal         Snowy    YES
      Road     Normal         Sunny    NO
      Trail    Normal         Sunny    NO

      Unicycle
        Mountain -> YES
        Normal   -> Terrain
                      Trail -> NO
                      Road  -> Weather
                                 Sunny -> NO
                                 Rainy -> NO
                                 Snowy -> YES

      Is that what you would do?
