Decision Trees in R

Explore decision tree models in R through examples and tasks. Understand tree splitting attributes, classification tasks, and model application to test data. Learn how decision trees are used for data classification by partitioning attribute space. Dive into tree induction algorithms and model deduction through practical scenarios.

  • Decision Trees
  • Data Mining
  • R Programming
  • Classification Tasks
  • Model Application

Uploaded on Feb 27, 2025



Presentation Transcript


  1. Decision Trees in R. Arko Barman, with additions and modifications by Ch. Eick. COSC 4335 Data Mining.

  2. Example of a Decision Tree

     Training Data (Tid, Refund, Marital Status, Taxable Income, Cheat):

      1  Yes  Single    125K  No
      2  No   Married   100K  No
      3  No   Single     70K  No
      4  Yes  Married   120K  No
      5  No   Divorced   95K  Yes
      6  No   Married    60K  No
      7  Yes  Divorced  220K  No
      8  No   Single     85K  Yes
      9  No   Married    75K  No
     10  No   Single     90K  Yes

     Model (Decision Tree): the root splits on Refund (Yes -> NO; No -> MarSt);
     MarSt splits on Marital Status (Married -> NO; Single, Divorced -> TaxInc);
     TaxInc splits on Taxable Income (< 80K -> NO; > 80K -> YES).
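     The training table above can be rebuilt directly in R for experimentation; a
     minimal sketch (the data frame and abbreviated column names are ours, taken
     from the slide's headers):

```r
# Reconstruct the slide's training data (column names follow the slide)
train <- data.frame(
  Tid    = 1:10,
  Refund = c("Yes","No","No","Yes","No","No","Yes","No","No","No"),
  MarSt  = c("Single","Married","Single","Married","Divorced",
             "Married","Divorced","Single","Married","Single"),
  TaxInc = c(125, 100, 70, 120, 95, 60, 220, 85, 75, 90),  # in thousands (K)
  Cheat  = c("No","No","No","No","Yes","No","No","Yes","No","Yes"),
  stringsAsFactors = TRUE
)

# 7 records do not cheat, 3 do
table(train$Cheat)
```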

  3. Another Example of Decision Tree

     Same training data as the previous slide. This tree splits on MarSt first
     (Married -> NO; Single, Divorced -> Refund), then on Refund
     (Yes -> NO; No -> TaxInc), then on Taxable Income (< 80K -> NO; > 80K -> YES).

     There could be more than one tree that fits the same data!

  4. Decision Tree Classification Task

     Induction: a Tree Induction algorithm learns a model (the decision tree) from
     the Training Set, a table of labeled records (Tid 1-10 with attributes
     Attrib1, Attrib2, Attrib3 and a known Class).

     Deduction: the learned model is then applied to the Test Set, records with
     unknown class labels (Tid 11-15, Class = ?), to predict their classes.
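     The induction/deduction cycle on this slide can be run end-to-end in R using
     the iris data from the later slides (the 100/50 split and the random seed are
     our choices, not from the slides):

```r
library(rpart)

set.seed(42)                          # our choice, for reproducibility
idx   <- sample(nrow(iris), 100)      # 100 training records, 50 held out for testing
train <- iris[idx, ]
test  <- iris[-idx, ]

# Induction: learn a decision tree model from the training set
model <- rpart(Species ~ ., data = train, method = "class")

# Deduction: apply the model to the test set to predict class labels
pred <- predict(model, test, type = "class")
acc  <- mean(pred == test$Species)
```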

  5. Apply Model to Test Data

     Test record: Refund = No, Marital Status = Married, Taxable Income = 80K,
     Cheat = ?

     Start from the root of the tree: Refund = No, so follow the "No" branch to
     the MarSt node.

  6. Apply Model to Test Data

     At the MarSt node, Marital Status = Married, so follow the "Married" branch
     to the leaf NO: the model predicts Cheat = No for this test record.
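     To make this walk-through executable, the hand-drawn tree from the earlier
     slides can be encoded directly as an R function (the function and its name
     are ours, not from the slides):

```r
# The slide's decision tree, written as a plain R function:
# Refund? Yes -> No; MarSt? Married -> No; TaxInc < 80K -> No, otherwise Yes
classify <- function(Refund, MarSt, TaxInc) {
  if (Refund == "Yes") return("No")
  if (MarSt == "Married") return("No")
  if (TaxInc < 80) return("No") else return("Yes")
}

classify("No", "Married", 80)   # the slide's test record -> "No"
```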

  7. Decision Trees

     • Used for classifying data by partitioning the attribute space
     • Tries to find axis-parallel decision boundaries for a specified optimality
       criterion
     • Leaf nodes contain class labels, representing classification decisions
     • Keeps splitting nodes based on a split criterion, such as the GINI index,
       information gain, or entropy
     • Pruning is necessary to avoid overfitting
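     As a concrete instance of one split criterion named above, the GINI index of
     a set of class labels is 1 minus the sum of squared class proportions; a small
     helper (the function is ours) computes it for the earlier training data:

```r
# GINI index of a vector of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Cheat labels from the earlier training data: 7 "No", 3 "Yes"
parent <- c(rep("No", 7), rep("Yes", 3))
gini(parent)          # 1 - (0.7^2 + 0.3^2) = 0.42

# A pure node has GINI index 0
gini(rep("No", 4))    # 0
```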

  8. Decision Trees in R

     library(rpart)
     mydata <- data.frame(iris)
     attach(mydata)
     model <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data=mydata, method="class")
     plot(model)
     text(model, use.n=TRUE, all=TRUE, cex=0.8)

  9. Decision Trees in R

     library(tree)
     # tree() splits on deviance by default; split="gini" uses the GINI index.
     # (tree() has no method="class" option; it infers classification from the
     # factor response.)
     model1 <- tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data=mydata, split="gini")
     plot(model1)
     text(model1, all=TRUE, cex=0.6)

  10. Decision Trees in R

     library(party)
     model2 <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                     data=mydata)
     plot(model2)

  11. Controlling number of nodes

     This is just an example. You can come up with better or more efficient methods!

     library(tree)
     mydata <- data.frame(iris)
     attach(mydata)
     # mincut = 10 requires at least 10 observations in each child node
     model1 <- tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data=mydata,
                    control = tree.control(nobs = 150, mincut = 10))
     plot(model1)
     text(model1, all=TRUE, cex=0.6)
     predict(model1, iris)

     Note how the number of nodes is reduced by increasing the minimum number of
     observations in a child node!

  12. Controlling number of nodes

     This is just an example. You can come up with better or more efficient methods!

     model2 <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                     data = mydata,
                     controls = ctree_control(maxdepth = 2))
     plot(model2)

     Note that setting the maximum depth to 2 has reduced the number of nodes!

  13. Linear Models in R
     http://data.princeton.edu/R/linearmodels.html

     lm() fits a linear regression model; abline() adds one or more straight lines
     to a plot. (Plot the points first, then add the line.)

     x1 <- c(1:5, 1:3)
     x2 <- c(2, 2, 2, 3, 6, 7, 5, 1)
     plot(x1, x2)
     abline(lm(x2 ~ x1))             # add the fitted regression line to the plot
     title('Regression of x2 on x1')
     s <- lm(x2 ~ x1)                # store the fitted model
     lm(x1 ~ x2)                     # regression of x1 on x2
     abline(1, 2)                    # line with intercept 1 and slope 2

  14. Scaling and Z-Scoring Datasets
     http://stat.ethz.ch/R-manual/R-patched/library/base/html/scale.html

     s <- scale(iris[1:4])   # z-scores each column: subtract the mean, divide by the sd
     mean(s[,1])             # approximately 0
     sd(s[,1])               # 1
     t <- scale(s, center=c(5,5,5,5), scale=FALSE)
     # subtracts (5,5,5,5) from the columns of the already z-scored data
     # and does not divide by the standard deviation

     Banknote authentication dataset:
     https://archive.ics.uci.edu/ml/datasets/banknote+authentication
