Classification and Attribute Selection Measures

 
SEEM4630 2016-2017 Tutorial 1
Classification
 
Yingfan Liu, liuyf@se.cuhk.edu.hk
Classification: Definition
 
Given a collection of records (the training set), each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Common classifiers: decision tree, Naïve Bayes, k-NN.
Goal: previously unseen records should be assigned a class as accurately as possible.
2
Decision Tree
 
Goal: construct a tree so that instances belonging to different classes are separated.
Basic algorithm (a greedy algorithm):
The tree is constructed in a top-down recursive manner.
At the start, all the training examples are at the root.
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
Examples are partitioned recursively based on the selected attributes.
 
3
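To make this greedy, top-down procedure concrete, here is a minimal ID3-style sketch in Python. It is illustrative only: the helper names (entropy, info_gain, build_tree) and the nested-dict tree representation are our own choices, not part of the tutorial.

```python
# Minimal ID3-style sketch of greedy top-down tree induction (illustrative only).
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple with these labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting the rows on attribute `attr`."""
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], []).append(y)
    remainder = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Top-down recursive partitioning; rows are dicts of attribute -> value."""
    if len(set(labels)) == 1:                     # all examples in one class: leaf
        return labels[0]
    if not attrs:                                 # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for value in {row[best] for row in rows}:     # one branch per observed value
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node[best][value] = build_tree([rows[i] for i in keep],
                                       [labels[i] for i in keep],
                                       [a for a in attrs if a != best])
    return node
```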
 
Attribute Selection Measure 1: Information Gain

Let p_i be the probability that a tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.

Expected information (entropy) needed to classify a tuple in D:

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Information needed (after using A to split D into v partitions) to classify D:

Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \cdot Info(D_j)

Information gained by branching on attribute A:

Gain(A) = Info(D) - Info_A(D)

4
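The three formulas translate almost directly into code. The sketch below is a minimal, assumed rendering that works on class counts rather than raw records; the function names are our own.

```python
# Direct transcription of the information-gain formulas, over class counts.
from math import log2

def info(counts):
    """Info(D) = -sum_i p_i log2(p_i); `counts` are the class counts in D."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def info_after_split(partitions):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j); `partitions` holds class counts per branch."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

def gain(counts, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(counts) - info_after_split(partitions)

# e.g. a 9-vs-5 data set split into branches [2,3], [4,0], [3,2] gives a gain of about 0.25
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 2))
```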
 
Attribute Selection Measure 2: Gain Ratio

The information gain measure is biased towards attributes with a large number of values.
C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):

SplitInfo_A(D) = -\sum_{j=1}^{v} (|D_j| / |D|) \log_2(|D_j| / |D|)

GainRatio(A) = Gain(A) / SplitInfo(A)

5
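A corresponding sketch for the gain-ratio normalization, again over class counts and with our own helper names; the printed value is simply what the formula yields for the same 9-vs-5 split used above.

```python
# Gain ratio = information gain normalized by the split information.
from math import log2

def info(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def split_info(partitions):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|)."""
    sizes = [sum(p) for p in partitions]
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s > 0)

def gain_ratio(counts, partitions):
    g = info(counts) - sum(sum(p) / sum(counts) * info(p) for p in partitions)
    return g / split_info(partitions)

# e.g. the 9-vs-5 split [2,3], [4,0], [3,2]: gain is about 0.25, gain ratio comes out around 0.16
print(round(gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]), 2))
```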
 
Attribute Selection Measure 3: Gini Index

If a data set D contains examples from n classes, the gini index gini(D) is defined as:

gini(D) = 1 - \sum_{j=1}^{n} p_j^2

where p_j is the relative frequency of class j in D.

If a data set D is split on A into two subsets D_1 and D_2, the gini index gini_A(D) is defined as:

gini_A(D) = (|D_1| / |D|) gini(D_1) + (|D_2| / |D|) gini(D_2)

Reduction in impurity:

\Delta gini(A) = gini(D) - gini_A(D)

6
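A matching sketch for the gini index on a binary split; the function names are our own, and the example split reuses the Humidity class counts (Normal [6+,1-], High [3+,4-]) that appear in the worked example later in this tutorial.

```python
# Gini index, gini of a binary split, and the resulting reduction in impurity.
def gini(counts):
    """gini(D) = 1 - sum_j p_j^2; `counts` are the class counts in D."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(d1, d2):
    """gini_A(D) = |D1|/|D| * gini(D1) + |D2|/|D| * gini(D2)."""
    n1, n2 = sum(d1), sum(d2)
    n = n1 + n2
    return n1 / n * gini(d1) + n2 / n * gini(d2)

def gini_reduction(counts, d1, d2):
    """Delta gini(A) = gini(D) - gini_A(D)."""
    return gini(counts) - gini_split(d1, d2)

# e.g. a 9-vs-5 data set split into subsets with class counts [6,1] and [3,4]; prints roughly 0.092
print(round(gini_reduction([9, 5], [6, 1], [3, 4]), 3))
```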
Example
 
7
 
Tree induction example

Entropy of data S:

Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Split data by attribute Outlook: S[9+,5-] -> Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]

Gain(Outlook) = 0.94 - 5/14[-2/5(log2(2/5)) - 3/5(log2(3/5))]
                     - 4/14[-4/4(log2(4/4)) - 0/4(log2(0/4))]
                     - 5/14[-3/5(log2(3/5)) - 2/5(log2(2/5))]
              = 0.94 - 0.69 = 0.25

8
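As a cross-check of this and the next two slides, here is a short script (our own variable names, assuming the 14-record table shown with the gain summary below) that recomputes the information gain of every attribute.

```python
# Recompute the information gain of each attribute on the 14 Play Tennis records.
from collections import Counter
from math import log2

# Records as (Outlook, Temperature, Humidity, Wind, PlayTennis).
data = [
    ("Sunny", ">25", "High", "Weak", "No"),    ("Sunny", ">25", "High", "Strong", "No"),
    ("Overcast", ">25", "High", "Weak", "Yes"), ("Rain", "15-25", "High", "Weak", "Yes"),
    ("Rain", "<15", "Normal", "Weak", "Yes"),   ("Rain", "<15", "Normal", "Strong", "No"),
    ("Overcast", "<15", "Normal", "Strong", "Yes"), ("Sunny", "15-25", "High", "Weak", "No"),
    ("Sunny", "<15", "Normal", "Weak", "Yes"),  ("Rain", "15-25", "Normal", "Weak", "Yes"),
    ("Sunny", "15-25", "Normal", "Strong", "Yes"), ("Overcast", "15-25", "High", "Strong", "Yes"),
    ("Overcast", ">25", "Normal", "Weak", "Yes"), ("Rain", "15-25", "High", "Strong", "No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

labels = [row[-1] for row in data]
for i, name in enumerate(attrs):
    branches = {}
    for row in data:
        branches.setdefault(row[i], []).append(row[-1])
    remainder = sum(len(b) / len(data) * entropy(b) for b in branches.values())
    print(name, round(entropy(labels) - remainder, 2))
# Prints roughly: Outlook 0.25, Temperature 0.03, Humidity 0.15, Wind 0.05
```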
Tree induction example
 
9
 
Split data by attribute Temperature: S[9+,5-] -> <15 [3+,1-], 15-25 [4+,2-], >25 [2+,2-]

Gain(Temperature) = 0.94 - 4/14[-3/4(log2(3/4)) - 1/4(log2(1/4))]
                         - 6/14[-4/6(log2(4/6)) - 2/6(log2(2/6))]
                         - 4/14[-2/4(log2(2/4)) - 2/4(log2(2/4))]
                  = 0.94 - 0.91 = 0.03
Tree induction example
 
10
 
Split data by attribute Humidity: S[9+,5-] -> High [3+,4-], Normal [6+,1-]

Gain(Humidity) = 0.94 - 7/14[-3/7(log2(3/7)) - 4/7(log2(4/7))]
                      - 7/14[-6/7(log2(6/7)) - 1/7(log2(1/7))]
               = 0.94 - 0.79 = 0.15

Split data by attribute Wind: S[9+,5-] -> Weak [6+,2-], Strong [3+,3-]

Gain(Wind) = 0.94 - 8/14[-6/8(log2(6/8)) - 2/8(log2(2/8))]
                  - 6/14[-3/6(log2(3/6)) - 3/6(log2(3/6))]
           = 0.94 - 0.89 = 0.05
11
Gain(Outlook) = 0.25
Gain(Temperature)=0.03
Gain(Humidity) = 0.15
Gain(Wind) = 0.05
Outlook  | Temperature | Humidity | Wind   | Play Tennis
Sunny    | >25         | High     | Weak   | No
Sunny    | >25         | High     | Strong | No
Overcast | >25         | High     | Weak   | Yes
Rain     | 15-25       | High     | Weak   | Yes
Rain     | <15         | Normal   | Weak   | Yes
Rain     | <15         | Normal   | Strong | No
Overcast | <15         | Normal   | Strong | Yes
Sunny    | 15-25       | High     | Weak   | No
Sunny    | <15         | Normal   | Weak   | Yes
Rain     | 15-25       | Normal   | Weak   | Yes
Sunny    | 15-25       | Normal   | Strong | Yes
Overcast | 15-25       | High     | Strong | Yes
Overcast | >25         | Normal   | Weak   | Yes
Rain     | 15-25       | High     | Strong | No
Tree induction example
 
Entropy of branch Sunny:

Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Split Sunny branch by attribute Temperature: Sunny[2+,3-] -> <15 [1+,0-], 15-25 [1+,1-], >25 [0+,2-]

Gain(Temperature) = 0.97 - 1/5[-1/1(log2(1/1)) - 0/1(log2(0/1))]
                         - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
                  = 0.97 - 0.4 = 0.57

Split Sunny branch by attribute Humidity: Sunny[2+,3-] -> High [0+,3-], Normal [2+,0-]

Gain(Humidity) = 0.97 - 3/5[-0/3(log2(0/3)) - 3/3(log2(3/3))]
                      - 2/5[-2/2(log2(2/2)) - 0/2(log2(0/2))]
               = 0.97 - 0 = 0.97

Split Sunny branch by attribute Wind: Sunny[2+,3-] -> Weak [1+,2-], Strong [1+,1-]

Gain(Wind) = 0.97 - 3/5[-1/3(log2(1/3)) - 2/3(log2(2/3))]
                  - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
           = 0.97 - 0.95 = 0.02

12
 
13
Tree induction example

[Partial tree after choosing Humidity for the Sunny branch:
Outlook = Sunny -> Humidity (High -> No, Normal -> Yes)
Outlook = Overcast -> Yes
Outlook = Rain -> ?? (still to be split)]
 
Entropy of branch Rain:

Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Split Rain branch by attribute Temperature: Rain[3+,2-] -> <15 [1+,1-], 15-25 [2+,1-], >25 [0+,0-]

Gain(Temperature) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
                         - 0/5[-0/0(log2(0/0)) - 0/0(log2(0/0))]
                  = 0.97 - 0.95 = 0.02

Split Rain branch by attribute Humidity: Rain[3+,2-] -> High [1+,1-], Normal [2+,1-]

Gain(Humidity) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                      - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
               = 0.97 - 0.95 = 0.02

Split Rain branch by attribute Wind: Rain[3+,2-] -> Weak [3+,0-], Strong [0+,2-]

Gain(Wind) = 0.97 - 3/5[-3/3(log2(3/3)) - 0/3(log2(0/3))]
                  - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
           = 0.97 - 0 = 0.97

14
 
15
Tree induction example

[Final decision tree:
Outlook = Sunny -> Humidity (High -> No, Normal -> Yes)
Outlook = Overcast -> Yes
Outlook = Rain -> Wind (Weak -> Yes, Strong -> No)]
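To illustrate how the finished tree is used, here is a minimal sketch that encodes the tree above as nested dicts (our own representation, not the tutorial's) and classifies a new record.

```python
# The induced decision tree, encoded as nested dicts: {attribute: {value: subtree-or-label}}.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
}}

def classify(tree, record):
    """Walk the tree, following the record's attribute values, until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[record[attr]]
    return tree

# e.g. a Sunny day with High humidity and Weak wind -> predicted "No"
print(classify(tree, {"Outlook": "Sunny", "Temperature": "15-25",
                      "Humidity": "High", "Wind": "Weak"}))
```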
 
16
 
Bayesian Classification

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities P(C_i | x_1, x_2, ..., x_n), where x_i is the value of attribute A_i.
Choose the class label that has the highest probability.
Foundation: based on Bayes' Theorem,

P(C_i | x_1, x_2, ..., x_n) = P(x_1, x_2, ..., x_n | C_i) P(C_i) / P(x_1, x_2, ..., x_n)

where P(C_i | x_1, ..., x_n) is the posterior probability, P(C_i) the prior probability, and P(x_1, ..., x_n | C_i) the likelihood.
 
Naïve Bayes Classifier

Problem: the joint probabilities P(x_1, x_2, ..., x_n | C_i) are difficult to estimate.
Assumption: attributes are conditionally independent given the class, so

P(x_1, x_2, ..., x_n | C_i) = \prod_{j=1}^{n} P(x_j | C_i)

and therefore

P(C_i | x_1, x_2, ..., x_n) = P(C_i) \prod_{j=1}^{n} P(x_j | C_i) / P(x_1, x_2, ..., x_n)
 
17
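A minimal sketch of such a classifier, using simple relative-frequency estimates and no smoothing; the function names and data layout are our own assumptions, not part of the tutorial.

```python
# Naive Bayes with maximum-likelihood (relative-frequency) estimates, no smoothing.
from collections import Counter, defaultdict

def train(records, labels):
    """Estimate P(C) and P(x_j = v | C) by counting."""
    prior = Counter(labels)                       # class -> count
    cond = defaultdict(Counter)                   # (attribute index, class) -> value counts
    for rec, c in zip(records, labels):
        for j, v in enumerate(rec):
            cond[(j, c)][v] += 1
    return prior, cond

def predict(prior, cond, record):
    """Choose the class maximizing P(C) * prod_j P(x_j | C)."""
    total = sum(prior.values())
    best, best_score = None, -1.0
    for c, nc in prior.items():
        score = nc / total
        for j, v in enumerate(record):
            score *= cond[(j, c)][v] / nc
        if score > best_score:
            best, best_score = c, score
    return best
```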
 
18
 
Training data (attributes A, B and class label C):

A | B | C
m | b | t
m | s | t
g | q | t
h | s | t
g | q | t
g | q | f
g | s | f
h | b | f
h | q | f
m | b | f

Estimated probabilities:
P(C=t) = 1/2          P(C=f) = 1/2
P(A=m|C=t) = 2/5      P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5      P(B=q|C=f) = 2/5

Test record: A=m, B=q, C=?
 
Example: Naïve Bayes Classifier
 
19
 
Example: Naïve Bayes Classifier

For C = t:
P(A=m|C=t) * P(B=q|C=t) * P(C=t) = 2/5 * 2/5 * 1/2 = 2/25
P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)   <- higher

For C = f:
P(A=m|C=f) * P(B=q|C=f) * P(C=f) = 1/5 * 2/5 * 1/2 = 1/25
P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)

Conclusion: for the test record A=m, B=q, predict C=t.
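The same comparison in code, reusing the probabilities above (a minimal sketch; since P(A=m, B=q) is common to both classes, it can be dropped when comparing).

```python
# Compare the unnormalized posteriors for the test record A=m, B=q.
p_prior = {"t": 1 / 2, "f": 1 / 2}
p_A_m = {"t": 2 / 5, "f": 1 / 5}          # P(A=m | C)
p_B_q = {"t": 2 / 5, "f": 2 / 5}          # P(B=q | C)

scores = {c: p_A_m[c] * p_B_q[c] * p_prior[c] for c in ("t", "f")}
print(scores)                              # roughly {'t': 0.08, 'f': 0.04}, i.e. 2/25 vs 1/25
print(max(scores, key=scores.get))         # -> 't'
```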
 
Thank you & Questions?
 
20