Classification and Attribute Selection Measures

 
SEEM4630 2016-2017 Tutorial 1
Classification
 
Yingfan Liu, liuyf@se.cuhk.edu.hk
Classification: Definition
 
Given a collection of records (the training set), each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Common classifiers: decision tree, Naïve Bayes, k-NN.
Goal: previously unseen records should be assigned a class as accurately as possible.
2
Decision Tree
 
Goal: construct a tree so that instances belonging to different classes are separated.
Basic algorithm (a greedy algorithm):
The tree is constructed in a top-down recursive manner.
At the start, all the training examples are at the root.
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
Examples are partitioned recursively based on the selected attributes.
 
3
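To make this greedy, top-down procedure concrete, here is a minimal ID3-style sketch in Python. It is illustrative only: the helper names (entropy, info_gain, build_tree) and the nested-dict tree representation are our own choices, not part of the tutorial.

```python
# Minimal ID3-style sketch of greedy top-down tree induction (illustrative only).
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple with these labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting the rows on attribute `attr`."""
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], []).append(y)
    remainder = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Top-down recursive partitioning; rows are dicts of attribute -> value."""
    if len(set(labels)) == 1:                     # all examples in one class: leaf
        return labels[0]
    if not attrs:                                 # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for value in {row[best] for row in rows}:     # one branch per observed value
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node[best][value] = build_tree([rows[i] for i in keep],
                                       [labels[i] for i in keep],
                                       [a for a in attrs if a != best])
    return node
```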
 
Attribute Selection Measure 1: Information Gain

Let p_i be the probability that a tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.

Expected information (entropy) needed to classify a tuple in D:

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Information needed (after using A to split D into v partitions) to classify D:

Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \cdot Info(D_j)

Information gained by branching on attribute A:

Gain(A) = Info(D) - Info_A(D)

4
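The three formulas translate almost directly into code. The sketch below is a minimal, assumed rendering that works on class counts rather than raw records; the function names are our own.

```python
# Direct transcription of the information-gain formulas, over class counts.
from math import log2

def info(counts):
    """Info(D) = -sum_i p_i log2(p_i); `counts` are the class counts in D."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def info_after_split(partitions):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j); `partitions` holds class counts per branch."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

def gain(counts, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(counts) - info_after_split(partitions)

# e.g. a 9-vs-5 data set split into branches [2,3], [4,0], [3,2] gives a gain of about 0.25
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 2))
```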
 
Attribute Selection Measure 2: Gain Ratio

The information gain measure is biased towards attributes with a large number of values.
C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):

SplitInfo_A(D) = -\sum_{j=1}^{v} (|D_j| / |D|) \log_2(|D_j| / |D|)

GainRatio(A) = Gain(A) / SplitInfo(A)

5
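A corresponding sketch for the gain-ratio normalization, again over class counts and with our own helper names; the printed value is simply what the formula yields for the same 9-vs-5 split used above.

```python
# Gain ratio = information gain normalized by the split information.
from math import log2

def info(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def split_info(partitions):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|)."""
    sizes = [sum(p) for p in partitions]
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s > 0)

def gain_ratio(counts, partitions):
    g = info(counts) - sum(sum(p) / sum(counts) * info(p) for p in partitions)
    return g / split_info(partitions)

# e.g. the 9-vs-5 split [2,3], [4,0], [3,2]: gain is about 0.25, gain ratio comes out around 0.16
print(round(gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]), 2))
```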
 
Attribute Selection Measure 3: Gini Index

If a data set D contains examples from n classes, the gini index gini(D) is defined as:

gini(D) = 1 - \sum_{j=1}^{n} p_j^2

where p_j is the relative frequency of class j in D.

If a data set D is split on A into two subsets D_1 and D_2, the gini index gini_A(D) is defined as:

gini_A(D) = (|D_1| / |D|) gini(D_1) + (|D_2| / |D|) gini(D_2)

Reduction in impurity:

\Delta gini(A) = gini(D) - gini_A(D)

6
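A matching sketch for the gini index on a binary split; the function names are our own, and the example split reuses the Humidity class counts (Normal [6+,1-], High [3+,4-]) that appear in the worked example later in this tutorial.

```python
# Gini index, gini of a binary split, and the resulting reduction in impurity.
def gini(counts):
    """gini(D) = 1 - sum_j p_j^2; `counts` are the class counts in D."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(d1, d2):
    """gini_A(D) = |D1|/|D| * gini(D1) + |D2|/|D| * gini(D2)."""
    n1, n2 = sum(d1), sum(d2)
    n = n1 + n2
    return n1 / n * gini(d1) + n2 / n * gini(d2)

def gini_reduction(counts, d1, d2):
    """Delta gini(A) = gini(D) - gini_A(D)."""
    return gini(counts) - gini_split(d1, d2)

# e.g. a 9-vs-5 data set split into subsets with class counts [6,1] and [3,4]; prints roughly 0.092
print(round(gini_reduction([9, 5], [6, 1], [3, 4]), 3))
```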
Example
 
7
 
Tree induction example

Entropy of data S:

Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Split data by attribute Outlook: S[9+,5-] -> Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]

Gain(Outlook) = 0.94 - 5/14[-2/5(log2(2/5)) - 3/5(log2(3/5))]
                     - 4/14[-4/4(log2(4/4)) - 0/4(log2(0/4))]
                     - 5/14[-3/5(log2(3/5)) - 2/5(log2(2/5))]
              = 0.94 - 0.69 = 0.25

8
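As a cross-check of this and the next two slides, here is a short script (our own variable names, assuming the 14-record table shown with the gain summary below) that recomputes the information gain of every attribute.

```python
# Recompute the information gain of each attribute on the 14 Play Tennis records.
from collections import Counter
from math import log2

# Records as (Outlook, Temperature, Humidity, Wind, PlayTennis).
data = [
    ("Sunny", ">25", "High", "Weak", "No"),    ("Sunny", ">25", "High", "Strong", "No"),
    ("Overcast", ">25", "High", "Weak", "Yes"), ("Rain", "15-25", "High", "Weak", "Yes"),
    ("Rain", "<15", "Normal", "Weak", "Yes"),   ("Rain", "<15", "Normal", "Strong", "No"),
    ("Overcast", "<15", "Normal", "Strong", "Yes"), ("Sunny", "15-25", "High", "Weak", "No"),
    ("Sunny", "<15", "Normal", "Weak", "Yes"),  ("Rain", "15-25", "Normal", "Weak", "Yes"),
    ("Sunny", "15-25", "Normal", "Strong", "Yes"), ("Overcast", "15-25", "High", "Strong", "Yes"),
    ("Overcast", ">25", "Normal", "Weak", "Yes"), ("Rain", "15-25", "High", "Strong", "No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

labels = [row[-1] for row in data]
for i, name in enumerate(attrs):
    branches = {}
    for row in data:
        branches.setdefault(row[i], []).append(row[-1])
    remainder = sum(len(b) / len(data) * entropy(b) for b in branches.values())
    print(name, round(entropy(labels) - remainder, 2))
# Prints roughly: Outlook 0.25, Temperature 0.03, Humidity 0.15, Wind 0.05
```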
Tree induction example
 
9
 
Split data by attribute Temperature: S[9+,5-] -> <15 [3+,1-], 15-25 [4+,2-], >25 [2+,2-]

Gain(Temperature) = 0.94 - 4/14[-3/4(log2(3/4)) - 1/4(log2(1/4))]
                         - 6/14[-4/6(log2(4/6)) - 2/6(log2(2/6))]
                         - 4/14[-2/4(log2(2/4)) - 2/4(log2(2/4))]
                  = 0.94 - 0.91 = 0.03
Tree induction example
 
10
 
Split data by attribute Humidity: S[9+,5-] -> High [3+,4-], Normal [6+,1-]

Gain(Humidity) = 0.94 - 7/14[-3/7(log2(3/7)) - 4/7(log2(4/7))]
                      - 7/14[-6/7(log2(6/7)) - 1/7(log2(1/7))]
               = 0.94 - 0.79 = 0.15

Split data by attribute Wind: S[9+,5-] -> Weak [6+,2-], Strong [3+,3-]

Gain(Wind) = 0.94 - 8/14[-6/8(log2(6/8)) - 2/8(log2(2/8))]
                  - 6/14[-3/6(log2(3/6)) - 3/6(log2(3/6))]
           = 0.94 - 0.89 = 0.05
11
Gain(Outlook) = 0.25
Gain(Temperature)=0.03
Gain(Humidity) = 0.15
Gain(Wind) = 0.05
Outlook  | Temperature | Humidity | Wind   | Play Tennis
Sunny    | >25         | High     | Weak   | No
Sunny    | >25         | High     | Strong | No
Overcast | >25         | High     | Weak   | Yes
Rain     | 15-25       | High     | Weak   | Yes
Rain     | <15         | Normal   | Weak   | Yes
Rain     | <15         | Normal   | Strong | No
Overcast | <15         | Normal   | Strong | Yes
Sunny    | 15-25       | High     | Weak   | No
Sunny    | <15         | Normal   | Weak   | Yes
Rain     | 15-25       | Normal   | Weak   | Yes
Sunny    | 15-25       | Normal   | Strong | Yes
Overcast | 15-25       | High     | Strong | Yes
Overcast | >25         | Normal   | Weak   | Yes
Rain     | 15-25       | High     | Strong | No
Tree induction example
 
Entropy of branch Sunny:

Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Split Sunny branch by attribute Temperature: Sunny[2+,3-] -> <15 [1+,0-], 15-25 [1+,1-], >25 [0+,2-]

Gain(Temperature) = 0.97 - 1/5[-1/1(log2(1/1)) - 0/1(log2(0/1))]
                         - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
                  = 0.97 - 0.4 = 0.57

Split Sunny branch by attribute Humidity: Sunny[2+,3-] -> High [0+,3-], Normal [2+,0-]

Gain(Humidity) = 0.97 - 3/5[-0/3(log2(0/3)) - 3/3(log2(3/3))]
                      - 2/5[-2/2(log2(2/2)) - 0/2(log2(0/2))]
               = 0.97 - 0 = 0.97

Split Sunny branch by attribute Wind: Sunny[2+,3-] -> Weak [1+,2-], Strong [1+,1-]

Gain(Wind) = 0.97 - 3/5[-1/3(log2(1/3)) - 2/3(log2(2/3))]
                  - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
           = 0.97 - 0.95 = 0.02

12
 
13
Tree induction example

[Partial tree after choosing Humidity for the Sunny branch:
Outlook = Sunny -> Humidity (High -> No, Normal -> Yes)
Outlook = Overcast -> Yes
Outlook = Rain -> ?? (still to be split)]
 
Entropy of branch Rain:

Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Split Rain branch by attribute Temperature: Rain[3+,2-] -> <15 [1+,1-], 15-25 [2+,1-], >25 [0+,0-]

Gain(Temperature) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
                         - 0/5[-0/0(log2(0/0)) - 0/0(log2(0/0))]
                  = 0.97 - 0.95 = 0.02

Split Rain branch by attribute Humidity: Rain[3+,2-] -> High [1+,1-], Normal [2+,1-]

Gain(Humidity) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                      - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
               = 0.97 - 0.95 = 0.02

Split Rain branch by attribute Wind: Rain[3+,2-] -> Weak [3+,0-], Strong [0+,2-]

Gain(Wind) = 0.97 - 3/5[-3/3(log2(3/3)) - 0/3(log2(0/3))]
                  - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
           = 0.97 - 0 = 0.97

14
 
15
Tree induction example

[Final decision tree:
Outlook = Sunny -> Humidity (High -> No, Normal -> Yes)
Outlook = Overcast -> Yes
Outlook = Rain -> Wind (Weak -> Yes, Strong -> No)]
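To illustrate how the finished tree is used, here is a minimal sketch that encodes the tree above as nested dicts (our own representation, not the tutorial's) and classifies a new record.

```python
# The induced decision tree, encoded as nested dicts: {attribute: {value: subtree-or-label}}.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
}}

def classify(tree, record):
    """Walk the tree, following the record's attribute values, until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[record[attr]]
    return tree

# e.g. a Sunny day with High humidity and Weak wind -> predicted "No"
print(classify(tree, {"Outlook": "Sunny", "Temperature": "15-25",
                      "Humidity": "High", "Wind": "Weak"}))
```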
 
16
 
Bayesian Classification

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities P(C_i | x_1, x_2, ..., x_n), where x_i is the value of attribute A_i.
Choose the class label that has the highest probability.
Foundation: based on Bayes' Theorem,

P(C_i | x_1, x_2, ..., x_n) = P(x_1, x_2, ..., x_n | C_i) P(C_i) / P(x_1, x_2, ..., x_n)

where P(C_i | x_1, ..., x_n) is the posterior probability, P(C_i) the prior probability, and P(x_1, ..., x_n | C_i) the likelihood.
 
Naïve Bayes Classifier

Problem: the joint probabilities P(x_1, x_2, ..., x_n | C_i) are difficult to estimate.
Assumption: attributes are conditionally independent given the class, so

P(x_1, x_2, ..., x_n | C_i) = \prod_{j=1}^{n} P(x_j | C_i)

and therefore

P(C_i | x_1, x_2, ..., x_n) = P(C_i) \prod_{j=1}^{n} P(x_j | C_i) / P(x_1, x_2, ..., x_n)
 
17
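A minimal sketch of such a classifier, using simple relative-frequency estimates and no smoothing; the function names and data layout are our own assumptions, not part of the tutorial.

```python
# Naive Bayes with maximum-likelihood (relative-frequency) estimates, no smoothing.
from collections import Counter, defaultdict

def train(records, labels):
    """Estimate P(C) and P(x_j = v | C) by counting."""
    prior = Counter(labels)                       # class -> count
    cond = defaultdict(Counter)                   # (attribute index, class) -> value counts
    for rec, c in zip(records, labels):
        for j, v in enumerate(rec):
            cond[(j, c)][v] += 1
    return prior, cond

def predict(prior, cond, record):
    """Choose the class maximizing P(C) * prod_j P(x_j | C)."""
    total = sum(prior.values())
    best, best_score = None, -1.0
    for c, nc in prior.items():
        score = nc / total
        for j, v in enumerate(record):
            score *= cond[(j, c)][v] / nc
        if score > best_score:
            best, best_score = c, score
    return best
```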
 
18
 
Training data (attributes A, B and class label C):

A | B | C
m | b | t
m | s | t
g | q | t
h | s | t
g | q | t
g | q | f
g | s | f
h | b | f
h | q | f
m | b | f

Estimated probabilities:
P(C=t) = 1/2          P(C=f) = 1/2
P(A=m|C=t) = 2/5      P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5      P(B=q|C=f) = 2/5

Test record: A=m, B=q, C=?
 
Example: Naïve Bayes Classifier
 
19
 
Example: Naïve Bayes Classifier

For C = t:
P(A=m|C=t) * P(B=q|C=t) * P(C=t) = 2/5 * 2/5 * 1/2 = 2/25
P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)   <- higher

For C = f:
P(A=m|C=f) * P(B=q|C=f) * P(C=f) = 1/5 * 2/5 * 1/2 = 1/25
P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)

Conclusion: for the test record A=m, B=q, predict C=t.
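The same comparison in code, reusing the probabilities above (a minimal sketch; since P(A=m, B=q) is common to both classes, it can be dropped when comparing).

```python
# Compare the unnormalized posteriors for the test record A=m, B=q.
p_prior = {"t": 1 / 2, "f": 1 / 2}
p_A_m = {"t": 2 / 5, "f": 1 / 5}          # P(A=m | C)
p_B_q = {"t": 2 / 5, "f": 2 / 5}          # P(B=q | C)

scores = {c: p_A_m[c] * p_B_q[c] * p_prior[c] for c in ("t", "f")}
print(scores)                              # roughly {'t': 0.08, 'f': 0.04}, i.e. 2/25 vs 1/25
print(max(scores, key=scores.get))         # -> 't'
```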
 
Thank you & Questions?
 
20