Decision Trees in Classification

Classification: Decision trees

What is a decision tree?
The "first" node is the root of the tree
The "last" nodes are called leaves
The edges between nodes represent branches
How do we "read" a decision tree?
How to "build" a decision tree?
Procedure (recursive):
1. Choose the "best" attribute
2. Split the data into subsets according to the values of the "best" attribute
3. Repeat the procedure in each of the subsets
Stop when (the stopping criterion):
we have run out of attributes,
all leaves are "pure" (all examples in a leaf belong to the same class),
the (sub)set is empty.
A sketch of this recursive procedure is given below.
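To make the recursion concrete, here is a minimal Python sketch of the procedure above. It is not from the original slides: the representation of examples as dicts, the helper names build_tree / majority_class, and the pluggable score function are assumptions; any of the "best attribute" measures introduced later can be passed in as score.

```python
from collections import Counter

def majority_class(examples, target):
    """Most frequent class label among the examples of a node."""
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def build_tree(examples, attributes, target, score, default=None):
    """Recursive procedure from the slide: choose, split, recurse."""
    if not examples:                                   # the (sub)set is empty
        return default
    classes = {ex[target] for ex in examples}
    if len(classes) == 1:                              # the node is "pure"
        return classes.pop()
    if not attributes:                                 # out of attributes
        return majority_class(examples, target)

    # 1. choose the "best" attribute according to the supplied measure
    best = max(attributes, key=lambda a: score(examples, a, target))
    # 2. split the data into subsets according to the values of that attribute
    node = {"attribute": best, "children": {}}
    rest = [a for a in attributes if a != best]
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        # 3. repeat the procedure in each of the subsets
        node["children"][value] = build_tree(subset, rest, target, score,
                                             default=majority_class(examples, target))
    return node
```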
Choosing the "best" attribute
Which attribute is "the best"?
The one yielding the smaller tree
The one yielding the highest classification accuracy
Heuristic: choose the attribute yielding the "purest" nodes
Let's look at some "(im)purity" measures:
Information gain
Information gain ratio
Gini index
Which attribute to choose? – example
Information gain
Measuring the information as entropy:
given the probability distribution of some events, entropy measures the information that is needed to encode every possible outcome of these events
entropy is measured in bits
Formula for computing the entropy:
entropy(p1, p2, …, pn) = – p1 * log2(p1) – p2 * log2(p2) – … – pn * log2(pn)
where p1, p2, …, pn are the probabilities of the given events.

Computing information – "before"
If a set T contains examples from n classes, info(T) is defined as:
info(T) = entropy(p1, p2, …, pn) = – Σj=1..n pj * log2(pj)
where pj is the relative frequency (probability) of class j in T.
Computing information – "after"
We split the set T containing N examples into subsets T1, T2, …, Tk containing N1, N2, …, Nk examples, respectively.
The information of this split is defined as:
infoX(T) = Σi=1..k (Ni / N) * info(Ti)
Information gain – definition
Information gain = information "before split" – information "after split"
InfoGain(X) = info(T) – infoX(T)
InfoGain = IBS – IAS
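The same idea expressed directly on a set of examples: split on an attribute, weight the information of each subset by its size, and subtract from the information before the split. A sketch with assumed helper names, reusing the info helper from the previous sketch.

```python
import math
from collections import Counter

def info(class_counts):                      # same helper as in the previous sketch
    n = sum(class_counts)
    return -sum(c / n * math.log2(c / n) for c in class_counts if c)

def info_of(examples, target):
    """info(T) of a set of examples."""
    return info(list(Counter(ex[target] for ex in examples).values()))

def split(examples, attribute):
    """Group the examples by their value of `attribute`."""
    groups = {}
    for ex in examples:
        groups.setdefault(ex[attribute], []).append(ex)
    return groups

def info_after_split(examples, attribute, target):
    """info_X(T): size-weighted information of the subsets induced by `attribute`."""
    n = len(examples)
    return sum(len(subset) / n * info_of(subset, target)
               for subset in split(examples, attribute).values())

def info_gain(examples, attribute, target):
    """InfoGain = IBS - IAS."""
    return info_of(examples, target) - info_after_split(examples, attribute, target)
```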
InfoGain – the "weather" example

Outlook    Temperature  Humidity  Windy  Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No
Computing the IBS
Information "before the split":
Yes: 9 examples
No:  5 examples
Info([9,5]) = entropy(9/14, 5/14) = – 5/14 * log2(5/14) – 9/14 * log2(9/14) = 0.940
InfoGain("outlook")
InfoGain("outlook") = IBS – Info("outlook")
Entropy:
Sunny:     Info([2,3]) = entropy(2/5, 3/5) = – 2/5 * log2(2/5) – 3/5 * log2(3/5) = 0.971
Overcast:  Info([4,0]) = entropy(1, 0)     = – 1 * log2(1) – 0 * log2(0)         = 0
Rainy:     Info([3,2]) = entropy(3/5, 2/5) = – 3/5 * log2(3/5) – 2/5 * log2(2/5) = 0.971
Weights: 5/14 (Sunny), 4/14 (Overcast), 5/14 (Rainy)
Info("outlook") = Info([2,3],[4,0],[3,2]) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693
InfoGain("outlook") = 0.940 – 0.693 = 0.247
InfoGain("temperature")
Entropy:
Hot:   Info([2,2]) = entropy(2/4, 2/4) = – 2/4 * log2(2/4) – 2/4 * log2(2/4) = 1
Mild:  Info([4,2]) = entropy(4/6, 2/6) = – 4/6 * log2(4/6) – 2/6 * log2(2/6) = 0.918
Cool:  Info([3,1]) = entropy(3/4, 1/4) = – 3/4 * log2(3/4) – 1/4 * log2(1/4) = 0.811
Weights: 4/14 (Hot), 6/14 (Mild), 4/14 (Cool)
Info("temperature") = Info([2,2],[4,2],[3,1]) = (4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811 = 0.911
InfoGain("temperature") = 0.940 – 0.911 = 0.029
InfoGain("humidity")
Entropy:
High:    Info([3,4]) = entropy(3/7, 4/7) = – 3/7 * log2(3/7) – 4/7 * log2(4/7) = 0.985
Normal:  Info([6,1]) = entropy(6/7, 1/7) = – 6/7 * log2(6/7) – 1/7 * log2(1/7) = 0.592
Weights: 7/14 (High), 7/14 (Normal)
Info("humidity") = Info([3,4],[6,1]) = (7/14) * 0.985 + (7/14) * 0.592 = 0.789
InfoGain("humidity") = 0.940 – 0.789 = 0.151
InfoGain("windy")
Entropy:
False:  Info([6,2]) = entropy(6/8, 2/8) = – 6/8 * log2(6/8) – 2/8 * log2(2/8) = 0.811
True:   Info([3,3]) = entropy(3/6, 3/6) = – 3/6 * log2(3/6) – 3/6 * log2(3/6) = 1
Weights: 8/14 (False), 6/14 (True)
Info("windy") = Info([6,2],[3,3]) = (8/14) * 0.811 + (6/14) * 1 = 0.892
InfoGain("windy") = 0.940 – 0.892 = 0.048
Which attribute is "best"?
The one with the highest information gain (InfoGain):
Outlook:       0.247 bits
Temperature:   0.029 bits
Humidity:      0.152 bits
Windy:         0.048 bits
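The four gains above can be cross-checked from the per-value (Yes, No) counts listed on the previous slides; the small standalone snippet below is mine, the counts are the slides'.

```python
import math

def info(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# (Yes, No) counts for each value of each attribute in the "weather" data
splits = {
    "outlook":     [(2, 3), (4, 0), (3, 2)],   # Sunny, Overcast, Rainy
    "temperature": [(2, 2), (4, 2), (3, 1)],   # Hot, Mild, Cool
    "humidity":    [(3, 4), (6, 1)],           # High, Normal
    "windy":       [(6, 2), (3, 3)],           # False, True
}

ibs = info([9, 5])                              # 0.940, information "before the split"
gains = {name: ibs - sum(sum(g) / 14 * info(g) for g in groups)
         for name, groups in splits.items()}
for name, gain in gains.items():
    print(f"{name:12s} {gain:.3f}")
# outlook 0.247, temperature 0.029, humidity 0.152 (0.151 with the slide's rounding), windy 0.048
print("best:", max(gains, key=gains.get))       # best: outlook
```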
Step 2 = recursion: Outlook = Sunny

Next level in the tree …
Outlook = Sunny
Attribute Temperature
Hot:   info([0,2]) = entropy(0,1)      = 0
Mild:  info([1,1]) = entropy(1/2,1/2)  = 1
Cool:  info([1,0]) = entropy(1,0)      = 0
Info("Temperature") = info([0,2],[1,1],[1,0]) = 2/5*0 + 2/5*1 + 1/5*0 = 0.4
InfoGain("Temperature") = info([2,3]) – info([0,2],[1,1],[1,0]) = 0.971 – 0.4 = 0.571
Next level in the tree …
Outlook = Sunny
Attribute Humidity
High:    info([0,3]) = entropy(0,1) = 0
Normal:  info([2,0]) = entropy(1,0) = 0
Info("Humidity") = info([0,3],[2,0]) = 3/5*0 + 2/5*0 = 0
InfoGain("Humidity") = info([2,3]) – info([0,3],[2,0]) = 0.971 – 0 = 0.971
Next level in the tree …
Outlook = Sunny
Attribute Windy
False:  info([1,2]) = entropy(1/3,2/3) = – 1/3*log2(1/3) – 2/3*log2(2/3) = 0.918
True:   info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
Info("Windy") = info([1,2],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951
InfoGain("Windy") = info([2,3]) – info([1,2],[1,1]) = 0.971 – 0.951 = 0.02
Choosing the "best" attribute (Outlook = Sunny)
Information gain (InfoGain):
Temperature:  0.571
Humidity:     0.971
Windy:        0.02
Choose the attribute Humidity
The leaves are pure – no need to continue the procedure
Step 2 = recursion: Outlook = Overcast

Next level in the tree …
Outlook = Overcast
No need to further split – the node is "pure" (contains just examples of class "Yes")
Step 2 = recursion: Outlook = Rainy
Next level in the tree …
Outlook = Rainy
Attribute Temperature
Mild:  info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918
Cool:  info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
Info("Temperature") = info([2,1],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951
InfoGain("Temperature") = info([3,2]) – info([2,1],[1,1]) = 0.971 – 0.951 = 0.02
Next level in the tree …
Outlook = Rainy
Attribute Humidity
High:    info([1,1]) = entropy(1/2,1/2) = – 1/2*log2(1/2) – 1/2*log2(1/2) = 1
Normal:  info([2,1]) = entropy(2/3,1/3) = – 2/3*log2(2/3) – 1/3*log2(1/3) = 0.918
Info("Humidity") = info([2,1],[1,1]) = 3/5*0.918 + 2/5*1 = 0.951
InfoGain("Humidity") = info([3,2]) – info([2,1],[1,1]) = 0.971 – 0.951 = 0.02
Next level in the tree …
Outlook = Rainy
Attribute Windy
True:   info([0,2]) = entropy(0,1) = 0
False:  info([3,0]) = entropy(1,0) = 0
Info("Windy") = info([0,2],[3,0]) = 2/5*0 + 3/5*0 = 0
InfoGain("Windy") = info([3,2]) – info([0,2],[3,0]) = 0.971 – 0 = 0.971
Choosing the "best" attribute (Outlook = Rainy)
Information gain (InfoGain):
Temperature:  0.02
Humidity:     0.02
Windy:        0.971
Choose the attribute Windy
The leaves are pure – no need to continue the procedure
The "final" decision tree
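Putting the choices above together (root Outlook, the Sunny branch split on Humidity, the Rainy branch on Windy), the final tree reads:

Outlook = Sunny     →  Humidity = High → No,   Humidity = Normal → Yes
Outlook = Overcast  →  Yes
Outlook = Rainy     →  Windy = True → No,      Windy = False → Yes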
Information gain: problems
If an attribute has many values, there is a higher probability that the subsets will be "pure";
Consequence: InfoGain "prefers" attributes with a larger number of values.
Example: attribute "ID code"
The entropy of the split = 0 (each leaf is "pure", containing just a single example);
the information gain of attribute "ID code" is maximal = extreme case
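A tiny standalone illustration of this extreme case (the data here are made up for the demonstration, not from the slides): with a unique "ID code" per example, every subset is a single-example "pure" leaf, so the information after the split is 0 and the gain equals the full entropy of the class distribution.

```python
import math

def info(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

labels = ["Yes"] * 9 + ["No"] * 5           # the class distribution of the weather data
ibs = info([9, 5])                          # 0.940
ias = sum(1 / len(labels) * info([1]) for _ in labels)   # every ID value -> one "pure" example
print(round(ibs - ias, 3))                  # 0.94: InfoGain("ID code") is maximal
```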
Information gain ratio (GainRatio)
Intrinsic/split information = the entropy of the split itself:
SplitInfo(X, T) = – Σi=1..k (|Ti| / |T|) * log2(|Ti| / |T|)
Information gain ratio (GainRatio):
GainRatio(X, T) = InfoGain(X, T) / SplitInfo(X, T)
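In code, a sketch with assumed names; info is the same helper as in the earlier sketches.

```python
import math

def info(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def split_info(subset_sizes):
    """SplitInfo(X, T): the entropy of the split itself, from the subset sizes N1..Nk."""
    return info(subset_sizes)

def gain_ratio(info_gain_value, subset_sizes):
    """GainRatio(X, T) = InfoGain(X, T) / SplitInfo(X, T)."""
    return info_gain_value / split_info(subset_sizes)

print(round(gain_ratio(0.151, [7, 7]), 3))   # humidity: 0.151 / 1 = 0.151
```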
GainRatio("outlook")
InfoGain("outlook") = 0.940 – 0.693 = 0.247
SplitInfo("outlook") = info([5,4,5]) = entropy(5/14, 4/14, 5/14) =
    = – 5/14*log2(5/14) – 4/14*log2(4/14) – 5/14*log2(5/14) = 1.577
GainRatio("outlook") = InfoGain("outlook") / SplitInfo("outlook") = 0.247 / 1.577 = 0.156
GainRatio("temperature")
InfoGain("temperature") = 0.940 – 0.911 = 0.029
SplitInfo("temperature") = info([4,6,4]) = entropy(4/14, 6/14, 4/14) =
    = – 4/14*log2(4/14) – 6/14*log2(6/14) – 4/14*log2(4/14) = 1.557
GainRatio("temperature") = InfoGain("temperature") / SplitInfo("temperature") = 0.029 / 1.557 = 0.019
GainRatio("humidity")
InfoGain("humidity") = 0.940 – 0.789 = 0.151
SplitInfo("humidity") = info([7,7]) = entropy(7/14, 7/14) =
    = – 7/14*log2(7/14) – 7/14*log2(7/14) = 1
GainRatio("humidity") = InfoGain("humidity") / SplitInfo("humidity") = 0.151 / 1 = 0.151
GainRatio("windy")
InfoGain("windy") = 0.940 – 0.892 = 0.048
SplitInfo("windy") = info([8,6]) = entropy(8/14, 6/14) =
    = – 8/14*log2(8/14) – 6/14*log2(6/14) = 0.985
GainRatio("windy") = InfoGain("windy") / SplitInfo("windy") = 0.048 / 0.985 = 0.049
Choosing the "best" attribute
Information gain ratio (GainRatio):
Outlook:       0.156
Temperature:   0.019
Humidity:      0.152
Windy:         0.049
Gini index – "before split"
If a set T contains examples from n classes, gini(T) is defined as:
gini(T) = 1 – Σj=1..n pj²
where pj is the relative frequency (probability) of class j in T.

Gini index – "after split"
We split the set T containing N examples into subsets T1, T2, …, Tk containing N1, N2, …, Nk examples, respectively.
The Gini index of this split is defined as:
gini_split(T) = Σi=1..k (Ni / N) * gini(Ti)
The attribute with the lowest gini_split(T) is chosen as the "best" attribute.
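A corresponding Python sketch (helper names assumed):

```python
def gini(class_counts):
    """gini(T) = 1 - sum_j pj^2, computed from the class counts of a node."""
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

def gini_split(groups):
    """gini_split(T): size-weighted Gini index of the subsets T1..Tk (class-count tuples)."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * gini(g) for g in groups)

# the "outlook" split: (Yes, No) counts for Sunny, Overcast, Rainy
print(round(gini_split([(2, 3), (4, 0), (3, 2)]), 3))   # 0.343 (0.342 on the slide, which rounds per value)
```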
Gini("outlook")
Gini index:
Sunny:     Gini(2/5, 3/5) = 1 – ((2/5)² + (3/5)²) = 0.48
Overcast:  Gini(4/4, 0/4) = 1 – ((4/4)² + (0/4)²) = 0
Rainy:     Gini(3/5, 2/5) = 1 – ((3/5)² + (2/5)²) = 0.48
Weights: 5/14 (Sunny), 4/14 (Overcast), 5/14 (Rainy)
Gini("outlook") = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.342
Gini("temperature")
Gini index:
Hot:   Gini(2/4, 2/4) = 1 – ((2/4)² + (2/4)²) = 0.5
Mild:  Gini(4/6, 2/6) = 1 – ((4/6)² + (2/6)²) = 0.444
Cool:  Gini(3/4, 1/4) = 1 – ((3/4)² + (1/4)²) = 0.375
Weights: 4/14 (Hot), 6/14 (Mild), 4/14 (Cool)
Gini("temperature") = (4/14)*0.5 + (6/14)*0.444 + (4/14)*0.375 = 0.440
Gini("humidity")
Gini index:
High:    Gini(3/7, 4/7) = 1 – ((3/7)² + (4/7)²) = 0.490
Normal:  Gini(6/7, 1/7) = 1 – ((6/7)² + (1/7)²) = 0.245
Weights: 7/14 (High), 7/14 (Normal)
Gini("humidity") = (7/14)*0.490 + (7/14)*0.245 = 0.368
Gini("windy")
Gini index:
False:  Gini(6/8, 2/8) = 1 – ((6/8)² + (2/8)²) = 0.375
True:   Gini(3/6, 3/6) = 1 – ((3/6)² + (3/6)²) = 0.5
Weights: 8/14 (False), 6/14 (True)
Gini("windy") = (8/14)*0.375 + (6/14)*0.5 = 0.429
Choosing the "best" attribute
Gini index:
Outlook:       0.342
Temperature:   0.440
Humidity:      0.368
Windy:         0.429
Choose the attribute Outlook (the lowest gini_split).
How to classify new examples?
(Outlook, Temperature, Humidity, Windy) → Play
(Overcast, Hot, High, False)  → Yes
(Sunny, Cool, High, True)     → No
(Rainy, Mild, High, False)    → Yes
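Classification simply follows the branches from the root to a leaf. A minimal sketch, assuming the nested-dict representation from the build_tree sketch near the beginning; the tree below is the "final" weather tree written out by hand.

```python
def classify(tree, example, default=None):
    """At each inner node test the attribute, follow the branch for the example's
    value, and return the class label stored in the leaf that is reached."""
    while isinstance(tree, dict):
        value = example[tree["attribute"]]
        if value not in tree["children"]:          # a value never seen during training
            return default
        tree = tree["children"][value]
    return tree

weather_tree = {
    "attribute": "Outlook",
    "children": {
        "Sunny":    {"attribute": "Humidity", "children": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"attribute": "Windy",    "children": {"True": "No", "False": "Yes"}},
    },
}

print(classify(weather_tree, {"Outlook": "Sunny", "Temperature": "Cool",
                              "Humidity": "High", "Windy": "True"}))    # No
```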
And for slightly different data?

InfoGain("A")
Attribute A takes the values 1–5; the bracketed counts below are over the four classes (w, x, y, z).
IBS = info([2,1,5,3]) = entropy(2/11, 1/11, 5/11, 3/11) =
    = – 2/11*log2(2/11) – 1/11*log2(1/11) – 5/11*log2(5/11) – 3/11*log2(3/11) = 1.79
A = 1:  info([0,0,1,0]) = entropy(0, 0, 1, 0)          = – 3*0/1*log2(0/1) – 1/1*log2(1/1)                        = 0
A = 2:  info([0,0,1,1]) = entropy(0, 0, 1/2, 1/2)      = – 2*0/2*log2(0/2) – 2*1/2*log2(1/2)                      = 1
A = 3:  info([1,1,0,1]) = entropy(1/3, 1/3, 0, 1/3)    = – 0/3*log2(0/3) – 3*1/3*log2(1/3)                        = 1.585
A = 4:  info([0,0,0,1]) = entropy(0, 0, 0, 1)          = – 3*0/1*log2(0/1) – 1/1*log2(1/1)                        = 0
A = 5:  info([1,0,3,0]) = entropy(1/4, 0, 3/4, 0)      = – 2*0/4*log2(0/4) – 1/4*log2(1/4) – 3/4*log2(3/4)        = 0.811
Info("A") = 1/11*0 + 2/11*1 + 3/11*1.585 + 1/11*0 + 4/11*0.811 = 0.909
InfoGain("A") = 1.79 – 0.909 = 0.881
The "rest" … for homework