Bayes Classifier in Pattern Recognition

 
Pattern Recognition
Chapter 4: Bayes Classifier
 
Chumphol Bunkhumpornpat
Department of Computer Science
Faculty of Science
Chiang Mai University
 
Learning Objectives
 
Be able to state the Bayes theorem
Be able to classify objects using naive Bayes classifiers
Be able to understand how Bayesian belief networks work
 
 
Bayes Classifier
 
Simple Probabilistic Classifier
Optimal Classifier
It minimises the average probability of error.
Assumptions
Information about classes is available in the form of prior probabilities.
The distributions of patterns within each class are known.
 
 
Bayes Classifier (cont.)
 
It employs the posterior probabilities to assign
the class label to a test pattern.
A pattern is assigned the label of the class that
has the maximum posterior probability.
Bayes theorem converts the prior probability into a posterior probability based on the pattern to be classified.
 
 
Bayes Theorem
 
X: Pattern whose class label is unknown
H_i: Hypothesis that X belongs to class C_i
P(H_i): Prior probability of H_i
Known
Obtained before observing X
P(H_i|X): Posterior probability of H_i conditioned on X
P(H_i|X) = P(X|H_i) × P(H_i) / P(X)
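As an illustration (not part of the original slides), a minimal Python sketch of this formula; the helper name bayes_posterior is an assumption, and the numbers reuse Example 2 below.

  def bayes_posterior(likelihood, prior, evidence):
      # P(H_i | X) = P(X | H_i) * P(H_i) / P(X)
      return likelihood * prior / evidence

  # Example 2: P(X|H) = 0.9, P(H) = 0.3, P(X) = 0.3
  print(bayes_posterior(0.9, 0.3, 0.3))  # 0.9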
 
 
Example 1
 
In a coffee shop, 99% of the customers prefer
coffee.
The remaining 1% prefer tea.
P(coffee drinker) = 0.99
P(tea drinker) = 0.01
 
 
Example 1 (cont.)
 
In the absence of any other information, we
can classify any customer as a coffee drinker
and the probability of error is only 0.01.
This is because we are classifying a tea drinker
also as a coffee drinker.
 
 
 
Example 2
 
The prior probability of H that a road is wet is
P(H) = 0.3.
The probability that a road is not wet is 0.7.
If we use only this information, it is best to decide that a road is not wet.
The corresponding probability of error is 0.3.
 
 
Example 2 (cont.)
 
Probability of rain, P(X), is 0.3.
If it rains, we need to calculate the posterior probability that the road is wet, i.e., P(H|X).
90% of the time when the road is wet, it is because it has rained: P(X|H) = P(it has rained | road is wet) = 0.9.
P(H|X) = P(road is wet | it has rained) = P(X|H) × P(H) / P(X) = 0.9 × 0.3 / 0.3 = 0.9
 
 
Example 2 (cont.)
 
The probability of error is 0.1, which is the
probability that a road is not wet given that it
has rained.
Requirements for applying Bayes theorem
P(H)
P(X|H)
P(X)
 
 
Classification Using Naive Bayes Classifier
 
Features are assumed to be class-conditionally independent.
This assumption simplifies the computation (hence "naive").
Performance comparable to classification trees and neural networks
High accuracy and high speed on large databases
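Written out in the notation of the Bayes theorem slide, with f_1, …, f_n denoting the feature values of a pattern (introduced formally on a later slide), the independence assumption gives the working form applied in the examples below:

  P(C | f_1, …, f_n) ∝ P(C) × P(f_1 | C) × P(f_2 | C) × … × P(f_n | C)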
 
 
Parameter Estimation
 
Class Priors and Feature Probability Distributions
Frequency-Based Probability Estimate
It incorporates a small-sample correction.
No probability estimate is ever exactly zero; a small value is used in place of a zero count.
 
 
 
Example 3
 
Total number of examples = 100
Number of examples of class 1 = 40
Number of examples of class 2 = 30
Number of examples of class 3 = 30
Prior probability of class 1 = 40/100 = 0.4
Prior probability of class 2 = 30/100 = 0.3
Prior probability of class 3 = 30/100 = 0.3
 
 
Example 3 (cont.)
 
Out of the 40 examples of Class 1, if a binary feature takes the value 0 in 30 examples and 1 in 10 examples, then the estimated probability that this feature is 0 in this class will be 30/40 = 0.75.
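A minimal Python sketch of such an estimate with a small-sample correction, using the class-1 counts above. The Laplace-style form and the value alpha = 1 are assumptions; the slides only say that a small value is used so that no probability is ever zero.

  def feature_prob(count, class_total, num_values=2, alpha=1.0):
      # Frequency-based estimate with a small-sample (Laplace-style) correction,
      # so that no probability is ever estimated as exactly zero.
      return (count + alpha) / (class_total + alpha * num_values)

  # Example 3, class 1: a binary feature is 0 in 30 of the 40 examples.
  print(30 / 40)               # plain frequency estimate: 0.75
  print(feature_prob(30, 40))  # corrected estimate: 31/42 ≈ 0.738
  print(feature_prob(0, 40))   # a zero count no longer yields probability 0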
 
 
Constructing a Classifier from
the Probability Model
 
Bayes Probability Model and MAP Decision Rule
Most Probable Hypothesis
Maximum Posterior
Function “classify”
C: Class Variable
f_1, …, f_n: n Feature Variables
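A minimal Python sketch of such a classify function under the naive Bayes assumption; the dictionary-based inputs priors and cond_probs are assumptions made for illustration, not the textbook's implementation.

  def classify(x, priors, cond_probs):
      # MAP decision rule: assign the class with the maximum posterior.
      # priors[c] = P(C = c); cond_probs[c][i][v] = P(f_i = v | C = c).
      # P(X) is the same for every class, so it can be left out.
      best_class, best_score = None, -1.0
      for c, prior in priors.items():
          score = prior
          for i, value in enumerate(x):
              score *= cond_probs[c][i][value]
          if score > best_score:
              best_class, best_score = c, score
      return best_class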
 
 
Example Training Data Set
The data set contains 10 examples described by the attributes Cook, Mood, and Cuisine and the class label Tasty; 6 examples have Tasty = yes and 4 have Tasty = no.
 
 
Cook = Sita; Mood = Bad; Cuisine = Continental
 
P(Cook = Sita|Tasty = yes) = 2/6 = 0.33
P(Cook = Sita|Tasty = no) = 0/4 = 0 → 0.01 (small-sample correction)
P(Mood = Bad|Tasty = yes) = 2/6 = 0.33
P(Mood = Bad|Tasty = no) = 3/4 = 0.75
P(Cuisine = Continental|Tasty = yes) = 2/6 = 0.33
P(Cuisine = Continental|Tasty = no) = 3/4 = 0.75
 
 
Cook = Sita; Mood = Bad; Cuisine = Continental
(cont.)
 
P(Tasty = yes|X) = 0.6 × 0.33 × 0.33 × 0.33 = 0.0216
P(Tasty = no|X) = 0.4 × 0.01 × 0.75 × 0.75 = 0.00225
(The common factor 1/P(X) is the same for both classes and is omitted.)
P(Tasty = yes|X) > P(Tasty = no|X)
X is classified as belonging to the class
Tasty = yes.
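Continuing the classify sketch above, the same numbers can be plugged in; the dictionary layout is again an assumption, and only the probabilities relevant to this test pattern are filled in.

  # Priors and class-conditional probabilities from the two slides above.
  priors = {"yes": 0.6, "no": 0.4}
  cond_probs = {
      "yes": [{"Sita": 0.33}, {"Bad": 0.33}, {"Continental": 0.33}],
      "no":  [{"Sita": 0.01}, {"Bad": 0.75}, {"Continental": 0.75}],
  }
  print(classify(("Sita", "Bad", "Continental"), priors, cond_probs))  # yes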
 
 
Bayesian Belief Network
 
Probabilistic Graphical Model (Directed Acyclic Graph)
Sets of Variables (Nodes)
Probabilistic Dependencies (Arcs)
Node A is a parent of Node B if there is an arc
from A to B.
Parent(X_i): Set of parent nodes of a node X_i
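Under this structure, the joint probability of the variables factorizes into local conditional probabilities; this standard belief-network factorization is what Example 4 below applies:

  P(X_1, X_2, …, X_n) = P(X_1 | Parent(X_1)) × P(X_2 | Parent(X_2)) × … × P(X_n | Parent(X_n))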
 
 
Example 4
 
 
Ram is a student. He loves going to the
movies. He will go to the theatre in the
evening if he has money in his pocket. On the
other hand, if it rains, he will not go to the
theatre. When Ram does not go to the
movies, he stays home and watches the
television. He also dedicates some time to his
studies.
 
 
The variables involved are
 
M: Money in Ram’s Pocket
R: Rain
S: Ram Studies
T: Ram watches television.
G: Ram goes to the movies.
M and R are not influenced by any factor.
G is influenced by M and R.
S and T are influenced by G.
 
 
Belief Network of
How Ram Spends an Evening
 
 
Example 4 (cont.)
 
Let ¬A stand for the negation of the proposition A.
The probability that it does not rain, Ram does not have money, Ram goes to the movies, and Ram does not watch television:
P(¬R, ¬M, G, ¬T) = P(¬R) × P(¬M) × P(G | ¬R, ¬M) × P(¬T | G)
= 0.7 × 0.4 × 0.1 × 1.0 = 0.028
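As a check, a minimal Python sketch evaluating the same factorization; only the probability values that appear on the slide are used, and the variable names are illustrative.

  # Probabilities given on the slide; the rest of the network's conditional
  # probability tables are not provided, so only this one joint is computed.
  p_not_R = 0.7            # P(¬R): it does not rain
  p_not_M = 0.4            # P(¬M): Ram has no money in his pocket
  p_G_given = 0.1          # P(G | ¬R, ¬M): Ram still goes to the movies
  p_not_T_given_G = 1.0    # P(¬T | G): no television when he is at the movies

  joint = p_not_R * p_not_M * p_G_given * p_not_T_given_G
  print(joint)  # 0.028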
 
 
Reference
 
Murty, M. N., Devi, V. S.: Pattern Recognition:
An Algorithmic Approach (Undergraduate
Topics in Computer Science). Springer (2012)
 