Pattern Recognition, Association Rules, and Rule-based Classifiers

 
Pattern Recognition
Association Rules and Rule-based Classifiers
 
Chumphol Bunkhumpornpat
Department of Computer Science
Faculty of Science
Chiang Mai University
 
Outline
 
Market Basket Analysis
Association Rules
Apriori Algorithm
Rule-based Classifiers
 
204453: Pattern Recognition
 
 
Market Basket Analysis
 
 
APRIORI ALGORITHM
 
Itemset
 
A set of items taken together is called an itemset.
An itemset that contains k items is called a k-itemset.
An itemset that occurs frequently is called a frequent itemset.
Frequent itemset mining is a data mining technique for identifying items that often occur together.
 
 
Frequent Itemset
 
An itemset is called frequent if its support meets a minimum support threshold. A rule derived from a frequent itemset is called strong if it also meets a minimum confidence threshold.
Support is the fraction of all transactions in which the items are purchased together in a single transaction.
Confidence of a rule A ⇒ B is the fraction of the transactions containing A that also contain B.
 
 
computer ⇒ antivirus [support = 2%, confidence = 60%]

A support of 2% for this rule means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
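
As a rough illustration of how the two measures are computed, here is a minimal Python sketch. The helper names (`count`, `support`, `confidence`) and the five baskets are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the data behind the 2% and 60% figures.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Hypothetical toy baskets (an assumption for illustration only).
baskets = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus"},
]

# Rule: computer => antivirus
rule_support = support(baskets, {"computer", "antivirus"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"computer"}, {"antivirus"})  # 3 of 4 computer baskets
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the left-hand side of the rule.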
 
 
Dataset
 
 
Apriori Algorithm
 
1. Join Step: This step generates candidate (k+1)-itemsets from the frequent k-itemsets by joining the set of frequent k-itemsets with itself.
2. Prune Step: This step scans the database to count the support of each candidate itemset. If a candidate itemset does not meet the minimum support, it is regarded as infrequent and is removed. This step is performed to reduce the number of candidate itemsets.
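
The join and prune steps can be sketched in Python roughly as follows. This is a minimal, unoptimized sketch rather than a reference implementation. The function name `apriori` and the nine transactions are assumptions: the dataset table is not part of this transcript, so the data below was picked to be consistent with the support counts used on the rule-generation slide. The subset check before counting is the usual Apriori property (every subset of a frequent itemset must itself be frequent), added here on top of the support scan.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using join and prune steps."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Prune step: scan the database and drop candidates below min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)

        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        keys = list(level)
        joined = {a | b for a in keys for b in keys if len(a | b) == k + 1}

        # Apriori property: keep a candidate only if all its k-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent


# Assumed transactional data (not shown in this transcript).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

frequent = apriori(transactions, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Each pass scans the whole database once, so the number of passes equals the size of the largest candidate itemset that survives the join.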
 
 
Transactional Data
 
Apriori Algorithm:
minimum support count = 2
 
 
Generating Association Rules
 
{I1,I2} ⇒ I5, confidence = 2/4 = 50%
{I1,I5} ⇒ I2, confidence = 2/2 = 100%
{I2,I5} ⇒ I1, confidence = 2/2 = 100%
I1 ⇒ {I2,I5}, confidence = 2/6 = 33%
I2 ⇒ {I1,I5}, confidence = 2/7 = 29%
I5 ⇒ {I1,I2}, confidence = 2/2 = 100%
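
The confidence values above can be reproduced with a short Python sketch. The support counts are read off the numerators and denominators listed on this slide. The function name `rules_from` and the 70% minimum confidence are assumptions made for illustration, since the slide states no threshold.

```python
from itertools import combinations

# Support counts taken from the fractions on the slide.
support_count = {
    frozenset(["I1"]): 6,
    frozenset(["I2"]): 7,
    frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4,
    frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2,
    frozenset(["I1", "I2", "I5"]): 2,
}


def rules_from(itemset, support_count, min_conf=0.0):
    """List (antecedent, consequent, confidence) for every non-empty proper subset."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            # confidence(A => B) = count(A and B together) / count(A)
            conf = support_count[itemset] / support_count[a]
            if conf >= min_conf:
                rules.append((sorted(a), sorted(itemset - a), conf))
    return rules


all_rules = rules_from({"I1", "I2", "I5"}, support_count)
strong_rules = rules_from({"I1", "I2", "I5"}, support_count, min_conf=0.7)  # assumed threshold

for antecedent, consequent, conf in all_rules:
    print(f"{antecedent} => {consequent}: {conf:.0%}")
```

Under the assumed 70% threshold, only the three rules with 100% confidence would be kept as strong rules.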
 
 
RULE-BASED CLASSIFIERS
 
0R Classifier
 
0R = "Zero Rule"
ZeroR is the simplest classification method: it relies only on the target and ignores all predictors.
The ZeroR classifier simply predicts the majority category (class).
ZeroR has no predictive power of its own, but it is useful for determining a baseline performance as a benchmark for other classification methods.
 
 
"Play Golf = Yes" is the ZeroR model for the following dataset with an accuracy of 0.64
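
A minimal Python sketch of ZeroR follows. The dataset table is not part of this transcript, so the label counts (9 "Yes" and 5 "No" out of 14 rows) are an assumption chosen to match the stated accuracy of 0.64. The function name `zero_r` is likewise hypothetical.

```python
from collections import Counter


def zero_r(labels):
    """Return the majority class; predictors are ignored entirely."""
    return Counter(labels).most_common(1)[0][0]


# Assumed target column: 9 "Yes" and 5 "No" (consistent with accuracy 0.64).
play_golf = ["Yes"] * 9 + ["No"] * 5

prediction = zero_r(play_golf)
accuracy = sum(label == prediction for label in play_golf) / len(play_golf)
print(prediction, round(accuracy, 2))
```

Any classifier that uses the predictors should beat this accuracy to be worth considering.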
 
 
1R Classifier
 
1R = "One Rule"
OneR generates one rule for each predictor in the data, then selects the rule with the smallest total error as its "one rule".
To create a rule for a predictor, construct a frequency table for that predictor against the target.
OneR produces rules only slightly less accurate than state-of-the-art classification algorithms while producing rules that are simple for humans to interpret.
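
The procedure can be sketched in Python as below. The function name `one_r` is hypothetical, and the 14 rows are the classic play-golf weather data, assumed here because the slide's table is not in this transcript. Ties between predictors are broken by taking the first one listed, which is a choice made for this sketch.

```python
from collections import Counter, defaultdict


def one_r(rows, predictors, target):
    """Return (predictor, rule, total_errors) for the lowest-error single-predictor rule."""
    best = None
    for p in predictors:
        # Frequency table: predictor value -> counts of each target class.
        table = defaultdict(Counter)
        for row in rows:
            table[row[p]][row[target]] += 1
        # Each predictor value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in table.items()}
        # Errors are the rows outside the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in table.values())
        if best is None or errors < best[2]:  # strict '<' keeps the first on ties
            best = (p, rule, errors)
    return best


columns = ["Outlook", "Temp", "Humidity", "Windy", "Play Golf"]
data = [  # assumed play-golf data (not shown in this transcript)
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
rows = [dict(zip(columns, values)) for values in data]

best_predictor, best_rule, total_errors = one_r(rows, columns[:-1], "Play Golf")
print(best_predictor, best_rule, f"{total_errors}/{len(rows)} errors")
```

On this assumed data, Outlook and Humidity both make 4 errors out of 14, so the selected rule depends on the tie-breaking choice.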
 
 
Decision Stump
 
 
 
A decision stump is a model consisting of a one-level decision tree.
It is a decision tree with one internal node (the root), which is immediately connected to the terminal nodes (its leaves).
It makes a prediction based on the value of just a single input feature.
Decision stumps are sometimes also called 1-rules.
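
For a numeric feature, a decision stump amounts to one threshold test at the root and one class label per leaf. The Python sketch below fits such a stump by trying the midpoints between consecutive feature values. The function names and the humidity readings are hypothetical, and minimizing training errors is only one possible split criterion, not necessarily the one used in the slides.

```python
from collections import Counter


def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label, errors) with the fewest training errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[3]:
            best = (threshold, left_label, right_label, errors)
    return best


def predict_stump(stump, x):
    """Root node test: go to the left leaf if x <= threshold, else to the right leaf."""
    threshold, left_label, right_label, _ = stump
    return left_label if x <= threshold else right_label


# Hypothetical humidity readings and play decisions (illustration only).
humidity = [65, 70, 75, 80, 85, 90, 95]
play = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No"]

stump = fit_stump(humidity, play)
print(stump, predict_stump(stump, 72), predict_stump(stump, 91))
```

For a categorical feature, the root would instead have one leaf per feature value.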
 
 
Decision Stump
 
 
JRIP in Weka
 
 
Reference
 
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufmann (2011)
https://www.softwaretestinghelp.com/apriori-algorithm/
 