Association Analysis in Data Mining

 
Association Analysis
(An Example)
 
Yuzhen Ye
Luddy School of Informatics, Computing and Engineering
Spring 2020
 
From transaction data to association rules
 
Itemset
Definition
 A collection of one or more items; e.g., {A, B}
Support count/
support  
Frequency/fraction of occurrence of an
itemset; e.g.   
Support count for {A, B} is 3, and the support is 1/2
Frequent itemset 
An itemset whose support is greater than or
equal to a specified threshold; e.g., if the threshold is set at 3
(count), then {A, B} is a frequent itemset, but {A, B, C} is not.
Association rule
Definition
 An implication expression of the form X 
 Y, where X and Y
are itemsets; e.g., {A, B} 
 {C}
Support 
Fraction of transactions that contain both X and Y, P(X, Y)
Confidence 
Measures how often items in Y appear in transactions that
contain X, P(Y|X)
Association rule mining
: Given a set of transactions T, find all
rules X 
 Y 
having
support 
minsup 
threshold
confidence ≥ 
minconf 
threshold
 
 
 
 
 
 
 
Unique items:
A (apple), B (beer), C (coke),
D (diaper), E (egg)
 
Frequent itemset identification: Apriori
algorithm
A
B
C
D
E
AB
AC
AD
AE
BC
BD
BE
CD
CE
DE
ABC
ABD
ABE
ACD
ACE
BCD
BCE
CDE
ADE
BDE
ABCD
ABCE
ABDE
ACDE
BCDE
null
ABCDE
 
Support threshold = 3
D
A
BC
 
Infrequent itemset
 
Infrequent itemset (all of {D}’s supersets are infrequent)
 
frequent itemset
 
Apriori principle:
If an itemset is frequent, then all of its subsets
must also be frequent
Support-based pruning
:
If an itemset is infrequent, then all of its supersets
must also be infrequent so can be pruned
 
Candidate generation:
Fk-1 x F1 method, e.g., 
{B} + {A,E} 
 {A, B, E}
Fk-1 x Fk-1 method: 
Merge two frequent (k-1)-
itemsets if their first (k-2) items are identical;
e.g., {A,B}+{A,E} 
 {A, B, E}
Alternate F
k-1
 
x
 F
k-1
 Method
 
 
Support counting using a hash tree
 
1 5 9
 
1 3 6
 
3 4 5
 
transaction
 
Match transaction against 11 out of 15 candidates
 
Rule generation from frequent itemset
{A,B,E}
 
{}
 
E.g., generate all the association rules for itemset {A, B, E}
{B,E}
 
{A}
{A,E}
 
{B}
{A,B}
 
{E}
{B}
 
{A,E}
{E}
 
{A,B}
{A}
 
{B,E}
 
confidence
({B,E}
 
{A}) = support({A,B,E})/support({B,E}) =  3/5
confidence
({A,E}
 
{B}) = support({A,B,E})/support({A,E}) =  3/5
confidence
({A,B}
 
{E}) = support({A,B,E})/support({A,B}) =  4/5
(no need to scan the transactions again to compute confidence scores)
 
Minimum support = 0.5 (count: 3)
Minimum confidence = 0.8
 
Rules pruned
 
Compact representation of frequent itemsets
 
An itemset X is 
maximal
 frequent if it is
frequent and none of its immediate
supersets is frequent
An itemset X is closed if none of its
immediate supersets has the same support
as the itemset X.
 
A
B
C
D
E
AB
AC
AD
AE
BC
BD
BE
CD
CE
DE
ABC
ABD
ABE
ACD
ACE
BCD
BCE
CDE
ADE
BDE
ABCD
ABCE
ABDE
ACDE
BCDE
null
ABCDE
 
4
 
Minimum support = 0.5 (count=3)
ABE
 
4
 
3
 
3
AB
 
Closed but not maximal
 
Maximal
 
3
AE
 
Not closed
BE
CE
 
?
BCE
 
?
 
?
Slide Note
Embed
Share

Explore the concepts of association analysis, itemset definition, frequent itemset identification, Apriori algorithm, support counting using hash tree, and rule generation from frequent itemsets. Learn the principles of identifying frequent itemsets, support-based pruning, candidate generation, and rule generation to extract valuable insights from transaction data efficiently.

  • Data Mining
  • Association Analysis
  • Frequent Itemset
  • Apriori Algorithm
  • Rule Generation

Uploaded on Jul 25, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Association Analysis (An Example) Yuzhen Ye Luddy School of Informatics, Computing and Engineering Spring 2020

  2. From transaction data to association rules Itemset Definition A collection of one or more items; e.g., {A, B} Support count/support Frequency/fraction of occurrence of an itemset; e.g. Support count for {A, B} is 3, and the support is 1/2 Frequent itemset An itemset whose support is greater than or equal to a specified threshold; e.g., if the threshold is set at 3 (count), then {A, B} is a frequent itemset, but {A, B, C} is not. Association rule Definition An implication expression of the form X Y, where X and Y are itemsets; e.g., {A, B} {C} Support Fraction of transactions that contain both X and Y, P(X, Y) Confidence Measures how often items in Y appear in transactions that contain X, P(Y|X) Association rule mining: Given a set of transactions T, find all rules X Y having support minsup threshold confidence minconf threshold Unique items: A (apple), B (beer), C (coke), D (diaper), E (egg) TID Items 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} 4 {B, D, E} 5 {A, C, E} 6 {B, C, D, E}

  3. Frequent itemset identification: Apriori algorithm Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent Support-based pruning: If an itemset is infrequent, then all of its supersets must also be infrequent so can be pruned null A B C D E CE DE CD AD AE BC BD BE AC AB Candidate generation: Fk-1 x F1 method, e.g., {B} + {A,E} {A, B, E} Fk-1 x Fk-1 method: Merge two frequent (k-1)- itemsets if their first (k-2) items are identical; e.g., {A,B}+{A,E} {A, B, E} Alternate Fk-1x Fk-1 Method CDE ABC ABD ABE ACD ACE ADE BCD BCE BDE ABCE BCDE ABCD ABDE ACDE TID Items 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} 4 {B, E} 5 {A, B, C, E} 6 {B, C, D, E} ABCDE Support threshold = 3 A frequent itemset D Infrequent itemset (all of {D} s supersets are infrequent) BC Infrequent itemset

  4. Support counting using a hash tree Hash Function transaction 1 2 3 5 6 1 + 2 3 5 6 2 + 3 5 6 1,4,7 3,6,9 1 2 + 3 5 6 2,5,8 3 + 5 6 1 3 + 5 6 2 3 4 5 6 7 1 5 + 6 1 4 5 1 3 6 3 4 5 3 5 6 3 5 7 6 8 9 3 6 7 3 6 8 1 2 4 4 5 7 1 5 9 1 2 5 4 5 8 Match transaction against 11 out of 15 candidates

  5. Rule generation from frequent itemset E.g., generate all the association rules for itemset {A, B, E} TID Items {A,B,E} {} 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} {B,E} {A} {A,E} {B} {A,B} {E} 4 {B, E} Rules pruned 5 {A, B, C, E} 6 {B, C, D, E} {B} {A,E} {E} {A,B} {A} {B,E} Minimum support = 0.5 (count: 3) Minimum confidence = 0.8 confidence({B,E} {A}) = support({A,B,E})/support({B,E}) = 3/5 confidence({A,E} {B}) = support({A,B,E})/support({A,E}) = 3/5 confidence({A,B} {E}) = support({A,B,E})/support({A,B}) = 4/5 (no need to scan the transactions again to compute confidence scores)

  6. Compact representation of frequent itemsets An itemset X is maximal frequent if it is frequent and none of its immediate supersets is frequent An itemset X is closed if none of its immediate supersets has the same support as the itemset X. Minimum support = 0.5 (count=3) null 4 A B C D E CE DE CD AD AE BC BD BE AC AB 3 4 3 CDE 3 ABC ABD ABE ACD ACE ADE BCD BCE BDE Frequent Itemsets TID Items Closed Frequent Itemsets ABCE BCDE ABCD ABDE ACDE 1 {A, B} 2 {A, B, C, D, E} Maximal Frequent Itemsets ABCDE 3 {A, B, E} 4 {B, E} 5 {A, B, C, E} AB AE Not closed Closed but not maximal 6 {B, C, D, E} BE CE ? Maximal ? BCE ? ABE

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#