Understanding Association Analysis in Data Mining

Slide Note
Embed
Share

Explore the concepts of association analysis, itemset definition, frequent itemset identification, Apriori algorithm, support counting using hash tree, and rule generation from frequent itemsets. Learn the principles of identifying frequent itemsets, support-based pruning, candidate generation, and rule generation to extract valuable insights from transaction data efficiently.


Uploaded on Jul 25, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Association Analysis (An Example) Yuzhen Ye Luddy School of Informatics, Computing and Engineering Spring 2020

  2. From transaction data to association rules Itemset Definition A collection of one or more items; e.g., {A, B} Support count/support Frequency/fraction of occurrence of an itemset; e.g. Support count for {A, B} is 3, and the support is 1/2 Frequent itemset An itemset whose support is greater than or equal to a specified threshold; e.g., if the threshold is set at 3 (count), then {A, B} is a frequent itemset, but {A, B, C} is not. Association rule Definition An implication expression of the form X Y, where X and Y are itemsets; e.g., {A, B} {C} Support Fraction of transactions that contain both X and Y, P(X, Y) Confidence Measures how often items in Y appear in transactions that contain X, P(Y|X) Association rule mining: Given a set of transactions T, find all rules X Y having support minsup threshold confidence minconf threshold Unique items: A (apple), B (beer), C (coke), D (diaper), E (egg) TID Items 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} 4 {B, D, E} 5 {A, C, E} 6 {B, C, D, E}

  3. Frequent itemset identification: Apriori algorithm Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent Support-based pruning: If an itemset is infrequent, then all of its supersets must also be infrequent so can be pruned null A B C D E CE DE CD AD AE BC BD BE AC AB Candidate generation: Fk-1 x F1 method, e.g., {B} + {A,E} {A, B, E} Fk-1 x Fk-1 method: Merge two frequent (k-1)- itemsets if their first (k-2) items are identical; e.g., {A,B}+{A,E} {A, B, E} Alternate Fk-1x Fk-1 Method CDE ABC ABD ABE ACD ACE ADE BCD BCE BDE ABCE BCDE ABCD ABDE ACDE TID Items 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} 4 {B, E} 5 {A, B, C, E} 6 {B, C, D, E} ABCDE Support threshold = 3 A frequent itemset D Infrequent itemset (all of {D} s supersets are infrequent) BC Infrequent itemset

  4. Support counting using a hash tree Hash Function transaction 1 2 3 5 6 1 + 2 3 5 6 2 + 3 5 6 1,4,7 3,6,9 1 2 + 3 5 6 2,5,8 3 + 5 6 1 3 + 5 6 2 3 4 5 6 7 1 5 + 6 1 4 5 1 3 6 3 4 5 3 5 6 3 5 7 6 8 9 3 6 7 3 6 8 1 2 4 4 5 7 1 5 9 1 2 5 4 5 8 Match transaction against 11 out of 15 candidates

  5. Rule generation from frequent itemset E.g., generate all the association rules for itemset {A, B, E} TID Items {A,B,E} {} 1 {A, B} 2 {A, B, C, D, E} 3 {A, B, E} {B,E} {A} {A,E} {B} {A,B} {E} 4 {B, E} Rules pruned 5 {A, B, C, E} 6 {B, C, D, E} {B} {A,E} {E} {A,B} {A} {B,E} Minimum support = 0.5 (count: 3) Minimum confidence = 0.8 confidence({B,E} {A}) = support({A,B,E})/support({B,E}) = 3/5 confidence({A,E} {B}) = support({A,B,E})/support({A,E}) = 3/5 confidence({A,B} {E}) = support({A,B,E})/support({A,B}) = 4/5 (no need to scan the transactions again to compute confidence scores)

  6. Compact representation of frequent itemsets An itemset X is maximal frequent if it is frequent and none of its immediate supersets is frequent An itemset X is closed if none of its immediate supersets has the same support as the itemset X. Minimum support = 0.5 (count=3) null 4 A B C D E CE DE CD AD AE BC BD BE AC AB 3 4 3 CDE 3 ABC ABD ABE ACD ACE ADE BCD BCE BDE Frequent Itemsets TID Items Closed Frequent Itemsets ABCE BCDE ABCD ABDE ACDE 1 {A, B} 2 {A, B, C, D, E} Maximal Frequent Itemsets ABCDE 3 {A, B, E} 4 {B, E} 5 {A, B, C, E} AB AE Not closed Closed but not maximal 6 {B, C, D, E} BE CE ? Maximal ? BCE ? ABE

Related