Eclat Algorithm for Frequent Itemset Mining

Eclat Algorithm for Frequent Itemset Mining
Slide Note
Embed
Share

The Eclat algorithm is a powerful tool for mining frequent itemsets in transactional databases. It scans the database to identify itemsets that occur more frequently than a specified threshold, benefiting various scientific and industrial applications. Transactions are converted to vertical format, and infrequent items are removed based on minimum support. The algorithm operates recursively, generating new candidates and finding all frequent itemsets. See the process step-by-step with illustrations.

  • Eclat Algorithm
  • Frequent Itemset Mining
  • Transactional Databases
  • Vertical Format
  • Minimum Support

Uploaded on Mar 04, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. ECLAT Jelena Stan i 3212/2015

  2. ECLAT Equivalence Class Transformation A Frequent Itemset Mining (FIM) algorithm scans the database, and finds item-sets that occur in transactions more frequently than a given threshold. Scientific and industrial applications, including those in machine learning, computational biology, intrusion detection, web log mining, and e-business benefit from the use of frequent itemset mining. 2/35

  3. ECLAT Transactions, originally stored in horizontal format, are read from disk and converted to vertical format. TID Items Bread Milk Diaper 1 1 2 1 Bread, Milk 2 3 3 2 Bread, Diaper, Beer, Eggs 4 4 4 3 Milk, Diaper, Beer, Coke 5 5 5 4 Bread , Milk, Diaper, Beer 5 Bread , Milk, Diaper, Coke Beer Coke Eggs 2 2 3 3 5 4 3/35

  4. ECLAT The frequency of each item is counted and the infrequent items and their corresponding vertical lists are deleted from the vertical list. Bread Milk Diaper Bread Milk Diaper 1 1 2 1 1 2 2 3 3 2 3 3 4 4 4 4 4 4 5 5 5 5 5 5 Beer Coke Eggs Beer 2 2 3 2 3 5 3 4 4 4/35 Minimum support = 3

  5. ECLAT The Eclat algorithm is defined recursively. The initial call uses all the single items with their tidsets. In each recursive call, the function verifies each itemset-tidset pair with all the others pairs to generate new candidates. If the new candidate is frequent, it is added to the set. Then, recursively, it finds all the frequent itemsets in the branch. 5/35

  6. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 6/35

  7. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 7/35

  8. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} {Bread, Beer} 1 2 2 4 4 4 5 5 8/35

  9. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} {Bread, Beer} 1 2 2 4 4 4 5 5 9/35

  10. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} {Bread, Beer} 1 2 2 4 4 4 5 5 10/35

  11. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} {Bread, Beer} 1 2 2 4 4 4 5 5 {Bread, Milk, Diaper} 4 5 11/35

  12. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} {Bread, Beer} 1 2 2 4 4 4 5 5 {Bread, Milk, Diaper} 4 5 12/35

  13. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 13/35

  14. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 14/35

  15. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Milk, Diaper} {Milk, Beer} 3 3 4 4 5 15/35

  16. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Milk, Diaper} {Milk, Beer} 3 3 4 4 5 16/35

  17. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 17/35

  18. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 18/35

  19. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Diaper, Beer} 2 3 4 19/35

  20. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 20/35

  21. Bread Milk Diaper Beer 1 1 2 2 ECLAT 2 3 3 3 4 4 4 4 5 5 5 {Bread, Milk} {Bread, Diaper} 1 2 4 4 5 5 {Milk, Diaper} 3 4 5 {Diaper, Beer} 2 3 21/35 4

  22. ECLAT Uses vertical database tidset(bitset) intersections. Scans the database only once. Depth-first search algorithm. 22/35

  23. ECLAT 23/35

  24. ECLAT - PARALLEL 24/35

  25. ECLAT - PARALLEL Equivalent Class , which can be defined by a set of candidates with the same size, assumed k, shared the common k 1 prefix. {Bread, Milk} {Bread, Diaper} {Bread, Beer} {Milk, Diaper} {Milk, Beer} {Diaper, Beer} 25/35

  26. ECLAT - PARALLEL 26/35

  27. ECLAT CPU IMPLEMENTATION I public class AlgoEclat{ .... public Itemsets runAlgorithm{ final Map<Integer, Set<Integer>> mapItemCount = new HashMap<Integer, Set<Integer>>(); int maxItemId = calculateSupportSingleItems(database, mapItemCount); // (2) create the list of single items List<Integer> frequentItems = new ArrayList<Integer>(); // for each item for(Entry<Integer, Set<Integer>> entry : mapItemCount.entrySet()) { Set<Integer> tidset = entry.getValue(); int support = tidset.size(); int item = entry.getKey(); if(support >= minsupRelative) { frequentItems.add(item); saveSingleItem(item, tidset, tidset.size()); } } Collections.sort(frequentItems, new Comparator<Integer>() { public int compare(Integer arg0, Integer arg1) { return mapItemCount.get(arg0).size() - mapItemCount.get(arg1).size(); }}); // For each frequent item I according to the total order 27/35 for(int i=0; i < frequentItems.size(); i++) { Integer itemI = frequentItems.get(i); // we obtain the tidset and support of that item Set<Integer> tidsetI = mapItemCount.get(itemI); int supportI = tidsetI.size();

  28. ECLAT CPU IMPLEMENTATION II // We create empty equivalence class for storing all 2-itemsets starting with List<Integer> equivalenceClassIitems = new ArrayList<Integer>(); List<Set<Integer>> equivalenceClassItidsets = new ArrayList<Set<Integer>>(); for(int j=i+1; j < frequentItems.size(); j++) { int itemJ = frequentItems.get(j); Set<Integer> tidsetJ = mapItemCount.get(itemJ); int supportJ = tidsetJ.size(); Set<Integer> tidsetIJ = performANDFirstTime(tidsetI, supportI, tidsetJ, supportJ); if(calculateSupport(2, supportI, tidsetIJ) >= minsupRelative){ equivalenceClassIitems.add(itemJ); // We also keep the tidset of "ij". equivalenceClassItidsets.add(tidsetIJ); } } if(equivalenceClassIitems.size() > 0) { // This is done by a recursive call. Note that we pass // item I to that method as the prefix of that equivalence class. itemsetBuffer[0] = itemI; equivalenceClassItidsets); processEquivalenceClass(itemsetBuffer, 1, supportI, equivalenceClassIitems, } } 28/35 } ... }

  29. ECLAT CPU IMPLEMENTATION III private void processEquivalenceClass(int[] prefix, int prefixLength, int supportPrefix, List<Integer> equivalenceClassItems, List<Set<Integer>> equivalenceClassTidsets) throws IOException { int length = prefixLength+1; if(equivalenceClassItems.size() == 1) { int itemI = equivalenceClassItems.get(0); Set<Integer> tidsetItemset = equivalenceClassTidsets.get(0); save(prefix, prefixLength, itemI, tidsetItemset, calculateSupport(length, supportPrefix, tidsetItemset)); return; } if(equivalenceClassItems.size() == 2) { int itemI = equivalenceClassItems.get(0); Set<Integer> tidsetI = equivalenceClassTidsets.get(0); int supportI = calculateSupport(length, supportPrefix, tidsetI); save(prefix, prefixLength, itemI, tidsetI, supportI); int itemJ = equivalenceClassItems.get(1); Set<Integer> tidsetJ = equivalenceClassTidsets.get(1); int supportJ = calculateSupport(length, supportPrefix, tidsetJ); save(prefix, prefixLength, itemJ, tidsetJ, supportJ); Set<Integer> tidsetIJ = this.performAND(tidsetI, tidsetI.size(), tidsetJ, tidsetJ.size()); int supportIJ = calculateSupport(length, supportI, tidsetIJ); if(supportIJ >= minsupRelative) { int newPrefixLength = prefixLength+1; 29/35 prefix[prefixLength] = itemI; save(prefix, newPrefixLength, itemJ, tidsetIJ, supportIJ); } } return;

  30. ECLAT CPU IMPLEMENTATION IV for(int i=0; i< equivalenceClassItems.size(); i++) { int suffixI = equivalenceClassItems.get(i); Set<Integer> tidsetI = equivalenceClassTidsets.get(i); int supportI = calculateSupport(length, supportPrefix, tidsetI); save(prefix, prefixLength, suffixI, tidsetI, supportI); List<Integer> equivalenceClassISuffixItems= new ArrayList<Integer>(); List<Set<Integer>> equivalenceITidsets = new ArrayList<Set<Integer>>(); for(int j=i+1; j < equivalenceClassItems.size(); j++) { int suffixJ = equivalenceClassItems.get(j); Set<Integer> tidsetJ = equivalenceClassTidsets.get(j); int supportJ = calculateSupport(length, supportPrefix, tidsetJ); Set<Integer> tidsetIJ = performAND(tidsetI, supportI, tidsetJ, supportJ); int supportIJ = calculateSupport(length, supportI, tidsetIJ); if(supportIJ >= minsupRelative) { equivalenceClassISuffixItems.add(suffixJ); equivalenceITidsets.add(tidsetIJ); } } if(equivalenceClassISuffixItems.size() >0) { prefix[prefixLength] = suffixI; 30/35 int newPrefixLength = prefixLength+1; processEquivalenceClass(prefix, newPrefixLength, supportI, equivalenceClassISuffixItems, equivalenceITidsets); } }

  31. ECLAT GPU IMPLEMENTATION I 31/35

  32. ECLAT GPU IMPLEMENTATION II __global__ void kernel_calc( unsigned int** src_list_1, unsigned int** src_list_2, unsigned int** dst_list, unsigned int* result, unsigned int list_len,unsigned int vlist_len) { __shared__ unsigned int sup[MAX_THREAD]; pdst[thread_pos]=tmp; unsigned int* psrc1; unsigned int* psrc2; unsigned int* pdst; } __syncthreads(); for (bound = blockDim.x / 2; bound > 0; bound >>= 1) { if (threadIdx.x < bound) sup[threadIdx.x]+=sup[threadIdx.x+bound]; __syncthreads(); } unsigned int iter, i, tmp; unsigned int bound; if (blockIdx.x >= list_len) return; sup[threadIdx.x] = 0; __syncthreads(); iter = (vlist_len - 1) / blockDim.x + 1; if(threadIdx.x == 0) { *(result+current_block_pos)=sup[0]; } } psrc1 = src_list_1[blockIdx.x]; psrc2 = src_list_2[blockIdx.x]; pdst = dst_list[blockIdx.x]; __syncthreads(); for (i = 0; i < iter; i++) { int thread_pos = i * blockDim.x + threadIdx.x; if (thread_pos >= vlist_len) break; 32/35 tmp=psrc1[thread_pos] & psrc2[thread_pos]; sup[threadIdx.x]+=__popc(tmp);

  33. ECLAT CPU VS GPU System characteristics: Processor: Installed memory(RAM): System type: GPU: OS: Intel(R) Core(TM) i5-5200 CPU @ 2.20 GHZ 8.00 GB 64-bit Operating system, x64-based processor NVIDIA GeForce 940M Windows 10 Chess Retail 14 70 12 60 10 50 8 40 CPU CPU 6 30 GPU GPU 4 20 2 10 0 0 50% 60% 70% 80% 90% 100% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 33/35

  34. ECLAT CPU VS GPU Pumsb Accidents 160 140 7000 120 6000 100 5000 CPU 80 4000 CPU 60 GPU 3000 GPU 40 2000 20 1000 0 0 50% 60% 70% 80% 90% 100% 60% 70% 80% 90% 100% Mushroom Connect 1600 80 1400 70 1200 60 1000 50 CPU 800 40 CPU GPU 30 600 GPU 20 400 10 200 0 34/35 0 70% 80% 90% 100%

  35. THANK YOU FOR YOUR ATTENTION! Questions? 35/35

More Related Content