LCJoin: Set Containment Join via List Crosscutting

lcjoin set containment join via list crosscutting l.w
1 / 19
Embed
Share

Explore the concept of LCJoin, a method for performing set containment join via list crosscutting. The content covers problem definitions, existing solutions like Intersection-Oriented and Union-Oriented methods, and motivations behind different approaches. Dive into experimental results, conclusion, and examples of 4-bit signatures for better understanding. Discover how LCJoin leverages list cross-cutting to efficiently handle set containment joins.

  • LCJoin
  • Set Containment Join
  • List Crosscutting
  • Existing Solutions
  • Intersection-Oriented Methods

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. LCJoin: Set Containment Join via List Crosscutting Dong Deng, Chengcheng Yang, Shuo Shang, Fan Zhu, Li Liu, Ling Shao Inception Institute of Artificial Intelligence

  2. Contents Set Containment Join Existing Solutions LCJoin Experimental Results Conclusion

  3. Problem Definition Set Containment Join (??,??),(??,??) Find All Pairs Such That ? ? ? Rid Sid ?1 {?1,?2,?4,?5,?6} ?1 {?1,?2,?3,?4} ?2 {?2,?4,?5} ?2 {?2,?3,?5} ?3 {?1,?2,?3,?4,?6} ?3 {?1,?2,?5,?6} Job Advertisements Job Seeker Publish/Subscribe

  4. Existing Solutions Set Containment Join (SCJ) SCJ Intersection- Oriented Method Union-Oriented Method Basic Idea 1 1 Generate Signature ???(?)/ ???(?) for Each Set in and ? Build an Inverted Index for ? 2 Generate Candidate Pairs (R, S) Where ???(?) ???(?) 2 For Each R in , Intersect all Inverted Lists Corresponding to Elements in R 3 Verify Candidates

  5. Motivation Existing Solutions Union-Oriented Methods Example: 4-bit Signatures for R and S. To set a bit, i mod 4 for each ?? ? Rid Sid Sig Sig ?1 {?1,?2,?4,?5,?6} 1110 1111 ?1 {?1,?2,?3,?4} ?2 {?2,?4,?5} 1110 0111 ?2 {?2,?3,?5} ?3 {?1,?2,?3,?4,?6} 1111 ?3 {?1,?2,?5,?6} 0110 Enumerate & Lookup Candidates: (??,??), (??,??), (??,??) Verification Result: (??,??) Large Signature Size High Enumeration Cost Signature Size? Small Signature Size Many Candidates. High Verification Cost

  6. Existing Solutions Intersection-Oriented Methods ? Rid Sid ?1 {?1,?2,?4,?5,?6} ?1 {?1,?2,?3,?4} ?2 {?2,?4,?5} ?2 {?2,?3,?5} ?3 {?1,?2,?3,?4,?6} Build Inverted Index ??(??): ?3 {?1,?2,?5,?6} ??, ?? Result of ??= ??(??) ??(??) ??(??) ???? = ?? ??(??): ??, ??, ?? Result of ??= ??(??): ?? Result of ??= ?? ??(??): ??, ??, ?? ??(??): ??, ?? Verification Free ??(??): ??, ??

  7. LCJoin List Cross-Cutting Based Set Containment Join ?1 ?2 ?3 ?4 ??(?1) ?3 ?4 ?5 ?6 ?7 ?1 ?2 ?3 ?4 ??(?1) ?3 ?4 ?5 ?6 ?7 ? ? ??(?2) ??(?3) ??(?4) ??(?2) ??(?3) ??(?4) ?1 ?2 ?3 ?7 ?1 ?2 ?3 ?5 ?6 ?7 ?1 ?3 ?4 ?5 ?6 ?1 ?2 ?3 ?7 ?1 ?2 ?3 ?5 ?6 ?7 ?1 ?3 ?4 ?5 ?6 Cross-Cutting Rip-Cutting ??(?) ??(?) GAP ??(?) ??(?) GAP GAP

  8. LCJoin List Cross-Cutting Based Set Containment Join Example s1s10s13s14s18s25s27s30 ??(?1) s1s13s17s20s21s23s25 ??(?2) s7s12s13s16s26s29 ??(?3) s1s3s13s15s19 ??(?4) MaxSid

  9. LCJoin List Cross-Cutting Based Set Containment Join Early Termination s1s10s13s14s18s25s27s30 ??(?1) s1s13s17s20s21s23s25 ??(?2) s7s12s13s16s26s29 ??(?3) s1s3s13s15s19 ??(?4) MaxSid

  10. LCJoin List Cross-Cutting Based Set Containment Join Two Optimizations Design a prefix-tree-based method to share the computation between sets in Partition the data to further improve efficiency

  11. LCJoin Prefix-Tree Based Method PostOrder-Traverse root ?(?1) ?(?2) ?(?3) ??????? = ?? ??exists on the nodes from root to leaf ??. result pair (??,??) ?1 ?????? = ?? ??????? = ?1 ?3 ?4 ?5 ?6 ?7 ?1 ?2 ?3 ?5 ?6 ?7 ?1 ?2 ?3 ?7 ??????? = ?? ?2 ?????? = ?? ??????? = ?1 ??????? = ?? ??????? = ?? ?(?4) ?(?5) ?(?6) ?3 ?5 ?????? = ?? ??????? = ? ?????? = ?? ??????? = ?1 ?1 ?3 ?4 ?5 ?6 ?1 ?2 ?4 ?5 ?1 ?3 ?4 ?5 ?6 ?7 ??????? = ?? ??????? = ?? ?6 ?4 ?????? = ?? ??????? = ?? ?????? = ?? ??????? = ?1 ?1 ?2 use the max NextMax on the path from root to leaf as the next MaxSid

  12. SAL-Hashing LCJoin Data Partition ???? ?(?2) ?(?3) ?(?1) ?(?2) ?(?3) ?3 ?7 ?1 ?2 ?3 ?7 ?3 ?4 ?5 ?6 ?7 ?1 ?2 ?3 ?5 ?6 ?7 ?1 ?2 ?3 ?7 ?1 ?2 Use ??,??,??,?? ?3 ?2 ?(?4) ?(?5) ?(?6) Build Local Inverted Index ?1 ?3 ?1 ?2 ?1 ?3 ?7 ?(?4) ?(?5) ?(?6) ?3 ?5 ?5 ?1 ?3 ?4 ?5 ?6 ?1 ?2 ?4 ?5 ?1 ?3 ?4 ?5 ?6 ?7 Shorter Inverted Lists ?6 ?3 ?4 ?1 ?2

  13. SAL-Hashing LCJoin Data Partition ???? Benefit Shorter Inverted List for Each Node on This Partition Cost ?(?1) ?(?2) ?(?3) Build Local Inverted Index ?3 ?4 ?5 ?6 ?7 ?1 ?2 ?3 ?5 ?6 ?7 ?1 ?2 ?3 ?7 ?1 ?2 ?3 ?2 ?(?4) ?(?5) ?(?6) ?3 ?5 ?5 ?1 ?3 ?4 ?5 ?6 ?1 ?2 ?4 ?5 ?1 ?3 ?4 ?5 ?6 ?7 Build Local Inverted Index ?6 ?3 ?4 Use Original Built Inverted Index ?1 ?2

  14. Experimental Results Experimental Setup Ubuntu 16.04 Intel Xeon Gold-6148 Processor 64G DRAM Characteristics of Our Datasets Dataset # of Sets Min/Max/Avg Size # of Elements Z-Value FLICKR 3,546,729 1 / 1230 / 5.4 618,971 0.63 AOL 36,389,577 1 / 125 / 2.5 3,849,556 0.68 ORKUT 15,301,901 2 / 9120 / 7 2,322,299 0.13 TWITTER 28,819,434 2 / 4998 / 9 13,096,918 0.3

  15. Experimental Results Evaluation of the tree-based methods FLICKR TWITTER AOL Tree Based Method Can Share Computations in Common Prefixes of Sets in

  16. Experimental Results Evaluation of data partition methods AOL TWITTER AOL Dynamic Partition achieved the best performance

  17. Experimental Results Comparing with State-of-the-Art ? ?? TWITTER AOL LCJoin Outperforms the State-of-the-Art Method by Up to ??

  18. Conclusion A cross-cutting based list intersection method, which can skip many irrelevant entries. A A A prefix tree on to share the computation across sets. Data partitions to further improve the efficiency and scalability. B B LCJoin C C Experimenting to demonstrate the superiority of LCJoin D D

  19. Q&A

Related


More Related Content