Understanding Join Algorithms in Database Systems

Slide Note
Embed
Share

This presentation delves into the intricacies of join algorithms in DBMS, focusing on various techniques such as simple nested loops join, block nested loops join, index nested loops join, sort-merge join, and hash join. The importance of optimizing joins to avoid unnecessary cross-products is emphasized, along with assumptions regarding equality joins. Dive deep into the realm of query optimization and execution strategies in database systems.


Uploaded on Sep 15, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Database Applications (15-415) DBMS Internals- Part VII Lecture 20, April 07, 2020 Mohammad Hammoud

  2. Today Last Session: DBMS Internals- Part VI Algorithms for Relational Operations Today s Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Announcements: P3 is due on Apr 18.

  3. DBMS Layers Queries Query Optimization and Execution Relational Operators Files and Access Methods Transaction Manager Recovery Manager Buffer Management Lock Manager Disk Space Management DB

  4. Outline Introduction The Selection Operation The Projection Operation The Join Operation

  5. The Join Operation Consider the following query, Q, which implies a join: SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid How can we evaluate Q? Compute R S Select (and project) as required But, the result of a cross-product is typically much larger than the result of a join Hence, it is very important to implement joins without materializing the underlying cross-product

  6. The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

  7. Assumptions We assume equality joins with: R representing Reserves and S representing Sailors M pages in R, pR tuples per page, m tuples total N pages in S, pS tuples per page, n tuples total We ignore the output and computational costs

  8. The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

  9. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) R(A,..) S(A, ......) m n

  10. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) for each tuple r of R for each tuple s of S if match: print (r, s) R(A,..) S(A, ......) m n

  11. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) Outer Relation Inner Relation for each tuple r of R for each tuple s of S if match: print (r, s) R(A,..) S(A, ......) m n

  12. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) How many disk accesses ( M and N are the numbers of pages for R and S )? R(A,..) S(A, ......) m n

  13. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) How many disk accesses ( M and N are the numbers of pages for R and S )? I/O Cost = M+m*N R(A,..) S(A, ......) m n

  14. Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) - Cost = M + (pR * M) * N = 1000 + 100*1000*500 I/Os - At 10ms/IO, total = ~6 days (!) I/O Cost = M+m*N R(A,..) S(A, ......) m n Can we do better?

  15. Nested Loops Join: A Simple Refinement Algorithm: Read in a page of R Read in a page of S Print matching tuples COST= ? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  16. Nested Loops Join: A Simple Refinement Algorithm: Read in a page of R Read in a page of S Print matching tuples COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  17. Nested Loops Join Which relation should be the outer? COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  18. Nested Loops Join Which relation should be the outer? A: The smaller (page-wise) COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  19. Nested Loops Join M=1000, N=500 - if larger is the outer: Cost = 1000 + 1000*500 = 501,000 = 5010 sec (~ 1.4h) COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  20. Nested Loops Join M=1000, N=500 - if smaller is the outer: Cost = 500 + 1000*500 = 500,500 = 5005 sec (~ 1.4h) COST= N+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  21. Summary: Simple Nested Loops Join What if we do not apply the page-oriented refinement? Cost = M+ (pR * M) * N = 1000 + 100*1000*500 I/Os At 10ms/IO, total = ~6 days (!) What if we apply the page-oriented refinement? Cost = M * N + M = 1000*500+1000 I/Os At 10ms/IO, total = 1.4 hours (!) What if the smaller relation is the outer? Slightly better

  22. The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

  23. Block Nested Loops What if we have Bbuffer pages available? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  24. Block Nested Loops What if we have B buffer pages available? A: Give B 1 buffer pages to outer and 1 page to inner R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  25. Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= ? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  26. Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= M+ M/(B 1) *N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  27. Block Nested Loops If the smallest (outer) relation fits in memory? That is, M = B 1 Cost =? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  28. Block Nested Loops If the smallest (outer) relation fits in memory? That is, M = B 1 Cost = N+M (minimum!) R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  29. Nested Loops - Guidelines Pick as outer the smallest table (= fewest pages) Fit as much of it in memory as possible Loop over the inner

  30. The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

  31. Index Nested Loops Join What if there is an index on one of the relations on the join attribute(s)? A: Leverage the index by making the indexed relation inner R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 31

  32. Index Nested Loops Join Assuming an index on S: for each tuple r of R for each tuple s of S where s.sid = r.sid Add (r, s) to result R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 32

  33. Index Nested Loops Join What will be the cost? Cost: M + m * c (c: look-up cost) c depends on the type of index, the adopted alternative and whether the index is clustered or un-clustered! R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 33

  34. The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

  35. Sort-Merge Join Sort both relations on join attribute(s) Scan each relation and merge This works only for equality join conditions! R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  36. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  37. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin NO 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  38. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  39. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Output the two tuples

  40. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  41. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  42. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Output the two tuples

  43. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  44. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin NO 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  45. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

  46. Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Continue the same way! Output the two tuples

  47. Sort-Merge Join COST = 2M* 1+logB 1 M/B + 2N* 1+logB 1 N/B + M + N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  48. Sort-Merge Join COST = 2M* 1+logB 1 M/B + 2N* 1+logB 1 N/B + M + N For B = 100: = 2*1000* 1+log99 1000/100 + 2*500* 1+log99 500/100 + 1000 + 500 = 2*1000*2 + 2*500*2 + 1000 + 500 = 7,500 I/Os R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  49. Sort-Merge vs. Block Nested Loop Assuming 100 buffer pages, Reserves and Sailors can be sorted in 2 passes Sort-Merge cost = 7,500 I/Os Block Nested Loop cost = ? I/Os R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

  50. Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= M+ M/(B 1) *N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Related


More Related Content