Join Algorithms in Database Systems

Slide Note

This presentation delves into the intricacies of join algorithms in DBMS, focusing on various techniques such as simple nested loops join, block nested loops join, index nested loops join, sort-merge join, and hash join. The importance of optimizing joins to avoid unnecessary cross-products is emphasized, along with assumptions regarding equality joins. Dive deep into the realm of query optimization and execution strategies in database systems.

thei_813 Follow

Uploaded on Sep 15, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Database Applications (15-415) DBMS Internals- Part VII Lecture 20, April 07, 2020 Mohammad Hammoud

Today Last Session: DBMS Internals- Part VI Algorithms for Relational Operations Today s Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Announcements: P3 is due on Apr 18.

DBMS Layers Queries Query Optimization and Execution Relational Operators Files and Access Methods Transaction Manager Recovery Manager Buffer Management Lock Manager Disk Space Management DB

Outline Introduction The Selection Operation The Projection Operation The Join Operation

The Join Operation Consider the following query, Q, which implies a join: SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid How can we evaluate Q? Compute R S Select (and project) as required But, the result of a cross-product is typically much larger than the result of a join Hence, it is very important to implement joins without materializing the underlying cross-product

The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

Assumptions We assume equality joins with: R representing Reserves and S representing Sailors M pages in R, pR tuples per page, m tuples total N pages in S, pS tuples per page, n tuples total We ignore the output and computational costs

The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) R(A,..) S(A, ......) m n

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) for each tuple r of R for each tuple s of S if match: print (r, s) R(A,..) S(A, ......) m n

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) Outer Relation Inner Relation for each tuple r of R for each tuple s of S if match: print (r, s) R(A,..) S(A, ......) m n

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) How many disk accesses ( M and N are the numbers of pages for R and S )? R(A,..) S(A, ......) m n

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) How many disk accesses ( M and N are the numbers of pages for R and S )? I/O Cost = M+m*N R(A,..) S(A, ......) m n

Simple Nested Loops Join Algorithm #0: (naive) nested loop (SLOW!) - Cost = M + (pR * M) * N = 1000 + 100*1000*500 I/Os - At 10ms/IO, total = ~6 days (!) I/O Cost = M+m*N R(A,..) S(A, ......) m n Can we do better?

Nested Loops Join: A Simple Refinement Algorithm: Read in a page of R Read in a page of S Print matching tuples COST= ? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops Join: A Simple Refinement Algorithm: Read in a page of R Read in a page of S Print matching tuples COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops Join Which relation should be the outer? COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops Join Which relation should be the outer? A: The smaller (page-wise) COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops Join M=1000, N=500 - if larger is the outer: Cost = 1000 + 1000*500 = 501,000 = 5010 sec (~ 1.4h) COST= M+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops Join M=1000, N=500 - if smaller is the outer: Cost = 500 + 1000*500 = 500,500 = 5005 sec (~ 1.4h) COST= N+M*N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Summary: Simple Nested Loops Join What if we do not apply the page-oriented refinement? Cost = M+ (pR * M) * N = 1000 + 100*1000*500 I/Os At 10ms/IO, total = ~6 days (!) What if we apply the page-oriented refinement? Cost = M * N + M = 1000*500+1000 I/Os At 10ms/IO, total = 1.4 hours (!) What if the smaller relation is the outer? Slightly better

The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

Block Nested Loops What if we have Bbuffer pages available? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops What if we have B buffer pages available? A: Give B 1 buffer pages to outer and 1 page to inner R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= ? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= M+ M/(B 1) *N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops If the smallest (outer) relation fits in memory? That is, M = B 1 Cost =? R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops If the smallest (outer) relation fits in memory? That is, M = B 1 Cost = N+M (minimum!) R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Nested Loops - Guidelines Pick as outer the smallest table (= fewest pages) Fit as much of it in memory as possible Loop over the inner

The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

Index Nested Loops Join What if there is an index on one of the relations on the join attribute(s)? A: Leverage the index by making the indexed relation inner R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 31

Index Nested Loops Join Assuming an index on S: for each tuple r of R for each tuple s of S where s.sid = r.sid Add (r, s) to result R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 32

Index Nested Loops Join What will be the cost? Cost: M + m * c (c: look-up cost) c depends on the type of index, the adopted alternative and whether the index is clustered or un-clustered! R(A,..) S(A, ......) M pages, N pages, m tuples n tuples 33

The Join Operation We will study five join algorithms, two of which enumerate the cross-product and three which do not Join algorithms which enumerate the cross-product: Simple Nested Loops Join Block Nested Loops Join Join algorithms which do not enumerate the cross-product: Index Nested Loops Join Sort-Merge Join Hash Join

Sort-Merge Join Sort both relations on join attribute(s) Scan each relation and merge This works only for equality join conditions! R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin NO 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Output the two tuples

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Output the two tuples

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin NO 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin ? 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0

Sort-Merge Join: An Example sid bid 28 28 31 31 31 58 day rname guppy yuppy dustin lubber lubber dustin YES 103 12/4/96 103 11/3/96 101 10/10/96 102 10/12/96 101 10/11/96 103 11/12/96 sid sname rating age 22 dustin 28 yuppy 31 lubber 44 guppy 58 rusty 7 9 8 5 10 45.0 35.0 55.5 35.0 35.0 Continue the same way! Output the two tuples

Sort-Merge Join COST = 2M* 1+logB 1 M/B + 2N* 1+logB 1 N/B + M + N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Sort-Merge Join COST = 2M* 1+logB 1 M/B + 2N* 1+logB 1 N/B + M + N For B = 100: = 2*1000* 1+log99 1000/100 + 2*500* 1+log99 500/100 + 1000 + 500 = 2*1000*2 + 2*500*2 + 1000 + 500 = 7,500 I/Os R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Sort-Merge vs. Block Nested Loop Assuming 100 buffer pages, Reserves and Sailors can be sorted in 2 passes Sort-Merge cost = 7,500 I/Os Block Nested Loop cost = ? I/Os R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Block Nested Loops Algorithm: Read in B 1 pages of R Read in a page of S Print matching tuples COST= M+ M/(B 1) *N R(A,..) S(A, ......) M pages, N pages, m tuples n tuples

Join Algorithms in Database Systems

Download Presentation

Presentation Transcript

Related

More Related Content