 
Parallel Databases
 
COMP3211 Advanced Databases
 
Dr Nicholas Gibbins - nmg@ecs.soton.ac.uk
2014-2015
 
Overview
 
 
The I/O bottleneck
Parallel architectures
Parallel query processing
Inter-operator parallelism
Intra-operator parallelism
Bushy parallelism
Concurrency control
Reliability
 
The I/O Bottleneck
The Memory Hierarchy, Revisited

Type        Capacity             Latency
Registers   10^1 bytes           1 cycle
L1          10^4 bytes           <5 cycles
L2          10^5 bytes           5-10 cycles
RAM         10^9-10^10 bytes     20-30 cycles (~10^-8 s)
Hard Disk   10^11-10^12 bytes    10^6 cycles (~10^-3 s)
The I/O Bottleneck
 
Access time to secondary storage (hard disks) dominates
performance of DBMSes
 
Two approaches to addressing this:
Main memory databases (expensive!)
Parallel databases (cheaper!)
 
Increase I/O bandwidth by spreading data across a number of
disks
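
As a rough, hypothetical illustration of the bandwidth argument (the disk figures below are assumed for illustration, not from the lecture):

    # Scanning a 1 TB table from one disk vs. striped across 100 disks
    table_bytes = 10**12            # 1 TB table
    disk_bandwidth = 100 * 10**6    # 100 MB/s per disk (assumed figure)
    print(table_bytes / disk_bandwidth)          # ~10,000 s on a single disk
    print(table_bytes / (100 * disk_bandwidth))  # ~100 s across 100 disks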
 
Definitions
 
 
Parallelism
An arrangement or state that permits several operations or tasks to be
performed simultaneously rather than consecutively
 
Parallel Databases
have the ability to split
processing of data
access to data
across multiple processors, multiple disks
 
Why Parallel Databases
 
 
Hardware trends
Reduced elapsed time for queries
Increased transaction throughput
Increased scalability
Better price/performance
Improved application availability
Access to more data
In short, for better performance
 
Parallel Architectures
 
Shared Memory Architecture

Tightly coupled
Symmetric Multiprocessor (SMP)

(Diagram: several processors P share a single global memory M)
 
Software - Shared Memory

Less complex database software
Limited scalability
Single buffer
Single database storage
 
Shared Disc Architecture

Loosely coupled
Distributed Memory

(Diagram: several processors P, each with its own memory M, connected to shared discs)
 
Software - Shared Disc

Avoids memory bottleneck
Same page may be in more than one buffer at once, which can lead to incoherence
Needs global locking mechanism
Single logical database storage
Each processor has its own database buffer
 
Shared Nothing Architecture

Massively Parallel
Loosely Coupled
High Speed Interconnect (between processors)
 
Software - Shared Nothing

Each processor owns part of the data
Each processor has its own database buffer
One page is only in one local buffer, so no buffer incoherence
Needs distributed deadlock detection
Needs multiphase commit protocol
Needs to break SQL requests into multiple sub-requests
 
Hardware vs. Software Architecture

It is possible to use one software strategy on a different hardware arrangement
Also possible to simulate one hardware configuration on another
Virtual Shared Disk (VSD) makes an IBM SP shared nothing system look like a shared disc setup (for Oracle)
From this point on, we deal only with shared nothing
 
Shared Nothing Challenges
 
 
Partitioning the data
Keeping the partitioned data balanced
Splitting up queries to get the work done
Avoiding distributed deadlock
Concurrency control
Dealing with node failure
 
Parallel Query Processing
 
Dividing up the Work

(Diagram: an Application connects to a Coordinator Process, which distributes work across several Worker Processes)

Database Software on each node

(Diagram: applications App1 and App2 connect to coordinator processes C1 and C2 in separate DBMS instances; each node's DBMS runs worker processes W1 and W2)
 
Inter-Query Parallelism
 
 
Improves throughput
 
Different queries/transactions execute on different processors
(largely equivalent to material in lectures on concurrency)
 
Intra-Query Parallelism
 
 
Improves response times (lower latency)
Intra-operator (horizontal) parallelism
Operators decomposed into independent operator instances, which
perform the same operation on different subsets of data
Inter-operator (vertical) parallelism
Operations are overlapped
Pipeline data from one stage to the next without materialisation
Bushy (independent) parallelism
Subtrees in query plan executed concurrently
 
 
Intra-Operator Parallelism
 
Intra-Operator Parallelism

(Diagram: a single SQL Query is split into Subset Queries, each executed by a separate Processor)
 
Partitioning
 
 
Decomposition of operators relies on data being partitioned
across the servers that comprise the parallel database
Access data in parallel to mitigate the I/O bottleneck
Partitions should aim to spread I/O load evenly across servers
Choice of partitions affords different parallel query processing
approaches:
Range partitioning
Hash partitioning
Schema partitioning
 
Range Partitioning

(Diagram: rows are assigned to nodes by key range, e.g. A-H, I-P, Q-Z)

Hash Partitioning

(Diagram: rows of a table are assigned to nodes by a hash of the partitioning key)

Schema Partitioning

(Diagram: different tables are assigned to different nodes, e.g. Table 1 and Table 2)
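
As a minimal sketch of the first two strategies (the node count, key ranges and use of Python's built-in hash are illustrative assumptions):

    NODES = 3

    def range_partition(name):
        # Assign by first letter of the key: A-H, I-P, Q-Z
        first = name[0].upper()
        if first <= 'H':
            return 0
        if first <= 'P':
            return 1
        return 2

    def hash_partition(key):
        # Assign by a hash of the partitioning key
        return hash(key) % NODES

    print(range_partition("Smith"), hash_partition(42))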
Rebalancing Data

Data in proper balance
Data grows, performance drops
Add new nodes and discs
Redistribute data to new nodes
 
Intra-Operator Parallelism
 
 
Example query:
SELECT c1,c2 FROM t WHERE c1>5.5
Assumptions:
100,000 rows
Predicates eliminate 90% of the rows
Considerations for query plans:
Data shipping
Query shipping
 
Data Shipping

(Query plan: a single π c1,c2 over σ c1>5.5 over the union of partitions t1, t2, t3, t4)

(Diagram: four Workers each send 25,000 tuples over the network to a combined Coordinator-and-Worker, which evaluates the query and returns 10,000 (c1,c2) tuples)
 
Query Shipping

(Query plan: each partition t1, t2, t3, t4 is processed by its own π c1,c2 over σ c1>5.5)

(Diagram: four Workers each evaluate the query locally and send only 2,500 tuples over the network to the Coordinator, which returns 10,000 (c1,c2) tuples)
 
Query Shipping Benefits
 
 
Database operations are performed where the data are, as far
as possible
Network traffic is minimised
For basic database operators, code developed for serial
implementations can be reused
In practice, mixture of query shipping and data shipping has
to be employed
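
A minimal sketch of query shipping under the example's assumptions (partition contents are fabricated so that the predicate keeps roughly 10% of rows):

    def worker(rows):
        # SELECT c1,c2 FROM t WHERE c1>5.5, evaluated where the data lives
        return [(r["c1"], r["c2"]) for r in rows if r["c1"] > 5.5]

    # Four workers with 25,000 fabricated rows each
    partitions = [
        [{"c1": 6.0 if i % 10 == 0 else 1.0, "c2": i} for i in range(25_000)]
        for _ in range(4)
    ]

    # Coordinator merely unions the already-filtered partial results
    result = [t for part in partitions for t in worker(part)]
    print(len(result))  # 10,000 tuples cross the simulated network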
 
Inter-Operator Parallelism
 
Inter-Operator Parallelism
 
 
Allows operators with a producer-consumer dependency to be
executed concurrently
Results produced by producer are pipelined directly to consumer
Consumer can start before producer has produced all results
No need to materialise intermediate relations on disk (although
available buffer memory is a constraint)
Best suited to single-pass operators
 
Inter-Operator Parallelism

(Timeline diagram: two Scan-Join-Sort pipelines; Scan, Join and Sort overlap in time as results are pipelined between them)

Intra- + Inter-Operator Parallelism

(Timeline diagram: as above, but with multiple parallel instances of Scan, Join and Sort overlapping in time)
 
The Volcano Architecture
 
 
Basic operators as usual:
scan, join, sort, aggregate (sum, count, average, etc)
The Exchange operator
Inserted between the steps of a query to:
Pipeline results
Direct streams of data to the next step(s), redistributing as
necessary
Provides mechanism to support both vertical and horizontal
parallelism
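
A minimal sketch of the idea behind the exchange operator (the hash routing and list-based streams are illustrative; a real implementation pipelines tuples through queues between processes rather than materialising lists):

    def exchange(producer, n_consumers, key):
        # Consume the producer's output stream and redistribute each
        # tuple to a consumer's input stream by hashing a column
        streams = [[] for _ in range(n_consumers)]
        for tup in producer:
            streams[hash(tup[key]) % n_consumers].append(tup)
        return streams

    # Repartition a scan's output on column 0 for three join instances
    scan_output = iter([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
    join_inputs = exchange(scan_output, 3, key=0)
    print([len(s) for s in join_inputs])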
 
Exchange Operators

Example query:
SELECT county, SUM(order_item)
FROM customer, order
WHERE order.customer_id = customer.customer_id
GROUP BY county
ORDER BY SUM(order_item)
 
Exchange Operators

(Serial plan: SORT over GROUP over HASH JOIN over SCANs of Customer and Order)

Exchange Operators

(Parallel plan, step 1: an EXCHANGE operator above the Customer SCANs feeds three HASH JOIN instances)

Exchange Operators

(Parallel plan, step 2: a second EXCHANGE above the Order SCANs also feeds the three HASH JOIN instances; further EXCHANGE operators route the join output to parallel GROUP instances and then to the final SORT)
 
Bushy Parallelism

Execute subtrees concurrently

(Query tree diagram: projections and selections over relations R, S, T and U form independent subtrees that are executed concurrently)
 
Parallel Query Processing
 
Some Parallel Queries
 
 
Enquiry
Collocated Join
Directed Join
Broadcast Join
Repartitioned Join
 
Combine aspects of intra-operator and bushy parallelism
 
Orders Database

CUSTOMER (CUSTKEY, C_NAME, C_NATION)
ORDER (ORDERKEY, DATE, CUSTKEY, SUPPKEY)
SUPPLIER (SUPPKEY, S_NAME, S_NATION)
Enquiry/Query

"How many customers live in the UK?"

(Diagram: slave tasks SCAN and COUNT over multiple partitions of the customer table and return subcounts to the coordinator, which SUMs them and returns the result to the application)
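
A minimal sketch of this pattern (partition contents are fabricated for illustration):

    partitions = [
        [{"C_NAME": "Ann", "C_NATION": "UK"}, {"C_NAME": "Bob", "C_NATION": "FR"}],
        [{"C_NAME": "Cat", "C_NATION": "UK"}],
    ]

    def slave_count(rows):
        # SCAN + COUNT over one partition of CUSTOMER
        return sum(1 for r in rows if r["C_NATION"] == "UK")

    # Coordinator SUMs the subcounts returned by the slave tasks
    print(sum(slave_count(p) for p in partitions))  # 2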
Collocated Join

"Which customers placed orders in July?"

Requires a JOIN of CUSTOMER and ORDER
Tables are both partitioned on CUSTKEY (the same key), and therefore corresponding entries are on the same node

(Diagram: each node joins its local CUSTOMER and ORDER partitions; the UNION of the per-node results is returned to the application)
Directed Join

"Which customers placed orders in July?" (tables have different keys)

ORDER partitioned on ORDERKEY, CUSTOMER partitioned on CUSTKEY
Retrieve rows from ORDER, then use ORDER.CUSTKEY to direct appropriate rows to nodes with CUSTOMER

(Diagram: Slave Task 1 scans ORDER and directs each row to the node holding the matching CUSTOMER partition, where Slave Task 2 performs the join; the Coordinator UNIONs the results and returns them to the application)
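
A minimal sketch of the directing step (the hash-based placement of CUSTOMER partitions is an illustrative assumption):

    N_NODES = 4

    def customer_node(custkey):
        # Assumed placement: CUSTOMER hash-partitioned on CUSTKEY
        return hash(custkey) % N_NODES

    def direct(order_rows):
        # Route each ORDER row to the node holding the matching customer
        outgoing = [[] for _ in range(N_NODES)]
        for row in order_rows:
            outgoing[customer_node(row["CUSTKEY"])].append(row)
        return outgoing  # each node then joins locally with CUSTOMER

    orders = [{"ORDERKEY": 1, "CUSTKEY": 7}, {"ORDERKEY": 2, "CUSTKEY": 12}]
    print([len(rows) for rows in direct(orders)])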
Broadcast Join

"Which customers and suppliers are in the same country?"

SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION
Send all SUPPLIER rows to each CUSTOMER node (BROADCAST)

(Diagram: Slave Task 1 scans SUPPLIER and broadcasts it to Slave Task 2 on every CUSTOMER node; the Coordinator UNIONs the results and returns them to the application)
Repartitioned Join

"Which customers and suppliers are in the same country?"

SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION
Repartition both tables on *_NATION to localise and minimise the join effort

(Diagram: Slave Tasks 1 and 2 repartition SUPPLIER and CUSTOMER on *_NATION; Slave Task 3 joins the co-located partitions; the Coordinator UNIONs the results and returns them to the application)
 
Concurrency Control
 
Concurrency and Parallelism

A single transaction may update data in several different places
Multiple transactions may be using the same (distributed) tables simultaneously
One or several nodes could fail
Requires concurrency control and recovery across multiple nodes for:
Locking and deadlock detection
Two-phase commit to ensure 'all or nothing'
 
Locking and Deadlocks
 
 
With Shared Nothing architecture, each node is responsible
for locking its own data
No global locking mechanism
However:
T1 locks item A on Node 1 and wants item B on Node 2
T2 locks item B on Node 2 and wants item A on Node 1
Distributed Deadlock
 
Resolving Deadlocks

One approach: timeouts
Time out T2 after its wait exceeds a certain interval
The interval may need a random element to avoid 'chatter',
i.e. both transactions giving up at the same time and then trying again
Roll back T2 to let T1 proceed
Restart T2, which can now complete
 
Resolving Deadlocks
 
 
More sophisticated approach (DB2)
Each node maintains a local ‘wait-for’ graph
Distributed deadlock detector (DDD) runs at the catalogue
node for each database
Periodically, all nodes send their graphs to the DDD
DDD records all locks found in wait state
Transaction becomes a candidate for termination if found in
same lock wait state on two successive iterations
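
A minimal sketch of the underlying idea: merge the local wait-for graphs and look for a transaction that can reach itself (an illustration of cycle detection, not DB2's actual algorithm):

    def find_deadlocked(edges):
        # edges: (waiter, holder) pairs gathered from all nodes' graphs
        graph = {}
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)

        def reachable(src, dst, seen=()):
            return any(n == dst or (n not in seen and
                                    reachable(n, dst, seen + (n,)))
                       for n in graph.get(src, ()))

        # A transaction that can reach itself is deadlocked
        return [t for t in graph if reachable(t, t)]

    # Node 1 reports T1 waiting for T2; Node 2 reports T2 waiting for T1
    print(find_deadlocked([("T1", "T2"), ("T2", "T1")]))  # ['T1', 'T2']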
 
Reliability
 
Reliability
 
 
We wish to preserve the ACID properties for parallelised
transactions
Isolation is taken care of by 2PL protocol
Isolation implies Consistency
Durability can be taken care of node-by-node, with proper logging
and recovery routines
Atomicity is the hard part. We need to commit all parts of a
transaction, or abort all parts
Two-phase commit protocol (2PC) is used to ensure that
Atomicity is preserved
 
Two-Phase Commit (2PC)
 
Distinguish between:
The global transaction
The local transactions into which the global transaction is
decomposed
 
Global transaction is managed by a single site, known as the coordinator
Local transactions may be executed on separate sites, known as the participants
 
Phase 1: Voting
 
Coordinator sends “prepare T” message to all participants
Participants respond with either “vote-commit T” or
“vote-abort T”
Coordinator waits for participants to respond within a
timeout period
 
 
Phase 2: Decision
 
If all participants return “vote-commit T” (to commit), send
“commit T” to all participants. Wait for acknowledgements
within timeout period.
If any participant returns “vote-abort T”, send “abort T” to all
participants. Wait for acknowledgements within timeout
period.
When all acknowledgements received, transaction is
completed.
If a site does not acknowledge, resend global decision until it
is acknowledged.
 
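A minimal sketch of both phases from the coordinator's side (participants are simulated as objects; messaging, timeouts and logging are omitted, and all names are illustrative):

    class Participant:
        def __init__(self, ok=True):
            self.ok = ok  # whether the local transaction can commit

        def prepare(self, T):
            return "vote-commit" if self.ok else "vote-abort"

        def decide(self, decision, T):
            return "ack"

    def two_phase_commit(participants, T):
        # Phase 1: voting
        votes = [p.prepare(T) for p in participants]
        decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
        # Phase 2: decision (resent until acknowledged, in practice)
        for p in participants:
            assert p.decide(decision, T) == "ack"
        return decision

    print(two_phase_commit([Participant(), Participant()], "T"))          # commit
    print(two_phase_commit([Participant(), Participant(ok=False)], "T"))  # abort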
Normal Operation

(Message sequence chart between coordinator C and participant P: C sends "prepare T"; P replies "vote-commit T"; once vote-commit T is received from all participants, C sends "commit T"; P replies "ack")

Logging

(As above, with log records: C writes <begin-commit T> before sending "prepare T"; P writes <ready T> before voting; C writes <commit T> before sending "commit T"; P writes <commit T> before acknowledging; C writes <end T> once all acks are received)
 
Aborted Transaction

(Message sequence chart: P replies "vote-commit T", but vote-abort T is received from at least one other participant, so C sends "abort T" and P acknowledges; C logs <begin-commit T>, <abort T>, <end T>; P logs <ready T>, then <abort T>)

Aborted Transaction

(Message sequence chart: P itself replies "vote-abort T" and aborts; C sends "abort T" to the remaining participants; C logs <begin-commit T>, <abort T>, <end T>; P logs <abort T>)
 
State Transitions

(Commit case: C moves INITIAL to WAIT on sending "prepare T", then WAIT to COMMIT once vote-commit T is received from all participants; P moves INITIAL to READY on voting, then READY to COMMIT on receiving "commit T")

State Transitions

(Abort case: C moves INITIAL to WAIT, then WAIT to ABORT when vote-abort T is received from at least one participant; a participant that voted commit moves INITIAL to READY, then READY to ABORT)

State Transitions

(Abort case: a participant that votes abort moves INITIAL straight to ABORT; C moves INITIAL to WAIT, then WAIT to ABORT)
 
Coordinator State Diagram

(INITIAL to WAIT: sent "prepare T"; WAIT to ABORT: received "vote-abort T", sent "abort T"; WAIT to COMMIT: received "vote-commit T" from all, sent "commit T")
 
Participant State Diagram

(INITIAL to READY: received "prepare T", sent "vote-commit T"; INITIAL to ABORT: received "prepare T", sent "vote-abort T"; READY to COMMIT: received "commit T", sent "ack"; READY to ABORT: received "abort T", sent "ack")
 
Dealing with failures

If the coordinator or a participant fails during the commit, two things happen:
The other sites will time out while waiting for the next message from the failed site and invoke a termination protocol
When the failed site restarts, it tries to work out the state of the commit by invoking a recovery protocol

The behaviour of the sites under these protocols depends on the state they were in when the site failed
 
Termination Protocol: Coordinator
 
Timeout in WAIT
Coordinator is waiting for participants to vote on whether they're
going to commit or abort
A missing vote means that the coordinator cannot commit the global
transaction
Coordinator may abort the global transaction
Timeout in COMMIT/ABORT
Coordinator is waiting for participants to acknowledge successful
commit or abort
Coordinator resends global decision to participants who have not
acknowledged
 
 
Termination Protocol: Participant
 
Timeout in INITIAL
Participant is waiting for a “prepare T”
May unilaterally abort the transaction after a timeout
If “prepare T” arrives after unilateral abort, either:
resend the “vote-abort T” message or
ignore (coordinator then times out in WAIT)
Timeout in READY
Participant is waiting for the instruction to commit or abort – blocked
without further information
Participant can contact other participants to find one that knows the
decision – cooperative termination protocol
 
 
Recovery Protocol: Coordinator
 
Failure in INITIAL
Commit not yet begun, restart commit procedure
Failure in WAIT
Coordinator has sent “prepare T”, but has not yet received all
vote-commit/vote-abort messages from participants
Recovery restarts commit procedure by resending “prepare T”
Failure in COMMIT/ABORT
If coordinator has received all “ack” messages, complete successfully
Otherwise, terminate
 
 
Recovery Protocol: Participant
 
Failure in INITIAL
Participant has not yet voted
Coordinator cannot have reached a decision
Participant should unilaterally abort by sending “vote-abort T”
Failure in READY
Participant has voted, but doesn't know what the global decision was
Cooperative termination protocol
Failure in COMMIT/ABORT
Resend “ack” message
 
 
Parallel Utilities
 
Parallel Utilities
 
 
Ancillary operations can also exploit the parallel hardware
Parallel Data Loading/Import/Export
Parallel Index Creation
Parallel Rebalancing
Parallel Backup
Parallel Recovery