 
Parallel Databases
 
COMP3211 Advanced Databases
 
Dr Nicholas Gibbins - nmg@ecs.soton.ac.uk
2014-2015
 
Overview
 
 
The I/O bottleneck
Parallel architectures
Parallel query processing
Inter-operator parallelism
Intra-operator parallelism
Bushy parallelism
Concurrency control
Reliability
 
The I/O Bottleneck
The Memory Hierarchy, Revisited

Type        Capacity             Latency
Registers   10^1 bytes           1 cycle
L1          10^4 bytes           <5 cycles
L2          10^5 bytes           5-10 cycles
RAM         10^9-10^10 bytes     20-30 cycles (~10^-8 s)
Hard Disk   10^11-10^12 bytes    10^6 cycles (~10^-3 s)
The I/O Bottleneck
 
Access time to secondary storage (hard disks) dominates
performance of DBMSes
 
Two approaches to addressing this:
Main memory databases (expensive!)
Parallel databases (cheaper!)
 
Increase I/O bandwidth by spreading data across a number of
disks
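
As a rough, hypothetical illustration of the bandwidth argument (the disk figures below are assumed for illustration, not from the lecture):

    # Scanning a 1 TB table from one disk vs. striped across 100 disks
    table_bytes = 10**12            # 1 TB table
    disk_bandwidth = 100 * 10**6    # 100 MB/s per disk (assumed figure)
    print(table_bytes / disk_bandwidth)          # ~10,000 s on a single disk
    print(table_bytes / (100 * disk_bandwidth))  # ~100 s across 100 disks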
 
Definitions
 
 
Parallelism
An arrangement or state that permits several operations or tasks to be
performed simultaneously rather than consecutively
 
Parallel Databases
have the ability to split
processing of data
access to data
across multiple processors, multiple disks
 
Why Parallel Databases
 
 
Hardware trends
Reduced elapsed time for queries
Increased transaction throughput
Increased scalability
Better price/performance
Improved application availability
Access to more data
In short, for better performance
 
Parallel Architectures
 
Shared Memory Architecture

Tightly coupled
Symmetric Multiprocessor (SMP)

(Diagram: several processors P share a single global memory M)
 
Software - Shared Memory

Less complex database software
Limited scalability
Single buffer
Single database storage
 
Shared Disc Architecture

Loosely coupled
Distributed Memory

(Diagram: several processors P, each with its own memory M, connected to shared discs)
 
Software - Shared Disc

Avoids memory bottleneck
Same page may be in more than one buffer at once, which can lead to incoherence
Needs global locking mechanism
Single logical database storage
Each processor has its own database buffer
 
Shared Nothing Architecture

Massively Parallel
Loosely Coupled
High Speed Interconnect (between processors)
 
Software - Shared Nothing

Each processor owns part of the data
Each processor has its own database buffer
One page is only in one local buffer, so no buffer incoherence
Needs distributed deadlock detection
Needs multiphase commit protocol
Needs to break SQL requests into multiple sub-requests
 
Hardware vs. Software Architecture

It is possible to use one software strategy on a different hardware arrangement
Also possible to simulate one hardware configuration on another
Virtual Shared Disk (VSD) makes an IBM SP shared nothing system look like a shared disc setup (for Oracle)
From this point on, we deal only with shared nothing
 
Shared Nothing Challenges
 
 
Partitioning the data
Keeping the partitioned data balanced
Splitting up queries to get the work done
Avoiding distributed deadlock
Concurrency control
Dealing with node failure
 
Parallel Query Processing
 
Dividing up the Work

(Diagram: an Application connects to a Coordinator Process, which distributes work across several Worker Processes)

Database Software on each node

(Diagram: applications App1 and App2 connect to coordinator processes C1 and C2 in separate DBMS instances; each node's DBMS runs worker processes W1 and W2)
 
Inter-Query Parallelism
 
 
Improves throughput
 
Different queries/transactions execute on different processors
(largely equivalent to material in lectures on concurrency)
 
Intra-Query Parallelism
 
 
Improves response times (lower latency)
Intra-operator (horizontal) parallelism
Operators decomposed into independent operator instances, which
perform the same operation on different subsets of data
Inter-operator (vertical) parallelism
Operations are overlapped
Pipeline data from one stage to the next without materialisation
Bushy (independent) parallelism
Subtrees in query plan executed concurrently
 
 
Intra-Operator Parallelism
 
Intra-Operator Parallelism

(Diagram: a single SQL Query is split into Subset Queries, each executed by a separate Processor)
 
Partitioning
 
 
Decomposition of operators relies on data being partitioned
across the servers that comprise the parallel database
Access data in parallel to mitigate the I/O bottleneck
Partitions should aim to spread I/O load evenly across servers
Choice of partitions affords different parallel query processing
approaches:
Range partitioning
Hash partitioning
Schema partitioning
 
Range Partitioning

(Diagram: rows are assigned to nodes by key range, e.g. A-H, I-P, Q-Z)

Hash Partitioning

(Diagram: rows of a table are assigned to nodes by a hash of the partitioning key)

Schema Partitioning

(Diagram: different tables are assigned to different nodes, e.g. Table 1 and Table 2)
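
As a minimal sketch of the first two strategies (the node count, key ranges and use of Python's built-in hash are illustrative assumptions):

    NODES = 3

    def range_partition(name):
        # Assign by first letter of the key: A-H, I-P, Q-Z
        first = name[0].upper()
        if first <= 'H':
            return 0
        if first <= 'P':
            return 1
        return 2

    def hash_partition(key):
        # Assign by a hash of the partitioning key
        return hash(key) % NODES

    print(range_partition("Smith"), hash_partition(42))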
Rebalancing Data

Data in proper balance
Data grows, performance drops
Add new nodes and discs
Redistribute data to new nodes
 
Intra-Operator Parallelism
 
 
Example query:
SELECT c1,c2 FROM t WHERE c1>5.5
Assumptions:
100,000 rows
Predicates eliminate 90% of the rows
Considerations for query plans:
Data shipping
Query shipping
 
Data Shipping

(Query plan: a single π c1,c2 over σ c1>5.5 over the union of partitions t1, t2, t3, t4)

(Diagram: four Workers each send 25,000 tuples over the network to a combined Coordinator-and-Worker, which evaluates the query and returns 10,000 (c1,c2) tuples)
 
Query Shipping

(Query plan: each partition t1, t2, t3, t4 is processed by its own π c1,c2 over σ c1>5.5)

(Diagram: four Workers each evaluate the query locally and send only 2,500 tuples over the network to the Coordinator, which returns 10,000 (c1,c2) tuples)
 
Query Shipping Benefits
 
 
Database operations are performed where the data are, as far
as possible
Network traffic is minimised
For basic database operators, code developed for serial
implementations can be reused
In practice, mixture of query shipping and data shipping has
to be employed
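
A minimal sketch of query shipping under the example's assumptions (partition contents are fabricated so that the predicate keeps roughly 10% of rows):

    def worker(rows):
        # SELECT c1,c2 FROM t WHERE c1>5.5, evaluated where the data lives
        return [(r["c1"], r["c2"]) for r in rows if r["c1"] > 5.5]

    # Four workers with 25,000 fabricated rows each
    partitions = [
        [{"c1": 6.0 if i % 10 == 0 else 1.0, "c2": i} for i in range(25_000)]
        for _ in range(4)
    ]

    # Coordinator merely unions the already-filtered partial results
    result = [t for part in partitions for t in worker(part)]
    print(len(result))  # 10,000 tuples cross the simulated network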
 
Inter-Operator Parallelism
 
Inter-Operator Parallelism
 
 
Allows operators with a producer-consumer dependency to be
executed concurrently
Results produced by producer are pipelined directly to consumer
Consumer can start before producer has produced all results
No need to materialise intermediate relations on disk (although
available buffer memory is a constraint)
Best suited to single-pass operators
 
Inter-Operator Parallelism

(Timeline diagram: two Scan-Join-Sort pipelines; Scan, Join and Sort overlap in time as results are pipelined between them)

Intra- + Inter-Operator Parallelism

(Timeline diagram: as above, but with multiple parallel instances of Scan, Join and Sort overlapping in time)
 
The Volcano Architecture
 
 
Basic operators as usual:
scan, join, sort, aggregate (sum, count, average, etc)
The Exchange operator
Inserted between the steps of a query to:
Pipeline results
Direct streams of data to the next step(s), redistributing as
necessary
Provides mechanism to support both vertical and horizontal
parallelism
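
A minimal sketch of the idea behind the exchange operator (the hash routing and list-based streams are illustrative; a real implementation pipelines tuples through queues between processes rather than materialising lists):

    def exchange(producer, n_consumers, key):
        # Consume the producer's output stream and redistribute each
        # tuple to a consumer's input stream by hashing a column
        streams = [[] for _ in range(n_consumers)]
        for tup in producer:
            streams[hash(tup[key]) % n_consumers].append(tup)
        return streams

    # Repartition a scan's output on column 0 for three join instances
    scan_output = iter([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
    join_inputs = exchange(scan_output, 3, key=0)
    print([len(s) for s in join_inputs])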
 
Exchange Operators

Example query:
SELECT county, SUM(order_item)
FROM customer, order
WHERE order.customer_id = customer.customer_id
GROUP BY county
ORDER BY SUM(order_item)
 
Exchange Operators

(Serial plan: SORT over GROUP over HASH JOIN over SCANs of Customer and Order)

Exchange Operators

(Parallel plan, step 1: an EXCHANGE operator above the Customer SCANs feeds three HASH JOIN instances)

Exchange Operators

(Parallel plan, step 2: a second EXCHANGE above the Order SCANs also feeds the three HASH JOIN instances; further EXCHANGE operators route the join output to parallel GROUP instances and then to the final SORT)
 
Bushy Parallelism

Execute subtrees concurrently

(Query tree diagram: projections and selections over relations R, S, T and U form independent subtrees that are executed concurrently)
 
Parallel Query Processing
 
Some Parallel Queries
 
 
Enquiry
Collocated Join
Directed Join
Broadcast Join
Repartitioned Join
 
Combine aspects of intra-operator and bushy parallelism
 
Orders Database

CUSTOMER (CUSTKEY, C_NAME, C_NATION)
ORDER (ORDERKEY, DATE, CUSTKEY, SUPPKEY)
SUPPLIER (SUPPKEY, S_NAME, S_NATION)
Enquiry/Query

"How many customers live in the UK?"

(Diagram: slave tasks SCAN and COUNT over multiple partitions of the customer table and return subcounts to the coordinator, which SUMs them and returns the result to the application)
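
A minimal sketch of this pattern (partition contents are fabricated for illustration):

    partitions = [
        [{"C_NAME": "Ann", "C_NATION": "UK"}, {"C_NAME": "Bob", "C_NATION": "FR"}],
        [{"C_NAME": "Cat", "C_NATION": "UK"}],
    ]

    def slave_count(rows):
        # SCAN + COUNT over one partition of CUSTOMER
        return sum(1 for r in rows if r["C_NATION"] == "UK")

    # Coordinator SUMs the subcounts returned by the slave tasks
    print(sum(slave_count(p) for p in partitions))  # 2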
Collocated Join

"Which customers placed orders in July?"

Requires a JOIN of CUSTOMER and ORDER
Tables are both partitioned on CUSTKEY (the same key), and therefore corresponding entries are on the same node

(Diagram: each node joins its local CUSTOMER and ORDER partitions; the UNION of the per-node results is returned to the application)
Directed Join

"Which customers placed orders in July?" (tables have different keys)

ORDER partitioned on ORDERKEY, CUSTOMER partitioned on CUSTKEY
Retrieve rows from ORDER, then use ORDER.CUSTKEY to direct appropriate rows to nodes with CUSTOMER

(Diagram: Slave Task 1 scans ORDER and directs each row to the node holding the matching CUSTOMER partition, where Slave Task 2 performs the join; the Coordinator UNIONs the results and returns them to the application)
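
A minimal sketch of the directing step (the hash-based placement of CUSTOMER partitions is an illustrative assumption):

    N_NODES = 4

    def customer_node(custkey):
        # Assumed placement: CUSTOMER hash-partitioned on CUSTKEY
        return hash(custkey) % N_NODES

    def direct(order_rows):
        # Route each ORDER row to the node holding the matching customer
        outgoing = [[] for _ in range(N_NODES)]
        for row in order_rows:
            outgoing[customer_node(row["CUSTKEY"])].append(row)
        return outgoing  # each node then joins locally with CUSTOMER

    orders = [{"ORDERKEY": 1, "CUSTKEY": 7}, {"ORDERKEY": 2, "CUSTKEY": 12}]
    print([len(rows) for rows in direct(orders)])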
Broadcast Join

"Which customers and suppliers are in the same country?"

SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION
Send all SUPPLIER rows to each CUSTOMER node (BROADCAST)

(Diagram: Slave Task 1 scans SUPPLIER and broadcasts it to Slave Task 2 on every CUSTOMER node; the Coordinator UNIONs the results and returns them to the application)
Repartitioned Join

"Which customers and suppliers are in the same country?"

SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION
Repartition both tables on *_NATION to localise and minimise the join effort

(Diagram: Slave Tasks 1 and 2 repartition SUPPLIER and CUSTOMER on *_NATION; Slave Task 3 joins the co-located partitions; the Coordinator UNIONs the results and returns them to the application)
 
Concurrency Control
 
Concurrency and Parallelism

A single transaction may update data in several different places
Multiple transactions may be using the same (distributed) tables simultaneously
One or several nodes could fail
Requires concurrency control and recovery across multiple nodes for:
Locking and deadlock detection
Two-phase commit to ensure 'all or nothing'
 
Locking and Deadlocks
 
 
With Shared Nothing architecture, each node is responsible
for locking its own data
No global locking mechanism
However:
T1 locks item A on Node 1 and wants item B on Node 2
T2 locks item B on Node 2 and wants item A on Node 1
Distributed Deadlock
 
Resolving Deadlocks

One approach: timeouts
Time out T2 after its wait exceeds a certain interval
The interval may need a random element to avoid 'chatter',
i.e. both transactions giving up at the same time and then trying again
Roll back T2 to let T1 proceed
Restart T2, which can now complete
 
Resolving Deadlocks
 
 
More sophisticated approach (DB2)
Each node maintains a local ‘wait-for’ graph
Distributed deadlock detector (DDD) runs at the catalogue
node for each database
Periodically, all nodes send their graphs to the DDD
DDD records all locks found in wait state
Transaction becomes a candidate for termination if found in
same lock wait state on two successive iterations
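
A minimal sketch of the underlying idea: merge the local wait-for graphs and look for a transaction that can reach itself (an illustration of cycle detection, not DB2's actual algorithm):

    def find_deadlocked(edges):
        # edges: (waiter, holder) pairs gathered from all nodes' graphs
        graph = {}
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)

        def reachable(src, dst, seen=()):
            return any(n == dst or (n not in seen and
                                    reachable(n, dst, seen + (n,)))
                       for n in graph.get(src, ()))

        # A transaction that can reach itself is deadlocked
        return [t for t in graph if reachable(t, t)]

    # Node 1 reports T1 waiting for T2; Node 2 reports T2 waiting for T1
    print(find_deadlocked([("T1", "T2"), ("T2", "T1")]))  # ['T1', 'T2']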
 
Reliability
 
Reliability
 
 
We wish to preserve the ACID properties for parallelised
transactions
Isolation is taken care of by 2PL protocol
Isolation implies Consistency
Durability can be taken care of node-by-node, with proper logging
and recovery routines
Atomicity is the hard part. We need to commit all parts of a
transaction, or abort all parts
Two-phase commit protocol (2PC) is used to ensure that
Atomicity is preserved
 
Two-Phase Commit (2PC)
 
Distinguish between:
The global transaction
The local transactions into which the global transaction is
decomposed
 
Global transaction is managed by a single site, known as the coordinator
Local transactions may be executed on separate sites, known as the participants
 
Phase 1: Voting
 
Coordinator sends “prepare T” message to all participants
Participants respond with either “vote-commit T” or
“vote-abort T”
Coordinator waits for participants to respond within a
timeout period
 
 
Phase 2: Decision
 
If all participants return “vote-commit T” (to commit), send
“commit T” to all participants. Wait for acknowledgements
within timeout period.
If any participant returns “vote-abort T”, send “abort T” to all
participants. Wait for acknowledgements within timeout
period.
When all acknowledgements received, transaction is
completed.
If a site does not acknowledge, resend global decision until it
is acknowledged.
 
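A minimal sketch of both phases from the coordinator's side (participants are simulated as objects; messaging, timeouts and logging are omitted, and all names are illustrative):

    class Participant:
        def __init__(self, ok=True):
            self.ok = ok  # whether the local transaction can commit

        def prepare(self, T):
            return "vote-commit" if self.ok else "vote-abort"

        def decide(self, decision, T):
            return "ack"

    def two_phase_commit(participants, T):
        # Phase 1: voting
        votes = [p.prepare(T) for p in participants]
        decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
        # Phase 2: decision (resent until acknowledged, in practice)
        for p in participants:
            assert p.decide(decision, T) == "ack"
        return decision

    print(two_phase_commit([Participant(), Participant()], "T"))          # commit
    print(two_phase_commit([Participant(), Participant(ok=False)], "T"))  # abort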
Normal Operation

(Message sequence chart between coordinator C and participant P: C sends "prepare T"; P replies "vote-commit T"; once vote-commit T is received from all participants, C sends "commit T"; P replies "ack")

Logging

(As above, with log records: C writes <begin-commit T> before sending "prepare T"; P writes <ready T> before voting; C writes <commit T> before sending "commit T"; P writes <commit T> before acknowledging; C writes <end T> once all acks are received)
 
Aborted Transaction

(Message sequence chart: P replies "vote-commit T", but vote-abort T is received from at least one other participant, so C sends "abort T" and P acknowledges; C logs <begin-commit T>, <abort T>, <end T>; P logs <ready T>, then <abort T>)

Aborted Transaction

(Message sequence chart: P itself replies "vote-abort T" and aborts; C sends "abort T" to the remaining participants; C logs <begin-commit T>, <abort T>, <end T>; P logs <abort T>)
 
State Transitions

(Commit case: C moves INITIAL to WAIT on sending "prepare T", then WAIT to COMMIT once vote-commit T is received from all participants; P moves INITIAL to READY on voting, then READY to COMMIT on receiving "commit T")

State Transitions

(Abort case: C moves INITIAL to WAIT, then WAIT to ABORT when vote-abort T is received from at least one participant; a participant that voted commit moves INITIAL to READY, then READY to ABORT)

State Transitions

(Abort case: a participant that votes abort moves INITIAL straight to ABORT; C moves INITIAL to WAIT, then WAIT to ABORT)
 
Coordinator State Diagram

(INITIAL to WAIT: sent "prepare T"; WAIT to ABORT: received "vote-abort T", sent "abort T"; WAIT to COMMIT: received "vote-commit T" from all, sent "commit T")
 
Participant State Diagram

(INITIAL to READY: received "prepare T", sent "vote-commit T"; INITIAL to ABORT: received "prepare T", sent "vote-abort T"; READY to COMMIT: received "commit T", sent "ack"; READY to ABORT: received "abort T", sent "ack")
 
Dealing with failures

If the coordinator or a participant fails during the commit, two things happen:
The other sites will time out while waiting for the next message from the failed site and invoke a termination protocol
When the failed site restarts, it tries to work out the state of the commit by invoking a recovery protocol

The behaviour of the sites under these protocols depends on the state they were in when the site failed
 
Termination Protocol: Coordinator
 
Timeout in WAIT
Coordinator is waiting for participants to vote on whether they're
going to commit or abort
A missing vote means that the coordinator cannot commit the global
transaction
Coordinator may abort the global transaction
Timeout in COMMIT/ABORT
Coordinator is waiting for participants to acknowledge successful
commit or abort
Coordinator resends global decision to participants who have not
acknowledged
 
 
Termination Protocol: Participant
 
Timeout in INITIAL
Participant is waiting for a “prepare T”
May unilaterally abort the transaction after a timeout
If “prepare T” arrives after unilateral abort, either:
resend the “vote-abort T” message or
ignore (coordinator then times out in WAIT)
Timeout in READY
Participant is waiting for the instruction to commit or abort – blocked
without further information
Participant can contact other participants to find one that knows the
decision – cooperative termination protocol
 
 
Recovery Protocol: Coordinator
 
Failure in INITIAL
Commit not yet begun, restart commit procedure
Failure in WAIT
Coordinator has sent “prepare T”, but has not yet received all
vote-commit/vote-abort messages from participants
Recovery restarts commit procedure by resending “prepare T”
Failure in COMMIT/ABORT
If coordinator has received all “ack” messages, complete successfully
Otherwise, terminate
 
 
Recovery Protocol: Participant
 
Failure in INITIAL
Participant has not yet voted
Coordinator cannot have reached a decision
Participant should unilaterally abort by sending “vote-abort T”
Failure in READY
Participant has voted, but doesn't know what the global decision was
Cooperative termination protocol
Failure in COMMIT/ABORT
Resend “ack” message
 
 
Parallel Utilities
 
Parallel Utilities
 
 
Ancillary operations can also exploit the parallel hardware
Parallel Data Loading/Import/Export
Parallel Index Creation
Parallel Rebalancing
Parallel Backup
Parallel Recovery