H-Store: A High-Performance, Distributed Main Memory Transaction Processing System

 
Robert Kallman, Hideaki Kimura, Jonathan Natkins,
Andrew Pavlo, Alex Rasin, Stanley B. Zdonik,
Evan P. C. Jones, Samuel Madden, Michael Stonebraker,
Yang Zhang, John Hugg, Daniel J. Abadi
 
Paper highlights
 
An experimental main-memory, parallel DBMS
Optimized for on-line transaction processing (OLTP) applications
Highly distributed, row-store-based relational database
Runs on a cluster of shared-nothing, main-memory executor nodes
 
Background
 
Relational DBMS
 
Data are stored in tables
Each row corresponds to a record
No pointers or other links between rows
Rows are related through matching keys
 
Example

Student | PSID | Major
Alan | 08887 | CS
Barbara | 13339 | CE

Second table: PSID | Mid | Final

PSID is a key
Links rows among tables
Must be unique
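A minimal sketch of this example using Python's built-in sqlite3 module. The column names follow the slide; the scores are hypothetical placeholder values, not the slide's data. It shows rows from two tables being linked purely by matching PSID values, and the key's uniqueness being enforced by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PSID is the key: PRIMARY KEY makes the database reject duplicate values
conn.execute("CREATE TABLE student (psid TEXT PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("CREATE TABLE grades (psid TEXT PRIMARY KEY, mid INTEGER, final INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("08887", "Alan", "CS"), ("13339", "Barbara", "CE")],
)
# Hypothetical scores, for illustration only
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("08887", 70, 75), ("13339", 88, 92)],
)


def scores_by_name(conn):
    # No pointers: rows of the two tables are linked only by equal psid values
    return conn.execute(
        "SELECT s.name, g.mid, g.final "
        "FROM student AS s JOIN grades AS g ON s.psid = g.psid "
        "ORDER BY s.name"
    ).fetchall()


def insert_student(conn, psid, name, major):
    """Return True if inserted, False if the PSID is already taken."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (psid, name, major))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling insert_student with an existing PSID such as "08887" returns False, because the key must be unique.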
 
Atomic Transactions: You buy a car
 
What are atomic transactions?
 
A mechanism used in databases and in financial systems
The system guarantees that an atomic transaction will either execute properly or abort without leaving any trace
All-or-nothing semantics
Atomic transactions satisfy the four ACID properties
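The "you buy a car" scenario can be sketched with sqlite3: paying and transferring ownership happen in one transaction, so a failure between the two steps leaves no trace. The account and car tables, the VIN, and the simulated failure are hypothetical. The sketch relies on sqlite3's connection context manager, which commits when the block finishes and rolls back when an exception escapes.

```python
import sqlite3


def open_dealership():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE car (vin TEXT PRIMARY KEY, owner TEXT NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("buyer", 30000), ("dealer", 0)])
    conn.execute("INSERT INTO car VALUES ('VIN123', 'dealer')")
    conn.commit()
    return conn


def buy_car(conn, price, fail_midway=False):
    # The context manager commits if the block finishes and rolls back if it raises
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE owner = 'buyer'", (price,))
        conn.execute("UPDATE account SET balance = balance + ? WHERE owner = 'dealer'", (price,))
        if fail_midway:
            # Simulated failure after the payment but before the title changes hands
            raise RuntimeError("failure before ownership transfer")
        conn.execute("UPDATE car SET owner = 'buyer' WHERE vin = 'VIN123'")


def snapshot(conn):
    balances = dict(conn.execute("SELECT owner, balance FROM account"))
    owner = conn.execute("SELECT owner FROM car WHERE vin = 'VIN123'").fetchone()[0]
    return balances, owner
```

After a call to buy_car(conn, 25000, fail_midway=True) raises, snapshot(conn) still shows the buyer's full balance and the dealer as owner: the payment was undone together with everything else in the transaction.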
 
The ACID properties
 
Atomicity: All-or-nothing property
Consistency: A transaction either brings the data to a new consistent state or returns them to their previous state
Isolation: A transaction that is in progress and not yet committed has no effect on any other transaction
Durability: Committed data are stored by the system in some kind of crash-proof storage
 
Importance of atomic transactions
 
Atomicity and consistency properties guarantee that either the transaction completes correctly or it leaves no trace
No partial updates
No incorrect updates
Isolation property allows concurrent execution of transactions
Much faster than serial execution
Durability property ensures transactions will not be lost
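To illustrate "no partial updates" and "no incorrect updates" together, the hypothetical transfer below credits one account and then debits another inside a single sqlite3 transaction. A CHECK constraint stands in for an application consistency rule (no negative balances); when the debit would break it, the already-applied credit is rolled back as well. Table names and amounts are assumptions.

```python
import sqlite3


def open_bank():
    conn = sqlite3.connect(":memory:")
    # CHECK encodes the consistency rule: no balance may ever be negative
    conn.execute(
        "CREATE TABLE account ("
        "owner TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if committed, False if rolled back."""
    try:
        with conn:  # commit on success, roll back on any exception
            # Credit first: on its own this would be a partial update if the debit failed
            conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, dst))
            # A debit that breaks the CHECK rule raises IntegrityError, and the whole
            # transaction (credit included) is rolled back
            conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))
```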
 
Back to the paper
 
Motivation
 
Legacy OLTP databases
Too many of their architectural components are old
Inherited from the original System R
Mid-seventies!
Take advantage of recent trends
Multi-core architectures
Cheap, abundant main memory
Dominant use of stored procedures
 
The focus
 
Reject the “one size fits all” approach

On-line transaction processing (OLTP) systems have specific properties
Repetitive, short-lived transactions
Stored procedures
Sole focus of this work
 
 
Main issue
 
Poor I/O performance of RDBMS
Their solution
Scale system “horizontally”
Partition responsibilities among multiple shared-nothing machines
Store the entire DB in the memory of a large cluster of server machines
Rely on replication to minimize the risk of data loss
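The "partition and replicate" idea can be sketched as follows. The hash function (crc32), the consecutive-machine replica placement, and all names are assumptions for illustration, not H-Store's actual scheme. crc32 is used instead of Python's built-in hash() because string hashing is randomized per process, and every machine must agree on where a row lives.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    # Deterministic hash so every machine computes the same placement
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def place_replicas(num_partitions, machines, copies):
    # Partition p lives on `copies` consecutive machines, wrapping around the list
    if copies > len(machines):
        raise ValueError("cannot place more copies than there are machines")
    return {
        p: [machines[(p + c) % len(machines)] for c in range(copies)]
        for p in range(num_partitions)
    }


def load(rows, key, num_partitions, machines, copies):
    """Return (placement, memory) where memory maps machine -> rows kept in its RAM."""
    placement = place_replicas(num_partitions, machines, copies)
    memory = defaultdict(list)
    for row in rows:
        for machine in placement[partition_of(row[key], num_partitions)]:
            memory[machine].append(row)
    return placement, dict(memory)
```

With four partitions, three machines, and two copies, losing any single machine still leaves a copy of every row in some machine's memory, which is how replication reduces the risk of data loss.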
 
H-Store
 
Next generation OLTP system
Operates on a distributed cluster of shared-nothing machines
Coordinates the work of multiple single-threaded engines
All data are always kept in main memory
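A toy model of "multiple single-threaded engines": each hypothetical Site owns a private dict and drains its own request queue one request at a time, so no locks are needed, while a Coordinator only decides which site receives a request. The class names and the crc32 routing are assumptions; a procedure here is just a Python function that takes the site's data, standing in for a stored procedure.

```python
import zlib
from collections import deque


class Site:
    """Single-threaded engine: owns its data and runs requests strictly one at a time."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.data = {}        # private in-memory state, never shared with other sites
        self.queue = deque()  # pending (procedure, args) requests in arrival order

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run(self):
        # Serial execution: each request sees the effects of all earlier ones, no locking
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.data, *args))
        return results


class Coordinator:
    def __init__(self, num_sites):
        self.sites = [Site(i) for i in range(num_sites)]

    def route(self, key):
        return self.sites[zlib.crc32(str(key).encode("utf-8")) % len(self.sites)]

    def submit(self, key, procedure, *args):
        self.route(key).submit(procedure, key, *args)

    def run_all(self):
        return {site.site_id: site.run() for site in self.sites}


def increment(data, key):
    # Stand-in for a stored procedure: bump a counter held by the owning site
    data[key] = data.get(key, 0) + 1
    return data[key]
```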
 
System Overview
 
H-Store
Cluster containing two or more computational nodes
Nodes
Single physical component that holds multiple sites
Sites
Normally run on a dedicated core
Single-threaded
Do not share any data structure or memory with any collocated site
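The cluster, node, and site hierarchy can be written down as plain data classes. Allocating exactly one site per core reflects "normally run on a dedicated core"; the class names, the cluster-wide numbering of site ids, and the core counts used in the test are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    site_id: int  # unique across the whole cluster
    core: int     # core on its node that this single-threaded site runs on


@dataclass
class Node:
    name: str
    cores: int
    sites: list = field(default_factory=list)


@dataclass
class Cluster:
    nodes: list

    def __post_init__(self):
        if len(self.nodes) < 2:
            raise ValueError("an H-Store cluster contains two or more nodes")

    def allocate_sites(self):
        """Give every core on every node its own site; return the total site count."""
        next_id = 0
        for node in self.nodes:
            node.sites = [Site(next_id + core, core) for core in range(node.cores)]
            next_id += node.cores
        return next_id
```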
 
H-Store system architecture
 
System deployment
 
Cluster deployment framework takes as inputs
A set of stored procedures
A database schema
A sample workload (used to optimize data layout)
A set of available sites in the cluster
Two-phase optimization
First, optimize stored procedures as if the database were not distributed
Then, come up with distributed query plans
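The four deployment inputs can be bundled into one structure, and the sample workload can then drive layout decisions. The heuristic below (partition each table on the column its most frequently invoked procedures filter on, and leave tables with no votes unpartitioned) is a hypothetical illustration, not the optimizer H-Store actually uses; the field, procedure, and table names are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DeploymentInput:
    procedures: dict  # procedure name -> list of (table, column) equality predicates it uses
    schema: dict      # table name -> list of column names
    workload: list    # sample workload: names of invoked procedures, in order
    sites: list       # available site ids in the cluster


def choose_partition_columns(inp):
    """Hypothetical heuristic: partition each table on its most-used predicate column."""
    calls = Counter(inp.workload)
    votes = {table: Counter() for table in inp.schema}
    for proc, predicates in inp.procedures.items():
        if not calls[proc]:
            continue  # procedure never appears in the sample workload
        for table, column in predicates:
            votes[table][column] += calls[proc]
    # None means no sampled procedure filters on this table (a candidate for replication)
    return {
        table: (counts.most_common(1)[0][0] if counts else None)
        for table, counts in votes.items()
    }
```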
 
Run-time model
 
All sites in the cluster are trusted
Any site is able to execute any OLTP application request
Execution plan is
Annotated with the locations of the target sites
Passed to a transaction manager
No shared data structures
Everything is single-threaded
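A sketch of the run-time flow: the fragments of a plan are annotated with the site that owns the data they touch, and the annotated plan is passed to a transaction manager that groups the work per site. The modulo partitioning, the Fragment and TransactionManager names, and the partition-to-site map are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    sql: str
    partition_key: int  # value of the partitioning column this fragment touches
    site: int = -1      # target site, filled in by annotate()


def annotate(plan, partition_map):
    """Attach a target site to every fragment (partition = key mod partition count)."""
    num_partitions = len(partition_map)
    for fragment in plan:
        fragment.site = partition_map[fragment.partition_key % num_partitions]
    return plan


class TransactionManager:
    """Receives an annotated plan and groups its fragments by target site."""

    def dispatch(self, plan):
        if any(fragment.site < 0 for fragment in plan):
            raise ValueError("plan must be annotated before dispatch")
        per_site = defaultdict(list)
        for fragment in plan:
            per_site[fragment.site].append(fragment.sql)
        return dict(per_site)
```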
 
Database properties
 
Physical layout of the DB is specifically optimized to execute precompiled transactions
Not ad hoc queries
Can still be executed but could be very slow
 
Transaction classes
 
Two important special cases
Single-Site Transactions
Can be executed entirely on a single site
Easy to send the transaction to one of the target sites
One-Shot Transactions
Each of its individual queries executes on a single site
The output of these queries is not reused as input to other queries
Easy to execute in parallel
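The two special cases can be told apart mechanically if, for each query, we know which sites it touches and whether it consumes another query's output. That per-query metadata (the hypothetical Query class) is an assumption; the checks follow the definitions above, with the single-site case checked first.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sites: frozenset                          # sites this query must touch
    uses_output_of: frozenset = frozenset()   # indexes of other queries whose results it consumes


def classify(queries):
    all_sites = frozenset().union(*(q.sites for q in queries))
    if len(all_sites) == 1:
        # The whole transaction can run on one site
        return "single-site"
    if all(len(q.sites) == 1 and not q.uses_output_of for q in queries):
        # Every query is local to one site and independent, so they can run in parallel
        return "one-shot"
    return "general"
```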
 
Physical layout
 
Replicate frequently accessed or read-only tables on each site
Horizontal partition of tables
Partitions can be accessed in parallel
Collocate them with related data
Protect data against node failures
Important for in-memory DBs
k-safety
Number k of node failures the DB must tolerate
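Under the definition above (k is the number of node failures the DB must tolerate), a layout qualifies when every partition still has a live copy after any k nodes fail. The brute-force check below tries every combination of k failed nodes, so it is only a sketch for small clusters; the placement format (partition mapped to the list of nodes holding a copy) is an assumption.

```python
from itertools import combinations


def tolerates(placement, nodes, k):
    """True if every partition keeps a live copy after ANY k of `nodes` fail."""
    for failed in combinations(nodes, k):
        down = set(failed)
        if any(not (set(replicas) - down) for replicas in placement.values()):
            return False
    return True


def max_k(placement, nodes):
    # Largest number of simultaneous node failures the layout survives
    k = 0
    while k < len(nodes) and tolerates(placement, nodes, k + 1):
        k += 1
    return k
```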
 
DB layout loader
 
Table Replication
Replicate all read-only tables on all sites
Data Partitioning
Horizontally divide each table into four disjoint partitions
Each partition is stored on two different sites
The emphasis is on parallelism
K-Safety
k = 2
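The loader rules can be sketched in a few lines. Round-robin assignment of rows to the four partitions and placing copy c of partition p on site (p + c) mod the number of sites are assumptions, since the slide does not say how rows are divided or which sites hold each copy; read-only tables are copied to every site as stated.

```python
def build_layout(tables, read_only, sites, num_partitions=4, copies=2):
    """Return {site: {table: [rows]}} following the loader rules above."""
    if copies > len(sites):
        raise ValueError("need at least as many sites as copies")
    layout = {site: {} for site in sites}
    for name, rows in tables.items():
        if name in read_only:
            # Table replication: every site gets a full copy of a read-only table
            for site in sites:
                layout[site][name] = list(rows)
            continue
        # Data partitioning: row i goes to partition i % num_partitions (disjoint slices)
        parts = [rows[p::num_partitions] for p in range(num_partitions)]
        for p, part in enumerate(parts):
            # Copies of one partition land on consecutive, hence distinct, sites
            for c in range(copies):
                site = sites[(p + c) % len(sites)]
                layout[site].setdefault(name, []).extend(part)
    return layout
```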


Uploaded on Aug 03, 2024




