NoSQL Database Scalability Using Indexing Techniques

 
Replex: A Multi-Index, Highly-Available Data Store

Amy Tai*§, Michael Wei*, Michael J. Freedman§, Ittai Abraham*, Dahlia Malkhi*
*VMware Research, §Princeton University
SQL vs NoSQL

SQL:
  Rich querying model
  Transaction support
  Poor performance

NoSQL:
  Simple, key-based querying
  No or limited transaction support
  Scalable
  Key-value stores. Ex: HyperDex, Cassandra

NewSQL:
  Rich querying model
  Transaction support
  Scalable, high performance
  Ex: CockroachDB, Yesquel, H-Store
NoSQL Scales with Database Partitioning

[Figure: a table with columns col0, col1, col2, . . ., colc; each row is (r[0] r[1] r[2]  . . . r[c]). Rows are split into partitions by the default partitioning key.]

Partitions are formed with respect to the partitioning index.
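To make the partitioning step concrete, here is a minimal Python sketch of shared-nothing partitioning by a default partitioning key. The partition count and function names are illustrative, not taken from Replex or HyperDex.

    import hashlib

    NUM_PARTITIONS = 8  # illustrative cluster size

    def partition_for(key):
        # Deterministically map a partitioning-key value to one partition.
        digest = hashlib.sha1(str(key).encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_PARTITIONS

    # A row (r[0], r[1], ..., r[c]) lives on the partition chosen by its
    # default partitioning key r[0]; a lookup on that key is routed to the
    # same single partition, which is what makes scale-out cheap.
    row = ("alice", "engineering", "princeton")
    home = partition_for(row[0])

Queries on any other column carry no such routing information, which is the problem the following slides address.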
 
How to enable richer queries without sacrificing shared-nothing scale-out?
Key Observations
 
1. Indexing enables richer queries
Local Indexing on the Partitioning Index

Index updates are local
Index lookups are local, e.g., select * where col0 = "foo"
Local Indexing

Index stored locally at each partition

Index updates are local, e.g., insert (r[0] r[1] r[2]  . . . r[c])
Local Indexing

Each partition builds a local index

Index updates are local
Index lookups must be broadcast to all partitions, e.g., select * where col2 = "foo"
Potential synchronization across partitions

Global Indexing

Distributed data structure spans all partitions

Index lookups are executed on a single data structure
Index updates are multi-hop, e.g., insert (r[0] r[1] r[2]  . . . r[c])
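The difference between the two lookup paths can be sketched in a few lines of Python. This is a minimal model that assumes hypothetical per-partition dictionaries as the local indexes; it is not HyperDex's or Replex's API.

    from collections import defaultdict

    class Partition:
        # One shared-nothing partition holding its own (local) indexes.
        def __init__(self):
            self.by_col0 = defaultdict(list)   # index on the partitioning key
            self.by_col2 = defaultdict(list)   # local index on a secondary column

        def insert(self, row):
            # Both index updates stay inside this one partition.
            self.by_col0[row[0]].append(row)
            self.by_col2[row[2]].append(row)

    partitions = [Partition() for _ in range(8)]

    def lookup_col0(value):
        # Partitioning-key lookup: touches exactly one partition.
        return partitions[hash(value) % len(partitions)].by_col0[value]

    def lookup_col2(value):
        # Secondary lookup under local indexing: scatter the request to
        # every partition and gather the results (the broadcast above).
        hits = []
        for p in partitions:
            hits.extend(p.by_col2[value])
        return hits

A global index removes the broadcast from lookups but makes inserts multi-hop, because the index entry generally lives on a different partition than the row itself.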
Key Observations
 
1. Indexing enables richer queries
2. Indexes more efficient if data is partitioned according to that index
Partitioning by Index: Storage Overheads

Partitioning by an index other than the default partitioning key must store the data again, once per additional index.

Ex: HyperDex
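A quick back-of-the-envelope for that storage cost, using the configuration that appears later in the evaluation (a table with two indexes and enough copies to tolerate two failures); the variable names are illustrative.

    indexes = 2        # the table has two indexes (default key + one secondary)
    replication = 3    # copies per index-partitioned layout to tolerate 2 failures

    # Partition-by-index layered on top of replication (the HyperDex-style
    # layout above): every index gets its own partitioned copy of the data,
    # and each of those copies is replicated.
    copies_layered = indexes * replication   # 6x storage

    # Replex's proposal: let each replicated copy double as an index.
    copies_replex = replication              # 3x storage

These are the 6x and 3x storage overheads that show up in the evaluation section.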
 
Key Observations
 
1. Indexing enables richer queries
2. Indexes more efficient if data is partitioned according to that index
3. Rethink replication paradigm
Replex

New replication unit: replex -- a data replica partitioned and sorted with respect to an associated index.

A replex serves both data replication and indexing.

Replace data replicas with replexes.
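To make the definition concrete, the sketch below models a replex in Python as one full copy of the rows, partitioned by its associated index column and kept sorted within each partition. The class is a simplified illustration, not Replex's actual implementation.

    import bisect

    class Replex:
        # A data replica partitioned and sorted by one associated index column.
        def __init__(self, index_col, num_partitions):
            self.index_col = index_col
            self.partitions = [[] for _ in range(num_partitions)]  # each kept sorted

        def partition_of(self, row):
            # The replex's own index value decides where the row lives.
            return hash(row[self.index_col]) % len(self.partitions)

        def insert(self, row):
            key = row[self.index_col]
            bisect.insort(self.partitions[self.partition_of(row)], (key, row))

        def lookup(self, value):
            # Single-partition lookup on this replex's index.
            part = self.partitions[hash(value) % len(self.partitions)]
            return [r for (k, r) in part if k == value]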
System Architecture
Inserts in Replex

insert (r[0] r[1] r[2]  . . . r[c])

Partition Determined by a replex's Index

Within each replex, the target partition is determined by that replex's index: the replex indexed on col0 places the row according to r[0], and the replex indexed on col2 places it according to r[2].
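A minimal sketch of how one logical insert fans out, assuming the hypothetical Replex class from the previous sketch: each replex picks its own target partition from the row's value in that replex's index column, and the resulting set of partitions is the row's data chain.

    # Two replexes over the same table: one indexed on col0, one on col2.
    replex0 = Replex(index_col=0, num_partitions=4)
    replex2 = Replex(index_col=2, num_partitions=4)

    def insert_row(row):
        chain = []
        for rx in (replex0, replex2):
            chain.append((rx.index_col, rx.partition_of(row)))  # per-replex target
            rx.insert(row)
        return chain   # the row's data chain: one partition per replex

    insert_row(("alice", "engineering", "princeton"))

The insert is complete only once it has visited every partition in the chain, which lines up with the per-insert hop counts reported later (2 hops for Replex-2, 3 for Replex-3).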
Data Chains in Replex

Partitions in different replexes make up a data chain if they share rows:

  (x[0] x[1] x[2]  . . . x[c])
  (y[0] y[1] y[2]  . . . y[c])
  (z[0] z[1] z[2]  . . . z[c])

All pairs of partitions are potential data chains.
Main Challenge in Replex
 
Increased failure complexity
Partition Failures

When a partition fails, its index is unavailable, but the data is still available in other replexes.

1. Partition Recovery
2. Client Requests

Where is the data?
Finding Data After Failure

Partition Recovery:
  Parallel recovery -> recovery time

Client Requests:
  Requests must be multicast
  Lookups reduce to a linear scan at each partition
  -> request amplification

Tradeoff?
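The client-request side of this tradeoff can be sketched as follows, continuing the hypothetical two-replex example from earlier: if the partition that would normally answer a col2 lookup is down, the request falls back to the col0 replex, where it has to be multicast to every partition and answered by a linear scan. That multicast is the request amplification named above.

    FAILED = set()   # ids of failed partitions in replex2 (illustrative)

    def lookup_col2_with_failover(value):
        home = hash(value) % len(replex2.partitions)
        if home not in FAILED:
            return replex2.lookup(value)      # normal single-partition path
        # Failover: the rows still exist in replex0, but there they are
        # neither partitioned nor sorted by col2, so the lookup is multicast
        # to all partitions and each one does a linear scan.
        hits = []
        for part in replex0.partitions:       # request amplification: n partitions
            hits.extend(r for (_k, r) in part if r[2] == value)
        return hits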
Hybrid Replex: Key Contributions

1. Recovery time vs request amplification tradeoff
2. Improve failure availability of multiple replexes
3. Storage vs recovery performance tradeoff
4. Graceful failure degradation
Hybrid Replex: What is it?

Shared across r replexes
Not associated with an index
Partitioning function dependent on these r replexes

Hybrid Replex

[Figure: a hybrid replex alongside the r replexes that share it.]
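One way to picture such a partitioning function, sketched below for r = 2 with a grid-style layout (an illustration of the idea, not necessarily the exact function used in Replex): the hybrid partition is chosen from both index values, so a lookup or recovery that knows only one of the two values still has to contact only one row or column of the grid.

    import math

    NUM_HYBRID = 16                   # illustrative; a perfect square
    SIDE = math.isqrt(NUM_HYBRID)     # 4 x 4 grid of hybrid partitions

    def hybrid_partition(row, col_a=0, col_b=2):
        # The hybrid replex's partition depends on BOTH sharing replexes.
        i = hash(row[col_a]) % SIDE   # coordinate tied to the col0 replex
        j = hash(row[col_b]) % SIDE   # coordinate tied to the col2 replex
        return i * SIDE + j

    def fallback_partitions_for_col2(value):
        # If the col2 replex is unavailable, only the grid column matching
        # hash(value) can hold the relevant rows: sqrt(n) partitions
        # instead of all n.
        j = hash(value) % SIDE
        return [i * SIDE + j for i in range(SIDE)]

That sqrt(n) figure matches the O(n^(1/2)) request amplification given for r = 2, k = 1 on the generalization slide below.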
Data Chains with Hybrid Replex

insert (r[0] r[1] r[2]  . . . r[c])

The row is written to the partition chosen by each replex's index and to a hybrid partition chosen from the combination of those index values; together these partitions form the row's data chain.
1. Recovery Time vs Request Amplification

Recovery is less parallel, but request amplification is lower.
2. Improve failure availability of multiple replexes

Improving failure availability without hybrid replexes: 2x storage!
3. Storage vs. recovery performance

1.5x storage! ... but worse client request performance.
Generalizing Hybrid Replexes

k hybrid replexes may be shared across r replexes.

For the special case when r = 2, request amplification is O(n^(1/(k+1))).
For the special case when k = 1, request amplification is O(n^((r-1)/r)).
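As a quick worked example of the two bounds (the choice of n = 64 partitions is arbitrary):

    n = 64   # illustrative number of partitions

    # r = 2 replexes sharing k hybrid replexes: O(n^(1/(k+1)))
    for k in (1, 2, 3):
        print(k, round(n ** (1 / (k + 1)), 2))    # 8.0, 4.0, 2.83

    # k = 1 hybrid replex shared across r replexes: O(n^((r-1)/r))
    for r in (2, 3, 4):
        print(r, round(n ** ((r - 1) / r), 2))    # 8.0, 16.0, 22.63

Adding hybrid replexes lowers request amplification at the cost of extra storage; sharing one hybrid replex across more replexes saves storage but raises amplification.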
 
Implementation
 
Built on top of HyperDex, ~700 LOC
Implemented partition recovery and request re-routing on failure
Implemented a variety of hybrid replex configurations
 
Evaluation
 
 
1. What is the impact of replexes on steady-state performance?

2. How do hybrid replexes affect failure performance?
 
 
Evaluation Setup
 
Table with two indexes
12 server machines, 4 client machines
All machines colocated in the same rack, connected via a 1 Gb top-of-rack switch
8 CPUs, 16 GB memory per machine
Systems Under Evaluation

                             Replex-2   Replex-3   HyperDex
  Storage overhead           2x         3x         6x
  Network hops per insert    2          3          6
  Failures tolerated         1          2          2
Steady State Latency

[Figures: read latency and insert latency for Replex-3 and HyperDex. Inserts take 3 network hops in Replex-3 vs 6 in HyperDex.]

Replex-2 not included because it has a lower fault tolerance threshold.
Reads against either index have similar latency; we report reads against the primary index.
Single Failure Performance

Experiment:
  Load with 10M 100-byte rows
  Split reads 50:50 between each index
  Then a single failure is injected
Single Failure: Recovery Time

1. HyperDex recovers slowest because it stores 2-3x more data
2. Replex-2 recovers fastest because it stores the least data and recovery is parallel
Single Failure: Failure Throughput

1. Replex-2 has low throughput because of high request amplification
2. Replex-3 has throughput comparable to HyperDex; its hybrid replex increases throughput during the failure
Two Failure Performance

Experiment:
  Load with 1M 1 KB rows
  50:50 reads against each index
  Then two failures are injected

*Replex-2 not included because it only tolerates 1 failure
Two Failure Performance

1. Recovery Time: Replex-3 recovers nearly 3x faster because it stores less data
2. Failure Throughput: HyperDex drops to about half its throughput because nearly every partition is participating in recovery
Summary

1. Replacing data replicas with replexes decreases index storage AND steady-state overheads

2. Hybrid replexes enable a rich tradeoff space during failures
Questions?