Overview of SwissBox Project at ETH Zurich

SwissBox
G. Alonso, D. Kossmann, T. Roscoe
 
Systems Group, ETH Zurich
http://systems.ethz.ch
Agenda
What are we building?
Why are we building it?
What is SwissBox?
[Forrest Gump, Hollywood 1994]
Inside SwissBox (Hardware)
N CPU cores (N = 100, 1000)
X GB of main memory (X = 10×N)
NUMA
dedicated main memory for each core
Network
heterogeneous (complex)
FPGAs
Some persistent storage
Disks or flash (maybe obsolete in the future with PCM)
Think of a (commodity) rack or a multi-core machine
Overview of Components
Shared i-disk Architecture
[Figure: the classic multi-tier stack — the Client talks HTTP (XML, JSON, HTML) to the Web Server, which talks FCGI, ... (XML, JSON, HTML) to the App Server, which issues SQL for records to the DB Server, which does get/put of blocks against Storage.]
Shared i-disk Architecture
[Figure: the same stack rearranged for SwissBox — many Clients connect over HTTP (XML, JSON, HTML) to a Workload Splitter, which routes requests to merged DB+App nodes; predicates and light aggregation are pushed down into a Distributed Storage layer built from stores (e.g., S3) accessed via get/put of blocks.]
[Brantner et al. 2008]
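To make the pushdown concrete, here is a minimal sketch in Python (class and method names are our own, not a SwissBox or S3 API) of a storage node that serves get/put at block granularity but can also evaluate simple predicates and light aggregates before shipping anything upstream:

    class StorageNode:
        def __init__(self):
            self.blocks = {}  # block_id -> list of records (dicts)

        def put(self, block_id, records):
            self.blocks[block_id] = records

        def get(self, block_id):
            return self.blocks[block_id]

        def scan(self, predicate, aggregate=None):
            # Filter (and optionally aggregate) inside the storage layer
            # instead of shipping raw blocks to the DB+App nodes.
            hits = [r for recs in self.blocks.values() for r in recs
                    if predicate(r)]
            return aggregate(hits) if aggregate else hits

    node = StorageNode()
    node.put(0, [{"flight": "LX318", "open": True},
                 {"flight": "LX92", "open": False}])
    print(node.scan(lambda r: r["open"], aggregate=len))  # -> 1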
ClockScan
[Figure: one data partition scanned continuously — queries + updates arrive, records flow past the predicate indexes of the active queries, and results leave as {record, {query-ids}} pairs; a write cursor chases the read cursor across snapshots n and n+1.]
[Unterbrunner et al. 2009]
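The {record, {query-ids}} output is the key idea: the scan is shared across all active queries, so it is the queries that get indexed rather than the data. A hedged sketch, heavily simplified (the real ClockScan indexes the query predicates and interleaves a write cursor with the read cursor; this only shows the one-pass, many-queries loop):

    def shared_scan(partition, active_queries):
        for record in partition:  # one sequential pass over the partition
            hits = {qid for qid, pred in active_queries.items()
                    if pred(record)}
            if hits:
                yield record, hits  # {record, {query-ids}}

    active = {
        1: lambda r: r["dest"] == "ZRH",
        2: lambda r: r["dest"] == "ZRH" and r["open"],
    }
    partition = [{"dest": "ZRH", "open": True},
                 {"dest": "JFK", "open": True}]
    for rec, qids in shared_scan(partition, active):
        print(rec, qids)  # the ZRH record matches queries {1, 2}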
SharedDB: Joins
Massively shared joins
same join predicate
different table predicates
(reassemble BO)
Same idea as ClockScan
"shared join scan"
additional join predicate on "query"
[Giannikis et al. 2011]
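A sketch of the shared-join idea under our own naming (not the SharedDB API): one hash join is built and probed for all concurrent queries that share the join predicate, and each tuple carries the set of query ids whose table predicate it satisfies; intersecting those sets at match time tags every output row with exactly the queries it answers:

    def shared_hash_join(left, right, key, left_preds, right_preds):
        build = {}  # join-key -> [(left tuple, {query ids satisfied})]
        for t in left:
            qids = {q for q, p in left_preds.items() if p(t)}
            if qids:
                build.setdefault(t[key], []).append((t, qids))
        for t in right:  # probe once for all queries
            qids = {q for q, p in right_preds.items() if p(t)}
            for lt, lqids in build.get(t[key], []):
                common = lqids & qids  # queries satisfied on both sides
                if common:
                    yield lt, t, common

    bookings = [{"pnr": "A1", "dest": "ZRH"},
                {"pnr": "B2", "dest": "JFK"}]
    flights = [{"dest": "ZRH", "open": True}]
    out = shared_hash_join(
        bookings, flights, "dest",
        left_preds={1: lambda r: True, 2: lambda r: r["pnr"] == "A1"},
        right_preds={1: lambda r: r["open"], 2: lambda r: True},
    )
    for row in out:
        print(row)  # the ZRH pair is tagged with both query ids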
Overview of Components
SwissBox Building Blocks
Barrelfish Multi-kernel Operating System
CPU Driver  for each core (Barrelfish)
Message Passing (no shared memory!)
Designed for heterogeneous HW (e.g., NUMA)
ClockScan
Storage layer serves simple predicates + aggregates
Snapshot isolation within one partition
E-Cast Protocol
Paxos + consistent hashing
elasticity (online repartitioning), SI across partitions
SharedDB Operators
massively shared joins, sorts, group-bys...
custom processing (if sharing not worth it)
FPGAs
some special algorithms for in-network filtering / processing
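As an illustration of the consistent-hashing half of E-Cast (the Paxos-replicated membership view that keeps the ring consistent across nodes is elided, and all names here are ours, not the E-Cast protocol's):

    import bisect
    import hashlib

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, vnodes=64):
            # vnodes virtual points per node smooth the key distribution
            self.ring = sorted((h(f"{n}:{i}"), n)
                               for n in nodes for i in range(vnodes))
            self.points = [p for p, _ in self.ring]

        def owner(self, key):
            # the first ring point clockwise of the key's hash owns it
            i = bisect.bisect(self.points, h(key)) % len(self.ring)
            return self.ring[i][1]

    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.owner("booking-4711"))
    # Adding or removing a node moves only ~1/N of the keys, which is
    # what makes online repartitioning (elasticity) cheap.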
Summary: Design Ideas
SwissBox is an Appliance
enables optimization across layers
Exploit data / query duality
index queries rather than data
optimize with knowledge of queries and data
Radically simplified data flow architecture
No indexes, one query plan for a particular workload
Merge DB and application server layers
Save cost and improve predictability
Shape the workload
Force (almost) all operations into simple access patterns (scan)
Shared i-disk architecture
Great for elasticity, fault tolerance (previous work on cloud)
Make use of capabilities of the "storage layer"
Great for "inter-query" parallelism (not good for "intra-query" parallelism)
Agenda
What are we building?
Why are we building it?
Why are we doing this?
Because we can...
... the proof is in the pudding
Interesting research artefact
re-address OS/DB co-design
study the "battle of the bottlenecks"
Hardware trends
Hardware changes faster than systems software
NUMA, main-memory, heterogeneity
Challenging workloads and requirements
Predictable performance, data freshness guarantees
Amadeus Workload
Passenger-Booking Database
~ 600 GB of raw data (two years of bookings)
single table, denormalized
~ 50 attributes: flight-no, name, date, ..., many flags
Query Workload
up to 4000 queries / second
latency guarantees: 2 seconds
today: only pre-canned queries allowed
Update Workload
avg. 600 updates per second (1 update per GB per sec)
peak of 12000 updates per second
data freshness guarantee: 2 seconds
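These numbers motivate the scan-everything, main-memory design. A quick back-of-envelope check (our own arithmetic, not from the slides):

    # If every query is answered by scanning the full table, the 2 s
    # latency bound means sweeping all 600 GB every 2 seconds.
    data_gb = 600
    latency_s = 2
    aggregate_bw = data_gb / latency_s  # 300 GB/s aggregate scan bandwidth
    cores = 100                         # "N CPU cores (N = 100, 1000)"
    per_core_bw = aggregate_bw / cores  # 3 GB/s per core
    print(aggregate_bw, per_core_bw)    # feasible from main memory, not disks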
Other Workloads
Logging Service (Amadeus, CreditSuisse)
Log entries from multiple apps and middleware
Maintenance of coarse-grained indexes (sessionId, ...)
Distributed debugging, support, auditing
Index look-ups + large scans
Twitter Times (http://www.twittertim.es)
Streams of events / microblog posts (700 / sec)
Maintain simple statistics incrementally (word counts; see the sketch below)
Compile a personalized newspaper of posts
TPC-W style  (CreditSuisse, SAP)
Complex queries + updates
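For the Twitter Times item above, a tiny sketch of what "maintain simple statistics incrementally" can mean (our own illustration, not the actual service's code): fold each incoming post into a running count instead of recomputing over the whole stream.

    from collections import Counter

    counts = Counter()

    def ingest(post):
        # incremental update: touch only the words in this post
        counts.update(post.lower().split())

    ingest("swissbox keynote slides online")
    ingest("swissbox demo at the systems group")
    print(counts["swissbox"])  # -> 2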
Related Work
Appliances
SAP TREX, Netezza, Oracle Exadata, ...
New Data Processing Architectures
All the previous papers of this session
IBM Blink, MonetDB X100, AsterData, ...
Eddies, data/query dualism, StagedDB, QPipe, ...
Nothing we do is really new
Conclusion
Consensus on Starting Point
Great workloads, new app requirements
(predictability, elasticity, ...)
Technology moving faster than ever
(main memory, multi-core, heterogeneity, cloud, ...)
Building blocks that feel right
(ClockScan, multi-kernel, ...)
No consensus (yet) on putting it together
How to compose predictability and elasticity?
"The journey is the destination"