Overview of SwissBox Project at ETH Zurich

SwissBox
G. Alonso, D. Kossmann, T. Roscoe
 
Systems Group, ETH Zurich
http://systems.ethz.ch
Agenda
What are we building?
Why are we building it?
What is SwissBox?
[Forrest Gump, Hollywood 1994]
Inside SwissBox (Hardware)
N CPU cores (N = 100, 1000)
X GB of main memory (X = 10×N)
NUMA
dedicated main memory for each core
Network
heterogeneous (complex)
FPGAs
Some persistent storage
Disks or flash (maybe obsolete in the future with PCM)
Think of a (commodity) rack or a multi-core machine
Overview of Components
Shared i-disk Architecture
[Figure: the classic multi-tier stack — the Client talks HTTP (XML, JSON, HTML) to the Web Server, which talks FCGI, ... (XML, JSON, HTML) to the App Server, which issues SQL for records to the DB Server, which does get/put of blocks against Storage.]
Shared i-disk Architecture
[Figure: the same stack rearranged for SwissBox — many Clients connect over HTTP (XML, JSON, HTML) to a Workload Splitter, which routes requests to merged DB+App nodes; predicates and light aggregation are pushed down into a Distributed Storage layer built from stores (e.g., S3) accessed via get/put of blocks.]
[Brantner et al. 2008]
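To make the pushdown concrete, here is a minimal sketch in Python (class and method names are our own, not a SwissBox or S3 API) of a storage node that serves get/put at block granularity but can also evaluate simple predicates and light aggregates before shipping anything upstream:

    class StorageNode:
        def __init__(self):
            self.blocks = {}  # block_id -> list of records (dicts)

        def put(self, block_id, records):
            self.blocks[block_id] = records

        def get(self, block_id):
            return self.blocks[block_id]

        def scan(self, predicate, aggregate=None):
            # Filter (and optionally aggregate) inside the storage layer
            # instead of shipping raw blocks to the DB+App nodes.
            hits = [r for recs in self.blocks.values() for r in recs
                    if predicate(r)]
            return aggregate(hits) if aggregate else hits

    node = StorageNode()
    node.put(0, [{"flight": "LX318", "open": True},
                 {"flight": "LX92", "open": False}])
    print(node.scan(lambda r: r["open"], aggregate=len))  # -> 1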
ClockScan
[Figure: one data partition scanned continuously — queries + updates arrive, records flow past the predicate indexes of the active queries, and results leave as {record, {query-ids}} pairs; a write cursor chases the read cursor across snapshots n and n+1.]
[Unterbrunner et al. 2009]
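The {record, {query-ids}} output is the key idea: the scan is shared across all active queries, so it is the queries that get indexed rather than the data. A hedged sketch, heavily simplified (the real ClockScan indexes the query predicates and interleaves a write cursor with the read cursor; this only shows the one-pass, many-queries loop):

    def shared_scan(partition, active_queries):
        for record in partition:  # one sequential pass over the partition
            hits = {qid for qid, pred in active_queries.items()
                    if pred(record)}
            if hits:
                yield record, hits  # {record, {query-ids}}

    active = {
        1: lambda r: r["dest"] == "ZRH",
        2: lambda r: r["dest"] == "ZRH" and r["open"],
    }
    partition = [{"dest": "ZRH", "open": True},
                 {"dest": "JFK", "open": True}]
    for rec, qids in shared_scan(partition, active):
        print(rec, qids)  # the ZRH record matches queries {1, 2}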
SharedDB: Joins
Massively shared joins
same join predicate
different table predicates
(reassemble BO)
Same idea as ClockScan
"shared join scan"
additional join predicate on "query"
[Giannikis et al. 2011]
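A sketch of the shared-join idea under our own naming (not the SharedDB API): one hash join is built and probed for all concurrent queries that share the join predicate, and each tuple carries the set of query ids whose table predicate it satisfies; intersecting those sets at match time tags every output row with exactly the queries it answers:

    def shared_hash_join(left, right, key, left_preds, right_preds):
        build = {}  # join-key -> [(left tuple, {query ids satisfied})]
        for t in left:
            qids = {q for q, p in left_preds.items() if p(t)}
            if qids:
                build.setdefault(t[key], []).append((t, qids))
        for t in right:  # probe once for all queries
            qids = {q for q, p in right_preds.items() if p(t)}
            for lt, lqids in build.get(t[key], []):
                common = lqids & qids  # queries satisfied on both sides
                if common:
                    yield lt, t, common

    bookings = [{"pnr": "A1", "dest": "ZRH"},
                {"pnr": "B2", "dest": "JFK"}]
    flights = [{"dest": "ZRH", "open": True}]
    out = shared_hash_join(
        bookings, flights, "dest",
        left_preds={1: lambda r: True, 2: lambda r: r["pnr"] == "A1"},
        right_preds={1: lambda r: r["open"], 2: lambda r: True},
    )
    for row in out:
        print(row)  # the ZRH pair is tagged with both query ids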
Overview of Components
SwissBox Building Blocks
Barrelfish Multi-kernel Operating System
CPU Driver  for each core (Barrelfish)
Message Passing (no shared memory!)
Designed for heterogeneous HW (e.g., NUMA)
ClockScan
Storage layer serves simple predicates + aggregates
Snapshot isolation within one partition
E-Cast Protocol
Paxos + consistent hashing
elasticity (online repartitioning), SI across partitions
SharedDB Operators
massively shared joins, sorts, group-bys...
custom processing (if sharing not worth it)
FPGAs
some special algorithms for in-network filtering / processing
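As an illustration of the consistent-hashing half of E-Cast (the Paxos-replicated membership view that keeps the ring consistent across nodes is elided, and all names here are ours, not the E-Cast protocol's):

    import bisect
    import hashlib

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, vnodes=64):
            # vnodes virtual points per node smooth the key distribution
            self.ring = sorted((h(f"{n}:{i}"), n)
                               for n in nodes for i in range(vnodes))
            self.points = [p for p, _ in self.ring]

        def owner(self, key):
            # the first ring point clockwise of the key's hash owns it
            i = bisect.bisect(self.points, h(key)) % len(self.ring)
            return self.ring[i][1]

    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.owner("booking-4711"))
    # Adding or removing a node moves only ~1/N of the keys, which is
    # what makes online repartitioning (elasticity) cheap.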
Summary: Design Ideas
SwissBox is an Appliance
enables optimization across layers
Exploit data / query duality
index queries rather than data
optimize with knowledge of queries and data
Radically simplified data flow architecture
No indexes, one query plan for a particular workload
Merge DB and application server layers
Save cost and improve predictability
Shape the workload
Force (almost) all operations into simple access patterns (scan)
Shared i-disk architecture
Great for elasticity, fault tolerance (previous work on cloud)
Make use of capabilities of the "storage layer"
Great for "inter-query" parallelism (not good for "intra-query" parallelism)
Agenda
What are we building?
Why are we building it?
Why are we doing this?
Because we can...
... the proof is in the pudding
Interesting research artefact
re-address OS/DB co-design
study the "battle of the bottlenecks"
Hardware trends
Hardware changes faster than systems software
NUMA, main-memory, heterogeneity
Challenging workloads and requirements
Predictable performance, data freshness guarantees
Amadeus Workload
Passenger-Booking Database
~ 600 GB of raw data (two years of bookings)
single table, denormalized
~ 50 attributes: flight-no, name, date, ..., many flags
Query Workload
up to 4000 queries / second
latency guarantees: 2 seconds
today: only pre-canned queries allowed
Update Workload
avg. 600 updates per second (1 update per GB per sec)
peak of 12000 updates per second
data freshness guarantee: 2 seconds
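These numbers motivate the scan-everything, main-memory design. A quick back-of-envelope check (our own arithmetic, not from the slides):

    # If every query is answered by scanning the full table, the 2 s
    # latency bound means sweeping all 600 GB every 2 seconds.
    data_gb = 600
    latency_s = 2
    aggregate_bw = data_gb / latency_s  # 300 GB/s aggregate scan bandwidth
    cores = 100                         # "N CPU cores (N = 100, 1000)"
    per_core_bw = aggregate_bw / cores  # 3 GB/s per core
    print(aggregate_bw, per_core_bw)    # feasible from main memory, not disks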
Other Workloads
Logging Service (Amadeus, CreditSuisse)
Log entries from multiple apps and middleware
Maintenance of coarse-grained indexes (sessionId, ...)
Distributed debugging, support, auditing
Index look-ups + large scans
Twitter Times (http://www.twittertim.es)
Streams of events / microblog posts (700 / sec)
Maintain simple statistics incrementally (word counts; see the sketch below)
Compile a personalized newspaper of posts
TPC-W style  (CreditSuisse, SAP)
Complex queries + updates
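For the Twitter Times item above, a tiny sketch of what "maintain simple statistics incrementally" can mean (our own illustration, not the actual service's code): fold each incoming post into a running count instead of recomputing over the whole stream.

    from collections import Counter

    counts = Counter()

    def ingest(post):
        # incremental update: touch only the words in this post
        counts.update(post.lower().split())

    ingest("swissbox keynote slides online")
    ingest("swissbox demo at the systems group")
    print(counts["swissbox"])  # -> 2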
Related Work
Appliances
SAP TREX, Netezza, Oracle Exadata, ...
New Data Processing Architectures
All the previous papers of this session
IBM Blink, MonetDB X100, AsterData, ...
Eddies, data/query dualism, StagedDB, QPipe, ...
Nothing we do is really new
Conclusion
Consensus on Starting Point
Great workloads, new app requirements
(predictability, elasticity, ...)
Technology moving faster than ever
(main memory, multi-core, heterogeneity, cloud, ...)
Building blocks that feel right
(ClockScan, multi-kernel, ...)
No consensus (yet) on putting it together
How to compose predictability and elasticity?
"The journey is the destination"