Making Sense of Spark Performance at UC Berkeley

 
Making Sense of Spark
Performance
 
Kay Ousterhout
UC Berkeley
 
In collaboration with Ryan Rasti, Sylvia
Ratnasamy, Scott Shenker, and Byung-Gon Chun
 
eecs.berkeley.edu/~keo/traces
 
About Me
 
PhD student in Computer Science at UC
Berkeley
 
Thesis work centers on the performance of
large-scale distributed systems
 
Spark PMC member
 
About This Talk
 
Overview of how Spark works
 
How we measured performance bottlenecks
 
In-depth performance analysis for a few
workloads
 
Demo of performance analysis tool
Count the # of words in the document

[Slide figure: the document is split into chunks across a cluster of machines;
one Spark (or Hadoop/Dryad/etc.) task counts the words in each chunk]

"I am Sam / I am Sam"      → 6
"Sam I am / Do you like"   → 6
"Green eggs / and ham?"    → 4
"Thank you, / Sam I am"    → 5

Spark driver: 6 + 6 + 4 + 5 = 21
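The per-task counting and driver-side sum above can be sketched in plain Python (no Spark required; the four-chunk split mirrors the slide):

```python
# Each "task" counts the words in its chunk; the "driver" sums the results.
partitions = [
    ["I am Sam", "I am Sam"],
    ["Sam I am", "Do you like"],
    ["Green eggs", "and ham?"],
    ["Thank you,", "Sam I am"],
]

def count_words(partition):
    # One task: count words in the lines assigned to this machine.
    return sum(len(line.split()) for line in partition)

per_task = [count_words(p) for p in partitions]  # [6, 6, 4, 5]
total = sum(per_task)                            # driver: 6+6+4+5 = 21
print(per_task, total)
```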
Count the # of occurrences of each word

[Slide figure]
MAP (one task per chunk):
"I am Sam / I am Sam"      → {I: 2, am: 2, …}
"Sam I am / Do you like"   → {Sam: 1, I: 1, …}
"Green eggs / and ham?"    → {Green: 1, eggs: 1, …}
"Thank you, / Sam I am"    → {Thank: 1, you: 1, …}

REDUCE:
{I: 4, you: 2, …}  {am: 4, Green: 1, …}  {Sam: 4, …}  {Thank: 1, eggs: 1, …}
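A minimal map/reduce sketch of the same computation (plain Python; punctuation is stripped so "you," and "you" count as the same word, matching the slide's tallies):

```python
from collections import Counter

chunks = [
    ["I am Sam", "I am Sam"],
    ["Sam I am", "Do you like"],
    ["Green eggs", "and ham?"],
    ["Thank you,", "Sam I am"],
]

def map_chunk(chunk):
    # MAP: one task per chunk emits a partial word-count dictionary.
    return Counter(word.strip(",?") for line in chunk for word in line.split())

map_outputs = [map_chunk(c) for c in chunks]

# REDUCE: merge the partial counts into global per-word counts.
reduced = sum(map_outputs, Counter())
print(reduced["I"], reduced["am"], reduced["Sam"], reduced["you"])  # 4 4 4 2
```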
Performance considerations

(1) Caching input data
(2) Scheduling: assigning tasks to machines
(3) Straggler tasks
(4) Network performance (e.g., during shuffle)
Caching: PACMan [NSDI 12], Spark [NSDI 12], Tachyon [SoCC 14]

Scheduling: Sparrow [SOSP 13], Apollo [OSDI 14], Mesos [NSDI 11], DRF [NSDI 11], Tetris [SIGCOMM 14], Omega [EuroSys 13], YARN [SoCC 13], Quincy [SOSP 09], KMN [OSDI 14]

Stragglers: Scarlett [EuroSys 11], SkewTune [SIGMOD 12], LATE [OSDI 08], Mantri [OSDI 10], Dolly [NSDI 13], GRASS [NSDI 14], Wrangler [SoCC 14]

Network: VL2 [SIGCOMM 09], Hedera [NSDI 10], Sinbad [SIGCOMM 13], Orchestra [SIGCOMM 11], Baraat [SIGCOMM 14], Varys [SIGCOMM 14], PeriSCOPE [OSDI 12], SUDO [NSDI 12], Camdoop [NSDI 12], Oktopus [SIGCOMM 11], EyeQ [NSDI 12], FairCloud [SIGCOMM 12]

Generalized programming model: Dryad [EuroSys 07], Spark [NSDI 12]

Network and disk I/O are bottlenecks

Stragglers are a major issue with unknown causes

This Work

(1) Methodology for quantifying performance bottlenecks

(2) Bottleneck measurement for 3 SQL workloads (TPC-DS and 2 others)
 
Network optimizations can reduce job completion time by at most 2%

CPU (not I/O) often the bottleneck

Most straggler causes can be identified and fixed
 
Example Spark task:

network read → compute → disk write (shown left to right over time;
the marked interval is the time to handle one record)

Fine-grained instrumentation needed to understand performance
How much faster would a job run if the network were infinitely fast?

What's an upper bound on the improvement from network optimizations?

How much faster could a task run if the network were infinitely fast?
Original task runtime: network read → compute → disk write
(part of that time is blocked on the network, part is blocked on disk)
Task runtime with infinitely fast network: the compute time, with the
time blocked on the network removed

How much faster would a job run if the network were infinitely fast?
[Slide figure: Task 0, Task 1, and Task 2 scheduled on 2 slots over time]
t_o: original job completion time
t_n: job completion time with infinitely fast network
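The blocked-time idea can be sketched as follows: subtract each task's time blocked on the network from its runtime, then re-simulate the job on the same number of slots. The task numbers here are made up for illustration, and the greedy scheduler is a simplification of what the full methodology does:

```python
def job_completion_time(task_runtimes, slots=2):
    # Greedy simulation: each task goes to whichever slot frees up first.
    finish = [0.0] * slots
    for rt in task_runtimes:
        i = finish.index(min(finish))
        finish[i] += rt
    return max(finish)

# (runtime, time blocked on network) per task -- hypothetical numbers
tasks = [(10.0, 3.0), (8.0, 1.0), (6.0, 2.0)]

t_o = job_completion_time([rt for rt, _ in tasks])
# Upper bound: with an infinitely fast network, each task sheds its blocked time.
t_n = job_completion_time([rt - blocked for rt, blocked in tasks])
print(t_o, t_n)  # t_o = 14.0, t_n = 11.0
```

The gap (t_o - t_n) / t_o is an upper bound on the speedup from network optimizations, since no real optimization can do better than removing all network time.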
SQL Workloads
 
TPC-DS (20 machines, 850GB; 60 machines, 2.5TB)
www.tpc.org/tpcds
Big Data Benchmark (5 machines, 60GB)
amplab.cs.berkeley.edu/benchmark
Databricks (9 machines, tens of GB)
databricks.com
 
2 versions of each: in-memory, on-disk
How much faster could jobs get
from optimizing network
performance?
 
Median improvement at most 2%
 
How can we sanity check these
numbers?
How much data is transferred per
CPU second?
 
Microsoft '09-'10: 1.9-6.35 Mb / task second
Google '04-'07: 1.34-1.61 Mb / machine second
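A quick back-of-envelope check makes the 2% figure plausible. Assuming a 1 Gbps NIC and, say, 8 concurrent tasks per machine (both numbers are illustrative assumptions, not from the traces):

```python
nic_mbps = 1000.0        # 1 Gbps NIC, in megabits/second (assumption)
mb_per_task_sec = 6.35   # upper end of the Microsoft figure above
tasks_per_machine = 8    # e.g., one task per core (assumption)

# Fraction of the link such a machine would actually use:
utilization = tasks_per_machine * mb_per_task_sec / nic_mbps
print(f"{utilization:.1%}")  # ~5% of link capacity, even at the high end
```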
 
How can this be true?
 
Shuffle Data < Input Data
What kind of hardware should I
buy?
 
10Gbps networking hardware likely not
necessary!
 
How much faster would jobs
complete if the disk were
infinitely fast?
How much faster could jobs get
from optimizing disk performance?
Median improvement at most 19%
 
 
Disk Configuration
 
Our instances: 2 disks, 8 cores
 
Cloudera:
At least 1 disk for every 3 cores
As many as 2 disks for each core
 
Our instances are under-provisioned, so these
results are an upper bound
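The under-provisioning claim is simple arithmetic against Cloudera's guidance above:

```python
cores, disks = 8, 2
cores_per_disk = cores / disks           # 4.0 on these instances
recommended_max = 3                      # Cloudera: at least 1 disk per 3 cores
print(cores_per_disk > recommended_max)  # True: fewer disks than recommended
```

With fewer disks per core than recommended, disk looks more like a bottleneck here than it would on a well-provisioned cluster, so the 19% figure is an upper bound.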
 
How much data is transferred per
CPU second?
 
Google: 0.8-1.5 MB / machine second
Microsoft: 7-11 MB / task second
What does this mean about Spark
versus Hadoop?
 
[Slide figure: a "faster →" spectrum]
serialized + compressed on-disk data → serialized + compressed in-memory data
This work: 19%
This work says nothing about
Spark vs. Hadoop!
 
[Slide figure: the same "faster →" spectrum, extended]
serialized + compressed on-disk data → serialized + compressed in-memory data
→ deserialized in-memory data

This work: 19% (serialized + compressed, on-disk vs. in-memory)
up to 10x (spark.apache.org); 6x or more (amplab.cs.berkeley.edu/benchmark/)
for deserialized in-memory data
What causes stragglers?
 
Takeaway: causes depend on the workload, but
disk and garbage collection common
 
Fixing straggler causes can speed up other
tasks too
 
Live demo
 
eecs.berkeley.edu/~keo/traces
 
I want your workloads!
 
spark.eventLog.enabled true
 
keo@cs.berkeley.edu
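To record the event logs the analysis tool consumes, set the flag above in Spark's standard config file. The directory line is optional (Spark's default is file:///tmp/spark-events); any writable path works:

```
# conf/spark-defaults.conf
spark.eventLog.enabled  true
spark.eventLog.dir      file:///tmp/spark-events
```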
 
Network optimizations can reduce job completion time by at most 2%

CPU (not I/O) often the bottleneck

19% reduction in completion time from optimizing disk

Many straggler causes can be identified and fixed
 
Project webpage (with links to paper and tool):
eecs.berkeley.edu/~keo/traces
 
Contact: 
keo@cs.berkeley.edu
, @kayousterhout
 
Backup Slides
 
How do results change with scale?
 
How does the utilization compare?