Making Sense of Spark Performance at UC Berkeley

 
Making Sense of Spark
Performance
 
Kay Ousterhout
UC Berkeley
 
In collaboration with Ryan Rasti, Sylvia
Ratnasamy, Scott Shenker, and Byung-Gon Chun
 
eecs.berkeley.edu/~keo/traces
 
About Me
 
PhD student in Computer Science at UC
Berkeley
 
Thesis work centers on the performance of
large-scale distributed systems
 
Spark PMC member
 
About This Talk
 
Overview of how Spark works
 
How we measured performance bottlenecks
 
In-depth performance analysis for a few
workloads
 
Demo of performance analysis tool
Count the # of words in the document

[Slide figure: the document is split into chunks across a cluster of machines;
one Spark (or Hadoop/Dryad/etc.) task counts the words in each chunk]

"I am Sam / I am Sam"      → 6
"Sam I am / Do you like"   → 6
"Green eggs / and ham?"    → 4
"Thank you, / Sam I am"    → 5

Spark driver: 6 + 6 + 4 + 5 = 21
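The per-task counting and driver-side sum above can be sketched in plain Python (no Spark required; the four-chunk split mirrors the slide):

```python
# Each "task" counts the words in its chunk; the "driver" sums the results.
partitions = [
    ["I am Sam", "I am Sam"],
    ["Sam I am", "Do you like"],
    ["Green eggs", "and ham?"],
    ["Thank you,", "Sam I am"],
]

def count_words(partition):
    # One task: count words in the lines assigned to this machine.
    return sum(len(line.split()) for line in partition)

per_task = [count_words(p) for p in partitions]  # [6, 6, 4, 5]
total = sum(per_task)                            # driver: 6+6+4+5 = 21
print(per_task, total)
```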
Count the # of occurrences of each word

[Slide figure]
MAP (one task per chunk):
"I am Sam / I am Sam"      → {I: 2, am: 2, …}
"Sam I am / Do you like"   → {Sam: 1, I: 1, …}
"Green eggs / and ham?"    → {Green: 1, eggs: 1, …}
"Thank you, / Sam I am"    → {Thank: 1, you: 1, …}

REDUCE:
{I: 4, you: 2, …}  {am: 4, Green: 1, …}  {Sam: 4, …}  {Thank: 1, eggs: 1, …}
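A minimal map/reduce sketch of the same computation (plain Python; punctuation is stripped so "you," and "you" count as the same word, matching the slide's tallies):

```python
from collections import Counter

chunks = [
    ["I am Sam", "I am Sam"],
    ["Sam I am", "Do you like"],
    ["Green eggs", "and ham?"],
    ["Thank you,", "Sam I am"],
]

def map_chunk(chunk):
    # MAP: one task per chunk emits a partial word-count dictionary.
    return Counter(word.strip(",?") for line in chunk for word in line.split())

map_outputs = [map_chunk(c) for c in chunks]

# REDUCE: merge the partial counts into global per-word counts.
reduced = sum(map_outputs, Counter())
print(reduced["I"], reduced["am"], reduced["Sam"], reduced["you"])  # 4 4 4 2
```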
Performance considerations

(1) Caching input data
(2) Scheduling: assigning tasks to machines
(3) Straggler tasks
(4) Network performance (e.g., during shuffle)
Caching: PACMan [NSDI 12], Spark [NSDI 12], Tachyon [SoCC 14]

Scheduling: Sparrow [SOSP 13], Apollo [OSDI 14], Mesos [NSDI 11], DRF [NSDI 11], Tetris [SIGCOMM 14], Omega [EuroSys 13], YARN [SoCC 13], Quincy [SOSP 09], KMN [OSDI 14]

Stragglers: Scarlett [EuroSys 11], SkewTune [SIGMOD 12], LATE [OSDI 08], Mantri [OSDI 10], Dolly [NSDI 13], GRASS [NSDI 14], Wrangler [SoCC 14]

Network: VL2 [SIGCOMM 09], Hedera [NSDI 10], Sinbad [SIGCOMM 13], Orchestra [SIGCOMM 11], Baraat [SIGCOMM 14], Varys [SIGCOMM 14], PeriSCOPE [OSDI 12], SUDO [NSDI 12], Camdoop [NSDI 12], Oktopus [SIGCOMM 11], EyeQ [NSDI 12], FairCloud [SIGCOMM 12]

Generalized programming model: Dryad [EuroSys 07], Spark [NSDI 12]

Network and disk I/O are bottlenecks

Stragglers are a major issue with unknown causes

This Work

(1) Methodology for quantifying performance bottlenecks

(2) Bottleneck measurement for 3 SQL workloads (TPC-DS and 2 others)
 
Network optimizations can reduce job completion time by at most 2%

CPU (not I/O) often the bottleneck

Most straggler causes can be identified and fixed
 
Example Spark task:

network read → compute → disk write (shown left to right over time;
the marked interval is the time to handle one record)

Fine-grained instrumentation needed to understand performance
How much faster would a job run if the network were infinitely fast?

What's an upper bound on the improvement from network optimizations?

How much faster could a task run if the network were infinitely fast?
Original task runtime: network read → compute → disk write
(part of that time is blocked on the network, part is blocked on disk)
Task runtime with infinitely fast network: the compute time, with the
time blocked on the network removed

How much faster would a job run if the network were infinitely fast?
[Slide figure: Task 0, Task 1, and Task 2 scheduled on 2 slots over time]
t_o: original job completion time
t_n: job completion time with infinitely fast network
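The blocked-time idea can be sketched as follows: subtract each task's time blocked on the network from its runtime, then re-simulate the job on the same number of slots. The task numbers here are made up for illustration, and the greedy scheduler is a simplification of what the full methodology does:

```python
def job_completion_time(task_runtimes, slots=2):
    # Greedy simulation: each task goes to whichever slot frees up first.
    finish = [0.0] * slots
    for rt in task_runtimes:
        i = finish.index(min(finish))
        finish[i] += rt
    return max(finish)

# (runtime, time blocked on network) per task -- hypothetical numbers
tasks = [(10.0, 3.0), (8.0, 1.0), (6.0, 2.0)]

t_o = job_completion_time([rt for rt, _ in tasks])
# Upper bound: with an infinitely fast network, each task sheds its blocked time.
t_n = job_completion_time([rt - blocked for rt, blocked in tasks])
print(t_o, t_n)  # t_o = 14.0, t_n = 11.0
```

The gap (t_o - t_n) / t_o is an upper bound on the speedup from network optimizations, since no real optimization can do better than removing all network time.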
SQL Workloads
 
TPC-DS (20 machines, 850GB; 60 machines, 2.5TB)
www.tpc.org/tpcds
Big Data Benchmark (5 machines, 60GB)
amplab.cs.berkeley.edu/benchmark
Databricks (9 machines, tens of GB)
databricks.com
 
2 versions of each: in-memory, on-disk
How much faster could jobs get
from optimizing network
performance?
 
Median improvement at most 2%
 
How can we sanity check these
numbers?
How much data is transferred per
CPU second?
 
Microsoft '09-'10: 1.9-6.35 Mb / task second
Google '04-'07: 1.34-1.61 Mb / machine second
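A quick back-of-envelope check makes the 2% figure plausible. Assuming a 1 Gbps NIC and, say, 8 concurrent tasks per machine (both numbers are illustrative assumptions, not from the traces):

```python
nic_mbps = 1000.0        # 1 Gbps NIC, in megabits/second (assumption)
mb_per_task_sec = 6.35   # upper end of the Microsoft figure above
tasks_per_machine = 8    # e.g., one task per core (assumption)

# Fraction of the link such a machine would actually use:
utilization = tasks_per_machine * mb_per_task_sec / nic_mbps
print(f"{utilization:.1%}")  # ~5% of link capacity, even at the high end
```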
 
How can this be true?
 
Shuffle Data < Input Data
What kind of hardware should I
buy?
 
10Gbps networking hardware likely not
necessary!
 
How much faster would jobs
complete if the disk were
infinitely fast?
How much faster could jobs get
from optimizing disk performance?
Median improvement at most 19%
 
 
Disk Configuration
 
Our instances: 2 disks, 8 cores
 
Cloudera:
At least 1 disk for every 3 cores
As many as 2 disks for each core
 
Our instances are under-provisioned, so these
results are an upper bound
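The under-provisioning claim is simple arithmetic against Cloudera's guidance above:

```python
cores, disks = 8, 2
cores_per_disk = cores / disks           # 4.0 on these instances
recommended_max = 3                      # Cloudera: at least 1 disk per 3 cores
print(cores_per_disk > recommended_max)  # True: fewer disks than recommended
```

With fewer disks per core than recommended, disk looks more like a bottleneck here than it would on a well-provisioned cluster, so the 19% figure is an upper bound.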
 
How much data is transferred per
CPU second?
 
Google: 0.8-1.5 MB / machine second
Microsoft: 7-11 MB / task second
What does this mean about Spark
versus Hadoop?
 
[Slide figure: a "faster →" spectrum]
serialized + compressed on-disk data → serialized + compressed in-memory data
This work: 19%
This work says nothing about
Spark vs. Hadoop!
 
[Slide figure: the same "faster →" spectrum, extended]
serialized + compressed on-disk data → serialized + compressed in-memory data
→ deserialized in-memory data

This work: 19% (serialized + compressed, on-disk vs. in-memory)
up to 10x (spark.apache.org); 6x or more (amplab.cs.berkeley.edu/benchmark/)
for deserialized in-memory data
What causes stragglers?
 
Takeaway: causes depend on the workload, but
disk and garbage collection common
 
Fixing straggler causes can speed up other
tasks too
 
Live demo
 
eecs.berkeley.edu/~keo/traces
 
I want your workloads!
 
spark.eventLog.enabled true
 
keo@cs.berkeley.edu
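To record the event logs the analysis tool consumes, set the flag above in Spark's standard config file. The directory line is optional (Spark's default is file:///tmp/spark-events); any writable path works:

```
# conf/spark-defaults.conf
spark.eventLog.enabled  true
spark.eventLog.dir      file:///tmp/spark-events
```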
 
Network optimizations can reduce job completion time by at most 2%

CPU (not I/O) often the bottleneck

19% reduction in completion time from optimizing disk

Many straggler causes can be identified and fixed
 
Project webpage (with links to paper and tool):
eecs.berkeley.edu/~keo/traces
 
Contact: 
keo@cs.berkeley.edu
, @kayousterhout
 
Backup Slides
 
How do results change with scale?
 
How does the utilization compare?