Modular Performance Reasoning of Data-Intensive Programs by Yu David Liu

undefined
 
Yu David Liu
 
State University of New York (SUNY)
at Binghamton
 
FOAL 2015
undefined
 
Yu David Liu
 
State University of New York (SUNY)
at Binghamton
 
FOAL 2015
undefined
Modular 
Performance
 Reasoning
of Data-Intensive Programs
 
Yu David Liu
 
State University of New York (SUNY)
at Binghamton
 
FOAL 2015
undefined
 
Yu David Liu
 
State University of New York (SUNY)
at Binghamton
 
FOAL 2015
undefined
 
Yu David Liu
 
State University of New York (SUNY)
at Binghamton
 
FOAL 2015
 
On the rise
High-performance requirement
 
Data Intensive Applications
Data Intensive Applications
 
Can programming language
techniques help understand and
optimize performance of data-
intensive software?
 
Motivating Question I
Motivating Question I
Performance Question 1
Is this data rate achievable and sustainable?
Performance Question 2
Given an Input data rate, what will the output rate be?
Performance Question 3
Given a desired output rate, what input rate is required?
Performance Question 4
Given unlimited Input data, how fast can the output rate be?
Can program structures help reason
about performance of data-intensive
software modularly?
Motivating Question II
Motivating Question II
 
This Talk: Rate Types
 
 
A type system to reason about
data rates
in data intensive applications
 
Bartenstein, Liu, “Rate Types for Stream Programs”
(OOPSLA’14)
 
Can program structures help reason
about performance of data-intensive
software modularly?
 
Motivating Question Revisited
Motivating Question Revisited
 
What programming
 models?
(an open question in the long term)
 
Stream Programming
Stream Programming
 
A well-known paradigm
classic data-flow model (e.g. Lucid) +
high-throughput data processing
Many recent examples from both PL and
database communities
StreamIt, Streamflex, Aurora, Borealis, Pig
Latin, Bamboo, FlumeJava….
The latest Java has stream support
 
Streams:
Sequence of Data
Stream Programming: Filters and Streams
Stream Programming: Filters and Streams
x=Pop()
x=x+3
Push(x)
 
…, 28, 21, 14, 7
 
…, 31, 24, 17, 10
Filters:
Consume data from an input stream
Process Data
Produce data on an output stream
 
Stream Programming: Combinators
Stream Programming: Combinators
 
chain
 
fork-join
 
feedback
 
Stream Programming Benefits
Stream Programming Benefits
 
Friendliness to Big Data applications
High parallel efficiency
Task Parallelism
Data Parallelism
Pipelining
Programming abstractions friendly for
performance reasoning (as we shall see)
 
 
Rate Types
 
 
A type system to reason about
data rates of
 stream programs
Stream Rate Insight
Output Data rate depends on input data rate…
Stream Rate Insight
… until …
Stream Rate Insight
… processing time becomes the bottleneck;
Output rate is proportional to processing time.
Input / Output Rate Graph
What’s the slope?
Where does it plateau?
 
Slope is the “throughput ratio” -
ratio between the output rate and input rate
Plateau is the “natural bound” -
output rate given unlimited input rate
 Rate type is the tuple –
 
Rate Types
Input / Output Rate Graph
θ
 = 1
ν
 = 3
 
Stream Programming: Combinators
Stream Programming: Combinators
 
chain
 
diamond
 
circle
Chain Combinator Type Checking
P
A
P
B
Chain Combinator Type Checking
P
a
P
b
Diamond Combinator Type Checking
2
3
1
2
5
3
Complex because of feedback
Intuitively
inverted diamond
fix point
Circle Combinator Type Checking
P
B
P
A
1
1
2
1
1
2
Base Case: Filter Type Checking
2
3
time
 
Subtyping
Why Types? A Unified Solution
 
Type Checking:
Type Inference:
Principal Typing:
 
Why Types? Modularity
 
Thanks to its type system formulation, a
distinctive trait of Rate Types for
performance reasoning is it promotes
modular reasoning
Partial
 stream graph reasoning: no need to
have all filters implemented to reason about
the data rate of the entire program.
“Plug-in” stream subgraphs as long as they
conform to the throughput ratio and natural
bound specified on the signature
 
(Rate Types) Operational Semantics
 
Data-aware: correct “bean counting” on
input/output data items in the stream
Time-aware: tracking filter/graph
processing times
Parallel-aware: supporting parallel
execution among filters
An abstract machine with minimal
assumptions:
No (requirement on) synchronizing clocks
Working with arbitrary schedules
 
Why Types? Provable Correctness
 
Soundness:
if the type system says rate x can be achieved,
a reduction sequence exists
Completeness:
if the type system says rate 
x 
cannot be achieved,
no reduction sequence with rate 
x 
exists
 
Type inference is 
sound
 relative to type
checking
Type inference is 
complete
 relative to type
checking
Principal typing always exists
 
Why Types? Provable Correctness
 
Implementation
 
StreamIt base
Calculate principal rate types
Control Input rate
Measure Output rate
Run benchmarks
4 Micro-benchmarks
6 StreamIt benchmarks
200 input data rates
Graph Expected vs. Measured
 
Rate Types Generality
 
Potential generalizations to systems such as Aurora, Bamboo,
StreamFlex, and may further have applications in FRP-like
languages where signal sampling rates matter
Limitations include…
Assumes declarative push and pop counts for filters
Assumes stable filter level profiling
Current/Future work includes h
ybrid typing
handle dynamic fluctuations
improve adaptiveness
A Java-based framework
Can program structures help
understand performance of data-
intensive software?
 
Rate Types
 provides
 
a
 unified and 
modular
reasoning
 framework for
 performance of
data-intensive
 software
Rate Types in Retrospective
 
surprising how much we can statically
answer questions that appear to be
inherently dynamic
Non-functional properties such as
performance and energy are critical
and well-studied in experimental
research (OSDI/SOSP, ASPLOS, ISCA,
SIGMOD, SC, etc)
Bigger Picture
 
reasoning about non-functional properties
 
Thank You
Slide Note
Embed
Share

"Explore how programming language techniques can optimize the performance of data-intensive software. Key questions include achievable data rates, input-output relationships, and program structures for reasoning."

  • Programming
  • Data-intensive
  • Performance optimization
  • Yu David Liu

Uploaded on Sep 15, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Modular Performance Reasoning of Data-Intensive Programs Yu David Liu State University of New York (SUNY) at Binghamton FOAL 2015

  2. Modular Performance Reasoning of Data-Intensive Programs Yu David Liu State University of New York (SUNY) at Binghamton FOAL 2015

  3. Modular Performance Reasoning of Data-Intensive Programs Yu David Liu State University of New York (SUNY) at Binghamton FOAL 2015

  4. Modular Performance Reasoning of Data-Intensive Programs Yu David Liu State University of New York (SUNY) at Binghamton FOAL 2015

  5. Modular Performance Reasoning of Data-Intensive Programs Yu David Liu State University of New York (SUNY) at Binghamton FOAL 2015

  6. Data Intensive Applications On the rise High-performance requirement 6

  7. Motivating Question I Can programming language techniques help understand and optimize performance of data- intensive software? 7

  8. Performance Question 1 Is this data rate achievable and sustainable? ? 8

  9. Performance Question 2 Given an Input data rate, what will the output rate be? ? 9

  10. Performance Question 3 Given a desired output rate, what input rate is required? ? 10

  11. Performance Question 4 Given unlimited Input data, how fast can the output rate be? ? 11

  12. Motivating Question II Can program structures help reason about performance of data-intensive software modularly? 12

  13. This Talk: Rate Types A type system to reason about data rates in data intensive applications Bartenstein, Liu, Rate Types for Stream Programs (OOPSLA 14) 13

  14. Motivating Question Revisited Can program structures help reason about performance of data-intensive software modularly? What programming models? (an open question in the long term) 14

  15. Stream Programming A well-known paradigm classic data-flow model (e.g. Lucid) + high-throughput data processing Many recent examples from both PL and database communities StreamIt, Streamflex, Aurora, Borealis, Pig Latin, Bamboo, FlumeJava . The latest Java has stream support 15

  16. Stream Programming: Filters and Streams Streams: Sequence of Data x=Pop() , 31, 24, 17, 10 , 28, 21, 14, 7 x=x+3 Push(x) Filters: Consume data from an input stream Process Data Produce data on an output stream 16

  17. Stream Programming: Combinators chain fork-join feedback 17

  18. Stream Programming Benefits Friendliness to Big Data applications High parallel efficiency Task Parallelism Data Parallelism Pipelining Programming abstractions friendly for performance reasoning (as we shall see) 18

  19. Rate Types A type system to reason about data rates of stream programs 19

  20. Stream Rate Insight Output Data rate depends on input data rate 20

  21. Stream Rate Insight until 21

  22. Stream Rate Insight processing time becomes the bottleneck; Output rate is proportional to processing time. 22

  23. Input / Output Rate Graph Input vs. Output Rate 3.5 3 OUTPUT DATA ITEMS PER SECOND 2.5 2 Where does it plateau? 1.5 1 What s the slope? 0.5 0 0 1 2 3 4 5 6 7 INPUT DATA ITEMS PER SECOND 23

  24. 4 3 Rate Types 2 1 0 0 2 4 6 Slope is the throughput ratio - ratio between the output rate and input rate Plateau is the natural bound - output rate given unlimited input rate Rate type is the tuple 24

  25. Input / Output Rate Graph Input vs. Output Rate 3.5 3 OUTPUT DATA ITEMS PER SECOND 2.5 = 3 2 1.5 1 = 1 0.5 0 0 1 2 3 4 5 6 7 INPUT DATA ITEMS PER SECOND 25

  26. Stream Programming: Combinators chain diamond circle 26

  27. Chain Combinator Type Checking PA PB 27

  28. Chain Combinator Type Checking Pa Pb 28

  29. Diamond Combinator Type Checking 5 3 2 PA PB 1 2 3 29

  30. Circle Combinator Type Checking Complex because of feedback Intuitively 1 1 inverted diamond fix point 2 PA PB 2 1 1 30

  31. Base Case: Filter Type Checking 2 3 filter time 31

  32. Subtyping Input vs. Output Rate 3.5 =1, =3 3 OUTPUT DATA ITEMS PER SECOND 2.5 2 1.5 1 =0.5, =2.125 0.5 0 0 1 2 3 4 5 6 7 INPUT DATA ITEMS PER SECOND 32

  33. Why Types? A Unified Solution Type Checking: ? Type Inference: ? ? Principal Typing: ? 33

  34. Why Types? Modularity Thanks to its type system formulation, a distinctive trait of Rate Types for performance reasoning is it promotes modular reasoning Partial stream graph reasoning: no need to have all filters implemented to reason about the data rate of the entire program. Plug-in stream subgraphs as long as they conform to the throughput ratio and natural bound specified on the signature 34

  35. (Rate Types) Operational Semantics Data-aware: correct bean counting on input/output data items in the stream Time-aware: tracking filter/graph processing times Parallel-aware: supporting parallel execution among filters An abstract machine with minimal assumptions: No (requirement on) synchronizing clocks Working with arbitrary schedules 35

  36. Why Types? Provable Correctness Soundness: if the type system says rate x can be achieved, a reduction sequence exists Completeness: if the type system says rate x cannot be achieved, no reduction sequence with rate x exists 36

  37. Why Types? Provable Correctness Type inference is sound relative to type checking Type inference is complete relative to type checking Principal typing always exists 37

  38. Implementation StreamIt base Calculate principal rate types Control Input rate Measure Output rate Run benchmarks 4 Micro-benchmarks 6 StreamIt benchmarks 200 input data rates Graph Expected vs. Measured 38

  39. Rate Types Generality Potential generalizations to systems such as Aurora, Bamboo, StreamFlex, and may further have applications in FRP-like languages where signal sampling rates matter Limitations include Assumes declarative push and pop counts for filters Assumes stable filter level profiling Current/Future work includes hybrid typing handle dynamic fluctuations improve adaptiveness A Java-based framework 39

  40. Rate Types in Retrospective Can program structures help understand performance of data- intensive software? Rate Types provides a unified and modular reasoning framework for performance of data-intensive software surprising how much we can statically answer questions that appear to be inherently dynamic 40

  41. Bigger Picture Non-functional properties such as performance and energy are critical and well-studied in experimental research (OSDI/SOSP, ASPLOS, ISCA, SIGMOD, SC, etc) reasoning about non-functional properties 41

  42. Thank You 42

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#