DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages

undefined
DRF
x
A Simple and Efficient Memory Model
for Concurrent Programming Languages
Dan 
Marino
Abhay Singh
Todd Millstein
Madan Musuvathi
Satish Narayanasamy
UC Los Angeles
University
 of Michigan
UC Los Angeles
MSR, Redmond
University
 of Michigan
S
TATE
 
OF
 
THE
 A
RT
:
SC 
FOR
 D
ATA
 R
ACE
 F
REE
 M
EMORY
 M
ODELS
 
sequential consistency 
[Lamport 79]
intuitive for programmers
limits compiler and hardware optimizations
DRF0 
[Adve&Hill 90]
 models balance performance
and ease of programming
SC behavior guaranteed for race-free programs
most optimizations allowed
e.g. Java and C++0x memory models
[Manson et al. 2005] [Boehm et al. 2008]
2
 
B: init = true;
 
C: if(init)
D:   x->f++;
P
ROGRAM
 B
EHAVIOR
 
UNDER
 DRF0
// Thread t          // Thread u
A: x = new X();
      
C: if(init)
 
B: init = true;
      
D:   x->f++;
X* x = null;
bool init = false;
 
A: x = new X();
B doesn’t depend on A.
It might be faster to reorder them!
O
p
t
i
m
i
z
i
n
g
 
 
 
 
C
o
m
p
i
l
e
r
 
 
 
 
a
n
d
 
 
 
 
H
a
r
d
w
a
r
e
Null
Pointer!
3
 
atomic
D
EFICIENCIES
 
OF
 DRF0
4
weak or 
no
semantics for
racy programs
unintentional
data races easy
to introduce
problematic for
O
UR
 S
OLUTION
: T
HE
 DRF
x 
M
EMORY
 M
ODEL
5
data
race
Programming
Error
Memory
Model
Exception
Fatal Runtime
Error
 
  
debuggability
     
SC for 
all
 executions
  
safety
     
halt program before non-SC behavior exhibited
  
compiler correctness
     
most sequentially-valid optimization permitted
DRF
x
 A
LLOWS
 R
ELAXED
 D
ATA
 R
ACE
 D
ETECTION
6
MM
Exception
SC Behavior
SOURCE
 
PROGRAM
OBSERVED
 
BEHAVIOR
data
race free
has data
races
precise runtime data race detection is slow in software and
complex in hardware
[Flanagan & Freund 2009] [Prvulovic & Torrelas 2003]
simplify detection
undefined
runtime must detect conflicting accesses
in regions that execute concurrently.
D
ETECTING
 
AN
 SC V
IOLATION
 
A: x = new X();
 
B: init = true;
 
C: if(init)
D:   x->f++;
 
region fence
Insight:  compiler can communicate to runtime the
regions 
in which reordering may have occurred
Races need not be reported between regions
that do not execute concurrently!
region serializable for compiled 
 
SC for source
MM
Exception
data race,
but no SC violation
 
region fence
DRF
x
 
C
OMPILER
 
AND
 R
UNTIME
 R
EQUIREMENTS
DRF
x
 Compiler
communicate regions in which optimizations were
made by using 
fence
 instructions
synchronization in their own region
no speculative memory accesses
DRF
x
 Execution Environment
trap on conflicting accesses in concurrent regions
global order on region fences
memory order consistent with fence order
8
F
ORMALIZATION
compiler requirements
how program is split into regions
permitted optimizations
all non-speculative, sequentially valid optimizations
execution environment requirements
when conflict may/must be reported
memory orderings allowed w.r.t. fences
prove
no MM exception 
 SC behavior for source program
MM exception 
 data race in source program
9
E
FFICIENT
 & S
IMPLE
 C
ONFLICT
 D
ETECTION
 
perform detection in hardware
like transactional memory hardware – but simpler
no rollback
we control region boundaries
compiler bounds number of memory locations
dynamically accessed in a region
limits optimization opportunities
distinguish “bounding” region fence
hardware can merge regions separated by a
bounding fence when resources available
10
C
OMPILER
 I
MPLEMENTATION
built conservative DRF
x
-compliant compiler
LLVM 
[Lattner & Adve 2004]
naïve bounding analysis
bounding fence at all loop back edges
disable speculative optimizations
measured performance
PARSEC benchmark suite
stock x86 hardware – no architectural simulator
11
DRF
x
  
O
VERHEAD
 
ON
 P
ARSEC
 B
ENCHMARKS
12
slowdown over unmodified, fully optimizing LLVM
R
ELATED
 W
ORK
memory models
e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990]
hardware race detection
[Adve et al.1991], [Muzahid et al. 2009], [Prvulovic & Torrelas 2003]
software race detection
e.g. [Yu et al. 2005 ],[Flanagan & Freund 2009],[Elmas et al. 2007]
detecting SC violations 
[Gharachorloo&Gibbons, SPAA 1991]
conflict exception 
[Lucia et al., ISCA 2010]
stronger guarantee : serializability of sync-free regions
requires unbounded detection scheme
focused on hardware
13
DRF
x
 C
ONCLUSION
14
lightweight
form of data
race detection
MM
Exception
regions
C
ONTRIBUTIONS
 
defined DRF
x
 in terms of end-to-end programmer
guarantees
established sufficient requirements on compiler
and execution environment
proved that requirements establish guarantees
implemented a DRF
x
 compliant compiler and
measured its performance
designed hardware that meets DRF
x
 execution
environment requirements
15
Soundness
Safety
DRF
DRF
x
 
G
UARANTEES
P is data
 
race free    
all executions of P are SC and MM-exception-free
SC is violated during an execution of P    
an MM exception will eventually be thrown
16
 
an execution of P makes a system call    
the system call is reachable in an SC execution
DRF
x 
ALLOWS
 
A
 
RANGE
 
OF
 
IMPLEMENTATIONS
17
DRF
SC
DRF = data race free
SC = sequentially consistent
All Executions
must
raise MM
exception
may or
may not
raise MM
exception
must not
raise
MM
exception
precise runtime data race detection is slow in software
and complex in hardware
[Flanagan & Freund 2009] [Prvulovic & Torrelas 2003]
undefined
D
ETECTING
 
AN
 SC V
IOLATION
 
A: x = new X();
 
B: init = true;
 
C: if(init)
D:   x->f++;
 
A issued
 
B issued
 
A retired
 
B retired
if hardware performs optimization,
we can notice SC violation by detecting 
races on in-flight instructions
Gharachorloo and Gibbons [SPAA 1991]
if compiler performs optimization,
this strategy is insufficient
Insight:  compiler can communicate to hardware
regions 
in which reordering may have occurred
Races need not be reported between regions
that do not execute concurrently!
Concurrent
Region
Conflict
data race,
but no SC violation
R
EGIONS
 F
ACILITATE
 DRF
x 
I
MPLEMENTATION
19
DRF
SC
DRF = data race free
SC = sequentially consistent
All Executions
 
RCF
 
RS
RCF = region conflict free
RS = region serializable
Soundness
Safety
DRF
E
STABLISHING
 
THE
 DRF
x
 
G
UARANTEES
P race free + syncs in own region 
 conflicting regions cannot execute concurrently
 MM exception is not throw on any execution
MM exception not thrown 
 no concurrent region conflict
 region-serializable execution of compiled program
 SC execution of source program
Soundness + timely exceptions + system calls in own region
 system call reachable through SC execution
20
DRF
x
 
H
ARDWARE
 D
ESIGN
21
Processor
add entry upon issuing
memory access
check for conflicts
with active regions
broadcast
 access set
upon region completion
bounded regions allow
fixed size buffer without
overflow
soft fences allow processor
reordering, multiple 
“in-flight” regions
DRF
x
  
O
VERHEAD
 
ON
 P
ARSEC
 B
ENCHMARKS
22
Slide Note
Embed
Share

State-of-the-art memory model DRFx provides a solution for relaxed data race detection, addressing deficiencies of previous models like DRF0. It ensures safety, debuggability, and compiler correctness while permitting optimizations and halting programs before non-sequential consistency behavior.

  • Memory Model
  • Concurrent Programming
  • DRFx
  • Data Race Detection
  • Safety

Uploaded on Sep 09, 2024 | 3 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. DRFx A Simple and Efficient Memory Model for Concurrent Programming Languages Dan Marino Abhay Singh Todd Millstein Madan Musuvathi Satish Narayanasamy UC Los Angeles University of Michigan UC Los Angeles MSR, Redmond University of Michigan

  2. STATEOFTHE ART: SC FOR DATA RACE FREE MEMORY MODELS sequential consistency [Lamport 79] intuitive for programmers limits compiler and hardware optimizations DRF0 [Adve&Hill 90] models balance performance and ease of programming SC behavior guaranteed for race-free programs most optimizations allowed e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008] 2

  3. PROGRAM BEHAVIORUNDER DRF0 X* x = null; bool init = false; atomic // Thread t // Thread u A: x = new X(); C: if(init) A: x = new X(); C: if(init) D: x->f++; B: init = true; B: init = true; D: x->f++; O p t i m i z i n g C o m p i l e r a n d H a r d w a r e Null Pointer! B doesn t depend on A. It might be faster to reorder them! 3

  4. DEFICIENCIESOF DRF0 weak or no semantics for racy programs unintentional data races easy to introduce problematic for DEBUGGABILITY programmer must assume non- SC behavior for all programs COMPILERCORRECTNESS Java must maintain safety at the cost of complexity [ ev k&Aspinall, ECOOP 2008] SAFETY [Boehm et al., PLDI 2008] optimization + data race = jump to arbitrary code! 4

  5. OUR SOLUTION: THE DRFx MEMORY MODEL Memory Model Exception data race Programming Error Fatal Runtime Error DEBUGGABILITY SC for all executions SAFETY halt program before non-SC behavior exhibited COMPILERCORRECTNESS most sequentially-valid optimization permitted 5

  6. DRFx ALLOWS RELAXED DATA RACE DETECTION SOURCEPROGRAM OBSERVEDBEHAVIOR data race free SC Behavior MM has data races Exception precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrelas 2003] 6

  7. DETECTINGAN SC VIOLATION Races need not be reported between regions that do not execute concurrently! region serializable for compiled SC for source MM Exception region fence B: init = true; X* x = null; bool init = false; region fence C: if(init) D: x->f++; region fence // Thread t // Thread u A: x = new X(); C: if(init) B: init = true; D: x->f++; A: x = new X(); region fence data race, but no SC violation Insight: compiler can communicate to runtime the runtime must detect conflicting accesses in regions that execute concurrently. regions in which reordering may have occurred

  8. DRFxCOMPILERAND RUNTIME REQUIREMENTS DRFx Compiler communicate regions in which optimizations were made by using fence instructions synchronization in their own region no speculative memory accesses DRFx Execution Environment trap on conflicting accesses in concurrent regions global order on region fences memory order consistent with fence order 8

  9. FORMALIZATION compiler requirements how program is split into regions permitted optimizations all non-speculative, sequentially valid optimizations execution environment requirements when conflict may/must be reported memory orderings allowed w.r.t. fences prove no MM exception SC behavior for source program MM exception data race in source program 9

  10. EFFICIENT & SIMPLE CONFLICT DETECTION perform detection in hardware like transactional memory hardware but simpler no rollback we control region boundaries compiler bounds number of memory locations dynamically accessed in a region limits optimization opportunities distinguish bounding region fence hardware can merge regions separated by a bounding fence when resources available 10

  11. COMPILER IMPLEMENTATION built conservative DRFx-compliant compiler LLVM [Lattner & Adve 2004] na ve bounding analysis bounding fence at all loop back edges disable speculative optimizations measured performance PARSEC benchmark suite stock x86 hardware no architectural simulator 11

  12. DRFx OVERHEADON PARSEC BENCHMARKS Compiler (sync) Hardware (sync) Compiler (bounding) AVERAGE streamcluster x264 fld.animate canneal bodytrack swaptions facesim ferret blackscholes 0% slowdown over unmodified, fully optimizing LLVM 2% 4% 6% 8% 12

  13. RELATED WORK memory models e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990] hardware race detection [Adve et al.1991], [Muzahid et al. 2009], [Prvulovic & Torrelas 2003] software race detection e.g. [Yu et al. 2005 ],[Flanagan & Freund 2009],[Elmas et al. 2007] detecting SC violations [Gharachorloo&Gibbons, SPAA 1991] conflict exception [Lucia et al., ISCA 2010] stronger guarantee : serializability of sync-free regions requires unbounded detection scheme focused on hardware 13

  14. DRFx CONCLUSION lightweight form of data race detection MM regions Exception EASY-TO-UNDERSTAND EFFICIENT programmer gets understandable behavior for all programs straightforward hardware support compiler may perform most sequentially valid optimizations within regions compiler restrictions only 0% - 7% slowdown 14

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#