DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages
State-of-the-art memory model DRFx provides a solution for relaxed data race detection, addressing deficiencies of previous models like DRF0. It ensures safety, debuggability, and compiler correctness while permitting optimizations and halting programs before non-sequential consistency behavior.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
DRFx A Simple and Efficient Memory Model for Concurrent Programming Languages Dan Marino Abhay Singh Todd Millstein Madan Musuvathi Satish Narayanasamy UC Los Angeles University of Michigan UC Los Angeles MSR, Redmond University of Michigan
STATEOFTHE ART: SC FOR DATA RACE FREE MEMORY MODELS sequential consistency [Lamport 79] intuitive for programmers limits compiler and hardware optimizations DRF0 [Adve&Hill 90] models balance performance and ease of programming SC behavior guaranteed for race-free programs most optimizations allowed e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008] 2
PROGRAM BEHAVIORUNDER DRF0 X* x = null; bool init = false; atomic // Thread t // Thread u A: x = new X(); C: if(init) A: x = new X(); C: if(init) D: x->f++; B: init = true; B: init = true; D: x->f++; O p t i m i z i n g C o m p i l e r a n d H a r d w a r e Null Pointer! B doesn t depend on A. It might be faster to reorder them! 3
DEFICIENCIESOF DRF0 weak or no semantics for racy programs unintentional data races easy to introduce problematic for DEBUGGABILITY programmer must assume non- SC behavior for all programs COMPILERCORRECTNESS Java must maintain safety at the cost of complexity [ ev k&Aspinall, ECOOP 2008] SAFETY [Boehm et al., PLDI 2008] optimization + data race = jump to arbitrary code! 4
OUR SOLUTION: THE DRFx MEMORY MODEL Memory Model Exception data race Programming Error Fatal Runtime Error DEBUGGABILITY SC for all executions SAFETY halt program before non-SC behavior exhibited COMPILERCORRECTNESS most sequentially-valid optimization permitted 5
DRFx ALLOWS RELAXED DATA RACE DETECTION SOURCEPROGRAM OBSERVEDBEHAVIOR data race free SC Behavior MM has data races Exception precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrelas 2003] 6
DETECTINGAN SC VIOLATION Races need not be reported between regions that do not execute concurrently! region serializable for compiled SC for source MM Exception region fence B: init = true; X* x = null; bool init = false; region fence C: if(init) D: x->f++; region fence // Thread t // Thread u A: x = new X(); C: if(init) B: init = true; D: x->f++; A: x = new X(); region fence data race, but no SC violation Insight: compiler can communicate to runtime the runtime must detect conflicting accesses in regions that execute concurrently. regions in which reordering may have occurred
DRFxCOMPILERAND RUNTIME REQUIREMENTS DRFx Compiler communicate regions in which optimizations were made by using fence instructions synchronization in their own region no speculative memory accesses DRFx Execution Environment trap on conflicting accesses in concurrent regions global order on region fences memory order consistent with fence order 8
FORMALIZATION compiler requirements how program is split into regions permitted optimizations all non-speculative, sequentially valid optimizations execution environment requirements when conflict may/must be reported memory orderings allowed w.r.t. fences prove no MM exception SC behavior for source program MM exception data race in source program 9
EFFICIENT & SIMPLE CONFLICT DETECTION perform detection in hardware like transactional memory hardware but simpler no rollback we control region boundaries compiler bounds number of memory locations dynamically accessed in a region limits optimization opportunities distinguish bounding region fence hardware can merge regions separated by a bounding fence when resources available 10
COMPILER IMPLEMENTATION built conservative DRFx-compliant compiler LLVM [Lattner & Adve 2004] na ve bounding analysis bounding fence at all loop back edges disable speculative optimizations measured performance PARSEC benchmark suite stock x86 hardware no architectural simulator 11
DRFx OVERHEADON PARSEC BENCHMARKS Compiler (sync) Hardware (sync) Compiler (bounding) AVERAGE streamcluster x264 fld.animate canneal bodytrack swaptions facesim ferret blackscholes 0% slowdown over unmodified, fully optimizing LLVM 2% 4% 6% 8% 12
RELATED WORK memory models e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990] hardware race detection [Adve et al.1991], [Muzahid et al. 2009], [Prvulovic & Torrelas 2003] software race detection e.g. [Yu et al. 2005 ],[Flanagan & Freund 2009],[Elmas et al. 2007] detecting SC violations [Gharachorloo&Gibbons, SPAA 1991] conflict exception [Lucia et al., ISCA 2010] stronger guarantee : serializability of sync-free regions requires unbounded detection scheme focused on hardware 13
DRFx CONCLUSION lightweight form of data race detection MM regions Exception EASY-TO-UNDERSTAND EFFICIENT programmer gets understandable behavior for all programs straightforward hardware support compiler may perform most sequentially valid optimizations within regions compiler restrictions only 0% - 7% slowdown 14