Tackling Store Buffers in TSO Analysis
The discussion revolves around the management of store buffers in Total Store Ordering (TSO) analysis, contrasting Sequential Consistency (SC) memory model. Different protocols and approaches like Dekker's mutual exclusion and verification techniques for TSO are explored. An innovative method of reducing TSO analysis to SC with approximations is addressed, focusing on symbolic representation of store buffers and finite state program reachability. Moreover, a simulation technique is proposed, translating concurrent programs in TSO to SC for effective analysis leveraging off-the-shelf tools.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Getting Rid of Store-Buffers in TSO Analysis Mohamed Faouzi Atig Uppsala University, Sweden Ahmed Bouajjani LIAFA, University of Paris 7, France Gennaro Parlato University of Southampton, UK
Sequential consistency memory model (SC) T1 Shared Memory Tn Write(var,val): sh_mem[var] val; (immidialy visible to all threads Read(var): returns sh_mem[val]; SC= actions of different threads interleaved in any order action of the same thread maintain the execution order WMM= For performance reason modern multi-processors reorder memory operations of the same thread
Total Store Ordering (TSO) M1 T1 (x 4) (z 7) (y 3) Shared Memory Mn Tn (z 4) (y 4) Each thread has its store-buffer (FIFO) Write(var,val): the pair (var val) is sent to the buffer Memory update = execution of a Write taken from some buffer Read(var) returns val - If (var val) the last value written into var still in the store-buffer - the buffer does not contain any Write to var, and sh_mem(var) = val fence requires that the store-buffer is empty
Correct under SC -- Wrong under TSO Dekker s mutual exclusion protocol Thread 1 a: y:=1 b: r1:=x c: if (r1==0) then d: critical section y 1 x y 0 0 Thread 2 1: x:=1 2: r2:=y 4: if (r2==0) then 4: critical section x 1 Bad Schedule for TSO: a b c d 1 2 3 4 both threads in the critical section!!!
Verification for TSO? For finite state programs reachability is non-primitive recursive [Atig, Bouajjani, Burckhardt, Masuvathi POPL 10] What shall we do? Symbolic representation of the store buffers? [Linden, Wolper SPIN 10]: Regular model-checking Our approach reduce the analysis from TSO to SC can be done only with approximations
What is this talk about If we restrict to only executions where each thread is executed at most k times with no interruption (for a fixed k) we can translate any concurrent program PTSO (recursion, thread creation, heap, ) into another program PSC s.t. PSC (under SC) simulates all possible executions of PTSO (under TSO) where each thread is executed at most k times PSC has no buffer at all! Simulation of the store-buffers using 2k copies of the shared variables as locals PSC has linear size in the size of PTSO Advantage: use off-the-shelf SC tools for the analysis of TSO programs
Code-to-code translation from TSO to SC
k-round (for each thread) reachability M1 T1 P1 Shared Memory Mi Ti Pi Run = (Ti1++Mi1)+ (Ti2++Mi2)+ ... round Pi1 round Pi2 A k-round run : i # round Pi k
Compositional reasoning [(Ti +Mi)*]k round0 (Mask0 Buff0) round1 (Mask1 Buff1) round2 (Mask2 Buff2)
Getting rid of store-buffers Maski x y z is a copy of the shared vars as Boolean (as locals) (Mask0 Buff0) (Mask1 Buff1) Buffi x - y 6 z - (Mask2 Buff2) is a copy of the shared vars (as locals)
Invariant: x y z x 3 0 0 y 5 - 1 z - - 4 Buff0 Buff1 Buff2 Mask0 Mask1 Mask2 store-buffer round 0 round 1 round 2 4) (y 7) (x 0) (x 4) (x 7) (x 3) (x 7) (y 5) (x 0) (y 1) (z at each time in the simulation Maski [var]=1 iff there is a store in the store-buffer for var that update the Shared memory at round i Buffi[var] containts the last value sent for var
Simulation Before simulation: Masks set to False r_SC 0; r_TSO 0; (Mask0 Buff0) Simulation: All statements not involving shared vars are executed (Mask1 Buff1) Write(var,val) Maskr_TSO[var] T; Queuer_TSO[var] val; (Mask2 Buff2) Read(var) Let i be the greatest index s.t. i>=r_SC & Maski(var) =1 End of round : (Update shared vars): if i>=0 return Queuei[var] else return var ; For all var if Maskr_SC (var) ==1 var Buffr_SC [var]; round round Buffi 1 round 0 2
Skeleton of the translation before(){ // start round if (!sim){ lock; sim=1; r_SC++; if (r_TSO< r_SC) r_TSO=r_SC; } while(*) r_TSO++; } Shared sh_vars; Thread_i() Begin r_TSO, r_SC, sim, Mask0 , Buff0, ,Maskk , Buffk; locals l_vars; Init(); // initialize Masks to False, r_SC=0, r_TSO, sim=0; stmt_1; stmt_j before(); stmt_j; after(); stmt_2; after() { if(*) //end round Update_shared(r_SC, Mask, Queue) sim=0; unlock; } stmt_n; end
Characteristics of the translation For fixed k, PSC is linear in the size of PTSO 2k copies of the shared variable as locals (no store-buffer) PSC and PTSO are in the same class no restriction on the programs is imposed The reachable shared states are the same in PSC and PTSO A state S is reachable in PTSO with at most k rounds per thread iff S is reachable in PSC
Bounding Store Ages Observation: When r_SC =1 (Mask0, Buff0) are not used any longer Reuse the Mask and Queue variables: (Mask0 Buff0) Translation: (Maskj , Buffj) are used circularly (modulo k+1). (Mask1 Buff1) k store-ages: (Mask2 Buff2) Unbounded rounds! Constraint: each write pair remains in the store-buffer for at most k rounds (Mask0 Buff0)
How can we use this code-to-code translation?
Corollaries Decidability results for TSO reachability Our code-to-code translation is a linear reduction TSO -> SC. Inherit decidability from SC schedules (k fixed) Concurrent Boolean Prog. no recursion Complexity References k-store-ages Pspace [Qadeer, Rehof TACAS 05] k context- switches k round-robin Recursion Exptime [Lal, Reps CAV 08] [La Torre, P., Madhusudan CAV 09] [La Torre, P., Madhusudan CAV 10] Recursion Finite # threads |parameterized recursion thread-creation Exptime [Atig, Bouajjani, Qadeer TACAS 09] k-rounds per thread 2-Expspace [Emmi, Qadeer, Rakamaric POPL 11] k-delay bound recursion thread- creation recursion thread-creation Exptime [Bouajjani, Emmi, P. SAS 11] k-compositional Exptime
Tools for SC Tools for TSO (our code-to-code translation as a plug-in) A convenient way to get new tools for TSO Concurrent Program
Experiments Mutual exclusion Protocols POIROT (by MSR) Loop unrolling: 2 D stands for Delay bound No fences (buggy for TSO) D=1 7 s 26 s 5 s 8 s With fences (correct for TSO) D=1 D=2 6 s 110 s 6 s 6 s Dekker Lamport Peterson Szymanski 72 s 1608 s 47 s 978 s POIROT: SMT-based bounded model-checkers for SC programs Errors due to TSO discovered in few seconds! POIROT can also be a model-checker for TSO!
Conclusions We have proposed a code-to-code translation from TSO to SC allows to use existing and future tools designed for SC to analyze programs running under TSO under-approximation (error finding) restrictions imposed on the analyzed runs is useful to find errors in programs Beyond TSO ? Generic approach ? Thanks!