Getting Rid of Store-Buffers in TSO Analysis

Mohamed Faouzi Atig, Uppsala University, Sweden

Ahmed Bouajjani, LIAFA, University of Paris 7, France

Gennaro Parlato, University of Southampton, UK

Sequential consistency memory model (SC)

• Write(var, val): sh_mem[var] ← val (immediately visible to all threads)
• Read(var): returns sh_mem[var]

SC:
• actions of different threads are interleaved in any order
• actions of the same thread maintain their execution order

WMM (weak memory models): for performance reasons, modern multiprocessors reorder memory operations of the same thread

[Figure: threads T1 … Tn, all connected directly to the shared memory.]

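Total Store Ordering (TSO)

[Figure: each thread T1 … Tn has its own store-buffer M1 … Mn between it and the shared memory; the buffers hold pending writes such as (x,4), (z,7), (y,3) and (z,4), (y,4).]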

• Each thread has its own store-buffer (FIFO)
• Write(var, val): the pair (var, val) is sent to the buffer
• Memory update = execution of a Write taken from some buffer
• Read(var) returns val if either
  - (var, val) is the last write to var still in the thread's store-buffer, or
  - the buffer does not contain any Write to var, and sh_mem[var] = val
• A fence requires that the store-buffer is empty
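The TSO rules above can be sketched as a small executable model. This is a minimal illustration, not the authors' formalization: the class name `TSOMachine` and its method names are hypothetical, and `fence` is modeled by draining the buffer (an assumption; the slides only say that a fence requires an empty store-buffer).

```python
from collections import deque


class TSOMachine:
    """Toy TSO model: one FIFO store-buffer per thread over a shared memory."""

    def __init__(self, n_threads, init):
        self.sh_mem = dict(init)
        self.buffers = [deque() for _ in range(n_threads)]

    def write(self, tid, var, val):
        # A write only appends the pair (var, val) to the writer's own buffer.
        self.buffers[tid].append((var, val))

    def read(self, tid, var):
        # The newest buffered write to var by this thread wins;
        # otherwise the value comes from shared memory.
        for v, val in reversed(self.buffers[tid]):
            if v == var:
                return val
        return self.sh_mem[var]

    def update(self, tid):
        # Memory update: commit the oldest pending write of thread tid.
        var, val = self.buffers[tid].popleft()
        self.sh_mem[var] = val

    def fence(self, tid):
        # Assumption: realize "buffer must be empty" by draining it.
        while self.buffers[tid]:
            self.update(tid)


m = TSOMachine(2, {"x": 0})
m.write(0, "x", 4)
own_view = m.read(0, "x")      # thread 0 sees its own buffered write
other_view = m.read(1, "x")    # thread 1 still sees shared memory
m.fence(0)
after_fence = m.read(1, "x")   # the write is now visible to thread 1
```

The gap between `own_view` and `other_view` is exactly the behavior that SC forbids and TSO allows.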

Dekker's mutual exclusion protocol: correct under SC, wrong under TSO

Shared memory: x = 0, y = 0

Thread 1
  a: y := 1
  b: r1 := x
  c: if (r1 == 0) then
  d:   critical section

Thread 2
  1: x := 1
  2: r2 := y
  3: if (r2 == 0) then
  4:   critical section

Bad schedule for TSO: a b c d 1 2 3 4 (the write y := 1 is still in Thread 1's store-buffer when Thread 2 reads y). Both threads are in the critical section!
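The bad schedule can be replayed with a few lines of Python. This is a sketch under stated assumptions: the helper names `run`, `write`, `read` and `flush` are hypothetical, and the `fenced` variant places a fence (modeled as a buffer flush) right after each thread's write, which is one common way to repair the protocol and is not taken from the slides.

```python
from collections import deque


def run(fenced):
    """Replay schedule a b c d 1 2 3 4; return (T1 in CS, T2 in CS)."""
    sh_mem = {"x": 0, "y": 0}
    buf = {1: deque(), 2: deque()}   # one FIFO store-buffer per thread

    def write(tid, var, val):
        buf[tid].append((var, val))

    def read(tid, var):
        for v, val in reversed(buf[tid]):
            if v == var:
                return val
        return sh_mem[var]

    def flush(tid):
        while buf[tid]:
            var, val = buf[tid].popleft()
            sh_mem[var] = val

    write(1, "y", 1)          # a: y := 1 (buffered)
    if fenced:
        flush(1)              # assumed fence after the write
    r1 = read(1, "x")         # b: r1 := x
    t1_in_cs = (r1 == 0)      # c, d

    write(2, "x", 1)          # 1: x := 1 (buffered)
    if fenced:
        flush(2)              # assumed fence after the write
    r2 = read(2, "y")         # 2: r2 := y
    t2_in_cs = (r2 == 0)      # 3, 4

    flush(1)
    flush(2)
    return t1_in_cs, t2_in_cs


without_fences = run(fenced=False)
with_fences = run(fenced=True)
```

Without fences both flags are true for this schedule; with the assumed fences the second thread reads y = 1 and stays out of the critical section.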

Verification for TSO?

• For finite-state programs, reachability is non-primitive recursive [Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]
• What shall we do?
  - Symbolic representation of the store-buffers? [Linden, Wolper, SPIN'10]: regular model checking
  - Our approach: reduce the analysis from TSO to SC (can be done only with approximations …)

What is this talk about?

If we restrict to executions where each thread is executed at most k times with no interruption (for a fixed k), we can translate any concurrent program P_TSO (recursion, thread creation, heap, …) into another program P_SC such that:

• P_SC (under SC) simulates all possible executions of P_TSO (under TSO) where each thread is executed at most k times
• P_SC has no buffer at all: the store-buffers are simulated using 2k copies of the shared variables as locals
• P_SC has size linear in the size of P_TSO
• Advantage: off-the-shelf SC tools can be used for the analysis of TSO programs

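Code-to-code translation from TSO to SC

k-round (for each thread) reachability

Run = $(T_{i_1} + M_{i_1})^{+}\,(T_{i_2} + M_{i_2})^{+} \cdots$, where the first factor is a round of $P_{i_1}$, the second a round of $P_{i_2}$, and so on.

A k-round run: $\forall i:\ \#\mathrm{rounds}(P_i) \le k$

[Figure: threads T1 … Ti with store-buffers M1 … Mi around the shared memory; each pair (Ti, Mi) is labeled Pi.]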
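To make the k-round condition concrete, the following sketch counts rounds in a run. It assumes a run is given as a list of process indices, one per action (a thread step or a memory update of that thread's buffer), and it counts maximal contiguous blocks, which gives the smallest possible number of rounds per process. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def rounds_per_process(run):
    """Count, for each process, its maximal uninterrupted blocks in the run."""
    return Counter(pid for pid, _ in groupby(run))


def is_k_round(run, k):
    """True if every process has at most k rounds in the run."""
    return all(n <= k for n in rounds_per_process(run).values())


# P1 acts, then P2, then P1 again, then P3, then P1 once more.
run = [1, 1, 2, 2, 2, 1, 3, 1]
counts = rounds_per_process(run)
```

Here P1 has three rounds and P2 and P3 one each, so the run is a 3-round run but not a 2-round run.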

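Compositional reasoning

$[(T_i + M_i)^{*}]^{k}$

[Figure: the run of one thread split into round 0, round 1, round 2, with the pairs (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2) attached to the rounds.]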

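Getting rid of store-buffers

Each round i has a pair (Mask_i, Buff_i); for three rounds: (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2).

• Buff_i is a copy of the shared vars (as locals)
• Mask_i is a copy of the shared vars as Booleans (as locals)

Example over the shared vars x, y, z: Buff_i = (x: -, y: 6, z: -), with Mask_i the matching Boolean vector over x, y, z.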

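Invariant

| | x | y | z |
|---|---|---|---|
| Buff0 | 3 | 5 | - |
| Buff1 | 0 | - | - |
| Buff2 | 0 | 1 | 4 |

Mask0, Mask1, Mask2 are the corresponding Boolean vectors over x, y, z.

store-buffer: (x,0) (y,1) (z,4) (y,7) (x,0) (x,4) (x,7) (x,3) (x,7) (y,5), partitioned in the figure into round 0, round 1, round 2

At each time in the simulation, Mask_i[var] = 1 iff there is a store in the store-buffer for var that updates the shared memory at round i; in that case, Buff_i[var] contains the last value sent for var.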
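The invariant can be checked on the example store-buffer with a short sketch. Two things here are assumptions rather than facts from the slides: the buffer is read with its newest entry on the left, and it is split into segments of 3, 3 and 4 stores for rounds 0, 1 and 2. This is one reading that reproduces the Buff0, Buff1, Buff2 rows of the example. The function `summarize` is hypothetical.

```python
def summarize(segment, variables=("x", "y", "z")):
    """Return (Mask, Buff) for the stores of one round, given oldest first."""
    mask = {v: False for v in variables}
    buff = {v: None for v in variables}
    for var, val in segment:
        mask[var] = True
        buff[var] = val      # a later store overwrites an earlier one
    return mask, buff


# Store-buffer as printed on the slide (assumed: newest entry first).
printed = [("x", 0), ("y", 1), ("z", 4), ("y", 7), ("x", 0),
           ("x", 4), ("x", 7), ("x", 3), ("x", 7), ("y", 5)]
oldest_first = printed[::-1]

# Assumed partition into rounds: 3, 3 and 4 stores.
round0, round1, round2 = oldest_first[:3], oldest_first[3:6], oldest_first[6:]

mask0, buff0 = summarize(round0)
mask1, buff1 = summarize(round1)
mask2, buff2 = summarize(round2)
```

Entries shown as "-" in the example correspond to a `None` value in Buff and a `False` bit in Mask.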

Simulation

Before simulation:
• Masks set to False
• r_SC ← 0; r_TSO ← 0

Simulation:
• All statements not involving shared vars are executed unchanged
• Write(var, val):
  - Mask_{r_TSO}[var] ← T
  - Buff_{r_TSO}[var] ← val
• Read(var): let i be the greatest index such that i ≥ r_SC and Mask_i[var] = 1; if such an i exists, return Buff_i[var], else return var (the value in shared memory)
• End of round (update shared vars): for all var, if Mask_{r_SC}[var] == 1 then var ← Buff_{r_SC}[var]

[Figure: rounds 0, 1, 2 of a thread with the pairs (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2).]
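The simulation rules can be sketched as a small Python class, one instance per thread. This is an illustration under assumptions, not the authors' translation: the class `ThreadSim` and its methods are hypothetical, `delay` stands for the nondeterministic increment of r_TSO, the round counter is advanced at the end of a round for simplicity, and the mask is cleared after the update (not stated on the slides).

```python
class ThreadSim:
    """Per-thread Mask/Buff arrays replacing a TSO store-buffer."""

    def __init__(self, shared, k):
        self.shared = shared          # dict shared by all threads
        self.k = k
        self.mask = [dict.fromkeys(shared, False) for _ in range(k + 1)]
        self.buff = [dict.fromkeys(shared, None) for _ in range(k + 1)]
        self.r_sc = 0                 # round currently being simulated
        self.r_tso = 0                # round in which new stores reach memory

    def write(self, var, val):
        self.mask[self.r_tso][var] = True
        self.buff[self.r_tso][var] = val

    def read(self, var):
        # Greatest i >= r_sc with Mask_i[var] set, else shared memory.
        for i in range(self.k, self.r_sc - 1, -1):
            if self.mask[i][var]:
                return self.buff[i][var]
        return self.shared[var]

    def delay(self):
        # Later stores will be committed one round later.
        assert self.r_tso < self.k
        self.r_tso += 1

    def end_round(self):
        for var in self.shared:
            if self.mask[self.r_sc][var]:
                self.shared[var] = self.buff[self.r_sc][var]
                self.mask[self.r_sc][var] = False   # assumption: clear slot
        self.r_sc += 1
        self.r_tso = max(self.r_tso, self.r_sc)


# Dekker's bad schedule, replayed without any store-buffer.
shared = {"x": 0, "y": 0}
t1, t2 = ThreadSim(shared, k=2), ThreadSim(shared, k=2)

t1.delay()                 # y := 1 will reach memory only in T1's round 1
t1.write("y", 1)
own_y = t1.read("y")       # T1 sees its own pending write
r1 = t1.read("x")
t1.end_round()             # T1 round 0: nothing to commit

t2.write("x", 1)
r2 = t2.read("y")          # y = 1 has not reached shared memory yet
t2.end_round()             # commits x = 1

t1.end_round()             # T1 round 1: commits y = 1
```

Both `r1` and `r2` are 0, so the TSO-only behavior of Dekker's protocol is reproduced using only plain variables.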

Skeleton of the translation

Original program:

    Shared sh_vars;

    Thread_i()
    Begin
      locals l_vars;
      stmt_1;
      stmt_2;
      …
      stmt_n;
    end

Translated program: the locals of each thread are extended with

    r_TSO, r_SC, sim, Mask0, Buff0, …, Maskk, Buffk;

the thread starts with Init(), and each stmt_j is replaced by before(); stmt_j; after();

    Init();   // initialize Masks to False, r_SC=0, r_TSO=0, sim=0

    before() {             // start round
      if (!sim) {
        lock; sim = 1; r_SC++;
        if (r_TSO < r_SC) r_TSO = r_SC;
      }
      while (*) r_TSO++;   // nondeterministically advance r_TSO
    }

    after() {
      if (*) {             // end round
        Update_shared(r_SC, Mask, Buff);
        sim = 0; unlock;
      }
    }

Characteristics of the translation

• For fixed k, P_SC is linear in the size of P_TSO
• 2k copies of the shared variables as locals (no store-buffer)
• P_SC and P_TSO are in the same class: no restriction on the programs is imposed
• The reachable shared states are the same in P_SC and P_TSO: a state S is reachable in P_TSO with at most k rounds per thread iff S is reachable in P_SC

Bounding Store Ages

Observation: when r_SC = 1, (Mask0, Buff0) are not used any longer.

Reuse the Mask and Buff variables. Translation: (Mask_j, Buff_j) are used circularly (modulo k+1).

k store-ages:
• Unbounded rounds!
• Constraint: each write pair remains in the store-buffer for at most k rounds

[Figure: (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2), (Mask0, Buff0), … reused cyclically.]
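The circular reuse can be illustrated with a small index calculation. This sketch assumes rounds are numbered 0, 1, 2, … and that a store issued in round r_SC may be scheduled for any round between r_SC and r_SC + k; the helper names `slot` and `can_schedule` are hypothetical.

```python
def slot(round_index, k):
    """Index of the (Mask, Buff) pair used for a round, modulo k+1."""
    return round_index % (k + 1)


def can_schedule(r_sc, r_tso, k):
    """A store issued in round r_sc may wait at most k rounds."""
    return r_sc <= r_tso <= r_sc + k


k = 2
slots = [slot(r, k) for r in range(6)]

# The k+1 rounds that can hold pending stores at any time never share a slot.
collision_free = all(
    len({slot(r, k) for r in range(r_sc, r_sc + k + 1)}) == k + 1
    for r_sc in range(10)
)
```

With k = 2 the rounds use the pairs 0, 1, 2, 0, 1, 2, …, so three pairs suffice for an unbounded number of rounds as long as no store waits more than k rounds.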

How can we use this code-to-code translation?

Corollaries

| Schedules (k fixed) | Concurrent Boolean programs | Complexity | References |
|---|---|---|---|
| k store-ages | no recursion | Pspace | |
| k context-switches | recursion | Exptime | [Qadeer, Rehof, TACAS'05] |
| k round-robin | recursion; finite # threads / parameterized | Exptime | [Lal, Reps, CAV'08] [La Torre, P., Madhusudan, CAV'09] [La Torre, P., Madhusudan, CAV'10] |
| k rounds per thread | recursion, thread creation | 2-Expspace | [Atig, Bouajjani, Qadeer, TACAS'09] |
| k delay bound | recursion, thread creation | Exptime | [Emmi, Qadeer, Rakamaric, POPL'11] |
| k-compositional | recursion, thread creation | Exptime | [Bouajjani, Emmi, P., SAS'11] |

Decidability results for TSO reachability

Our code-to-code translation is a linear reduction TSO → SC, so decidability results are inherited from SC.

Tools for SC → Tools for TSO (our code-to-code translation as a plug-in)

A convenient way to get new tools for TSO …

[Figure: Concurrent Program → TSO → SC translation → Instrumentation for the SC tool → SC tool]

Experiments

Mutual exclusion protocols, analyzed with POIROT (by MSR). Loop unrolling: 2. D stands for delay bound.

| Protocol | No fences (buggy for TSO), D=1 | With fences (correct for TSO), D=1 | With fences (correct for TSO), D=2 |
|---|---|---|---|
| Dekker | 7 s | 6 s | 72 s |
| Lamport | 26 s | 110 s | 1608 s |
| Peterson | 5 s | 6 s | 47 s |
| Szymanski | 8 s | 6 s | 978 s |

POIROT: SMT-based bounded model checker for SC programs

Errors due to TSO discovered in a few seconds! POIROT can also be a model checker for TSO!

Conclusions

We have proposed a code-to-code translation from TSO to SC:

• it allows existing and future tools designed for SC to be used to analyze programs running under TSO
• it is an under-approximation (error finding)
• the restriction imposed on the analyzed runs is useful for finding errors in programs

Beyond TSO? Generic approach?

Thanks!
