A Novel Test Coverage Metric for Concurrently-Accessed Software Components


Page 1: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

PLDI 2005, June 12-15, Chicago, U.S.

Serdar Tasiran, Tayfun Elmas,
Guven Bolukbasi, M. Erkan Keremoglu

Koç University, Istanbul, Turkey

A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Page 2: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Our Focus

• Widely-used software systems are built on concurrently-accessed software components
  – File systems, databases, internet services
  – Standard Java and C# class libraries

• Intricate synchronization mechanisms to improve performance
  – Prone to concurrency errors

• Concurrency errors
  – Data loss/corruption
  – Difficult to detect, reproduce through testing

Page 3: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

The Location Pairs Metric

Goal of metric: To help answer the question
  – "If I am worried about concurrency errors only, what unexamined scenario should I try to trigger?"

• Coverage metrics: Link between validation tools
  – Communicate partial results, testing goals between tools
  – Direct tools toward unexplored, distinct new executions

• The "location pairs" (LP) metric
  – Directed at concurrency errors ONLY

• Focus: "High-level" data races
  – Atomicity violations
  – Refinement violations
  – All variables may be lock-protected, but operations not implemented atomically

Page 4: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Outline

• Runtime Refinement Checking
• Examples of Refinement/Atomicity Violations
• The "Location Pairs" Metric
• Discussion, Ongoing Work

Page 5: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Refinement as Correctness Criterion

[Figure: four client threads concurrently invoke operations on the component implementation: Thread 1: LookUp(3), Thread 2: Insert(3), Thread 3: Insert(4), Thread 4: Delete(3). The interleaved execution contains actions such as Call Insert(3), A[0].elt=3, Unlock A[0], Return "success"; Call LookUp(3), read A[0], Return "true"; Call Insert(4), A[1].elt=4, Unlock A[1], Return "success"; Call Delete(3), A[0].elt=null, Unlock A[0], Return "success".]

• Client threads invoke operations concurrently

• Data structure operations should appear to be executed
  – atomically
  – in a linear order
  to client threads.

Page 6: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Runtime Refinement Checking

• Refinement
  – For each execution of Impl
    • there exists an "equivalent", atomic execution of data structure Spec

• Spec: "Atomized" version of Impl
  – Client methods run one at a time
  – Obtained from Impl itself

• Use refinement as correctness criterion
  – More thorough than assertions
  – More observability than pure testing

• Runtime verification: Check refinement using execution traces
  – Can handle industrial-scale programs
  – Intermediate between testing & exhaustive verification
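The trace-based check can be sketched as follows. This is an illustrative toy, not VYRD's actual API: the logged operations of the implementation trace are replayed, in witness order, against an atomized Spec (here a plain multiset), and each observed return value is compared with the one the Spec produces. All class and method names here are our own.

```java
import java.util.*;

// Sketch, not VYRD's implementation: replay a logged witness ordering of
// multiset operations against an atomic Spec and check observed results.
public class RefinementCheck {
    // One logged operation: name, argument, and the result observed in Impl.
    record LoggedOp(String name, int arg, Object observed) {}

    // Atomized Spec: a plain multiset (value -> multiplicity),
    // methods run one at a time.
    static Object run(Map<Integer, Integer> spec, String name, int arg) {
        switch (name) {
            case "Insert": spec.merge(arg, 1, Integer::sum); return "success";
            case "Delete": spec.computeIfPresent(arg, (k, c) -> c == 1 ? null : c - 1); return "success";
            case "LookUp": return spec.containsKey(arg);
            default: throw new IllegalArgumentException(name);
        }
    }

    // Impl refines Spec (for this trace) if every observed result matches
    // the result the atomic Spec produces in witness order.
    static boolean check(List<LoggedOp> witnessOrder) {
        Map<Integer, Integer> spec = new HashMap<>();
        for (LoggedOp op : witnessOrder)
            if (!run(spec, op.name(), op.arg()).equals(op.observed())) return false;
        return true;
    }
}
```

The real tool compares data structure contents via state snapshots as well, not only return values; this sketch shows only the I/O side of the check.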

Page 7: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

The VYRD Tool

[Figure: Vyrd architecture. A multi-threaded test drives Impl; every action is written to a log, yielding traceImpl. The Replay Mechanism reads the log and executes the logged actions on Impl_replay, while Spec runs the same methods atomically, yielding traceSpec. The Refinement Checker compares the two traces.]

• At certain points for each method, take "state snapshots"
• Check consistency of data structure contents

Page 8: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

The Vyrd Experience

• Scalable method: Caught previously undetected, serious but subtle bugs in industrial-scale designs
  – Boxwood (30K LOC)
  – Scan Filesystem (Windows NT)
  – Java libraries with known bugs

• Reasonable runtime overhead

• Key novelty: Checking refinement improves observability
  – Catches bugs that are triggered but not observed by testing
  – Significant improvement

Page 9: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Experience: The Boxwood Project

[Figure: Boxwood architecture. The BLinkTree module: a Root Pointer Node at the root level, Internal Pointer Nodes at levels n+1 and n, Leaf Pointer Nodes at level 0, and Data Nodes below. The Chunk Manager module: Global Disk Allocator and Replicated Disk Manager, serving Reads and Writes. The Cache module: dirty and clean cache entries.]

Page 10: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Refinement vs. Testing: Improved Observability

• Using Vyrd, caught previously undetected bug in
  – Boxwood Cache
  – Scan File System (Windows NT)

• Bug manifestation:
  – Cache entry is correct, marked "clean"
  – Permanent storage has corrupted data

• Hard to catch through testing
  – As long as "Read"s hit in Cache, return value is correct
  – Caught through testing only if
    1. Cache fills, clean entry in Cache is evicted
       • Not written again to permanent storage since entry is marked "clean"
    2. Entry read from permanent storage after eviction
       – With no "Write"s to the entry in the meantime

Page 11: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Outline

• Runtime Refinement Checking
• Examples of Refinement/Atomicity Violations
• The "Location Pairs" Metric
• Discussion, Ongoing Work

Page 12: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Idea behind the LP metric

• Observation: The bug occurs whenever
  1. Method1 executes up to line X, context switch occurs
  2. Method2 starts execution from line Y
  3. Provided there is a data dependency between
     • Method1's code "right before" line X: BlockX
     • Method2's code "right after" line Y: BlockY

• Description of bug in the log follows the pattern above

• Only requirement on program state, other threads, etc.:
  – Make the interleaving above possible
  – May require many other threads, complicated program state, ...

• A "one-bit" data abstraction captures the error scenario
  – depdt: Is there a data dependency between BlockX and BlockY?
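The depdt bit can be approximated from the read and write sets of the two blocks. A minimal sketch (our own helper, not the paper's implementation; shared locations are represented here as strings):

```java
import java.util.*;

// Sketch: the one-bit "depdt" abstraction. BlockX and BlockY are dependent
// iff they conflict on some shared location (write/write or read/write).
public class Depdt {
    static boolean intersects(Set<String> a, Set<String> b) {
        for (String s : a) if (b.contains(s)) return true;
        return false;
    }

    static boolean depdt(Set<String> readX, Set<String> writeX,
                         Set<String> readY, Set<String> writeY) {
        return intersects(writeX, writeY)   // write/write conflict
            || intersects(writeX, readY)    // X writes what Y reads
            || intersects(readX, writeY);   // Y writes what X reads
    }
}
```

For example, a BlockX that writes `count` and a BlockY that reads `count` are dependent, so the corresponding location pair is a coverage target.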

Page 13: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

public synchronized StringBuffer append(StringBuffer sb) {
    int len = sb.length();
    int newCount = count + len;
    if (newCount > value.length)
        ensureCapacity(newCount);
    sb.getChars(0, len, value, count);
    count = newCount;
    return this;
}

public synchronized void setLength(int newLength) {
    ...
    if (count < newLength) {
        ...
    } else {
        count = newLength;
        ...
    }
}
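The violation can be replayed deterministically on a simplified model of the two methods. MiniBuffer below is our own toy, not java.lang.StringBuffer; a hook between the read of sb's length and the character copy stands in for the context switch. append caches sb's length, then copies that many characters even though sb was truncated in between.

```java
// Toy model (not java.lang.StringBuffer) of the append/setLength
// atomicity violation: append synchronizes on the destination, not on sb,
// so sb can change between reading its length and copying its characters.
public class MiniBuffer {
    char[] value = new char[16];
    int count = 0;

    MiniBuffer(String s) {
        for (char c : s.toCharArray()) value[count++] = c;
    }

    void setLength(int n) { count = n; }                     // simplified

    void append(MiniBuffer sb, Runnable contextSwitch) {
        int len = sb.count;                                  // line 1
        int newCount = count + len;                          // line 2
        contextSwitch.run();                                 // adversarial scheduler
        System.arraycopy(sb.value, 0, value, count, len);    // line 5: stale len!
        count = newCount;                                    // line 6
    }

    @Override public String toString() { return new String(value, 0, count); }
}
```

In the real class the observable failure is stale characters appended from a truncated source, or an exception when the source shrinks below the cached length.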

Page 14: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Experience: Concurrency Bug in Cache

[Figure: successive snapshots of the Cache and Chunk Manager contents for one handle. Write(handle, AB) starts, overwriting the cache entry byte by byte; Flush() starts and ends while the copy is in progress, writing the half-updated entry to the Chunk Manager; Write(handle, AB) ends. Result: different byte-arrays for the same handle, i.e. corrupted data in persistent storage.]

Page 15: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

private static void CpToCache(byte[] buf, CacheEntry te, int lsn, Handle h) {
    for (int i = 0; i < buf.length; i++) {
        te.data[i] = buf[i];
    }
    te.lsn = lsn;
    ...
}

public static void Flush(int lsn) {
    ...
    lock (clean) {
        BoxMain.alloc.Write(h, te.data, te.data.length, 0, 0, WRITE_TYPE_RAW);
    }
    ...
}
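The race can be replayed deterministically in miniature (our own toy names, not Boxwood's code): the flush fires while the copy loop is halfway done, so a half-updated entry reaches "disk".

```java
// Toy replay of the Cache bug (illustrative names, not Boxwood's code):
// flushing te.data while CpToCache is mid-loop writes a torn entry to disk.
public class CacheRace {
    static byte[] disk = new byte[2];

    static void cpToCache(byte[] buf, byte[] teData, Runnable flushAt) {
        for (int i = 0; i < buf.length; i++) {
            if (i == 1) flushAt.run();      // "context switch" after first byte
            teData[i] = buf[i];
        }
    }

    static void flush(byte[] teData) {      // stands in for the raw disk write
        disk = teData.clone();
    }

    public static void main(String[] args) {
        byte[] teData = {'X', 'Y'};         // old cache entry
        cpToCache(new byte[]{'A', 'B'}, teData, () -> flush(teData));
        // Cache now holds "AB", but disk holds the torn entry "AY".
    }
}
```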

Page 16: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Outline

• Runtime Refinement Checking
• Examples of Refinement/Atomicity Violations
• The "Location Pairs" Metric
• Discussion, Ongoing Work

Page 17: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

[Action decomposition of an execution of append, one atomic action per line:]

acquire(this)
invoke sb.length()
L1: int len = sb.length()
L2: int newCount = count + len
if (newCount > value.length)
ensureCapacity(newCount)
invoke sb.getChars()
sb.getChars(0, len, value, count)
count = newCount
return this

public synchronized StringBuffer append(StringBuffer sb) {
1    int len = sb.length();
2    int newCount = count + len;
3    if (newCount > value.length)
4        ensureCapacity(newCount);
5    sb.getChars(0, len, value, count);
6    count = newCount;
7    return this;
8 }

Page 18: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Coverage FSM State

State: (LX, pend1, LY, pend2, depdt)

– LX: Location in the CFG of Method 1
– pend1: Is an "interesting" action in Method 1 expected next?
– LY: Location in the CFG of Method 2
– pend2: Is an "interesting" action in Method 2 expected next?
– depdt: Do the actions following LX and LY have a data dependency?

Page 19: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Coverage FSM

[Figure: a fragment of a coverage FSM over states
(L1, !pend1, L3, !pend2, !depdt), (L1, pend1, L3, !pend2, !depdt),
(L1, !pend1, L3, !pend2, depdt), (L2, !pend1, L3, pend2, !depdt),
with transitions labeled by the thread actions t1: L1→L2 and t2: L3→L4.]

Page 20: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Coverage Goal

• The "pend1" bit gets set when
  – The depdt bit is TRUE
  – Method2 takes an action
  – Intuition: Method1's dependent action must follow

• Must cover all (reachable) transitions of the form
  – p = (LXp, TRUE, LY, pend2p, depdtp) → q = (LXq, pend1q, LY, pend2q, depdtq)
  – p = (LX, pend1p, LYp, TRUE, depdtp) → q = (LX, pend1q, LYq, pend2q, depdtq)

• Separate coverage FSM for each method pair: FSM(Method1, Method2)
  – Cover required transitions in each FSM
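A coverage-measurement tool might record covered transitions for one method pair roughly as follows. This is our own sketch; the state tuple mirrors the slides, but the class and method names are assumptions.

```java
import java.util.*;

// Sketch of transition-coverage bookkeeping for one FSM(Method1, Method2).
public class LPCoverage {
    // Coverage FSM state: (LX, pend1, LY, pend2, depdt).
    record LPState(int lx, boolean pend1, int ly, boolean pend2, boolean depdt) {}

    private final Set<String> covered = new HashSet<>();

    // Record a transition p -> q; it counts toward the coverage goal only
    // if a dependent action is pending in one of the methods at p.
    void onTransition(LPState p, LPState q) {
        if (p.pend1() || p.pend2()) covered.add(p + " -> " + q);
    }

    int coveredTransitions() { return covered.size(); }
}
```

A full tool would also enumerate the (approximated) set of reachable required transitions, so that the uncovered ones can be reported to the programmer.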

Page 21: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Important Details

• Action: Atomically executed code fragment
  – Defined by the language

• Method calls:
  – Call action: Method call, all lock acquisitions
  – Return action: Total net effect of method, atomically executed + lock releases

• Separate coverage FSM for each method pair: FSM(Method1, Method2)
  – Cover required transitions in each FSM

• But what if there is interesting concurrency inside a called method?
  – Considered separately when that method is one of the method pair
  – If Method1 calls Method3:
    • Considered when FSM(Method3, Method2) is covered

Page 22: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Outline

• Runtime Refinement Checking
• Examples of Refinement/Atomicity Violations
• The "Location Pairs" Metric
• Discussion, Ongoing Work

Page 23: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Empirical evidence

• Does this metric correspond well with high-level concurrency errors?

• Errors captured by metric
  – 100% metric coverage ⇒ bug guaranteed to be triggered
  – Triggered vs. detected:
    • May need view-refinement checking to improve observability

• Preliminary study
  – Bugs in Java class libraries
  – Bug found in Boxwood cache
  – Bug found in Scan file system
  – Bug categories reported in
    E. Farchi, Y. Nir, S. Ur, "Concurrent Bug Patterns and How to Test Them," 17th Intl. Parallel and Distributed Processing Symposium (IPDPS '03)

• How many are covered by random testing? How does coverage change over time?
  – Don't know yet. Implementing coverage measurement tool.

Page 24: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Reducing the Coverage FSM

• Method-local actions:
  – A basic block consisting of method-local actions is considered a single atomic action

• Pure blocks [Flanagan & Qadeer, ISSTA '04]
  – A "pure" execution of a pure block does not affect global state
    • Example: Acquire lock, read global variable, decide resource not free, release lock
  – Considered a "no-op"
  – Modeled by a "bypass transition" in the coverage FSM
    • Does not need to be covered
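A pure execution in the sense above might look like this in Java (our own minimal example): when the resource is busy, the block returns having changed nothing globally, so the coverage FSM can treat that path as a bypass transition.

```java
// Minimal example of a "pure" block: the failed-acquire path reads global
// state under the lock but leaves it unchanged -- observably a no-op.
public class Resource {
    private boolean free = true;

    synchronized boolean tryAcquire() {
        if (!free) {
            return false;   // pure execution: no global state modified
        }
        free = false;       // impure path: global state changes
        return true;
    }

    synchronized void release() { free = true; }
}
```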

Page 25: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Discussion

• The metric is NOT for deciding when to stop testing/verification

• Intended use:
  – Testing, runtime verification is applied to program
  – List of non-covered coverage targets provided to programmer

• Intuition: Given an unexercised scenario, the programmer must have a simple reason to believe that
  – the scenario is not possible, or
  – the scenario is safe

• Given an uncovered coverage target, the programmer
  – either provides hints to the coverage tool to rule the target out
  – or assumes that the coverage target is a possibility, and
    • writes a test to trigger it
    • or makes sure that no concurrency error would result if the coverage target were to be exercised

Page 26: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Future Work: Approximating the Reachable LP Set

• # of locations per method in Boxwood: ~10, after factoring out atomic and pure blocks

• LP reachability undecidable
  – Metric only intended as an aid to the programmer
    • What have I tested?
    • What should I try to test?
    • Make sure an LP does not lead to an error if it looks like it can be exercised.

• Future work: Better approximate the reachable LP set
  – Do conservative reachability analysis of the coverage FSM using predicate abstraction.
  – Programmer can add predicates for better FSM reduction

Page 27: A Novel Test Coverage Metric for Concurrently-Accessed Software Components


Page 28: A Novel Test Coverage Metric for Concurrently-Accessed Software Components


Multiset

• Multiset data structure
  M = { 2, 3, 3, 3, 9, 8, 8, 5 }

• Has highly concurrent implementations of
  – Insert
  – Delete
  – InsertPair
  – LookUp

Implementation: LookUp

LookUp(x)
  for i = 1 to n
    acquire(A[i])
    if (A[i].content == x && A[i].valid)
      release(A[i])
      return true
    else
      release(A[i])
  return false

[Figure: array A of slots, each with a content field and a valid bit; the contents include 2, 3, 3, null, 3, 5, 8, 6, 8, 9.]
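The LookUp pseudocode above translates to Java roughly as follows (a sketch with our own field names, not the paper's implementation): one lock per slot, so at most a single A[i] is held while scanning.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the fine-grained Multiset LookUp: one lock per slot.
public class Multiset {
    final int n;
    final int[] content;
    final boolean[] valid;
    final ReentrantLock[] locks;

    Multiset(int n) {
        this.n = n;
        content = new int[n];
        valid = new boolean[n];
        locks = new ReentrantLock[n];
        for (int i = 0; i < n; i++) locks[i] = new ReentrantLock();
    }

    boolean lookUp(int x) {
        for (int i = 0; i < n; i++) {
            locks[i].lock();                         // acquire(A[i])
            try {
                if (content[i] == x && valid[i]) return true;
            } finally {
                locks[i].unlock();                   // release(A[i])
            }
        }
        return false;
    }
}
```

Because every variable access is lock-protected, this code has no low-level data race, yet whole operations are not atomic: exactly the "high-level" races the LP metric targets.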

Page 29: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Multiset: Testing

[Figure: the interleaved execution. Call Insert(3), A[0].elt=3, Unlock A[0], Return "success"; Call LookUp(3), read A[0], Return "true"; Call Insert(4), A[1].elt=4, Unlock A[1], Return "success"; Call Delete(3), A[0].elt=null, Unlock A[0], Return "success".]

• Don't know which happened first
  – Insert(3) or Delete(3)?

• Should 3 be in the multiset at the end?
  – Must accept both possibilities as correct

• Common practice:
  – Run long multi-threaded test
  – Perform sanity checks on final state

Page 30: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

Multiset: I/O Refinement

[Figure: the implementation trace annotated with a commit point for each method: Commit Insert(3) at A[0].elt=3, Commit LookUp(3) at read A[0], Commit Insert(4) at A[1].elt=4, Commit Delete(3) at A[0].elt=null. The witness ordering of commit points induces the atomic Spec trace:

  M = Ø
  Call Insert(3),  M = M ∪ {3},  Return "success"  →  M = {3}
  Call LookUp(3),  Check 3 ∈ M,  Return "true"     →  M = {3}
  Call Insert(4),  M = M ∪ {4},  Return "success"  →  M = {3, 4}
  Call Delete(3),  M = M \ {3},  Return "success"  →  M = {4}  ]

Page 31: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

View-refinement

• State correspondence
  – Hypothetical "view" variables must match at commit points

• "view" variable:
  – Value of variable is abstract data structure state
  – Updated atomically once by each method

• For A[1..n]
  – Extract content if valid = true

View Variables

[Figure: array A with content and valid fields; extracting the content of entries whose valid bit is set gives viewImpl = {3, 3, 5, 5, 8, 8, 9}.]
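Computing the Multiset's view variable amounts to the following sketch (our own helper name; the view, a multiset of values, is represented as a sorted list):

```java
import java.util.*;

// Sketch: the Multiset's view variable is the multiset of content values of
// entries whose valid bit is set, represented here as a sorted list.
public class MultisetView {
    static List<Integer> view(int[] content, boolean[] valid) {
        List<Integer> v = new ArrayList<>();
        for (int i = 0; i < content.length; i++)
            if (valid[i]) v.add(content[i]);
        Collections.sort(v);
        return v;
    }
}
```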

Page 32: A Novel Test Coverage Metric for Concurrently-Accessed Software Components

View-refinement

[Figure: the implementation trace with viewImpl updated atomically at each commit point: viewImpl = {3} at Commit Insert(3) (A[0].elt=3), viewImpl = {3, 4} at Commit Insert(4) (A[1].elt=4), viewImpl = {4} at Commit Delete(3) (A[0].elt=null). The witness ordering induces the atomic Spec trace:

  M = Ø
  Call Insert(3),  M = M ∪ {3},  Return "success",  viewSpec = {3}
  Call LookUp(3),  Check 3 ∈ M,  Return "true",     viewSpec = {3}
  Call Insert(4),  M = M ∪ {4},  Return "success",  viewSpec = {3, 4}
  Call Delete(3),  M = M \ {3},  Return "success",  viewSpec = {4}

At each commit point, viewImpl must equal the corresponding viewSpec.]