canary: practical static detection of inter-thread value

15
Canary: Practical Static Detection of Inter-thread Value-Flow Bugs Yuandao Cai The Hong Kong University of Science and Technology Hong Kong, China [email protected] Peisen Yao The Hong Kong University of Science and Technology Hong Kong, China [email protected] Charles Zhang The Hong Kong University of Science and Technology Hong Kong, China [email protected] Abstract Concurrent programs are still prone to bugs arising from the subtle interleavings of threads. Traditional static analysis for concurrent programs, such as data-ow analysis and sym- bolic execution, has to explicitly explore redundant control states, leading to prohibitive computational complexity. This paper presents a value ow analysis framework for concurrent programs called C that is practical to stati- cally nd diversied inter-thread value-ow bugs. Our work is the rst to convert the concurrency bug detection to a source-sink reachability problem, eectively reducing redun- dant thread interleavings. Specically, we propose a scal- able thread-modular algorithm to capture data and interfer- ence dependence in a value-ow graph. The relevant edges of value ows are annotated with execution constraints as guards to describe the conditions of value ows. C then traverses the graph to detect concurrency defects via tracking the source-sink properties and solving the aggre- gated guards of value ows with an SMT solver to decide the realizability of interleaving executions. Experiments show that C is precise, scalable and practical, detecting over eighteen previously unknown concurrency bugs in large, widely-used software systems with low false positives. CCS Concepts: Software and its engineering ! Soft- ware verication and validation. Keywords: Concurrency, static analysis, interference depen- dence, concurrency bugs detection ACM Reference Format: Yuandao Cai, Peisen Yao, and Charles Zhang. 2021. Canary: Practical Static Detection of Inter-thread Value-Flow Bugs. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation on the rst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specic permission and/or a fee. Request permissions from [email protected]. PLDI ’21, June 20–25, 2021, Virtual, Canada © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8391-2/21/06. . . $15.00 hps://doi.org/10.1145/3453483.3454099 Language Design and Implementation (PLDI ’21), June 20–25, 2021, Virtual, Canada. ACM, New York, NY, USA, 15 pages. hps://doi. org/10.1145/3453483.3454099 1 Introduction Hunting concurrency bugs is notoriously dicult with the explosive growth of code size and complexity in modern software. Dynamically detecting such errors is hard because it depends on intricate sequences of low-probability concur- rent events [7, 19, 21, 28, 56]. The number of thread inter- leavings and feasible program paths grows exponentially with the number of threads and the length of program exe- cution, making dynamic analysis dicult to exercise even a tiny fraction of all possible execution, leaving large systems with an abundance of errors [64]. Comparatively, static anal- ysis achieves good coverage, discovering errors in obscure code paths dicult for the runtime execution to reach. Un- fortunately, static analysis suers considerably from being imprecise as a result of abstracting and approximating the runtime behavior of the program. The precise static analysis of concurrent programs is par- ticularly challenging, as it needs to account for the thread interleavings together with the complicated sequential rea- soning. Techniques of either the conventional data-ow an- laysis [12, 17, 18, 35, 54] or symbolic execution [4, 23, 57] propagate the data-ow facts along the control-ow paths, which, unfortunately, are often redundant and irrelevant to the data-ow facts of interest when interleaving comes into play. With this inherent limitation of the state space explosion, these techniques often severely suer from per- formance issues when aiming for high precision. In practice, practitioners have little tolerance to this limitation, because compromising either precision or scalability forms signi- cant hurdles of adoption. To reduce the exponential space, one promising direc- tion is to identify the interference dependence [27, 46, 47], which is the additional data dependence in concurrent pro- grams that takes place at shared memory locations between threads. Capturing the def-use relations between statements across dierent threads enables the eective pruning of the redundant interleaving. However, identifying the interfer- ence dependence itself is extremely challenging as witnessed

Upload: others

Post on 08-Jun-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection ofInter-thread Value-Flow Bugs

Yuandao CaiThe Hong Kong University of Science

and TechnologyHong Kong, [email protected]

Peisen YaoThe Hong Kong University of Science

and TechnologyHong Kong, [email protected]

Charles ZhangThe Hong Kong University of Science

and TechnologyHong Kong, [email protected]

AbstractConcurrent programs are still prone to bugs arising from thesubtle interleavings of threads. Traditional static analysis forconcurrent programs, such as data-�ow analysis and sym-bolic execution, has to explicitly explore redundant controlstates, leading to prohibitive computational complexity.This paper presents a value �ow analysis framework for

concurrent programs called C����� that is practical to stati-cally �nd diversi�ed inter-thread value-�ow bugs. Our workis the �rst to convert the concurrency bug detection to asource-sink reachability problem, e�ectively reducing redun-dant thread interleavings. Speci�cally, we propose a scal-able thread-modular algorithm to capture data and interfer-ence dependence in a value-�ow graph. The relevant edgesof value �ows are annotated with execution constraints asguards to describe the conditions of value �ows. C�����then traverses the graph to detect concurrency defects viatracking the source-sink properties and solving the aggre-gated guards of value �ows with an SMT solver to decide therealizability of interleaving executions. Experiments showthat C����� is precise, scalable and practical, detecting overeighteen previously unknown concurrency bugs in large,widely-used software systems with low false positives.

CCS Concepts: • Software and its engineering! Soft-ware veri�cation and validation.

Keywords: Concurrency, static analysis, interference depen-dence, concurrency bugs detection

ACM Reference Format:Yuandao Cai, Peisen Yao, and Charles Zhang. 2021. Canary: PracticalStatic Detection of Inter-thread Value-Flow Bugs. In Proceedings ofthe 42nd ACM SIGPLAN International Conference on Programming

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies are notmade or distributed for pro�t or commercial advantage and that copies bearthis notice and the full citation on the �rst page. Copyrights for componentsof this work owned by others than ACMmust be honored. Abstracting withcredit is permitted. To copy otherwise, or republish, to post on servers or toredistribute to lists, requires prior speci�c permission and/or a fee. Requestpermissions from [email protected] ’21, June 20–25, 2021, Virtual, Canada© 2021 Association for Computing Machinery.ACM ISBN 978-1-4503-8391-2/21/06. . . $15.00h�ps://doi.org/10.1145/3453483.3454099

Language Design and Implementation (PLDI ’21), June 20–25, 2021,Virtual, Canada. ACM, New York, NY, USA, 15 pages. h�ps://doi.org/10.1145/3453483.3454099

1 IntroductionHunting concurrency bugs is notoriously di�cult with theexplosive growth of code size and complexity in modernsoftware. Dynamically detecting such errors is hard becauseit depends on intricate sequences of low-probability concur-rent events [7, 19, 21, 28, 56]. The number of thread inter-leavings and feasible program paths grows exponentiallywith the number of threads and the length of program exe-cution, making dynamic analysis di�cult to exercise even atiny fraction of all possible execution, leaving large systemswith an abundance of errors [64]. Comparatively, static anal-ysis achieves good coverage, discovering errors in obscurecode paths di�cult for the runtime execution to reach. Un-fortunately, static analysis su�ers considerably from beingimprecise as a result of abstracting and approximating theruntime behavior of the program.

The precise static analysis of concurrent programs is par-ticularly challenging, as it needs to account for the threadinterleavings together with the complicated sequential rea-soning. Techniques of either the conventional data-�ow an-laysis [12, 17, 18, 35, 54] or symbolic execution [4, 23, 57]propagate the data-�ow facts along the control-�ow paths,which, unfortunately, are often redundant and irrelevantto the data-�ow facts of interest when interleaving comesinto play. With this inherent limitation of the state spaceexplosion, these techniques often severely su�er from per-formance issues when aiming for high precision. In practice,practitioners have little tolerance to this limitation, becausecompromising either precision or scalability forms signi�-cant hurdles of adoption.To reduce the exponential space, one promising direc-

tion is to identify the interference dependence [27, 46, 47],which is the additional data dependence in concurrent pro-grams that takes place at shared memory locations betweenthreads. Capturing the def-use relations between statementsacross di�erent threads enables the e�ective pruning of theredundant interleaving. However, identifying the interfer-ence dependence itself is extremely challenging as witnessed

Page 2: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

by a long stream of e�ort [46, 47, 51, 52, 54, 60]. The key di�-culty is the precise pointer aliasing in the presence of threadinterference, because exhaustively and precisely computingpointer information has not yet been proved scalable [54, 59–61]. On the other hand, it is also unnecessary to explorethe interleaving space of a given program irrelevant to thespeci�c properties being checked.To address these problems, we present C�����, a novel

concurrency analysis technique that tracks how values arestored and loaded via both the data and interference depen-dence relationships, thus able to check a diversi�ed cate-gory of multi-threaded software safety properties. Our keyinsight to mitigate the state space explosion is to reduceconcurrency bug detection to the on-demand tracking ofthe source-sink value �ows. We extend the conventionalnotion of source-sink properties to include values that �owacross threads through loads and stores and refer to the bugsdetected this way as the inter-thread value-�ow bugs, suchas the inter-thread use-after-free [2, 7, 28], NULL pointerdereference [19], double-free [7] and information leak [21].Our checking of value �ows only follows the def-use rela-tionships between the relevant statements across threadsand, hence, avoids explicitly enumerating all the possiblethread interleavings.As illustrated in Fig. 1, C����� has three key phases.

It �rst performs a thread-modular algorithm to separatelycapture both the data dependence and the interference de-pendence in a value-�ow graph (VFG). Unlike the previousinterference dependence analysis [60, 61], the proposed al-gorithm captures the interference dependence without per-forming the exhaustive pointer analysis as a prior. Instead,it piggybacks the pointer analysis with the resolution ofthe interference dependence. To represent the condition un-der which the value �ows qualify for a feasible execution,the analyses encode and annotate all necessary executionconstraints [29, 31] as guards on relevant edges of the VFG,following the de�nition of sequential consistency [36]. Theedges of the interference dependence act as “tunnels” to al-low the values of interests to �ow in and out of the threadscope during the on-demand value �ow searching. At thebug checking stage, C����� checks the source-sink prop-erties using the interference-aware VFG by extracting thesource-sink paths and proceeds to verify their realizability(i.e., whether violate a feasible execution) by feeding the col-lected constraints on the aggregated guards to a dedicatedSMT solver.

The noteworthy features of C����� are as follows. First,it focuses on both the graph exploration and the SMT solvinge�ort around the source-sink value �ows that only matter tothe property to be checked. Second, it integrates and solvesthe guard information that is just su�cient to characterize aninterleaving execution, judiciously delaying the disjunctivereasoning of the realizability of the vulnerable paths until

Data Dependence Analysis

VFG

Concurrent Programs

Bug Reports

An SMT Solver

Interference Dep-endence Analysis

Source-sink Checking

VFG

Figure 1. The work�ow of C�����.

the SMT solving stage. Third, the use of value �ows gener-ates concise bug reports with a limited number of relevantstatements and conditions that cause the concurrency errors,greatly helping to diagnose the root causes of the bugs.We have used C����� to check critical concurrency

memory-safety properties, such as the inter-thread use-after-free and NULL pointer dereference, on a large set of popularopen-source multi-threaded C/C++ software systems. Inter-estingly, although these software systems have been wellexamined by a crowd of both free and commercial tools, weare still able to discover over eighteen previously-unknownconcurrency bugs con�rmed by developers with fourteen ofthem �xed. Our evaluation shows thatC����� has good scal-ability as it can build a highly precise value-�ow graph up to>70⇥ and >500⇥ faster with less memory space, comparedto the state of the arts, namely S���� [61] and F��� [60].Moreover, it is able to complete the path-sensitive checkingof M�SQL (around 3 MLoC code) within approximately 2.5hours, the best in terms of scalability and precision, to thebest of our knowledge.

To sum up, the contributions of this work are as follows.• An interference-aware value-�ow analysis techniquefor modelling the checking diversi�ed concurrencybugs as source-sink reachability problems.

• A thread-modular algorithm to identifying data de-pendence and interference dependence for concurrentprograms.

• An implementation and an experiment that demon-strate C�����’s good scalability as well as precisionrelative to previous techniques.

2 C����� in a NutshellWe use a bug-free program in Fig. 2 to illustrate the conceptof the thread interference on the shared memory locations,to clarify the shortcomings of the previous work, and thento highlight the essence of our approach.The code snippet in Fig. 2(a) is bug-free, but an inter-

thread use-after-free bug can be erroneously reported byprevious techniques. In this program, the threadmain creates

Page 3: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

1. void main(int *a){2. int **x=malloc(); //!!3. *x=a;4. fork(t, thread1, x);5. if("!){6. int * c=*x;7. print(*c); 8. }9. }

10. void thread1(int **y){11. int *b=malloc(); //!"12. if(¬"!){13. *y=b;14. free(b);15. }16. }

(c) A source-sink path(a) Code snippet (b) A value-flow graph

o1

true

true

x@ℓ2

y@ℓ10

alloc_a

"!

true

"!

a@ℓ3

c@ℓ6

print(*c)

o2

¬"!

b@ℓ13

free(b)

¬"!

Partial Order Constraints

(O7>O6 ∧ O14> O13 ∧ O13> O3 ∧ O14> O3)

free(b) b c print(*c)

(O6>O13>O3) ∧ ("! ∧ ¬"!)Guard:

Figure 2. Figure (a) is a bug-free code snippet. Figure (b) is its value-�ow graph with some nodes and conditions omitted.(0;;>2_0 denoted the memory object pointed to by 0) Figure (c) is an extracted source-sink path.

a thread C that runs the function thread1. If ignoring thepath conditions, the “bug” is triggered due to the threadinterference: when the “freed” pointer 1 is stored to thememory object >1 via a pointer ~ at Line 13 in the functionthread1, propagated to the load statement at Line 6, anddereferenced eventually at Line 7. However, since the twopath conditions at Line 13 and Line 6 contradict each other,this “bug” never surfaces to bite. Previous techniques forconcurrent programs that do not understand path conditionscan report this as a false positive.

Unfortunately, the exponential number of states inducedby thread interleavings pose unique challenges to remain-ing path-sensitivity. In the value-�ow analysis frameworkof C�����, the state space of all possible interleavings isreduced via capturing the interference dependence, formally,de�ned below.

De�nition 1. A statement B1 is interference dependent on astatement B2 if a de�nition of G at B1 is used at B2, and B1 andB2 concurrently execute.

To avoid the false positive reported in Fig. 2(a), C������rst performs a thread-modular algorithm to construct thevalue-�ow graph for the program shown in Fig. 2(b), inwhich, the algorithm begins with an intra-thread analysisto resolve the data dependence relations, followed by aninter-thread interference dependence analysis. The mainidea to identify the interference dependence is to discoverthe pointers that point to the shared memory locations. Let ✓1represent the statement at Line 1. For instance, after resolv-ing the intra-thread data dependence relations (solid lines),the VFG encodes that variables G and ~ may point to theshared memory object >1, since both are reachable from >1.This enables the discovery of the interference dependenceedge (dashed line), because 1@✓13 may be stored to >1 and2@✓6 may be loaded from >1 afterwards. Next, we encodethe execution constraints on the guard of the edge that rep-resent the sequential consistency axioms for validating therealizability of value �ows (explained in § 3). The details ofthe algorithm are explained in § 4.

After constructing the VFG, to �nd this “bug”, the analysesonly need to traverse the graph starting from the node free(1)as a source to �nd the use site print(⇤2), a sink, along thedef-use relations. Fig. 2(c) shows the extracted source-sinkpath and its simpli�ed constraints.Take the edge from 1@✓13 to 2@✓6 as an example. The

guard of this edge is the conjunction of (1) the conditionsunder which the value �ows through the same memory ob-ject >1, and (2) the conditions enforcing that the value loadedfrom a load statement (✓6) is the value stored by the mostrecent store statement (✓13). The �rst conditions \1 ^ ¬\1include conjuncting the conditions for G and ~ pointing toan object >1 (CAD4), and the branch conditions of statements✓6 and ✓13 (\1 ^ ¬\1). Let $2 > $1 be the strict partial orderbetween statements ✓1 and ✓2 that indicates “✓2 is executedafter ✓1”. The second conditions$13 < $6 indicate ✓13 shouldbe executed before the ✓6. We also include load-store order$3 < $13 to ensure that another store statement at ✓3 can nothappen between lines ✓13 and ✓6, making it possible to �owfrom 1@✓13 to 2@✓6 without being overwritten by 0@✓3.Because the interference dependence relations are not

transitive [46, 47], this value-�ow path may violate the pro-gram order during searching. Our analysis lazily encodes thepartial order constraints over the path at the bug checkingstage, taking the correct program order into account. Forexample, we have intra-thread order $7 > $6 ^$14 > $13,following the order of control �ows. We also have inter-thread order $13 > $3 ^ $14 > $3, following fork/joinsynchronization semantics, since the thread C is forked at✓3 and all statements in the thread C should happen after✓3. Finally, all constraints are handed to an SMT solver, andC����� does not report the bug when the aggregated exe-cution constraints are unsatis�able. Note that a source-sinkpath with relevant statements is very useful for debugginga concurrency error. Also, the analyses skip the constraintsirrelevant to the source-sink properties, saving a signi�cantcomputational overhead.

The example above illustrates howC�����works in a nut-shell. Seemingly redundant, the order constraints, as a part

Page 4: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

of execution constraints, are important to achieve precisionby validating whether a value-�ow path is realizable at theruntime. In § 3.2, we give details of all necessary executionconstraints, following the sequential consistency axioms.

3 PreliminariesIn this section, we present the basic terminologies and no-tations, and formalize the key problem when consideringinterference dependence.

3.1 Bounded Concurrent ProgramsWe �rst make a standard assumption about the analyzedconcurrent programs. We assume that the shared memorysystem is sequentially consistent [36], where memory op-erations are executed in the order in which they appear inthe program, and every memory operation is atomic. Se-quential consistency is one of the strongest memory modelsdiscussed in the literature. This assumption matches the nat-ural expectation of programmers that a program behaves asan interleaving of the memory accesses from its constituentthreads.

Language. We formalize our approach using a simplecall-by-value language in Fig. 3, similar to the previouswork [38, 39, 55]. Note that it is known to be undecidable toanalyze arbitrary concurrent programs over in�nite threadsand data types. Our analyses gain decidability by structurallybounding the concurrent programs by unrolling both loopsand recursive functions to a �nite depth, which not onlyindirectly �xes the number of threads but also �xes the sizeof the heap that the bounded program may access. None ofthese aspects is handled by most previous static concurrentanalysis techniques [3, 12, 57, 60, 64].

Abstract Domains. The symbols and abstract domainsare listed in Fig. 4. A thread id C 2 T represents a thread thatcorresponds to a context-sensitive fork site and comprisesa �nite number of functions �+. A label ✓ 2 L indicates theprogram position of a statement in the control �ow graph(CFG). We follow the LLVM convention of separating vari-ables into two disjoint sets of top-level E 2 V and address-taken variables > 2 O as in the previous work [24, 37–39, 60].A top-level variable E can point to di�erent memory objects> on di�erent guard conditions i . The address-taken vari-ables O are indirectly accessed via load and store statements,which take the top-level pointer variables, V , as arguments.

Note that, the four-types of pointer operations in Fig. 3 arein the partial SSA form, and only the address-taken variables,O, can be shared among threads and accessed via load orstore statements. The nested pointer dereferences are elimi-nated by introducing the auxiliary variables [37], ensuringeach load and store statement is counted as at most oneshared access.

Value Flows. We say the value of a variable, 0, �ows toa variable, 1, if 0 is assigned to b directly (via assignments,

Program % := �+

Function � := func(E1, . . . , E=){(⇤; }Statement ( :=

| E1 = E2 | E1 = &E2| E1 = ⇤E2 | ⇤ E1 = E2

| E1 = E2 binop E3 | E1 = unop E2| if (E) then (1 else (2 | return (G0, . . . , G=)

| (G0, . . . , G=) = call 5 (E1, . . . , E=) | (1; (2| fork(C, 5 ) | join(C)

binop := + | � | ^ | _ | > | = | < | · · ·

unop := � | ¬ | · · ·

Figure 3. The syntax of the language.

Threads C 2 T Labels ✓ 2 L

Objects > 2 O Variables E 2 V

Figure 4. The abstract domains.

(a) (b)

Thread t1 Thread t2

3. b=*p

1. a=*p

2. *p=a

4. *p=b

Thread t1 Thread t2

3. *q=b

2. *p=a

4. c=*q

1. q = p

Figure 5. Two examples of concrete interleaving executions.

such as 1 = 0) or indirectly (via pointer dereferences, suchas ⇤~ = 0;G = ~;1 = ⇤G). The relations between 0 and1 are either data dependence or interference dependence,depending on if the value �ow happens across threads. Avalue-�ow graph can be de�ned as a directed graph. Eachnode = is indicated by E@✓ , meaning that the variable Eis de�ned or used at the program location ✓ . An edge thatindicates the value-�ow relation can be represented as atuple, (E1@✓1, E2@✓2). The guard �6D0A3 annotated on theedge indicates the condition under which the value �owhappens. A path c = hE1@✓1, . . . , E=@✓=i on the value-�owgraph is called the value-�ow path.Direct �ows can be easy to identify based on the partial

SSA form of statements. Therefore, we focus on indirect�ows from the store statements to load statements via eitherthe data dependence or the interference dependence, whichrequires the reasoning about the pointer information.

Page 5: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

3.2 The Realizable Value-�ow Path ProblemWhen considering the interference dependence, to obtaina precise value-�ow path, it is necessary to ensure that thepath can correspond to a feasible interleaving execution. Wecall this the realizable value-�ow path problem. Ideally, onewould like to compute a value-�ow path that is the mostprecise. But this is, generally, not computable [50]. We givea practical and precise solution to characterize this problem,which guides our subsequent approaches.

Formally, we use the de�nition of sequential consistency asthe axiomatic foundations to characterize a value-�ow path.Let <% be the program order.

De�nition 2. A value-�ow path c = hE1@✓1, . . . , E=@✓=i issaid to be sequentially consistent if there exists a total order <c

over the program order <% subject to the following conditions:

1. Each E8�1@✓8�1 �ows to E8@✓8 through the samememorylocation without being overwritten, where 8 2 [2,=].

2. For any ✓8 and ✓9 , if ✓8 <% ✓9 , then ✓8 <c ✓9 , i.e., <%)<c .

First of all, the imprecise dependence relations can leadto an irrealizable value-�ow path c , i.e., the stored variableE8�1@✓8�1 may not �ow to the loaded variable E8@✓8 indi-rectly as outlined in Defn. 2(1). Compared to the conven-tional value-�ow techniques for sequential programs, con-sidering the interference dependence introduces additionalchallenges. The analysis must cut through the tangle of theadditional aliasing information to reason about interferencedependence associated with the non-deterministic concur-rent environment. For example, in Fig. 5(a), the solid (dashed)lines represent possible data (interference) dependence, andthe order of statements represents a concrete execution or-der. A variable 0@✓2 can �ow to 2@✓4 only with the threadschedule, in which the statement ✓4 happens after ✓2 andthe statement ✓3 happens before ✓2. As a result, the analysisshould take the order relations between the loads and thestores into account in reasoning about alias.

Second, even if a value-�ow path c is identi�ed with pre-cise dependence relations, it may not be realizable if violating<%)<c according to Defn. 2(2). The <c represents the totalorder between each pair of successive nodes in c , while <%

represents the program order, including both the control �owand the synchronization semantics. In Fig. 5(b), a value-�owpath h0@✓2,1@✓3,1@✓4,0@✓1i is not realizable, because thevalue-�ow path leads to an invalid control-�ow path from✓2 to ✓1 where <%;<c . The innate reason is the intransitiveproperty of interference dependence [46, 47]: if a node =8 isinterference dependent on a node = 9 , and = 9 is interferencedependent on node =: , it does not directly imply an inter-ference dependence from =8 to =: . Therefore, the arbitrarycompositions of interference dependence relationships re-sult in irrealizable value-�ow paths. As a result, the analysisshould take the program order into account while searchingfor value-�ow paths.

In § 4, we present an algorithm to construct the value-�owgraph with the resolved indirect �ows from the store state-ments to the load statements. For the value �ow from@@✓1 to?@✓2 between statements ✓1 : ⇤G = @ and ✓2 : ? = ⇤~, the keyis to decide if G is an alias of~. The main idea is to capture theescaped objects �rst and then discover all pointer variablesthat can point to these objects. The analysis encodes thenecessary conditions of the sequentially consistency axiomsas guards �guard, including (1) (Memory Alias) the conditionunder which the variables G and ~ may point to the samememory object, and (2) (Load-store Order) the condition un-der which @@✓1 can �ow to ?@✓2 through the same memoryobject without being overwritten by other concurrent storestatements, following Defn. 2(1). At the bug checking stagedescribed in § 5, the partial order constraints �?> over thepath are generated lazily, taking the program order <% intoaccount that follows Defn. 2(2). In the end, the aggregatedconstraints enforcing the sequential consistency axioms, arehanded to an SMT solver to validate if the value-�ow pathenforces a feasible execution.

4 Thread-modular Dependence AnalysisThis section covers the details of the thread-modular algo-rithm, which resolves the data dependence and the inter-ference dependence relations in the VFG by the alias analy-sis. Our algorithm consists of both the intra-thread and theinter-thread data dependence analysis phases, computingthe conditions of dependence on edges as the guards.

4.1 Data Dependence AnalysisAt the stage of the intra-thread reasoning, the goal is to re-solve the intra-thread data dependence edges and computetheir guards �guard, following Defn. 2(1). Fig. 6 shows the ba-sic rules for resolving the data dependence. The core problemis to calculate indirect �ows from @@ ⇤ G = @ to ?@? = ⇤~,which needs the points-to facts of the address-taken vari-ables O. The �guard of the data dependence at this stage isthe conjunction of the conditions under which G and ~ pointto the same memory object > , together with the branch con-ditions of the statements ✓1 and ✓2. The VFG generated atthis stage serves as an input to bootstrap the interferencedependence analysis.

More speci�cally, to decompose the expensive cost of theexhaustive points-to analysis, we follow the similar ideas asour previous work [55] by performing the intra-proceduralpath-sensitive points-to analysis to resolve the local data de-pendence relationships. The points-to analysis is performedalong with a thread call graph in a bottom-up fashion, duringwhich the analyses calculate the pointer information of theaddress-taken variables, O, and resolve the local data depen-dences in the VFG. A thread call graph is an extension ofthe call graph for sequential programs by collecting the callgraphs of a group of threads at their fork sites. We next give

Page 6: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

Instruction Code Value Flow Guard �60DA3

✓,i : ? = 0;;>2_> 0;;>2_> ! ? i✓,i : ? = @ @ ! ? i✓1,i1 : ⇤G = @;✓2,i2 : ? = ⇤~

9> | (>,U) 2 %CB (G) ^ (>, V)2 %CB (~) : @ ! ?

i1 ^ i2 ^ U ^ V

Figure 6. Basic rules for data dependence.

the details of the intra-procedural points-to computation andthe method to handle side-e�ects incurred by function calls.

Intra-procedural Analysis. The intra-procedural anal-ysis is a fairly standard data-�ow analysis [24] that exploresthe intra-procedural control �ow graph in the reverse post-order while computing the points-to facts, Pts, at each state-ment and propagating these facts to the successors. We �rstperform a standard transformation (Line 3) to the function �by introducing auxiliary variables for the objects passed intothe function by references, which is applied to explicitly ex-pose the side-e�ects on the function’s parameters [38, 39, 55].Because the variables V are in SSA form, the analyses di-rectly use a global points-to graph %⌧top to hold the pointerinformation for the top-level variables,V , and use the sets�#✓ and$*)✓ to hold the propagated pointer information forthe address-taken variables, O, at the statement ✓ [24]. Dur-ing the calculation of the four-typed pointer operations (Line11-20), the points-to conditions are consecutively recordedwithout being solved aggressively. After disambiguating thepoints-to information of the address-taken variables, O, theindirect �ows can be discovered based on the rules in Fig. 6.

Procedural Transfer Function. The procedural transferfunction for a function, Trans(� ), summarizes its points-toside-e�ects (i.e., changes) between its formal-in parametersand formal-out parameters (Lines 21-22), which expresses thefunction’s behavior in terms of its input/output interfaces,abstracting away from its internal details. When analyzing acall site, the analysis reasons about the callee’s side-e�ectson actual parameters by applying its procedural transferfunction, which bears some similarities with the work [16,68]. However, there is no need for an equivalent reasoningfor the parameters passed to the fork sites (Lines 23-24), sincethe interference dependence incurred by the parameters isgoing to be identi�ed afterwards.

Example 4.1. In Fig. 2(b), the solid lines represent the datadependence identi�ed at this stage. The VFG at this stagecontains the important pointer information for the subse-quent interference dependence analysis. For instance, wecan identify the escaped objects, i.e., the shared memorylocations, by �nding the memory objects in the VFG thatare reachable from nodes in di�erent threads, and collect thepointer variables pointing to them. In the example, the anal-yses can �nd an escaped object >1 and two pointer variablesG and ~ that point to >1 by traversing the VFG.

Algorithm 1: Data Dependence AnalysisInput: A concurrent program ⇠%Output: +�⌧ with resolved intra-thread data dependence

1 ⇠⌧ ú construct the thread call graph;2 foreach function � in reverse topological order of CG do3 Transform � : func(E1, . . . , v<) ) (E1, . . . , v=)(⇤;4 ⇠�⌧ ú Build intra-procedural control �ow graph of � ;5 foreach instruction � in reverse post-order of CFG do6 HandleEachInst(� );

7 Record side-e�ects on E8 and construct Trans for � ;8 Resolve local data dependence relations in VFG;

9 return +�⌧ ;10 Procedure HandleEachInst(�):11 if ✓,i : ? = 0;;>2_> : then12 %⌧top ú {? ⇢ (i,0;;>2_>)} ; // ⇢: point to

13 if ✓,i : ? = @ : then14 PGtop ú {? ⇢ (W ^ i,>)},8(W,>) 2 %CB (@);

15 if ✓,i : ⇤G = @ then16 if Pts(G) is a singleton then17 IN✓ ú (IN✓\Pts(G));

18 $*)✓ ú IN✓ [ {Pts(G)⇢ (W ^ i,>)},8(W,>) 2 Pts(@);Propagate OUT✓ to the successor(s);

19 if ✓,i : ? = ⇤~ then20 Let Padr (>) be the points-to set of address-taken

variables > in IN✓ , where 8> 2 Pts(~);%⌧top ú {? ⇢ (W ^ i,> 0)},8(W,> 0) 2 Padr (>);

21 if ✓,i : (G0, . . . , G=) = call 5 (E1, . . . , E=) then22 PGtop ú {G8 ⇢ (i, Trans(� , Pts(E8 )))},88 2 [1,=];

23 if ✓,i : fork(C, 5 ) then24 Continue;

4.2 Interference Dependence AnalysisAt the stage of inter-thread reasoning, the goal is to iden-tify all possible interference dependence as edges in theVFG, and compute the guards qualifying for the edges. Sim-ilar to the rules in Fig. 6, the core problem is to determinethe pointer aliasing relationship of variables G and ~ for anedge of the interference dependence, (@@✓1, ?@✓2) between⇤G = @ and ? = ⇤~ across di�erent threads. Di�erent fromdata dependence, interference dependence is associated withthe concurrent environment. Because the store at ✓1 shouldhappen before the load at ✓2, we also need to enforce theorder relation between the store and the load. To sum up,the guard �guard for (@@✓1, ?@✓2) is de�ned as below:

�guard (@@✓1, ?@✓2) = �0;80B ^ �;B , (1)where �0;80B indicates the conditions of the interference

dependence through the same memory object, and �;B en-sures the order relations between stores and loads in a con-current environment (as in Defn. 2(1)). We give more detailsin § 4.2.2.

Page 7: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

4.2.1 Dependence Edges Computation. The high-levelidea underlying our approach is to identify the variablesthat point to the shared memory objects, through which thethread interference happens. Below is an extended propertyof indirect value �ows via interference dependence.

Property 1. A variable @ at one thread can �ow to a variable? at another thread indirectly via interference dependence, onlyif @ is stored to an escaped memory > �rst and ? is loaded fromthe same memory object > afterwards.

This property indicates that, a store statement ⇤G = @ and aload statement ? = ⇤~ can share an interference dependencerelation if and only if these statements reside in di�erentthreads, and G and~ point to the same escaped object > . Thus,to identify the interference dependence, it su�ces to checkload statements and store statements in di�erent threadsthat can access the same escaped objects. We develop ane�cient algorithm that discovers the set of thread escapedobjects and calculates the set of pointer variables pointingto the objects. Let EspObj be the set of escaped objects inthe program, which are the objects in the VFG that can bereachable from the nodes in multiple threads. The pointervariables that point to an object > are called the pointed-to-by set of the object, denoted by Pted(>). Given an escapedobject > 2 EspObj, we can compute Pted(>) by collectingnodes that are reachable from > in the VFG generated byAlg. 1.

Example 4.2. In Fig. 2(a), we �rst capture the escaped ob-ject >1 via the escape analysis. We can then compute thepointed-to-by sets %C43 (>1) and %C43 (>2) by traversing theVFG, which are {G,~} and {1} respectively. This informationenables the adding of an edge of interference dependence,(1@✓13, 2@✓6), in the VFG by reasoning G is an alias of ~.

A non-trivial question that immediately emerges is how tosoundly compute these two sets, EspObj and Pted, to identifyall the interference dependence edges. Note that, since theVFG is updated dynamically during the interference depen-dence analysis (i.e., new inter-thread value �ows betweenstores and loads are introduced), the set of EspObj and theirPted information are also dynamically enlarged accordingly.In the example above, after the edge (1@✓13, 2@✓6) is updatedin the VFG, the pointed-to-by set %C43 (>2) can be updatedfrom {1} to {1, 2}, considering the set of nodes that are reach-able from >2 in the VFG. Thus, %C43 (>2) increases after theedge is added. Additionally, the memory object >2 is escaped,because the >2 becomes reachable from nodes in di�erentthreads and, thus, the size of ⇢B?$1 9 also increases. To sumup, the newly introduced value-�ow edges of the interferencedependence may enlarge the sets of both escaped objectsand their pointed-to-by variables, which may, in turn, intro-duce new edges, forming a cyclic dependence problem, as alsorevealed in the previous work [52, 54].

Algorithm 2: Interference Dependence AnalysisInput: A +�⌧ generated by Alg. 1Output: An interference-aware VFG

1 EscapeAnalysis(VFG);2 do3 foreach store node ✓1,i1 : ⇤G = @ in C 2 T do4 foreach load node ✓2,i2 : ? = ⇤~ in C 0\C 2 T do5 if (G,U), (~, V) 2 Pted(>),> 2 EspObj then6 Compute the dependence guard �guard;

7 ✓1 : ⇤G = @�guard�! ✓2 : ? = ⇤~ in VFG;

8 Update EspObj and their Pted accordingly;9 Update data dependence relations;

10 while No more interference edges are introduced;11 return Interference-aware VFG;12 Procedure EscapeAnalysis(VFG):13 Initialize EspObj ; // objects passed to the fork calls

14 foreach node = in VFG do15 if = is a store node ⇤G = @ then16 Let > and > 0 be the objects G and @ point to;17 if > 2 EspObj then18 EspObj ú EspObj [ {> 0};

19 foreach escaped object > in EspObj do20 Find the set of nodes N that reachable from > ;21 Let f the aggregated guards of edges during traversal;

foreach node = in N do22 Pted(>) ú Pted(>) [ (=,f);

23 return EspObj and their Pted;

Thus, we develop an algorithm that updates the VFG byiteratively computing the escaped objects and their pointed-to-by set based on the updated edges, which terminates ata �xed point where no more value �ows can be introduced.The set of escaped objects, EspObj, and their correspondingpointed-to-by set, Pted, thus are not �nally updated. Com-pared to the previous escape analysis [11, 63], our analysescan capture the escaped objects due to the thread interactionsbetween threads. Also, unlike in the previous work [60], ouralgorithm identi�es the combined e�ects of thread interfer-ence by iteration, without using an exhaustive and expensivepointer analysis as a prior.Alg. 2 describes the key steps. Lines 12-23 present the

escape analysis. It �rst initializes the escaped objects, EspObj,by �nding the objects passed to the fork calls, and thenidenti�es the objects that are stored to an already escapedobject via store statements (Lines 14-18). Next, the algorithmcomputes the pointed-to-by sets %C43 of ⇢B?$1 9 by �ndingnodes reachable from ⇢B?$1 9 and records the aggregatedguards during the graph traversal (Lines 19-23). The guardsrepresent the conditions of the pointed-to-by relations.

Page 8: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

After the escape analysis, the algorithm identi�es the in-terference dependence based on ⇢B?$1 9 and %C43 (Lines 3-9).Speci�cally, an edge (@@✓1, ?@✓2) is identi�ed in the VFG, ifG and~ are both in the pointed-to-by set of an escaped objectand these statements ✓1 and ✓2 are in di�erent threads C and C 0(Line 7). Next, we update EspObj and their Pted based on theupdated edges (Line 8). The algorithm iteratively identi�esnew edges based on the dynamically updated EspObj andtheir Pted, which terminates until there are no more edgesintroduced. The analyses also update the intra-thread datadependence due to the side-e�ects of the identi�ed inter-ference dependence, in the measures similar to Alg. 1 (Line9), so this stage is not a strict inter-thread analysis. How-ever, the approach is thread-modular since it separates theintra-thread and inter-thread reasoning.

4.2.2 Dependence Guard Computation. We next givethe details of how to compute the guard �guard qualifying foran interference dependent edge in Alg. 2, followingDefn. 2(1).The guard �guard is the conjunction of the alias constraint�alias and the load-store constraint �;B .

Alias Constraint �0;80B . Consider a value-�ow edge(@@✓1, ?@✓2) between ✓1,i1 : ⇤G = @ and ✓2,i2 : ? = ⇤~. Thealias constraint �alias comprises two parts: the conditionsunder which G and ~ point to the same memory object > , andthe branch conditions of the statements ✓1 and ✓2. Equally, theconditions of G and ~ pointing to the same memory object> are the conditions under which the object > is pointed toby the variables G and ~ simultaneously. In Alg. 2, we knowthat the pointed-to-by conditions are U and V , which are theaggregated guards from the memory object > to the nodes Gand ~, computed while identifying the variables pointing tothe escaped objects. To sum up, the alias constraint �0;80B isi1 ^ i2 ^ U ^ V .

Example 4.3. In Fig. 2(a), the alias constraint �alias forthe edge (1@✓13, 2@✓6) �rst includes the conjunction of thebranch conditions of these two statements ✓6 : 2 = ⇤G and✓13 : ⇤~ = 1, which is \1 ^ ¬\1. It should also include theconjuncted conditions of the pointer variables G and ~ inPted(>1). By examining the aggregated guards in the VFG,we know that the condition is CAD4 . As a result, we concludethat �alias = \1 ^ ¬\1.

Load-Store Constraint �;B . According to the sequentialconsistency axioms in Defn. 2(1), if a stored variable @@✓1�ows to a loaded variable ?@✓2, we need to enforce that (1)the execution order of ✓1 should be smaller than that of theload statement ✓2, and (2) there are no other concurrent storestatements happening between them. Let B and ; representthe store at ✓1 and the load at ✓2, and let ( (;) be the set of allstores that ; is data dependent on. Formally, we use $; > $B

to denote the strict partial order between statements ; and B ,i.e., ; is executed after B . Then, for an indirect value-�ow edgefrom the store B to the load ; , the �;B is encoded basically as

€B,B0 2( (;)

$B < $;

€8B0<B

$B0 < $B _$; < $B0

!, (2)

where the constraint $B < $; ensures that the store B hap-pens before the load ; , and the constraint$B0 < $B_$; < $B0

ensures that other concurrent stores B 0 can not happen be-tween B and ; . Note that, in practice, it is unnecessary toencode some order constraints between statements in thesame thread, because we can quickly determine their orderby traversing the control �ow graph.The load-store constraints �;B between the succes-

sive nodes ✓8�1 and ✓8 over a value-�ow path c =<E1@✓1, . . . , E=@✓= > result in the total order <c . Note thatsince <c may violate the program order <% , we will intro-duce additional constraints to enforce the program order <%

(detailed in § 5.1).

Example 4.4. In Fig. 5(a), the �;B for (0@✓2, 2@✓4) is $✓2 <$✓4 ^$✓3 < $✓2 . We do not need to encode $✓3 > $✓4 , sincethe statement ✓3 can only happen before ✓4 by the control�ow information. Similarly, the �;B for (1@✓3,3@✓4) is$✓2 <$✓3 _$✓4 < $✓2 .

Summary. The thread-modular dependence analysis ise�cient because it avoids explicitly enumerating exponentialthread interleavings and generates the execution constraintswithout solving them eagerly. Instead, they are determinedaltogether at the bug checking stage. In addition, the algo-rithm iteratively identi�es interference dependence by onlyreasoning the shared memory locations, avoids using an ex-haustive and expensive pointer analysis as a prior, whileremaining path sensitive, thereby signi�cantly leading to amore e�ective algorithm than the previous [51, 54, 60].

5 Guarded Reachability DetectionOnce the interference-aware value-�ow graph is built, anumber of concurrency bugs can be reduced to solvingsource-sink reachability problems over the guarded value-�ow graph. For instance, to detect an inter-thread use-after-free bug, the source, BA2 , is a 5 A44 statement, the sink, B8=: ,is a DB4 statement, and the checking follows the value �owsthat connect BA2 to B8=: across two or more threads. In thissection, we address two problems to achieve precision ande�ciency. The �rst is how to address the program orderviolation of a value-�ow path, <%;<c , due to the intran-sitive interference dependence in Defn. 2(2). The second ishow to e�ciently solve the constraints that qualify for thevalue-�ow paths.

5.1 Partially-ordered Value Flows SearchingA value-�ow path connecting a source and a sink can be ex-tracted by searching and transitively conjuncting the edgesof both data dependence and interference dependence inthe VFG. The searching of value �ows is in similar ways

Page 9: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

to the previous work [10, 55, 61]. The intra-thread context-sensitivity is maintained using the clone-based function sum-mary [39, 55]. Formally, a value-�ow path c can be computedas follows:

c = hE1@✓1, E2@✓2, . . . , E:@✓: i,

src = E1@✓1, B8=: = E:@✓: ,

(E8@✓8 , E8+1@✓8+1) via either 33 or 83, 1 8 < : ,

(3)

where the : represents the number of nodes in the value-�ow path. We use33 and 83 to represent the data dependenceand the interference dependence relationships, respectively.Due to the intransitive property of interference depen-

dence, the orders between the successive nodes over c mayviolate the program order, i.e., <%;<c . To solve this prob-lem, the analysis has to take the correct program order intoaccount, such as the control �ows and the synchronizationconstructs (e.g., fork/join) semantics. Our main idea is tointroduce the additional partial order constraints to enforcethe program order <% . The partial order constraints, denotedas �?> , together with the aggregated guards of value �ows,are handed to an SMT solver to validate the realizability ofthe value-�ow path c .

Partial Order Constraints �?> . The partial order con-sists of both the intra-thread partial order and the inter-thread partial order. The encoding schema is similar to theprior work [30, 31]. The intra-thread partial order followsthe control �ows of each thread. For instance, consider twosuccessive nodes E1@✓1 and E2@✓2 in the same thread C . Theirpartial order constraint is written as $✓1 < $✓2 , if there is avalid control-�ow path from ✓1 to ✓2. The inter-thread par-tial order follows the synchronization semantics betweenthreads, which are related to the thread fork and join calls. A5 >A: call is the corresponding start operation of the newlyforked thread, and a 9>8= call is the exit operation of thejoined thread. Therefore, the order of the nodes before a forkoperation in the parent thread is smaller than all of the nodesin the corresponding child thread. Similarly, the order of thenodes after a join operation in the parent thread is largerthan all of the nodes in the corresponding joined thread.We encode these constraints �?> for each pair of succes-

sive nodes in a value-�ow path c . For a pair of successivenodes, the analyses generate the corresponding partial orderconstraints if they share the program order relations above.Formally, we introduce a function %$ (E8@✓8 , E8+1@✓8+1) thatreturns the program order relations between statements ✓8and ✓8+1 based on the rules above.

�po (c) =€

82 [1,:�1],

©≠´

€9 2 [8,:�1]

%$ (E8@✓8 , E 9@✓9 )™Æ¨. (4)

Note that the partial order constraints �?> do not attempt toidentify all the program orders enforced by other synchro-nization semantics like lock/unlock and wait/notify. How-ever, the framework is generic enough to allow new synchro-nization semantics to be plugged in easily.

Example 5.1. In Fig. 5(b), �?> consists of program or-der $✓1 < $✓2 ^ $✓3 < $✓4 . For a value-�ow pathh0@✓2,1@✓3,1@✓4,0@✓1i, the �;B is $✓2 < $✓3 ^ $✓3 <$✓4 ^$✓4 < $✓1 . Since $✓1 < $✓2 and $✓2 < $✓1 con�ict witheach other, this irrealizable value-�ow path can be pruned.

5.2 Constraints SolvingFinally, we encode all the constraints of a value-�ow pathinto a formula �0;; , which is the conjunction of the con-straints �6D0A3B qualifying for the def-use edges on the path,and the partial order constraints �?> enforcing Defn. 2(2).

�0;; (c) = �6D0A3B (c) ^ �?> (c) , where

�guards (c) =€

82 [1,:�1],�guard (E8@✓8 , E8+1@✓8+1) . (5)

Next, the analysis feeds the collected constraints �0;; toan SMT solver. Due to the complicated thread interference,we observe that the number of generated constraints canbe extremely large, and some of those constraints can becomplex to solve if without any optimization.

Our analysis optimizes the performance of constraint solv-ing by three means. First, when resolving the data depen-dence and interference dependence (§ 4), we follow the pre-vious work [8, 55] that uses lightweight semi-decision pro-cedures to �lter out conditions having any apparent contra-dictions. The optimization signi�cantly reduces the compu-tational and memory overhead for the subsequent analysis.Second, the constraints on di�erent source-sink paths areindependent of each other, which gives us the ability to lever-age parallelization. This capability is important given theprevalence of multi-core processors. Third, for a complexquery, our analysis can solve the constraints by the cube-and-conquer parallel SMT solving strategy [26].

Summary. There are two salient advantages to aggregateguards qualifying for value-�ow paths and solve these con-straints from di�erent analysis domains together. First, acombined domain allows a more precise solution than oneobtained by solving each domain separately [14]. Second,the better scalability is achieved by judiciously delaying thedisjunctive reasoning of the realizability of the vulnerablepaths until the phase of source-sink checking. Speci�cally,we avoid explicit case-splitting over the feasible thread in-terference with the exhaustive points-to information. Ourapproach is more scalable, since the SMT solver typicallybypasses the analysis of all cases to prove a constraint satis-�able or unsatis�able.

6 ImplementationWe have implemented C����� on top of the LLVM com-piler infrastructure and the Z3 SMT solver [15]. While thelanguage in Fig. 3 has restricted language constructs, ourimplementations support the most features of C/C++, such

Page 10: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

as classes, dynamic memory allocation, references, and vir-tual functions. Arrays are considered monolithic. Currently,C����� supports POSIX thread in C and std::thread in C++.

Performance. When analyzing the interference depen-dence in Alg. 2, we have implemented a may-happen-in-parallel (MHP) analysis to prune the non-interference loadand store statements for e�ciency. Speci�cally, if a load state-ment and a store statement in two threads do not share MHPrelations, by Defn. 1, it is impossible for them to share aninterference dependence relation, thereby avoiding unneces-sary reasoning. The MHP analysis itself has been studied fordecades, and many highly scalable algorithms can be directlyleveraged, which is orthogonal to our Alg. 2 but importantto achieve high e�ciency.Practical programs often make asynchronous fork/join

calls via function pointers. Thus, it is impossible to constructthread call graphs directly from the syntactic program de-scription without analyzing function pointers. As shown inthe previous work [25, 44], a precise call graph for C-likeprograms can be constructed only using the �ow-insensitiveanalysis. Thus, we use Steensgaard’s analysis for the threadcall graph construction, which can be completed in almostlinear time [59]. We also adopt the class hierarchy analysisto resolve the virtual function calls.

Soundiness. We follow the previous bug-�nding tech-niques [55, 65] to engineer a soundy [41] bug detector, whichmeans C����� is designed to �nd as many bugs as possiblewith low false positives, potentially at the cost of missingsome bugs. C����� handles most language features in asound manner, while it also applies some unsound choices asin the previous work [1, 55, 65]. For instance, we unroll eachloop twice on the control �ow graph, ignore inline assembly,and manually model parts of the standard C/C++ libraries.Besides, we assume the distinct parameters of a function donot alias with each other.

7 EvaluationWe evaluate the e�ciency and the e�ectiveness of C�����in constructing the interference-aware value-�ow graph anddetecting concurrency bugs, by comparing it against twomost related static tools, namely S���� [61] and F��� [60].The subjects we adopt include twenty real-life open-

source C/C++ projects such as F���F��, M����DB, andM�SQL, which are commonly used in the concurrency analy-sis literature. Many of these projects are regularly and exten-sively scanned by the commercial tools as well as academictools and, thus, are expected to have very high code quality.The sizes of these projects range from a few thousand LoCto close to nine million, with 936KLoC on average.C����� is quite promising: it can �nish a path-sensitive

scan for about nine million lines of code in just 4.67 hours,with an average false positive rate around 26.67%. At the time

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Tim

e (lo

g sc

ale)

Subject ID (ordered by program size)

12 hr

(a) Time cost.

0

50

100

150

200

250

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Mem

ory

(G)

Subject ID (ordered by the program size)

Saber Fsam Canary

(b) Memory cost.

Figure 7.Time andmemory cost for building the VFG: S����v.s. F��� v.s. C�����.

of writing, it has found eighteen previously unknown con-currency bugs that have been con�rmed by the developerswith fourteen already �xed. This performance and precisionare aligned with its design objectives and the common in-dustrial requirement of detecting millions-of-LoC code in5-10 hours with less than 30% false positives [5].

All the experiments were performed on a computer withtwo 20 core Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHzand 256GB physical memory running Ubuntu-16.04.

7.1 Comparison in Value-�ow Graph ConstructionFirst, we evaluate C�����’s thread-modular data depen-dence analysis by comparing against two recent pointer-analysis-based techniques, S���� [61] and F��� [60]. S����performs an Andersen-style, �ow-insensitive points-to anal-ysis, which can trivially model the thread interference. F���is an Andersen-style, �ow-sensitive pointer analysis formulti-threaded programs. They both perform an exhaustivepoints-to analysis to build the value-�ow graph. To the bestof our knowledge, these are the most precise and e�cienttools for C/C++ multi-threaded programs we can get ourhands on. We con�gure S���� and F��� with their defaultsettings. The timeout is set to twelve hours and the numberof running threads is set to 1.Fig. 7 shows the comparison of time and memory costs

among C�����, S����, and F��� for constructing the value-�ow graph. We can observe that C����� is much morescalable than S���� and F���. C����� can �nish scanningall projects, but S���� times out on 9 projects and F��� times

Page 11: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

y = 0.0326x + 25.403R² = 0.8302

y = 0.0193x + 18.293R² = 0.7796

0

100

200

300

400

0 2000 4000 6000 8000 10000

Tim

e (m

in) \

Mem

ory

(G)

Time Memory Linear (Time) Linear (Memory)

Figure 8. Scalability of C����� for bug hunting. The G-axisstands for the size of the project (KLoC) and the~-axis standsfor the time cost (minutes) or the memory cost (GB).

out on 15 projects. To be speci�c, when the code size is largerthan 50KLoC and 100KLoC, respectively, F��� and S����always run out of the time budget. In comparison, C�����can �nish building the value-�ow graph for 16 projects inless than one hour. On average, C����� is >15⇥ and 180⇥faster than S���� and F���, respectively. At most, C�����is up to >70⇥ and >500⇥ faster than S���� and F���.

Besides, C����� requires signi�cantly less memory com-pared to S���� and F���, as shown in Fig. 7b. For the subjectslarger than 100 KLoC, S���� uses 130G additional memorycompared to C�����. For the subjects larger than 50 KLoC,F��� uses almost 200G additional memory, but it is stillunable to �nish building VFG for these subjects. Since ouralgorithm does not need to compute the exhaustive points-toresults, C����� is much more e�cient.

7.2 Comparison in Concurrency bug DetectionWe evaluate the scalability and precision of C����� in thewhole process of VFG construction and concurrency bugdetection. The number of nested levels of calling context isset to six. For scalability, we do not list the runtime and mem-ory consumptions of S���� and F��� in the bug checking,because we have shown that they su�er from the scalabilityproblem in the VFG construction. For precision, we comparethe false positive rates among C�����, F���, and S����.We planned to evaluate other bug-�nding detection tools,such as R���� [12] and DCUAF [2]. However, they are eitherpublicly unavailable or outdated for running in the envi-ronments we are able to set up. Among the inter-threadvalue-�ow bugs, we choose to check inter-thread use-after-free, as it has become the most exploited memory errors,leading to many zero-day serious vulnerabilities [28, 67].

Scalability. To study the observed time and memory com-plexity of C�����, we adopt the curve �tting method. Fig. 8reveals the �tting curves and their coe�cients of determi-nation '2. The '2

2 [0, 1] depicts the statistical measure ofhow close the data are to the �tting curve. The closer '2

is to 1, the better the �tting curve is. The �gure suggeststhat C�����’s time and memory cost grow almost linearly

Table 1. Results of bug hunting.Project Size (KLoC) S���� F��� C�����

FP Rate #Reports FP Rate #Reports #FP #Reports1. lrzip 16 96.82% 63 93.75% 32 0 22. lwan 20 98.87% 89 100% 44 0 13. leveldb 21 100% 0 100% 0 1 14. darknet 29 100% 3,636 100% 144 0 05. coturn 39 100% 1,477 100% 368 0 26. httrack 49 100% 134 NA NA 1 17. �nedb 51 100% 421 NA NA 0 18. tcpdump 85 100% 0 NA NA 0 09. transmission 88 99.33% 299 NA NA 0 210. celix 107 100% 3,782 NA NA 0 011. redis 219 100% 0 NA NA 0 012. git 239 NA NA NA NA 0 013. zfs 367 NA NA NA NA 0 114. HP-Socket 426 NA NA NA NA 0 015. openssl 451 NA NA NA NA 1 116. poco 705 NA NA NA NA 0 017. mariadb 1,751 NA NA NA NA 0 118. �mpeg 2,003 NA NA NA NA 0 019. mysql 3,118 NA NA NA NA 0 020. �refox 8,938 NA NA NA NA 1 2NA means the analysis runs out of the time budget (12 hours).

with '2 around 0.8, and, thus, can scale up quite elegantly.Speci�cally, C����� can �nish checkingM�SQL (3 MLoC)and F���F�� (9 MLoC) in approximately 2.5 hours and 4.67hours, respectively.

Precision. C����� reports �fteen inter-thread use-after-free bugs with a false positive rate of 26.67%. All of thetrue positives are previously unknown and have been con-�rmed by our carefully checking manually. A stark contrastis that,C����� generates very few reports as shown in Tbl. 1,whereas S���� and F��� report nearly 9,896 and 586 warn-ings, respectively. Since we are unable to manually inspectall of them, we randomly sample a hundred warnings forinspection if a project generates more than 100 warnings.Unfortunately, S���� and F��� only found �ve and two truepositives after the manual �ltering, which were all found byC�����. However, because we can miss true bugs duringmanual checking the warnings found by F��� and S���� dueto subjectivity, we plan to make the implementation of inter-thread use-after-free checkers of F��� and S���� availablefor the interested readers to examine. Concisely speaking,C����� is more precise because it builds more precise de-pendence relations and only takes the feasible interleavingexecutions into account.

7.3 Detected Real Concurrency BugsWe have been using C����� to scan open-source projectsextensively and continuously. At the time of writing, therewere already eighteen concurrency bugs from dozens ofopen-source projects con�rmed by developers, includingthe famous software systems such as �������, �������, and������������. Among the eighteen con�rmed bugs, four-teen bugs have been �xed in the recent release versions ofthe software. A few vulnerabilities have been discovered onthe key modules of the software and immediately �xed afterbeing reported.

Page 12: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

C����� found an old and latent inter-thread UAF vulner-ability in ������������, missed by the developers, users, re-gression testing, and prior static analysis techniques. ������������� is a popular BitTorrent client widely used by manyUnix and Linux distributions including Ubuntu and Solaris.We retrospectively searched the commit history and foundthis bug has remained in concealment for almost eight years.C����� can detect bugs of high complexity for which

the original developers had to spend a considerable amountof time to con�rm and even complained of the di�cultyof �xing. For example, C����� detected an inter-threaduse-after-free vulnerability in F���F�� and L����DB, respec-tively. The control-�ow paths of these two bugs span acrossseveral functions and compiling units and the bugs may onlybe triggered in the rare thread schedules. The con�rmationof the bugs was a team e�ort, through rounds of rejectionsbefore the �nal con�rmation.

8 Related WorkAnalysis of Concurrent Programs. The conventional

data-�ow analysis alternates between reasoning over intra-thread and inter-thread semantics, which is inherently waste-ful since the analysis engine has to repetitively conduct rea-soning over the similar intra-thread program regions [18].Although many proposed algorithms account for the datadependence between the interleaved threads, the data-�owanalysis itself still needs to propagate data-�ow facts fol-lowing intra-thread control �ows [12, 17, 35, 60]. An earlypointer analysis algorithm [54] for Clik programs only tar-gets structured parbegin/parend parallelism, which solvesdata-�ow problems and discovers the thread interactionsby repeatedly reanalyzing each thread in each new analysiscontext until reaching a �xed point. However, non-lexically-scoped parallelism such as the fork/join model can signi�-cantly complicate the analysis. R���� [12] resorts to a datarace detection engine to ensure the soundness of the sequen-tial analysis by invalidating the data-�ow facts in�uencedby concurrent writes. F��� [60] follows the pre-computedthread-aware def-use chains to conduct pointer analysis. Incomparison, C����� does not need an exhaustive Andersen-style pointer analysis as a prior and is also more precise thanF���. To sum up, C����� checks bugs only following dataand interference dependence in a value �ow graph sparsely,thereby achieving more e�ciency [55].Model checking for concurrent programs has a long

history. The key task is to identify redundant thread in-terleavings. A classical approach is partial order redunc-tion [13, 20, 22]. Huang [27] proposes MCR (Maximal causal-ity reduction) that captures the value of writes and reads inan execution trace to predict new traces. However, previousapproaches target at the stateless model checking that couldbe considered as a form of systematic testing with a �xedinput, as opposed to static analysis. C����� has a �avor of

bounded model checking in the sense that it checks bugs assource-sink problems by exploring a value-�ow graph underthe �nite unrolling of loops.Symbolic execution, systematically exploring all feasible

execution paths, also su�ers seriously from state space ex-plosion when being applied to multi-threaded programs [4,23, 57]. Our approach allows the reduction of the redundantthread interleaving by checking value �ows (a.k.a data depen-dence) without explicitly enumerating thread interleavings.

Concurrency Bug Detection. Concurrency bugs suchas deadlock and race-like bugs are among the most notoriousdefects to deal with [42, 67]. Yang et al. [67] reveal that manyconcurrency vulnerabilities can be exploited to conduct se-vere attacks, such as privilege escalation and malicious codeexecution. This paper aims at detecting a category of concur-rency bugs that could be expressed as source-sink properties.In the context of static approaches, much work on detectingdata races has been done [6, 33, 40, 45, 62]. However, only10% of true data races are harmful while most of them arebenign [48]. O�� [69] relies on various data race detectorsto disclose all possible racy program spots and further ex-amine concurrency bugs. This work discovers 36 con�rmedconcurrency bugs among all 24,645 reported races in theLinux kernel. DCUAF [2] detects inter-thread UAF speci�-cally for Linux drivers with a new method to �nd concurrentinterface functions. Dynamic approaches typically includesystematic testing [9, 10, 32, 66] or predictive analysis. Forpredictive analysis, they are based on approaches includ-ing the SMT-based methods [19, 27–29], partial-order-basedmethods [34, 43, 49, 53, 58], and distance-based exchange-able events [7] to predict additional executions from a singleone. However, they can only predict other thread schedulesunder a speci�c input.

9 ConclusionWe have described C�����, our novel, principled approachto conducting interference-aware value-�ow analysis forchecking inter-thread value-�ow bugs, which achieves bothgood precision and scalability for millions of lines of code.C����� is promising, having already pinpointed over eigh-teen previously-unknown concurrency bugs on a dozen ofwell-known open-source C/C++ projects.

Future work includes (1) enhancements to supportmore synchronization semantics like lock/unlock and sig-nal/notify; (2) extension to relaxed memory models such asTSO/PSO; and (3) deployment of customized decision proce-dures to improve the e�ciency of constraint solving.

AcknowledgmentsWe thank the anonymous reviewers and the shepherd fortheir insightful comments. This work was partially fundedby RGC16206517, ITS/440/18FP, Microsoft Donation, andHuawei Donation. Peisen Yao is the corresponding author.

Page 13: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

References[1] Domagoj Babic and Alan J. Hu. 2008. Calysto: Scalable and Precise

Extended Static Checking. In Proceedings of the 30th InternationalConference on Software Engineering (Leipzig, Germany) (ICSE ’08). As-sociation for Computing Machinery, New York, NY, USA, 211–220.h�ps://doi.org/10.1145/1368088.1368118

[2] Jia-Ju Bai, Julia Lawall, Qiu-Liang Chen, and Shi-Min Hu. 2019. Ef-fective Static Analysis of Concurrency Use-After-Free Bugs in LinuxDevice Drivers. In 2019 USENIX Annual Technical Conference, USENIXATC 2019, Renton, WA, USA, July 10-12, 2019, Dahlia Malkhi and DanTsafrir (Eds.). USENIX Association, 255–268. h�ps://www.usenix.org/conference/atc19/presentation/bai

[3] Josh Berdine, Cristiano Calcagno, Byron Cook, Dino Distefano, Pe-ter W. O’Hearn, Thomas Wies, and Hongseok Yang. 2007. ShapeAnalysis for Composite Data Structures. In Computer Aided Veri�ca-tion, 19th International Conference, CAV 2007, Berlin, Germany, July3-7, 2007, Proceedings (Lecture Notes in Computer Science, Vol. 4590),Werner Damm and Holger Hermanns (Eds.). Springer, 178–192. h�ps://doi.org/10.1007/978-3-540-73368-3_22

[4] Tom Bergan, Dan Grossman, and Luis Ceze. 2014. Symbolic exe-cution of multithreaded programs from arbitrary program contexts.In Proceedings of the 2014 ACM International Conference on ObjectOriented Programming Systems Languages & Applications, OOPSLA2014, part of SPLASH 2014, Portland, OR, USA, October 20-24, 2014, An-drew P. Black and Todd D. Millstein (Eds.). ACM, 491–506. h�ps://doi.org/10.1145/2660193.2660200

[5] Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, SethHallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and DawsonEngler. 2010. A Few Billion Lines of Code Later: Using Static Analysisto Find Bugs in the Real World. Commun. ACM 53, 2 (Feb. 2010), 66–75.h�ps://doi.org/10.1145/1646353.1646374

[6] Sam Blackshear, Nikos Gorogiannis, Peter W. O’Hearn, and IlyaSergey. 2018. RacerD: Compositional Static Race Detection. Proc.ACM Program. Lang. 2, OOPSLA, Article 144 (Oct. 2018), 28 pages.h�ps://doi.org/10.1145/3276514

[7] Yan Cai, Biyun Zhu, Ruijie Meng, Hao Yun, Liang He, Purui Su, andBin Liang. 2019. Detecting Concurrency Memory Corruption Vul-nerabilities. In Proceedings of the 2019 27th ACM Joint Meeting onEuropean Software Engineering Conference and Symposium on the Foun-dations of Software Engineering (Tallinn, Estonia) (ESEC/FSE 2019).Association for Computing Machinery, New York, NY, USA, 706–717.h�ps://doi.org/10.1145/3338906.3338927

[8] Satish Chandra, Stephen J. Fink, and Manu Sridharan. 2009. Snug-glebug: A Powerful Approach to Weakest Preconditions. In Proceed-ings of the 30th ACM SIGPLAN Conference on Programming Lan-guage Design and Implementation (Dublin, Ireland) (PLDI ’09). As-sociation for Computing Machinery, New York, NY, USA, 363–374.h�ps://doi.org/10.1145/1542476.1542517

[9] Hongxu Chen, Shengjian Guo, Yinxing Xue, Yulei Sui, Cen Zhang,Yuekang Li, Haijun Wang, and Yang Liu. 2020. MUZZ: Thread-awareGrey-box Fuzzing for E�ective Bug Hunting in Multithreaded Pro-grams. In 29th USENIX Security Symposium, USENIX Security 2020,August 12-14, 2020, Srdjan Capkun and Franziska Roesner (Eds.).USENIX Association, 2325–2342. h�ps://www.usenix.org/conference/usenixsecurity20/presentation/chen-hongxu

[10] Sigmund Cherem, Lonnie Princehouse, and Radu Rugina. 2007. Prac-tical Memory Leak Detection Using Guarded Value-Flow Analysis.In Proceedings of the 28th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation (San Diego, California, USA)(PLDI ’07). Association for Computing Machinery, New York, NY, USA,480–491. h�ps://doi.org/10.1145/1250734.1250789

[11] Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreed-har, and Sam Midki�. 1999. Escape Analysis for Java. In Proceedings ofthe 14th ACM SIGPLAN Conference on Object-Oriented Programming,

Systems, Languages, and Applications (Denver, Colorado, USA) (OOP-SLA ’99). Association for Computing Machinery, New York, NY, USA,1–19. h�ps://doi.org/10.1145/320384.320386

[12] Ravi Chugh, Jan W. Voung, Ranjit Jhala, and Sorin Lerner. 2008.Data�ow Analysis for Concurrent Programs Using Datarace Detection.In Proceedings of the 29th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation (Tucson, AZ, USA) (PLDI ’08).Association for Computing Machinery, New York, NY, USA, 316–326.h�ps://doi.org/10.1145/1375581.1375620

[13] Edmund M. Clarke, Orna Grumberg, Marius Minea, and Doron A.Peled. 1999. State Space Reduction Using Partial Order Techniques.Int. J. Softw. Tools Technol. Transf. 2, 3 (1999), 279–287. h�ps://doi.org/10.1007/s100090050035

[14] Patrick Cousot and Radhia Cousot. 1979. Systematic Design of ProgramAnalysis Frameworks. In Proceedings of the 6th ACM SIGACT-SIGPLANSymposium on Principles of Programming Languages (San Antonio,Texas) (POPL ’79). Association for Computing Machinery, New York,NY, USA, 269–282. h�ps://doi.org/10.1145/567752.567778

[15] Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: AnE�cient SMT Solver. In Tools and Algorithms for the Construction andAnalysis of Systems, 14th International Conference, TACAS 2008, Heldas Part of the Joint European Conferences on Theory and Practice of Soft-ware, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceed-ings (Lecture Notes in Computer Science, Vol. 4963), C. R. Ramakrishnanand Jakob Rehof (Eds.). Springer, 337–340. h�ps://doi.org/10.1007/978-3-540-78800-3_24

[16] Isil Dillig, Thomas Dillig, Alex Aiken, and Mooly Sagiv. 2011. Preciseand compact modular procedure summaries for heap manipulatingprograms. In Proceedings of the 32nd ACM SIGPLAN Conference onProgramming Language Design and Implementation, PLDI 2011, SanJose, CA, USA, June 4-8, 2011, Mary W. Hall and David A. Padua (Eds.).ACM, 567–577. h�ps://doi.org/10.1145/1993498.1993565

[17] Azadeh Farzan, Zachary Kincaid, and Andreas Podelski. 2013. In-ductive data �ow graphs. In The 40th Annual ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages, POPL ’13, Rome,Italy - January 23 - 25, 2013, Roberto Giacobazzi and Radhia Cousot(Eds.). ACM, 129–142. h�ps://doi.org/10.1145/2429069.2429086

[18] Azadeh Farzan and P. Madhusudan. 2007. Causal Data�ow Analysisfor Concurrent Programs. In Tools and Algorithms for the Constructionand Analysis of Systems, 13th International Conference, TACAS 2007,Held as Part of the Joint European Conferences on Theory and Practice ofSoftware, ETAPS 2007 Braga, Portugal, March 24 - April 1, 2007, Proceed-ings (Lecture Notes in Computer Science, Vol. 4424), Orna Grumberg andMichael Huth (Eds.). Springer, 102–116. h�ps://doi.org/10.1007/978-3-540-71209-1_10

[19] Azadeh Farzan, P. Madhusudan, Niloofar Razavi, and Francesco Sor-rentino. 2012. Predicting Null-Pointer Dereferences in ConcurrentPrograms. In Proceedings of the ACM SIGSOFT 20th International Sympo-sium on the Foundations of Software Engineering (Cary, North Carolina)(FSE ’12). Association for Computing Machinery, New York, NY, USA,Article 47, 11 pages. h�ps://doi.org/10.1145/2393596.2393651

[20] Cormac Flanagan and Patrice Godefroid. 2005. Dynamic partial-orderreduction for model checking software. In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming Languages,POPL 2005, Long Beach, California, USA, January 12-14, 2005, JensPalsberg and Martín Abadi (Eds.). ACM, 110–121. h�ps://doi.org/10.1145/1040305.1040315

[21] Malay Ganai, Dongyoon Lee, and Aarti Gupta. 2012. DTAM: Dy-namic Taint Analysis of Multi-Threaded Programs for Relevancy. InProceedings of the ACM SIGSOFT 20th International Symposium on theFoundations of Software Engineering (Cary, North Carolina) (FSE ’12).Association for Computing Machinery, New York, NY, USA, Article46, 11 pages. h�ps://doi.org/10.1145/2393596.2393650

[22] Patrice Godefroid. 1997. VeriSoft: A Tool for the Automatic Analysisof Concurrent Reactive Software. In Computer Aided Veri�cation, 9th

Page 14: Canary: Practical Static Detection of Inter-thread Value

PLDI ’21, June 20–25, 2021, Virtual, Canada Yuandao Cai, Peisen Yao, and Charles Zhang

International Conference, CAV ’97, Haifa, Israel, June 22-25, 1997, Pro-ceedings (Lecture Notes in Computer Science, Vol. 1254), Orna Grumberg(Ed.). Springer, 476–479. h�ps://doi.org/10.1007/3-540-63166-6_52

[23] Shengjian Guo, Markus Kusano, Chao Wang, Zijiang Yang, and AartiGupta. 2015. Assertion guided symbolic execution of multithreadedprograms. In Proceedings of the 2015 10th Joint Meeting on Foundationsof Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - Sep-tember 4, 2015, Elisabetta Di Nitto, Mark Harman, and Patrick Heymans(Eds.). ACM, 854–865. h�ps://doi.org/10.1145/2786805.2786841

[24] Ben Hardekopf and Calvin Lin. 2009. Semi-Sparse Flow-SensitivePointer Analysis. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Savan-nah, GA, USA) (POPL ’09). Association for Computing Machinery, NewYork, NY, USA, 226–238. h�ps://doi.org/10.1145/1480881.1480911

[25] Ben Hardekopf and Calvin Lin. 2011. Flow-sensitive pointer analy-sis for millions of lines of code. In Proceedings of the CGO 2011, The9th International Symposium on Code Generation and Optimization,Chamonix, France, April 2-6, 2011. IEEE Computer Society, 289–298.h�ps://doi.org/10.1109/CGO.2011.5764696

[26] Marijn JH Heule, Oliver Kullmann, and Armin Biere. 2018. Cube-and-conquer for satis�ability. In Handbook of Parallel Constraint Reasoning.Springer, 31–59.

[27] Je� Huang. 2015. Stateless model checking concurrent programs withmaximal causality reduction. In Proceedings of the 36th ACM SIGPLANConference on Programming Language Design and Implementation, Port-land, OR, USA, June 15-17, 2015, David Grove and Steve Blackburn(Eds.). ACM, 165–174. h�ps://doi.org/10.1145/2737924.2737975

[28] Je� Huang. 2018. UFO: Predictive Concurrency Use-after-Free Detec-tion. In Proceedings of the 40th International Conference on SoftwareEngineering (Gothenburg, Sweden) (ICSE ’18). Association for Com-puting Machinery, New York, NY, USA, 609–619. h�ps://doi.org/10.1145/3180155.3180225

[29] Je� Huang, Patrick O’Neil Meredith, and Grigore Rosu. 2014. MaximalSound Predictive Race Detection with Control Flow Abstraction. InProceedings of the 35th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation (Edinburgh, United Kingdom)(PLDI ’14). Association for Computing Machinery, New York, NY, USA,337–348. h�ps://doi.org/10.1145/2594291.2594315

[30] Je� Huang and Charles Zhang. 2012. LEAN: Simplifying Concur-rency Bug Reproduction via Replay-Supported Execution Reduction.In Proceedings of the ACM International Conference on Object OrientedProgramming Systems Languages and Applications (Tucson, Arizona,USA) (OOPSLA ’12). Association for Computing Machinery, New York,NY, USA, 451–466. h�ps://doi.org/10.1145/2384616.2384649

[31] Je� Huang, Charles Zhang, and Julian Dolby. 2013. CLAP: recordinglocal executions to reproduce concurrency failures. In ACM SIGPLANConference on Programming Language Design and Implementation, PLDI’13, Seattle, WA, USA, June 16-19, 2013, Hans-Juergen Boehm and Cor-mac Flanagan (Eds.). ACM, 141–152. h�ps://doi.org/10.1145/2491956.2462167

[32] D. R. Jeong, K. Kim, B. Shivakumar, B. Lee, and I. Shin. 2019. Razzer:Finding Kernel Race Bugs through Fuzzing. In 2019 IEEE Symposiumon Security and Privacy (SP). 754–768. h�ps://doi.org/10.1109/SP.2019.00017

[33] Vineet Kahlon, Nishant Sinha, Erik Kruus, and Yun Zhang. 2009. StaticData Race Detection for Concurrent Programs with AsynchronousCalls. In Proceedings of the 7th Joint Meeting of the European Soft-ware Engineering Conference and the ACM SIGSOFT Symposium on TheFoundations of Software Engineering (Amsterdam, The Netherlands)(ESEC/FSE ’09). Association for Computing Machinery, New York, NY,USA, 13–22. h�ps://doi.org/10.1145/1595696.1595701

[34] Dileep Kini, UmangMathur, and Mahesh Viswanathan. 2017. Dynamicrace prediction in linear time. In Proceedings of the 38th ACM SIGPLAN

Conference on Programming Language Design and Implementation, PLDI2017, Barcelona, Spain, June 18-23, 2017, Albert Cohen and Martin T.Vechev (Eds.). ACM, 157–170. h�ps://doi.org/10.1145/3062341.3062374

[35] Markus Kusano and Chao Wang. 2016. Flow-Sensitive Composition ofThread-Modular Abstract Interpretation. In Proceedings of the 2016 24thACM SIGSOFT International Symposium on Foundations of SoftwareEngineering (Seattle, WA, USA) (FSE 2016). Association for ComputingMachinery, New York, NY, USA, 799–809. h�ps://doi.org/10.1145/2950290.2950291

[36] Leslie Lamport. 1979. How to Make a Multiprocessor Com-puter That Correctly Executes Multiprocess Programs. IEEETransactions on Computers C-28 9 (September 1979), 690–691.h�ps://www.microso�.com/en-us/research/publication/make-multiprocessor-computer-correctly-executes-multiprocess-programs/

[37] Ondrej Lhoták and Kwok-Chiang Andrew Chung. 2011. Points-toAnalysis with E�cient Strong Updates. In Proceedings of the 38th An-nual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages (Austin, Texas, USA) (POPL ’11). Association for Comput-ing Machinery, New York, NY, USA, 3–16. h�ps://doi.org/10.1145/1926385.1926389

[38] Lian Li, Cristina Cifuentes, and Nathan Keynes. 2011. Boosting thePerformance of Flow-Sensitive Points-to Analysis Using Value Flow. InProceedings of the 19th ACM SIGSOFT Symposium and the 13th EuropeanConference on Foundations of Software Engineering (Szeged, Hungary)(ESEC/FSE ’11). Association for Computing Machinery, New York, NY,USA, 343–353. h�ps://doi.org/10.1145/2025113.2025160

[39] Lian Li, Cristina Cifuentes, and Nathan Keynes. 2013. Precise andScalable Context-Sensitive Pointer Analysis via Value Flow Graph. InProceedings of the 2013 International Symposium on Memory Manage-ment (Seattle, Washington, USA) (ISMM ’13). Association for Comput-ing Machinery, New York, NY, USA, 85–96. h�ps://doi.org/10.1145/2464157.2466483

[40] Bozhen Liu and Je�Huang. 2018. D4: fast concurrency debugging withparallel di�erential analysis. In Proceedings of the 39th ACM SIGPLANConference on Programming Language Design and Implementation, PLDI2018, Philadelphia, PA, USA, June 18-22, 2018, Je�rey S. Foster and DanGrossman (Eds.). ACM, 359–373. h�ps://doi.org/10.1145/3192366.3192390

[41] Benjamin Livshits, Dimitrios Vardoulakis, Manu Sridharan, YannisSmaragdakis, Ondřej Lhoták, José Amaral, Bor-Yuh Chang, SamuelGuyer, Uday Khedker, and Anders Møller. 2015. In Defense of Soundi-ness: A Manifesto. Commun. ACM 58 (01 2015), 44–46. h�ps://doi.org/10.1145/2644805

[42] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. 2008. Learn-ing from Mistakes: A Comprehensive Study on Real World Concur-rency Bug Characteristics. In Proceedings of the 13th InternationalConference on Architectural Support for Programming Languages andOperating Systems (Seattle, WA, USA) (ASPLOS XIII). Associationfor Computing Machinery, New York, NY, USA, 329–339. h�ps://doi.org/10.1145/1346281.1346323

[43] Umang Mathur, Dileep Kini, and Mahesh Viswanathan. 2018. Whathappens-after the �rst race? enhancing the predictive power ofhappens-before based dynamic race detection. Proc. ACM Program.Lang. 2, OOPSLA (2018), 145:1–145:29. h�ps://doi.org/10.1145/3276515

[44] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2004. PreciseCall Graphs for C Programs with Function Pointers. Autom. Softw.Eng. 11, 1 (2004), 7–26. h�ps://doi.org/10.1023/B:AUSE.0000008666.56394.a1

[45] Mayur Naik, Alex Aiken, and John Whaley. 2006. E�ective static racedetection for Java. In Proceedings of the ACM SIGPLAN 2006 Conferenceon Programming Language Design and Implementation, Ottawa, Ontario,Canada, June 11-14, 2006, Michael I. Schwartzbach and Thomas Ball(Eds.). ACM, 308–319. h�ps://doi.org/10.1145/1133981.1134018

Page 15: Canary: Practical Static Detection of Inter-thread Value

Canary: Practical Static Detection of Inter-thread Value-Flow Bugs PLDI ’21, June 20–25, 2021, Virtual, Canada

[46] Mangala Gowri Nanda and S. Ramesh. 2000. Slicing Concurrent Pro-grams. In Proceedings of the 2000 ACM SIGSOFT International Sympo-sium on Software Testing and Analysis (Portland, Oregon, USA) (IS-STA ’00). Association for Computing Machinery, New York, NY, USA,180–190. h�ps://doi.org/10.1145/347324.349121

[47] Mangala Gowri Nanda and S. Ramesh. 2006. Interprocedural Slicingof Multithreaded Programs with Applications to Java. ACM Trans.Program. Lang. Syst. 28, 6 (Nov. 2006), 1088–1144. h�ps://doi.org/10.1145/1186632.1186636

[48] Satish Narayanasamy, Zhenghao Wang, Jordan Tigani, Andrew Ed-wards, and Brad Calder. 2007. Automatically Classifying Benignand Harmful Data Races Using Replay Analysis. In Proceedings ofthe 28th ACM SIGPLAN Conference on Programming Language De-sign and Implementation (San Diego, California, USA) (PLDI ’07).Association for Computing Machinery, New York, NY, USA, 22–31.h�ps://doi.org/10.1145/1250734.1250738

[49] Andreas Pavlogiannis. 2020. Fast, sound, and e�ectively completedynamic race prediction. Proc. ACM Program. Lang. 4, POPL (2020),17:1–17:29. h�ps://doi.org/10.1145/3371085

[50] G. Ramalingam. 2000. Context-Sensitive Synchronization-SensitiveAnalysis is Undecidable. ACM Trans. Program. Lang. Syst. 22, 2 (March2000), 416–430. h�ps://doi.org/10.1145/349214.349241

[51] Venkatesh Prasad Ranganath and John Hatcli�. 2004. Pruning Inter-ference and Ready Dependence for Slicing Concurrent Java Programs.In Compiler Construction, 13th International Conference, CC 2004, Heldas Part of the Joint European Conferences on Theory and Practice of Soft-ware, ETAPS 2004, Barcelona, Spain, March 29 - April 2, 2004, Proceedings(Lecture Notes in Computer Science, Vol. 2985), Evelyn Duesterwald (Ed.).Springer, 39–56. h�ps://doi.org/10.1007/978-3-540-24723-4_4

[52] Martin C. Rinard. 2001. Analysis of Multithreaded Programs. In StaticAnalysis, 8th International Symposium, SAS 2001, Paris, France, July16-18, 2001, Proceedings (Lecture Notes in Computer Science, Vol. 2126),Patrick Cousot (Ed.). Springer, 1–19. h�ps://doi.org/10.1007/3-540-47764-0_1

[53] Jake Roemer, Kaan Genç, and Michael D. Bond. 2020. SmartTrack:e�cient predictive race detection. In Proceedings of the 41st ACMSIGPLAN International Conference on Programming Language Designand Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alas-tair F. Donaldson and Emina Torlak (Eds.). ACM, 747–762. h�ps://doi.org/10.1145/3385412.3385993

[54] Radu Rugina and Martin Rinard. 1999. Pointer Analysis for Multi-threaded Programs. In Proceedings of the ACM SIGPLAN 1999 Confer-ence on Programming Language Design and Implementation (Atlanta,Georgia, USA) (PLDI ’99). Association for Computing Machinery, NewYork, NY, USA, 77–90. h�ps://doi.org/10.1145/301618.301645

[55] Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, andCharles Zhang. 2018. Pinpoint: Fast and Precise Sparse Value FlowAnalysis for Million Lines of Code. In Proceedings of the 39th ACMSIGPLAN Conference on Programming Language Design and Implemen-tation (Philadelphia, PA, USA) (PLDI 2018). Association for ComputingMachinery, New York, NY, USA, 693–706. h�ps://doi.org/10.1145/3192366.3192418

[56] Yao Shi, Soyeon Park, Zuoning Yin, Shan Lu, Yuanyuan Zhou, Wen-guang Chen, and Weimin Zheng. 2010. Do I Use the Wrong De�ni-tion? DeFuse: De�nition-Use Invariants for Detecting Concurrencyand Sequential Bugs. In Proceedings of the ACM International Con-ference on Object Oriented Programming Systems Languages and Ap-plications (Reno/Tahoe, Nevada, USA) (OOPSLA ’10). Associationfor Computing Machinery, New York, NY, USA, 160–174. h�ps://doi.org/10.1145/1869459.1869474

[57] Nishant Sinha and Chao Wang. 2010. Staged Concurrent ProgramAnalysis. In Proceedings of the Eighteenth ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering (Santa Fe, NewMexico, USA) (FSE ’10). Association for Computing Machinery, NewYork, NY, USA, 47–56. h�ps://doi.org/10.1145/1882291.1882301

[58] Yannis Smaragdakis, Jacob Evans, Caitlin Sadowski, Jaeheon Yi, andCormac Flanagan. 2012. Sound predictive race detection in polynomialtime. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages, POPL 2012, Philadelphia, Penn-sylvania, USA, January 22-28, 2012, John Field and Michael Hicks (Eds.).ACM, 387–400. h�ps://doi.org/10.1145/2103656.2103702

[59] Bjarne Steensgaard. 1996. Points-to Analysis in Almost Linear Time. InProceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principlesof Programming Languages (St. Petersburg Beach, Florida, USA) (POPL’96). Association for Computing Machinery, New York, NY, USA, 32–41.h�ps://doi.org/10.1145/237721.237727

[60] Yulei Sui, Peng Di, and Jingling Xue. 2016. Sparse �ow-sensitivepointer analysis for multithreaded programs. In Proceedings of the 2016International Symposium on Code Generation and Optimization, CGO2016, Barcelona, Spain, March 12-18, 2016, Björn Franke, Youfeng Wu,and Fabrice Rastello (Eds.). ACM, 160–170. h�ps://doi.org/10.1145/2854038.2854043

[61] Yulei Sui, Ding Ye, and Jingling Xue. 2012. Static Memory Leak Detec-tion Using Full-Sparse Value-Flow Analysis. In Proceedings of the 2012International Symposium on Software Testing and Analysis (Minneapo-lis, MN, USA) (ISSTA 2012). Association for Computing Machinery,New York, NY, USA, 254–264. h�ps://doi.org/10.1145/2338965.2336784

[62] Jan Wen Voung, Ranjit Jhala, and Sorin Lerner. 2007. RELAY: StaticRace Detection on Millions of Lines of Code. In Proceedings of thethe 6th Joint Meeting of the European Software Engineering Confer-ence and the ACM SIGSOFT Symposium on The Foundations of Soft-ware Engineering (Dubrovnik, Croatia) (ESEC-FSE ’07). Associationfor Computing Machinery, New York, NY, USA, 205–214. h�ps://doi.org/10.1145/1287624.1287654

[63] John Whaley and Martin Rinard. 1999. Compositional Pointer andEscape Analysis for Java Programs. In Proceedings of the 14th ACMSIGPLAN Conference on Object-Oriented Programming, Systems, Lan-guages, and Applications (Denver, Colorado, USA) (OOPSLA ’99). As-sociation for Computing Machinery, New York, NY, USA, 187–206.h�ps://doi.org/10.1145/320384.320400

[64] Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, and Junfeng Yang.2012. Sound and precise analysis of parallel programs through schedulespecialization. In ACM SIGPLAN Conference on Programming LanguageDesign and Implementation, PLDI ’12, Beijing, China - June 11 - 16, 2012,Jan Vitek, Haibo Lin, and Frank Tip (Eds.). ACM, 205–216. h�ps://doi.org/10.1145/2254064.2254090

[65] Yichen Xie and Alex Aiken. 2007. Saturn: A scalable framework forerror detection using boolean satis�ability. ACM Transactions onProgramming Languages and Systems (TOPLAS) 29, 3 (2007), 16–es.

[66] M. Xu, S. Kashyap, H. Zhao, and T. Kim. 2020. Krace: Data Race Fuzzingfor Kernel File Systems. In 2020 IEEE Symposium on Security and Privacy(SP). 1643–1660. h�ps://doi.org/10.1109/SP40000.2020.00078

[67] Junfeng Yang, Ang Cui, Salvatore J. Stolfo, and Simha Sethumadhavan.2012. Concurrency Attacks. In 4th USENIX Workshop on Hot Topics inParallelism, HotPar’12, Berkeley, CA, USA, June 7-8, 2012, Hans-JuergenBoehm and Luis Ceze (Eds.). USENIX Association. h�ps://www.usenix.org/conference/hotpar12/workshop-program/presentation/yang

[68] Hongtao Yu, Jingling Xue, Wei Huo, Xiaobing Feng, and ZhaoqingZhang. 2010. Level by Level: Making Flow- and Context-SensitivePointer Analysis Scalable for Millions of Lines of Code. In Proceed-ings of the 8th Annual IEEE/ACM International Symposium on CodeGeneration and Optimization (Toronto, Ontario, Canada) (CGO ’10).Association for Computing Machinery, New York, NY, USA, 218–229.h�ps://doi.org/10.1145/1772954.1772985

[69] Shixiong Zhao, Rui Gu, Haoran Qiu, Tsz On Li, YuexuanWang, HemingCui, and Junfeng Yang. 2018. OWL: Understanding and DetectingConcurrencyAttacks. In 48th Annual IEEE/IFIP International Conferenceon Dependable Systems and Networks, DSN 2018, Luxembourg City,Luxembourg, June 25-28, 2018. IEEE Computer Society, 219–230.