
UNDER-CONSTRAINED SYMBOLIC EXECUTION:

CORRECTNESS CHECKING FOR REAL CODE

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

David A. Ramos

June 2015

http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/gc034dd8484

© 2015 by David Antonio Ramos. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.





I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Dawson Engler, Primary Adviser


Alex Aiken


David Dill

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

Software defects pose a frequent challenge to developers, and their consequences are far-

reaching. Despite advances in software engineering practices, programming language design,

and debugging tools, bugs remain ubiquitous. Traditional testing techniques, while useful,

have failed to prevent a significant number of bugs from affecting end users.

One promising technique for automatically detecting bugs is dynamic symbolic execu-

tion, which aims to test all possible execution paths through a program and identify inputs

that cause the program to crash. Unfortunately, symbolic execution suffers from the well-

known path explosion problem because the number of distinct execution paths through a

program is, in the best case, exponential in the number of branch statements. Consequently,

symbolic execution tools are typically ineffective for programs consisting of more than a few

thousand lines of code, let alone large codebases with line counts in the millions.

This dissertation presents a new, scalable approach to symbolic execution, under-

constrained symbolic execution, that targets individual functions rather than whole pro-

grams. This technique supports direct symbolic execution of arbitrary C functions and

automatically synthesizes their inputs, even for complex, pointer-rich data structures.

We demonstrate this technique’s feasibility by thoroughly evaluating three use cases,

although many others are possible. First, we use it to check the equivalence of library

routines from different implementations that share a common interface (e.g., the C standard

library). Second, we check whether code patches introduce new bugs by comparing two

versions of the same function: before and after a patch is applied. Third, we use under-

constrained symbolic execution to test a single version of a function, using a combination

of heuristics to separate important errors from those likely to be false positives.

In this dissertation, we describe UC-KLEE, a tool we built that implements under-

constrained symbolic execution and supports the above use cases. We evaluate our tool

on large, mature codebases including BIND, OpenSSL, and the Linux kernel and describe

previously-unknown bugs we discovered in each of these codebases.


Acknowledgments

First and foremost, I would like to thank my parents, Carey and Antonio, for encouraging

and supporting me throughout my academic career, and for always making education a

priority. Thanks to my sister Natalie for paving the way and setting the standard to which

I have aspired. I am eternally grateful to Amanda Opuszynski for keeping me sane during

the ups and downs of the past decade.

Many thanks to my advisor, Dawson Engler, for his mentorship and for always setting

a high bar. I would like to thank my reading committee members Alex Aiken and David

Dill, and my oral exam committee members Dan Boneh and Mykel Kochenderfer, for their

time and constructive feedback.

I am grateful to those who guided me in my first foray into research as an undergraduate

at the University of Michigan. Thanks especially to Todd Austin and Valeria Bertacco,

without whom I may never have embarked on this Ph.D. I am profoundly grateful to Joseph

Greathouse and Ilya Wagner for treating me as a peer and showing me how research is done.

I would like to thank my labmates Suhabe Bugrara, Seungbeom Kim, Philip Guo, and

Anthony Romano for the many helpful conversations we have had over the years, and for

commiserating about the endless complexities of working with KLEE. Thanks to Cristian

Cadar for passing the torch and first introducing me to KLEE during his final months at

Stanford. Thanks to Elliott Slaughter for his role in discovering the NFS block size problem

discussed in Section 8.1.1. I am grateful to those who provided careful comments on various

drafts of my written work, especially Diego Ongaro, Joseph Greathouse, and Philip Guo.

I would like to thank a number of outside colleagues for their assistance. Thanks to Tom

Ball and Madan Musuvathi for their mentorship while I interned at Microsoft Research.

Thanks to Nikolaj Bjørner for his help with Z3. Thanks to Evan Hunt and Sue Graves of

ISC for giving me access to the BIND source repository before it became public. Thanks to

the LibreSSL/OpenBSD developers for their quick responses to my bug reports, including


Bob Beck, Ted Unangst, Theo de Raadt, Miod Vallat, Philip Guenther, and Joel Sing.

This dissertation was supported by the United States Air Force Research Laboratory

(AFRL) through Contract FA8650-10-C-7024, by a National Science Foundation Gradu-

ate Research Fellowship under Grant No. DGE-0645962, and by a Rose Hills Foundation

Graduate Engineering Fellowship. The views expressed in this dissertation are my own.


Contents

Abstract

Acknowledgments

List of Tables

List of Figures

1 Introduction

2 Overview

2.1 Traditional symbolic execution

2.1.1 Scalability limitations

2.2 Under-constrained symbolic execution

2.2.1 Lazy initialization

2.3 klee limitations

3 Under-constrained symbolic execution

3.1 Referents

3.1.1 Propagation policy

3.1.2 Address resolution

3.2 Lazy initialization

3.2.1 Aliasing

3.2.2 Practical challenges

3.2.3 Object sizes

3.3 Error reporting


3.4 Function pointers

3.5 Limitations

4 Basic equivalence checking

4.1 Overview

4.2 Path pruning

4.3 Object Comparison

4.4 Limitations

4.5 Annotation filters

4.6 Evaluation

4.6.1 Different implementations: Newlib vs. uClibc

4.6.2 Different versions of the same implementation: uClibc

4.6.3 Checking the checker: finding bugs in uc-klee and llvm

4.6.4 Results summary

5 Scalable equivalence checking

5.1 Patch checking

5.1.1 Path pruning

5.2 False positives

5.2.1 Manual annotations

5.2.2 Automated heuristics

5.3 Evaluation

5.3.1 Code modification

5.3.2 Patches

5.3.3 Portability

6 Generalized checking

6.1 Leak checker

6.2 Uninitialized data checker

6.3 User input checker

6.4 Evaluation

6.4.1 Leak checker

6.4.2 Uninitialized data checker

6.4.3 User input checker


7 Optimizing symbolic execution

7.1 Symbolic expressions

7.1.1 Expression uniquing

7.1.2 Expression rewriting

7.1.3 Expression attributes

7.2 Lazy constraints

7.3 Path explosion in library functions

8 Experience

8.1 General symbolic execution

8.1.1 System modeling

8.1.2 Search heuristics

8.2 Under-constrained symbolic execution

8.2.1 False positives

8.2.2 General bug finding

8.2.3 Backtracking

8.3 Alternative approaches

8.3.1 State merging

8.3.2 Path refinement (top-down)

8.3.3 Path composition (bottom-up)

8.3.4 Conclusions

9 Related work

9.1 Symbolic execution

9.1.1 Patches

9.2 Equivalence verification

9.3 Runtime checking

9.4 Static analysis

9.5 Model checking

10 Conclusions

10.1 Future work

A SMT Rewrite Rules


Bibliography


List of Tables

3.1 Summary of referent policy

4.1 Summary of experimental results: equivalence of Newlib and uClibc

5.1 Summary of C annotation macros.

5.2 Summary of experimental results: BIND and OpenSSL patches

5.3 Summary of experimental results: False positive heuristics for BIND and OpenSSL patches

5.4 Summary of experimental results: BIND and OpenSSL portability

6.1 Summary of experimental results: generalized uc-klee checkers on BIND, OpenSSL, and the Linux kernel

A.1 Expression types rewritten by uc-klee.

A.2 Constant/variable naming conventions used in this appendix.


List of Figures

2.1 Code example: Simple function analyzed by uc-klee

2.2 Illustration: Example data structure allocated by uc-klee

3.1 Illustration: Pointer referents

3.2 Algorithm: Propagation of pointer referents across binary operations

3.3 Illustration: Lazy initialization

3.4 Code example: Linux kernel negative-offset pointers

3.5 Bug found: Path summary for OpenSSL do_ssl3_write function

4.1 Code example: Trivial functions cross-checked by uc-klee

4.2 Code example: Simple filter routine

4.3 Graph: Instruction coverage for uClibc and Newlib

4.4 Code example: Two equivalent implementations of ffs

4.5 Code example: Two equivalent implementations of strlen

4.6 Bug found: Newlib directory removal in remove

4.7 Bug confirmed: uClibc NULL pointer dereference in unsetenv

5.1 False positive: memory access in BIND function isc_region_compare

5.2 Bug found: BIND locking bug in receive_secure_db

5.3 Bug found: OpenSSL NULL pointer dereference in do_ssl3_write

5.4 Graph: Instruction coverage for BIND and OpenSSL

5.5 Graph: Completed execution paths for BIND and OpenSSL

6.1 Code example: OpenSSL “Heartbleed” vulnerability

6.2 Bug found: Linux kernel memory leak in gssp_free_receive_pages

6.3 Bug found: OpenSSL uninitialized pointer in ec_wNAF_precompute_mult


6.4 Bug found: BIND uninitialized branch in dispatch_createudp

6.5 Bug found: Linux kernel buffer overread in dg_dispatch_as_host

6.6 Bug found: Linux kernel remainder-by-zero in validate_layout

7.1 Code example: Lazy constraints for integer division

7.2 Illustration: Example symbolic ITE expression for strlen

8.1 Code example: uClibc function opendir

8.2 False positive: memory leak in OpenSSL function dtls1_buffer_message

8.3 False positive: memory leak in Linux kernel rt2x00 driver

8.4 False positive: assertion failure in OpenSSL function dtls1_do_write

8.5 Bug found: OpenSSL NULL pointer dereference in tls1_process_sigalgs


Chapter 1

Introduction

Software bugs pervade every level of the modern software stack, degrading both stability

and security. A 2002 NIST study pegged the annual cost of bugs to the U.S. economy

at $59.5 billion [92]. Secunia [108] recorded 13,073 security vulnerabilities in 2013

alone, many due to software bugs. Current practice attempts to address this

challenge through a variety of techniques, including code reviews, higher-level programming

languages, testing, and static analysis. While these practices prevent many bugs from being

released to the public, significant gaps remain.

One technique, testing, is a useful sanity check for code correctness, but it often fails to

consider infrequently-executed code paths, in particular those that handle errors. Unsur-

prisingly, these are frequent sources of bugs and security vulnerabilities.

Another broad technique, static analysis, is effective at discovering many classes of bugs.

However, static analysis generally uses abstraction to improve scalability and cannot reason

precisely about program values and pointer relationships. Consequently, deep bugs that

depend on the exact results of earlier computation often evade static techniques.

One promising technique that addresses the limitations of both testing and static anal-

ysis is symbolic execution [12, 17, 99]. A symbolic execution tool conceptually explores

all possible execution paths through a program in a bit-precise manner and considers all

possible input values. Along each path, the tool determines whether any combination of

inputs could cause the program to crash. If so, it reports an error to the developer, along

with a concrete set of inputs that will trigger the bug.

Unfortunately, symbolic execution suffers from the well-known path explosion problem

since the number of distinct execution paths through a program is often exponential in the


number of if-statements or, in the worst case, infinite. For example, consider the following

trivial implementation of the C function strlen:

    size_t len = 0;
    while (str[len])
        len++;
    return len;

Each iteration of the while loop will cause a new path to diverge: one in which the current

character is the null terminator ('\0'), and one in which it is not. As a consequence of this path explosion,

symbolic execution often examines only a small subset of execution paths, missing important

bugs.
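The forking behavior of this loop can be made concrete with a small counting sketch. The helper names below are hypothetical and are not part of klee or uc-klee; the sketch assumes a buffer of n symbolic bytes followed by a concrete terminator and counts the paths a symbolic executor would explore through the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only. Counts the execution paths
 * through the strlen loop starting at index i, when str holds n symbolic
 * bytes followed by a concrete '\0'. */
unsigned long strlen_paths_from(size_t i, size_t n)
{
    assert(i <= n);
    if (i == n)
        return 1; /* str[n] is the concrete terminator: no fork, path ends */
    /* Fork on the symbolic byte str[i]: one path on which str[i] == 0 and
     * the loop exits, plus every path that continues with str[i] != 0. */
    return 1 + strlen_paths_from(i + 1, n);
}

/* Total number of paths for a buffer of n symbolic bytes. */
unsigned long strlen_paths(size_t n)
{
    return strlen_paths_from(0, n);
}
```

Under these assumptions the loop yields n + 1 paths, one per possible string length. Without a bound on n, the number of paths grows without limit.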

This dissertation presents an alternative to whole-program symbolic execution, under-

constrained symbolic execution [45, 101], that executes partial programs (e.g., individual

functions) in isolation. Executing individual functions provides several advantages. First,

it reduces the number and length of execution paths that must be explored. Second, it

allows code deep within a program to be checked directly. Third, it effectively parallelizes

symbolic execution by allowing distinct functions to be explored concurrently. Finally, it

allows library and OS kernel interfaces to be checked easily and thoroughly.

In this dissertation, we describe uc-klee, a scalable framework implementing under-

constrained symbolic execution for C/C++ systems code without requiring a manual spec-

ification or even a single testcase. uc-klee automatically synthesizes symbolic inputs to

each function, even for complex, pointer-rich data structures. We apply this framework to

three important use cases.

First, we use it to check the per-path equivalence of library routines that share a common

interface. In many cases, uc-klee can exhaust all execution paths up to the given input

bound, providing a bounded verification that the functions are equivalent, subject to some

standard caveats described in Sections 3.5 and 4.4.

Second, we use uc-klee to check whether code patches introduce new bugs. Ironically,

patches intended to fix bugs or eliminate security vulnerabilities are a frequent source of

them. A developer for the ubiquitous BIND DNS server recently wrote publicly that “[they]

have introduced security problems into BIND 9 in the past couple years. We need to get

better” [67]. In many cases, uc-klee can verify (bounded with caveats) that a patch does

not introduce new potential crashes to a function, a guarantee not possible with existing

techniques.


Third, we use uc-klee as a general code checking framework upon which specific check-

ers can be implemented. We describe three example checkers we implemented to find mem-

ory leaks, uses of uninitialized data, and unsanitized uses of user input. Additional checkers

may be added to our framework to detect a wide variety of bugs along symbolic, bit-precise

execution paths. If uc-klee exhaustively checks all execution paths through a function,

then it has effectively verified (bounded with caveats) that the function passes the check

(e.g., no leaks).

The primary contributions of this dissertation are:

• The uc-klee tool, a robust implementation of under-constrained symbolic execution

for C/C++ code that uses a technique based on referents (Section 3.1) to implement

lazy initialization (Section 3.2) and support real, type-unsafe, open source systems

code.

• The novel application of under-constrained symbolic execution to verify functional

equivalence (bounded with caveats) and check whether patches introduce crashes.

• The novel use of under-constrained symbolic execution in conjunction with individual

checkers to find bugs and verify (bounded with caveats) correctness properties on

large codebases.

• A discussion of our experience implementing uc-klee and the challenges associated

with scaling our tool to support real code (Chapter 8).

• Our experimental evaluation of uc-klee on small library routines from uClibc [115]

and Newlib [91], and on mature, widely-used, and security-critical code from BIND [9],

OpenSSL [94], and the Linux kernel [81]. We verified the equivalence of 300 functions

from uClibc and Newlib and found 9 new bugs. We validated over 800 patches from

BIND and OpenSSL and found 12 bugs, including an OpenSSL denial-of-service vul-

nerability [31]. uc-klee verified that 115 patches did not introduce new crashes, and

it checked thousands of paths and achieved high coverage even for patches for which

it did not exhaust all execution paths. We applied our three built-in checkers to over

20,000 functions from BIND, OpenSSL, and the Linux kernel and discovered 67 new

bugs, several of which appear to be remotely exploitable. Many of these were latent

bugs that had been missed by years of debugging effort. uc-klee also exhaustively

verified (with caveats) that 771 functions from BIND and OpenSSL that allocate


heap memory do not cause memory leaks, and that 4,088 functions do not access

uninitialized data.

The remainder of this dissertation is structured as follows. Chapter 2 presents an

overview of under-constrained symbolic execution and lazy initialization. Chapter 3 dis-

cusses the details of under-constrained symbolic execution, including limitations and the im-

plementation of pointer referents for lazy initialization. Chapter 4 applies under-constrained

symbolic execution to the problem of verifying functional equivalence and evaluates uc-klee

on hundreds of small library functions. Chapter 5 presents techniques for scaling equiva-

lence verification and applies it to the problem of checking code patches, along with an

experimental evaluation on over 800 real patches. Chapter 6 uses under-constrained symbolic

execution for generalized checking in conjunction with individual checkers, along with an

experimental evaluation on over 20,000 real functions. Chapter 7 presents optimizations

we implemented in uc-klee that we believe are broadly applicable to symbolic execution

tools. Chapter 8 discusses our experience in implementing uc-klee and the challenges we

encountered. Finally, Chapter 9 discusses related work, and Chapter 10 concludes.

Chapter 2

Overview

This chapter presents an overview of traditional symbolic execution and the baseline klee [17]

tool, along with its limitations. It then introduces under-constrained symbolic execution

and our uc-klee [101] tool, along with an example illustrating how uc-klee automatically

synthesizes a symbolic data structure as input to a function.

2.1 Traditional symbolic execution

Symbolic execution [12, 17, 99] is a precise technique for checking programs and finding er-

rors. A symbolic execution tool executes all possible paths through a program and considers

symbolic inputs that represent all possible input values, rather than a single set of concrete

inputs. Along each execution path, the tool maintains complete program state expressed in

terms of the symbolic inputs and checks whether any possible input values could cause the

program to crash. If so, the tool reports the error to the user along with an example set of

concrete inputs that would trigger the crash. A developer can use this concrete test case to

diagnose and fix the bug and optionally augment the program’s existing test suite.

When a symbolic execution tool reaches a branch (e.g., if-statement) that depends on

the value of a symbolic input, the tool conceptually forks execution and considers each

resulting execution path independently. Along each path, the tool maintains a path condi-

tion, a set of symbolic constraints that precisely define the inputs that cause the path to

be executed. These path constraints are symbolic expressions (Boolean predicates) using

logical and arithmetic operators expressed in terms of symbolic inputs and concrete val-

ues. When executing each branch, the tool consults a satisfiability modulo theories (SMT)


solver [39, 48] to determine which branch targets are feasible under the current path con-

dition. If the tool detects an error along an execution path, it queries the SMT solver to

obtain a set of concrete inputs satisfying the path condition, which are then supplied to the

user along with a description of the error.
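The relationship between branches, path conditions, and concrete test cases can be sketched with a toy example. Everything below is illustrative: the function target, the predicate crash_path_condition, and the brute-force search are assumptions made for this sketch, and the exhaustive search merely stands in for an SMT solver, which reasons about constraints symbolically rather than enumerating values.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy function under test. A return value of -1 stands in for a crash. */
int target(int x, int y)
{
    if (x > 10) {            /* fork on symbolic x */
        if (y == 2 * x)      /* fork on symbolic y */
            return -1;       /* path condition: x > 10 && y == 2*x */
        return 1;            /* path condition: x > 10 && y != 2*x */
    }
    return 0;                /* path condition: x <= 10 */
}

/* Path condition of the crashing path, written as a Boolean predicate. */
bool crash_path_condition(int x, int y)
{
    return x > 10 && y == 2 * x;
}

/* Stand-in for the solver: searches a small range for a concrete
 * assignment that satisfies the crashing path condition. */
bool find_crash_input(int *x_out, int *y_out)
{
    for (int x = -50; x <= 50; x++) {
        for (int y = -50; y <= 50; y++) {
            if (crash_path_condition(x, y)) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false;
}
```

Running target on the assignment returned by find_crash_input reproduces the error, which is the role the concrete test case plays for the developer.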

This dissertation builds on klee [17], a symbolic execution tool that checks C/C++

programs compiled as bitcode (intermediate representation) by the llvm compiler [78]. klee

maintains a frontier of execution states, partial paths through a program. Each execution

state represents the complete program state along a partial execution path and consists of

a program counter, static single assignment (SSA) registers [37], a call stack, heap memory,

and global variables. At each instruction step, klee selects a state from its frontier using a

search strategy consisting of several heuristics, including the minimum distance to uncovered

(never executed) instructions, how recently a state covered a new instruction, and a state’s

fork depth—the number of path divergences due to symbolic branch conditions.
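A frontier of states and a selection heuristic can be sketched as follows. This is a deliberately simplified illustration, not klee's actual data structures or search strategy: the struct exec_state layout and the select_state function are hypothetical, and the sketch uses only two of the heuristics named above (distance to uncovered instructions, with fork depth as a tie-breaker).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified execution state. */
struct exec_state {
    unsigned pc;                /* program counter (instruction index) */
    unsigned fork_depth;        /* symbolic branch divergences so far */
    unsigned dist_to_uncovered; /* min distance to a never-executed instruction */
};

/* Picks the state closest to uncovered code, breaking ties in favor of the
 * smaller fork depth. Returns the index of the chosen state, or n if the
 * frontier is empty. */
size_t select_state(const struct exec_state *frontier, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (best == n
            || frontier[i].dist_to_uncovered < frontier[best].dist_to_uncovered
            || (frontier[i].dist_to_uncovered == frontier[best].dist_to_uncovered
                && frontier[i].fork_depth < frontier[best].fork_depth))
            best = i;
    }
    return best;
}
```

A real search strategy would also weigh how recently each state covered new code; the point here is only that the frontier holds partial paths and that one of them is chosen at each step.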

klee targets application programs, which interact with the operating system to perform

tasks such as file I/O by issuing system calls. Because klee operates on symbolic data,

arguments to these system calls may be symbolic. To handle this situation, klee provides

a partial model for the POSIX system call interface that supports symbolic data for many

common operations. klee typically forwards system calls involving only concrete arguments

directly to the operating system in which klee itself is running, allowing a much larger

variety of system calls to be supported.

Unfortunately, we have observed these “external” system calls causing non-deterministic

interference between execution states. For example, one path may write to a file, and

another may read from that file. The value read will depend on the order in which states

are chosen by klee’s search strategy, causing non-determinism and limiting path coverage.

The uc-klee tool we present in this dissertation includes a new symbolic POSIX model

that eliminates all external system calls, at the cost of supporting less application code.

We used this new model for our experimental evaluations in Chapters 5 and 6, but we used

the old klee POSIX model in Chapter 4. We briefly discuss our experience with system

modeling in Section 8.1.1, but implementation details of our new POSIX model are beyond

the scope of this dissertation.


2.1.1 Scalability limitations

While symbolic execution is a powerful technique capable of precisely reasoning about

program states and discovering bugs, it suffers from three fundamental scalability limita-

tions when applied to whole programs. These limitations may prevent a tool from exhaus-

tively exploring all execution paths or even achieving reasonably high statement coverage.

Section 2.2 introduces our approach to addressing each of these limitations.

The first scalability limitation is the well-known path explosion problem. In the best

case, the number of distinct execution paths through a program is exponential in the number

of branch statements. In the worst case, when the code contains unbounded loops, the

number of execution paths may be infinite. Path explosion imposes both a compute bound—

we must execute all paths—and a space bound—available memory limits the number of

partial paths we can track simultaneously.
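As a rough worked illustration of the exponential growth (an idealized model rather than a measurement), assume a piece of code that executes b symbolic branches in sequence and that the outcome of each branch is independent of the others. The symbol P(b) for the number of distinct paths is introduced here only for this sketch.

```latex
\[
  P(b) = \underbrace{2 \times 2 \times \cdots \times 2}_{b \text{ branches}} = 2^{b}
\]
\[
  P(10) = 1{,}024, \qquad
  P(20) = 1{,}048{,}576, \qquad
  P(30) = 1{,}073{,}741{,}824
\]
```

Under this model, adding ten more branches multiplies the number of paths by roughly a thousand, which is why both the compute bound and the space bound are reached quickly.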

The second scalability limitation of symbolic execution is the computational cost of

each execution path. In our experience, symbolic execution imposes a best-case slowdown

of several orders of magnitude relative to native execution. For computationally-intensive

programs, this slowdown can make executing even a single path prohibitive, let alone all

paths through each program. In addition, execution paths may be infinite in length, pre-

senting an instance of the halting problem [38]. Consequently, the problem of finding all

bugs in a program using symbolic execution is undecidable in general.

The third scalability limitation is the cost of solving SMT queries to determine branch

feasibility and generate concrete test cases. In general, solving the class of bit-vector SMT

queries produced by symbolic execution is NP-complete [8]. Consequently, many programs

produce symbolic expressions that cannot be solved in a reasonable amount of time. This

is especially true for programs that perform complex arithmetic on symbolic inputs (e.g.,

cryptographic functions). While solving complex queries is the responsibility of an external

solver and not the symbolic execution tool, the tool is nonetheless limited by the computa-

tional cost of these queries.

2.2 Under-constrained symbolic execution

Unlike traditional (whole program) symbolic execution, under-constrained symbolic exe-

cution [45] analyzes partial programs in isolation. In this dissertation, we use function


boundaries to demarcate these partial programs because functions provide a natural, well-

understood interface. uc-klee can use an arbitrary function (named by the user) as its

entry point, rather than starting at a program’s main function. If the code being exe-

cuted performs a function call, however, uc-klee executes the called function, as in whole

program symbolic execution. uc-klee automatically synthesizes symbolic inputs to each

function, even for complex, pointer-rich data structures, with no manual effort by default.

Directly executing individual functions allows uc-klee to avoid having to discover an

execution path from main to each function, a potentially enormous computational win. In

practice, this approach partially mitigates all three scalability limitations enumerated in

Section 2.1.1: path explosion, path cost, and SMT query cost.

Traditional symbolic execution typically considers input values originating from external

input sources (e.g., command-line arguments, files, etc.). Because a program has no control

over these inputs, we expect the program to correctly handle all possible input values, re-

jecting invalid inputs rather than crashing. By contrast, individual functions typically have

preconditions imposed on their inputs. For example, a function may require that pointer

arguments not be NULL. Because uc-klee directly executes functions within a program

without requiring a function’s preconditions to be specified by the user, the inputs it con-

siders may be a superset (over-approximation) of the legal values handled by the function.

Consequently, we denote uc-klee’s symbolic inputs as under-constrained to reflect that

they are missing preconditions (constraints).

While this technique allows previously-unreachable code to be deeply checked, the miss-

ing preconditions may cause false positives (spurious errors) to be reported to the user. For

example, a function may crash when a pointer argument is NULL, but the error is spurious

if the program never calls the function with that argument set to NULL. uc-klee provides

an interface for a user to silence these errors by lazily specifying input preconditions using

simple C code. In our experience, even simple annotations may silence a large number of

spurious errors (see § 5.2.1) and this effort is orders of magnitude less work than eagerly

providing a full specification for each function.

2.2.1 Lazy initialization

uc-klee uses lazy initialization [68, 121] to automatically synthesize symbolic inputs, in-

cluding complex data structures, avoiding the need for users to manually construct inputs.

We illustrate lazy initialization by explaining how uc-klee executes the example function


listSum in Figure 2.1(a), which sums the entries in a linked list. Figure 2.1(b) summarizes

the execution paths we explore, and Figure 2.1(c) shows the symbolic inputs generated for

each path. For clarity, we elide error checks that uc-klee normally performs at memory

dereferences, division/remainder operations, and assertions.

uc-klee prepares by creating an under-constrained symbolic value to represent the sole

argument n. Although n is a pointer, it begins in the unbound state, not yet pointing to any

object. uc-klee then passes this symbolic argument to listSum and executes as follows:

Line 7 The local variable sum is assigned a concrete value; no special action is taken.

Line 8 The code checks whether the symbolic variable n is non-null. At this point, uc-klee forks execution and considers both cases. We first consider the false path where n = null (Path A). We then return to the true path where n != null (Path B). On Path A, uc-klee adds n = null as a path constraint and skips the loop.

Line 12 The code returns 0, and Path A terminates.

1 : typedef struct node {
2 :   int val;
3 :   struct node *next;
4 : } node;
5 :
6 : int listSum(node *n) {
7 :   int sum = 0;
8 :   while (n) {
9 :     sum += n->val;
10:     n = n->next;
11:   }
12:   return sum;
13: }

(a) C code

(b) Paths explored. Execution forks at each evaluation of the loop condition on line 8: the false branch proceeds to line 12 (return sum), and the true branch executes lines 9 and 10 and returns to line 8. Path constraints:

Path A: n = null
Path B: n != null, n = &node1, node1.next = null
Path C: n != null, n = &node1, node1.next != null, node1.next = &node2, node2.next = null

(c) Symbolic inputs generated by lazy initialization:

Path A: n is null
Path B: n points to node1 (fields val, next); node1.next is null
Path C: n points to node1 (fields val, next); node1.next points to node2 (fields val, next); node2.next is null

Figure 2.1: Example code fragment analyzed by uc-klee.

We now consider Path B.

Line 8 uc-klee adds the constraint n != null and enters the loop.


Line 9 The code dereferences the pointer n for the first time on Path B. Because n is un-

bound, uc-klee allocates a new block of memory, denoted node1, to satisfy the dereference

and adds the constraint n = &node1 to bind the pointer n to this object. At this point, n is

no longer unbound, so subsequent dereferences of that pointer will resolve to node1 rather

than trigger additional allocations. The contents of node1 are marked as unbound, allowing

future dereferences of pointers in this object to trigger allocations. This recursive process

is the key to lazy initialization. Next, sum is incremented by the symbolic value node1.val.

Line 10 n is set to the value node1.next. Path B then returns to the loop header.

Line 8 The code tests whether n (which is bound to node1.next) is non-null. uc-klee

forks execution and considers both cases. We first consider node1.next = null, which we

still refer to as Path B. We will then return to the true path where node1.next != null (Path

C). On Path B, node1.next = null is added as a path constraint and execution exits the

loop.

Line 12 Path B returns node1.val and terminates.

We now consider Path C.

Line 8 uc-klee adds node1.next != null as a path constraint, and Path C enters the loop.

Line 9 Path C dereferences the unbound pointer node1.next, which triggers allocation of a

new object. This step illustrates the unbounded nature of many loops. To prevent uc-klee

from allocating an unbounded number of objects as input, the tool accepts a command-line

option to bound the maximum depth of an input-derived data structure (k-bounding [40]).

When a path attempts to exceed this limit, our tool silently terminates it. We will assume

that this option has been set to a depth of 2. At this point uc-klee terminates Path C

and exits.
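The walkthrough above can be mirrored with hand-built concrete inputs. The following is only a sketch, not uc-klee's mechanism: the helper names build_input and free_input and the max_nodes parameter are hypothetical, and max_nodes is a loose stand-in for the depth bound. It constructs the empty list of Path A and the one-node list of Path B, and it refuses to build the deeper list that Path C would require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node;

/* Function under test, as in Figure 2.1(a). */
int listSum(node *n) {
    int sum = 0;
    while (n) {
        sum += n->val;
        n = n->next;
    }
    return sum;
}

/* Hypothetical helper: release a list built by build_input. */
void free_input(node *n) {
    while (n) {
        node *next = n->next;
        free(n);
        n = next;
    }
}

/* Hypothetical helper: build a list of `len` nodes holding vals[0..len-1].
 * `max_nodes` stands in for the depth bound (an assumption of this sketch).
 * Returns false and leaves *out NULL if the bound would be exceeded or an
 * allocation fails. A length of zero yields the NULL list of Path A. */
bool build_input(const int *vals, size_t len, size_t max_nodes, node **out) {
    *out = NULL;
    if (len > max_nodes)
        return false;
    node *head = NULL;
    for (size_t i = len; i-- > 0;) {   /* build back to front */
        node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL) {
            free_input(head);
            return false;
        }
        fresh->val = vals[i];
        fresh->next = head;
        head = fresh;
    }
    *out = head;
    return true;
}
```

Unlike this sketch, uc-klee leaves the val fields symbolic, so a single explored path stands for every list of that shape.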

This example illustrates a simple but powerful technique that allows our tool to automatically synthesize data structures from under-constrained symbolic input. Figure 2.2 shows an actual data structure our tool generated as input for one of the BIND bugs we discovered (Figure 5.2). The edges between each object are labeled with the field names contained in the function’s debug information and included in uc-klee’s error report (see Section 3.3).

[Figure 2.2: BIND data structure allocated by uc-klee. Objects shown: isc_event_t *event; struct isc_event uc_isc_event1; struct dns_zone uc_dns_zone1; struct dns_rbtdb uc_dns_rbt1; char *uc_char_ptr1; char[8] uc_char_arr1; struct dns_dbmethods uc_dns_dbmethods1. Edge labels as extracted: *, *, ev_arg, db_argv, [88] common.methods.]

2.3 klee limitations

The scalability limitations described in Section 2.1.1 may cause the baseline klee tool to

miss bugs and/or achieve low code coverage due to incomplete path exploration within

a given time and memory limit. In addition to these limitations, which are inherent in

symbolic execution, klee’s implementation presents a number of known limitations that

may cause the tool to miss bugs. Our uc-klee tool inherits these limitations.

We are of course vulnerable to bugs in klee or its POSIX model. In addition, our

approach checks code at the implementation level, compiled as llvm bitcode. Consequently,

we do not reason about bugs that depend on other build configurations, compilers, compiler

options (including optimization), architectures (including word size and endianness), or

unspecified behavior (e.g., function argument evaluation order).

For simplicity, klee assigns concrete addresses to all memory objects. As a result, klee

may miss paths when code explicitly compares the values of pointers to different objects

(undefined behavior in C [63]). klee also assigns concrete sizes to memory objects. If code

passes a symbolic value as the size argument to malloc, klee arbitrarily chooses a single

concrete size, potentially missing bugs triggered by other sizes. The versions of uc-klee we

used for our evaluations in Chapters 5 and 6 (but not Chapter 4) do support symbolically-

sized memory objects, however (see Section 3.2.3). Finally, klee concretizes arguments to

floating-point instructions because SMT solvers have only recently begun to incorporate

floating-point support [104].

klee can only reason about code that it executes. When code calls a function that is

undefined and not available to klee as an external (native) call, klee kills that execution

path and issues a warning. Similarly, klee cannot perform external calls with symbolic

arguments or execute inline assembly, so it kills these paths as well. For implementation

reasons, klee does not support functions that return C struct types, but we have observed

these to be rare in practice.


klee does not support concurrency. Consequently, it cannot detect data races, dead-

locks, or other concurrency bugs. klee has only partial support for C++ programs and

does not support exception handling. KLOVER [79] is a separate extension to klee that

adds support for C++ language features. Finally, the symbolic POSIX model only par-

tially implements the POSIX specification, so code that exercises unmodeled functionality

is unsupported.

The verification guarantees we claim in Chapters 4–6 of this dissertation are subject to

the limitations above, so we may miss errors on paths that uc-klee does not explore due to

limitations in the underlying klee tool. However, we do detect when our tool terminates an

execution path due to an undefined function call, inline assembly, or similar errors. When

these errors occur, we do not consider a function to be verified (see Section 4.6).

Chapter 3

Under-constrained symbolic execution

This chapter discusses many of the implementation details of under-constrained symbolic

execution and how we applied it to real C/C++ code. We first present two of the primary

contributions of this dissertation: pointer referents and lazy initialization. We then discuss

error reporting and our tool’s handling of function pointers. Finally, we enumerate the

known limitations of under-constrained symbolic execution.

3.1 Referents

One of the most significant challenges we faced while implementing under-constrained sym-

bolic execution for type-unsafe C/C++ code was reasoning precisely about pointers. uc-

klee must recognize pointers in two cases that are at the heart of under-constrained sym-

bolic execution.

First, to automatically synthesize symbolic inputs using lazy initialization (Section 3.2),

uc-klee must bind unbound pointers to new blocks of memory whenever they are deref-

erenced for the first time. In many cases, code manipulates pointers before dereferencing

them. For example, consider the contrived function sum, which is passed an unbound sym-

bolic pointer e:

int sum(int *e) {
  return e[1] + e[0];
}


The code first reads from e at offset 4 (e[1] is the second 32-bit integer in the array). When

uc-klee allocates a new object to satisfy this dereference, it must bind e, not &e[1], to the

start of the new object. Otherwise, if it binds &e[1] to the first byte of the new object, the

subsequent dereference e[0] will access bytes that precede the new object, an illegal memory

access. We refer to the underlying pointer e within a symbolic expression such as &e[1] as

the base address. In practice, pointers may be subjected to arbitrary manipulations (e.g.,

masking) or may involve multiple symbolic values (e.g., x+y), and uc-klee must support

as many cases as possible. We initially attempted to infer base addresses directly from

arbitrary symbolic pointer expressions, but we abandoned this approach after about a year

of development effort in favor of the referent-based approach we present in this section.
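The base-address requirement can be checked with simple offset arithmetic. The sketch below is an illustration only; access_offset and in_bounds are hypothetical names, and the model assumes an array of int. It computes where an access to e[index] lands relative to the start of the new object, depending on which expression was bound to the object's first byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte offset, relative to the start of the new object, of an access to
 * e[index] when the expression &e[bound_index] was bound to the object's
 * first byte. (Hypothetical helper for illustration.) */
long access_offset(long bound_index, long index) {
    return (index - bound_index) * (long)sizeof(int);
}

/* True if an int-sized access at `offset` lies inside an object of
 * `object_size` bytes. */
bool in_bounds(long offset, size_t object_size) {
    return offset >= 0 && (size_t)offset + sizeof(int) <= object_size;
}
```

With bound_index set to 0 (binding e itself), both e[1] and e[0] fall inside an object of two ints. With bound_index set to 1 (binding &e[1]), the access to e[0] has a negative offset and precedes the object.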

The second case that requires uc-klee to reason precisely about pointers is for equiva-

lence checking (Chapter 4). When uc-klee checks the equivalence of two C/C++ functions,

it must check that their outputs (i.e., values written to memory) are identical. It does so by

“walking” the address space, beginning with the functions’ return values, each of their ar-

guments, and all global variables, and traversing all pointers it encounters. uc-klee must

distinguish pointers from scalars in order to (1) follow pointers to discover all reachable

memory objects, and (2) compare only the scalars for equivalence because differences in

pointer addresses are uninteresting (see Chapter 4) as long as the pointed-to objects con-

tain identical values. In type-unsafe C/C++ code, any value may be cast to a pointer, so

any word-sized value stored in (possibly unaligned) memory is potentially a pointer.
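The idea of following pointers while comparing only scalars can be sketched on a typed linked list. This is a simplification under a stated assumption: the static type of node tells the code which field is a pointer, whereas uc-klee cannot rely on types and uses referents for that decision. The function name outputs_equivalent is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    int val;
    struct node *next;
} node;

/* Compare two output structures. Scalar fields must match. Pointer fields
 * are followed, but their addresses are never compared. */
bool outputs_equivalent(const node *a, const node *b) {
    while (a != NULL && b != NULL) {
        if (a->val != b->val)
            return false;          /* scalar difference */
        a = a->next;               /* follow pointers in lockstep */
        b = b->next;
    }
    return a == NULL && b == NULL; /* both structures must end together */
}
```

Two lists stored at different addresses compare as equivalent as long as their shapes and scalar contents agree.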

To meet the requirements of both lazy initialization and equivalence checking, we im-

plemented a solution that tracks pointer referents, which we observed to be a more robust

approach than deciphering arbitrary symbolic expressions and, at best, guessing based on

some set of criteria. A referent is the address of the first byte of the object to which

a pointer is intended to refer. Referents have been used previously to perform dynamic

bounds checking [66, 105], and they allow tools to detect memory errors when a pointer

lands within a valid region of memory but refers to the “wrong” memory object. In addi-

tion to detecting these types of errors, uc-klee applies referents to the more challenging

problem of reasoning about symbolic pointers.

In uc-klee, all valid pointers originate from one of three sources:

1. Explicit memory allocation: malloc for the heap, alloca for the stack (llvm uses the alloca instruction for both fixed- and variable-size stack allocations).

2. References to global and static variables defined in an llvm bitcode (IR) module.

3. Under-constrained symbolic inputs that may trigger lazy initialization.

[Figure 3.1: Shadow structures for tracking pointer referents. (a) Shadow memory: a symbolic memory object (address 0x1000100, size 32) whose values are backed by the symbolic array sym_array_99 and whose referents are backed by the symbolic array ref_array_99. (b) Shadow register %dest after memory load: value (Read w64 sym_array_99, 0), referent (Read w64 ref_array_99, 0).]

uc-klee generates referents for each of these three sources. It then propagates the referents

alongside the pointers using shadow memory [56, 90, 107] and shadow registers. When an

unbound symbolic pointer is first dereferenced, the referent aids uc-klee in determining the

base address of the pointer. Similarly, the referent allows uc-klee to distinguish between

pointers and scalars during equivalence checking.

Each byte of memory and each register has a shadow location of the same size storing the

corresponding referent. If the value represents a valid pointer, the shadow location stores

the non-zero concrete starting address of the pointed-to memory object, or a symbolic value

in the case of an unbound pointer. If the value is not a valid pointer (i.e., it is not derived

from any of the three sources listed above), its corresponding referent is zero.

For explicit memory allocation using malloc or alloca, uc-klee sets the destination

shadow register (referent) to the starting address of the newly-allocated object (or zero if

the allocation fails). As shown in Figure 3.1(a), uc-klee also creates a shadow copy of the

object, which tracks the referents for values written to or read from this object.

For references to global and static variables, uc-klee sets the referents to the concrete

starting addresses it assigned to each variable when it loaded the llvm module. uc-klee

also creates a shadow copy of each global and static variable.

For symbolic inputs, the solution is slightly more complex. As in the underlying klee tool, symbolic memory objects in uc-klee are associated with symbolic arrays (e.g., sym_array_99 in Figure 3.1), which are used for representing symbolic values in SMT queries [39, 48]. When uc-klee creates a new symbolic array, it also creates a corresponding symbolic referent array (e.g., ref_array_99). It associates the shadow memory object with this referent array, as shown in Figure 3.1(a).

When an instruction loads from a memory object with an associated symbolic array,

uc-klee represents the resulting symbolic value as a read expression (Figure 3.1(b)), whose

operands are (1) the symbolic array associated with the memory object, and (2) the offset

of the load from the start of the object. We omit details for brevity, but this offset may be

symbolic (e.g., in the case of *(p+q)), and uc-klee handles this case robustly using the

mechanism provided by klee and supported by mainstream SMT solvers [39, 48]. uc-klee

sets the shadow register (referent) for the result of a load from a symbolic object to a read

expression using the referent array (described above) and the same offset. Figure 3.1(b)

illustrates the result of a one-byte load from the symbolic memory object in Figure 3.1(a).

Instruction        Type                  Referent propagated

malloc             Memory allocation     Starting address of object
alloca             Memory allocation     Starting address of object
free               Memory deallocation   N/A
load               Memory access         Load from shadow memory
store              Memory access         Store to shadow memory
getelementptr      Arithmetic            Same as input
(binary ops)       Arithmetic            See Algorithm 3.2
(floating point)   Arithmetic            Zero (drop referent)
icmp               Comparison            Zero (drop referent)
call               Control flow          Same as input (arguments)
ret                Control flow          Same as input (return value)
br                 Control flow          N/A
switch             Control flow          N/A
phi                Data flow             Same as input
select             Data flow             Conditional (ITE) expression
trunc              Type coercion         Truncate referent
zext               Type coercion         Zero-extend referent
sext               Type coercion         Zero-extend referent
inttoptr           Type coercion         Same as input
ptrtoint           Type coercion         Same as input
bitcast            Type coercion         Same as input
insertvalue        Register aggregates   Insert referent into aggregate
extractvalue       Register aggregates   Extract referent from aggregate

Table 3.1: Summary of referent policy for each llvm instruction.


3.1.1 Propagation policy

As code operates on registers and memory, uc-klee must propagate the corresponding ref-

erents. It does so using a collection of rules we refer to as the referent policy, summarized in

Table 3.1. Recall that valid (non-zero) referents must point to the starting address of a mem-

ory object. Consequently, operations that manipulate pointer values should not, in most

cases, manipulate the corresponding referent. For example, given a pointer p, the addition operation p+16 should yield the same referent as p because incrementing

a pointer must not change the object to which the pointer refers. The most straightforward

pointer manipulations in llvm bitcode are performed with the getelementptr instruction,

which adds arbitrary offsets to a pointer. For referents, this instruction is simply an identity

operation.
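The identity rule for pointer arithmetic can be sketched with a value paired with its referent. This is a concrete-valued illustration only: the shadow_reg type and the helpers from_allocation, scalar, and gep are hypothetical names, and real uc-klee registers hold symbolic expressions rather than plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* A register together with its shadow (hypothetical model). */
typedef struct {
    uint64_t value;    /* register contents */
    uint64_t referent; /* start of the intended object, or 0 for a scalar */
} shadow_reg;

/* Result of an allocation: the pointer and its referent both equal the
 * starting address of the new object. */
shadow_reg from_allocation(uint64_t start_address) {
    shadow_reg r = { start_address, start_address };
    return r;
}

/* A value that is not derived from any pointer source. */
shadow_reg scalar(uint64_t v) {
    shadow_reg r = { v, 0 };
    return r;
}

/* Model of getelementptr: the value moves by `offset`, while the referent
 * is copied unchanged because the intended object is the same. */
shadow_reg gep(shadow_reg base, int64_t offset) {
    shadow_reg r = { base.value + (uint64_t)offset, base.referent };
    return r;
}
```

For a pointer p created by from_allocation(0x1000), gep(p, 16) holds the value 0x1010 while its referent remains 0x1000.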

For more complicated operations, such as those involving two register values that each

have valid referents, the desired behavior is less clear. In general, the referent policy errs

on the side of propagating referents, except for specific cases where the result is clearly not

a valid pointer (e.g., a remainder/modulus operation). It is generally safe (sound) to prop-

agate a referent because (1) true scalars are never dereferenced, and (2) lazy initialization

robustly handles cases involving multiple, mutually-exclusive referents (see Section 3.2). As

noted in our discussion of limitations in Section 4.4, the exception to case (1) is for scalars

that equivalence checking (Chapter 4) erroneously concludes are pointers. In this situa-

tion, uc-klee could potentially fail to detect a difference between two functions’ outputs

because it only compares the values of scalars and ignores differences in pointer addresses.

We believe it is highly unlikely (though certainly feasible if code significantly violates the

C standard [63]) for a scalar with a mistakenly valid referent to point to valid memory.

Therefore, this situation should not arise regularly in practice.

The binaryOpReferent function in Algorithm 3.2 summarizes uc-klee’s referent

policy for binary operations. The function accepts an execution state state, an opcode,

and two register operands left and right. The suffixes .value and .referent denote whether

the register value or referent (shadow register) is being used. Lines 4–23 handle special

cases specific to certain opcodes, while the remainder of the algorithm handles the general

cases that are opcode-independent. Since, as argued above, it is generally safe to propa-

gate referents, the special cases that “drop” referents (return zero) are simply performance

optimizations to reduce the number of pointer resolutions uc-klee must perform during

equivalence checking. Cases where uc-klee erroneously drops referents typically result in


pointer errors being emitted, and we detected and fixed several such cases in earlier versions

of our referent policy. The current version has been stable for about four years and has

worked well in practice for all of our experiments.

Each special case is handled as follows:

Lines 4–5 We assume that any operation that strips the most-significant bit (msb) of a

pointer results in a scalar, so uc-klee drops the referent for left shifts. This rule stems

from the idea that the upper bits of a pointer are important (e.g., for distinguishing between

user-space and kernel-space addresses).

Lines 6–9 By similar reasoning, uc-klee drops the referent for a bitwise And with a concrete (non-symbolic) mask whose msb is zero. Throughout this algorithm, the operators ≡ and ≢ denote structural comparisons between klee expressions, rather than integers. Expressions in uc-klee are represented as directed acyclic graphs (DAGs), and the ≡ operator evaluates to true if two expressions have identical DAGs. The symbol Zero denotes an expression with a constant (non-symbolic) zero value.

Lines 10–13 Multiplication by zero always yields zero, so uc-klee drops the

referent.

Lines 14–15 Signed (SRem) and unsigned (URem) remainder/modulus operations return

an offset, so uc-klee drops the referent.

Lines 16–17 Division operations are not commutative, and the resulting referent is prop-

agated from the left operand.

Lines 18–23 If a subtraction operation involves two pointers that refer to the same memory

object, the result is a scalar offset, so uc-klee drops the referent. In C, subtraction between

pointers to different memory objects is undefined. As discussed in Section 3.2.1, uc-klee

generally assumes that unbound pointers do not alias one another, nor do they alias existing

objects. As a hack to detect cases of intended aliasing, we add a constraint (line 23) that

the referents must match whenever one symbolic pointer is subtracted from another. For

example, code that contains an if-statement such as p - q < sizeof(*p) generally expects

p and q to alias.

The general, opcode-independent cases are handled as follows:

Lines 25–26 If neither operand has a valid referent, there is no referent to propagate.


Algorithm 3.2 Propagation of pointer referents across binary operations

 1: function isValid(referent)
 2:   return isSymbolic(referent) ∨ referent ≢ Zero

 3: function binaryOpReferent(state, opCode, left, right)    ▷ handle special cases:
 4:   if opCode = Shl    ▷ left shift
 5:     return Zero
 6:   else if opCode = And
 7:       ∧ ((isConstant(left.value) ∧ msb(left.value) ≡ Zero)    ▷ bitwise “and” mask
 8:       ∨ (isConstant(right.value) ∧ msb(right.value) ≡ Zero))
 9:     return Zero
10:   else if opCode = Mul
11:       ∧ ((isConstant(left.value) ∧ left.value ≡ Zero)    ▷ multiplication by Zero
12:       ∨ (isConstant(right.value) ∧ right.value ≡ Zero))
13:     return Zero
14:   else if opCode = URem ∨ opCode = SRem    ▷ remainder/modulus
15:     return Zero
16:   else if opCode = UDiv ∨ opCode = SDiv    ▷ division
17:     return left.referent
18:   else if opCode = Sub    ▷ subtraction
19:     if left.referent ≡ right.referent
20:       return Zero
21:     else if isSymbolic(left.referent) ∧ isSymbolic(right.referent)
22:         ∧ mayBeTrue(state, eqExpr.create(left.referent, right.referent))
23:       addConstraint(state, eqExpr.create(left.referent, right.referent))    ▷ (continue)

24:   ▷ handle general cases:
25:   if not isValid(left.referent) ∧ not isValid(right.referent)    ▷ no valid referent
26:     return Zero
27:   else if isValid(left.referent) ∧ not isValid(right.referent)    ▷ only left referent is valid
28:     return left.referent
29:   else if not isValid(left.referent) ∧ isValid(right.referent)    ▷ only right referent is valid
30:     return right.referent
31:   else    ▷ both are valid referents
32:     if left.referent ≡ right.referent
33:       return left.referent
34:     else if getWidth(left.referent) = WordSize
35:         ∧ getWidth(right.referent) ≠ WordSize
36:       return left.referent
37:     else if getWidth(left.referent) ≠ WordSize
38:         ∧ getWidth(right.referent) = WordSize
39:       return right.referent
40:     else    ▷ return conditional expression
41:       leftCond ← andExpr.create(neqExpr.create(left.referent, Zero),
42:                                 eqExpr.create(right.referent, Zero))
43:       rightCond ← andExpr.create(eqExpr.create(left.referent, Zero),
44:                                  neqExpr.create(right.referent, Zero))
45:       return iteExpr.create(leftCond,
46:                             left.referent,
47:                             iteExpr.create(rightCond,
48:                                            right.referent,
49:                                            Zero))


Lines 27–28 If only the left operand has a valid referent, propagate that referent.

Lines 29–30 If only the right operand has a valid referent, propagate that referent.

Lines 32–33 If both operands have the same referent (except for the subtraction case

handled above), propagate that referent. In general, binary operations between two pointers

are undefined, but this case may arise when adding an offset whose referent was mistakenly

propagated to a valid pointer (e.g., (p & ~0xfff) + hash(p)).

Lines 34–36 If only the left operand has a referent of the appropriate width, propagate

that referent. This may arise when code reads only a few bytes from a symbolic input

and then sign- or zero-extends the value before adding it to a pointer (operands to binary

operations must have the same width in llvm bitcode). In uc-klee, SExt and ZExt

instructions are identity operations on referents.

Lines 37–39 For the same reasoning as above, we return the right operand’s referent.

Lines 40–49 For all remaining operations involving two differing referents (recall that

lines 32–33 handled the case of identical referents), uc-klee uses a conditional if-then-

else (ITE) expression to represent the resulting referent. If exactly one of the referents

is non-zero, the ITE expression evaluates to the non-zero referent. Otherwise, the ITE

expression evaluates to zero. This conditional expression captures the belief that because

binary operations involving pointers to disparate memory objects are undefined in C, only

one of the referents should be valid. If a pointer whose referent is an ITE expression is

dereferenced, each of the possible outcomes is examined. If any of the referents are unbound

symbolic inputs, lazy initialization is triggered. Section 3.2 discusses this process in more

detail.
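The policy described above can be approximated over concrete values to make the case analysis easier to follow. The sketch below is a simplification, not uc-klee's code: referents are plain 64-bit integers, so validity reduces to being non-zero, every referent is word-sized (the width cases of lines 34–39 never apply), the constraint added on line 23 is omitted, and the final ITE expression is evaluated eagerly. The is_constant flag, the opcode enumeration, and all function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    OP_ADD, OP_SUB, OP_MUL, OP_UDIV, OP_SDIV,
    OP_UREM, OP_SREM, OP_SHL, OP_AND, OP_OR, OP_XOR
} opcode;

typedef struct {
    uint64_t value;    /* concrete witness for the register value */
    uint64_t referent; /* 0 means no valid referent in this model */
    bool is_constant;  /* true for literal operands (assumed flag) */
} shadow_reg;

static bool msb_is_zero(uint64_t v) { return (v >> 63) == 0; }

uint64_t binary_op_referent(opcode op, shadow_reg left, shadow_reg right) {
    /* Special cases (lines 4-23 of Algorithm 3.2). */
    switch (op) {
    case OP_SHL:
        return 0;                       /* left shift strips the msb */
    case OP_AND:
        if ((left.is_constant && msb_is_zero(left.value)) ||
            (right.is_constant && msb_is_zero(right.value)))
            return 0;                   /* mask clears the msb */
        break;
    case OP_MUL:
        if ((left.is_constant && left.value == 0) ||
            (right.is_constant && right.value == 0))
            return 0;                   /* multiplication by zero */
        break;
    case OP_UREM:
    case OP_SREM:
        return 0;                       /* remainder is an offset */
    case OP_UDIV:
    case OP_SDIV:
        return left.referent;           /* propagate from the left operand */
    case OP_SUB:
        if (left.referent == right.referent)
            return 0;                   /* same object: scalar offset */
        break;
    default:
        break;
    }

    /* General cases (lines 25-49). */
    bool left_valid = left.referent != 0;
    bool right_valid = right.referent != 0;
    if (!left_valid && !right_valid)
        return 0;
    if (left_valid && !right_valid)
        return left.referent;
    if (!left_valid && right_valid)
        return right.referent;
    if (left.referent == right.referent)
        return left.referent;
    /* Two differing non-zero referents: the ITE expression of lines 40-49
     * evaluates to zero when both referents are non-zero. */
    return 0;
}
```

In the symbolic setting the last case does not collapse to zero immediately; the ITE expression is kept and each outcome is examined when the pointer is dereferenced.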

When a value is loaded from memory, uc-klee loads the referent from the corresponding

shadow memory. When a value is stored to memory, uc-klee writes the referent from the

shadow register to the corresponding shadow memory location. If the referent being stored

is invalid (zero), this effectively clears the referent from that location in shadow memory,

which marks that memory as holding a scalar.

3.1.2 Address resolution

Tracking pointer referents also allows uc-klee to find a class of memory errors missed by the baseline klee system. Because referents specify the object to which a pointer is intended to refer, they can be used to detect instances where a pointer dereference accesses valid memory, but within the “wrong” object. For example, if p points to a memory object that is 32 bytes in size, dereferencing the pointer p + 1024 should trigger an out-of-bounds memory error. However, the baseline klee system (and many other tools that do not track referents) will miss this error if p + 1024 lands within some other allocated memory object.

[Figure 3.3: Lazy initialization pointer representations and auxiliary data structures. (a) Pointer (top) and referent (bottom) representations: the C expression x + 10 is represented internally by uc-klee as (Add w64 (Read w64 sym_array_99, 0), 10), and its referent xreferent as (Read w64 ref_array_99, 0), a read from the referent array. (b) Auxiliary data structures: a binding metadata table keyed by referent array (entries for ref_array_98 and ref_array_99), in which each byte is 0 (unbound) or 1 (bound), with example metadata 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1; and a depth table keyed by referent array, with depth 2 for ref_array_98 and depth 3 for ref_array_99.]

To detect this class of errors, uc-klee augments each address resolution with a query

that ensures that the referent matches the starting address of the memory object. If this

check fails, uc-klee reports the error to the user.
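The difference between a plain bounds check and the referent-augmented check can be sketched over a small table of concrete objects. The names mem_object, resolve, access_ok_bounds_only, and access_ok_with_referent are hypothetical; uc-klee performs the corresponding check as a solver query during address resolution rather than as the integer comparison shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start; /* starting address of the object */
    uint64_t size;  /* size in bytes */
} mem_object;

/* Return the object that fully contains [addr, addr + width), or NULL. */
const mem_object *resolve(const mem_object *objs, size_t count,
                          uint64_t addr, uint64_t width) {
    for (size_t i = 0; i < count; i++) {
        const mem_object *o = &objs[i];
        if (width <= o->size && addr >= o->start &&
            addr - o->start <= o->size - width)
            return o;
    }
    return NULL;
}

/* Check without referents: any containing object is accepted. */
bool access_ok_bounds_only(const mem_object *objs, size_t count,
                           uint64_t addr, uint64_t width) {
    return resolve(objs, count, addr, width) != NULL;
}

/* Check with referents: the containing object must also be the one the
 * pointer was intended to refer to. */
bool access_ok_with_referent(const mem_object *objs, size_t count,
                             uint64_t addr, uint64_t width,
                             uint64_t referent) {
    const mem_object *o = resolve(objs, count, addr, width);
    return o != NULL && o->start == referent;
}
```

With a 32-byte object at 0x1000 and a second object at 0x1400, an access through p + 1024 lands inside the second object. The bounds-only check accepts it, while the referent check rejects it because the referent is still 0x1000.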

3.2 Lazy initialization

Section 2.2.1 introduced lazy initialization using a simple example. In this section, we

present the details of how lazy initialization is implemented in uc-klee. Throughout, we

use a working example in which code dereferences an unbound symbolic pointer x + 10

whose symbolic referent we notate as xreferent (recall that incrementing a pointer x by 10

does not alter its referent).

When a pointer derived from under-constrained symbolic input is first dereferenced, uc-

klee allocates a new object to satisfy the memory access. To distinguish between initial

dereferences of symbolic pointers and subsequent dereferences (which must map to the same

object), uc-klee tracks whether each symbolic pointer has been previously dereferenced.

Recall from Section 3.1 that each symbolic input in uc-klee is represented as both a

symbolic array and a referent array (Figure 3.3(a)). To track whether each symbolic pointer

has been dereferenced, uc-klee associates additional binding metadata with each referent

array (Figure 3.3(b)). When an under-constrained symbolic input is created, its associated


binding metadata is initialized to zero for each byte of the input, indicating that the bytes

are initially unbound. Once an unbound pointer has been dereferenced and bound to a lazily-

allocated object, uc-klee sets the metadata corresponding to each byte of the pointer to

one, indicating that each byte is now bound. uc-klee uses metadata for this purpose, rather

than relying solely on a state’s path constraints, in order to handle several challenging cases

that arise in practice, which we discuss in Section 3.2.2.
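The binding metadata described above amounts to one flag per byte of a symbolic input. The following sketch assumes 64-bit pointers and uses hypothetical names (binding_metadata, metadata_init, pointer_is_unbound, mark_pointer_bound); it is not taken from uc-klee's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POINTER_BYTES ((size_t)8) /* assumes 64-bit code */

typedef struct {
    unsigned char *bound; /* one flag per input byte: 0 unbound, 1 bound */
    size_t size;
} binding_metadata;

/* Create metadata for a new symbolic input: every byte starts unbound. */
bool metadata_init(binding_metadata *m, size_t size) {
    m->bound = calloc(size, 1);
    m->size = size;
    return size == 0 || m->bound != NULL;
}

static bool pointer_fits(const binding_metadata *m, size_t offset) {
    return offset <= m->size && m->size - offset >= POINTER_BYTES;
}

/* True only if all bytes of the pointer stored at `offset` are unbound.
 * A pointer that does not fit inside the input is reported as not unbound. */
bool pointer_is_unbound(const binding_metadata *m, size_t offset) {
    if (!pointer_fits(m, offset))
        return false;
    for (size_t i = 0; i < POINTER_BYTES; i++) {
        if (m->bound[offset + i] != 0)
            return false;
    }
    return true;
}

/* Record that the pointer at `offset` has been bound to an object. */
void mark_pointer_bound(binding_metadata *m, size_t offset) {
    if (pointer_fits(m, offset))
        memset(m->bound + offset, 1, POINTER_BYTES);
}
```

A first dereference finds the pointer unbound and triggers allocation; after mark_pointer_bound, later dereferences at the same offset take the normal path.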

At each memory dereference (i.e., load or store), uc-klee performs the following steps

to implement lazy initialization, continuing with the example symbolic pointer x + 10:

1. uc-klee examines the referent (i.e., xreferent) from the address operand’s shadow

register to see if it is a symbolic expression. If it is concrete, uc-klee handles the

dereference normally, since unbound symbolic pointers must have symbolic referents

(Section 3.1). For the moment, assume that symbolic referents only consist of a single

word-sized read expression, as shown in Figure 3.3(a). We consider more complex

cases in Section 3.2.2.

2. uc-klee retrieves the binding metadata (described above) associated with the corresponding referent array (i.e., ref_array_99). It examines this metadata at the same offset as the symbolic read expression (i.e., zero) and checks whether all bytes (eight for 64-bit code) have the value zero, indicating that they are unbound. If so, uc-klee continues to Step 3. If not, it handles the dereference normally, without lazy initialization.

3. uc-klee considers an execution path in which this dereference reads from invalid

memory and triggers an error. For our working example, it does so by branching

on the condition xreferent == 0 (i.e., x is a scalar). uc-klee terminates the path on

which this condition holds and optionally reports an error to the user. For equivalence

checking (Chapters 4–5), if the error occurs in the first version, uc-klee jumps to

the second version of the function to detect cases where the first version of a function

crashes if this pointer is invalid while the second version does not (i.e., because it

does not dereference this pointer). If the error occurs in the second version, uc-klee

checks whether the first version crashed with the same error. On the path where this

error condition does not hold, execution continues to Step 4.

4. To limit data structures to the user-specified depth bound (k-bound [40]), uc-klee determines the depth of each object by consulting the execution state’s depth table, which is indexed by the referent array (i.e., ref_array_99). The depth of the new object is one greater than the depth of the parent symbolic object from which the referent array was loaded. For the dereference of the unbound symbolic pointer x + 10, the depth of the new object would be four, since ref_array_99’s depth is three (Figure 3.3(b)). If the new object’s depth exceeds the user-specified depth limit, uc-klee silently terminates the path to suppress unbounded path exploration.

5. uc-klee determines an allocation size for the new object. Section 3.2.3 discusses two

approaches we implemented for sizing lazily-initialized memory objects.

6. uc-klee allocates a new memory object of the size chosen in Step 5.

7. uc-klee generates a new symbolic array and referent array and associates the new

memory object and its shadow memory object with the two arrays, respectively. It

associates zero-initialized binding metadata with the new referent array to indicate

that all pointers in the new object are unbound. Doing so allows lazy initialization

to recursively build data structures as unbound pointers are dereferenced. uc-klee

also inserts an entry for the new referent array in the depth table with a depth one

greater than that of the parent object.

8. uc-klee binds the symbolic pointer to the starting address of the new object

by adding two path constraints: xreferent == starting_address, and x ==

starting_address. The former binds the symbolic referent to the starting address

of the new object. The latter binds the base address of the symbolic pointer to the

starting address of the new object. Internally, uc-klee calculates the base address by

converting the referent array back into the symbolic array. This approach avoids the

need for uc-klee to reason about arbitrarily complex symbolic pointers by instead

relying on the referent. For our running example (Figure 3.3(a)), uc-klee converts

xreferent to x by converting (Read w64 ref_array_99) to (Read w64 sym_array_99)

using an internal mapping it maintains between symbolic and referent arrays.

9. uc-klee marks the symbolic pointer as bound by updating the binding metadata

for each byte of the referent (i.e., bytes 0–7 of ref_array_99) to one. This prevents

subsequent dereferences of any pointer with the same referent from satisfying Step 2

above and triggering lazy initialization.


10. uc-klee proceeds along the normal code path for handling dereferences of valid point-

ers.
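The bookkeeping in Steps 2, 4, 7, and 9 can be sketched with a small concrete model. This is only an illustration under simplifying assumptions: the names (struct object, deref, is_unbound) are hypothetical and not taken from uc-klee, objects have a fixed toy size, and the error fork (Step 3), sizing and allocation (Steps 5 and 6), and the path constraints (Step 8) are not modeled because they require a constraint solver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model; none of these names come from uc-klee. */
enum deref_result { DEREF_NORMAL, DEREF_LAZY_INIT, DEREF_DEPTH_EXCEEDED };

#define PTR_BYTES 8     /* pointer width for 64-bit code */
#define MAX_OBJECT 64   /* fixed toy object size */

struct object {
    uint8_t binding[MAX_OBJECT]; /* binding metadata: 0 = unbound, 1 = bound */
    unsigned depth;              /* this object's depth-table entry */
};

/* Step 2: the pointer stored at `offset` is unbound only if all of its
 * metadata bytes are zero. */
static int is_unbound(const struct object *parent, size_t offset)
{
    for (size_t i = 0; i < PTR_BYTES; i++)
        if (parent->binding[offset + i] != 0)
            return 0;
    return 1;
}

/* Dereference of the pointer stored at `offset` inside `parent`.
 * On lazy initialization, `child` is set up as the fresh object. */
static enum deref_result deref(struct object *parent, size_t offset,
                               unsigned depth_bound, struct object *child)
{
    if (!is_unbound(parent, offset))
        return DEREF_NORMAL;                  /* Step 2: already bound */
    if (parent->depth + 1 > depth_bound)
        return DEREF_DEPTH_EXCEEDED;          /* Step 4: terminate the path */
    memset(child->binding, 0, sizeof child->binding);  /* Step 7: all unbound */
    child->depth = parent->depth + 1;                  /* Step 7: depth table */
    memset(parent->binding + offset, 1, PTR_BYTES);    /* Step 9: mark bound */
    return DEREF_LAZY_INIT;
}
```

With a parent at depth three and a depth bound of four, the first call to deref for a given offset reports lazy initialization and creates a child at depth four; a second call for the same offset takes the normal path because the metadata bytes are now one.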

During our experiments, we initially found that uc-klee was sensitive to the user-specified

depth bound (Step 4). With small bounds, our tool terminated without reaching many parts

of the code that depended on deeper data structures. On the other hand, when using large

bounds for all functions, some would fail to achieve high coverage due to path explosion.

Our solution is to incorporate the symbolic input depth into our tool’s weighted-random

search heuristic in order to favor paths with shallower inputs. Doing so allowed our tool to

satisfy the minimum depth bound for some functions while mitigating the path explosion

for others.
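One way to picture a depth-aware weighted-random choice is sketched below. The weighting 1/(1 + depth) is an assumption made purely for illustration, since the text does not give the formula uc-klee uses, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed weighting, chosen only for illustration: states whose symbolic
 * inputs are shallower receive larger weights. */
static double depth_weight(unsigned depth)
{
    return 1.0 / (1.0 + (double)depth);
}

/* Weighted-random choice among n > 0 states. `r` is a uniform random
 * number in [0, 1) supplied by the caller, which keeps the function
 * deterministic for a given r. */
static size_t pick_state(const unsigned *depths, size_t n, double r)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += depth_weight(depths[i]);

    double target = r * total;
    double running = 0.0;
    for (size_t i = 0; i < n; i++) {
        running += depth_weight(depths[i]);
        if (target < running)
            return i;
    }
    return n - 1;   /* guard against floating-point rounding */
}
```

Under this assumed weighting, three states with input depths 0, 1, and 3 have weights 1, 0.5, and 0.25, so the shallowest state is selected for any r below 1/1.75.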

3.2.1 Aliasing

By allocating a new memory object whenever an unbound symbolic pointer is first deref-

erenced, uc-klee makes an implicit assumption that symbolic pointers do not alias each

other, nor do they alias existing objects. That is, all unbound symbolic pointers refer to

distinct memory objects.

For acyclic data structures such as singly-linked lists, this assumption is both reasonable

and desirable. For example, consider the listSum example from Figure 2.1(a), shown here

for convenience:

int listSum(node *n) {
    int sum = 0;
    while (n) {
        sum += n->val;
        n = n->next;
    }
    return sum;
}

In this function, each iteration of the while loop should visit a distinct object. If any of the

pointers alias, the linked list would form a cycle, and the code would loop infinitely. On the

other hand, if this were a doubly-linked list, the code would expect the prev pointers (not

shown) to be consistent with the next pointers. Otherwise, traversing the doubly-linked list

from head to tail would visit a different set of elements from a backwards traversal. Consid-

ering such malformed inputs may lead uc-klee to report spurious errors (false positives) on

execution paths that may not arise in practice. In addition, not considering inputs involv-

ing aliasing may cause uc-klee to miss valid execution paths and possibly miss legitimate


errors (false negatives).

As a partial solution, uc-klee considers aliasing in cases where the code constrains a

symbolic pointer prior to its first dereference. In practice, these constraints typically arise

in one of two ways. The first is an explicit comparison between pointers, such as head

== tail or p >= q && p < q+N. The second is by subtracting two pointers, such as p -

q. As discussed in Section 3.1.1 (lines 18–23 of Algorithm 3.2), subtraction between two

pointers in C is only defined if the two pointers share a referent. uc-klee exploits this fact

by adding a path constraint that two referents must match whenever symbolic pointers are

subtracted. When an unbound symbolic pointer is first dereferenced, uc-klee checks to see

if the path constraints require that pointer to resolve to an existing object. If so, uc-klee

marks the pointer as bound and skips lazy initialization. If uc-klee were to proceed with

lazy initialization, Step 8 would fail because no new object’s concrete address would satisfy

the path constraints. Using this strategy, uc-klee can correctly explore many execution

paths that require pointer aliasing.
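The two constraint sources described above can be illustrated with small functions under test. These are hypothetical examples written for this discussion, not code from the evaluated projects.

```c
#include <stddef.h>

struct node { int val; struct node *next; };
struct list { struct node *head; struct node *tail; };

/* Explicit comparison: on the true branch the path constraint
 * head == tail is recorded before either pointer is dereferenced. */
static int bump_ends(struct list *l)
{
    if (l->head == l->tail) {
        l->head->val += 1;   /* first dereference of head */
        l->tail->val += 1;   /* must resolve to the object bound to head */
        return l->head->val; /* original value plus two */
    }
    l->head->val += 1;
    l->tail->val += 1;       /* a distinct object on this path */
    return l->head->val;     /* original value plus one */
}

/* Pointer subtraction: only defined when both pointers refer to the same
 * object, so the referents of begin and end can be constrained to match. */
static ptrdiff_t span(const char *begin, const char *end)
{
    return end - begin;
}
```

On the true branch of bump_ends, treating tail as a fresh object would contradict head == tail, which is why the dereference of tail has to resolve to the existing object.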

One approach for improving uc-klee’s handling of aliasing in future work would be

to allow the user to provide manual annotations specifying the pointers within a data

structure that should alias one another. In many cases, it may be possible to infer intended

aliasing statically from the code or dynamically using binary instrumentation. However,

such approaches are beyond the scope of this dissertation.

3.2.2 Practical challenges

Our approach to lazy initialization was influenced by a number of challenges that arose in

practice. We discuss these challenges and our solutions below.

Symbolic offsets

In the simple working example above involving the symbolic pointer x + 10 (Figure 3.3(a)),

it is clear that xreferent corresponds to bytes 0–7 (for a 64-bit machine) of the referent array

ref_array_99. In many cases, symbolic pointers may involve symbolic offsets. For example,

given a symbolic array of pointers ptrs and symbolic integer i, a symbolic 64-bit pointer

ptrs[i] may correspond to bytes 0–7, 8–15, and so on of ptrs, depending on the value

of i. Symbolic offsets make reasoning about binding metadata non-trivial. Fortunately,

the solution to this problem is fairly straightforward: uc-klee stores the binding metadata

internally as standard memory objects, allowing it to leverage the underlying klee tool’s


mechanism for symbolic offsets. With this approach, uc-klee can mark eight bytes as

bound beginning at symbolic offset i within the referent array for ptrs, rather than having

to choose a single concrete value for i.

Reasoning about symbolic offsets introduces another practical challenge: a symbolic

pointer being dereferenced may be bound or unbound, depending on the values of other sym-

bolic inputs. For example, once the pointer ptrs[i] has been dereferenced (i.e., *ptrs[i])

and bound to a memory object, the code may dereference the pointer ptrs[j]. If i and

j have the same value, this dereference should map to the same object. Otherwise, a new

object should be allocated. To handle such cases, uc-klee forks execution in Step 2 above

on the condition mdbinding = 0, where mdbinding is the value read from the binding meta-

data. The value of mdbinding for ptrs[j] (following a dereference *ptrs[i]) will yield

a symbolic expression precisely capturing the earlier writes to the binding metadata at

offsets i through i + 7. By forking execution, uc-klee will consider both cases. When

mdbinding = 0 (implying that i ≠ j), lazy initialization will proceed. When mdbinding ≠ 0

(implying that i = j), uc-klee will handle the dereference normally by mapping it to the

object allocated by the earlier dereference *ptrs[i]. Forking execution allows uc-klee to

handle other conditional cases that arise in practice as well, such as when a referent includes

a symbolic if-then-else (ITE) expression (lines 40–49 of Algorithm 3.2).
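A minimal, hypothetical function under test that produces the ptrs[i] and ptrs[j] situation described above:

```c
/* Hypothetical function under test. With symbolic i and j, the second
 * dereference is bound exactly when j equals i. */
static int bump_two(int **ptrs, unsigned i, unsigned j)
{
    *ptrs[i] += 1;   /* first dereference: ptrs[i] gets bound to an object */
    *ptrs[j] += 1;   /* same object if j == i, otherwise a different one */
    return *ptrs[i]; /* original value plus two if j == i, plus one otherwise */
}
```

The two return values correspond to the two sides of the fork: when i and j are equal, both increments hit the same object, and when they differ, the second increment lands in a separately allocated object.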

Multiple symbolic values

As another challenge, symbolic pointers often involve multiple symbolic values. For example,

when dereferencing p + q, it is unclear whether p or q should be bound to a new object.

In most cases, the referent for such pointers will be an ITE expression because binary

operations between two symbolic values will generate conditional expressions (Section 3.1.1).

For the pointer p + q, the referent would be:

preferent ≠ 0 ∧ qreferent = 0 ? preferent : (preferent = 0 ∧ qreferent ≠ 0 ? qreferent : 0)

uc-klee handles these cases by forking execution and considering each possible referent

separately. For this example, uc-klee would consider three execution paths: one binding

preferent to a new object, one binding qreferent, and an error path in which the referent is

zero. The error path detects two cases: (1) neither p nor q is a valid pointer, and (2) both

p and q are valid pointers. The latter case is an error since the sum of two pointers is not

a valid pointer. For rare cases where a symbolic pointer involves multiple symbolic values

but where the referent is not a clean ITE expression with mutually-exclusive conditions,


uc-klee examines the debug type information (if available) to see if one of the values has

a pointer type and the other does not. If so, it binds the value that has the pointer type.

This approach worked well in practice. Across all of our experiments, we encountered at

most a handful of functions where neither the ITE expressions nor the debug information

resolved the ambiguity.

Once uc-klee has determined that exactly one of the symbolic values p or q repre-

sents a valid pointer, it must ensure that the current dereference (which triggered lazy

initialization) is in-bounds. Even if it binds p to the starting address of the new object,

the dereference *(p + q) can result in an out-of-bounds dereference if q is a large value

or negative integer. uc-klee ensures the in-bounds access by adding two additional path

constraints: p + q >= starting_address, and p + q - starting_address <= size -

N, where starting_address refers to the address of the new object, size refers to the

size in bytes of the new object, and N is the number of bytes being accessed by the current

dereference. The placement of the terms on each side of the inequality is important because

adding or subtracting terms from both sides does not produce equivalent expressions in bit

vector (modulo) arithmetic.
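The sensitivity to term placement can be demonstrated with 64-bit unsigned arithmetic, which wraps modulo 2^64 like the bit vectors used by the solver. The sketch below is illustrative only: the function names are hypothetical, the lower bound is taken as non-strict so that an access at offset zero is allowed, and size is assumed to be at least n.

```c
#include <stdint.h>

/* Offset form, as in the text: compare the offset of the access against
 * size - n. Assumes size >= n, so size - n does not wrap. */
static int in_bounds(uint64_t addr, uint64_t start, uint64_t size, uint64_t n)
{
    return addr >= start && addr - start <= size - n;
}

/* Rearranged form: equivalent over the integers, but not modulo 2^64,
 * because addr + n and start + size can each wrap around. */
static int in_bounds_rearranged(uint64_t addr, uint64_t start,
                                uint64_t size, uint64_t n)
{
    return addr >= start && addr + n <= start + size;
}
```

For a 16-byte object placed in the last 16 bytes of the address space, a 4-byte access at offset 8 is valid. The offset form accepts it, while the rearranged form rejects it because start + size wraps to zero.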

Constrained pointers

To implement certain optimizations, library code often checks for pointer alignment. For

example, the implementation of strlen in Figure 4.5(a) (p. 47) uses the UNALIGNED macro.

On paths where this macro evaluates to true for a symbolic pointer p, uc-klee would add

the path constraint (p & 0x7) != 0. When the code later dereferences p for the first time,

uc-klee must find a concrete address for the new memory object that satisfies all the

current path constraints. Otherwise, uc-klee could not bind the pointer to the new object

without contradicting the current path condition. In this example, the path constraints

require uc-klee to find an address that is unaligned.

While alignment could be handled as a special case, we implemented a more general

solution that supports most pointer constraints we encountered in practice. uc-klee tries

to find a satisfying concrete address from two reserved regions in a state’s address space: one

beginning at a low address, and the other at a high address. uc-klee issues SMT queries to

find an available satisfying address within one of these regions. If it fails to find a satisfying

address, the tool terminates the path and reports an error to the user. These errors arose

occasionally in practice, but they often occurred on paths that were not feasible (e.g., the


constraints required two pointers from distinct objects to overlap). In our experiments, we

conservatively assumed that these errors may be due to cases mishandled by our tool, and

we marked the affected functions as unverified.
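The address search can be pictured with a brute-force stand-in for the SMT query. This sketch is an assumption-laden illustration: the real tool asks a solver rather than scanning, the region bounds and all function names here are hypothetical, and the path constraints are reduced to a single predicate on the candidate address.

```c
#include <stdint.h>

/* A path constraint on the candidate address, reduced to a predicate. */
typedef int (*constraint_fn)(uint64_t addr);

/* Scan one reserved region [lo, hi) for the lowest address at which an
 * object of `size` bytes fits and the predicate holds. Returns 0 when no
 * such address exists, in which case the path would be terminated. */
static uint64_t find_address(uint64_t lo, uint64_t hi, uint64_t size,
                             constraint_fn ok)
{
    if (size == 0 || hi < lo || hi - lo < size)
        return 0;
    for (uint64_t a = lo; a <= hi - size; a++)
        if (ok(a))
            return a;
    return 0;
}

/* Example constraints: the pointer is (or is not) 8-byte aligned. */
static int is_unaligned8(uint64_t a) { return (a & 0x7) != 0; }
static int is_aligned8(uint64_t a)   { return (a & 0x7) == 0; }
```

For the unaligned-pointer path discussed above, searching a region that starts at an aligned address simply yields the next address up.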

Debug information

Incorporating the use of debug information into uc-klee proved surprisingly difficult. For

example, we noticed that the llvm linker was failing to include large portions of debug

information in its output module, and we began maintaining a local fork of llvm that fixed

this behavior (and other bugs). In addition, we noticed that the crude type system used by

llvm bitcode instructions often differed significantly from the types reflected in the module

debug information. uc-klee tracks the debug types for each object in a best-effort manner,

but it sometimes fails due to these discrepancies. Fortunately, the backtracking technique

described in Section 3.2.3 relegates debug information to an issue of speed and user-friendly

error reporting (Section 3.3) rather than correctness.

3.2.3 Object sizes

Our initial implementation of lazy initialization required all objects to have concrete (non-

symbolic) sizes, as in the underlying klee tool. Choosing an appropriate concrete size

presents a challenging tradeoff. A size that is too small will cause later memory accesses to

fail and the path to terminate, potentially limiting coverage or missing bugs. A size that is

too large may miss legitimate out-of-bounds errors. To robustly handle this tradeoff, uc-

klee implements a form of backtracking. At each lazy initialization, uc-klee checkpoints

the execution state and chooses an initial allocation size (discussed below). If the path

later reads out-of-bounds from this object, uc-klee (1) emits the error to the user, and (2)

restores the checkpoint and uses an allocation size large enough to satisfy the most recent

memory access. uc-klee records the sequence of branches taken after each checkpoint, and

it forces the path to replay the sequence of branches after increasing the allocation size.

uc-klee accepts a command-line option to limit the number of times a state backtracks

for each memory object to suppress paths with unbounded loops (e.g., until a termination

character is found). Note that this limit is distinct from the user-supplied depth limit

(k-bound) for lazy initialization.

The initial concrete size uc-klee chooses during lazy initialization is the maximum of

the following:


• A user-supplied minimum size (e.g., eight bytes).

• The size of the pointed-to type based on the crude type system used by llvm bitcode.

• The size of the pointed-to type based on available debug information that uc-klee

tracks whenever possible.

• The minimum size that satisfies the dereference that triggered lazy initialization (e.g.,

14 bytes if four bytes are loaded from the pointer x + 10).

• The minimum size that satisfies the dereference that triggered backtracking.
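The list above amounts to taking a maximum over five candidate sizes. A minimal sketch, assuming hypothetical field names and using zero to mean that a candidate is unavailable:

```c
#include <stddef.h>

/* Hypothetical container for the five candidates; 0 means "not available". */
struct size_hints {
    size_t user_min;         /* user-supplied minimum size */
    size_t llvm_type;        /* pointed-to type size from the bitcode type */
    size_t debug_type;       /* pointed-to type size from debug information */
    size_t trigger_access;   /* offset + width of the triggering dereference */
    size_t backtrack_access; /* offset + width of the access that caused backtracking */
};

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Initial concrete size: the maximum of all candidates. */
static size_t initial_size(const struct size_hints *h)
{
    size_t s = h->user_min;
    s = max_size(s, h->llvm_type);
    s = max_size(s, h->debug_type);
    s = max_size(s, h->trigger_access);
    s = max_size(s, h->backtrack_access);
    return s;
}
```

For the running example, a four-byte load from x + 10 with an eight-byte user minimum and no larger type information gives 14 bytes.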

An implicit assumption uc-klee makes during lazy initialization is that the base address

(e.g., x for the pointer x + 10) points to the start of a memory object. In certain cases, this

pointer should refer to some offset within an object. In practice, this manifests when code

later dereferences the pointer at a negative offset relative to the initial dereference. One

particularly tricky example that arose in our experiments involved linked lists in the Linux

kernel. In Linux, any C struct type may be chained together as a linked list by adding

a member of type list_head anywhere within the struct definition. While iterating over

the linked list, a pointer to the current element is obtained using the list_entry macro

(Figure 3.4(b)), which is a wrapper for the container_of macro (Figure 3.4(a)). This

macro subtracts the offset of the list_head member within the C struct from the provided

pointer (e.g., queue->list.next), yielding a negative offset.

Fortunately, the backtracking approach described above solves the negative offset prob-

lem as well. When code reads a lazily-initialized object out-of-bounds at a negative offset,

uc-klee backtracks and adjusts the pointer binding appropriately. For example, if a deref-

erence of the pointer x + 10 is followed by a dereference of the pointer x - 5, uc-klee

backtracks to the first dereference and binds x to an offset of five bytes within the newly

allocated object. Note that xreferent still refers to the first byte of the object.
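The negative offset is easy to reproduce outside the kernel. The sketch below uses a portable variant of container_of that omits the GNU typeof check, and a hypothetical struct queue_item whose list member sits at a nonzero offset.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Portable variant of container_of (no typeof type check). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type: the embedded list_head is not the first field. */
struct queue_item {
    int id;
    struct list_head list;
};

/* Recover the enclosing element from a pointer to its list member by
 * stepping backwards by offsetof(struct queue_item, list) bytes. */
static struct queue_item *item_from_link(struct list_head *link)
{
    return container_of(link, struct queue_item, list);
}
```

If link is an unbound symbolic pointer, the first access through the returned pointer lands at a negative offset from link, which is the case that backtracking has to repair.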

/**
 * container_of - cast a member of a structure out to the containing structure
 * ptr: the pointer to the member.
 * type: the type of the container struct this is embedded in.
 * member: the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({ \
        const typeof( ((type *)0)->member ) *__mptr = (ptr); \
        (type *)( (char *)__mptr - offsetof(type,member) );})

(a) Linux kernel container_of macro (include/linux/kernel.h)

/**
 * list_entry - get the struct for this entry
 * ptr: the &struct list_head pointer.
 * type: the type of the struct this is embedded in.
 * member: the name of the list_struct within the struct.
 */
#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

(b) Linux kernel list_entry macro (include/linux/list.h)

Figure 3.4: Linux kernel macros triggering negative-offset symbolic pointer dereferences.

An alternative approach we have recently incorporated into uc-klee is to use symbolically-

sized objects for lazy initialization. Here, uc-klee uses a symbolic value to represent an

object’s size, rather than selecting a single concrete value. Doing so avoids the need for

backtracking in most cases (except for negative offsets) by simultaneously considering many

possible object sizes. At each memory access, uc-klee considers whether the offset could

exceed the symbolic size. If so, it emits an error to the user, as it does when a memory

access can exceed an object’s concrete size. On paths where the access does not exceed the

object’s size, uc-klee adds a path constraint that the object is at least large enough for

the current offset. Since every memory object in uc-klee is assigned a concrete address,

symbolically-sized objects reserve a user-supplied maximum size within an execution state’s

address space. uc-klee adds a path constraint to set a lower bound on the symbolic size

of an object to the initial concrete size described above. It sets the upper bound to the

user-supplied maximum.
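The effect of these constraints can be approximated with an interval in place of a symbolic size. This is a simplified sketch with hypothetical names: a real symbolic size carries arbitrary constraints, whereas here only a lower and an upper bound are tracked, and offset + n is assumed not to overflow.

```c
#include <stddef.h>

/* The object's size is only known to lie in [lo, hi]. */
struct sym_size { size_t lo, hi; };

enum access_result {
    ACCESS_OK,               /* in bounds for every size in the range */
    ACCESS_MAY_OVERFLOW,     /* error reported; path continues with larger lo */
    ACCESS_ALWAYS_OVERFLOWS  /* out of bounds even at the maximum size */
};

/* Access n bytes at `offset`. On the continuing path, raise the lower
 * bound so the object is large enough for this access. */
static enum access_result access_object(struct sym_size *s,
                                        size_t offset, size_t n)
{
    size_t needed = offset + n;
    if (needed > s->hi)
        return ACCESS_ALWAYS_OVERFLOWS;
    if (needed > s->lo) {
        s->lo = needed;
        return ACCESS_MAY_OVERFLOW;
    }
    return ACCESS_OK;
}
```

Starting from the range [8, 64], a four-byte access at offset 10 reports a possible overflow and tightens the range to [14, 64], after which accesses within the first 14 bytes no longer raise errors.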

The evaluation in Chapter 6 uses symbolic sizes for lazy initialization, but the earlier

versions of uc-klee we used for the evaluations in Chapters 4–5 did not support this

feature. For implementation reasons, the version of uc-klee we used for the evaluation

in Chapter 5 supported symbolically-sized objects allocated with malloc or alloca, but not

through lazy initialization.


/home1/openssl/commits/patch-20120417-32e1-4a1c/new-4a1cf501/ssl/s3_pkt.c
  628: static int do_ssl3_write(SSL *s, int type, const unsigned char *buf,
  629:                          unsigned int len, int create_empty_fragment)
  630: {
  ...
* 637:     SSL3_BUFFER *wb=&(s->s3->wbuf);    /* first dereference of 's' */
           imposed constraint(s):
             referent(s) != 0
             s == &uc_ssl_st1
             referent(s) == &uc_ssl_st1
  638:     SSL_SESSION *sess;
  639:
* 640:     if (wb->buf == NULL)    /* true branch; first dereference of 'uc_ssl_st1.s3' */
           < 2 queries >
           imposed constraint(s):
             referent(uc_ssl_st1).s3 != 0
             uc_ssl3_state_st1.wbuf.buf == 0
             uc_ssl_st1.s3 == &uc_ssl3_state_st1
             referent(uc_ssl_st1).s3 == &uc_ssl3_state_st1
* 641:         if (!ssl3_setup_write_buffer(s))
  642:             return -1;
  643:
  ...
  648:
  649:     /* If we have an alert to send, lets send it */
* 650:     if (s->s3->alert_dispatch)    /* true branch */
           imposed constraint(s):
             uc_ssl3_state_st1.alert_dispatch != 0
  651:     {
* 652:         i=s->method->ssl_dispatch_alert(s);    /* first dereference of 'uc_ssl_st1.method' */
           imposed constraint(s):
             referent(uc_ssl_st1).method != 0
             uc_ssl_st1.method == &uc_ssl_method_st1
             referent(uc_ssl_st1).method == &uc_ssl_method_st1
             uc_ssl_method_st1.ssl_dispatch_alert == 268435632    /* user-specified function pointer */
* 653:         if (i <= 0)
  654:             return(i);
  655:         /* if it went, fall through and send more stuff */
  656:     }
  ...
  853: /* if s->s3->wbuf.left != 0, we need to call this */
  854: int ssl3_write_pending(SSL *s, int type, const unsigned char *buf,
  855:                        unsigned int len)
  856: {
  ...
* 885:     if (i == wb->left)
  886:     {
* 887:         wb->left=0;
* 888:         wb->offset+=i;
* 889:         if (s->mode & SSL_MODE_RELEASE_BUFFERS &&    /* true branch */
           imposed constraint(s):
             (((uint8_t) uc_ssl_st1.mode) & 16) != 0
  890:             SSL_version(s) != DTLS1_VERSION && SSL_version(s) != DTLS1_BAD_VER)
* 891:             ssl3_release_write_buffer(s);
* 892:         s->rwstate=SSL_NOTHING;
* 893:         return(s->s3->wpend_ret);
  894:     }

Figure 3.5: Excerpt of uc-klee path summary for OpenSSL do_ssl3_write bug (Figure 5.3). Comments (right) added for clarity.


3.3 Error reporting

With whole program symbolic execution, the symbolic inputs are typically unstructured

strings or byte arrays from command line arguments or file contents. When the tool finds

an error, it is often sufficient to emit a single set of concrete inputs that trigger the error,

along with a description of the error and its location within the program (i.e., a backtrace).

Using these two outputs, a developer may reproduce and fix the bug.

With under-constrained symbolic execution, however, the inputs are often complex,

pointer-rich data structures since uc-klee is directly executing individual functions within

a program. In these cases, a single set of concrete values is not easily understood by a user,

nor can it be given as input to trivially reproduce the error outside the tool because the

pointers expect memory objects to be located at specific addresses.

To provide more comprehensible error reports, uc-klee emits a path summary for each

error. The path summary provides a complete listing of the source code executed along the

path, along with the path constraints added by each line of source. The path constraints

are expressed in a C-like notation and use the available debug information to determine the

types and names of each field. Figure 3.5 shows an excerpt of the path summary for

the OpenSSL bug (see Figure 5.3) that uc-klee discovered and that led to security advisory

CVE-2014-0198 [31]. The asterisks to the left of the line numbers denote source lines

that were executed along the path.

This example shows path constraints that were added for lazy initialization (lines 637,

640, and 652), branches (lines 640, 650, and 889), and symbolic function pointers (line 652,

discussed in Section 3.4). The constraints added on line 637 bind the symbolic pointer

s to the lazily-initialized object uc_ssl_st1, which is of type struct ssl_st (based on

the llvm debug information). Note that the debug type is only used for reporting errors,

resolving function pointers (Section 3.4), and for the initial concrete size (Section 3.2.3);

otherwise, symbolic inputs are handled internally as untyped byte arrays. The “uc_” prefix

is added to all lazily-initialized objects to indicate that they are under-constrained symbolic

inputs. The numeric suffix is used to assign a unique name to each object. On line 640, the

path takes the true branch and adds the constraint uc_ssl3_state_st1.wbuf.buf == 0,

in addition to triggering lazy initialization of the pointer uc_ssl_st1.s3.

Lines 650 and 889 show the branches responsible for triggering the NULL pointer

dereference bug in do_ssl3_write. When an SSL alert is pending (alert_dispatch != 0) and

the SSL_MODE_RELEASE_BUFFERS flag is used, OpenSSL crashes. Unfortunately, uc-klee

cannot automatically determine the subset of branches and constraints that form the root

cause of a bug. Existing techniques for finding root causes may be integrated into uc-klee

in future work, such as dynamic slicing [1, 70].

While we found path summaries to be invaluable in understanding the errors reported

by uc-klee, future work remains. In particular, a graphical output format that shows

the pointer relationships between lazily-initialized objects (e.g., as shown in Figure 2.2)

would be helpful. In addition, it may be possible for uc-klee to generate C unit tests

that reproduce the errors by allocating and initializing pointer-rich inputs and invoking the

function. However, we leave these approaches to future work.

3.4 Function pointers

Systems-style C code, such as that used in the Linux kernel, BIND, and OpenSSL, makes

frequent use of function pointers to implement a degree of object-oriented polymorphism

in C. Specifically, the code uses many C struct types containing function pointers that

are used as object methods. These pointers are set to appropriate values when objects

are created, depending on the type of the object. For example, an object that represents

an SSL connection [47] may have different function pointers for each version of SSL that

may be negotiated by a client. This type of design poses a challenge to under-constrained

symbolic execution because symbolic inputs contain symbolic function pointers. When uc-

klee attempts an indirect call through one of these pointers, it is unclear which function

should be executed.

We currently require that users specify concrete function pointers to associate with each

type of object (as the need arises). When uc-klee encounters an indirect call through a

symbolic pointer, it looks at the object’s debug type information. If the user has defined

function pointers for that type of object, our tool executes that function. Otherwise, it

reports an error to the user and terminates the path. The user can leverage these errors to

specify function pointers only when necessary.
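The pattern and the type-based lookup can be sketched as follows. All names here (struct conn, struct conn_methods, struct type_binding, lookup_methods) are hypothetical illustrations; they are neither OpenSSL's types nor uc-klee's configuration format, which the text does not describe.

```c
#include <stddef.h>
#include <string.h>

/* Function pointers used as "object methods". */
struct conn_methods {
    int (*write)(const char *buf, int len);
};

/* An object whose behavior depends on the methods table it was given. */
struct conn {
    const struct conn_methods *method;
};

/* One concrete implementation a user might associate with struct conn. */
static int v3_write(const char *buf, int len)
{
    (void)buf;
    return len;   /* pretend that every byte was written */
}

static const struct conn_methods v3_methods = { v3_write };

/* Hypothetical user-supplied binding from a debug type name to methods. */
struct type_binding {
    const char *debug_type;
    const struct conn_methods *methods;
};

/* Look up the methods for an object's debug type. NULL means no binding
 * exists, the case in which an error is reported and the path ends. */
static const struct conn_methods *
lookup_methods(const struct type_binding *tbl, size_t n, const char *debug_type)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].debug_type, debug_type) == 0)
            return tbl[i].methods;
    return NULL;
}
```

A lookup that returns NULL corresponds to the error that tells the user which additional type needs function pointers.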

For BIND, we found that most of these errors could be eliminated by specifying function

pointers for only six types: three for memory allocation, and three for internal database

implementations. For OpenSSL, we initially specified function pointers for only three ob-

jects: two related to support for multiple SSL/TLS versions, and one related to I/O. For our


evaluation in Chapter 6, we augmented uc-klee to support specifying regular expression

strings defining the calling contexts under which each function pointer should be invoked.

For example, functions whose names begin with tls1 should invoke TLS connection meth-

ods rather than older SSL methods. Specifying these regular expressions took additional

work, and we currently specify 14 sets of function pointers for OpenSSL. For the Linux

kernel, we did not specify any function pointers; instead, uc-klee “skips” calls to symbolic

function pointers (Chapter 6).
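Matching a calling context against a regular expression can be sketched with the POSIX regex.h interface. The rule structure, the target names, and the choice of POSIX extended syntax are assumptions for illustration; the text does not specify the format uc-klee accepts.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule: if the calling function's name matches the pattern,
 * use the named set of function pointers. */
struct ctx_rule {
    const char *caller_pattern;
    const char *target;
};

/* Return the target of the first rule whose pattern matches `caller`,
 * or NULL if no rule applies. Malformed patterns are skipped. */
static const char *resolve_by_context(const struct ctx_rule *rules, size_t n,
                                      const char *caller)
{
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, rules[i].caller_pattern,
                    REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        int matched = (regexec(&re, caller, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return rules[i].target;
    }
    return NULL;
}
```

With a rule whose pattern is ^tls1, any caller whose name begins with tls1 resolves to the TLS set of methods, mirroring the example in the text.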

While our current solution requires manual effort to specify function pointers and reg-

ular expressions for matching calling contexts, we hope that future work will explore more

automated approaches. For example, static function pointer alias analysis [43] or dynamic

instrumentation may allow function pointers to be discovered and assigned automatically.

However, these approaches are beyond the scope of this dissertation.

3.5 Limitations

This section enumerates the known limitations of uc-klee, excluding those inherited from

the underlying klee tool, which are listed in Section 2.3.

By directly executing functions without knowledge of their input preconditions, uc-

klee may explore infeasible execution paths and report false positives (spurious errors)

along these paths. If uc-klee erroneously fails to propagate a pointer referent and that

pointer is later dereferenced, uc-klee will terminate the path with an error. It will fail to

explore the remainder of the path and possibly miss true bugs.

uc-klee assumes that symbolic pointers do not alias each other or existing objects

(Section 3.2.1), so it may miss paths that require aliasing and fail to report true errors. It

may also report false positives in cases where it erroneously assumes that two pointers do

not alias, since writes to one pointer will not be returned by reads from the other. Finally,

uc-klee may choose a concrete size for a memory object that fails to expose a potential

out-of-bounds pointer dereference bug for that object. Adding support for symbolically-

sized objects to the evaluation in Chapter 6 largely mitigates this limitation, but bugs may

still be missed if they depend on objects outside the user-specified minimum and maximum

sizes.

Chapter 4

Basic equivalence checking

In this chapter, we describe how under-constrained symbolic execution and uc-klee can

be used to easily verify whether two C functions are equivalent [101]. This ability is useful

in many situations, such as checking: different implementations of the same (standardized)

interface, different versions of the same implementation, optimized routines against a ref-

erence implementation, and finding compiler bugs by comparing code compiled with and

without optimization. Comparing identical code against itself finds bugs in our own tool.

Previously, cross-checking code that takes inputs with complex invariants or complicated

data structures required tediously constructing these inputs by hand. From experience, the

non-trivial amount of code needed to do so can easily dwarf the size of the checked code

(e.g., as happens when checking small library routines). Manual construction also leads to

missed errors caused by over-specificity. For example, when manually building a linked list

containing symbolic data, should it have one entry? Two? A hash table should have how

many collisions and in which buckets? Creating all possible instances is usually difficult or

even impossible. In general, if input has many constraints, a human tester will miss one.

By contrast, using uc-klee is easy: rather than requiring users to manually construct

inputs or write a specification, they simply provide the tool with two routines (written

in raw, unannotated C) to cross-check. uc-klee automatically synthesizes the routines’

inputs (even for rich, nested data structures) and systematically explores a finite number

of their paths. It verifies that the routines produce identical results when fed identical

inputs on these explored paths by checking that they either (1) write the same values to all

escaping memory locations or (2) terminate with the same errors. If one path is correct,

then verifying equivalence proves the other is as well. If the tool terminates, then with some

caveats (discussed in Section 4.4), it has verified equivalence up to a given input size.

Because uc-klee leverages the underlying klee system to automatically explore paths

and reason about all values feasible on each path, it gives guarantees far beyond those of

traditional testing, yet it often requires less work than writing even a single test case. This

chapter shows that the approach works well even on heavily-tested code, by using it to cross-

check hundreds of routines in two mature, widely-used open source libc implementations,

where it:

1. Found numerous interesting errors.

2. Verified the equivalence of 300 routines (150 distinct pairs) by exhausting all their

paths up to a fixed input size (8 bytes).

3. Got high statement coverage — the lowest median coverage for any experiment was

90% and the rest were 100%.

A final contribution is a simple, novel trick for finding bugs in the compiler and checking

tool by turning the technique on itself, which we used to detect a serious llvm optimizer

bug and numerous errors in uc-klee.

4.1 Overview

Cross-checking implementations simplifies finding correctness violations because, rather

than requiring that users write a functional specification, it lets the tool use a second

implementation as a reference — functional differences will show up as mismatches. A nat-

ural concern is what happens on invalid inputs. In our experience, real code often shows

crash equivalence, where an illegal input causes the same behavior in both (e.g., when given

a null pointer, both cross-checked routines crash). uc-klee exploits this fact and treats

equivalent crashes as equivalent behavior, but flags when one implementation crashes and

the other does not. (In general, cross-checking cannot detect when two routines make

equivalent mistakes.) This finesse works well in practice. In the rare cases where inputs are

allowed to produce differing results, it is easy for simple, user-written C code to filter these

inputs (discussed further in Section 4.5).

uc-klee checks two C functions for semantic equivalence on a per-path basis. We define

semantic equivalence as having two components:


1. Error equivalence: both functions exit successfully or crash with the same type of

error.

2. Value equivalence: both functions write identical values to all reachable memory

locations. Pointer addresses may differ as long as the pointed-to objects contain

identical values (see Section 4.3).

If a path through both functions (i.e., for the same inputs) exhibits these two properties,

then uc-klee considers the paths equivalent. If uc-klee exhausts all execution paths

through the two functions, then it has verified that the two functions are equivalent up to

the given input size (with caveats described in Section 4.4).

We show how uc-klee works by walking through the simple but complete example in

Figure 4.1, which gives two trivial routines intended to add a value to a structure field and

the cross-checking harness that uc-klee generates to compare them. The user compiles

the routines using uc-klee’s compiler (llvm) and gives the resultant bitcode modules and

two routine names to uc-klee, which links the code against a checking harness and runs

the result. At a high level, the cross-checking harness executes as follows:

1. It marks all function parameters as containing under-constrained symbolic input (i.e.,

representing any possible values of the appropriate sizes). If any of this symbolic

input is used as a pointer and dereferenced, uc-klee will perform lazy initialization

(Section 3.2).

2. It uses symbolic execution to explore (ideally all) paths in the two implementations,

checking that they produce identical effects when run on the same values.

3. If a path’s constraints permit a value that causes an error (such as a division by zero,

null pointer dereference, or assertion failure), uc-klee verifies that the other routine

shows error equivalence (i.e., it terminates with the same error when run with the

same input values). uc-klee also forks execution and explores a path on which the

error does not occur so that it can cross-check the routine on the remaining values.

4. At the end of each path, uc-klee traverses all reachable memory and uses an SMT

solver to prove value equivalence (i.e., that the functions’ outputs have equivalent

contents at the end of both paths). If this check fails, it generates a concrete input

to demonstrate the difference. If the check succeeds, then with some caveats (see


Section 4.4), uc-klee has verified the two routines as equivalent on that path since

the constraints it tracks are accurate and exact (down to the level of a single bit).

Thus, if one path is correct, uc-klee has verified that the other path is correct as

well.

Note that uc-klee’s equivalence guarantee only holds on the finite set of paths that it

explores. Like traditional testing, it cannot make statements about paths it misses. How-

ever, in many cases, even if there are too many paths, uc-klee can at least show total

equivalence up to a given input size.

At a more detailed level, the code in Figure 4.1 works as follows:

Lines 14–18 Stack allocates two variables to pass as the routine’s parameters (f and v)

and marks them as symbolic.

Line 21 Creates a copy of the current address space, which it will restore later so that add_bad runs on identical values.

Line 22 Uses klee_eval to run add. This call returns once for each path explored in add. If add terminates with an error, the error is stored in e1.

Line 3 At the first dereference f→val, uc-klee checks if f can be an invalid pointer (i.e., f_referent = 0). See Sections 3.1–3.2 for a complete discussion on pointer referents. Since f and f_referent have no constraints on their values, f may be an invalid pointer, so uc-klee forks execution and continues as follows:

Error path Adds the constraint that f_referent = 0, records in e1 that a null dereference error occurred, and returns from klee_eval.

Non-error path Adds the constraint that f_referent ≠ 0 and attempts to resolve the dereference. It determines that f is an unbound symbolic pointer, so it allocates memory, marks it as symbolic, binds it to f, and continues executing until the path completes. It then returns from klee_eval.

Line 22 (after klee_eval) The two paths execute independently through the remaining code.

Line 23 Records the memory state produced by running add, which it later compares against the memory state produced by running add_bad.


 1: // two routines to cross-check.
 2: int add(foo *f, int v) {
 3:   f->val = f->val + v;
 4:   return f->val;
 5: }
 6: int add_bad(foo *f, int v) {
 7:   f->val = f->val + 1;
 8:   return f->val;
 9: }
10:
11: // harness provided by uc-klee
12: void main() {
13:   klee_err e1,e2;
14:   int retv, v;
15:   foo *f;
16:
17:   klee_make_symbolic(&f);
18:   klee_make_symbolic(&v);
19:
20:   // record memory state "add" runs on.
21:   int s0 = klee_snapshot();
22:   klee_eval(retv = add(f,v), &e1);
23:   int s1 = klee_snapshot();
24:
25:   // discard writes, keep path constraints
26:   klee_restore(s0);
27:   klee_eval(retv = add_bad(f,v), &e2);
28:   int s2 = klee_snapshot();
29:
30:   // compare results.
31:   if (!klee_compare_errors(&e1,&e2)
32:       || !klee_compare(s1, s2, &f)
33:       || !klee_compare(s1, s2, &v)
34:       || !klee_compare(s1, s2, &retv))
35:     klee_error("Mismatch!\n");
36: }

Figure 4.1: Trivial but complete cross-checking example.

Line 26 Restores the values of f and v that the current path ran add on so that add_bad runs on identical values. It discards all writes add performed (otherwise add_bad would run with a modified value for f→val), but it preserves all constraints, including any pointers it lazily bound (i.e., the dereference of f on line 3).

Line 27 Evaluates add_bad using klee_eval. The error path also returns with an invalid pointer error (since the path has the constraint f_referent = 0 and line 7 dereferences f). The non-error path executes without error; the dereferences of pointer f (lines 7–8) resolve to the same object lazily allocated at line 3.

Line 31 Checks that both paths returned with the same error state (they did).

Lines 32–34 Checks that the values transitively reachable from the routines’ outputs in

each memory state are equivalent (Section 4.3 describes this analysis in more detail). On

the non-error path, the check for f (line 32) fails and produces a test case with v equal to

some value other than 1 (the single value for which both routines return identical results).

Notes While the example declares the input variables f and v with their static types for

readability, as far as uc-klee is concerned they could have been untyped byte arrays (which

is how uc-klee treats them in any case) since our implementation correctly handles casting

between pointers and integers.
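The walk-through above can be mimicked with ordinary concrete values. The sketch below is only an illustration and not how uc-klee is implemented: struct copies stand in for klee_snapshot and klee_restore, and the names err, eval, and equivalent_on are hypothetical helpers introduced here. A NULL initial state models the error path, and a non-NULL one models the non-error path.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int val; } foo;
typedef enum { ERR_NONE, ERR_NULL_DEREF } err;   /* stand-in for klee_err */

/* The two routines from Figure 4.1. */
static int add(foo *f, int v)     { f->val = f->val + v; return f->val; }
static int add_bad(foo *f, int v) { f->val = f->val + 1; return f->val; }

/* Stand-in for klee_eval: report a null dereference instead of crashing. */
static err eval(int (*fn)(foo *, int), foo *f, int v, int *retv)
{
    if (f == NULL)
        return ERR_NULL_DEREF;
    *retv = fn(f, v);
    return ERR_NONE;
}

/* Run both routines from the same initial state and compare the results.
 * Passing initial == NULL models the error path. */
static bool equivalent_on(const foo *initial, int v)
{
    foo s1 = {0}, s2 = {0};      /* private copies play the role of snapshots */
    int r1 = 0, r2 = 0;

    if (initial != NULL) {
        s1 = *initial;
        s2 = *initial;
    }

    err e1 = eval(add,     initial ? &s1 : NULL, v, &r1);
    err e2 = eval(add_bad, initial ? &s2 : NULL, v, &r2);

    if (e1 != e2)
        return false;            /* error equivalence violated */
    if (e1 != ERR_NONE)
        return true;             /* both failed in the same way */
    return r1 == r2 && s1.val == s2.val;   /* value equivalence */
}
```

With an initial value of 10, equivalent_on reports a match only for v = 1, mirroring the test case described for line 32. Unlike this concrete replay, uc-klee reasons about all values of v on the path at once.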


4.2 Path pruning

When checking the equivalence of two functions, paths that execute identical sequences

of instructions through both functions are trivially equivalent. If the two functions are

from closely-related versions of the code (e.g., before and after a patch is applied), a high

percentage of execution paths through the two functions may be trivially equivalent.

For performance, uc-klee avoids executing paths with identical llvm instruction se-

quences by pruning such paths. The tool includes a static cross-checker that initially walks

over the llvm control flow graph and conservatively marks regions of basic blocks that differ

between the two functions. This algorithm is fairly straightforward, and we elide details

for brevity. As uc-klee symbolically executes paths through the two functions, it soundly

prunes (silently terminates) paths when:

1. All previously executed basic blocks are identical.

2. All reachable basic blocks are identical.

The second condition uses an inter-procedural basic block reachability analysis that we

integrated into uc-klee. Paths meeting both of these criteria are safe to prune because

they do not execute any differing instructions and therefore produce the same results or

trigger the same errors.
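The two pruning conditions can be sketched as a predicate over a toy control flow graph. This is a simplified illustration under stated assumptions: the cfg type, the differs flags, and can_prune are hypothetical names, the graph is a small adjacency matrix, and reachability is computed within a single function rather than inter-procedurally as in uc-klee.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_BLOCKS 16

/* Toy control flow graph: succ[a][b] is true if block a can branch to b.
 * differs[b] is true if a static cross-checker marked block b as
 * differing between the two functions. */
typedef struct {
    int  nblocks;
    bool succ[MAX_BLOCKS][MAX_BLOCKS];
    bool differs[MAX_BLOCKS];
} cfg;

/* Depth-first search: can a differing block be reached from cur
 * (including cur itself)? */
static bool reaches_difference(const cfg *g, int cur, bool *seen)
{
    if (seen[cur])
        return false;
    seen[cur] = true;
    if (g->differs[cur])
        return true;
    for (int next = 0; next < g->nblocks; next++)
        if (g->succ[cur][next] && reaches_difference(g, next, seen))
            return true;
    return false;
}

/* A path may be pruned when (1) every block executed so far was identical
 * and (2) no differing block is reachable from the current block. */
static bool can_prune(const cfg *g, bool executed_difference, int cur)
{
    bool seen[MAX_BLOCKS];
    memset(seen, 0, sizeof seen);
    return !executed_difference && !reaches_difference(g, cur, seen);
}
```

For a diamond-shaped graph in which only one branch differs, a path that takes the identical branch without having executed a differing block can be pruned, while a path still sitting at the branch point cannot.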

4.3 Object Comparison

We define two routines as being equivalent on a path if they write identical values to all

memory objects transitively reachable from their return value and each of their formal ar-

guments. That is, pointer values (addresses) can differ as long as the objects they point to

contain identical values. uc-klee checks this property by doing a mark and sweep of all

reachable memory and using the constraint solver to prove that all non-pointer bytes are

equal. In the concrete case, comparisons reduce to constants, avoiding expensive satisfia-

bility queries. For symbolic bytes that neither routine modifies, the values in each address

space snapshot contain identical symbolic expressions, which are trivially equivalent. If uc-

klee detects a pointer, it adds the referenced objects (from each snapshot) to a queue for

later traversal, rather than comparing the objects’ addresses, which may differ between the

two procedures. In the case of pointers stored into memory at symbolic offsets, it is possible


for a particular value to resolve to multiple objects. In this case, uc-klee examines every

pair of objects (Cartesian product) to which the two pointers could resolve. If a single pair

of objects differs, uc-klee flags the error. uc-klee also verifies that the pointers are at

identical offsets into their respective objects and flags any discrepancies as non-equivalences.

To identify which values correspond to pointers, uc-klee examines the referents stored

in the shadow memory for each heap object (see Section 3.1). If the referents for both

values are zero or unbound symbolic input, uc-klee issues a query to verify that both

values must be identical. If the values have non-zero referents or have been bound to an

object, uc-klee adds the referenced objects to the traversal queue and skips the remaining

bytes in that machine word.
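The idea that pointer addresses may differ while the pointed-to contents must match can be illustrated with a concrete comparison. The sketch below makes several simplifying assumptions: each object has one scalar field and one pointer field, all values are concrete (so no solver is needed), and the node type and graphs_equal function are hypothetical names, not part of uc-klee.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAIRS 64

typedef struct node {
    int          value;   /* non-pointer field: compared by value */
    struct node *next;    /* pointer field: followed, never compared */
} node;

/* Compare the objects reachable from a and b. Addresses are ignored;
 * only the shape of the chain and the scalar contents must agree. */
static bool graphs_equal(const node *a, const node *b)
{
    const node *seen_a[MAX_PAIRS], *seen_b[MAX_PAIRS];
    size_t nseen = 0;

    while (a != NULL && b != NULL) {
        /* Stop if this pair was already compared (handles cycles). */
        for (size_t i = 0; i < nseen; i++)
            if (seen_a[i] == a && seen_b[i] == b)
                return true;
        if (nseen == MAX_PAIRS)
            return false;             /* give up conservatively */
        seen_a[nseen] = a;
        seen_b[nseen] = b;
        nseen++;

        if (a->value != b->value)
            return false;
        a = a->next;
        b = b->next;
    }
    return a == NULL && b == NULL;    /* both chains must end together */
}
```

Two separately allocated lists holding the same values compare equal even though every next pointer differs, while a single differing scalar anywhere along the chain is reported as a mismatch.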

4.4 Limitations

uc-klee’s bounded equivalence verification is subject to a number of caveats and limita-

tions, including those enumerated in Section 3.5. In addition, uc-klee can only verify

the equivalence of functions that have identical signatures, including argument types and

return values. In particular, uc-klee cannot verify the equivalence of functions that have

arguments of differing C struct types because uc-klee currently requires inputs to the

two functions to be byte-for-byte identical. We hope that future work will extend uc-klee

to support such differences by implementing a type map that supplies identical inputs to

each function in a “field aware” manner. Alternatively, the user could supply a simple

translation function written in C, which would only require a minimal modification to our

system. However, our current system does not support either of these approaches, and we

excluded such function pairs from our experimental evaluations.

If uc-klee incorrectly propagates pointer referents (Section 3.1), it may misidentify

pointers during object comparison (Section 4.3). If uc-klee mistakenly identifies a pointer

as a scalar, it may emit spurious errors based on pointer address mismatches, or it may

hide true errors by not reaching and comparing the pointed-to memory objects. If uc-klee

propagates a non-zero referent for a scalar value, it will only treat it as a pointer if the

address resolves within the object identified by the referent. We believe this additional

check would prevent most missed errors and false negatives caused by over-permissiveness

in our referent propagation policy (Section 3.1.1).

During cross-checking, we only invoke a routine a single time and check it in isolation,


missing behaviors that require multiple invocations or coordination across routines. Such

behaviors typically involve state stored in global or static variables, which we do not treat

as symbolic inputs because there may not be a one-to-one mapping between the global and

static variables used by two different implementations of a function. This limitation does

not apply in Chapter 5, when we consider closely-related versions of code and do treat global

and static variables as under-constrained symbolic inputs.

In addition, uc-klee does not identify the root causes of bugs. It may assume error

equivalence if both versions of a function have (different) bugs that cause the functions to

crash for a given input. In this case, it would not report either error to the user. Finally,

uc-klee aims to detect crashing bugs and does not look for performance bugs, differences

in most system call arguments, or concurrency errors.

4.5 Annotation filters

A tight specification that maps each input value to a single output value provides the

simplest use case for uc-klee since any difference between implementations constitutes

a bug. For looser specifications that include “don’t cares,” user effort may be needed to

suppress uninteresting differences that uc-klee would otherwise report. Examples include

permitting code to do anything when fed illegal input values or representing “success”

by any non-zero integer rather than a single, specific value (e.g., one). Note that the

problems caused by permitting flexibility are not specific to uc-klee—any method (such

as randomized or specification-based testing) that checks output values or behavior has to

deal with them.

uc-klee provides a simple yet general mechanism for eliminating uninteresting mis-

matches. Instead of invoking checked code directly (lines 22 and 27 in Figure 4.1), it passes

the checked routine and its arguments to a user-supplied function, which calls the checked

routine after filtering its input (e.g., by using an if-statement to skip illegal values) and then

returns the (possibly canonicalized) return value.

Figure 4.2 shows an example filter for the isdigit function in the C library, specified

to return non-zero if its input represents a digit in ASCII and 0 otherwise. The filter first

rejects input values that fall outside the range specified in the C standard (line 2). It

then invokes the passed-in isdigit function (line 4) and canonicalizes all non-zero return

values to 1. In our experiments, this routine eliminated all mismatches for isdigit and 11


1: int isdigit_f(int (*f)(int), int c) {
2:   if (c < EOF || c > 255)
3:     return 0;
4:   return ((*f)(c) != 0);
5: }

Figure 4.2: Simple filter routine.

analogous routines.
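To see what the filter buys, consider two standard-conforming isdigit variants that encode "true" differently. The sketch below is illustrative only: isdigit_a, isdigit_b, and count_mismatches are hypothetical names, and the comparison runs over a concrete input range instead of symbolic values.

```c
#include <stdio.h>   /* EOF */

/* Two hypothetical isdigit implementations. Both are allowed by the C
 * standard, but they return different non-zero values for digits. */
static int isdigit_a(int c) { return c >= '0' && c <= '9'; }
static int isdigit_b(int c) { return (unsigned)(c - '0') < 10u ? 0x800 : 0; }

/* Filter from Figure 4.2. */
static int isdigit_f(int (*f)(int), int c)
{
    if (c < EOF || c > 255)
        return 0;
    return (*f)(c) != 0;
}

/* Count inputs on which the two routines disagree, with or without
 * the filter in front of them. */
static int count_mismatches(int use_filter)
{
    int mismatches = 0;
    for (int c = -1000; c <= 1000; c++) {
        int ra = use_filter ? isdigit_f(isdigit_a, c) : isdigit_a(c);
        int rb = use_filter ? isdigit_f(isdigit_b, c) : isdigit_b(c);
        if (ra != rb)
            mismatches++;
    }
    return mismatches;
}
```

Without the filter, the ten digit characters are reported as mismatches (1 versus 0x800). With the filter, the return values are canonicalized and no mismatches remain.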

In practice, even if a specification permits variable behavior, code tends to behave sim-

ilarly. In fact, the most widespread use for uc-klee we envision—checking new versions

of code against old versions—suffers from this problem the least since such decisions are

consistent across revisions. Even where we would expect the most variance in behavior—

independently-developed codebases fed error inputs—implementations tend to behave simi-

larly. For example, in our experiments, checked routines typically crashed on illegal pointer

inputs rather than returning differing values.

Many of the differences uc-klee found illustrated needless ambiguities in the underlying

standard, which permitted divergent behavior without a subsequent gain in speed, power,

or simplicity. We believe uc-klee may be applicable as a tool for automatically finding

such specification imprecisions.

In a sense, uc-klee inverts the typical work factor for checking code: a traditional

specification-based approach requires specifying what behavior the user cares about (i.e.,

the functionality the code should implement), whereas uc-klee infers this information

“for free” by cross-checking implementations. On the other hand, uc-klee may require

specifying the “don’t care” behaviors (when code is allowed to differ), which typically takes

orders of magnitude less effort than specifying functionality. Further, users only need to

specify these details on demand, after uc-klee detects an uninteresting mismatch. By

contrast, specification verification requires non-trivial work before doing any checking.

4.6 Evaluation

This section shows that uc-klee works well at verifying equivalence by cross-checking re-

cent versions of two heavily-tested open source C libraries: uClibc [115], an implementation

of the C standard library targeted at embedded devices, and Newlib [91], an embedded

libc implementation by Red Hat used by Cygwin [36] and Google Native Client [122]. We


demonstrate its effectiveness on three common use cases, cross-checking: different imple-

mentations of the same interface, different versions of the same code, and identical code to

find errors in the verification toolchain (in our case: the llvm compiler and uc-klee itself).

We measure the quality of cross-checking in two ways: (1) crudely, by the statement

coverage it achieves, and (2) by whether checking exhausts all paths and terminates, since

that verifies that the routines are equivalent up to a fixed input size when invoked a single

time (modulo the limitations discussed in Section 4.4).

It is a bit tricky to measure statement coverage for library code. We compute coverage of

a cross-checked routine as a percentage of the total number of llvm instructions reachable

from it, with the exception that when routine a calls another exported routine b that we

will also cross-check, we exclude b’s instructions from a’s coverage statistics. Usually, such

calls can only exercise a small fraction of b’s code (e.g., when a calls printf with a format

string that just contains “hello world”). On the other hand, if a calls c and we do not

generate a test harness for c, we do count its instructions since we conservatively assume it

is an internal helper function that a should thoroughly exercise. Note: every instruction is

included in the coverage statistics for at least one procedure.

For all experiments, we ran uc-klee on each function for up to 10 minutes and al-

lowed each function to read from up to 2 symbolic files of 10 bytes each (klee argument

--sym-files 2 10). This was in addition to providing under-constrained symbolic values

for each function argument. While we did not treat global variables as symbolic inputs (be-

cause each implementation used a different set of globals), we did mark the global environ

pointer as an under-constrained symbolic input in order to explore paths that access envi-

ronment variables. For this evaluation, our machine ran Fedora Linux 12 and featured a

quad-core 2.8 GHz Intel i7-930 processor and 12GB of RAM.

4.6.1 Different implementations: Newlib vs. uClibc

Our first experiment cross-checks Newlib’s source repository from July 2010 against uClibc

version 0.9.31. We modified both libraries to use uc-klee’s memory allocator. We also

disabled several uClibc internal startup and shutdown tasks that interfered with uc-klee.

Finally, to keep the experiments manageable, we disabled optional features, such as wide

character and locale support.

We automatically generated a test harness for each routine that both libraries implemented with the exception of variadic routines or those whose prototypes differed. We could extend our system to support the former by generating multiple test cases for different numbers of arguments. Our experiment tested all other exported procedures, even those that demonstrate weaknesses in our tool.

[Figure 4.3: two bar charts. Left panel: Newlib vs. uClibc 0.9.31. Right panel: uClibc 0.9.30.3 vs. uClibc 0.9.31. x-axis: Procedures; y-axis: Instruction Coverage (%). Legend: Verified, Different, Partially verified.]

Figure 4.3: Instruction coverage reported by our cross-checking experiments. Each vertical bar represents a single procedure, sorted by coverage. The “partially verified” category includes routines whose analysis did not complete within 10 minutes or hit a limitation in klee or uc-klee. The median statement coverage for the left graph was over 90% (59 routines had 100%) and for the right was 100% (105 had 100%).

Figure 4.3 (left) shows the coverage reported by uc-klee. In the routines where uc-

klee found no differences, it checked 66 to termination (versus 15 where it exceeded the time

limit), thereby verifying equivalence for the given input size, despite many having entirely

different structure and overall appearance to the human eye. The two implementations of

ffs (“find first bit set”) in Figure 4.4 are a simple example: uc-klee exhausted all 33 paths

in the test harness and terminated after 6.8 seconds, reaching 100% statement coverage.
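As a concrete companion to the symbolic result, the following sketch spot-checks two ffs variants written in the style of Figure 4.4 on one representative input per path: zero, plus one word for each possible position of the lowest set bit. It is an approximation, not the library code: the functions take uint32_t rather than int to avoid undefined shifts, and the names ffs_loop, ffs_binary, and ffs_spot_check are hypothetical.

```c
#include <stdint.h>

/* Loop-based version, in the style of Figure 4.4(a). */
static int ffs_loop(uint32_t w)
{
    if (!w)
        return 0;
    for (int i = 0; ; i++)
        if ((UINT32_C(1) << i) & w)
            return i + 1;
}

/* Binary-search version, in the style of Figure 4.4(b). */
static int ffs_binary(uint32_t i)
{
    int n = 1;
    if (!(i & 0xffff)) { n += 16; i >>= 16; }
    if (!(i & 0xff))   { n += 8;  i >>= 8;  }
    if (!(i & 0x0f))   { n += 4;  i >>= 4;  }
    if (!(i & 0x03))   { n += 2;  i >>= 2;  }
    return i ? n + (int)((i + 1) & 0x01) : 0;
}

/* Returns the number of tested inputs on which the two versions differ. */
static int ffs_spot_check(void)
{
    int mismatches = ffs_loop(0) != ffs_binary(0);
    for (int k = 0; k < 32; k++) {
        /* Lowest set bit at position k, with all higher bits set too. */
        uint32_t w = UINT32_MAX << k;
        if (ffs_loop(w) != ffs_binary(w))
            mismatches++;
    }
    return mismatches;
}
```

This check exercises 33 inputs, one per path, whereas uc-klee covers every value that can follow each of those paths.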

Figure 4.5 illustrates a more complex example: two implementations of strlen for

which uc-klee exhaustively verifies 16 paths (for eight-byte inputs) in under 0.5 seconds.

uClibc (Figure 4.5(b)) provides a standard, unoptimized implementation, while Newlib

(Figure 4.5(a)) provides an optimized version that uses aligned, word-sized NULL checks

(lines 27–28) to reduce the number of loop iterations in many cases. Lines 17–22 iterate

over the unaligned bytes in the string preceding the first word boundary. When uc-klee

executes line 17, it forks execution and adds the constraint (str & 7) ≠ 0 to the true path.

When that path executes line 19 and dereferences the unbound symbolic pointer str for the


int ffs (int word) {
  int i=0;
  if (!word)
    return 0;
  for (;;)
    if (((1 << i++)&word) != 0)
      return i;
}

(a) Newlib

int ffs(int i) {
  char n = 1;
  if (!(i & 0xffff)) { n += 16; i >>= 16; }
  if (!(i & 0xff)) { n += 8; i >>= 8; }
  if (!(i & 0x0f)) { n += 4; i >>= 4; }
  if (!(i & 0x03)) { n += 2; i >>= 2; }
  return (i) ? (n+((i+1) & 0x01)) : 0;
}

(b) uClibc

Figure 4.4: Two implementations of ffs (“find first set bit”) that uc-klee verifies as equivalent.

first time, uc-klee finds a concrete address for the new object that satisfies this constraint

(i.e., is unaligned). Once this concrete address has been assigned to the object (and str

is bound to that address), subsequent iterations of the loop do not fork additional paths

at line 17. Consequently, uc-klee considers one but not all possible unaligned addresses

(recall from Section 2.3 that concrete addresses are a limitation of our implementation).

The word-sized NULL check at line 27 presented us with an interesting dilemma while

we implemented uc-klee. Technically, these word-sized reads may extend past the end of

the memory object, which, strictly speaking, violates the C specification and constitutes an

error. However, this was intentional behavior (as we confirmed with the Newlib developers),

and no mainstream architecture would throw a page fault as a result of reading additional

bytes within the same machine word. We resolved this issue by adding an option to uc-

klee to “tolerate” such out-of-bounds reads (within the same machine word) rather than

generating an error, which worked well in practice. Valgrind’s memcheck tool [90] provides

a similar option (--partial-loads-ok).
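The word-sized check discussed above relies on a well-known bit trick. The sketch below restates the DETECTNULL test from line 5 of Figure 4.5(a) with fixed-width types so it can be tried in isolation; has_zero_byte and load_word are hypothetical helper names, and a 64-bit word is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if the 64-bit word x contains a zero byte. Subtracting 0x01
 * from each byte sets that byte's high bit only if the byte was zero or
 * already had its high bit set; masking with ~x removes the second case. */
static int has_zero_byte(uint64_t x)
{
    return ((x - UINT64_C(0x0101010101010101)) & ~x
            & UINT64_C(0x8080808080808080)) != 0;
}

/* Load eight bytes into a word without any alignment requirement. */
static uint64_t load_word(const char *bytes)
{
    uint64_t w;
    memcpy(&w, bytes, sizeof w);
    return w;
}
```

A word built from "abcdefgh" contains no zero byte, while one built from a buffer with an embedded terminator does, which is what lets the loop on lines 27–28 skip eight bytes per iteration.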

uc-klee found differences in 57 of the 143 functions checked, at least 7 of which were

real bugs—despite the code being heavily tested, actively-used, and designed to do well-

understood tasks. One interesting example was an error in Newlib’s implementation of

remove (Figure 4.6), which the POSIX standard mandates should work for both files and

directories. uc-klee detects that Newlib returns -1 (error) while uClibc returns zero

(success) when the symbolic input filename could refer to a directory. This error would

be difficult to detect statically.

We achieved high statement coverage in most but not all procedures. One common

cause of low coverage is that we only cross-check code using a single invocation. In certain

cases, multiple invocations of a routine are required in order to reach additional code. In

other cases, one routine may write values to global or static variables that are read by


 1: #define LBLOCKSIZE (sizeof (long))
 2: #define UNALIGNED(X) ((long)X & (LBLOCKSIZE - 1))
 3:
 4: /* Nonzero if X (a long int) contains a NULL byte. */
 5: #define DETECTNULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
 6:
 7: size_t
 8: _DEFUN (strlen, (str),
 9:         _CONST char *str)
10: {
11:   _CONST char *start = str;
12:
13: #if !defined(PREFER_SIZE_OVER_SPEED) && !defined(__OPTIMIZE_SIZE__)
14:   unsigned long *aligned_addr;
15:
16:   /* Align the pointer, so we can search a word at a time. */
17:   while (UNALIGNED (str))
18:     {
19:       if (!*str)
20:         return str - start;
21:       str++;
22:     }
23:
24:   /* If the string is word-aligned, we can check for the presence of
25:      a null in each word-sized block. */
26:   aligned_addr = (unsigned long *)str;
27:   while (!DETECTNULL (*aligned_addr))
28:     aligned_addr++;
29:
30:   /* Once a null is detected, we check each byte in that block for a
31:      precise position of the null. */
32:   str = (char *) aligned_addr;
33:
34: #endif /* not PREFER_SIZE_OVER_SPEED */
35:
36:   while (*str)
37:     str++;
38:   return str - start;
39: }

(a) Newlib

size_t strlen(const Wchar *s)
{
  register const Wchar *p;

  for (p=s ; *p ; p++);

  return p - s;
}

(b) uClibc

Figure 4.5: Two implementations of strlen that uc-klee verifies as equivalent.


int _remove_r(struct _reent *ptr,
              const char *filename) {
  if (_unlink_r (ptr, filename) == -1)
    return -1;
  return 0;
}

int remove(const char *filename) {
  return _remove_r(_REENT, filename);
}

(a) Newlib

int remove(const char *filename)
{
  int saved_errno = errno;
  int rv;
  rv = rmdir(filename);
  if ((rv < 0) && (errno == ENOTDIR)) {
    __set_errno(saved_errno);
    rv = unlink(filename);
  }
  return rv;
}

(b) uClibc

Figure 4.6: uc-klee detects that Newlib does not handle directory removal correctly.

another. A good example of both is atexit, which registers routines to be run on program

termination by exit. A simple extension would allow uc-klee to handle such cases.

4.6.2 Different versions of the same implementation: uClibc

To measure uc-klee’s effectiveness at cross-checking different versions of the same code,

we used it to compare all functions that appeared in both uClibc 0.9.30.3 (March 2010)

and uClibc 0.9.31 (April 2010) that were not byte-code identical. This selection yielded 203

routines (out of 399 possible), each of which uc-klee analyzed for up to 10 minutes.

Figure 4.3 (right) plots the instruction coverage. uc-klee revealed 2 previously un-

known bugs and also detected 5 instances of bugs that were patched in the newer release.

We elide a detailed discussion for brevity and instead provide an example for each.

The newer version of uClibc introduced a bug in ctime (used to convert a time record

to a string). The older version used a persistent internal structure (i.e., static) for storage

that lacked thread safety. The newer version instead used a stack-allocated buffer that it

never initialized. A sufficiently large input value caused the returned string to differ, which

uc-klee detected.

uc-klee confirmed that a number of bugs present in uClibc 0.9.30.3 were corrected

in version 0.9.31. One example is unsetenv (Figure 4.7). The old code (Figure 4.7(a))

terminated with a NULL pointer dereference when environ is NULL (e.g., after a call to

the function clearenv), while the new code (Figure 4.7(b)) exited gracefully.


char **ep = environ;
while (*ep != NULL) { ... }

(a) uClibc 0.9.30.3

char **ep = environ;
if (ep) while (*ep != NULL) { ... }

(b) uClibc 0.9.31

Figure 4.7: Bug fixed in uClibc 0.9.31 confirmed by uc-klee

4.6.3 Checking the checker: finding bugs in uc-klee and llvm

A standard caveat in software verification is that claims are contingent on the correctness

of the verifier and underlying compiler implementations. One of our contributions is the

realization that one can detect errors in both by simply attempting to prove the equivalence

of identical code, thus turning the verification system on itself.

Finding compiler optimizer bugs

We check that an optimizer has correctly transformed a program path by compiling the

same routine both with and without optimization and cross-checking the results. With the

usual caveats, if any possible value exists that would cause the path to give different results,

uc-klee will detect it. If there is no such value, it has verified that the optimizer worked

correctly on the checked path. If it terminates, it has shown that the optimizer transformed

the entire routine correctly. Any discrepancies it finds are due to either compiler bugs or

the routine depending on unspecified behavior (e.g., function argument evaluation order).

Because the library code we checked intends to be portable, even use of unspecified compiler

behavior almost certainly constitutes an error.

We compared all 622 procedures in uClibc 0.9.31, compiled with no optimization (-O0)

versus high optimization (-O3). This check uncovered at least one bug in llvm 2.6’s opti-

mizer but did not expose its root cause. For memmem, uc-klee reported a set of concrete

inputs where the unoptimized code returned an offset within haystack (the correct result),

while the optimized code returned NULL, indicating that needle was not found in haystack.

We confirmed this bug with a small program. Since llvm is a mature, production compiler,

the fact that we immediately found bugs in this simple way is a strong result. We found a

total of 70 differences, but because of time constraints could not determine whether they

were due to this bug or others. Future work will be necessary to test optimization levels be-

tween these two extremes and attempt to automatically find a minimal set of optimization

passes that yield an observable difference.


                            Newlib/    uClibc      llvm         uc-klee
                            uClibc     Versions    Optimizer    Self-check

Procedures Checked          143        203         622          622

Procedures Verified         66         84          335          335
Differences Detected        57         20          70           12
No Differences (timeout)    15         30          85           91
klee Limitations            4          56          94           147
uc-klee Limitations         1          13          38           37

100% Coverage               59         105         367          375
Mean Coverage               72.2%      80.7%       85.6%        85.6%
Median Coverage             90.1%      100.0%      100.0%       100.0%

Table 4.1: Breakdown of procedures checked in each experiment.

Finding uc-klee bugs

In general, tool developers can detect verifier bugs by simply cross-checking a routine against

another identical copy of itself (i.e., compiled at the same optimization level). This check

has been a cornerstone of debugging uc-klee—it often turned up tricky errors after devel-

opment pushes.

The uc-klee bugs we found fell into two main categories: (1) unwanted non-determinism

in uc-klee and its POSIX model, which makes it hard to replay paths after backtracking

(Section 3.2.3) or get consistent results, and (2) bugs in our initial pointer tracking ap-

proach. In fact, as a direct result of the tricky cases cross-checking exposed in this pointer

tracking implementation, we threw it away and instead designed the much simpler and

more robust referent-based approach described in Section 3.1.

4.6.4 Results summary

Table 4.1 summarizes the results presented in this section. The “klee Limitations” row

describes procedures that resulted in incomplete testing due to limitations in the underlying

klee tool: inline assembly (141 procedures), external calls with symbolic arguments (206),

and unresolved external calls (17). “uc-klee Limitations” are cases where the tool failed to

lazily allocate objects either because the required size of the object exceeded our specified

maximum of 2KB (20 procedures) or uc-klee was unable to allocate an object whose

address satisfied the path constraints (117 procedures). Note that limitations resulted in

individual paths being terminated. As a result, certain procedures encountered a variety of

limitations on different paths. In particular, a procedure deemed “klee limited” may have

also encountered uc-klee limitations, although the converse is not true.

Chapter 5

Scalable equivalence checking

This chapter focuses on an important use case of uc-klee: checking that patches do not

introduce new crashes. We first describe techniques for pruning paths and mitigating the

effects of false positives (spurious errors), followed by an experimental evaluation of uc-klee

on over 800 patches from BIND and OpenSSL, which found 12 bugs and exhaustively verified

(with caveats) that 115 patches do not introduce crashes. Finally, we demonstrate that uc-

klee can discover portability bugs by cross-checking functions compiled with different build

configurations.

5.1 Patch checking

To check whether a patch introduces new crashing bugs, uc-klee cross-checks two versions

of a function: P , the unpatched version, and P ′, the patched version. If it finds any

execution paths in which P ′ crashes but P does not (when given the same symbolic inputs),

then there may be a bug in the patch. If uc-klee exhausts all execution paths through the

two functions and finds no paths on which P ′ crashes but P does not, then it has verified

(with caveats) that the patch does not introduce any new crashes.
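
The crash-equivalence check above can be sketched on a toy scale. The example below is only an illustration under stated assumptions, not uc-klee's implementation: it replaces symbolic inputs with a brute-force loop over a small integer range, models a crash as a flag, and uses two hypothetical versions (lookup_old as P, lookup_new as P ′) invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of running one version on one concrete input. */
typedef struct {
    bool crashed; /* stands in for a memory error or assertion failure */
    int  value;   /* return value when the run did not crash */
} outcome_t;

typedef outcome_t (*version_fn)(int input);

static const int table[4] = {10, 20, 30, 40};

/* Hypothetical unpatched version P: rejects every out-of-range index. */
outcome_t lookup_old(int idx) {
    outcome_t out = {false, -1};
    if (idx >= 0 && idx < 4)
        out.value = table[idx];
    return out;
}

/* Hypothetical patched version P': the rewritten check forgets idx < 0. */
outcome_t lookup_new(int idx) {
    outcome_t out = {false, -1};
    if (idx >= 4)
        return out;
    if (idx < 0)
        out.crashed = true; /* a real run would read before the table */
    else
        out.value = table[idx];
    return out;
}

/* Counts inputs in [lo, hi] on which P' crashes but P does not, and
 * stores the first such input in *witness (if witness is non-NULL). */
int cross_check(version_fn p, version_fn p_new, int lo, int hi, int *witness) {
    int found = 0;
    assert(lo <= hi);
    for (int in = lo; in <= hi; in++) {
        outcome_t before = p(in);
        outcome_t after = p_new(in);
        if (after.crashed && !before.crashed) {
            if (found == 0 && witness != NULL)
                *witness = in;
            found++;
        }
    }
    return found;
}
```

Calling cross_check(lookup_old, lookup_new, -3, 5, &w) flags the three negative indices and leaves w at -3; checking a version against itself reports nothing. Symbolic execution replaces the loop with one query per path, which is what makes the approach exhaustive rather than sampled.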

To cross-check the two versions, uc-klee uses an automatically-generated test harness,

as described in Section 4.1. In addition to generating under-constrained symbolic inputs

for the functions’ arguments, however, uc-klee also generates under-constrained symbolic

inputs for global and static variables that the two versions have in common. Since the

two versions are closely-related, marking these variables as symbolic allows uc-klee to

exhaustively explore paths through the two functions, even for functionality that would


otherwise require multiple calls to the functions or involve interactions with other functions.

In Chapter 4, we used uc-klee to verify the equivalence of small library routines, both

in terms of error equivalence (crashes) and value equivalence (outputs). While detecting

differences in output may point to interesting bugs, these discrepancies are typically mean-

ingful only to the developers of the checked code. Because this chapter evaluates uc-klee

on hundreds of patches from large, complex systems developed by third parties, we limit

our discussion to crashes, which objectively point to bugs. When checking for portability

bugs in Section 5.3.3, however, we verify both error equivalence and value equivalence since

build options should not affect the outputs of portable code.

5.1.1 Path pruning

Recall from Section 4.2 that uc-klee statically identifies differing regions of basic blocks

and prunes paths that execute identical sequences of instructions through both functions.

When checking whether patches introduce bugs, uc-klee leverages additional pruning op-

portunities. For this use case, uc-klee first executes the patched function P ′, followed by

the unpatched function P . As it executes P ′, it prunes paths that either:

1. Return from P ′ without triggering an error, or

2. Trigger an error without reaching differing basic blocks.

In the first case, we are only concerned with errors introduced by a patch, so we can clearly

ignore paths in which the patched version P ′ does not crash. In the second case, P and

P ′ would execute identical instruction sequences up to the point where the error occurs, so

they are guaranteed to both crash with the same error.

In addition to path pruning, uc-klee aggressively performs error uniquing by associat-

ing each path executing P with the location of the error that occurred in P ′—an error must

have occurred in P ′ or condition (1) above would have caused the path to be pruned. Once

uc-klee executes a path that returns from P without crashing (and reports the error in

P ′), it prunes all other pending paths in P associated with the same error in P ′. Further, it

prunes any future paths that trigger the same error in P ′. In practice, this enabled uc-klee

to prune thousands of redundant error paths.
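
The two pruning rules and the error uniquing step can be written down as a small decision procedure. This is a hedged sketch: the structure names, the integer error-site IDs, and the fixed-size table are assumptions made for illustration and are not taken from uc-klee.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SITES 64 /* assumed upper bound on distinct error locations */

/* What is known about a path once P' has finished executing. */
typedef struct {
    bool errored;            /* did P' trigger an error on this path? */
    bool reached_diff_block; /* did the path execute a differing basic block? */
    int  error_site;         /* hypothetical ID of the error location in P' */
} patched_path_t;

/* Error sites in P' that have already been reported. */
typedef struct {
    bool reported[MAX_SITES];
} uniquing_t;

/* Returns true if the path can be dropped before executing P. */
bool prune_after_patched(const patched_path_t *p, const uniquing_t *u) {
    if (!p->errored)
        return true; /* rule 1: P' returned without an error */
    if (!p->reached_diff_block)
        return true; /* rule 2: identical instructions, so P fails the same way */
    assert(p->error_site >= 0 && p->error_site < MAX_SITES);
    return u->reported[p->error_site]; /* uniquing: error already reported */
}

/* Called when P returns without crashing on a path whose P' run failed at
 * error_site. Returns true if this is the first report for that site. */
bool report_new_error(uniquing_t *u, int error_site) {
    assert(error_site >= 0 && error_site < MAX_SITES);
    if (u->reported[error_site])
        return false;
    u->reported[error_site] = true;
    return true;
}
```

Once report_new_error has returned true for a site, prune_after_patched drops every later path that fails at the same site, which corresponds to the pruning of pending and future paths described above.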


5.2 False positives

Typically, implementors care about errors on legal inputs and ignore errors when code is

misused on illegal values. Since uc-klee analyzes patched functions in isolation rather than

from a program entry point (i.e., main), it can miss important input preconditions and thus

explore paths using invalid inputs, potentially leading to large numbers of false positives. As

described in Section 4.1, real code is often crash equivalent on invalid inputs. For example,

two versions of a function that requires non-NULL pointers might both crash when a NULL

pointer is given as input (rather than return different values). Crash equivalence is especially

common when checking patches, since one function directly descends from the other with

only a modest change. For example, over 60% of the patches we check in Section 5.3.2

produce no differences in error behavior. For those cases where error equivalence does

not hold, we mitigate false positives using manual annotations and automated heuristics,

described below.

5.2.1 Manual annotations

uc-klee supports two types of manual annotations beyond the “filters” described in Sec-

tion 4.5: (1) function call annotations, and (2) data type annotations. Both are written in

C, compiled with llvm, and invoked by uc-klee at runtime. Function call annotations are

used to run specific code immediately prior to calling a function. For example, we wrote a

function call annotation for BIND that runs before each call to isc_mutex_lock, with the

same arguments:

    void annot_isc_mutex_lock(isc_mutex_t *mp) {
        EXPECT(*mp == 0);
    }

The EXPECT macro adds the specified path constraint only if the condition is feasible on

the current path and elides it otherwise. In this example, we avoid considering cases where

the mutex is already locked. However, this annotation has no effect if the condition is

not feasible (i.e., the lock has definitely been acquired along this path). This annotation

allows uc-klee to detect errors in lock usage while suppressing false positives under the

assumption that if a function attempts to acquire a lock supplied as input, then a likely

input precondition is that the lock is not already held. This annotation did not prevent us

from finding the BIND locking bug in receive_secure_db shown in Figure 5.2.

Data type annotations are invoked at the end of a path, prior to emitting an error


 1: struct isc_region {
 2:     unsigned char *base;
 3:     unsigned int length;
 4: };
 5:
 6: typedef struct isc_region isc_region_t;
 7:
 8: int isc_region_compare(isc_region_t *r1, isc_region_t *r2) {
 9:     unsigned int l;
10:     int result;
11:
12:     REQUIRE(r1 != NULL);
13:     REQUIRE(r2 != NULL);
14:
15:     l = (r1->length < r2->length) ? r1->length : r2->length; /* chooses min. buffer length */
16:
17:     if ((result = memcmp(r1->base, r2->base, l)) != 0) /* memcmp reads out-of-bounds */
18:         return ((result < 0) ? -1 : 1);
19:     else
20:         return ((r1->length == r2->length) ? 0 :
21:                 (r1->length < r2->length) ? -1 : 1);
22: }

Figure 5.1: Example false positive in BIND. uc-klee does not associate length field with buffer pointed to by base field. Consequently, uc-klee falsely reports that memcmp (line 17) reads out-of-bounds from base.

report. These annotations are associated with named data types and are used to specify

invariants that typically apply to all instances of that data type. Prior to evaluating each

data type annotation, uc-klee discards all writes performed by the execution path, since

the purpose of the annotations is to constrain the set of valid inputs to the function. While

it is desirable to invoke annotations as early as possible in order to suppress false path

exploration, the LLVM debug type information used for identifying each object’s data type

is often not available in the compiled code at the point where lazy initialization is triggered.

Deferring the annotations until the end of each path allows uc-klee to invoke annotations

based on all available debug information.

Figure 5.1 illustrates a motivating example of a false positive from BIND. The

isc_region_t type consists of a buffer and a length field. The isc_region_compare

function caused uc-klee to generate hundreds of false positives involving out-of-bounds

memory reads caused by the call to memcmp on line 17. The code selects an appropriate

length on line 15 (the minimum of the two buffer lengths), but uc-klee does not associate

this length field with the size of the buffer pointed to by the base field. Consequently,


Macro                  Description

INVARIANT(condition)   Add condition as a path constraint; kill path if infeasible.
EXPECT(condition)      Add condition as a path constraint if feasible; otherwise, ignore.
IMPLIES(a, b)          Logical implication: a → b.
HOLDS(a)               Returns true if condition a must hold; false otherwise.
MAY_HOLD(a)            Returns true if condition a may hold; false otherwise.
SINK(e)                Forces e to be evaluated; prevents compiler from optimizing it away.
VALID_POINTER(ptr)     Returns true if ptr has a non-zero referent; false otherwise.
OBJECT_SIZE(ptr)       Returns the size of the object pointed to by ptr; kills path if pointer is invalid.

Table 5.1: Summary of C annotation macros.

uc-klee explores many false paths on which the buffer is smaller than length, resulting

in out-of-bounds reads, despite the code properly checking the length field. We added the

following simple annotation for the isc_region_t data type to silence these errors:

    INVARIANT(r->length <= OBJECT_SIZE(r->base));

The INVARIANT macro requires that the condition hold. If it is infeasible (cannot be true) on

the current path, uc-klee emits an error report with a flag indicating that the annotations

have been violated. We use this flag to filter out uninteresting error reports. This simple

annotation allowed us to filter 623 errors, which represented about 7.5% of all the errors

uc-klee reported for BIND.

Table 5.1 summarizes the convenience macros we provided for expressing annotations

using C code. While annotations may be written using arbitrary C code, these macros

provide a simple interface to functionality not expressible with C itself (e.g., checking a

pointer referent using VALID_POINTER). The HOLDS and MAY_HOLD macros allow code to

check the feasibility of a Boolean expression without causing uc-klee to fork execution

and trigger path explosion.
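
To make the difference between these macros concrete, the following toy model may help. It is an assumption-laden sketch rather than the semantics of the real macros: the path constraints are reduced to the set of values (0 to 31) that one symbolic variable may still take, stored as a bitmask, and a condition is the bitmask of values that satisfy it. The function names are lowercase stand-ins invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy path state: bit v of feasible is set when value v of the single
 * symbolic variable still satisfies all path constraints. */
typedef struct {
    uint32_t feasible;
    bool     killed; /* set when an invariant cannot be satisfied */
} path_t;

/* MAY_HOLD: some feasible value satisfies cond. No forking is needed. */
bool may_hold(const path_t *p, uint32_t cond) {
    return (p->feasible & cond) != 0;
}

/* HOLDS: every feasible value satisfies cond. */
bool holds(const path_t *p, uint32_t cond) {
    return (p->feasible & ~cond) == 0;
}

/* EXPECT: add cond as a constraint if feasible; otherwise ignore it. */
void expect(path_t *p, uint32_t cond) {
    if (may_hold(p, cond))
        p->feasible &= cond;
}

/* INVARIANT: add cond as a constraint; flag the path if it is infeasible. */
void invariant(path_t *p, uint32_t cond) {
    assert(!p->killed);
    if (may_hold(p, cond))
        p->feasible &= cond;
    else
        p->killed = true;
}
```

With the variable read as the mutex word from the isc_mutex_lock annotation, expect(&p, 1u) narrows an unconstrained path to "value is 0" (unlocked), but leaves a path on which the value is already known to be nonzero untouched, whereas invariant on that second path marks it as violating the annotations.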

One tricky aspect of compiling the annotations was finding the necessary type defini-

tions in each codebase’s header files. In many cases, the header files cannot be included

directly because they depend on build-time configuration options. To address this issue,

we implemented a tool in our system that extracts type definitions from llvm’s debug in-

formation and automatically generates a C header suitable for compiling the annotations.

This tool was highly effective in practice.

Manual annotations provide an opportunity for a three-way tradeoff between false pos-

itives, manual effort, and false negatives (missed errors). If annotations are overly restric-

tive, they may suppress legitimate errors that would otherwise be detected by our system.


Therefore, care must be taken to determine the proper set of invariants for each codebase.

5.2.2 Automated heuristics

Our system implements three automated heuristics for identifying errors likely to represent

true bugs. uc-klee augments each error report with a field listing the heuristics satisfied

by that error.

Each heuristic targets errors that occur for all input values following a path, which are

likely to be true errors [45] because they occur regardless of the (unknown) input precondi-

tions, as long as the execution path itself does not violate the preconditions. The must-fail

heuristic is satisfied when an error must occur under the path constraints. Assertion failures

are must-fail when the assertion condition may only be false. Memory errors are must-fail

when the pointer must fall out-of-bounds.

A variation on the must-fail heuristic is the belief-fail heuristic, which uses a form

of belief analysis [46]. The intuition behind this heuristic is that if a function contradicts

itself, it likely has a bug. For example, if the code checks that a pointer is NULL and then

dereferences the pointer, it has a bug, regardless of any unspecified input preconditions,

again assuming that the path itself does not violate the preconditions. This heuristic is

distinct from the must-fail heuristic because it only considers a subset of a path’s constraints,

those believed by the current function.

In general, constraints added by the routines a function calls may not indicate true errors.

For example, if a called function checks a pointer against NULL, the caller is not aware

that this comparison has taken place. The caller does not act based on the assumption that

the pointer is NULL, even if the branch was taken and uc-klee added a path constraint

asserting this condition.

Our system implements the belief-fail heuristic by tracking a belief set for each stack

frame. A stack frame’s belief set propagates to called functions, but called functions do not

propagate constraints to their callers’ belief sets. When an error occurs, our system checks

whether the belief set for the current stack frame requires the failure condition to hold. If

so, the belief-fail heuristic is satisfied.
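
The relationship between must-fail and belief-fail can be illustrated with the same kind of toy model. The sketch below assumes a single symbolic variable with values 0 to 31 (value 0 standing for NULL) and represents constraint sets as bitmasks; the names and the fixed stack depth are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEPTH 8
#define ALL_VALUES 0xFFFFFFFFu

typedef struct {
    uint32_t path;              /* values allowed by all path constraints */
    uint32_t belief[MAX_DEPTH]; /* values allowed by each frame's belief set */
    int      depth;             /* index of the current stack frame */
} state_t;

void state_init(state_t *s) {
    s->path = ALL_VALUES;
    s->belief[0] = ALL_VALUES;
    s->depth = 0;
}

/* A callee starts with a copy of its caller's beliefs. */
void enter_call(state_t *s) {
    assert(s->depth + 1 < MAX_DEPTH);
    s->belief[s->depth + 1] = s->belief[s->depth];
    s->depth++;
}

/* On return the callee's beliefs are dropped; path constraints remain. */
void leave_call(state_t *s) {
    assert(s->depth > 0);
    s->depth--;
}

/* A branch taken in the current frame constrains both sets. */
void add_constraint(state_t *s, uint32_t cond) {
    s->path &= cond;
    s->belief[s->depth] &= cond;
}

/* fail is the set of values for which the error occurs. */
bool must_fail(const state_t *s, uint32_t fail) {
    return (s->path & ~fail) == 0;
}

bool belief_fail(const state_t *s, uint32_t fail) {
    return (s->belief[s->depth] & ~fail) == 0;
}
```

If the current function itself takes the branch "pointer is NULL" (add_constraint(&s, 1u)) and then dereferences the pointer, both heuristics are satisfied. If only a callee performed that comparison, the path constraints still force the failure, so must-fail holds, but the caller's belief set is unchanged and belief-fail does not.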

A second variation of the must-fail heuristic is the concrete-fail heuristic, which indi-

cates that an assertion failure or memory error was triggered by a concrete (non-symbolic)

condition or pointer, respectively. In practice this heuristic was the most effective of the

three.


5.3 Evaluation

This section uses uc-klee to check hundreds of patches from BIND and OpenSSL, two

mature, widely-used, and security critical systems. Each codebase contains about 400,000

lines of C code, making them reasonable measures of uc-klee’s scalability and robustness.

Our results show that uc-klee finds many new bugs, produces a manageable number of false

positives, explores thousands of paths through patched code for most patches, and verifies

(with caveats) that dozens of patches do not introduce new bugs. In addition, uc-klee’s

ability to cross-check functions can be used to cross-check code against itself under different

build and compiler options in order to find behavior bugs such as reliance on undefined-

and implementation-defined behavior and fragility under optimization. Section 5.3.3 uses

this ability to find eight new bugs in BIND and OpenSSL.

5.3.1 Code modification

We had to make a number of minor changes to BIND and OpenSSL, which we automated

so that they would apply to future versions. We canonicalized several macros that introduced

spurious code differences, such as the __LINE__ macro and version-related macros such as

VERSION, SRCID, DATE, and OPENSSL_VERSION_NUMBER. We also sanitized the CFLAGS string

contained in an OpenSSL header file, which was crucial for the portability experiment in

Section 5.3.3. To support function-call annotations (Section 5.2.1) in BIND, we converted

four preprocessor macros to function calls.

For BIND, we disabled both expensive error-logging code that precedes each assertion

failure and much of its debug malloc functionality, which uc-klee already provided. For

OpenSSL, we added a new build target that disabled reference counting and address align-

ment. The reference counting was responsible for many false positives (double free errors)

due to unknown preconditions on an object’s reference count.

uc-klee automatically marks global and static variables as under-constrained symbolic

inputs in order to fully explore the paths through each function. It does not mark const

globals as symbolic since they are never mutated. In practice, many constant global vari-

ables are not explicitly declared const, leading uc-klee to treat them as symbolic values.

Consequently, uc-klee explores many infeasible execution paths by considering values for

these globals that can never arise in practice, wasting compute resources and emitting false

positives. To reduce these effects, we manually specified a set of 13 globals for BIND and


51 for OpenSSL that uc-klee should not mark as symbolic. In many cases, this task could

be automated by using straightforward static analysis to identify globals that are never

modified. However, we did not implement this analysis.

5.3.2 Patches

We tried to avoid selection bias by using two complete sets of patches from the git reposi-

tories for recent stable branches: BIND 9.9 from 1/2013 to 3/2014 and OpenSSL 1.0.1 from

1/2012 to 4/2014. Many of the patches we encountered modified more than one function;

this section uses patch to refer to changes to a single function, and commit to refer to a

complete changeset.

We excluded all patches that: only changed copyright information, had build errors,

modified build infrastructure only, removed dead functions only, applied only to disabled

features (e.g., win32), patched only BIND contrib features, only touched regression/unit

tests, or used variadic functions. We also eliminated all patches that yielded identical code

after compiler optimizations. Because of tool limitations, we excluded patches that changed

input datatypes (Section 4.4). Finally, to avoid inflating our verification numbers, we ex-

cluded three BIND commits that patched 200-300 functions each by changing a pervasive

linked-list macro and/or replacing all uses of memcpy with memmove. Neither of these changes

introduced any errors and, given their near-trivial modifications, shed little additional light

on our tool’s effectiveness. This yielded 487 patches from BIND and 324 patches from

OpenSSL, each set drawn from 177 distinct commits to its codebase (the equal counts are purely a coincidence).

We compiled patched and unpatched versions of the codebase for each revision using an

llvm 2.7 toolchain. We then ran uc-klee over each patch for one hour with a maximum

symbolic input size of 25,000 bytes per object and a depth bound (Section 3.2) of 9 objects.

Each run was allocated a single Intel Xeon E5645 2.4GHz core and 4GB of memory on a

compute cluster running 64-bit Fedora Linux 14.

For these runs, we configured uc-klee to target crashes only in patched routines or

routines they call. While this approach allows uc-klee to focus on the most likely source

of errors, it does not detect bugs caused by the outputs of a function, which may trigger

crashes elsewhere in the system (e.g., if the function unexpectedly returns NULL). uc-klee

can optionally detect such differences, and we use this functionality to find portability bugs

in Section 5.3.3.


Codebase  Function               Type                         Cause                                  New

BIND      receive_secure_db      assert fail                  double lock acquisition                ✓
BIND      save_nsec3param        assert fail                  uninitialized struct                   ✓
BIND      configure_zone_acl     assert fail                  inconsistent NULL arg. handling        ✓
BIND      isc_lex_gettoken       assert fail                  input parsing logic                    ✓
OpenSSL   PKCS5_PBKDF2_HMAC      uninit. pointer dereference  uninitialized struct
OpenSSL   dtls1_process_record   assert fail                  inconsistent NULL check
OpenSSL   tls1_final_finish_mac  NULL pointer dereference     unchecked return value                 ✓
OpenSSL   do_ssl3_write          NULL pointer dereference     callee side effect after NULL check    ✓
OpenSSL   PKCS7_dataDecode       NULL pointer dereference     unchecked return value                 ✓
OpenSSL   b64_read               out-of-bounds array access   negative count passed to memcpy        ✓
OpenSSL   dtls1_buffer_record    use-after-free               improper error handling                ✓
OpenSSL   pkey_ctrl_gost         uninit. pointer dereference  improper error handling                ✓

Table 5.2: Summary of bugs discovered for BIND and OpenSSL patches. New indicates that the bug was previously unknown. The OpenSSL bug in do_ssl3_write resulted in security advisory CVE-2014-0198 [31], and the crash in b64_read (actually a bug in EVP_DecodeUpdate) triggered advisory CVE-2015-0292 [35].

 1: LOCK_ZONE(zone);
 2: if (DNS_ZONE_FLAG(zone, DNS_ZONEFLG_EXITING)
 3:     || !inline_secure(zone)) {
 4:     result = ISC_R_SHUTTINGDOWN;
 5:     goto unlock;
 6: }
 7: ...
 8: if (result != ISC_R_SUCCESS)
 9:     goto failure; /* ← bypasses UNLOCK_ZONE */
10: ...
11: unlock:
12: UNLOCK_ZONE(zone);
13: failure:
14: dns_zone_idetach(&zone);

Figure 5.2: BIND locking bug found in receive_secure_db.

Bugs found

From the patches we tested, uc-klee uncovered three previously unknown bugs in BIND

and eight bugs in OpenSSL, six of which were previously unknown. These bugs are sum-

marized in Table 5.2.

Figure 5.2 shows a representative double-lock bug in BIND found by cross-checking.

The patch moved the LOCK_ZONE earlier in the function (line 1), causing existing error

handling code that jumped to failure (line 9) to bypass the UNLOCK_ZONE (line 12). In

this case, the subsequent call to dns_zone_idetach (line 14) reacquires the already-held

lock, which triggers an assertion failure. This bug was one of several we found that in-

volved infrequently-executed error handling code. Worse, BIND often hides goto failure


statements inside a CHECK macro, which was responsible for a bug we discovered in the

save_nsec3param function (not shown). We reported the bugs to the BIND developers,

who promptly confirmed and fixed them. These examples demonstrate a key benefit of

uc-klee: it explores non-obvious execution paths that would likely be missed by a human

developer, either because the code is obfuscated or an error condition is overlooked.
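
The control-flow shape behind the double-lock bug can be reproduced in a few lines. The sketch below is not BIND code: toy_lock_t, toy_detach, and receive_patched are hypothetical stand-ins, and a failed acquisition is reported through a return value where BIND would trip an assertion.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool held;
} toy_lock_t;

/* Returns false on double acquisition instead of asserting. */
bool toy_lock(toy_lock_t *l) {
    if (l->held)
        return false;
    l->held = true;
    return true;
}

void toy_unlock(toy_lock_t *l) {
    assert(l->held);
    l->held = false;
}

/* Stand-in for the cleanup call: it briefly takes the same lock. */
bool toy_detach(toy_lock_t *l) {
    if (!toy_lock(l))
        return false; /* lock was still held by the caller */
    toy_unlock(l);
    return true;
}

/* Same shape as the patched function: the lock is taken first, and the
 * older "goto failure" error path jumps past the unlock label.
 * Returns false if cleanup tried to re-acquire a held lock. */
bool receive_patched(toy_lock_t *zone, bool exiting, bool step_fails) {
    (void)toy_lock(zone); /* moved to the top by the patch */
    if (exiting)
        goto unlock;
    if (step_fails)
        goto failure; /* bypasses the unlock below */
unlock:
    toy_unlock(zone);
failure:
    return toy_detach(zone);
}
```

Only the combination in which the intermediate step fails reaches the cleanup call with the lock held, which is why the defect is easy to miss in tests that exercise the common paths.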

uc-klee is not limited to finding new bugs introduced by the patches; it can also

find old bugs in patched code. We added a new mode where uc-klee flags must-fail errors

(Section 5.2.2) that occur in both P and P ′. This approach allowed us to find one new bug in

BIND and four in OpenSSL. It also re-confirmed a number of bugs found by cross-checking

above. This mode could be used to find bugs in functions that have not been patched,

but that is beyond the scope of this dissertation. However, we did discover an OpenSSL

denial-of-service vulnerability that triggered CVE-2015-0291 [34] by using uc-klee in this

manner.

Figure 5.3 shows a representative must-fail bug (Section 5.2.2), a previously unknown

NULL pointer dereference (denial-of-service) vulnerability we discovered in OpenSSL’s

do_ssl3_write function that led to security advisory CVE-2014-0198 [31] being issued.

In this case, a developer attempted to prevent this bug by explicitly checking whether

wb->buf is NULL (line 1). If the pointer is NULL, ssl3_setup_write_buffer allocates

a new buffer (line 2). On line 6, the code then handles any pending alerts [47] by calling

ssl_dispatch_alert (line 8). This call has the subtle side effect of freeing the write

buffer when the common SSL_MODE_RELEASE_BUFFERS flag is set. After freeing the buffer,

wb->buf is set to NULL (not shown), triggering a NULL pointer dereference on line 15.

This bug would be hard to find with other approaches. The write buffer is freed by a

chain of function calls that includes a recursive call to do_ssl3_write, which one maintainer

described as “sneaky” [116]. In contrast to static techniques that could not reason precisely

about the recursion, uc-klee proved that under the circumstances when both an alert is

pending and the release flag is set, a NULL pointer dereference will occur. This example

also illustrates the weaknesses of regression testing. While a developer may write tests to

make sure this function works correctly when an alert is pending or when the release flag

is set, it is unlikely that a test would exercise these conditions simultaneously. Perhaps as

a direct consequence, this vulnerability was nearly six years old.
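
The interaction of the two conditions can be seen in a reduced model. This sketch is not OpenSSL code: toy_conn_t and the toy_ functions are invented for illustration, the release flag stands in for SSL_MODE_RELEASE_BUFFERS, and the would-be NULL dereference is reported as a return code of -2 so that the example can run safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned char *buf;        /* write buffer, NULL when not allocated */
    unsigned char  storage[8]; /* backing store used instead of malloc */
    bool alert_pending;
    bool release_buffers;      /* stand-in for the release-buffers mode */
} toy_conn_t;

static void toy_setup_buffer(toy_conn_t *c) {
    c->buf = c->storage;
}

/* Sends the pending alert; as a side effect it releases the buffer
 * when the release flag is set. */
static int toy_dispatch_alert(toy_conn_t *c) {
    c->alert_pending = false;
    if (c->release_buffers)
        c->buf = NULL;
    return 1;
}

/* Returns 0 on success, -1 if the alert could not be sent, and -2 at
 * the point where the original code would dereference NULL. */
int toy_write(toy_conn_t *c, unsigned char type) {
    assert(c != NULL);
    if (c->buf == NULL) /* the NULL check happens here ... */
        toy_setup_buffer(c);
    if (c->alert_pending) {
        if (toy_dispatch_alert(c) <= 0)
            return -1;
    }
    unsigned char *p = c->buf; /* ... but the buffer may be gone by now */
    if (p == NULL)
        return -2;
    *p = type & 0xff;
    return 0;
}
```

A pending alert alone or the release flag alone leaves toy_write returning 0; only both together produce -2, matching the observation that tests for each condition in isolation would not expose the bug.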


 1: if (wb->buf == NULL) /* ← NULL pointer check */
 2:     if (!ssl3_setup_write_buffer(s))
 3:         return -1;
 4: ...
 5: /* If we have an alert to send, lets send it */
 6: if (s->s3->alert_dispatch) {
 7:     /* call sets wb->buf to NULL */
 8:     i=s->method->ssl_dispatch_alert(s);
 9:     if (i <= 0)
10:         return(i);
11:     /* if it went, fall through and send more stuff */
12: }
13: ...
14: unsigned char *p = wb->buf; /* ← p = NULL */
15: *(p++)=type&0xff; /* ← NULL pointer dereference */

Figure 5.3: OpenSSL NULL pointer dereference bug in do_ssl3_write (CVE-2014-0198).

Patches verified

In addition to finding new bugs, uc-klee exhaustively verified all execution paths for 67

(13.8%) of the patches in BIND, and 48 (14.8%) of the patches in OpenSSL. Our system

effectively verified that, up to the given input bound and with the usual caveats, these

patches did not introduce any new crashes. This strong result is not possible with imprecise

static analysis or testing.

The median instruction coverage (described below) for the exhaustively verified patches

was 90.6% for BIND and 100% for OpenSSL, suggesting that these patches were thoroughly

tested. Only six of the patches in BIND and one in OpenSSL achieved very low (0–2%)

coverage. We determined that uc-klee achieved low coverage on these patches due to dead

code (2 patches); insufficient symbolic input bound (2 patches); comparisons between input

pointers (we assume no aliasing, 1 patch); symbolic malloc size (1 patch); and a trivial

stub function that was optimized away (1 patch).

Patches partially verified

This section measures how thoroughly we check non-terminating patches using two metrics:

(1) instruction coverage, and (2) the number of execution paths completed.

We conservatively measure instruction coverage by counting the number of instructions

that differ in P ′ from P and then computing the percentage of these instructions that uc-

klee executes at least once. Figure 5.4 plots the instruction coverage. The median coverage


[Figure 5.4 plot: patch instruction coverage (%) per patch; panels: BIND patches, OpenSSL patches.]

Figure 5.4: Coverage of patched instructions. 98 BIND patches (20.1%) and 124 OpenSSL patches (38.3%) achieved 100% coverage. Median was 81.1% for BIND, 86.9% for OpenSSL.

was 81.1% for BIND and 86.9% for OpenSSL, suggesting that uc-klee thoroughly exercised

the patched code, even when it did not exhaust all paths.
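
The coverage metric defined above amounts to a ratio over the changed instructions only. A minimal sketch, assuming instructions of P ′ are indexed 0 to n-1 and that two boolean arrays record which ones differ from P and which ones were executed (both representations are assumptions of this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* differs[i]:  instruction i of P' differs from P.
 * executed[i]: instruction i was executed at least once.
 * Returns the percentage of differing instructions that were executed. */
double patch_coverage(const bool *differs, const bool *executed, size_t n) {
    size_t changed = 0, covered = 0;
    assert(n == 0 || (differs != NULL && executed != NULL));
    for (size_t i = 0; i < n; i++) {
        if (!differs[i])
            continue; /* unchanged instructions do not count */
        changed++;
        if (executed[i])
            covered++;
    }
    if (changed == 0)
        return 0.0; /* assumption: nothing to cover is reported as 0 */
    return 100.0 * (double)covered / (double)changed;
}
```

Executing unchanged instructions does not raise the figure, which is what makes the measurement conservative.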

Figure 5.5 plots the number of completed execution paths for each patch we did not

exhaustively verify that hit at least one patched instruction. These graphs exclude 31

patches for BIND and 32 patches for OpenSSL for which our system crashed during the one

hour execution window. The crashes were primarily due to bugs in our tool and memory

exhaustion/blowup caused by symbolically executing cryptographic ciphers.

For the remaining patches, uc-klee completed a median of 5,828 distinct paths per

patch for BIND and 1,412 for OpenSSL. At the upper end, 154 patches for BIND (39.6%)

and 79 for OpenSSL (32.4%) completed over 10,000 distinct execution paths. At the bottom

end, 58 patches for BIND (14.9%) and 46 for OpenSSL (18.9%) completed zero execution

paths. In many cases, uc-klee achieved high coverage on these patches but neither detected

errors nor ran the non-error paths to completion. A few reasons we observed for paths not

running to completion included query timeouts, unspecified symbolic function pointers, or


[Figure 5.5 plot: completed execution paths per patch (log scale, 1 to 1,000,000); panels: BIND patches, OpenSSL patches.]

Figure 5.5: Completed execution paths (log scale). Median was 5,828 paths per patch for BIND and 1,412 for OpenSSL. Top quartile was 17,557 paths for BIND and 21,859 for OpenSSL.

ineffective search heuristics.

These numbers should only be viewed as a crude approximation of thoroughness; they

do not measure the independence between the paths explored (greater is preferable). On

the other hand, they grossly under-count the number of distinct concrete values each sym-

bolic path reasons about simultaneously. One would generally expect that exercising 1,000

or more paths through a patch, where each path simultaneously tests all feasible values,

represents a dramatic step beyond the current standard practice of running the patch on a

few tests.

False positives

This section describes our experience in separating true bugs from false positives, which were

mostly due to missing input preconditions. As mentioned in Section 5.2, our system manages

false positives using two approaches: manual annotations and automated heuristics.

For BIND, we wrote 13 function call annotations and 31 data type annotations (about


                              P ′ only                    P and P ′
                              Reports                     Reports
Heuristic                     Total   Bugs   Patches      Total   Bugs   Patches

Total errors                  2446    3      141          5829    -      260
  Manual annotations          1419    3      125          1378    -      153
    must-fail                 44      3      8            1378    -      153
      concrete-fail           26*     2      6*           878     -      110
      belief-fail             35*     3      7*           1053    -      127
        excluding inputs      30*     3      7*           852     -      102

True bugs                     3*      3      3*           1       1      1

(a) BIND (487 patches, 4 distinct bugs)

                              P ′ only                    P and P ′
                              Reports                     Reports
Heuristic                     Total   Bugs   Patches      Total   Bugs   Patches

Total errors                  1423    5      79           579     11     125
  Manual annotations          1286    5      79           451     11     124
    must-fail                 41      5      22           451     11     124
      concrete-fail           14*     5      12*          224     11     98
      belief-fail             25*     5      18*          316     11     117
        excluding inputs      17*     5      11*          90*     11     47*

True bugs                     5*      5      4*           11*     11     10*

(b) OpenSSL (324 patches, 8 distinct bugs)

Table 5.3: Effects of heuristics on false positives. Total indicates the total number of reports, of which Bugs are true errors; Patches indicates the number of patches that reported at least one error. P ′ only refers to errors that occurred only in function P ′; P and P ′ occurred in both versions. Indent indicates successive heuristics; * indicates that we reviewed all the reports manually.

400 lines of C). For OpenSSL, we wrote six data type annotations and no function call

annotations (60 lines). We applied a single set of annotations for each codebase to all the

patches we tested. In our experience, most of these annotations were simple to specify

and often suppressed many errors. As an extreme example, a one-line annotation in BIND

(Section 5.2.1) suppressed 623 reports (7.5% overall). We felt this was reasonable effort

relative to the sizes of the codebases. We added annotations lazily, in response to errors.

Table 5.3 illustrates the effects of the annotations and heuristics on the error reports

for BIND and OpenSSL. The P ′ only column describes errors that only occurred in the

patched function, while P and P ′ describes errors that occurred in both versions. In this

experiment, we are primarily concerned with bugs introduced by a patch, so our discussion

describes P ′ only unless otherwise noted.

The manual annotations suppressed 42% of the reports for BIND but only 9.6% for


OpenSSL. We attribute this difference to the greater effort we expended writing manual

annotations for BIND, for which the automated heuristics were less effective without the

annotations.

We tried numerous heuristics to reduce false reports. The concrete-fail and belief-fail

heuristics were the most useful by far. These heuristics reduced the total number of reports

to a small enough number that we were able to inspect them all manually. While only 8.6%

of the belief-fail errors for BIND and 20% of those for OpenSSL were true bugs, the total

number of these errors (60) was manageable relative to the number of patches we tested

(811). In total, the annotations and belief-fail heuristic were able to eliminate 98.6% of

false positives for BIND and 98.2% for OpenSSL.

A subset of the belief-fail errors were caused by reading past the end of an input buffer,

and none of these were true bugs. Instead, they were due to paths reaching the input

bound we specified. In many cases, our system would emit these errors for any given input

bound because they involved unbounded loops (e.g., strlen). The excluding inputs row in

Table 5.3 describes the subset of belief-fail errors that are not caused by invalid accesses

to input buffers. This additional filter produced a small enough set of P and P′ errors for

OpenSSL that we were able to manually inspect them, discovering a number of additional

bugs. We note that the true errors listed in Table 5.3 are not all distinct bugs. In several

cases, a bug showed up in multiple error reports.

5.3.3 Portability

As demonstrated in Chapter 4, uc-klee can cross-check code to find functional equivalence

bugs instead of just equivalent crash behavior. We used this ability to cross-check the

same code against itself under different implementation-defined choices (e.g., argument

evaluation order) or optimization levels. One wide-ranging implementation-defined decision

in the C language is whether the char type defaults to signed char or unsigned char [63].

Portable code should exhibit identical behavior and produce equivalent outputs regardless

of this compiler setting; any differences likely indicate bugs. This section uses uc-klee to

check that all functions in BIND and OpenSSL produce identical outputs (defined as writes

to escaping globals, heap memory and return values) irrespective of whether char defaults

to unsigned or signed.

For this experiment, we compiled two copies of each codebase, one with -fsigned-char,

and the other with -funsigned-char. We used a revision of BIND 9.9 from 11/2013 and


Codebase  Function            Type                        Cause                                New
BIND      isc_lex_gettoken    assert fail                 char sign extension                  ✓
BIND      dns_name_fromtext   out-of-bounds array access  char sign extension (3 occurrences)  ✓
BIND      isc_regex_validate  functionality               char sign extension                  ✓
BIND      cfg_parse_special   functionality               char sign extension                  ✓
OpenSSL   ASN1_STRING_print   functionality               char signed greater-than             ✓
OpenSSL   BN_dec2bn           functionality               char sign extension                  ✓

Table 5.4: Summary of portability bugs discovered for BIND and OpenSSL. New indicates that the bug was previously unknown.

OpenSSL 1.0.1 from 4/2014. We chose the earlier version of BIND because more recent

revisions included fixes for bugs we reported.

We used our static cross-checker (Section 4.2) to identify functions that differed syn-

tactically when compiled with the different char settings. We found static differences in

45 functions for BIND and 59 functions for OpenSSL. We then ran uc-klee for one hour

on each of these functions. This time, we configured our system to emit reports for all

errors and differences in outputs (i.e., return values and pointer arguments), not just those

introduced by a patch.

uc-klee reported 39 differences/errors from six distinct functions for BIND and 16

differences from ten distinct functions for OpenSSL. All but two of the errors for BIND point

to four distinct bugs, while the other two affect only hash functions. Three of the OpenSSL

errors each represent distinct bugs, while three affect only a hash function. The remaining

ten are artifacts (false positives), mainly due to malloc returning different addresses for

objects allocated by each copy of the function.

Table 5.4 summarizes the four previously-unknown bugs we discovered in BIND and the

two in OpenSSL. The BIND bug we found in isc_lex_gettoken was particularly interesting

and had been present in every version dating back to 2000. The cause of this bug

reduces to (where c is an int):

1 : if (buffer->current == buffer->used)
2 :     c = EOF;
3 : else
4 :     c = *((char *)buffer->base + buffer->current);
5 : if (c != EOF) { . . . }

If the character read from buffer->base on line 4 is 0xff and the code is compiled with a

signed char type, the variable c will receive a value that is sign-extended to -1, causing the

if-condition to evaluate to false. This causes an assertion to fail immediately afterwards.
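The effect can be reproduced outside BIND with a few lines of C. The sketch below is illustrative only: the function names are hypothetical, signed char and unsigned char stand in for the two compiler settings, and the last function shows one common repair idiom rather than BIND's actual patch. It assumes a two's-complement target on which EOF is -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Read one byte the way the reduced code does when plain char is signed.
 * On two's-complement targets the byte 0xff is sign-extended to -1. */
int read_byte_signed(const unsigned char *base, size_t current)
{
    assert(base != NULL);
    return *((const signed char *)base + current);
}

/* Read the same byte when plain char is unsigned: 0xff becomes 255. */
int read_byte_unsigned(const unsigned char *base, size_t current)
{
    assert(base != NULL);
    return base[current];
}

/* A possible repair (an assumption, not BIND's actual patch): convert
 * through unsigned char so that no data byte can compare equal to EOF. */
int read_byte_portable(const char *base, size_t current)
{
    assert(base != NULL);
    return (unsigned char)base[current];
}
```

With a buffer holding the single byte 0xff, read_byte_signed returns a value equal to EOF, while the other two functions return 255, so only the signed variant mistakes data for end of input.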

An interesting non-crashing bug was in the BIND function isc_regex_validate. Here,


one version accepts the regular expression supplied as input, while the other version rejects

it. uc-klee detected this bug as a difference in the return values. To illustrate the bug, uc-

klee generated the example input string "\x01[\x01-\x80]". In this case, the comparison

between 0x80 and 0x01 (the beginning and end of a character class) differs depending on

whether the former is sign-extended or zero-extended to an int before a signed less-than

comparison occurs.

Chapter 6

Generalized checking

To find generic programming errors in a single version of code, uc-klee provides an inter-

face for implementing rule-based checkers. These checkers are similar to tools built using

dynamic instrumentation systems such as Valgrind [90] or Pin [82]. Unlike these frame-

works, however, uc-klee applies its checkers to all possible paths (with caveats) through

a function, not to a single execution path through a program. As in Chapters 4 and 5,

uc-klee considers all possible input values along each path, allowing it to discover bugs

that might be missed when checking a single set of concrete inputs. Here, uc-klee verifies

(with caveats) that a function adheres to a checker’s rules (e.g., it does not leak memory)

up to the given input size if it exhausts all paths through a function and finds no errors.

Conceptually, uc-klee’s checkers may be used to analyze arbitrary properties along

an execution path. These properties may include system-specific rules such as proper lock

usage or opened files being closed, similar to Woodpecker [27]. In this dissertation, however,

we evaluate uc-klee using three generic checkers we implemented to identify: (a) memory

leaks, (b) uses of uninitialized data, and (c) unsanitized uses of untrusted user input.

uc-klee provides a simple interface for implementing checkers by having each checker

derive from a common C++ base class. This interface provides hooks for a checker to

intercept memory accesses, arithmetic operations, branches, and several types of errors

uc-klee detects. Additional hooks are trivial to add in the future.

A user invoking uc-klee provides a compiled llvm module and the name of a function

to check. We refer to this function as the top-level function. Optionally, uc-klee may

also link this module with other modules containing the functions that might be called by

the top-level function. When uc-klee encounters a function call, it executes the called

function. With large codebases, however, this approach does not scale. For example, the

Linux kernel has far too many dependencies for all (tens of thousands of) potentially callable

functions to be linked into a single llvm module, which is a limitation of our underlying

toolchain.

When uc-klee encounters a call to a function missing from the llvm module, it may

optionally skip over the function call rather than terminate the path with an error message.

This approach could also be used to skip calls to expensive functions, but we did not employ

this technique. When uc-klee skips a function call, it creates a new under-constrained value

to represent the function’s return value, but it leaves the function’s arguments unchanged.

Consider the following code fragment:

1 : int bar(char *str);
2 :
3 : void foo(char *buf) {
4 :     int x = bar(buf);
5 :     . . .
6 : }

When uc-klee skips the call to bar on line 4, it sets x to a new under-constrained value,

but it does not alter buf. This approach under-approximates (considers only a subset of)

the behaviors that the missing function might perform (e.g., writing to globals or the buffer

pointed to by buf). Consequently, uc-klee may miss bugs and cannot provide verification

guarantees when functions are missing.

We briefly experimented with an alternative approach in which we overwrote skipped

functions’ pointer arguments (e.g., buf) with new under-constrained values, but this over-

approximation caused significant path explosion, mostly involving paths that could not arise

in practice.

In addition to missing functions due to scalability limitations, we also encountered inline

assembly (Linux kernel only) and unresolved symbolic function pointers. We skipped these

two cases in the same manner as missing functions. For all three cases, uc-klee provides

a hook to allow a checker to detect when a call is being skipped and to take appropriate

actions for that checker. For example, the leak checker considers pointers passed to skipped

function calls to have escaped, and it does not report memory leaks involving these addresses

(Section 6.1).

The remainder of this chapter describes each of the three generic checkers we imple-

mented in uc-klee, and Section 6.4 presents experimental results from using these checkers

to analyze over 20,000 functions from BIND, OpenSSL, and the Linux kernel.


6.1 Leak checker

Memory leaks can lead to memory exhaustion and pose a serious problem for long-running

servers. Frequently, they are exploitable as denial-of-service vulnerabilities [29, 32, 33]. To

detect memory leaks (which may or may not be remotely exploitable, depending on their

location within a program), we implemented a leak checker on top of uc-klee. The leak

checker considers a heap object to be leaked if, after returning from the top-level function,

the object is not reachable from a root set of pointers. The root set consists of a function’s

(symbolic) arguments, its return value, and all global variables. This checker is similar to

the leak detection in Purify [56] or Valgrind’s memcheck [90] tool, but applied to all possible

execution paths through a function.

Unlike Purify and Valgrind, which find reachable objects by following all “potential

pointers,” our leak checker traverses pointers soundly by examining the referents stored

in shadow memory (see Section 3.1) and calculating the reachable set of objects. It then

examines the set of currently allocated heap objects and subtracts the reachable objects

visited above. Since C and C++ do not support automatic garbage collection, any remaining

(unreachable) heap objects represent memory leaks, and our checker reports these to the

user.

When uc-klee encounters a missing function, the leak checker finds the set of heap

objects that are reachable from each of the function call’s arguments. It then marks these

objects as possibly escaping, since the missing function could capture pointers to these

objects and prevent them from becoming unreachable. At the end of each execution path,

the leak checker removes any possibly escaping objects from the set of leaked objects. Doing

so allows it to report only true memory leaks, at the cost of possibly omitting leaks when

functions are missing. However, uc-klee may still report false leaks along invalid execution

paths due to missing input preconditions. Consider the following code fragment:

1 : char* leaker() {
2 :     char *a = (char*) malloc(10); /* not leaked */
3 :     char *b = (char*) malloc(10); /* maybe leaked */
4 :     char *c = (char*) malloc(10); /* leaked! */
5 :
6 :     bar(b); /* skipped call to bar */
7 :     return a;
8 : }

When uc-klee returns from the function leaker, it inspects the heap and finds three

allocated objects: a, b, and c. It then examines the root set of objects. In this example,


there are no global variables and leaker has no arguments, so the root set consists only of

leaker’s return value. uc-klee examines this return value and finds that the pointer a is

live (and therefore not leaked). However, neither b nor c is reachable. It then looks at its

list of possibly escaping pointers due to the skipped call to bar on line 6, which includes b.

uc-klee subtracts b from the set of leaked objects and reports back to the user that c has

been leaked. While this example is trivial, uc-klee discovered 37 non-trivial memory leak

bugs in BIND, OpenSSL, and the Linux kernel (Section 6.4.1).
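The bookkeeping described above (reachable set, possibly escaping set, and their subtraction from the allocated set) can be sketched as a toy model in C. This is not uc-klee's implementation: the heap is a small array of objects identified by index, pointer fields are indices rather than shadow-memory referents, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 16
#define MAX_FIELDS  4
#define NO_OBJECT   (-1)

struct heap_object {
    bool allocated;
    int  fields[MAX_FIELDS];   /* indices of pointed-to objects, or NO_OBJECT */
};

struct heap_model {
    struct heap_object objects[MAX_OBJECTS];
    bool reachable[MAX_OBJECTS];      /* reachable from the root set */
    bool maybe_escaped[MAX_OBJECTS];  /* reachable from a skipped call's arguments */
};

void heap_init(struct heap_model *h)
{
    for (int i = 0; i < MAX_OBJECTS; i++) {
        h->objects[i].allocated = false;
        for (int f = 0; f < MAX_FIELDS; f++)
            h->objects[i].fields[f] = NO_OBJECT;
        h->reachable[i] = false;
        h->maybe_escaped[i] = false;
    }
}

/* Returns the index of a fresh object, or NO_OBJECT if the model is full. */
int heap_alloc(struct heap_model *h)
{
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (!h->objects[i].allocated) {
            h->objects[i].allocated = true;
            return i;
        }
    }
    return NO_OBJECT;
}

/* Depth-first traversal: mark obj and everything it transitively points to. */
static void mark_from(const struct heap_model *h, int obj, bool *marks)
{
    assert(obj >= NO_OBJECT && obj < MAX_OBJECTS);
    if (obj == NO_OBJECT || !h->objects[obj].allocated || marks[obj])
        return;
    marks[obj] = true;
    for (int f = 0; f < MAX_FIELDS; f++)
        mark_from(h, h->objects[obj].fields[f], marks);
}

/* Root set: arguments, return value, and globals of the top-level function. */
void note_root(struct heap_model *h, int obj)
{
    mark_from(h, obj, h->reachable);
}

/* Called for each pointer argument of a skipped function call. */
void note_skipped_call_arg(struct heap_model *h, int obj)
{
    mark_from(h, obj, h->maybe_escaped);
}

/* leaked = allocated - reachable - possibly escaping.
 * Stores up to capacity indices and returns how many were stored. */
int collect_leaks(const struct heap_model *h, int *leaked, int capacity)
{
    int n = 0;
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (h->objects[i].allocated && !h->reachable[i] &&
            !h->maybe_escaped[i] && n < capacity)
            leaked[n++] = i;
    }
    return n;
}
```

Replaying the leaker example in this model (allocate a, b, and c, record b as an argument of the skipped call, record a as the returned root) leaves exactly one leaked object, c.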

6.2 Uninitialized data checker

Functions that access uninitialized data from the stack or heap exhibit undefined or non-

deterministic behavior and are particularly difficult to debug. Additionally, the prior con-

tents of the stack or heap may hold sensitive information, so code that operates on these

values may be vulnerable to a loss of confidentiality.

uc-klee includes a checker that detects accesses to uninitialized data. When a function

allocates stack or heap memory, the checker fills it with special garbage values. The checker

then intercepts all loads, binary operations, branches, and pointer dereferences to check

whether any of the operands (or the result of a load) contain garbage values. If so, it

reports an error to the user. Alternatively, we could have used shadow memory to track

whether each memory location had been initialized, but tracking uninitialized values using

shadow registers would have required more invasive changes to uc-klee.

In practice, loads of uninitialized data are often intentional; they frequently arise within

calls to memcpy or when code manipulates bit fields within a C struct. Our evaluation in

Section 6.4 therefore focuses on branches and dereferences of uninitialized pointers.

When a call to a missing function is skipped, the uninitialized data checker sanitizes

the function’s arguments to avoid reporting spurious errors in cases where missing functions

write to their arguments.


Consider the following example C function garbage and its LLVM bitcode representation:

1 : void garbage(int x) {
2 :     int y;
3 :     if (x + y)
4 :         . . .
5 : }

define void @garbage(i32 %x) {
entry:
  %y_addr = alloca i32
  %y = load i32* %y_addr
  %1 = add i32 %x, %y
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %true_br, label %false_br
  . . .
}

On line 2, the function stack allocates four bytes for the variable y (the alloca LLVM

instruction). uc-klee populates these four bytes with garbage values. The load instruction

then reads the garbage values from memory into the register %y. uc-klee optionally reports

this load of uninitialized data as an error and continues executing the path. Line 3 adds the

(symbolic) argument x with the uninitialized value y, yielding the symbolic expression x +

GARBAGE (stored in register %1). Again, uc-klee optionally reports an error since a binary

operation involving an uninitialized value is undefined. Next, the same line (3) compares

this value to zero and branches based on the result. When the code executes the branch

(br) instruction, uc-klee detects that the branch condition x + GARBAGE != 0 contains a

garbage value and emits an error to the user to indicate that the code has branched based

on an uninitialized value. It then continues executing, assigning an arbitrary concrete value

(e.g., 0xab) to represent the garbage value when deciding the result of the symbolic branch.

6.3 User input checker

Code that handles untrusted user input is particularly prone to bugs that lead to security

vulnerabilities since an attacker can supply any possible input value to exploit the code.

Generally, uc-klee treats inputs to a function as under-constrained because they may

have unknown preconditions. For cases where inputs originate from untrusted sources such

as network packets or user-space data passed to the kernel, however, the inputs can be

considered fully-constrained. This term denotes that the inputs may hold any possible

input value. If any such value triggers an error in the code, then the error is likely to be

exploitable by an attacker, assuming that the execution path on which the error occurs does

not violate other preconditions.

uc-klee maintains shadow memory (metadata) associated with each symbolic input

that tracks whether each symbolic byte is under-constrained or fully-constrained. This


shadow memory is separate from the shadow memory used to track whether symbolic

pointers are unbound (see Section 3.2). uc-klee provides an interface for system-specific C

annotations to mark untrusted inputs as fully-constrained by calling the function

ucklee_clear_uc_byte. This function sets the shadow memory for each byte to the fully-constrained

state.

uc-klee includes a system-configurable user input checker that intercepts all errors and

adds an UNSAFE_INPUT flag to errors caused by fully-constrained inputs. For memory access

errors, the checker examines the pointer to see if it contains fully-constrained symbolic

values. For assertion failures, it examines the assertion condition. For division-by-zero

errors, it examines the divisor.

To avoid reporting spurious errors, the checker suppresses the UNSAFE_INPUT flag from

errors involving values that may have been properly sanitized by the function being checked.

For each class of error, the checker inspects the fully-constrained inputs responsible for

the error and determines whether any path constraints compare those inputs to under-

constrained data (originating elsewhere in the program). If so, the checker assumes that

the constraints may properly sanitize the input, and it suppresses the flag. Because the

constraints involve under-constrained data, uc-klee does not have enough information

to determine whether the comparisons constitute proper sanitization. Suppressing these

errors avoids false positives at the cost of missing errors when inputs are partially (but

insufficiently) sanitized.
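The flagging policy described in the last two paragraphs reduces to a short predicate. The following sketch is a hypothetical model of that policy, not uc-klee's code: each byte that contributes to an error carries its shadow state plus a flag recording whether some path constraint compares it against under-constrained data, and all type and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum byte_state { UNDER_CONSTRAINED, FULLY_CONSTRAINED };

struct input_byte {
    enum byte_state state;  /* shadow state of this symbolic byte */
    bool compared_with_uc;  /* some path constraint compares it to
                             * under-constrained data */
};

/* Decide whether an error whose operand (pointer, assertion condition,
 * or divisor) depends on the given bytes should carry the unsafe-input
 * flag. The flag is suppressed as soon as any fully-constrained byte
 * may have been sanitized. */
bool should_flag_unsafe_input(const struct input_byte *bytes, size_t n)
{
    assert(bytes != NULL || n == 0);
    bool any_fully_constrained = false;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i].state != FULLY_CONSTRAINED)
            continue;
        if (bytes[i].compared_with_uc)
            return false; /* possibly sanitized: suppress the flag */
        any_fully_constrained = true;
    }
    return any_fully_constrained;
}
```

The early return encodes the trade-off stated above: a single comparison against under-constrained data is enough to suppress the flag, which avoids false positives but can hide inputs that were checked insufficiently.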

We designed this checker primarily to find security vulnerabilities similar to the severe

OpenSSL “Heartbleed” vulnerability [3, 30] from 2014, shown in Figure 6.1. This bug was

exploited in the wild to trigger severe losses of confidentiality. As described in Section 6.4.3

and attributed to Chou [20], uc-klee intercepts calls to the byte-swapping function n2s

on line 4 and marks the two bytes read from the buffer p into payload as fully-constrained.

When the code passes these bytes to memcpy at line 18 without sanitizing them, uc-klee

detects out-of-bounds reads from the source buffer pl up to the offset supplied by payload

(i.e., up to 64KB). Since this offset is fully-constrained, uc-klee reports the error with the

UNSAFE_INPUT flag, denoting a potentially-exploitable security vulnerability. We confirmed

that uc-klee reports this error using an old version of OpenSSL that did not have the

patch for this bug.


1 : int tls1_process_heartbeat(SSL *s) {
2 :     unsigned int payload;
3 :     . . .
4 :     n2s(p, payload); /* load and swap 2 network bytes from packet 'p' into 'payload' */
5 :     pl = p;
6 :     . . .
7 :
8 :     /* Allocate memory for the response, size is 1 bytes
9 :      * message type, plus 2 bytes payload length, plus
10:      * payload, plus padding
11:      */
12:     buffer = OPENSSL_malloc(1 + 2 + payload + padding);
13:     bp = buffer;
14:
15:     /* Enter response type, length and copy payload */
16:     *bp++ = TLS1_HB_RESPONSE;
17:     s2n(payload, bp);
18:     memcpy(bp, pl, payload); /* attacker-supplied 'payload' length */
19:     bp += payload;
20:     . . .
21: }

Figure 6.1: OpenSSL “Heartbleed” vulnerability (CVE-2014-0160) [3, 30].
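For contrast, the sanitization that is absent from Figure 6.1 amounts to comparing the claimed payload length against the number of bytes actually received before copying. The sketch below is a self-contained stand-in, not OpenSSL's code: the record layout (1 byte type, 2 bytes big-endian length, payload, at least 16 bytes of padding) follows the comment in the figure, while the function name, constants, and return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HB_HEADER_LEN  3u   /* 1 byte type + 2 bytes payload length */
#define HB_MIN_PADDING 16u  /* assumed minimum padding after the payload */
#define HB_RESPONSE    2u   /* hypothetical response type code */

/* Build a heartbeat response in out. Returns 0 on success and -1 if the
 * request is malformed or the output buffer is too small. */
int build_heartbeat_response(const unsigned char *rec, size_t rec_len,
                             unsigned char *out, size_t out_cap,
                             size_t *out_len)
{
    assert(rec != NULL && out != NULL && out_len != NULL);

    /* The record must at least hold the header and the padding. */
    if (rec_len < HB_HEADER_LEN + HB_MIN_PADDING)
        return -1;

    /* Equivalent of n2s: two network-order bytes, attacker controlled. */
    size_t payload = ((size_t)rec[1] << 8) | rec[2];

    /* The missing check: the claimed length must fit in the record. */
    if (HB_HEADER_LEN + payload + HB_MIN_PADDING > rec_len)
        return -1;
    if (HB_HEADER_LEN + payload > out_cap)
        return -1;

    out[0] = (unsigned char)HB_RESPONSE;
    out[1] = rec[1];
    out[2] = rec[2];
    memcpy(out + HB_HEADER_LEN, rec + HB_HEADER_LEN, payload);
    *out_len = HB_HEADER_LEN + payload;
    return 0;
}
```

A request that claims a 65,535-byte payload but carries a single byte is rejected before memcpy runs, whereas a well-formed request is echoed back unchanged.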

6.4 Evaluation

We evaluated uc-klee’s checkers on over 20,000 functions from BIND, OpenSSL, and the

Linux kernel. For BIND and OpenSSL, we used uc-klee to check all functions except those

in the codebases’ test directories. We used the same minor code modifications described in

Section 5.3.1, and we again used a maximum input size of 25,000 bytes and a depth bound

of 9 objects.

For the Linux kernel, we included functions relevant to each checker, as described below.

Unlike our evaluation in Section 5.3.2, we did not use any manual annotations to suppress

false positives. We ran uc-klee for up to five minutes on each function from BIND and

the Linux kernel, and up to ten minutes on each OpenSSL function. We used the same

machines as in Section 5.3.2.

For BIND, we checked version 9.10.1-P1 (December 2014). For OpenSSL, we checked

version 1.0.2 (January 2015). For the Linux kernel, we checked version 3.16.3 (September

2014).

Table 6.1 summarizes the results, including the number of functions that uc-klee ex-

haustively verified (up to the given input bound and with caveats) as having each property.

uc-klee discovered a total of 67 previously-unknown bugs: 12 in BIND, 11 in OpenSSL,


              Functions  Bugs  Reports  False   Verified  Verified
                                                No Leaks  No malloc
BIND          6239       9     138      2.2%    388       1776
OpenSSL       6579       5     272†     90.1%   383       1648
Linux kernel  5812       23    127      76.4%   -         -

(a) Leak checker

              Functions  Bugs  Pointer  Pointer  Branch   Verified
                               Reports  False    Reports
BIND          6239       3     0        -        244*     2045
OpenSSL       6579       6     197      92.90%   564*     2043
Linux kernel  7185       10    72       83.30%   494*     -

(b) Uninitialized data checker

              Functions  Bugs  Reports  False
BIND          6239       0     67       100%
OpenSSL       6579       0     5        100%
Linux kernel  1857       11    145      80.0%

(c) User input checker

Table 6.1: Summary of experimental results from running uc-klee checkers. Bugs shows the number of distinct true bugs found (67 total). Reports shows the total number of errors reported by uc-klee in each category (multiple errors may point to a single bug). False reports the percentage of errors reported that did not appear to be true bugs (i.e., false positives). Verified lists the number of functions exhaustively verified to satisfy the checker on all paths. † excludes reports for obfuscated ASN.1 code. * denotes that we inspected only a handful of errors for that category.

and 44 in the Linux kernel. To check the vast Linux kernel, we linked each function

with other modules from the same directory, as well as the memory management mod-

ule mm/vmalloc.c. However, we configured uc-klee to skip inline assembly and calls to

missing functions, which caused it to under-approximate the set of execution paths through

each function. Consequently, we omit verification guarantees for the Linux kernel.

6.4.1 Leak checker

The leak checker was the most effective. It reported the greatest number of bugs (37 total)

and the lowest false positive rate. Interestingly, only three of the 138 leak reports for BIND

were spurious errors, a false positive rate of only 2.2%. For OpenSSL, we excluded 269

additional reports involving the library’s obfuscated ASN.1 [62] parsing code, which we

could not understand. Of the remaining 272 reports, the checker found five bugs but had a

high false positive rate of 90.1%.


1 : int gssp_accept_sec_context_upcall(struct net *net, struct gssp_upcall_data *data) {
2 :     . . .
3 :     ret = gssp_alloc_receive_pages(&arg);
4 :     . . .
5 :     gssp_free_receive_pages(&arg);
6 :     . . .
7 : }
8 : int gssp_alloc_receive_pages(struct gssx_arg_accept_sec_context *arg) {
9 :     arg->pages = kzalloc(. . .);
10:     . . .
11:     return 0;
12: }
13: void gssp_free_receive_pages(struct gssx_arg_accept_sec_context *arg) {
14:     for (i = 0; i < arg->npages && arg->pages[i]; i++)
15:         __free_page(arg->pages[i]);
16:     /* missing: kfree(arg->pages); */
17: }

Figure 6.2: Linux kernel memory leak in RPCSEC_GSS protocol implementation used by NFS.

One source of false positives we did notice in OpenSSL was when code attempted to

insert objects in a priority queue that required unique priorities (e.g., based on message

sequence numbers). If the priority was already present in the queue, the object was leaked.

Our tool was missing the precondition that the priority queue could not contain sequence

numbers greater than the previous message’s sequence number, causing it to explore an

infeasible path that leaked memory.

For the Linux kernel, we wrote simple C annotations (about 60 lines) to intercept calls

to __kmalloc (used by kmalloc, kmalloc_array, kzalloc, etc.), __vmalloc_node (used

by vmalloc, vzalloc, etc.), kfree, and vfree, and to forward these to uc-klee’s built-in

malloc and free functions. Doing so allowed us to track memory management without the

overhead of symbolically executing the kernel’s internal allocators. We then ran uc-klee

on all functions that directly call these allocation functions. Memory leaks may be present

elsewhere in the kernel (i.e., in functions that transitively call one of these functions), but

we did not check other functions due to limited time and compute resources.

Our system discovered 23 memory leaks in the Linux kernel. One particularly interesting

example (Figure 6.2) involved the SunRPC layer’s server-side implementation of AUTH_GSS

authentication for NFS. Each connection triggering an upcall causes 512 bytes allocated at

line 9 to be leaked due to a missing kfree that should be present around line 16. Since this

leak may be triggered by remote connections, it poses a potential denial-of-service (memory

exhaustion) vulnerability. The NFS maintainers accepted our patch to fix the bug.


1 : points = OPENSSL_malloc(sizeof (EC_POINT*)*(num + 1));
2 : . . .
3 : for (i = 0; i < num; i++) {
4 :     if ((points[i] = EC_POINT_new(group)) == NULL)
5 :         goto err; /* leaves 'points' only partially initialized */
6 : }
7 : . . .
8 : err:
9 : . . .
10: if (points) {
11:     EC_POINT **p;
12:     for (p = points; *p != NULL; p++)
13:         EC_POINT_free(*p); /* dereference/free of uninitialized pointer */
14:     OPENSSL_free(points);
15: }

Figure 6.3: OpenSSL dereference/free of uninitialized pointer in ec_wNAF_precompute_mult function.

uc-klee found that at least 2909 functions in BIND and at least 3700 functions in

OpenSSL (or functions they call) allocate heap memory. As shown in Table 6.1(a), uc-

klee verified (with caveats) that 388 functions in BIND and 383 in OpenSSL allocate heap

memory but do not leak it. Our system also verified that 1776 functions in BIND and

1648 functions in OpenSSL do not allocate heap memory (either directly or through called

functions), making them trivially leak-free.

6.4.2 Uninitialized data checker

The uninitialized data checker reported a total of 19 new bugs. One illustrative exam-

ple, shown in Figure 6.3, involves OpenSSL’s elliptic curve cryptography. If the call to

EC_POINT_new on line 4 fails, the code jumps to line 8, leaving the points array partially

uninitialized. Line 13 then passes uninitialized pointers from the array to EC_POINT_free,

which dereferences the pointers and passes them to free, potentially corrupting the heap.

This is one of many bugs that we found involving infrequently executed error-handling code,

a common source of security bugs.
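One way to make this cleanup pattern safe is to set every slot of the array, including the terminator, to NULL before the allocation loop begins. The sketch below illustrates that idea in plain C and is not the fix adopted by OpenSSL: struct point, points_create, points_destroy, and the fail_at test hook (which simulates an allocation failure at a chosen index) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct point { int x, y; };

#define NEVER_FAIL ((size_t)-1) /* fail_at value that disables the hook */

/* Free a NULL-terminated array of points; safe on partially filled arrays
 * because unfilled slots were set to NULL up front. */
void points_destroy(struct point **points)
{
    if (points == NULL)
        return;
    for (struct point **p = points; *p != NULL; p++)
        free(*p);
    free(points);
}

/* Allocate num points plus a NULL terminator. Returns NULL on failure. */
struct point **points_create(size_t num, size_t fail_at)
{
    assert(num < (size_t)-1 / sizeof(struct point *)); /* no size overflow */
    struct point **points = malloc((num + 1) * sizeof *points);
    if (points == NULL)
        return NULL;

    /* Initialize every slot so the cleanup loop always finds a NULL. */
    for (size_t i = 0; i <= num; i++)
        points[i] = NULL;

    for (size_t i = 0; i < num; i++) {
        points[i] = (i == fail_at) ? NULL : malloc(sizeof **points);
        if (points[i] == NULL) {
            points_destroy(points); /* stops at the first NULL slot */
            return NULL;
        }
    }
    return points;
}
```

With this ordering, a failure at index 2 of 4 frees exactly the two points that were allocated and never touches an uninitialized slot.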

uc-klee discovered an interesting bug (Figure 6.4) in BIND’s UDP port randomization

fix for Kaminsky’s cache poisoning attack [28]. To prevent spoofed DNS replies, BIND

must use unpredictable source port numbers. The dispatch_createudp function calls the

get_udpsocket function at line 9, which selects a pseudorandom number generator (PRNG)

at line 18 based on whether we are using a UDP or TCP socket. However, the socktype field


1 : #define DISP_ARC4CTX(disp) \
2 :     ((disp)->socktype == isc_sockettype_udp) ? (&(disp)->arc4ctx) \
3 :                                              : (&(disp)->mgr->arc4ctx)
4 : static isc_result_t dispatch_createudp(. . ., unsigned int attributes, . . .) {
5 :     . . .
6 :     result = dispatch_allocate(mgr, maxrequests, &disp);
7 :     . . .
8 :     if ((attributes & DNS_DISPATCHATTR_EXCLUSIVE) == 0) {
9 :         result = get_udpsocket(mgr, disp, . . .);
10:         . . .
11:     }
12:     disp->socktype = isc_sockettype_udp; /* late initialization */
13:     . . .
14: }
15: static isc_result_t get_udpsocket(. . ., dns_dispatch_t *disp, . . .) {
16:     . . .
17:     /* PRNG selected based on uninitialized 'socktype' field */
18:     prt = ports[dispatch_uniformrandom(DISP_ARC4CTX(disp), nports)];
19:     . . .
20: }

Figure 6.4: BIND non-deterministic PRNG selection bug.

is not initialized in dispatch_createudp until line 12, meaning that the PRNG selection is

based on uninitialized data. While it appears that the resulting port numbers are sufficiently

unpredictable despite this bug, this example illustrates uc-klee’s ability to find errors with

potentially serious security implications.

For the Linux kernel, we checked the union of the functions we used for the leak checker

and the user input checker (discussed below) and found 10 bugs.

Due to time limitations, we exhaustively inspected only the most serious category of

errors: uninitialized pointers. The checker reported too many uninitialized branches for us

to examine completely, but we did inspect a few dozen of these errors in an ad-hoc manner.

All three of the bugs from BIND and one bug from the Linux kernel were uninitialized branch errors.

The remaining bugs were uninitialized pointer errors. We did not inspect the error reports

for binary operations or load values.

Finally, our system verified (with caveats) that about a third of the functions from BIND

(2045) and OpenSSL (2043) do not access uninitialized data. We believe that providing this

level of guarantee on such a high percentage of functions with almost no manual effort is a

strong result not possible with existing tools.


6.4.3 User input checker

The user input checker required us to identify data originating from untrusted sources.

Chou [20] observed that data swapped from network byte order to host byte order is gen-

erally untrusted. We applied this observation to OpenSSL and used simple annotations

(about 40 lines of C) to intercept calls to n2s, n2l, n2l3, n2l6, c2l, and c2ln, and mark

the results fully-constrained. We also applied a simple patch to OpenSSL to replace byte-

swapping macros with function calls so that uc-klee could use our annotations. We hope

future work will explore automated methods for identifying untrusted data.

For BIND, we annotated (about 50 lines) the byte-swapping functions ntohs and ntohl,

along with isc_buffer_getuint8 and three other functions that generally read from untrusted

buffers.

For the Linux kernel, we found that many network protocols store internal state in

network byte order, leading to spurious errors if we consider these to be untrusted. Instead,

we annotated (about 40 lines) the copy_from_user function and get_user macro (which

we converted to a function call). In addition, we used an option in uc-klee to mark all

arguments to the system call handlers sys_* as untrusted. Finally, we used uc-klee to

check the 1502 functions that directly invoke copy_from_user and get_user, along with

the 355 system call handlers in our build.

Reassuringly, this checker did not discover any bugs in the latest versions of BIND or

OpenSSL. We attribute this both to the limited amount of data we marked as untrusted and

to our policy of suppressing the UNSAFE_INPUT flag for errors involving possibly sanitized

data (see Section 6.3). However, as mentioned earlier, we were able to detect the 2014

“Heartbleed” vulnerability [3, 30] when we ran our system on an old version of OpenSSL.

Interestingly, we did discover 11 bugs in the Linux kernel. Seven of these bugs were

division- or remainder-by-zero operations that would trigger floating-point exceptions and

crash the kernel. The remaining four bugs are out-of-bounds dereferences.

Figure 6.5 shows a buffer overread bug we discovered in the kernel driver for the VMware

Communication Interface (VMCI) that follows a pattern nearly identical to “Heartbleed.”

The userspace datagram dg is read using copy_from_user. The code then allocates a

destination buffer on line 5 and invokes memcpy on line 9 without sanitizing the dg_size

field read from the datagram. An attacker could potentially use this bug to copy up to

69,632 bytes of private kernel heap memory and send it from the host OS to the guest OS.

Fortunately, this vulnerability is only exploitable by code running locally on the host OS.


1 : static int dg_dispatch_as_host(. . ., struct vmci_datagram *dg) {
2 :     /* read length field from userspace datagram */
3 :     dg_size = VMCI_DG_SIZE(dg);
4 :     . . .
5 :     dg_info = kmalloc(sizeof(*dg_info) +
6 :                       (size_t) dg->payload_size, GFP_ATOMIC);
7 :     . . .
8 :     /* unchecked memcpy length; read overrun */
9 :     memcpy(&dg_info->msg, dg, dg_size);
10:     . . .
11: }

Figure 6.5: Linux kernel VMCI driver unchecked memcpy length (buffer overread) bug.

1 : static long validate_layout(. . ., struct ceph_ioctl_layout *l) {
2 :     . . .
3 :     /* validate striping parameters */
4 :     if ((l->object_size & ~PAGE_MASK) ||
5 :         (l->stripe_unit & ~PAGE_MASK) ||
6 :         (l->stripe_unit != 0 && /* ← 64-bit check */
7 :          /* 32-bit divisor: */
8 :          ((unsigned)l->object_size % (unsigned)l->stripe_unit)))
9 :         return -EINVAL;
10:     . . .
11: }

Figure 6.6: Linux kernel CEPH distributed filesystem driver remainder-by-zero bug in ioctl handler.

The maintainers quickly patched the VMCI bug.
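To make the missing sanitization in Figure 6.5 concrete, the sketch below shows the kind of length check that would rule out the overread. It is an illustration under stated assumptions, not the patch the maintainers applied: the function name datagram_size_is_valid and its parameters (header_size, payload_size, bytes_received) are hypothetical, and bytes_received stands for the number of bytes actually obtained from userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check (not the kernel's actual fix): a datagram that claims
 * header_size + payload_size bytes is acceptable only if that many bytes
 * were really received. The subtraction form avoids integer overflow when
 * payload_size is attacker-controlled.
 */
int datagram_size_is_valid(size_t header_size, uint64_t payload_size,
                           size_t bytes_received)
{
    if (bytes_received < header_size)
        return 0;   /* not even a complete header */
    return payload_size <= (uint64_t)(bytes_received - header_size);
}
```

With such a check in place before the memcpy, a datagram whose length field exceeds the data actually supplied would be rejected instead of causing a read past the end of the source buffer.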

Figure 6.6 shows an unsanitized remainder-by-zero bug we found in the kernel driver
for the CEPH distributed filesystem. The check at line 6 attempts to prevent this bug
with a 64-bit comparison, but the divisor at line 8 uses only the low 32 bits of the
untrusted stripe_unit field (read from userspace using copy_from_user). A value such as
0xffffffff00000000 would pass the check but result in a remainder-by-zero error. An
unprivileged local attacker could potentially issue an ioctl system call to crash the machine.
We notified the developers, who promptly fixed the bug.
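The width mismatch behind the bug in Figure 6.6 can be reproduced in a few lines of standalone C. The sketch below is illustrative only: the function names are hypothetical, and layout_ok_fixed shows one possible repair (testing the same 32-bit value that is used as the divisor), which is an assumption and not necessarily the fix the CEPH developers chose.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if stripe_unit passes the 64-bit "!= 0" test but its low
 * 32 bits, which are what the remainder operation uses, are zero. */
int divisor_truncates_to_zero(uint64_t stripe_unit)
{
    int passes_64bit_check = (stripe_unit != 0);
    uint32_t divisor = (uint32_t)stripe_unit;   /* low 32 bits only */
    return passes_64bit_check && divisor == 0;
}

/* One possible repair (an assumption): validate the truncated divisor.
 * Returns 1 if the layout is acceptable, 0 if it must be rejected. */
int layout_ok_fixed(uint64_t object_size, uint64_t stripe_unit)
{
    uint32_t divisor = (uint32_t)stripe_unit;
    if (stripe_unit == 0)
        return 1;                 /* original code skips the remainder */
    if (divisor == 0)
        return 0;                 /* would divide by zero: reject */
    return ((uint32_t)object_size % divisor) == 0;
}
```

For the value cited above, divisor_truncates_to_zero(0xffffffff00000000ULL) returns 1, while an ordinary value such as 4096 returns 0.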

Because this checker uses manual annotations to identify unsafe inputs and imprecise

heuristics to identify when inputs have been sanitized, we did not use it to exhaustively

verify any properties about the functions we checked.

Chapter 7

Optimizing symbolic execution

This chapter presents techniques and optimizations we implemented in uc-klee that are

broadly applicable to symbolic execution tools. The techniques target three challenges that

are common in symbolic execution: symbolic expression blowup, SMT query timeouts, and

path explosion.

7.1 Symbolic expressions

Symbolic expressions (e.g., x + 5) form the backbone of symbolic execution tools, including

uc-klee. These expressions are built by the tool as code operates on symbolic inputs

through arithmetic instructions, casts, comparisons, memory accesses, and conditional if-

then-else constructs. Often, real-world code results in complex symbolic expressions that

consume memory, are compute-intensive to manipulate, and yield expensive SMT queries.

We implemented several (fairly obvious) techniques in uc-klee that are not part of the

baseline klee system and that greatly improved our tool’s ability to run on real code.

7.1.1 Expression uniquing

In practice, symbolic expressions often have redundant subexpressions. For example, the

expressions 2x+ 1 and 2x+ 2 share the subexpression 2x. In the baseline klee system,

as is common practice among program analysis tools, expressions are represented by di-

rected acyclic graphs (DAGs) that share nodes rather than trees in order to reduce their

memory footprint. However, klee only shares subexpressions if the code builds the parent

expressions from the same memory/register values. If the code independently constructs the

subexpressions (e.g., 2x), klee will create redundant expression objects that waste memory,

even for concrete (constant) values. Worse, klee never reuses identical expressions that

are constructed by execution paths that have diverged (those constructed before a path

diverges are shared using a copy-on-write mechanism).

uc-klee introduces global expression uniquing that guarantees that two symbolic ex-

pressions are structurally identical if and only if they have the same address, similar to

hash-consing [5] and expression uniquing techniques employed internally by SMT solvers

such as STP [48]. In practice, this approach reduced uc-klee’s memory consumption on

many programs by an order of magnitude or more. In addition, it reduced the cost of

comparing two expressions from a deep traversal to a simple pointer comparison. uc-klee

implements expression uniquing in a straightforward fashion using a global hash table that

it checks before creating each new expression.
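The following is a minimal sketch of expression uniquing with a global hash table, in the spirit of hash-consing. It is not uc-klee's implementation: the node layout, the mk_expr constructor, the fixed bucket count, and the hash function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum expr_kind { EXPR_CONST, EXPR_SYM, EXPR_ADD, EXPR_MUL };

struct expr {
    enum expr_kind kind;
    uint64_t value;                 /* constant value or symbol id */
    const struct expr *lhs, *rhs;   /* operands (NULL for leaves) */
    struct expr *next;              /* hash-bucket chain */
};

#define NBUCKETS 1024
static struct expr *buckets[NBUCKETS];

static size_t hash_expr(enum expr_kind k, uint64_t v,
                        const struct expr *l, const struct expr *r)
{
    /* Operands are already unique, so hashing their addresses suffices. */
    uint64_t h = (uint64_t)k * 31 + v;
    h = h * 1000003u ^ (uint64_t)(uintptr_t)l;
    h = h * 1000003u ^ (uint64_t)(uintptr_t)r;
    return (size_t)(h % NBUCKETS);
}

/* Returns the unique node for (k, v, l, r), creating it only if no
 * structurally identical node exists yet. */
const struct expr *mk_expr(enum expr_kind k, uint64_t v,
                           const struct expr *l, const struct expr *r)
{
    size_t b = hash_expr(k, v, l, r);
    for (struct expr *e = buckets[b]; e; e = e->next)
        if (e->kind == k && e->value == v && e->lhs == l && e->rhs == r)
            return e;
    struct expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = k;
    e->value = v;
    e->lhs = l;
    e->rhs = r;
    e->next = buckets[b];
    buckets[b] = e;
    return e;
}
```

Building 2x twice through mk_expr yields the same pointer, so 2x + 1 and 2x + 2 share a single 2x node, and comparing two expressions for structural equality is a pointer comparison.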

7.1.2 Expression rewriting

While developing uc-klee, we introduced scores of rewrite rules for symbolic expressions

that (1) reduced SMT query cost, and (2) decreased the memory footprint of symbolic

expressions by reducing their size. SMT solvers [39, 48] typically apply their own rewrite

rules to simplify symbolic expressions while solving queries. Unfortunately, we observed that

in practice, solvers often fail to simplify even basic arithmetic operations. For example,

we observed frequent query timeouts involving integer remainder operations of the form

x % N < N (with an unsigned less-than operator), which is a tautology since a remainder

is always less than the divisor. As we developed uc-klee and applied it to real code,

we examined hundreds of query timeouts and added rewrite rules whenever we observed

simplification opportunities. Appendix A lists the rewrite rules we added to uc-klee that

are not included in the baseline klee system. In all cases, we believe that these rules are

broadly applicable, and not specifically tailored to the tens of thousands of functions we

checked.
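As an illustration of the remainder rule mentioned above, the sketch below rewrites (x % N) < N to the constant true. It is a simplified stand-in, not a rule copied from Appendix A: the node type and the rewrite_ult function are hypothetical, and the rule is applied only when N is a nonzero constant, on the assumption that a zero divisor has already been reported as an error before the expression is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_CONST, OP_SYM, OP_UREM, OP_ULT };

struct node {
    enum op op;
    uint64_t value;                 /* constant value or symbol id */
    const struct node *lhs, *rhs;   /* operands (NULL for leaves) */
};

static const struct node TRUE_NODE = { OP_CONST, 1, NULL, NULL };

static int same_const(const struct node *a, const struct node *b)
{
    return a->op == OP_CONST && b->op == OP_CONST && a->value == b->value;
}

/* Rewrites (x % N) <u N to true when N is a nonzero constant;
 * returns the expression unchanged in every other case. */
const struct node *rewrite_ult(const struct node *e)
{
    if (e->op == OP_ULT && e->lhs->op == OP_UREM &&
        same_const(e->lhs->rhs, e->rhs) && e->rhs->value != 0)
        return &TRUE_NODE;
    return e;
}
```

Because the rule inspects only two levels of the expression, it has no pathological cases of the kind seen with recursive rules.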

While most rewrite rules provided a clear speedup, certain rules became too expensive

to apply in practice. For example, we experimented with recursively applying DeMorgan’s

Laws. While doing so provided additional simplification opportunities, many functions

generated large expressions with hundreds of And/Or expressions chained together. Applying

DeMorgan’s Laws to each subexpression resulted in significant overhead, sometimes causing

uc-klee to stall for seconds or even minutes at a time. Eventually, we disabled most


recursive rewrite rules and only applied simple rules that did not have pathological cases.

7.1.3 Expression attributes

Many operations in uc-klee involve examining symbolic expressions and extracting some

property. For example, uc-klee uses constraint independence [17] (as in klee) to simplify

SMT queries by eliminating constraints that do not affect the expression being queried.

Previously, constraint independence required klee to visit every subexpression to find ref-

erences to the symbolic inputs that determine whether two constraints are dependent on

one another. Through profiling, we discovered that repeatedly performing this operation

on each constraint added significant computational overhead. To eliminate this hotspot,

uc-klee augments each node in an expression with a set that caches the symbolic inputs

referenced by its operands (and their descendants). We store this set as a sparse bit-vector,

and each byte of symbolic input referenced by a subexpression consumes a single bit in

most cases. When an expression is created, its symbolic inputs are the union of those

of its operands. Checking whether two constraints are independent reduces to taking the

intersection of their symbolic input sets, without visiting any subexpressions.
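A minimal sketch of the cached input-set attribute follows. For brevity it assumes at most 64 bytes of symbolic input and uses a single 64-bit mask in place of a sparse bit-vector; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per byte of symbolic input (simplification: at most 64 bytes). */
typedef uint64_t input_set;

struct cexpr {
    input_set inputs;               /* cached when the node is created */
    const struct cexpr *lhs, *rhs;  /* operands (NULL for leaves) */
};

/* Leaf node that reads symbolic input byte `index`. */
struct cexpr make_input_byte(unsigned index)
{
    assert(index < 64);
    struct cexpr e = { (input_set)1 << index, 0, 0 };
    return e;
}

/* Interior node: its input set is the union of its operands' sets. */
struct cexpr make_binary(const struct cexpr *l, const struct cexpr *r)
{
    struct cexpr e = { l->inputs | r->inputs, l, r };
    return e;
}

/* Two constraints are independent if their input sets do not intersect;
 * no subexpression is visited. */
int independent(const struct cexpr *a, const struct cexpr *b)
{
    return (a->inputs & b->inputs) == 0;
}
```

The independence test is a single bitwise AND regardless of expression depth, which is the property that removes the traversal hotspot.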

In addition, we introduced other expression attributes in order to eliminate expensive

traversals at the cost of minimal per-expression memory overhead. For example, we added

a single bit to each expression to indicate whether the expression included an uninitialized

value, which sped up the uninitialized data checker (Section 6.2). We included six other

attributes in each expression, but we omit details for brevity.

7.2 Lazy constraints

During our experiments, we faced query timeouts and low coverage for several benchmarks

that we traced to symbolic division and remainder operations. The worst cases occurred

when an unsigned remainder operation had a symbolic value in the denominator. To address

this challenge, we implemented a solution we refer to as lazy constraints. Here, we defer

evaluation of expensive queries until we find an error. In the common case where an error

does not occur or two functions exhibit semantic equivalence along a path, our tool avoids

ever issuing potentially expensive queries. When an error is detected, the tool re-checks

that the error path is feasible (otherwise the error is invalid).

Figure 7.1(a) shows a simple example. With eager constraints (the standard approach),


1 : int x = y / z;
2 : if (x > 10) /* query: y / z > 10 */
3 :     . . .

(a) Eager constraints (standard)

1 : int x = lazy_x; /* adds lazy constraint: lazy_x = y / z */
2 : if (x > 10) /* query: lazy_x > 10 */
3 :     . . .

(b) Lazy constraints

Figure 7.1: Lazy constraint used for symbolic integer division operation.

the if-statement at line 2 triggers an SMT query involving the symbolic integer division
operation y / z. This query may be expensive, depending on the other path constraints
imposed on y and z. To avoid a potential query timeout, uc-klee introduces a lazy constraint
(Figure 7.1(b)). On line 1, it replaces the result of the integer division operation with a
new, unconstrained symbolic value lazy_x and adds the lazy constraint lazy_x = y / z to
the current path. At line 2, the resulting SMT query is the trivial expression lazy_x > 10.
Because lazy_x is unconstrained, uc-klee will take both the true and false branches
following the if-statement. One of these branches may violate the constraints imposed on y
and z, so uc-klee must check that the lazy constraints are consistent with the full set of
path constraints prior to emitting any errors to the user (i.e., if the path later crashes).

In many cases, the delayed queries are more efficient than their eager counterparts

because additional path constraints added after the division operation have narrowed the

solution space considered by the SMT solver. If our tool determines that the path is

infeasible, it silently terminates the path. Otherwise, it reports the error to the user. One

caveat to note is that an infeasible path could adversely affect our tool’s statement coverage

metric if only infeasible paths have executed an instruction. We first added this feature to

uc-klee for our evaluation in Section 4.6, and we modified uc-klee to check the validity

of a path before marking a new instruction as covered to prevent an artificially inflated

coverage metric (Figure 5.4).
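The deferred feasibility check can be illustrated with a toy model of the Figure 7.1 example. This is only a sketch: a brute-force search over a small value range stands in for the SMT solver, and the struct path fields (y_upper, lazy_x_lower) are hypothetical names for one eager path constraint on y and for the constraint contributed by the speculative branch on lazy_x.

```c
#include <assert.h>
#include <stdbool.h>

#define DOMAIN 64   /* toy value range [0, DOMAIN) for y and z */

struct path {
    int y_upper;        /* eager path constraint: y < y_upper */
    int lazy_x_lower;   /* branch taken speculatively: lazy_x > lazy_x_lower */
};

/*
 * Run only when the path reports an error. Substitutes the lazy
 * constraint lazy_x = y / z and searches for values of y and z
 * (z != 0) that satisfy all constraints. A real tool would issue
 * one SMT query here instead of enumerating values.
 */
bool error_path_is_feasible(const struct path *p)
{
    for (int y = 0; y < DOMAIN && y < p->y_upper; y++)
        for (int z = 1; z < DOMAIN; z++) {
            int lazy_x = y / z;
            if (lazy_x > p->lazy_x_lower)
                return true;
        }
    return false;
}
```

With y < 5, the branch lazy_x > 10 can never be satisfied, so an error found on that path would be discarded silently; with y < 40 the same branch is feasible and the error would be reported.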

Lazy constraints are a form of abstraction refinement, where uc-klee relaxes the preci-

sion of its execution states in order to achieve a computational win (in many cases). At the

end of each path, the abstraction is verified to restore precision and eliminate inconsistent

execution paths. Unlike many prior abstraction refinement techniques such as CEGAR [21],

lazy constraints use abstraction to reduce query overhead rather than reduce the size of the

state space. In fact, lazy constraints may actually increase the size of the state space since

uc-klee may explore infeasible paths as a result of the over-approximation.

We believe lazy constraints may be more broadly applicable to symbolic execution tools.


Often, SMT query cost is a limiting factor in symbolic execution (Section 2.1.1). In many

cases, expensive queries are performed for execution paths that do not crash (and are there-

fore “uninteresting”). Lazy constraints could be used to concentrate a tool’s computational

efforts on paths that are of greater interest by speculatively executing past expensive sym-

bolic branches. For paths where an error does not occur, their feasibility is irrelevant. If

an error does occur, then an expensive query is worth the computational cost. We ex-

perimented in uc-klee by performing queries with short timeouts (e.g., two seconds) for

symbolic branches, and using lazy constraints when queries exceeded this timeout. If an

error later occurred along that path, we checked the lazy constraints using a higher timeout

(e.g., 30 seconds). This technique showed promising initial results, but we leave a more

formal evaluation to future work.

7.3 Path explosion in library functions

A major cause of path explosion for symbolic execution tools is library code such as memcmp

and strlen. uc-klee forks execution paths as the library code iterates over symbolic

strings or memory buffers, leading to path explosion. For example, strlen results in a new

execution path being forked after each byte in the string is checked for a NULL terminator.

Fortunately, alternative representations make it possible to mitigate this source of path

explosion without losing precision. The Z3 Theorem Prover [39] provides an if-then-else

(ITE) construct for representing conditional expressions. We implemented uc-klee-specific

versions of the functions memcpy, memmove, strchr, strrchr, strcmp, strncmp, and strlen

that recast branching code in terms of ITE chains in many cases. Figure 7.2 illustrates

an example symbolic expression capturing the result of executing strlen on a four-byte

symbolic string. uc-klee’s implementation of strlen also forks an error case to consider

strings without a NULL terminator. On that path, the code triggers an out-of-bounds

memory read, and uc-klee terminates the path (or executes the second version of the code

if performing cross-checking).
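The shape of the ITE chain for a four-byte string can be written directly as a nested conditional expression. The sketch below operates on concrete bytes and is not uc-klee's strlen model; the function name strlen4_ite is hypothetical, and the final value -1 simply marks the case with no terminator, which the text above handles as a separate error path.

```c
#include <assert.h>

/* Length of a string of at most four bytes as one nested conditional:
 * each test corresponds to one ITE node rather than one forked path. */
int strlen4_ite(const char s[4])
{
    return s[0] == 0 ? 0 :
           s[1] == 0 ? 1 :
           s[2] == 0 ? 2 :
           s[3] == 0 ? 3 : -1;   /* -1: no terminator within four bytes */
}
```

In a symbolic setting the whole expression is handed to the solver as a single value, so the caller continues on one path instead of up to five.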

While complex symbolic expressions such as these may increase the cost of SMT queries
performed by Z3, we have found this to be a worthwhile tradeoff in practice. More generally,
Bjørner et al. [10] propose using a theory of strings to avoid explicit path explosion and
instead model string manipulations as higher-level operations. These approaches are related
to the idea of state merging [74] (Section 8.3.1), which combines similar paths that have


ITE(s[0] == 0, 0, ITE(s[1] == 0, 1, ITE(s[2] == 0, 2, ITE(s[3] == 0, 3, -1))))

Figure 7.2: Example symbolic if-then-else (ITE) expression for strlen.

previously forked execution. While state merging is more broadly applicable, our approach

preemptively avoids specific sources of path explosion.

Chapter 8

Experience

As we implemented uc-klee and used it to check real code over a period of about five

years, we encountered many practical challenges. This chapter details our experience in ad-

dressing a number of those challenges and presents three alternative approaches to symbolic

execution that we explored with unsatisfying results.

8.1 General symbolic execution

This section discusses our experience with two aspects of generalized symbolic execution:

modeling system calls and selecting execution states using search heuristics.

8.1.1 System modeling

As discussed in Section 2.1, symbolic execution requires a method for handling interactions

(i.e., system calls) between application code and the operating system. The underlying

klee tool provides a symbolic model that handles a useful subset of the POSIX system call

interface. For system calls not directly implemented by this model, klee concretizes any

symbolic arguments and executes the call as a native system call, leveraging the operating

system to provide the missing functionality.

In practice, klee’s approach of allowing checked code to interact with the operating

system caused our tool’s results to vary from run to run, especially when the checked

code mutated the system in an observable way (e.g., wrote to a file on disk). In these

cases, the order in which the search heuristics selected execution states determined how the

execution environment was mutated and which execution paths were consequently explored.

We often found that debugging and evaluating uc-klee was difficult since we could not

reliably reproduce some crashes.

A particularly interesting example arose when we symbolically executed code that

opened a directory using the POSIX library function opendir shown in Figure 8.1. This

function opens a directory specified by name and returns a pointer to a DIR struct that

can be passed to other POSIX functions to read from the directory. If the file descriptor

opened on line 6 corresponds to a concrete file (i.e., name is concrete or the POSIX model

has exhausted the user-specified symbolic files along that path), uc-klee will execute the

call to fstat on line 9 as a native system call.

During our early experiments, we observed that uc-klee achieved high coverage on

some programs when run on a local machine, but poor coverage when run on our (more

powerful) compute cluster. Eventually, we traced the problem to the native fstat system

call on line 9. The st_blksize field returned by fstat determines the size of the directory’s

1 : DIR *opendir(const char *name) {
2 :     int fd;
3 :     struct stat statbuf;
4 :     DIR *ptr;
5 :     . . .
6 :     if ((fd = open(name, O_RDONLY|O_NDELAY|O_DIRECTORY)) < 0)
7 :         return NULL;
8 :     . . .
9 :     if (fstat(fd, &statbuf) < 0) /* UC-KLEE calls native fstat system call */
10:         goto close_and_ret;
11:     . . .
12:     if (!(ptr = malloc(sizeof(*ptr))))
13:         goto nomem_close_and_ret;
14:     . . .
15:     ptr->dd_max = statbuf.st_blksize; /* block size returned by native system call */
16:     if (ptr->dd_max < 512)
17:         ptr->dd_max = 512;
18:
19:     if (!(ptr->dd_buf = calloc(1, ptr->dd_max))) { /* buffer size from native system call */
20:         . . .
21:         set_errno(ENOMEM);
22:         return NULL;
23:     }
24:     pthread_mutex_init(&(ptr->dd_lock), NULL);
25:     return ptr;
26: }

Figure 8.1: uClibc implementation of opendir that depends on block size returned by the fstat

system call.


dd_buf buffer allocated on line 19. When we ran uc-klee locally, fstat read from a local
ext4 file system with a 4KB block size. When we ran it on the cluster, however, fstat read
from an NFS mount with a 1MB block size. The much larger block size for NFS caused
operations on the DIR struct (which iterate over the dd_buf buffer) to be significantly more
expensive. Unsurprisingly, uc-klee achieved poor coverage under these circumstances.

This example, combined with the difficulty we experienced in reproducing many bugs,

convinced us to abandon the old POSIX model and implement a new POSIX model that

did not perform any native system calls. The new model allows uc-klee to explore paths

in any order without the paths interfering, and it eliminates any observable effects from

running uc-klee on different machines. The downside is that this model only supports

system calls that it implements and cannot fall back to the operating system to provide

unimplemented functionality.
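The sketch below illustrates how a model that never calls into the host operating system removes this source of variation. It is not the model we implemented: model_fstat, MODEL_BLKSIZE, and dir_buffer_size are hypothetical names, and the fixed 4096-byte block size is an assumed value. The second function mirrors lines 15 through 17 of Figure 8.1.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#define MODEL_BLKSIZE 4096   /* assumed fixed value, identical on every host */

/* Modeled fstat: fills in deterministic metadata and never queries
 * the underlying file system. */
int model_fstat(int fd, struct stat *buf)
{
    if (fd < 0 || buf == NULL)
        return -1;
    memset(buf, 0, sizeof *buf);
    buf->st_blksize = MODEL_BLKSIZE;
    return 0;
}

/* Directory buffer size as computed by opendir in Figure 8.1. */
long dir_buffer_size(const struct stat *st)
{
    long dd_max = (long)st->st_blksize;
    if (dd_max < 512)
        dd_max = 512;
    return dd_max;
}
```

Under this model the directory buffer is 4096 bytes whether the tool runs against ext4 or NFS, so the cost of iterating over it no longer depends on the machine.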

8.1.2 Search heuristics

For many large programs or functions, a symbolic execution tool cannot exhaust all exe-

cution paths (even with under-constrained symbolic execution) due to path explosion. In

these cases, while the tool will not verify the code, it is crucial for the tool to choose a

useful subset of execution paths to explore during the allotted time in order to find bugs

and maximize coverage.

Like the original klee system, uc-klee selects execution paths using a combination

of search heuristics. These heuristics include those used by klee, such as the distance

from the current program counter to the nearest uncovered instruction, whether a state has

recently covered new instructions, and a depth-based random path heuristic [17]. While

these heuristics are more effective than selecting states uniformly at random, they are far

from a panacea.

In our experience, the search heuristics have become a limiting factor in scaling uc-

klee to large codebases. As we have optimized other areas of uc-klee, we have in some

cases observed decreases in statement coverage that we attribute to the search heuristics

choosing uninteresting paths. For example, we significantly reduced the average memory

overhead of execution states, which increased the number of states that uc-klee can track

with a given amount of memory by approximately 5X. We expected this change to improve

statement coverage by allowing uc-klee to explore more states before hitting its memory

limit. Unfortunately, the search heuristics did not make effective use of this larger pool of


states, reducing coverage.

We tried a number of additional search heuristics in uc-klee with limited success. We

did not formally evaluate each heuristic using a representative set of benchmarks, but we

anecdotally observed some to be more effective than others. We describe each heuristic in

decreasing order of anecdotal effectiveness:

1. We prioritized execution states that have a high miss rate in the SMT query cache.

The intuition behind this heuristic is that states that perform SMT queries that miss

in the cache are likely to be “different” from other states that have been explored pre-

viously. States that are different may cover instructions that have not been previously

reached. Anecdotally, this heuristic was the most effective.

2. We de-prioritized fork bombs, which are instructions that cause uc-klee to fork many

execution states. Without correcting for this behavior, our searcher would dispropor-

tionately choose states produced by fork bombs, whose weights sum to a large fraction

of the total weight across all states. To avoid this, our searcher divides each state’s

weight by the number of sibling states produced by the state’s most recent symbolic

fork. The worst offenders we observed were symbolic switch statements with many

cases. BIND [9] includes several of these, with one case for each type of DNS resource

record.

3. We attempted to maximize coverage during cross-checking by prioritizing states that

did not crash in the first version of the function. We believed this heuristic would

help target bugs involving differences in output values (where neither path crashed).

4. We tweaked klee’s nearest uncovered instruction heuristic to consider all reachable

instructions (including those in called functions), not only those in the function being

checked.

5. We prioritized states that included “critical constraints,” which are path constraints

that satisfy branches leading to uncovered instructions. Unfortunately, this heuristic

only considered constraints that matched exactly, not those logically implied by other

constraints, which we believed to be prohibitively expensive.
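The fork bomb correction in item 2 amounts to a single division. The sketch below shows it in isolation; the struct state_info fields and the adjusted_weight function are hypothetical, and base_weight stands for whatever weight the other heuristics assign to a state.

```c
#include <assert.h>

struct state_info {
    double base_weight;    /* weight assigned by the other heuristics */
    unsigned fork_width;   /* states produced by this state's most recent
                              symbolic fork, counting itself (>= 1) */
};

/* Divides the weight by the fork width so that all states produced by
 * one fork together weigh as much as a single ordinary state. */
double adjusted_weight(const struct state_info *s)
{
    assert(s->fork_width >= 1);
    return s->base_weight / (double)s->fork_width;
}
```

For a symbolic switch statement that forks 64 states of base weight 1.0, each state receives weight 1/64, so the group as a whole is selected about as often as one state that did not come from a fork bomb.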

While we have expended some effort in improving uc-klee’s search heuristics, we believe

there is ample opportunity for future work in this area.


8.2 Under-constrained symbolic execution

Section 3.2.2 described our experience in implementing lazy initialization, the backbone of

under-constrained symbolic execution. This section describes our experience with two other

challenges: false positives and backtracking.

8.2.1 False positives

The most significant shortcoming of under-constrained symbolic execution is the high rate

of false positives (spurious errors). In nearly all cases, uc-klee’s false positives are due to

missing preconditions on a function’s inputs. That is, uc-klee explores execution paths

that cannot arise in practice because a function’s callers never supply inputs that satisfy

the corresponding path condition (constraints). In rare cases, false positives may be due to

bugs in uc-klee itself or its implementation of pointer referents (Section 3.1).

As we developed uc-klee and used it to check real code from open source libraries

and the Linux kernel, we expended significant manual effort in determining whether errors

represented true bugs or false positives. Generally, this task requires understanding the

code being checked, so it is best suited to developers of each codebase. Over time, we

familiarized ourselves with the implementation details of large portions of the code from

BIND and OpenSSL in order to understand the errors uc-klee reported.

To determine whether each error was a true bug or a false positive, we examined the

functions’ call sites and arguments. Often, we traced the sources of these arguments further

up the call graph to determine whether the error-inducing inputs were feasible. In some

cases, we examined the code that manipulated data structures in order to determine whether

any obvious data structure invariants were being violated by those inputs.

The automated heuristics described in Section 5.2.2 helped to identify errors that were

less likely to be false positives by flagging errors that occurred for all input values following a

path. In practice, we used these heuristics to concentrate our efforts on a smaller population

of errors. Unfortunately, even most of these errors turned out to be false positives, and

we no doubt missed true bugs by overlooking the large population of errors that did not

satisfy these heuristics. We hope that future work will identify more effective techniques

for more easily distinguishing between true bugs and false positives, perhaps using ranking

schemes [71, 72] or inferred data structure invariants (e.g., length fields corresponding to

buffers).


Most of the false positives we encountered can be classified into three general categories

based on the missing preconditions/invariants that caused them:

1. Data structure invariants, which apply to all instances of a data structure.

2. State machine invariants, which determine the sequence of allowed values and the

variable assignments that may be held concurrently.

3. API invariants, which determine the legal inputs to API entry points.

Figure 5.1 (p. 54) illustrates an example false positive due to a data structure invariant
on the isc_region_t type. As discussed in Section 5.2.1, uc-klee fails to infer that the
length field corresponds to the size of the buffer pointed to by the base field.

Figure 8.2 illustrates a state machine invariant that relates directly to the SSL/TLS
protocol semantics. In this example, uc-klee reported a spurious memory leak in the
OpenSSL function dtls1_buffer_message. Line 6 allocates a new object, and line 13
inserts that object into the per-connection priority queue sent_messages. In OpenSSL,
priority queues require distinct priorities for each entry, and pqueue_insert returns NULL
if the priority already exists in the queue. In this example, the priority corresponds to
the message sequence number, which strictly increases and precludes any collisions in the
queue. uc-klee is unaware of this invariant and erroneously reports the memory leak.

The false positive in Figure 8.3 can best be categorized as a state machine invariant.

uc-klee reported this false positive in a Linux kernel wireless network driver as the result

1 : memset(seq64be,0,sizeof(seq64be));
2 : seq64be[6] = (unsigned char)(dtls1_get_queue_priority(frag->msg_header.seq,
3 :                              frag->msg_header.is_ccs)>>8);
4 : seq64be[7] = (unsigned char)(dtls1_get_queue_priority(frag->msg_header.seq,
5 :                              frag->msg_header.is_ccs));
6 : item = pitem_new(seq64be, frag);
7 : if ( item == NULL)
8 : {
9 :     dtls1_hm_fragment_free(frag);
10:     return 0;
11: }
12: . . .
13: pqueue_insert(s->d1->sent_messages, item); /* returns NULL if seq64be already in queue */

Figure 8.2: False positive (memory leak) in OpenSSL function dtls1_buffer_message caused by
duplicate sequence numbers, which are infeasible in practice.


1 : channels = kcalloc(spec->num_channels, sizeof(*channels), GFP_KERNEL);
2 : if (!channels)
3 :     return -ENOMEM;
4 : . . .
5 : if (spec->supported_bands & SUPPORT_BAND_2GHZ) {
6 :     . . .
7 :     rt2x00dev->bands[IEEE80211_BAND_2GHZ].channels = channels;
8 :     . . .
9 : }
10: . . .
11: if (spec->supported_bands & SUPPORT_BAND_5GHZ) {
12:     . . .
13:     rt2x00dev->bands[IEEE80211_BAND_5GHZ].channels = &channels[14];
14:     . . .
15: }
16:
17: return 0;

Figure 8.3: False positive (memory leak) in Linux kernel function rt2x00lib_probe_hw_modes
caused by unknown device invariants.

of missing preconditions on the supported_bands field. The code allocates memory on line
1, and pointers to this memory may escape on lines 7 or 13. If the hardware capability
specification supports neither the 2 GHz nor 5 GHz bands, then the memory is leaked.
However, the initialization code for each of the supported wireless devices always sets the
supported_bands field to include one or both of these bands.

Finally, Figure 8.4 illustrates a false positive tied to the API semantics. When the
SSL_OP_NO_QUERY_MTU option is used, the user must specify a maximum transmission unit
(MTU) for the connection by calling SSL_set_mtu, and the user must supply an MTU
greater than or equal to the value returned by dtls1_min_mtu. Otherwise, the assertion
at line 17 fails. In uc-klee, line 17 triggers a must-fail error (Section 5.2.2) because the
if-statement on line 1 adds the path constraint that the MTU is too small. While the
assertion failure could arise in practice, it would indicate a bug in the client application
code, not the OpenSSL library code.

8.2.2 General bug finding

Chapters 4 and 5 use uc-klee for cross-checking two versions of a function, and Chapter 6

applies targeted checkers to a single version of a function. We did not formally evaluate

uc-klee as a tool for finding general bugs (e.g., out-of-bounds memory accesses) in a


1 : if (s->d1->mtu < dtls1_min_mtu() && !(SSL_get_options(s) & SSL_OP_NO_QUERY_MTU))
2 : {
3 :     s->d1->mtu =
4 :         BIO_ctrl(SSL_get_wbio(s), BIO_CTRL_DGRAM_QUERY_MTU, 0, NULL);
5 :
6 :     /* I've seen the kernel return bogus numbers when it doesn't know
7 :      * (initial write), so just make sure we have a reasonable number */
8 :     if (s->d1->mtu < dtls1_min_mtu())
9 :     {
10:         s->d1->mtu = 0;
11:         s->d1->mtu = dtls1_guess_mtu(s->d1->mtu);
12:         BIO_ctrl(SSL_get_wbio(s), BIO_CTRL_DGRAM_SET_MTU,
13:                  s->d1->mtu, NULL);
14:     }
15: }
16: . . .
17: OPENSSL_assert(s->d1->mtu >= dtls1_min_mtu()); /* should have something reasonable now */

Figure 8.4: False positive (assertion failure) in OpenSSL function dtls1_do_write caused by
unknown API invariants.

single version of a function because our informal experiments with this use case reported

an unmanageably high rate of false positives, even when we used our automated heuristics

and manual annotations (Section 5.2.1). We did find a number of interesting bugs while

exploring this use case, including an OpenSSL “high severity” denial-of-service vulnerability

for which security advisory CVE-2015-0291 [34, 55] was issued. Figure 8.5 shows the code

responsible for this bug, which involved the TLS 1.2 signature algorithms extension [41].

Line 7 sets shared_sigalgs to NULL if the connection is being re-negotiated, but it leaves
shared_sigalgslen unchanged. This triggers a NULL pointer dereference at line 29 if the

return at line 13 is taken. We manually developed a working exploit using uc-klee’s error

report, and we included this in our bug report to the developers.
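The underlying pattern is a pointer and a length that fall out of sync. The sketch below reproduces that pattern in isolation and shows the invariant a loop such as the one at lines 27 through 29 of Figure 8.5 relies on. The sig_table type and all function names are hypothetical; this is not OpenSSL code and is not presented as the official patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sig_table {
    int *entries;   /* array of entries, or NULL */
    size_t len;     /* number of valid entries */
};

/* Mirrors the bug pattern: frees the array but leaves len stale. */
void reset_stale_length(struct sig_table *t)
{
    free(t->entries);
    t->entries = NULL;
}

/* Keeps the pointer and the length consistent. */
void reset_consistent(struct sig_table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->len = 0;
}

/* Invariant assumed by any loop that iterates len times over entries:
 * a NULL array must have length zero. */
int table_is_consistent(const struct sig_table *t)
{
    return t->entries != NULL || t->len == 0;
}
```

After reset_stale_length, a loop bounded by len would dereference a NULL pointer on its first iteration; after reset_consistent, the same loop executes zero times.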

8.2.3 Backtracking

To choose appropriate allocation sizes for lazy initialization and to handle negative symbolic

pointer offsets, uc-klee supports the backtracking technique described in Section 3.2.3.

With this technique, uc-klee checkpoints execution states just before performing lazy

initialization when an unbound symbolic pointer is dereferenced. It then records all the

branch decisions taken by that execution state following the checkpoint. If the path reads

from the lazily-initialized object out-of-bounds or at a negative offset, uc-klee restores the


1 : /* Set shared signature algorithms for SSL structures */
2 : static int tls1_set_shared_sigalgs(SSL *s)
3 : {
4 :     . . .
5 :     if (c->shared_sigalgs) {
6 :         OPENSSL_free(c->shared_sigalgs);
7 :         c->shared_sigalgs = NULL;
8 :         /* c->shared_sigalgslen NOT set to 0 */
9 :     }
10:     . . .
11:     nmatch = tls12_shared_sigalgs(s, NULL, pref, preflen, allow, allowlen);
12:     if (!nmatch)
13:         return 1;
14:     . . .
15:     nmatch = tls12_shared_sigalgs(s, salgs, pref, preflen, allow, allowlen);
16:     c->shared_sigalgs = salgs;
17:     c->shared_sigalgslen = nmatch;
18:     return 1;
19: }
20: . . .
21: int tls1_process_sigalgs(SSL *s)
22: {
23:     . . .
24:     if (!tls1_set_shared_sigalgs(s))
25:         return 0;
26:
27:     for (i = 0, sigptr = c->shared_sigalgs; /* sigptr set to NULL */
28:          i < c->shared_sigalgslen; i++, sigptr++) {
29:         idx = tls12_get_pkey_idx(sigptr->rsign); /* NULL pointer dereference */
30:         . . .
31:     }
32:     . . .
33: }

Figure 8.5: OpenSSL NULL pointer dereference bug/denial-of-service vulnerability (CVE-2015-0291).

checkpoint, adjusts the allocation size and offset appropriately, and replays the path suffix

between the checkpoint and the out-of-bounds access.

In practice, replaying the path suffix requires paths (and uc-klee) to be fully deter-

ministic. Any non-determinism may cause the replayed path to deviate from the original

path, which is unacceptable. Internally, uc-klee asserts that replayed paths always take

the same sequence of branch decisions. If a branch decision taken previously is infeasible

during replay, uc-klee crashes to ensure that we examine the cause and fix it. As we devel-

oped uc-klee, these crashes were a periodic occurrence, and we discovered and corrected


many sources of non-determinism over the four years since we first introduced backtrack-

ing. We also used the replay mechanism for general debugging of uc-klee. Our tool can

emit the branch decision sequence for an execution path to disk, and we can replay that

path after fixing bugs in uc-klee. The non-determinism we faced for backtracking also

interfered with our ability to replay paths for debugging.
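
The record-and-replay discipline described above can be illustrated with a minimal sketch. It assumes a fixed-size log of branch directions; the names BranchLog, log_record, log_rewind, and log_replay are hypothetical and are not taken from uc-klee. Where the text says uc-klee crashes on divergence, this sketch returns -1 instead so that the behavior can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical branch-decision log; all names here are illustrative. */
#define MAX_DECISIONS 64

typedef struct {
    unsigned char taken[MAX_DECISIONS]; /* 1 = true branch, 0 = false branch */
    size_t count;                       /* decisions recorded since the checkpoint */
    size_t cursor;                      /* next decision to consume during replay */
} BranchLog;

/* Record one decision made after the checkpoint. Returns 0 if the log is full. */
static int log_record(BranchLog *log, int taken)
{
    assert(log != NULL);
    if (log->count >= MAX_DECISIONS)
        return 0;
    log->taken[log->count++] = (unsigned char)(taken != 0);
    return 1;
}

/* Rewind to the checkpoint so the recorded suffix can be replayed. */
static void log_rewind(BranchLog *log)
{
    assert(log != NULL);
    log->cursor = 0;
}

/*
 * During replay, the caller reports which directions are feasible now.
 * Returns the recorded direction (0 or 1), or -1 if replay diverged:
 * either the log is exhausted or the recorded direction is infeasible.
 */
static int log_replay(BranchLog *log, int true_feasible, int false_feasible)
{
    assert(log != NULL);
    if (log->cursor >= log->count)
        return -1;
    int want = log->taken[log->cursor++];
    if ((want && !true_feasible) || (!want && !false_feasible))
        return -1;
    return want;
}
```

A divergence result of -1 corresponds to the situation the text treats as a tool bug: some source of non-determinism made a previously taken branch infeasible.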

The trickiest source of non-determinism we encountered involved the addresses of mem-

ory objects. Code that branches based on the addresses returned by malloc uses undefined

behavior in C, but uc-klee should nonetheless support such code, which occurs frequently

in practice. As mentioned in Section 2.3, uc-klee only considers one concrete address per

memory object, so it may miss paths through this code. Here, we are only concerned with

exploring a consistent set of branches through this code, not in capturing all possible paths

through it.

When checked code calls malloc in the underlying klee tool, klee invokes malloc

within its own address space. It then creates a memory object within the current execution

state and assigns it the concrete address that was returned by malloc. Clearly, if uc-klee

were to take this approach during replay, objects would likely be assigned differing addresses,

an obvious source of non-determinism. Initially, we eliminated this non-determinism by

recording the addresses returned by malloc and replaying these alongside the branch deci-

sions. As an aside, uc-klee also attempts to assign the same concrete addresses to memory

objects in other execution states, which, in practice, significantly improved the hit rate of

our SMT query cache (for queries that included the addresses of objects).
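
A minimal sketch of the address-recording idea follows, assuming the tool can choose the concrete address it assigns to each memory object independently of what the host allocator returns. AddrLog, assign_address, and begin_replay are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALLOCS 32

/* Hypothetical address log; names are illustrative, not uc-klee's. */
typedef struct {
    uintptr_t addr[MAX_ALLOCS]; /* concrete address given to the i-th object */
    size_t count;               /* addresses recorded on the original run */
    size_t cursor;              /* next address to hand out during replay */
    int replaying;              /* 0 = record mode, 1 = replay mode */
} AddrLog;

/*
 * Choose the concrete address for a new memory object.
 * `fresh` is whatever the host allocator returned on this run.
 * In record mode the fresh address is logged and used. In replay mode the
 * logged address is reused, so address-dependent branches resolve identically.
 * Returns 0 if the log is full (record) or exhausted (replay).
 */
static uintptr_t assign_address(AddrLog *log, uintptr_t fresh)
{
    assert(log != NULL);
    if (log->replaying) {
        if (log->cursor >= log->count)
            return 0;
        return log->addr[log->cursor++];
    }
    if (log->count >= MAX_ALLOCS)
        return 0;
    log->addr[log->count++] = fresh;
    return fresh;
}

/* Switch to replay mode and restart from the first recorded address. */
static void begin_replay(AddrLog *log)
{
    assert(log != NULL);
    log->replaying = 1;
    log->cursor = 0;
}
```

In this sketch the replayed path sees the addresses from the original run even when the host allocator returns different ones, which is the property backtracking needs.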

A second source of non-determinism we encountered was the concretization of floating

point values and memory allocation sizes. When checked code performs floating point oper-

ations on symbolic values, uc-klee queries the SMT solver for a single satisfying concrete

value and constrains the symbolic input to that value, concretizing it. If the checked code

later branches based on this value, the branch decision could become infeasible during replay

if the replayed path uses a different concrete value. Until we added support for symbolically-

sized objects (Section 3.2.3), a similar issue arose when checked code passed a symbolic

value as the size argument to malloc. We fixed both of these sources of non-determinism

by recording the concretized values and explicitly replaying them during backtracking.

A third source of non-determinism arose where uc-klee forked more than two execution

states at a time. When checked code encounters a switch statement with a symbolic value,

uc-klee forks an execution state for each feasible case. If the code dereferences a (bound)


symbolic pointer that can point to multiple memory objects, uc-klee forks execution states

for each feasible memory object. In these cases, the paths must follow the same case

or resolve to the same object, respectively, during replay. We fixed this source of non-

determinism by pre-sorting each of the cases/objects before invoking uc-klee’s internal

fork routine.
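
The pre-sorting fix can be sketched as follows, assuming the feasible switch cases are available as an array of integer values. The helper names canonicalize_cases and child_index are hypothetical. The point is only that a canonical order makes "the k-th forked child" mean the same thing on every run.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort over long values; avoids overflow from subtraction. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * Put the feasible switch-case values into ascending order before forking,
 * so that "child k" denotes the same case on every run regardless of the
 * order in which feasibility was discovered.
 */
static void canonicalize_cases(long *cases, size_t n)
{
    assert(cases != NULL || n == 0);
    qsort(cases, n, sizeof cases[0], cmp_long);
}

/* Index of `value` in the canonical order, or -1 if it is not a feasible case. */
static int child_index(const long *cases, size_t n, long value)
{
    for (size_t i = 0; i < n; i++)
        if (cases[i] == value)
            return (int)i;
    return -1;
}
```

Two runs that discover the cases in different orders then agree on which child corresponds to which case, so a recorded child index remains valid during replay.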

A final source of non-determinism arose from klee’s old POSIX modeling code, which

interacted with uc-klee’s execution environment in unpredictable ways (see Section 8.1.1).

The new POSIX model we implemented to avoid any interactions with uc-klee’s execu-

tion environment removed this source of non-determinism. There were additional (minor)

sources of non-determinism we encountered and fixed in uc-klee, but we omit these for

brevity.

8.3 Alternative approaches

While working with uc-klee, we explored three novel approaches to symbolic execution

that we hoped would significantly mitigate the path explosion problem. We dedicated 3-6

months of research effort to each approach, but none of the approaches provided a significant

win. In this section, we discuss each of the approaches and the lessons we learned from

implementing them.

8.3.1 State merging

Once two execution paths have diverged following a symbolic branch (e.g., an if-statement),

a symbolic execution tool explores them independently. In many cases, the two paths

subsequently converge by reaching the same point in the program. For example, if one path

takes the true branch of an if-statement and the other takes the false branch, both paths

will execute the basic block following the if and else clauses (i.e., the post-dominator in the

control flow graph [2]). For many programs, the execution states will have nearly identical

memory contents when the two paths reach this point. Exploring them independently wastes

both time—each instruction must be executed twice—and space—the states use separate

storage (although they share most memory state via a copy-on-write mechanism).

An alternative to exploring the two paths independently is to merge their states once

they have converged at a given program point. Their merged memory contents may be

expressed using symbolic if-then-else (ITE) expressions, and their merged path condition


(constraints) may be expressed as a disjunction of the two path conditions.
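
As a concrete illustration of the merge, consider a branch on a symbolic input x. The sketch below uses ordinary C rather than symbolic expressions: the conditional operator stands in for the ITE expression, and the function names are hypothetical. It shows only that the merged form computes the same result as the two separate paths.

```c
/* Explore both sides of the branch separately (two execution states). */
static int branching(int x)
{
    int y;
    if (x > 0)
        y = x + 1;   /* state A: path condition x > 0,  y = x + 1 */
    else
        y = 0;       /* state B: path condition x <= 0, y = 0     */
    return y * 2;    /* join point: executed once per state */
}

/* One merged state: y holds ite(x > 0, x + 1, 0), condition (x > 0) || (x <= 0). */
static int merged(int x)
{
    int y = (x > 0) ? x + 1 : 0;   /* C's ?: plays the role of the ITE expression */
    return y * 2;                  /* join point: executed once in total */
}
```

In a symbolic executor, the ITE expression remains inside every later query that mentions y, which is the source of the solver cost discussed next.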

Conceptually, state merging has the potential for an exponential win over traditional

symbolic execution without any loss in precision. Unfortunately, it presents a number of

difficult tradeoffs in practice. Merged states are often more expensive to execute and reason

about than simple states. This cost stems from their disjunctive path conditions and ITE

expressions in memory, which often yield expensive SMT queries. In many cases, the cost

of these queries can dwarf any potential benefits of merging the states. Worse, it is often

difficult to tell a priori whether merging two states will provide a benefit that outweighs

the costs. We expended significant engineering effort on minimizing the cost of merged

states by algebraically simplifying the disjunctive path conditions, but this provided

only a marginal benefit (perhaps because the SMT solver already performed its own

simplifications).

A second tradeoff concerns when to merge states, since two states must be at the same

program location in order to be merged. One approach is to stall the first state at the imme-

diate post-dominator and wait for the other state to arrive at that location. Unfortunately,

the other state may perform significant computation before arriving at that location, or it

may never arrive. For example, it may return from the function, crash, or enter an infinite

loop. Stalling the first state prevents it from making progress, and choosing how long to

stall a state before “giving up” is a difficult policy decision. For branches that fork more

than two states (e.g., switch-statements), this tradeoff is even more difficult. Worse still,

if-statements are often nested, and the tool needs to decide which branches to target for

merging (e.g., innermost, outermost, or all branches).

Anecdotally, we found that programs for which uc-klee already exhausted all execution

paths derived the most benefit from state merging. For these programs, many of the policy

decisions and tradeoffs were irrelevant since the tool would complete all paths regardless of

these decisions. For example, stalling states indefinitely when they reached the immediate

post-dominator always succeeded since there could not be any infinite loops or prohibitively

expensive paths through these programs (otherwise they would not have terminated without

state merging).

In concurrent work, Kuznetsov et al. [74] reported promising results from using a

query count estimation heuristic to avoid merging states that were likely to trigger many

subsequent SMT queries at statically-reachable branches. While their overall approach

reported no significant improvement in statement coverage (and in many cases caused a


decrease in coverage), their results roughly confirm our own experience: state merging

provides a clear speedup for programs that can be exhaustively explored even without state

merging, but it has not been shown to improve coverage for programs for which symbolic

execution does not terminate.

We also found that state merging was difficult to implement in conjunction with under-

constrained symbolic execution, in particular our backtracking technique (Section 3.2.3).

When an execution state backtracks to increase an allocation size, it must replay the same

series of branch decisions in the path suffix between the allocation and the current pro-

gram point. A merged state represents a composite of multiple (possibly many) divergent

execution paths. To avoid losing precision, each of these paths must be replayed indepen-

dently and then re-merged. In practice, this was extremely difficult to implement effectively,

and the cost of replaying so many states was often prohibitive. Eventually, we decided to

abandon state merging, although it may be useful in some circumstances.

8.3.2 Path refinement (top-down)

Typically, most paths explored by a symbolic execution tool are uninteresting because they

do not crash, or there is no difference between two versions of a function (when cross-

checking). To reduce the amount of computation spent executing uninteresting paths, we

implemented a novel abstraction technique we called path refinement. At a high level, path

refinement works in a top-down manner by first executing abstract paths. Abstract paths are

execution paths where portions of the paths (e.g., function calls) are skipped. The behavior

of the skipped portions is modeled as a conservative over-approximation (with the usual

caveats) using under-constrained symbolic values.

In the vast majority of cases where an abstract path does not crash or is equivalent in

both versions, there is no further need to reason about the path, and significant computation

has been avoided. In the cases where a path does crash or exposes a difference between

the two versions, the tool refines the path by re-executing it, this time visiting some of the

skipped portions of the path. During refinement, the tool may discover that the path is

infeasible, an artifact of the over-approximation. In this case, the tool silently terminates

the path. The tool iteratively refines each path until either (1) the path is found to be

infeasible, or (2) it is fully-refined (no portion of the path is skipped). In the latter case, the

tool has discovered a complete execution path exposing a crash or difference in the code.
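
The abstract-then-refine loop can be illustrated with a toy example. This is a sketch under strong assumptions: brute-force enumeration over a small input range stands in for the SMT solver, and helper, caller_crashes, and the two *_path_crashes functions are hypothetical names. The skipped callee only ever returns 0 or 1, but the abstract pass treats its result as unconstrained.

```c
/* Callee that the abstract path skips: it only ever returns 0 or 1. */
static int helper(int x)
{
    return x & 1;
}

/* The caller "crashes" (returns 1) only if helper's result equals 2. */
static int caller_crashes(int helper_result)
{
    return helper_result == 2;
}

/*
 * Abstract pass: helper is skipped and its result is treated as unconstrained,
 * so every value in [0, 255] is tried. Returns 1 if some value reaches the crash.
 */
static int abstract_path_crashes(void)
{
    for (int r = 0; r <= 255; r++)
        if (caller_crashes(r))
            return 1;
    return 0;
}

/*
 * Refined pass: helper is actually executed for each input in [0, 255].
 * Returns 1 if a real input reaches the crash, 0 if the path is infeasible.
 */
static int refined_path_crashes(void)
{
    for (int x = 0; x <= 255; x++)
        if (caller_crashes(helper(x)))
            return 1;
    return 0;
}
```

Here the abstract pass reports a crash that the refined pass rules out, which is the kind of infeasible path the tool silently terminates.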

Path refinement is closely related to counterexample-guided abstraction refinement


(CEGAR) [21], an approach used in model checking. CEGAR uses an abstract over-

approximation to prove invariants. If an abstract model satisfies an invariant, then the

concrete (original) model must also satisfy the invariant. In path refinement, this corre-

sponds to an abstract path not crashing (or exposing a difference between two versions),

in which case we do not have to explore it further. If the abstract model in CEGAR fails

to satisfy an invariant, the tool uses an arbitrary counterexample to check whether the

concrete model also fails the invariant for that counterexample. If so, the tool has found a

concrete violation of the invariant (i.e., a bug). If not, CEGAR refines the abstraction and

repeats its analysis. With path refinement, we have no way to check whether a counterex-

ample satisfies some fully-refined execution path, so we always perform the refinement step

if an error occurs along an abstract path.

The effectiveness of path refinement hinged upon an early assumption we made: the

computation eliminated by using abstraction would outweigh the cost of executing the in-

feasible paths (from over-approximation). Unfortunately, this assumption proved incorrect

in practice, and our tool spent most of its execution time on paths that it would later find

to be infeasible. Worse, even when the tool came across real bugs in the code, it often failed

to find a fully-refined path to the bug. Most often, this occurred because a single infeasi-

ble branch would cause our tool to terminate the path during the refinement step, even if

that branch was irrelevant to the bug. It may be possible to use dynamic slicing [1, 70] to

consider only the branches that affect a bug, but we did not implement this.

Aside from its ineffectiveness, implementing path refinement turned out to be much

more complex than we had anticipated. During each refinement step, the abstract path

must be reconciled with the refined path to determine whether the path is feasible. When

the refined path returns from a function call that was skipped during the abstract run, it

needs to (1) check whether the return values and outputs are consistent (satisfy the path

condition from the abstract path), and (2) add path constraints so that the remainder of

the path will follow the same set of branch decisions as the abstract path. In practice, there

were many complex cases to handle involving bound/unbound symbolic pointers, and our

code quickly became unmanageably complex. Eventually, we abandoned path refinement.

8.3.3 Path composition (bottom-up)

After path refinement, we explored a technique at the opposite end of the spectrum: path

composition. With path composition, we begin by executing individual functions (as in


general under-constrained symbolic execution) and looking for crashes in those functions.

For paths that crash, we then examine the calling functions to see if any of those functions

may pass arguments that trigger the crash in the original function. Conceptually, this

process may be repeated until main is reached, and the tool will have discovered a complete

execution path to the crash. In a sense, this technique “works backwards” from the site of

a crash and attempts to discover a path from the program entry point. We expected this

targeted technique to be more effective than traditional symbolic execution, which begins

at main and “hopes” to stumble upon a crash.

Unfortunately, this technique turned out to be ineffective for reasons similar to why path

refinement failed. When the caller invokes the original function (in which the bug occurred),

the tool checks whether the arguments will trigger the crash. It does so by examining the

path constraints from the original crash and seeing whether the arguments satisfy those

constraints. Unfortunately, it expects the arguments to satisfy all of the constraints, even

if only a subset are necessary in order to trigger the bug. As with path refinement, dynamic

slicing [1, 70] may prove useful here, but we did not implement this technique in uc-klee.
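
The limitation described above can be made concrete with a small sketch. It assumes a callee whose crashing path carries two constraints: one from an unrelated branch and one that actually causes the crash. All names are hypothetical, and the sliced check is shown only to illustrate what dynamic slicing might enable; it was not implemented in uc-klee. The sketch further assumes that the other side of the unrelated branch also reaches the crashing operation.

```c
/* Callee inputs, as seen on the crashing path or supplied by a caller. */
typedef struct {
    int a;
    int b;
} Args;

/* Constraint from an unrelated branch the crashing path happened to take. */
static int incidental(Args v) { return v.a > 10; }

/* Constraint that actually causes the crash (e.g., a division by v.b). */
static int essential(Args v) { return v.b == 0; }

/* Strict check described in the text: the caller must satisfy every constraint. */
static int matches_all(Args v) { return incidental(v) && essential(v); }

/* Sliced check: only the constraints the crash depends on. */
static int matches_slice(Args v) { return essential(v); }
```

A caller that passes a = 5 and b = 0 fails the strict check even though it would trigger the crash under the stated assumption, so the strict check misses this complete path.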

8.3.4 Conclusions

This section briefly discussed three approaches to improve symbolic execution’s scalabil-

ity without sacrificing the ability to soundly prove the existence of bugs (i.e., without

introducing false positives). The scalability of symbolic execution is primarily limited by

path explosion, which is usually the result of a tool independently exploring paths that

are mostly identical. To avoid this redundancy, path refinement introduces an initial over-

approximation that groups paths into equivalence classes that are distinguished only when

necessary (i.e., at least one of the constituent paths has a bug). Unfortunately, applying

symbolic execution to abstract paths created additional path explosion, largely involving

infeasible paths. This suggests that unguided path exploration may be a more significant

cause of scalability limitations than the state representation. Our experience with state

merging bolsters this conclusion, since neither we nor Kuznetsov et al. [74] achieved an

improvement in coverage despite an exponential reduction in the number of execution states

that must be explored independently. Consequently, future research may derive a greater

benefit from guiding path exploration toward interesting goals than from reducing the cost

of unguided exploration. Path composition represents one step in this direction, but our

initial approach did not sufficiently guide exploration when examining a function’s callers.

Chapter 9

Related work

This chapter discusses work related to under-constrained symbolic execution, including

traditional symbolic execution, equivalence verification, runtime checking, static analysis,

and model checking.

9.1 Symbolic execution

This dissertation builds on prior work in symbolic execution. Boyer et al. [12] first

introduced symbolic execution to check user-specified assertions in the context of Lisp programs.

Their initial system detected only assertion failures and did not flag out-of-bounds memory

errors, divisions by zero, or other types of bugs.

More recently, EXE [18, 19] instrumented C programs to perform mixed concrete-

symbolic execution using a source-to-source translator and runtime library. EXE tracked

execution paths as separate processes and used the native fork system call whenever code

branched based on a symbolic value. Engler and Dunbar [45] implemented an early ver-

sion of under-constrained symbolic execution, the focus of this dissertation, using EXE.

klee [17] improved upon (baseline) EXE by tracking complete execution states within a

single process as a “symbolic virtual machine.” This approach avoided the need for expen-

sive fork system calls and gave the tool greater control over path exploration (e.g., not

relying on the kernel to schedule paths). klee targets whole programs for symbolic execu-

tion, while our uc-klee tool, which extends klee, aims to execute partial programs (e.g.,

individual functions) and automatically synthesizes their symbolic inputs. An alternative

approach, Java PathFinder [4, 69, 99], performs symbolic execution on Java programs and,

unlike klee, can find concurrency bugs.

Concurrent work has improved upon klee or extended it to interesting use cases.

Boonstoppel et al. [11] and Bugrara and Engler [16] significantly reduced path explosion in klee

by pruning execution states that cannot cover new instructions because their live values are

redundant with respect to previously-explored paths. Kuznetsov et al. [74] used state

merging (see Section 8.3.1) to address path explosion without pruning. Their experimental

results show an exponential win for programs that klee can exhaustively explore even with-

out merging, but they do not demonstrate an improvement in statement coverage for other

programs. Cloud9 [15] parallelizes klee to concurrently explore many execution paths on

compute clusters. klee-mc [102, 103] extends klee to symbolically execute x86 binaries.

KleeNet [106] uses klee to debug wireless sensor networks by injecting non-deterministic

network failure events. klee has also been used to handle end-user bug reports and repro-

duce errors for debugging [65, 123].

Woodpecker [27] extends klee to check system-specific rules, similar to uc-klee’s check-

ers (Chapter 6). Unlike uc-klee, Woodpecker applies to whole programs, so we expect it

would not scale well to large systems. However, Woodpecker aggressively prunes execution

paths that are redundant with respect to individual checkers, a technique that would be

useful for certain checkers in uc-klee, such as the leak checker. Checkers such as the unini-

tialized data checker and user input checker would likely not benefit significantly from this

pruning because a large percentage of instructions (e.g., all loads and stores) are relevant

to these checkers.

An alternative to full symbolic execution is concolic testing (a portmanteau of concrete

and symbolic). DART [52] initially generates random concrete inputs and executes the pro-

gram with those inputs. It collects symbolic integer linear constraints as the path executes,

and then it uses these constraints to redirect branches by negating their predicates in order

to reach additional code. CUTE [109] is similar to DART but supports a class of pointer

constraints and a symbolic memory model. However, neither DART nor CUTE can reason

precisely about reads and writes at symbolic memory offsets, which both EXE and klee

support. SAGE [53] scales concolic testing to larger applications by using native x86 pro-

gram traces and a generational search strategy for redirecting branches. Related to these

approaches, ZESTI [84] extends klee to use existing concrete regression test suites as a

starting point. It then redirects branches in a manner similar to concolic testing in order

to generate tests that cover new instructions.


9.1.1 Patches

Recent work has utilized symbolic execution in the context of checking program changes.

DiSE [98] aims to make symbolic execution incremental by only executing paths affected by

a patch. DiSE performs whole program symbolic execution but prunes unaffected paths us-

ing intra-procedural static analysis to identify control and data dependencies. By contrast,

uc-klee achieves a huge computational win by directly executing the functions affected by

a patch. However, uc-klee uses a cruder control-flow reachability analysis for path

pruning and would likely benefit from DiSE’s pruning technique.

Other recent work has used symbolic execution to generate regression tests exercising

the code changed by a patch [83, 85, 100]. These approaches use existing regression tests

as a starting point and greedily redirect symbolic branch decisions toward a patch, as in

general concolic testing. While these techniques are effective at generating high-coverage

tests in an automated fashion, they provide few correctness guarantees because they only

consider a small subset of the paths from program entry to the patch. By contrast, uc-klee

performs symbolic execution directly on the patched functions (Chapter 5), considering all

possible intermediate program values as inputs (with caveats).

9.2 Equivalence verification

Chapters 4 and 5 relate to prior work in equivalence verification. Differential Symbolic

Execution (DSE) [97] characterizes the differences caused by a patch by abstracting away

unchanged portions of the code using summaries and uninterpreted functions, similar to

regression verification [54]. While this technique is capable of verifying when two versions of

the code must be equivalent up to a given depth bound, it introduces an additional source

of false error reports not present in uc-klee. Our approach soundly executes complete

paths through each patched function, eliminating this source of false positives. Impact

Summaries [6] apply the ideas from DiSE [98] to equivalence verification to soundly prune

paths and ignore constraints unaffected by a patch. This work only considers non-error

paths and cannot detect when a patch introduces a new crash. However, Impact Summaries

are complementary to our approach and may help uc-klee to avoid redundant execution

paths.

Other recent work has used symbolic execution to verify parallel numerical applications.

Given parallel and sequential implementations of an application, one approach [110] uses a


model checker to first explore all paths through the sequential version. For each sequential

path, a model checker then explores all possible event orderings through the parallel version.

The symbolic outputs are compared, and a difference is reported to the user if any event

orderings cause a discrepancy from the sequential output. Another approach [111] uses in-

ferred loop invariants to verify parallel numerical applications running on a fixed number of

processors. klee-fp [24] uses symbolic execution to verify the bounded equivalence of base-

line floating-point applications and their optimized Intel SSE (SIMD) implementations. It

uses a strict definition of floating-point equivalence and pattern matching to avoid reasoning

about floating-point values in the SMT solver, which does not support them. klee-cl [25] extends

klee-fp to check the equivalence of sequential applications and parallelized OpenCL [113]

versions.

SymDiff [75] provides a scalable solution to check the equivalence of two programs up

to a fixed level of loop unrolling, using uninterpreted functions to model function calls and

the heap. As with DSE [97], this approach achieves scalability at the expense of precision,

reporting many false differences. Differential assertion checking (DAC) [76] applies SymDiff

to the problem of detecting whether properties that hold in one version of a program

also hold in the other, a generalization of error/crash equivalence. However, DAC suffers

from the imprecisions of SymDiff, especially in cases where function calls are reordered

by a patch. While an experimental comparison between our system and DAC would be

illuminating, DAC is not publicly available. Abstract semantic differencing [95] achieves

scalability through clever abstraction, while minimizing the information loss by correlating

variables in the two programs. While this technique may be able to prove the equivalence

of programs for which uc-klee cannot exhaust all paths, it suffers from additional false

positives due to over-approximation. Also related to our approach is Semantic Diff [64],

which describes an early tool for detecting program differences by examining changes to

input-output dependencies, rather than comparing the values of the outputs.

Earlier work in equivalence checking focused on combinational circuits in hardware [22,

73, 87]. While an important milestone, hardware verification is simpler than general purpose

software equivalence checking, which includes loops, complex pointer relationships, and

other difficult constructs.

Smith and Dill [112] verified the correctness of real-world block cipher implementations.

Their work exploits the key properties that block ciphers have fixed input sizes and loop

iterations, enabling full loop unrolling. They developed several constraint optimizations


that may apply to general-purpose equivalence checking.

As an alternative to offline analysis, a number of solutions have been proposed to im-

prove the reliability of patched programs at runtime. Multi-version Execution [60] exploits

parallel hardware resources to run two versions of a program simultaneously. If one of the

versions misbehaves, the output from the other is used. Delta Execution [114] proposes a

similar approach but only performs redundant computation when executing the patched

portions of a program. ClearView [96] automatically patches programs at runtime in re-

sponse to undesirable program behavior. Finally, DORA [117] deterministically records

and replays program executions for debugging, even if a program has been patched. While

these approaches help to mitigate the effects of bugs in the field, they only detect observable

crashes and are far from an effective substitute for discovering and removing bugs before

code is released.

9.3 Runtime checking

In contrast to symbolic execution, which aims to explore all possible execution paths through

a program, runtime checking techniques examine a single execution path supplied with a

single set of concrete inputs by the user. Consequently, they cannot discover bugs on alter-

nate code paths. Purify [56] statically rewrites program binaries to check for memory access

errors and memory leaks at runtime. Valgrind [90], PIN [82], and DynamoRIO [14] provide

flexible, low-level interfaces for dynamic binary instrumentation and rewriting. These inter-

faces can be used to implement checkers similar to those in uc-klee (Chapter 6), but for

concrete executions. Valgrind’s included memcheck tool detects memory access errors, uses

of uninitialized data, and memory leaks. CCured [89] uses a combination of static analysis

and runtime checks to detect pointer errors. SWAT [57] uses static binary instrumentation

and random sampling at runtime to find memory leaks. Our user input checker (Section 6.3)

relates to prior work in dynamic taint analysis, including TaintCheck [93], Dytan [23], and

work by Larson and Austin [77]. Micro execution [51] applies a technique related to lazy

initialization to generate inputs on-the-fly for native x86 functions running inside a virtual

machine.


9.4 Static analysis

Static bug-finding techniques analyze program source code and can discover bugs without

actually executing the code. Doing so allows static tools to analyze very large codebases

for which symbolic execution is impractical, and they can check difficult-to-execute code

such as an operating system kernel. On the other hand, static tools cannot reason fully

about program state (e.g., exact values in memory). Consequently, they typically cannot

detect bugs that rely on deep program properties. Under-constrained symbolic execution,

the focus of this dissertation, partially bridges the gap between static analysis tools and

whole-program symbolic execution by directly executing individual functions, even within

an operating system kernel.

Meta-level compilation (MC) [44] checks system rules using manually-written checkers

that are represented as simple state machines and applied to program source in a flow-

sensitive manner. Scalability is achieved by aggressively pruning redundant paths based

on checker states. Subsequent work extended MC [46] to infer checker rules automatically

from beliefs implied by the code being checked. This was the inspiration for our must-fail

and belief-fail heuristics (Section 5.2.2).

Saturn [121] uses Boolean constraints to more precisely model program state and check

user-specified correctness predicates. It uses lazy initialization, as in uc-klee, to generate

symbolic inputs, with distinct Boolean variables representing each input bit. Saturn achieves

scalability using fixed loop unrolling and function summaries to model the effects of function

calls. Saturn has also been used for memory leak detection [120].

Related to portability (Section 5.3.3), Stack [119] uses static analysis to find bugs caused

by unstable code, which may be optimized in unexpected ways by a compiler. uc-klee can

be used to prove that specific llvm optimizations alter behavior or verify (with caveats)

that they do not. However, Stack is more generally applicable to unstable code affected by

any (current or future) compiler.

9.5 Model checking

uc-klee does not require the user to provide a manual specification of correct program

behavior (although the user may suppress false positives using manual annotations). While

this makes uc-klee trivial to use, our tool finds only general errors such as improper

pointer dereferences; it cannot detect when code violates higher-level application semantics

beyond any assertions included in the program source.

An alternative approach is model checking [7, 13, 22, 26, 50, 58, 59, 61, 88, 118], which

typically requires a complete functional specification to be provided by the user. The im-

plementation is then checked for correctness against this specification. Unfortunately, con-

structing specifications is often as time consuming as developing the implementation itself.

In addition, specifications are error-prone, particularly if written by the implementors of

the checked code (who may apply the same mistaken assumptions to both representations).

CBMC [22] verifies hardware Verilog implementations against a specification written in

C up to a given loop bound. Java PathFinder [118] was first used for model checking Java

programs before being extended to support general symbolic execution [4, 69, 99]. Finally,

recent verification work has checked code manipulating complex data structures against

manually constructed specifications [40, 42, 49, 86].

Chapter 10

Conclusions

This dissertation has presented under-constrained symbolic execution, an alternative to

traditional whole-program symbolic execution, designed to enhance scalability, find real

bugs, and provide bounded verification guarantees (with caveats). We have described uc-

klee, our implementation of under-constrained symbolic execution targeting real, com-

plex C/C++ systems code. Using this tool, we have evaluated three use cases of under-

constrained symbolic execution. First, we used it to verify the equivalence of small C library

routines. Second, we used it to check hundreds of patches from BIND and OpenSSL, found a

dozen bugs, and verified (with caveats) that 115 patches did not introduce new crashes. Fi-

nally, we used uc-klee in conjunction with three checkers to analyze over 20,000 functions

from BIND, OpenSSL, and the Linux kernel and discovered 67 new bugs.

10.1 Future work

There are several potential avenues for future work in under-constrained symbolic execution.

The first and most significant is managing or mitigating false positives (spurious errors).

One possible direction would be to develop ranking schemes to identify the errors that

are most likely to represent true bugs. The ranking could be based on a combination of

heuristics, perhaps including the number/proportion of paths on which an error occurs [72],

the distance between the most recent relevant branch and the error, or user feedback about

other reported errors [71]. An alternative strategy might be to infer the likelihood of

different path constraints based on dynamic information (i.e., predicates) collected from

instrumented runs, such as that used in cooperative bug isolation [80].
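As a rough illustration of the ranking idea, the Python sketch below combines two of the heuristics mentioned above into a single score. The report fields, the weights, and the assumption that a higher error proportion and a shorter branch distance both indicate a more likely true bug are hypothetical choices made for this example only; none of it is part of uc-klee.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    """Hypothetical summary of one reported error (not a uc-klee data structure)."""
    name: str
    error_paths: int      # paths on which the error occurred
    total_paths: int      # paths explored through the function
    branch_distance: int  # instructions between the last relevant branch and the error


def score(report, w_proportion=1.0, w_distance=0.5):
    """Combine the two heuristics; the weights are arbitrary placeholders."""
    proportion = report.error_paths / report.total_paths
    closeness = 1.0 / (1 + report.branch_distance)
    return w_proportion * proportion + w_distance * closeness


def rank(reports):
    """Order reports from highest to lowest score."""
    return sorted(reports, key=score, reverse=True)


reports = [
    ErrorReport("null deref in parse()", error_paths=1, total_paths=40, branch_distance=30),
    ErrorReport("overflow in copy()", error_paths=18, total_paths=20, branch_distance=2),
]
ranked_names = [r.name for r in rank(reports)]
```

A real scheme would need to calibrate the weights against errors already classified by users, in the spirit of the feedback-driven approaches cited above.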

uc-klee currently requires manual effort for specifying function pointers (Section 3.4)

and sources of untrusted user input (Section 6.3). Our tool would benefit from future work

to automate these tasks through static analysis or dynamic instrumentation. In addition,

uc-klee would explore a greater variety of execution paths and provide stronger verification

guarantees if it had a method for manually specifying or automatically inferring aliasing

information between symbolic pointers (Section 3.2.1). Future work could also allow uc-

klee to support patches that change data structure definitions (e.g., adding or removing

fields) by mapping the symbolic fields passed to each version of a function rather than

passing byte-for-byte identical inputs. As in general symbolic execution, uc-klee would

benefit from improved search heuristics (Section 8.1.2) to identify the most interesting paths

to explore within the prescribed time limit.

Currently, uc-klee considers all path constraints as equally important. In practice,

bugs typically depend on a subset of the path constraints, and portions of an execution

path (e.g., some if-statements) typically do not control whether a bug is triggered. uc-

klee’s error reports would be easier to understand if the tool used dynamic slicing [1, 70]

to identify the specific constraints and portions of the execution path that led to an error.

This information would also aid users in determining which errors represent false positives

by eliminating irrelevant path constraints. The exact execution path uc-klee follows to

trigger a bug may not be feasible in practice, but the error is a true bug as long as the

relevant path constraints are feasible.

Appendix A

SMT Rewrite Rules

This appendix lists the SMT rewrite rules we introduced into uc-klee that were not part of

the baseline klee system. Table A.1 describes the types of SMT expressions we optimized in

uc-klee. Table A.2 describes the conventions used in this appendix to represent constants

and variables. The remainder of this appendix lists the SMT rewrite rules. For certain rules,

uc-klee only applies the rewrite under certain circumstances, described in the Condition

column.

Add      Addition
Sub      Subtraction
Mul      Multiplication
UDiv     Unsigned division
SDiv     Signed division
URem     Unsigned remainder
SRem     Signed remainder
Not      Logical/bitwise negation
And      Logical/bitwise conjunction
Or       Logical/bitwise disjunction
Xor      Logical/bitwise exclusive disjunction
Shl      Bitwise left shift
LShr     Logical bitwise right shift
AShr     Arithmetic bitwise right shift
Eq       Equality
Ult      Unsigned less-than
Ule      Unsigned less-than-or-equal-to
Slt      Signed less-than
Sle      Signed less-than-or-equal-to
Concat   Concatenation
Extract  Bit extraction
ZExt     Zero extension
SExt     Sign extension
Select   Conditional if-then-else

Table A.1: Expression types rewritten by uc-klee.

a, b, c            Constant
w                  Width (1–64 bits)
off                Constant offset
p                  Boolean predicate
u, v, x, y, t, f   Variable

Table A.2: Constant/variable naming conventions used in this appendix.

UDiv

Input | Result | Condition
UDiv(x, a) | LShr(x, Log2(a)) | if PowerOf2(a)

URem

Input | Result | Condition
URem(0, x) | 0 |
URem(x, a) | And(x, Sub(a, 1)) | if PowerOf2(a)
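The two power-of-two rules above replace division and remainder with a shift and a mask. As a sanity check, the following Python sketch verifies both rules exhaustively over 8-bit unsigned values. The function names and the choice of width are ours, not part of uc-klee.

```python
def is_power_of_2(a):
    """True when a has exactly one bit set (the PowerOf2 condition)."""
    return a != 0 and (a & (a - 1)) == 0


def check_udiv_urem(width=8):
    """Check UDiv(x,a) -> LShr(x,Log2(a)) and URem(x,a) -> And(x,Sub(a,1))."""
    mask = (1 << width) - 1
    for a in range(1, mask + 1):
        if not is_power_of_2(a):
            continue  # both rules only apply when PowerOf2(a) holds
        log2_a = a.bit_length() - 1
        for x in range(mask + 1):
            assert x // a == x >> log2_a
            assert x % a == x & ((a - 1) & mask)
    return True
```

Calling `check_udiv_urem()` raises an `AssertionError` if either rule disagrees with ordinary unsigned arithmetic on some input.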

And

Input | Result | Condition
And(Concat(x, y), c) | Concat(And(x, Extract(c, width(y), width(x))), And(y, Extract(c, 0, width(y)))) |
And(And(x, a), b) | And(x, And(a, b)) |
And(x, Not(x)) | false |
And(And(x, y), And(u, v)) | And(x, And(y, And(u, v))) |
And(Or(x, y), x) | x |
And(Or(x, y), y) | y |
And(x, Or(x, y)) | x |
And(y, Or(x, y)) | y |
And(Eq(a, x), Eq(b, x)) | false | if Not(Eq(a, b))
And(Eq(a, x), Not(Eq(b, x))) | Eq(a, x) | if Not(Eq(a, b))
And(Not(Eq(a, x)), Eq(b, x)) | Eq(b, x) | if Not(Eq(a, b))
And(Ule(x, y), Ule(y, x)) | Eq(x, y) |
And(Ule(y, x), Not(Eq(x, y))) | Ult(y, x) |
And(Ule(x, y), Not(Eq(x, y))) | Ult(x, y) |
And(Sle(x, y), Sle(y, x)) | Eq(x, y) |
And(Not(Eq(x, y)), Ule(x, y)) | Ult(x, y) |
And(Not(Eq(x, y)), Ule(y, x)) | Ult(y, x) |
And(Ult(x, y), Not(Eq(x, y))) | Ult(x, y) |
And(Ult(x, y), Not(Eq(Sub(y, 1), x))) | And(Ult(x, Sub(y, 1)), Not(Eq(y, 0))) |
And(Ult(x, y), Not(Eq(y, Add(x, 1)))) | And(Ult(Add(x, 1), y), Not(Eq(x, -1))) |
And(Ult(x, y), Ult(Sub(y, 2), x)) | And(Eq(x, Sub(y, 1)), And(Not(Eq(y, 0)), Not(Eq(y, 1)))) |
And(Ult(x, y), Ult(y, Add(x, 2))) | And(Eq(y, Add(x, 1)), And(Not(Eq(x, -1)), Not(Eq(x, -2)))) |
And(Not(Eq(x, y)), Ult(x, y)) | Ult(x, y) |
And(Not(Eq(Sub(y, 1), x)), Ult(x, y)) | And(Ult(x, Sub(y, 1)), Not(Eq(y, 0))) |
And(Not(Eq(y, Add(x, 1))), Ult(x, y)) | And(Ult(Add(x, 1), y), Not(Eq(x, -1))) |
And(Ult(Sub(y, 2), x), Ult(x, y)) | And(Eq(x, Sub(y, 1)), And(Not(Eq(y, 0)), Not(Eq(y, 1)))) |
And(Ult(y, Add(x, 2)), Ult(x, y)) | And(Eq(y, Add(x, 1)), And(Not(Eq(x, -1)), Not(Eq(x, -2)))) |
And(Ult(x, y), Ult(y, x)) | false |
And(Slt(x, y), Slt(y, x)) | false |
And(Ule(x, y), Ult(y, Sub(x, 1))) | false | if Not(Eq(x, 0))
And(Ult(y, Sub(x, 1)), Ule(x, y)) | false | if Not(Eq(x, 0))
And(Ule(x, y), Ule(Sub(y, 2), Sub(x, 2))) | And(Eq(x, y), And(Not(Eq(y, 0)), Not(Eq(y, 1)))) | if And(Not(Eq(x, 1)), Not(Eq(x, 0)))
And(Ule(Sub(y, 2), Sub(x, 2)), Ule(x, y)) | And(Eq(x, y), And(Not(Eq(y, 0)), Not(Eq(y, 1)))) | if And(Not(Eq(x, 1)), Not(Eq(x, 0)))
And(Ule(x, a), Ule(b, x)) | false | if Ult(a, b)
And(Ule(b, x), Ule(x, a)) | false | if Ult(a, b)
And(Ule(a, x), Ule(b, x)) | Ule(Max(a, b), x) |
And(Ule(a, x), Ule(x, b)) | Ule(x, Min(a, b)) |
And(Eq(a, x), Ule(b, x)) | false | if Ult(a, b)
And(Eq(a, x), Ule(b, x)) | Eq(a, x) | if Ule(b, a)
And(Ule(b, x), Eq(a, x)) | false | if Ult(a, b)
And(Ule(b, x), Eq(a, x)) | Eq(a, x) | if Ule(b, a)
And(Eq(a, x), Ule(x, b)) | false | if Ult(b, a)
And(Eq(a, x), Ule(x, b)) | Eq(a, x) | if Ule(a, b)
And(Ule(x, b), Eq(a, x)) | false | if Ult(b, a)
And(Ule(x, b), Eq(a, x)) | Eq(a, x) | if Ule(a, b)
And(ZExt(x, w), y) | ZExt(And(x, Extract(y, 0, width(x)))) |
And(x, ZExt(y, w)) | ZExt(And(y, Extract(x, 0, width(x)))) |
And(Concat(x, y), Concat(u, v)) | Concat(And(x, u), And(y, v)) |
And(c, Select(p, t, f)) | Select(p, And(c, t), And(c, f)) |
And(Select(p, t, f), c) | Select(p, And(c, t), And(c, f)) |
Not(And(x, y)) | Or(Not(x), Not(y)) |
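Several of the And rules merge two unsigned comparisons and rely on wrap-around arithmetic (for example, the extra Not(Eq(y, 0)) conjunct guards against Sub(y, 1) underflowing). The Python sketch below checks a handful of these rules exhaustively over 4-bit values. It is an illustrative brute-force check with names of our choosing, not how uc-klee validates its rules.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1  # also the bit pattern of the constant -1


def add(a, b):
    """Addition modulo 2**WIDTH."""
    return (a + b) & MASK


def sub(a, b):
    """Subtraction modulo 2**WIDTH."""
    return (a - b) & MASK


# Each entry maps a rule from the table to (input, result) predicates over x, y.
RULES = {
    "And(Ule(x,y), Ule(y,x)) -> Eq(x,y)":
        (lambda x, y: x <= y and y <= x,
         lambda x, y: x == y),
    "And(Ult(x,y), Not(Eq(Sub(y,1),x))) -> And(Ult(x,Sub(y,1)), Not(Eq(y,0)))":
        (lambda x, y: x < y and sub(y, 1) != x,
         lambda x, y: x < sub(y, 1) and y != 0),
    "And(Ult(x,y), Not(Eq(y,Add(x,1)))) -> And(Ult(Add(x,1),y), Not(Eq(x,-1)))":
        (lambda x, y: x < y and y != add(x, 1),
         lambda x, y: add(x, 1) < y and x != MASK),
    "And(Ult(x,y), Ult(Sub(y,2),x)) -> And(Eq(x,Sub(y,1)), ...)":
        (lambda x, y: x < y and sub(y, 2) < x,
         lambda x, y: x == sub(y, 1) and y != 0 and y != 1),
    "And(Ult(x,y), Ult(y,Add(x,2))) -> And(Eq(y,Add(x,1)), ...)":
        (lambda x, y: x < y and y < add(x, 2),
         lambda x, y: y == add(x, 1) and x != MASK and x != MASK - 1),
}


def failing_rules():
    """Return the names of rules whose two sides disagree for some (x, y)."""
    values = range(MASK + 1)
    return [name for name, (lhs, rhs) in RULES.items()
            if any(lhs(x, y) != rhs(x, y) for x, y in product(values, repeat=2))]
```

An empty result from `failing_rules()` means the input and result columns agree on every pair of 4-bit values for the rules listed.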

Or

Input | Result | Condition
Or(Or(x, a), b) | Or(x, Or(a, b)) |
Or(x, Not(x)) | true |
Or(Or(x, y), Or(u, v)) | Or(x, Or(y, Or(u, v))) |
Or(And(x, y), x) | x |
Or(And(x, y), y) | y |
Or(x, And(x, y)) | x |
Or(y, And(x, y)) | y |
Or(Not(Eq(a, x)), Not(Eq(b, x))) | true | if Not(Eq(a, b))
Or(Not(Eq(a, x)), Eq(b, x)) | Not(Eq(a, x)) | if Not(Eq(a, b))
Or(Eq(a, x), Not(Eq(b, x))) | Not(Eq(b, x)) | if Not(Eq(a, b))
Or(Ult(x, y), Ult(y, x)) | Not(Eq(x, y)) |
Or(Ult(y, x), Eq(x, y)) | Ule(y, x) |
Or(Ult(x, y), Eq(x, y)) | Ule(x, y) |
Or(Slt(x, y), Slt(y, x)) | Not(Eq(x, y)) |
Or(Eq(x, y), Ult(y, x)) | Ule(y, x) |
Or(Eq(x, y), Ult(x, y)) | Ule(x, y) |
Or(Ule(y, x), Eq(y, x)) | Ule(y, x) |
Or(Ule(y, x), Eq(Sub(y, 1), x)) | Or(Ule(Sub(y, 1), x), Eq(y, 0)) |
Or(Ule(y, x), Eq(y, Add(x, 1))) | Or(Ule(y, Add(x, 1)), Eq(x, -1)) |
Or(Ule(y, x), Ule(x, Sub(y, 2))) | Or(Not(Eq(x, Sub(y, 1))), Or(Eq(y, 0), Eq(y, 1))) |
Or(Ule(y, x), Ule(Add(x, 2), y)) | Or(Not(Eq(y, Add(x, 1))), Or(Eq(x, -1), Eq(x, -2))) |
Or(Eq(y, x), Ule(y, x)) | Ule(y, x) |
Or(Eq(Sub(y, 1), x), Ule(y, x)) | Or(Ule(Sub(y, 1), x), Eq(y, 0)) |
Or(Eq(y, Add(x, 1)), Ule(y, x)) | Or(Ule(y, Add(x, 1)), Eq(x, -1)) |
Or(Ule(x, Sub(y, 2)), Ule(y, x)) | Or(Not(Eq(x, Sub(y, 1))), Or(Eq(y, 0), Eq(y, 1))) |
Or(Ule(Add(x, 2), y), Ule(y, x)) | Or(Not(Eq(y, Add(x, 1))), Or(Eq(x, -1), Eq(x, -2))) |
Or(Ule(x, y), Ule(y, x)) | true |
Or(Ult(x, y), Ule(Sub(y, 1), x)) | true | if Not(Eq(y, 0))
Or(Ule(Sub(y, 1), x), Ult(x, y)) | true | if Not(Eq(y, 0))
Or(Ult(y, x), Ult(Sub(x, 2), Sub(y, 2))) | Or(Not(Eq(x, y)), Or(Eq(y, 0), Eq(y, 1))) | if And(Not(Eq(x, 1)), Not(Eq(x, 0)))
Or(Ult(Sub(x, 2), Sub(y, 2)), Ult(y, x)) | Or(Not(Eq(x, y)), Or(Eq(y, 0), Eq(y, 1))) | if And(Not(Eq(x, 1)), Not(Eq(x, 0)))
Or(Ule(a, x), Ule(x, b)) | true | if Ule(a, b)
Or(Ule(x, b), Ule(a, x)) | true | if Ule(a, b)
Or(Ule(a, x), Ule(b, x)) | Ule(Min(a, b), x) |
Or(Ule(x, a), Ule(x, b)) | Ule(x, Max(a, b)) |
Or(Not(Eq(a, x)), Ule(x, b)) | true | if Ule(a, b)
Or(Not(Eq(a, x)), Ule(x, b)) | Not(Eq(a, x)) | if Ult(b, a)
Or(Ule(x, b), Not(Eq(a, x))) | true | if Ule(a, b)
Or(Ule(x, b), Not(Eq(a, x))) | Not(Eq(a, x)) | if Ult(b, a)
Or(Not(Eq(a, x)), Ule(b, x)) | true | if Ule(b, a)
Or(Not(Eq(a, x)), Ule(b, x)) | Not(Eq(a, x)) | if Ult(a, b)
Or(Ule(b, x), Not(Eq(a, x))) | true | if Ule(b, a)
Or(Ule(b, x), Not(Eq(a, x))) | Not(Eq(a, x)) | if Ult(a, b)
Or(Concat(x, y), z) | Concat(Or(x, Extract(z, width(y), width(x))), Or(y, Extract(z, 0, width(y)))) |
Or(z, Concat(x, y)) | Concat(Or(Extract(z, width(y), width(x)), x), Or(Extract(z, 0, width(y)), y)) |
Or(c, Select(p, t, f)) | Select(p, Or(c, t), Or(c, f)) |
Or(Select(p, t, f), c) | Select(p, Or(c, t), Or(c, f)) |
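The Condition column matters: many of the range rules with constants a and b hold only when the stated relation between the constants is satisfied. The following Python sketch, a brute-force check over 4-bit values with a tuple layout of our own choosing, tests a few Or rules under their conditions and makes it easy to see that dropping a condition produces counterexamples.

```python
from itertools import product

WIDTH = 4
VALUES = range(1 << WIDTH)

# (rule text, input, result, condition on the constants a and b)
RULES = [
    ("Or(Ule(a,x), Ule(x,b)) -> true if Ule(a,b)",
     lambda a, b, x: a <= x or x <= b,
     lambda a, b, x: True,
     lambda a, b: a <= b),
    ("Or(Ule(a,x), Ule(b,x)) -> Ule(Min(a,b),x)",
     lambda a, b, x: a <= x or b <= x,
     lambda a, b, x: min(a, b) <= x,
     lambda a, b: True),
    ("Or(Ule(x,a), Ule(x,b)) -> Ule(x,Max(a,b))",
     lambda a, b, x: x <= a or x <= b,
     lambda a, b, x: x <= max(a, b),
     lambda a, b: True),
    ("Or(Not(Eq(a,x)), Ule(x,b)) -> true if Ule(a,b)",
     lambda a, b, x: a != x or x <= b,
     lambda a, b, x: True,
     lambda a, b: a <= b),
    ("Or(Not(Eq(a,x)), Ule(x,b)) -> Not(Eq(a,x)) if Ult(b,a)",
     lambda a, b, x: a != x or x <= b,
     lambda a, b, x: a != x,
     lambda a, b: b < a),
]


def counterexamples(rule):
    """List every (a, b, x) that satisfies the condition but breaks the rule."""
    _, lhs, rhs, cond = rule
    return [(a, b, x) for a, b, x in product(VALUES, repeat=3)
            if cond(a, b) and lhs(a, b, x) != rhs(a, b, x)]
```

For instance, replacing the condition of the first rule with one that always holds makes `counterexamples` return assignments such as a = 5, b = 2, x = 3, where neither disjunct is true.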

Xor

Input | Result | Condition
Xor(Concat(x, y), Concat(u, v)) | Concat(Xor(x, u), Xor(y, v)) | if Eq(width(x), width(u))


Shl

Input | Result | Condition
Shl(x, 0) | x |
Shl(Shl(x, a), b) | Shl(x, Add(a, b)) |
Shl(Concat(x, y), a) | Concat(Extract(x, 0, Sub(width(x), a)), Concat(y, 0)) |
Shl(Concat(x, y), a) | Concat(Extract(y, 0, Sub(Add(width(x), width(y)), a)), 0) |

LShr

Input | Result | Condition
LShr(x, 0) | x |
LShr(LShr(x, a), b) | LShr(x, Add(a, b)) |
LShr(Concat(x, y), a) | Concat(0, Concat(x, Extract(y, a, Sub(width(y), a)))) |
LShr(Concat(x, y), a) | Concat(0, Extract(x, Sub(a, width(y)), Sub(Add(width(x), width(y)), a))) |

AShr

Input | Result | Condition
AShr(x, 0) | x |
AShr(AShr(x, a), b) | AShr(x, Add(a, b)) |

Eq

Input | Result | Condition
Eq(a, Or(x, b)) | Eq(And(a, b), b) |
Eq(0, Or(x, y)) | And(Eq(0, x), Eq(0, y)) |
Eq(a, Concat(b, x)) | And(Eq(Extract(a, width(x), width(b)), b), Eq(Extract(a, 0, width(x)), x)) |
Eq(a, Mul(b, x)) | false | if Not(Eq(0, URem(a, b)))
Eq(0, UDiv(x, y)) | Ult(x, y) |
Eq(c, Select(p, t, f)) | Select(p, Eq(c, t), Eq(c, f)) |
Eq(Select(p, t, f), c) | Select(p, Eq(c, t), Eq(c, f)) |
Eq(Add(a, x), Add(b, x)) | false | if Not(Eq(a, b))
Eq(Add(a, x), Add(b, y)) | Eq(x, Add(Sub(b, a), y)) | if Ule(a, b)
Eq(Add(a, x), Add(b, y)) | Eq(Add(Sub(a, b), x), y) | if Ult(b, a)
Eq(Add(a, x), x) | false | if Not(Eq(0, a))
Eq(x, Add(a, x)) | false | if Not(Eq(0, a))
Eq(Shl(x, a), y) | false | if Not(Eq(0, Extract(y, 0, a)))
Eq(x, Shl(y, a)) | false | if Not(Eq(0, Extract(x, 0, a)))
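The additive Eq rules cancel constants on both sides of an equality, which remains valid under modular (wrap-around) arithmetic. The Python sketch below checks four of the Eq rules exhaustively over 4-bit values. The helper names and the width are illustrative assumptions rather than anything taken from uc-klee.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1
VALUES = range(MASK + 1)


def add(a, b):
    """Addition modulo 2**WIDTH."""
    return (a + b) & MASK


def sub(a, b):
    """Subtraction modulo 2**WIDTH."""
    return (a - b) & MASK


def check_eq_rules():
    for x, y in product(VALUES, repeat=2):
        # Eq(0, Or(x,y)) -> And(Eq(0,x), Eq(0,y))
        assert ((x | y) == 0) == (x == 0 and y == 0)
    for a, x in product(VALUES, repeat=2):
        # Eq(Add(a,x), x) -> false if Not(Eq(0,a))
        if a != 0:
            assert add(a, x) != x
    for a, b, x in product(VALUES, repeat=3):
        # Eq(Add(a,x), Add(b,x)) -> false if Not(Eq(a,b))
        if a != b:
            assert add(a, x) != add(b, x)
    for a, b, x, y in product(VALUES, repeat=4):
        # Eq(Add(a,x), Add(b,y)) -> Eq(x, Add(Sub(b,a), y)) if Ule(a,b)
        if a <= b:
            assert (add(a, x) == add(b, y)) == (x == add(sub(b, a), y))
    return True
```

The check enumerates at most 16^4 assignments, so it completes quickly while still covering every overflow case at this width.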


Ult

Input | Result | Condition
Not(Ult(x, y)) | Ule(y, x) |
Ult(x, x) | false |
Ult(-1, x) | false |
Ult(a, x) | Ule(Add(a, 1), x) | if Not(Eq(-1, a))
Ult(x, 0) | false |
Ult(x, a) | Ule(x, Sub(a, 1)) | if Not(Eq(0, a))

Ule

Input | Result | Condition
Not(Ule(x, y)) | Ult(y, x) |
Ule(1, x) | Not(Eq(0, x)) |
Ule(a, Concat(0, x)) | false | if Not(Eq(0, Extract(a, width(x), Sub(width(a), width(x)))))
Ule(a, Concat(0, x)) | Ule(Extract(a, 0, width(x)), x) | if Eq(0, Extract(a, width(x), Sub(width(a), width(x))))
Ule(x, -1) | true |
Ule(Concat(0, x), a) | true | if Not(Eq(0, Extract(a, width(x), Sub(width(a), width(x)))))
Ule(Concat(0, x), a) | Ule(x, Extract(a, 0, width(x))) | if Eq(0, Extract(a, width(x), Sub(width(a), width(x))))
Ule(URem(x, a), b) | true | if Ule(a, Add(b, 1))
Ule(b, URem(x, a)) | false | if Ule(a, b)

Slt

Input | Result | Condition
Not(Slt(x, y)) | Sle(y, x) |

Sle

Input | Result | Condition
Not(Sle(x, y)) | Slt(y, x) |


Extract

Input | Result | Condition
Extract(Extract(x, i, u), j, w) | Extract(x, Add(i, j), w) |
Extract(Shl(x, c), off, w) | Shl(Extract(x, max(0, Sub(off, c)), w), max(0, Sub(c, off))) |
Extract(LShr(x, c), off, w) | 0 | if Ule(Sub(width(x), c), off)
Extract(AShr(x, c), off, w) | 0 | if Ule(Sub(width(x), c), off)
Extract(LShr(x, c), off, w) | Extract(x, Add(c, off), w) | if Ule(Add(c, Add(off, w)), width(x))
Extract(AShr(x, c), off, w) | Extract(x, Add(c, off), w) | if Ule(Add(c, Add(off, w)), width(x))
Extract(ZExt(x, u), 0, w) | x | if Ule(w, width(x))
Extract(SExt(x, u), 0, w) | x | if Ule(w, width(x))
Extract(ZExt(x, u), 0, w) | ZExt(x, w) | if Ult(w, width(x))
Extract(SExt(x, u), 0, w) | SExt(x, w) | if Ult(w, width(x))
Extract(Add(x, y), off, w) | Add(Extract(x, off, w), Extract(y, off, w)) |
Extract(Sub(x, y), off, w) | Sub(Extract(x, off, w), Extract(y, off, w)) |
Extract(And(x, y), off, w) | And(Extract(x, off, w), Extract(y, off, w)) |
Extract(Or(x, y), off, w) | Or(Extract(x, off, w), Extract(y, off, w)) |
Extract(Xor(x, y), off, w) | Xor(Extract(x, off, w), Extract(y, off, w)) |
Extract(Mul(x, y), off, w) | Mul(Extract(x, off, w), Extract(y, off, w)) |
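To make the Extract notation concrete, the Python sketch below models Extract(x, off, w) as the w bits of x starting at bit offset off, with bit 0 the least significant. That reading is an assumption inferred from how the tables use Extract(c, 0, width(y)) for the low half of a concatenation. The sketch then checks, exhaustively at a small width, the nested-Extract rule, the two LShr rules, and the rules for the bitwise operators And, Or, and Xor. Only well-formed extractions (those that stay within the operand's width) are enumerated.

```python
WIDTH = 6  # a small width keeps exhaustive enumeration fast


def extract(x, off, w):
    """Return w bits of x starting at bit offset off (bit 0 = least significant)."""
    return (x >> off) & ((1 << w) - 1)


def check_extract_rules(width=WIDTH):
    for x in range(1 << width):
        # Extract(Extract(x,i,u), j, w) -> Extract(x, Add(i,j), w)
        for i in range(width):
            for u in range(1, width - i + 1):
                for j in range(u):
                    for w in range(1, u - j + 1):
                        assert extract(extract(x, i, u), j, w) == extract(x, i + j, w)
        for c in range(width + 1):
            shifted = x >> c  # LShr(x, c)
            for off in range(width):
                for w in range(1, width - off + 1):
                    if width - c <= off:
                        # Extract(LShr(x,c), off, w) -> 0
                        assert extract(shifted, off, w) == 0
                    if c + off + w <= width:
                        # Extract(LShr(x,c), off, w) -> Extract(x, Add(c,off), w)
                        assert extract(shifted, off, w) == extract(x, c + off, w)
        for y in range(1 << width):
            for off in range(width):
                for w in range(1, width - off + 1):
                    # Bitwise operators commute with Extract.
                    assert extract(x & y, off, w) == extract(x, off, w) & extract(y, off, w)
                    assert extract(x | y, off, w) == extract(x, off, w) | extract(y, off, w)
                    assert extract(x ^ y, off, w) == extract(x, off, w) ^ extract(y, off, w)
    return True
```

The arithmetic rows (Add, Sub, Mul) are deliberately left out of this check, since carries make them sensitive to the offset and the table does not state the circumstances under which uc-klee applies them.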

Select (ITE)

Input | Result | Condition
Select(p1, Select(p2, t1, f), Select(p2, t2, f)) | Select(p2, Select(p1, t1, t2), f) |
Select(p1, Select(p2, t, f1), Select(p2, t, f2)) | Select(p2, t, Select(p1, f1, f2)) |
Select(p, Select(p, t2, f2), f1) | Select(p, t2, f1) |
Select(p, Select(Not(p), t2, f2), f1) | Select(p, f2, f1) |
Select(p, t1, Select(p, t2, f2)) | Select(p, t1, f2) |
Select(p, t1, Select(Not(p), t2, f2)) | Select(p, t1, t2) |
Select(p1, Select(p2, t, f), f) | Select(And(p1, p2), t, f) |
Select(p1, t, Select(p2, t, f)) | Select(Or(p1, p2), t, f) |
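The Select rules can be checked by plain truth-table enumeration, because each side depends only on the predicates and on which branch value is chosen. The following Python sketch models Select as a conditional expression and compares both sides of every rule in the table above for all predicate values and a small set of placeholder branch values. The names are ours.

```python
from itertools import product


def select(p, t, f):
    """Model of Select(p, t, f): t when p holds, otherwise f."""
    return t if p else f


def check_select_rules():
    bools = (False, True)
    values = range(3)  # placeholder branch values
    for p1, p2 in product(bools, repeat=2):
        for t1, t2, f1, f2 in product(values, repeat=4):
            # Hoist a shared inner predicate outwards (rules 1 and 2).
            assert (select(p1, select(p2, t1, f1), select(p2, t2, f1))
                    == select(p2, select(p1, t1, t2), f1))
            assert (select(p1, select(p2, t1, f1), select(p2, t1, f2))
                    == select(p2, t1, select(p1, f1, f2)))
            # A nested Select on p or Not(p) is already decided (rules 3 to 6).
            assert select(p1, select(p1, t2, f2), f1) == select(p1, t2, f1)
            assert select(p1, select(not p1, t2, f2), f1) == select(p1, f2, f1)
            assert select(p1, t1, select(p1, t2, f2)) == select(p1, t1, f2)
            assert select(p1, t1, select(not p1, t2, f2)) == select(p1, t1, t2)
            # Merge predicates when branches coincide (rules 7 and 8).
            assert select(p1, select(p2, t1, f1), f1) == select(p1 and p2, t1, f1)
            assert select(p1, t1, select(p2, t1, f1)) == select(p1 or p2, t1, f1)
    return True
```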

Bibliography

[1] Hiralal Agrawal and Joseph R. Horgan. Dynamic program slicing. In Proceedings of the

ACM SIGPLAN Conference on Programming Language Design and Implementation

(PLDI), 1990.

[2] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools.

Addison-Wesley, Reading, Massachusetts, 1986.

[3] Alert (TA14-098A): OpenSSL 'Heartbleed' vulnerability (CVE-2014-0160).

https://www.us-cert.gov/ncas/alerts/TA14-098A, April 2014.

[4] Saswat Anand, Corina S. Pasareanu, and Willem Visser. JPF-SE: A symbolic execu-

tion extension to Java PathFinder. In Proceedings of the International Conference on

Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2007.

[5] Andrew W. Appel and Marcelo J.R. Goncalves. Hash-consing garbage collection.

Technical Report CS-TR-415-93, Princeton University, Feb 1993.

[6] John Backes, Suzette Person, Neha Rungta, and Oksana Tkachuk. Regression veri-

fication using impact summaries. In Proceedings of the SPIN Symposium on Model

Checking of Software (SPIN), 2013.

[7] Thomas Ball and Sriram K. Rajamani. Automatically validating temporal safety

properties of interfaces. In Proceedings of the Eighth International SPIN Workshop

on Model Checking of Software (SPIN ’01), 2001.

[8] Clark W. Barrett, David L. Dill, and Jeremy R. Levitt. A decision procedure for

bit-vector arithmetic. In Proceedings of the Design Automation Conference (DAC),

1998.

[9] BIND. https://www.isc.org/downloads/bind/.

[10] Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. Path feasibility analysis

for string-manipulating programs. In Proceedings of the International Conference on

Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2009.

[11] Peter Boonstoppel, Cristian Cadar, and Dawson Engler. RWset: Attacking path

explosion in constraint-based test generation. In Proceedings of the International

Conference on Tools and Algorithms for the Construction and Analysis of Systems

(TACAS), 2008.

[12] Robert S. Boyer, Bernard Elspas, and Karl N. Levitt. Select – a formal system for

testing and debugging programs by symbolic execution. ACM SIGPLAN Notices,

10(6):234–45, June 1975.

[13] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE

International Conference on Automated Software Engineering (ASE 2000).

[14] Derek Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipu-

lation. PhD thesis, M.I.T., 2004.

[15] Stefan Bucur, Vlad Ureche, Cristian Zamfir, and George Candea. Parallel symbolic

execution for automated real-world software testing. In Proceedings of the ACM

SIGOPS/EuroSys European Conference on Computer Systems (EuroSys), 2011.

[16] Suhabe Bugrara and Dawson R. Engler. Redundant state detection for dynamic

symbolic execution. In Proceedings of the USENIX Annual Technical Conference

(ATC), 2013.

[17] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic

generation of high-coverage tests for complex systems programs. In Proceedings of the

USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008.

[18] Cristian Cadar and Dawson Engler. Execution generated test cases: How to make

systems code crash itself. In Proceedings of the 12th International SPIN Workshop

on Model Checking of Software (SPIN ’05), 2005.

[19] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R.

Engler. EXE: automatically generating inputs of death. In Proceedings of the 13th

ACM Conference on Computer and Communications Security (CCS ’06), 2006.

[20] Andy Chou. On detecting heartbleed with static analysis.

http://security.coverity.com/blog/2014/Apr/on-detecting-heartbleed-with-static-analysis.html, 2014.

[21] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith.

Counterexample-guided abstraction refinement. In Proceedings of the International

Conference on Computer Aided Verification (CAV), 2000.

[22] Edmund Clarke and Daniel Kroening. Hardware verification using ANSI-C programs

as a reference. In Proceedings of the Asia and South Pacific Design Automation

Conference (ASP-DAC 2003).

[23] James Clause, Wanchun Li, and Alessandro Orso. Dytan: a generic dynamic taint

analysis framework. In Proceedings of the International Symposium on Software Test-

ing and Analysis (ISSTA), 2007.

[24] Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. Symbolic crosschecking of

floating-point and SIMD code. In Proceedings of the ACM SIGOPS/EuroSys European

Conference on Computer Systems (EuroSys), 2011.

[25] Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. Symbolic testing of

OpenCL code. In Proceedings of the Haifa Verification Conference (HVC), 2011.

[26] James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S.

Pasareanu, Robby, and Hongjun Zheng. Bandera: Extracting finite-state models from

Java source code. In Proceedings of the 22nd International Conference on Software

Engineering (ICSE ’00), 2000.

[27] Heming Cui, Gang Hu, Jingyue Wu, and Junfeng Yang. Verifying systems rules using

rule-directed symbolic execution. In Proceedings of the International Conference on

Architectural Support for Programming Languages and Operating Systems (ASPLOS),

2013.

[28] CVE-2008-1447: DNS Cache Poisoning Issue ("Kaminsky bug"). https://kb.isc.org/article/AA-00924.

[29] CVE-2012-3868. https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-3868, Jul 2012.


2014-0160, April 2014.


2014-0198, May 2014.


2014-3513, Oct 2014.


2015-0206, Jan 2015.


2015-0291, March 2015.


2015-0292, March 2015.

[36] Cygwin. http://cygwin.com.

[37] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth

Zadeck. Efficiently computing static single assignment form and the control dependence

graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490,

October 1991.

[38] Martin Davis. Computability and Unsolvability. Courier Corporation, 1958.

[39] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings

of the International Conference on Tools and Algorithms for the Construction and

Analysis of Systems (TACAS), 2008.

[40] Xianghua Deng, Jooyong Lee, and Robby. Bogor/kiasan: A k-bounded symbolic

execution for checking strong heap properties of open systems. In Proceedings of the

21st IEEE International Conference on Automated Software Engineering, 2006.

[41] T. Dierks and E. Rescorla. RFC 5246: The Transport Layer Security (TLS) Protocol

Version 1.2. Internet Engineering Task Force (IETF), Aug 2008.

[42] Bassem Elkarablieh, Darko Marinov, and Sarfraz Khurshid. Efficient solving of struc-

tural constraints. In Proceedings of the International Symposium on Software Testing

and Analysis, 2008.

[43] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive inter-

procedural points-to analysis in the presence of function pointers. In Proceedings of the

ACM SIGPLAN Conference on Programming Language Design and Implementation

(PLDI), 1994.

[44] Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking system rules

using system-specific, programmer-written compiler extensions. In Proceedings of the

USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2000.

[45] Dawson Engler and Daniel Dunbar. Under-constrained execution: making automatic

code destruction easy and scalable. In Proceedings of the International Symposium

on Software Testing and Analysis (ISSTA), 2007.

[46] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf.

Bugs as deviant behavior: A general approach to inferring errors in systems code.

In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP

’01), 2001.

[47] A. Freier, P. Karlton, and P. Kocher. RFC 6101: The Secure Sockets Layer (SSL)

Protocol Version 3.0. Internet Engineering Task Force (IETF), Aug 2011.

[48] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors and arrays.

In Proceedings of the 19th International Conference on Computer Aided Verification

(CAV 2007).

[49] Milos Gligoric, Tihomir Gvero, Vilas Jagannath, Sarfraz Khurshid, Viktor Kuncak,

and Darko Marinov. Test generation through programming in UDITA. In Proceedings

of the 32nd International Conference on Software Engineering (ICSE ’10), 2010.

[50] P. Godefroid. Model Checking for Programming Languages using VeriSoft. In Proceed-

ings of the 24th Annual Symposium on Principles of Programming Languages (POPL

’97), 1997.

[51] Patrice Godefroid. Micro execution. In Proceedings of the International Conference

on Software Engineering (ICSE), 2014.

[52] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated ran-

dom testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming

Language Design and Implementation (PLDI), 2005.

[53] Patrice Godefroid, Michael Levin, and David Molnar. Automated whitebox fuzz

testing. In Proceedings of the Network and Distributed System Security Symposium

(NDSS), 2008.

[54] Benny Godlin and Ofer Strichman. Regression verification: proving the equivalence of

similar programs. Software Testing, Verification and Reliability, 23(3):241–258, 2013.

[55] Dan Goodin. OpenSSL warns of two high-severity bugs, but no Heartbleed. Ars

Technica, March 2015.

[56] Reed Hastings and Bob Joyce. Purify: Fast detection of memory leaks and access

errors. In Proceedings of the USENIX Winter Technical Conference (USENIX Winter

’92), 1992.

[57] Matthias Hauswirth and Trishul M. Chilimbi. Low-overhead memory leak detection

using adaptive statistical profiling. In Proceedings of the International Conference on

Architectural Support for Programming Languages and Operating Systems (ASPLOS),

2004.

[58] G. J. Holzmann. From code to models. In Proceedings of the Second International

Conference on Applications of Concurrency to System Design (ACSD ’01), 2001.

[59] Gerard J. Holzmann. The model checker SPIN. Software Engineering, 23(5):279–295,

1997.

[60] Petr Hosek and Cristian Cadar. Safe software updates via multi-version execution. In

Proceedings of the International Conference on Software Engineering (ICSE), 2013.

[61] Alan J. Hu, David L. Dill, Andreas J. Drexler, and C. Han Yang. Higher-level spec-

ification and verification with BDDs. In Workshop on Computer-Aided Verification,

1992.

[62] International Telecommunication Union. ITU-T Recommendation X.680: Abstract

Syntax Notation One (ASN.1): Specification of basic notation, Nov 2008.

[63] ISO/IEC 9899:1999 - Programming languages – C, Dec 1999.

[64] Daniel Jackson and David A. Ladd. Semantic diff: A tool for summarizing the ef-

fects of modifications. In Proceedings of the International Conference on Software

Maintenance (ICSM), 1994.

[65] Wei Jin and Alessandro Orso. BugRedux: Reproducing field failures for in-house

debugging. In Proceedings of the International Conference on Software Engineering

(ICSE), 2012.

[66] Richard Jones and Paul Kelly. Backwards-compatible bounds checking for arrays and

pointers in C programs. In Proceedings of the International Workshop on Automatic

Debugging, 1997.

[67] Shane Kerr. BIND 9's security record.

https://www.isc.org/blogs/bind9-security-record, 2013.

[68] Sarfraz Khurshid, Corina S. Pasareanu, and Willem Visser. Generalized symbolic exe-

cution for model checking and testing. In Proceedings of the International Conference

on Tools and Algorithms for the Construction and Analysis of Systems, 2003.

[69] Sarfraz Khurshid, Corina S. Pasareanu, and Willem Visser. Generalized symbolic

execution for model checking and testing. In Proceedings of the Ninth International

Conference on Tools and Algorithms for the Construction and Analysis of Systems,

2003.

[70] B. Korel and J. Laski. Dynamic program slicing. Information Processing Letters,

29(3):155–163, 1988.

[71] Ted Kremenek, Ken Ashcraft, Junfeng Yang, and Dawson Engler. Correlation ex-

ploitation in error ranking. In Proceedings of the 12th ACM SIGSOFT International

Symposium on Foundations of Software Engineering (SIGSOFT ’04/FSE-12), 2004.

[72] Ted Kremenek and Dawson Engler. Z-ranking: Using statistical analysis to counter

the impact of static analysis approximations. In 10th Annual International Static

Analysis Symposium, 2003.

[73] Andreas Kuehlmann and Florian Krohm. Equivalence checking using cuts and heaps.

In Proceedings of the 34th Annual Design Automation Conference (DAC), 1997.

[74] Volodymyr Kuznetsov, Johannes Kinder, Stefan Bucur, and George Candea. Efficient

state merging in symbolic execution. In Proceedings of the ACM SIGPLAN Conference

on Programming Language Design and Implementation (PLDI), 2012.

[75] Shuvendu Lahiri, Chris Hawblitzel, Ming Kawaguchi, and Henrique Rebelo. SymDiff:

A language-agnostic semantic diff tool for imperative programs. In Proceedings of the

International Conference on Computer Aided Verification (CAV), 2012.

[76] Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel.

Differential assertion checking. In Proceedings of the Joint Meeting on Foundations

of Software Engineering (FSE), 2013.

[77] Eric Larson and Todd Austin. High coverage detection of input-related security faults.

In Proceedings of the 12th USENIX Security Symposium (Security 2003), 2003.

[78] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong

Program Analysis & Transformation. In Proceedings of the International Symposium

on Code Generation and Optimization (CGO), 2004.

[79] Guodong Li, Indradeep Ghosh, and Sreeranga P. Rajan. KLOVER: A symbolic exe-

cution and automatic test generation tool for C++ programs. In Proceedings of the

International Conference on Computer Aided Verification (CAV), 2011.

[80] Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. Scal-

able statistical bug isolation. In Proceedings of the ACM SIGPLAN Conference on

Programming Language Design and Implementation (PLDI). ACM, 2005.

[81] Linux kernel. https://www.kernel.org/linux.html.

[82] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff

Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: building

customized program analysis tools with dynamic instrumentation. In Proceedings of

the ACM SIGPLAN Conference on Programming Language Design and Implementa-

tion (PLDI), 2005.

[83] Paul Dan Marinescu and Cristian Cadar. High-coverage symbolic patch testing. In

Proceedings of the International SPIN Symposium on Model Checking Software, 2012.

[84] Paul Dan Marinescu and Cristian Cadar. make test-zesti: A symbolic execution solu-

tion for improving regression testing. In Proceedings of the International Conference

on Software Engineering (ICSE), 2012.

[85] Paul Dan Marinescu and Cristian Cadar. KATCH: High-coverage testing of soft-

ware patches. In Proceedings of the 9th Joint Meeting on Foundations of Software

Engineering (FSE), 2013.

[86] Darko Marinov, Alexandr Andoni, Dumitru Daniliuc, Sarfraz Khurshid, and Martin

Rinard. An evaluation of exhaustive testing for data structures. Technical report,

MIT Computer Science and Artificial Intelligence Laboratory Report MIT-LCS-TR-

921, 2003.

[87] Alan Mishchenko, Satrajit Chatterjee, Robert Brayton, and Niklas Een. Improve-

ments to combinational equivalence checking. In Proceedings of the IEEE/ACM In-

ternational Conference on Computer-Aided Design (ICCAD), 2006.

[88] Madanlal Musuvathi, David Y.W. Park, Andy Chou, Dawson R. Engler, and David L.

Dill. CMC: A pragmatic approach to model checking real code. In Proceedings of the

Fifth USENIX Symposium on Operating Systems Design and Implementation (OSDI),

2002.

[89] George C. Necula, Scott McPeak, and Westley Weimer. CCured: type-safe retrofitting

of legacy code. In Proceedings of the Symposium on Principles of Programming Lan-

guages (POPL), 2002.

[90] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dy-

namic binary instrumentation. In Proceedings of the ACM SIGPLAN 2007 Conference

on Programming Language Design and Implementation (PLDI), 2007.

[91] Newlib. https://sourceware.org/newlib/.

[92] Michael Newman. The economic impacts of inadequate infrastructure for software

testing. Technical report, National Institute of Standards and Technology, 2002.

[93] James Newsome and Dawn Song. Dynamic taint analysis for automatic detection,

analysis, and signature generation of exploits on commodity software. In Proceedings

of the Network and Distributed Systems Security Symposium (NDSS), 2005.

[94] OpenSSL. https://www.openssl.org/source.

[95] Nimrod Partush and Eran Yahav. Abstract semantic differencing for numerical pro-

grams. In Proceedings of the International Static Analysis Symposium (SAS), 2013.

[96] Jeff H. Perkins, Sunghun Kim, Sam Larsen, Saman Amarasinghe, Jonathan Bachrach,

Michael Carbin, Carlos Pacheco, Frank Sherwood, Stelios Sidiroglou, Greg Sullivan,

Weng-Fai Wong, Yoav Zibin, Michael D. Ernst, and Martin Rinard. Automatically

patching errors in deployed software. In Proceedings of the ACM SIGOPS Symposium

on Operating Systems Principles (SOSP), 2009.

[97] Suzette Person, Matthew B. Dwyer, Sebastian Elbaum, and Corina S. Păsăreanu.

Differential symbolic execution. In Proceedings of the ACM SIGSOFT International

Symposium on Foundations of Software Engineering (FSE), 2008.

[98] Suzette Person, Guowei Yang, Neha Rungta, and Sarfraz Khurshid. Directed incre-

mental symbolic execution. In Proceedings of the ACM SIGPLAN Conference on

Programming Language Design and Implementation (PLDI), 2011.

[99] Corina S. Păsăreanu and Neha Rungta. Symbolic PathFinder: Symbolic execution

of Java bytecode. In Proceedings of the IEEE/ACM International Conference on

Automated Software Engineering (ASE), 2010.

[100] Dawei Qi, Abhik Roychoudhury, and Zhenkai Liang. Test generation to expose changes

in evolving programs. In Proceedings of the IEEE/ACM International Conference on

Automated Software Engineering (ASE), 2010.

[101] David A. Ramos and Dawson R. Engler. Practical, low-effort equivalence verification

of real code. In Proceedings of the International Conference on Computer Aided

Verification (CAV), 2011.

[102] Anthony Romano and Dawson Engler. Expression reduction from programs in a

symbolic binary executor. In Proceedings of the International SPIN Symposium on

Model Checking of Software, 2013.

[103] Anthony Romano and Dawson R. Engler. symMMU: symbolically executed runtime

libraries for symbolic memory access. In Proceedings of the International Conference

on Automated Software Engineering (ASE), 2014.

[104] Philipp Rümmer and Thomas Wahl. An SMT-LIB theory of binary floating-point

arithmetic. In Proceedings of the International Workshop on Satisfiability Modulo

Theories (SMT), 2010.

[105] O. Ruwase and M. S. Lam. A practical dynamic buffer overflow detector. In Proceed-

ings of the 11th Annual Network and Distributed System Security Symposium (NDSS),

2004.

[106] Raimondas Sasnauskas, Olaf Landsiedel, Muhammad Hamad Alizai, Carsten Weise,

Stefan Kowalewski, and Klaus Wehrle. KleeNet: Discovering insidious interaction

bugs in wireless sensor networks before deployment. In Proceedings of the International

Conference on Information Processing in Sensor Networks (IPSN), 2010.

[107] Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas E.

Anderson. Eraser: A dynamic data race detector for multithreaded programming.

ACM Transactions on Computer Systems, 15(4):391–411, Nov 1997.

[108] Secunia vulnerability review 2014. Technical report, Secunia, February 2014.

[109] Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A concolic unit testing en-

gine for C. In Proceedings of the 13th ACM SIGSOFT International Symposium on

Foundations of Software Engineering (ESEC/FSE-13), 2005.

[110] Stephen F. Siegel, Anastasia Mironova, and George S. Avrunin. Combining symbolic

execution with model checking to verify parallel numerical programs. ACM Transac-

tions on Software Engineering and Methodology (TOSEM), 17(2), May 2008.

[111] Stephen F. Siegel and Timothy K. Zirkel. Loop invariant symbolic execution for

parallel programs. In Proceedings of the 13th International Conference on Verification,

Model Checking, and Abstract Interpretation (VMCAI), 2012.

[112] Eric Whitman Smith and David L. Dill. Automatic formal verification of block cipher

implementations. In Proceedings of the 2008 International Conference on Formal

Methods in Computer-Aided Design (FMCAD ’08), 2008.

[113] John E. Stone, David Gohara, and Guochun Shi. OpenCL: A parallel programming

standard for heterogeneous computing systems. Computing in Science and Engineer-

ing, 12(3):66–73, May 2010.

[114] Joseph Tucek, Weiwei Xiong, and Yuanyuan Zhou. Efficient online validation with

delta execution. In Proceedings of the International Conference on Architectural Sup-

port for Programming Languages and Operating Systems (ASPLOS), 2009.

[115] uClibc. http://uclibc.org.

[116] Ted Unangst. Commit e76e308f (tedu): on today’s episode of things you didn’t want to learn.

http://anoncvs.estpak.ee/cgi-bin/cgit/openbsd-src/commit/lib/libssl?id=e76e308f, Apr 2014.

[117] Nicolas Viennot, Siddharth Nair, and Jason Nieh. Transparent mutable replay for

multicore debugging and patch validation. In Proceedings of the International Con-

ference on Architectural Support for Programming Languages and Operating Systems

(ASPLOS), 2013.

[118] Willem Visser, Klaus Havelund, Guillaume Brat, SeungJoon Park, and Flavio Lerda.

Model checking programs. Automated Software Engineering, 10(2):203–232, 2003.

[119] Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. Towards

optimization-safe systems: Analyzing the impact of undefined behavior. In

Proceedings of the Symposium on Operating Systems Principles (SOSP), 2013.

[120] Yichen Xie and Alex Aiken. Context- and path-sensitive memory leak detection. In

Proceedings of the International Symposium on Foundations of Software Engineering

(FSE), 2005.

[121] Yichen Xie and Alex Aiken. Scalable error detection using boolean satisfiability.

In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of

Programming Languages (POPL), 2005.

[122] Bennet Yee, David Sehr, Gregory Dardyk, J Bradley Chen, Robert Muth, Tavis Or-

mandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar. Native client: A sandbox

for portable, untrusted x86 native code. In Proceedings of the IEEE Symposium on

Security and Privacy, 2009.

[123] Cristian Zamfir and George Candea. Execution synthesis: A technique for automated

software debugging. In Proceedings of the ACM SIGOPS/EuroSys European Confer-

ence on Computer Systems (EuroSys), 2010.
