
Introduction to Satisfiability Modulo Theories (SMT)

Clark Barrett, NYU

Sanjit A. Seshia, UC Berkeley

ICCAD Tutorial November 2, 2009

C. Barrett & S. A. Seshia ICCAD 2009 Tutorial 2

Boolean Satisfiability (SAT)

[Figure: a Boolean formula φ drawn as a circuit of ∨, ∧, and ¬ gates over the inputs p1, p2, …, pn]

Is there an assignment to the variables p1, p2, …, pn such that φ evaluates to 1?


Satisfiability Modulo Theories

[Figure: the same circuit of ∨, ∧, and ¬ gates, now with theory atoms such as the following at its inputs]

x + 2z ≥ 1

x % 26 = v

w & 0xFFFF = x

x = y

Is there an assignment to the variables x, y, z, w such that φ evaluates to 1?


Satisfiability Modulo Theories

• Given a formula in first-order logic, with associated background theories, is the formula satisfiable?
  – Yes: return a satisfying solution
  – No: [generate a proof of unsatisfiability]


Applications of SMT

• Hardware verification at higher levels of abstraction (RTL and above)
• Verification of analog/mixed-signal circuits
• Verification of hybrid systems
• Software model checking
• Software testing
• Security: finding vulnerabilities, verifying electronic voting machines, …
• Program synthesis
• …


References

Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability Modulo Theories. Chapter 8 in the Handbook of Satisfiability, Armin Biere, Hans van Maaren, and Toby Walsh, editors, IOS Press, 2009. (Available from our webpages.)

SMTLIB: A repository for SMT formulas (common format) and tools

SMTCOMP: An annual competition of SMT solvers


Roadmap for this Tutorial

• Background and Notation

• Survey of Theories

• Theory Solvers

• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT

• Conclusion



Roadmap (next: Background and Notation)

• Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion


First-Order Logic

• A formal notation for mathematics, with expressions involving
  – Propositional symbols
  – Predicates
  – Functions and constant symbols
  – Quantifiers

• In contrast, propositional (Boolean) logic only involves propositional symbols and operators


First-Order Logic: Syntax

• As with propositional logic, expressions in first-order logic are made up of sequences of symbols.

• Symbols are divided into logical symbols and non-logical symbols or parameters.

• Example:

(x = y) ∧ (y = z) ∧ (f(z) ≥ f(x)+1)


First-Order Logic: Syntax

• Logical Symbols
  – Propositional connectives: ∨, ∧, ¬, →, ↔
  – Variables: v1, v2, …
  – Quantifiers: ∀, ∃

• Non-logical symbols/Parameters
  – Equality: =
  – Functions: +, -, %, bit-wise &, f(), concat, …
  – Predicates: ≤, is_substring, …
  – Constant symbols: 0, 1.0, null, …


Quantifier-free Subset

• We will largely restrict ourselves to formulas without quantifiers (∀, ∃)

• This is called the quantifier-free subset/fragment of first-order logic with the relevant theory


Logical Theory

• Defines a set of parameters (non-logical symbols) and their meanings

• This definition is called a signature.

• Example of a signature:

Theory of linear arithmetic over integers

Signature is (0, 1, +, -, ≤) interpreted over Z



Roadmap (next: Survey of Theories)

• Background and Notation
• Survey of Theories
• Theory Solvers
• Two Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion


Some Useful Theories

• Equality (with uninterpreted functions)

• Linear arithmetic (over Q or Z)

• Difference logic (over Q or Z)

• Finite-precision bit-vectors – integer or floating-point

• Arrays / memories

• Misc.: Non-linear arithmetic, strings, inductive datatypes (e.g. lists), sets, …


Theory of Equality and Uninterpreted Functions (EUF)

• Also called the “free theory”
  – Because function symbols can take any meaning
  – Only property required is congruence: that these symbols map identical arguments to identical values, i.e., x = y ⇒ f(x) = f(y)

• SMTLIB name: QF_UF


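Data and Function Abstraction with EUF

• Data abstraction: bit-vectors (x0, x1, x2, …, xn-1) are mapped to a single term x in an abstract domain (e.g. Z)

• Function abstraction: functional units (e.g. an ALU) are replaced by uninterpreted functions f, constrained only by a = x ∧ b = y ⇒ f(a,b) = f(x,y)

• Common operations
  – If-then-else: ITE(p, x, y), a multiplexer that outputs x when p = 1 and y when p = 0
  – Test for equality: x = y
  – …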


Hardware Abstraction with EUF

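• For any Block that Transforms or Evaluates Data:
  – Replace with generic, unspecified function
  – Also view instruction memory as function

[Figure: pipelined processor datapath (PC, +4, Instr Mem, IF/ID, Reg. File, ID/EX, ALU, EX/WB, Control, equality comparators; signals Rd, Ra, Rb, Imm, Op, Adat) with data-transforming blocks replaced by uninterpreted functions F1, F2, F3]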


Example QF_UF (EUF) Formula

(x = y) ∧ (y = z) ∧ (f(x) ≠ f(z))

Transitivity:

(x = y) ∧ (y = z) ⇒ (x = z)

Congruence:

(x = z) ⇒ (f(x) = f(z))

Together these contradict f(x) ≠ f(z), so the formula is unsatisfiable.
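The transitivity and congruence steps on this slide can be mechanised with a small congruence-closure check. The sketch below is only an illustration for the formula (x = y) ∧ (y = z) ∧ (f(x) ≠ f(z)), not how any particular solver implements it. The function name `euf_unsat`, the string encoding of terms, and the restriction to unary function applications are all assumptions made for this example.

```python
def euf_unsat(equalities, disequalities, apps):
    """Decide a conjunction of EUF literals over unary function applications.

    equalities / disequalities: lists of (term, term) pairs; terms are strings.
    apps: dict mapping an application term such as "f(x)" to (function, argument).
    Returns True if the conjunction is unsatisfiable.
    """
    parent = {}

    def find(t):
        # Union-find lookup with path halving.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    # Reflexivity, symmetry, and transitivity come for free from union-find.
    for a, b in equalities:
        union(a, b)

    # Congruence: merge f(s) and f(t) whenever s and t are already equal.
    changed = True
    while changed:
        changed = False
        terms = list(apps)
        for i, s in enumerate(terms):
            for t in terms[i + 1:]:
                same_fn = apps[s][0] == apps[t][0]
                same_arg = find(apps[s][1]) == find(apps[t][1])
                if same_fn and same_arg and find(s) != find(t):
                    union(s, t)
                    changed = True

    # Unsatisfiable if some disequality relates two terms forced to be equal.
    return any(find(a) == find(b) for a, b in disequalities)


apps = {"f(x)": ("f", "x"), "f(z)": ("f", "z")}
print(euf_unsat([("x", "y"), ("y", "z")], [("f(x)", "f(z)")], apps))  # True
```

Dropping the equality y = z breaks the chain from x to z, and the same call then reports the literals as satisfiable.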


Equivalence Checking of Program Fragments

int fun1(int y) {
    int x, z;
    z = y;
    y = x;
    x = z;
    return x*x;
}

int fun2(int y) {
    return y*y;
}

SMT formula φ
Satisfiable iff programs non-equivalent

( z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = x1*x1 ) ∧ ( ret2 = y*y ) ∧ ( ret1 ≠ ret2 )

What if we use SAT to check equivalence?




Using SAT to check equivalence (w/ Minisat):
• 32 bits for y: did not finish in over 5 hours
• 16 bits for y: 37 sec.
• 8 bits for y: 0.5 sec.




SMT formula φ′ (multiplication abstracted by an uninterpreted function sq)

( z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = sq(x1) ) ∧ ( ret2 = sq(y) ) ∧ ( ret1 ≠ ret2 )

Using EUF solver: 0.01 sec


Equivalence Checking of Program Fragments

int fun1(int y) {
    int x;
    x = x ^ y;
    y = x ^ y;
    x = x ^ y;
    return x*x;
}


Does EUF still work?

No! Must reason about bit-wise XOR.

Need a solver for bit-vector arithmetic.

Solvable in less than a sec. with a current bit-vector solver.
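To see why the XOR-swap version is still equivalent to y*y, one can check it exhaustively at a small bit width. The sketch below is a brute-force illustration only; a bit-vector solver reasons symbolically rather than enumerating. The 8-bit width, the wrap-around multiplication, and the extra parameter `x0` (standing for the arbitrary initial value of the uninitialised x) are assumptions of this example.

```python
WIDTH = 8                      # assumed word size; the slide uses C ints
MASK = (1 << WIDTH) - 1


def fun1(y, x0):
    """XOR-swap version; x0 models the arbitrary initial value of x."""
    x = x0
    x = x ^ y
    y = x ^ y                  # y now holds the old x
    x = x ^ y                  # x now holds the old y
    return (x * x) & MASK      # multiplication wraps at WIDTH bits


def fun2(y):
    return (y * y) & MASK


def find_counterexample():
    """Return (y, x0) on which the two functions differ, or None."""
    for y in range(1 << WIDTH):
        for x0 in range(1 << WIDTH):
            if fun1(y, x0) != fun2(y):
                return (y, x0)
    return None


print(find_counterexample())   # None: equivalent on all 8-bit inputs
```

The enumeration grows as 2^(2·WIDTH), which is the same blow-up that makes the plain SAT encoding slow at 32 bits.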


Finite-Precision Bit-Vector Arithmetic (QF_BV)

– Fixed width data words
  • Can model int, short, long, etc.
– Arithmetic operations
  • E.g., add/subtract/multiply/divide & comparisons
  • Two’s complement and unsigned operations
– Bit-wise logical operations
  • E.g., and/or/xor, shift/extract and equality
– Boolean connectives


Linear Arithmetic (QF_LRA, QF_LIA)

• Boolean combination of linear constraints of the form

(a1 x1 + a2 x2 + … + an xn ∼ b)

• xi’s could be in Q or Z, ∼ ∈ {≥, >, ≤, <, =}

• Many applications, including:
  – Verification of analog circuits
  – Software verification, e.g., of array bounds


Difference Logic (QF_IDL, QF_RDL)

• Boolean combination of linear constraints of the form

xi - xj ∼ cij or xi ∼ ci

∼ ∈ {≥, >, ≤, <, =}, xi’s in Q or Z

• Applications:
  – Software verification (most linear constraints are of this form)
  – Processor datapath verification
  – Job shop scheduling / real-time systems
  – Timing verification for circuits


Arrays/Memories

• SMT solvers can also be very effective in modeling data structures in software and hardware
  – Arrays in programs
  – Memories in hardware designs: e.g. instruction and data memories, CAMs, etc.


Theory of Arrays (QF_AX): Select and Store

• Two interpreted functions: select and store
  – select(A,i): read from A at index i
  – store(A,i,d): write d to A at index i

• Two main axioms:
  – select(store(A,i,d), i) = d
  – select(store(A,i,d), j) = select(A,j) for i ≠ j

• One other axiom (extensionality):
  – (∀ i. select(A,i) = select(B,i)) ⇒ A = B
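The two main axioms (reading back a written value, and reads at other indices being unaffected) can be illustrated with a tiny functional model of arrays. This is a sketch for intuition, not a decision procedure. Representing an array as a Python dict, and the `default` value returned for unwritten cells, are assumptions of this example.

```python
def store(a, i, d):
    """Return a new array equal to a, except that index i holds d."""
    b = dict(a)        # copy: store is functional, a itself is unchanged
    b[i] = d
    return b


def select(a, i, default=0):
    """Read index i; unwritten cells read as an assumed default value."""
    return a.get(i, default)


A = {0: 7, 1: 9}
B = store(A, 1, 42)

print(select(B, 1))                   # 42: select(store(A,i,d), i) = d
print(select(B, 0) == select(A, 0))   # True: other indices are unaffected
print(select(A, 1))                   # 9: the original array is unchanged
```

Because store returns a fresh array, terms such as x1 = store(x,0,y) and x2 = store(x1,1,…) name successive versions of the same program array.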


Equivalence Checking of Program Fragments

int fun1(int y) {
    int x[2];
    x[0] = y;
    y = x[1];
    x[1] = x[0];
    return x[1]*x[1];
}

SMT formula φ″

[ x1 = store(x,0,y) ∧ y1 = select(x1,1) ∧ x2 = store(x1,1,select(x1,0)) ∧ ret1 = sq(select(x2,1)) ] ∧ ( ret2 = sq(y) ) ∧ ( ret1 ≠ ret2 )



Roadmap (next: Theory Solvers)

• Background and Notation
• Survey of Theories
• Theory Solvers
• Two Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion


Over to Clark…



Roadmap (next: Eager Encoding to SAT)

• Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion


Eager Approach to SMT

Key Ideas:
• Small-domain encoding
  – Constrain model search
• Rewrite rules
• Abstraction-based methods (eager + lazy)

Example Solvers: UCLID, STP, Spear, Boolector, Beaver, …

[Figure: EAGER ENCODING. Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable]

SAT Solver involved in Theory Reasoning


Theories

• Eager Encoding Methods have been demonstrated for the following Theories:
  – Equality & Uninterpreted Functions
  – Integer Linear Arithmetic
  – Restricted Lambda expressions
    • Arrays, memories, etc.
  – Finite-precision Bit-Vector Arithmetic
  – Strings


UCLID Operation

• Operation
  – Series of transformations leading to Boolean formula
  – Each step is validity (satisfiability) preserving
  – Each step performs optimizations

[Figure: Input Formula → Lambda Expansion for Arrays → λ-free Formula → Function & Predicate Elimination → Linear/Bitvector Arithmetic Formula → Encoding Arithmetic → Boolean Formula → Boolean Satisfiability]

http://uclid.eecs.berkeley.edu


Rewrites: Eliminating Function Applications

– Two applications of an uninterpreted function f in a formula: f(x1) and f(x2)

Ackermann’s Encoding
  f(x1) → vf1
  f(x2) → vf2
  with the added constraint x1 = x2 ⇒ vf1 = vf2

Bryant, German, Velev’s Encoding
  f(x1) → vf1
  f(x2) → ITE(x1 = x2, vf1, vf2)
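Ackermann's encoding generalises to any number of applications: each f(xi) becomes a fresh variable vfi, and every pair of applications contributes one consistency constraint xi = xj ⇒ vfi = vfj. The sketch below generates that substitution and those constraints as strings. The function name `ackermannize` and the textual output format are assumptions for illustration, not UCLID's interface.

```python
from itertools import combinations


def ackermannize(args, fname="f"):
    """Replace applications fname(a), for each a in args, by fresh variables.

    Returns (substitution, constraints): substitution maps "f(a)" to its
    fresh variable, and constraints lists the functional-consistency
    implications, one per pair of applications.
    """
    indexed = list(enumerate(args, start=1))
    subst = {f"{fname}({a})": f"v{fname}{i}" for i, a in indexed}
    constraints = [
        f"({a} = {b}) => (v{fname}{i} = v{fname}{j})"
        for (i, a), (j, b) in combinations(indexed, 2)
    ]
    return subst, constraints


subst, constraints = ackermannize(["x1", "x2"])
print(subst)         # {'f(x1)': 'vf1', 'f(x2)': 'vf2'}
print(constraints)   # ['(x1 = x2) => (vf1 = vf2)']
```

With n applications the number of added constraints is n(n-1)/2, which is one reason the choice of function-elimination encoding matters in practice.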


Small-Domain Encoding

• Consider an SMT formula φ(x1, x2, …, xn) where xi ∈ Di

• Small-domain encoding/Finite instantiation: Derive finite set Si ⊂ Di s.t. |Si| ≪ |Di|
  – In some cases, Si is finite where Di is infinite

• Encode each xi to take values only in Si
  – Could be done by encoding to SAT

• Example: Integer Linear Arithmetic (QF_LIA)


Solving QF_LIA is NP-complete

• In NP:
  – If a satisfying solution exists, then one exists within a bound d
    • log d is polynomial in input size
  – Expression for d [Papadimitriou, ’82]:

    $d = (n+m) \cdot (b_{max}+1) \cdot (m \cdot a_{max})^{2m+3}$

  – Input size:
    • m: # constraints
    • n: # variables
    • bmax: largest constant (absolute value)
    • amax: largest coefficient (absolute value)


Small-domain encoding / Finite Instantiation: Naïve approach

• Steps
  – Calculate the solution bound d
  – Encode each integer variable with ⌈log d⌉ bits & translate to Boolean formula
  – Run SAT solver

• Problem: For QF_LIA, d is on the order of $m^m$, i.e. on the order of $m \log m$ bits per variable

• Solution: Exploit special-cases and domain-specific structure


Special Case 1: Equality Logic

• Linear constraints are equalities xi = xj
• Result: d = n

x1 ≠ x2 ∧ x2 ≠ x3 ∧ x1 ≠ x3
3-valued domain is needed: {1, 2, 3}

x1 ? x2 ∧ x2 ? x3 ∧ x1 ? x3 (relation symbols lost in the source)
Can find solution with domain {1, 2}

[Pnueli et al., Information and Computation, 2002]
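The effect of the domain size can be checked by brute force. The sketch below enumerates all assignments over a given finite domain and is meant only to illustrate finite instantiation, not the range-allocation technique of the cited paper. The function names are hypothetical, and the formula used (three pairwise disequalities) is our reading of the example that needs a 3-valued domain.

```python
from itertools import product


def satisfiable_in_domain(n_vars, domain, formula):
    """Try every assignment of n_vars variables over the given domain.

    formula takes a tuple of values and returns a bool.
    Returns a satisfying tuple, or None if none exists in this domain.
    """
    for values in product(domain, repeat=n_vars):
        if formula(values):
            return values
    return None


def all_distinct(v):
    # x1 != x2 and x2 != x3 and x1 != x3
    return v[0] != v[1] and v[1] != v[2] and v[0] != v[2]


print(satisfiable_in_domain(3, [1, 2], all_distinct))      # None
print(satisfiable_in_domain(3, [1, 2, 3], all_distinct))   # (1, 2, 3)
```

With n = 3 variables the bound d = n gives the domain {1, 2, 3}, which is exactly what this formula requires; formulas containing an equality can get by with fewer values.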


Special Case 2: Difference Logic

• Boolean combination of difference-bound constraints

  – xi ≥ xj + b, ±xi ≥ b

• Result: $d = n \cdot (b_{max} + 1)$ [Bryant, Lahiri, Seshia, CAV’02]

• Proof sketch: satisfying solution corresponds to shortest path in constraint graph
  – Longest such path has length ≤ $n \cdot (b_{max} + 1)$

• Tighter formula-specific bounds possible
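The constraint graph mentioned in the proof sketch can be made concrete for a plain conjunction of difference constraints. The sketch below uses Bellman-Ford relaxation: shortest-path distances give a satisfying assignment, and a negative cycle means the conjunction is unsatisfiable. It handles conjunctions only, not arbitrary Boolean combinations, and it is not the small-domain encoding itself. The function name and the (i, j, c) encoding of x_i - x_j ≤ c are assumptions of this example.

```python
def solve_difference_constraints(n, constraints):
    """Solve a conjunction of constraints x_i - x_j <= c over the integers.

    n: number of variables, indexed 0..n-1.
    constraints: list of (i, j, c) meaning x_i - x_j <= c.
    Returns a list of values for x_0..x_{n-1}, or None if unsatisfiable.
    """
    # Each constraint is an edge j -> i with weight c. A virtual source
    # reaching every node at cost 0 is modelled by starting all distances at 0.
    dist = [0] * n
    for _ in range(n + 1):
        changed = False
        for i, j, c in constraints:
            if dist[j] + c < dist[i]:
                dist[i] = dist[j] + c
                changed = True
        if not changed:
            return dist        # fixed point: every constraint holds
    return None                # still relaxing: negative cycle


# x0 >= x1 + 1 and x1 >= x2, rewritten as x1 - x0 <= -1 and x2 - x1 <= 0.
print(solve_difference_constraints(3, [(1, 0, -1), (2, 1, 0)]))   # [0, -1, -1]

# Adding x2 >= x0 (x0 - x2 <= 0) closes a cycle of total weight -1.
print(solve_difference_constraints(3, [(1, 0, -1), (2, 1, 0), (0, 2, 0)]))   # None
```

The values found are sums of constants along simple paths, which is the intuition behind a bound that depends only on n and bmax.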


Special Case 3: Generalized 2SAT

• Generalized 2SAT constraints
  – xi + xj ≥ b, -xi - xj ≥ b, xi - xj ≥ b, xi ≥ b

• $d = 2 \cdot n \cdot (b_{max} + 1)$ [Seshia, Subramani, Bryant, ’04]


Full Integer Linear Arithmetic

• Can we avoid the $m^m$ blow-up?

• In fact, yes. The idea is to derive a new parameterized solution bound d
  – Formalize parameters that the bound really depends on
  – Parameters characterize sparse structure
    • Occurs especially in software verification; also in many high-level hardware models
  – [Seshia & Bryant, LICS’04, LMCS’05]


Structure of Linear Constraints in Software Verification

• Characteristics of studied benchmarks
  – Mostly difference constraints
    • Only 3% of constraints were NOT difference constraints
  – Non-difference constraints are sparse
    • At most 6 variables per constraint (total number of variables in 1000s)

• Some similar observations: Pratt’77, ESC/Java-Simplify-TR’03


Parameterized Solution Bound

Parameters:
  m      # constraints
  n      # variables
  bmax   max |constant|
  amax   max |coefficient|

New parameters:
  – k non-difference constraints
  – w variables per constraint (width)

Our solution bound:

  $n \cdot (b_{max}+1) \cdot (w \cdot a_{max})^{k}$

Previous:

  $(n+m) \cdot (b_{max}+1) \cdot (m \cdot a_{max})^{2m+3}$

• Direct dependence on m eliminated (and k ≪ m)


Example

[Figure: a Boolean combination (∨, ∧, ¬) of the three constraints below]

x1 - x2 ≥ 1
x1 + 2 x2 + x3 > -3
x2 - x4 ≥ 0

Parameter   Meaning             Value
m           # constraints       3
k           # non-difference    1
n           # variables         4
w           width               3
bmax        max |constant|      3
amax        max |coefficient|   2

d = 96

Previous d = 282,175,488
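Both values of d follow directly from the two bound formulas. The sketch below recomputes them for this example and also reports ⌈log2 d⌉, the number of bits per variable used by the naive encoding. The function names are hypothetical; the formulas are the parameterized bound n · (bmax+1) · (w · amax)^k and the earlier bound (n+m) · (bmax+1) · (m · amax)^(2m+3).

```python
from math import ceil, log2


def previous_bound(n, m, a_max, b_max):
    """(n + m) * (b_max + 1) * (m * a_max)^(2m + 3)."""
    return (n + m) * (b_max + 1) * (m * a_max) ** (2 * m + 3)


def parameterized_bound(n, k, w, a_max, b_max):
    """n * (b_max + 1) * (w * a_max)^k."""
    return n * (b_max + 1) * (w * a_max) ** k


def bits_per_variable(d):
    """Bits per integer variable in the naive encoding: ceil(log2 d)."""
    return ceil(log2(d))


d_new = parameterized_bound(n=4, k=1, w=3, a_max=2, b_max=3)
d_old = previous_bound(n=4, m=3, a_max=2, b_max=3)

print(d_new, bits_per_variable(d_new))   # 96 7
print(d_old, bits_per_variable(d_old))   # 282175488 29
```

For this three-constraint formula the parameterized bound already cuts the encoding from 29 to 7 bits per variable.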


Summary of d Values

Logic                            Solution bound d
Equality logic                   $n$
Difference logic                 $n \cdot (b_{max} + 1)$
Generalized 2SAT logic           $2 \cdot n \cdot (b_{max} + 1)$
Full Integer Linear Arithmetic   $n \cdot (b_{max} + 1) \cdot (a_{max}^{k} \cdot w^{k})$


Abstraction-Based Methods

• For some logics, one cannot easily compute a closed-form expression for the small domain

• Example: Bit-Vector Arithmetic

• In such cases, an abstraction-refinement approach can be used to compute formula-specific small domains


Bit-Vector Arithmetic: Some History

• B.C. (Before Chaff)
  – String operations (concatenate, field extraction)
  – Linear arithmetic with bounds checking
  – Modular arithmetic

• SAT-Based “Bit Blasting”
  – Generate Boolean circuit based on bit-level behavior of operations
    • Handles arbitrary operations
  – Check with best available SAT solver
  – Effective in many applications
    • CBMC [Clarke, Kroening, Lerda, TACAS ’04]
    • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05]


Research Challenge

• Is there a better way than bit blasting?

• Requirements
  – Provide same functionality as with bit blasting
    • Must support all bit-vector operators
  – Exploit word-level structure
  – Improve on performance of bit blasting

• Current approaches based on two core ideas:
  1. Simplification: Simplify input formula using word-level rewrite rules and solvers
  2. Abstraction: Can use automatic abstraction-refinement to solve simplified formula


Bit-Vector SMT Solvers, circa Spring 2009

Current Techniques with Sample Tools
  – Proof-based abstraction-refinement: UCLID [Bryant et al., TACAS ’07]
  – Solver for linear modular arithmetic to simplify the formula: STP [Ganesh & Dill, CAV’07]
  – Automatic parameter tuning for SAT: Spear [Hutter et al., FMCAD ’07]
  – Rewrites, underapproximation, efficient SAT engine: Boolector [Brummayer & Biere, TACAS’09]
  – Equality/constant propagation, logic optimization, special rules for non-linear ops: Beaver [Jha et al., CAV’09]
  – DPLL(T) framework (layered approach, rewriting): CVC3 [Barrett et al.], MathSAT [Bruttomesso et al.], Yices [Dutertre et al.], Z3 [de Moura et al.]


Abstraction-Refinement

• Deciding Bit-Vector Arithmetic with Abstraction [Bryant et al., TACAS ’07, STTT ’09]
  – Use bit blasting as core technique
  – Apply to simplified versions of formula: under and over approximations
  – Generate successive approximations until a solution is found or formula shown unsatisfiable
  – Inspired by McMillan & Amla’s proof-based abstraction for finite-state model checking

• Small Motivating Example:

  (x + y ≠ y + x) ∧ (x * y ≠ y * x)
  – Sufficient to prove the left-hand conjunct unsat


Approximations to Formula

• Example Approximation Techniques
  – Underapproximating
    • Restrict word-level variables to smaller ranges of values
  – Overapproximating
    • Replace subformula with Boolean variable

[Figure: the original formula φ between its two approximations]
  – Overapproximation φ+: more solutions. If φ+ is unsatisfiable, then so is φ
  – Underapproximation φ−: fewer solutions. A satisfying solution of φ− also satisfies φ


Starting Iterations

• Initial Underapproximation φ1−
  – (Greatly) restrict ranges of word-level variables
  – Intuition: Satisfiable formula often has small-domain solution


First Half of Iteration

• SAT Result for φ1−
  – Satisfiable
    • Then have found solution for φ
  – Unsatisfiable
    • Use UNSAT proof to generate overapproximation φ1+

[Figure: φ1− → done, if SAT; φ1− → φ1+, if UNSAT (proof used to generate the overapproximation)]


Second Half of Iteration

• SAT Result for φ1+
  – Unsatisfiable: then have shown φ unsatisfiable
  – Satisfiable: solution indicates variable ranges that must be expanded
    • Generate refined underapproximation φ2−

[Figure: φ1− → φ1+ → done, if UNSAT; φ1+ → φ2−, if SAT (solution used to generate the refined underapproximation)]


Example

φ := (x = y+2) ∧ (x² > y²)

φ1− := (x[1] = y[1]+2) ∧ (x[1]² > y[1]²)    UNSAT: look at proof

φ1+ := (x = y+2)    SAT: x = 2, y = 0

φ2− := (x[2] = y[2]+2) ∧ (x[2]² > y[2]²)    SAT, done.

(Here x[w] denotes x restricted to w bits.)
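The underapproximation half of this example can be replayed with a few lines of brute force. The sketch below is a deliberately simplified stand-in for the procedure on these slides: it widens all variables uniformly (1 bit, then 2 bits, and so on) and enumerates assignments instead of calling a SAT solver, and it omits the proof-based overapproximation step entirely. The 4-bit word size, the modular arithmetic, and the function names are assumptions of this example.

```python
from itertools import product

FULL_WIDTH = 4                 # assumed word size for this sketch
MOD = 1 << FULL_WIDTH


def phi(x, y):
    """(x = y + 2) and (x^2 > y^2), with arithmetic modulo 2^FULL_WIDTH."""
    return x == (y + 2) % MOD and (x * x) % MOD > (y * y) % MOD


def solve_by_widening(formula, n_vars, full_width):
    """Search with every variable restricted to w bits, for w = 1, 2, ...

    Returns (w, assignment) for the first satisfiable underapproximation,
    or None once the full width has been tried without success.
    """
    for w in range(1, full_width + 1):
        for values in product(range(1 << w), repeat=n_vars):
            if formula(*values):
                return w, values
    return None


print(solve_by_widening(phi, 2, FULL_WIDTH))   # (2, (2, 0))
```

The 1-bit restriction has no solution and the 2-bit restriction yields x = 2, y = 0, matching the slide. In the real procedure the overapproximation's solution tells the solver which variables to widen, rather than widening all of them.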


Iterative Behavior

• Underapproximations
  – Successively more precise abstractions of φ
  – Allow wider variable ranges

• Overapproximations
  – No predictable relation
  – UNSAT proof not unique

[Figure: chain of approximations φ1− → φ1+ → φ2− → φ2+ → … → φk− → φk+]


Overall Effect

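• Soundness
  – Only terminate with solution on underapproximation
  – Only terminate as UNSAT on overapproximation

• Completeness
  – Successive underapproximations approach φ
  – Finite variable ranges guarantee termination
    • In worst case, get φk− equivalent to φ

[Figure: chain φ1− → φ1+ → φ2− → φ2+ → … → φk− → φk+, exiting with SAT from an underapproximation or with UNSAT from an overapproximation]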



Roadmap (next: Conclusion)

• Background and Notation
• Survey of Theories
• Theory Solvers
• Approaches to SMT Solving
  – Lazy Encoding to SAT
  – Eager Encoding to SAT
• Conclusion


Summary of Ideas: Modeling

• Philosophy: Model systems in first-order logic + suitable theories

• Widely-used theories:
  – Equality and uninterpreted functions
  – Linear arithmetic
  – Bit-vector arithmetic
  – Arrays


Summary of Ideas: Lazy Methods

• Philosophy: Extend DPLL framework from SAT to SMT

• Literals assigned by SAT are sent to Theory Solver

• Theory Solver determines if literals are satisfiable in the theory

• Key optimizations: small explanations, early conflict detection, theory propagation


Summary of Ideas: Eager Methods

• Philosophy: Constrain solution space with logic-specific methods

• Small-domain encoding
  – Compute bounds that work for any formula in the logic

• Abstraction-refinement of domains
  – Compute formula-specific small domains

• Rewrite rules: high level and bit level
  – Simplify formula before and after bit-blasting


Challenges and Opportunities

• Solvers for new theories
  – Strings
  – Non-linear arithmetic
  – Can we exploit domain-specific structure?

• Parallel SMT

• Better support for quantifiers

• Better proof/interpolant generation


Join the SMT Community

• We need your new, exciting applications!

• Contribute to SMT-LIB

• Create new solvers, compete in SMTCOMP

Slides and book chapter available on our websites:

Clark: http://cs.nyu.edu/~barrett

Sanjit: http://www.eecs.berkeley.edu/~sseshia
