the eager approach to smt - eecs at uc berkeleysseshia/219c/spr11/lectures/... · the eager...
TRANSCRIPT
1
The Eager Approach to SMT
Sanjit A. Seshia
UC Berkeley
Slides based on ICCAD ’09 Tutorial
S. A. Seshia 2
Eager Approach to SMT
Key Ideas:• Small-domain encoding
– Constrain model search
• Rewrite rules• Abstraction-based
methods (eager + lazy)
Example Solvers:UCLID, STP, Spear,
Boolector, Beaver, …
Input Formula
Boolean Formula
satisfiable unsatisfiable
Satisfiability-preserving Boolean Encoder
SAT Solver
EAGER ENCODING
SAT Solver involved in Theory Reasoning
2
S. A. Seshia 3
Theories
• Eager Encoding Methods have been demonstrated for the following Theories:– Equality & Uninterpreted Functions
– Integer Linear Arithmetic
– Restricted Lambda expressions • Arrays, memories, etc.
– Finite-precision Bit-Vector Arithmetic
– Strings
S. A. Seshia 4
UCLID Operation
OperationOperation–– Series of Series of
transformations transformations leading to Boolean leading to Boolean formulaformula
–– Each step is validity Each step is validity (satisfiability) (satisfiability) preservingpreserving
–– Each step performs Each step performs optimizationsoptimizations
LambdaExpansionfor Arrays
Encoding Arithmetic
BooleanSatisfiability
InputFormula
-freeFormula
Linear/ BitvectorArithmeticFormula
BooleanFormula
Function&
PredicateElimination
http://uclid.eecs.berkeley.edu
3
S. A. Seshia 5
Rewrites: Eliminating Function Applications
– Two applications of an uninterpreted function f in a formula
– f(x1) and f(x2)
AckermannAckermann’’s s EncodingEncoding
f(f(xx11)) vfvf11
f(f(xx22)) vfvf22
xx11== xx2 2 vfvf1 1 = = vfvf22
Bryant, German, Bryant, German, VelevVelev’’ssEncodingEncoding
f(f(xx11)) vfvf11
f(f(xx22))
ITE(ITE(xx11== xx22, vf, vf11, vf, vf22))
S. A. Seshia 6
The Small Model Property
• A Theory is said to have the small-model property if, given a formula , is satisfiable over the original domain D iffit is satisfiable over a domain S, where S is a function of , and |S| < |D|.
• Example: Integer Linear Arithmetic (QF_LIA)
4
S. A. Seshia 7
Small-Domain Encoding
• Consider an SMT formula (x1, x2, …, xn)where xi ∈ Di
• Small-domain encoding/Finite instantiation: Derive finite set Si ⊂ Di s.t. |Si| ¿ |Di|– In some cases, Si is finite where Di is infinite
• Encode each xi to take values only in Si
– Could be done by encoding to SAT
S. A. Seshia 8
Solving QF_LIA is NP-complete
• In NP:– If a satisfying solution exists, then one exists
within a bound d• log d is polynomial in input size
– Expression for d [Papadimitriou, ‘82]
(n+m) · (bmax +1) · ( m · amax ) 2m+3
– Input size:• m – # constraints
• n – # variables
• bmax – largest constant (absolute value)
• amax – largest coefficient (absolute value)
5
S. A. Seshia 9
Small-domain encoding / Finite Instantiation: Naïve approach
• Steps– Calculate the solution bound d– Encode each integer variable with d log d e bits
& translate to Boolean formula
– Run SAT solver
• Problem: For QF_LIA, d is ( m m )– ( m log m ) bits per variable
• Solution: Exploit special-cases and domain-specific structure
S. A. Seshia 10
Special Case 1: Equality Logic
• Linear constraints are equalities xi = xj
• Result: d = n
xx11 xx22 ∧∧ xx22 xx33 ∧∧ xx11 xx33
33--valued domain is needed: {1, 2, 3}valued domain is needed: {1, 2, 3}
xx11 xx22 ∧∧ xx22 xx33 ∧∧ xx11 xx33
Can find solution with domain {1, 2}Can find solution with domain {1, 2}
[Pnueli et al., Information and Computation, 2002]
6
S. A. Seshia 11
Special Case 2: Difference Logic
• Boolean combination of difference-bound constraints– xi ≥ xj + b, ± xi ≥ b
• Result: d = n · (bmax + 1)[Bryant, Lahiri, Seshia, CAV’02]
• Proof sketch: satisfying solution corresponds to shortest path in constraint graph– Longest such path has length · n · (bmax + 1)
• Tighter formula-specific bounds possible
S. A. Seshia 12
Special Case 3: Generalized 2SAT
• Generalized 2SAT constraints– xi + xj ≥ b, - xi - xj ≥ b, xi - xj ≥ b, xi ≥ b
• d = 2 · n · (bmax + 1) [Seshia, Subramani, Bryant,’04]
7
S. A. Seshia 13
Full Integer Linear Arithmetic
• Can we avoid the mm blow-up?
• In fact, yes. The idea is to derive a new parameterized solution bound d– Formalize parameters that the bound really
depends on
– Parameters characterize sparse structure • Occurs especially in software verification; also in
many high-level hardware models
– [Seshia & Bryant, LICS’04, LMCS’05]
S. A. Seshia 14
Structure of Linear Constraints in Software Verification
• Characteristics of studied benchmarks– Mostly difference constraints
• Only 3% of constraints were NOT difference constraints
– Non-difference constraints are sparse• At most 6 variables per constraint (total number of
variables in 1000s)
• Some similar observations: Pratt’77, ESC/Java-Simplify-TR’03
8
S. A. Seshia 15
Parameterized Solution Bound
max |coefficient|
amax
max |constant|bmax
#variablesn
#constraintsm
New parameters: New parameters: –– kk nonnon--difference constraints, difference constraints,
–– ww variables per constraint (width)variables per constraint (width)
Our solution bound:Our solution bound:
n n ·· ((bbmaxmax +1) +1) ·· ( ( ww ·· aamaxmax ) ) kk
Previous:Previous:((nn++mm) ) ·· ((bbmaxmax +1) +1) ·· ( ( mm ·· aamaxmax ) ) 22mm+3+3
•• Direct dependence on Direct dependence on mm eliminatedeliminated(and (and kk ¿¿ mm ))
S. A. Seshia 16
Example
x1 - x2 ≥ 1
∨
∧
¬
∨
x1 + 2 x2 + x3 > -3
x2 – x4≥ 0
3widthw
1#non-differencek
2max |coefficient|amax
max |constant|
#variables
#constraints
3bmax
4n
3m
d = d = 9696
PreviousPrevious d d = = 282,175,488282,175,488
9
S. A. Seshia 17
Summary of d Values
n · (bmax + 1) · (amaxk · w k)Full Integer Linear
Arithmetic
2 · n · ( bmax + 1 )Generalized 2SAT logic
n · ( bmax + 1 )Difference logic
nEquality logic
Solution Bound dLogic
– 18 –
Proof of Our Bound: StepsProof of Our Bound: Steps
1.1. Previous result for integer linear programming Previous result for integer linear programming (ILP) (ILP) by by BoroshBorosh--TreybigTreybig--FlahiveFlahive [[’’76, 76, ’’86]86]
2.2. Express above result in Express above result in kk and and ww, in addition to , in addition to other parametersother parameters
3.3. Derive QFP bound from ILP boundDerive QFP bound from ILP bound
10
– 19 –
Integer Linear Programming (ILP) NotationInteger Linear Programming (ILP) Notation
A xA x = = b b , , xx ≥≥ 00
m
n
amax = maxi,j |aij | bmax = maxi |bi |
– 20 –
Borosh-Treybig-Flahive Result [1986]Borosh-Treybig-Flahive Result [1986]
Solution bound Solution bound dd is is
((nn+2) +2) ··
wherewhere = largest sub= largest sub--determinant of [determinant of [AA | | bb] ] (abs. value)(abs. value)
Problem: Exponentially many subProblem: Exponentially many sub--determinants!determinants!
11
– 21 –
Matrix StructureMatrix Structure
m
n
Non-difference constraintsk
w non-zeroes per row
– 22 –
k = 0 : Only Difference Constraintsk = 0 : Only Difference Constraints
xxii -- xxjj ≥≥ bb, , ±± xxii ≥≥ bb
Totally Unimodular: All subdeterminants are in {0, -1, +1}
· i | bi | · min(n+1, m) · bmax
12
– 23 –
Arbitrary kArbitrary k
· w
· k
Det. ∈{0,±1}
Each term · amaxk #Terms · w k
· i | bi | (amaxk · w k)
· min(n+1, m) · bmax · (amaxk · w k)
– 24 –
Bound for ILPBound for ILP
·· min(min(nn+1, +1, mm) ) ·· bbmaxmax ·· ((aamaxmaxkk ·· w w kk))
dd = (= (nn+2) +2) ·· [[BoroshBorosh--TreybigTreybig--FlahiveFlahive]]
= (= (nn+2) +2) ·· min(min(nn+1, +1, mm) ) ·· bbmaxmax ·· ((aamaxmaxkk ·· w w kk))
·· ((nn+2) +2) ·· nn ·· bbmaxmax ·· ((aamaxmaxkk ·· w w kk))
(assuming (assuming mm ·· nn))
13
– 25 –
QFP Bound from ILP BoundQFP Bound from ILP Bound
Consider DNF of arbitrary QFP formula Consider DNF of arbitrary QFP formula = = 11 ∨∨ 22 ∨∨ . . . . . . ∨∨ NN
Satisfying assignment to Satisfying assignment to must satisfy some must satisfy some ii
Each Each ii is an ILP is an ILP –– Parameters of Parameters of ii are bounded by those ofare bounded by those of
Therefore: Therefore: d = d = ((nn+2) +2) ·· nn ·· bbmaxmax ·· ((aamaxmaxkk ·· w w kk))
– 26 –
Summary of d ValuesSummary of d Values
((nn+2) +2) ·· n · (bmax + 1) · (amaxk · w k)QuantifierQuantifier--Free Free
PresburgerPresburger logiclogic
2 2 ·· nn ·· ( ( bbmaxmax + 1 )+ 1 )Generalized 2SAT logicGeneralized 2SAT logic
nn ·· ( ( bbmaxmax + 1 )+ 1 )Separation logicSeparation logic
nnEquality logicEquality logic
ddLogicLogic
Note: A paper by Sergei Veselov proved that the bound of Borosh-Treybig-Flahive has been improved from (n+2) · to simply . This eliminates the (n+2) term from the last entry in the table above.
14
S. A. Seshia 27
Abstraction-Based Methods
• For some logics, one cannot easily compute a closed-form expression for the small domain
• Example: Bit-Vector Arithmetic
• In such cases, an abstraction-refinement approach can be used to compute formula-specific small domains
S. A. Seshia 28
Bit-Vector Arithmetic: Some History
• B.C. (Before Chaff)– String operations (concatenate, field extraction)
– Linear arithmetic with bounds checking
– Modular arithmetic
• SAT-Based “Bit Blasting”– Generate Boolean circuit based on bit-level behavior of
operations• Handles arbitrary operations
– Check with best available SAT solver
– Effective in many applications• CBMC [Clarke, Kroening, Lerda, TACAS ’04]
• Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05]
15
S. A. Seshia 29
Research Challenge
• Is there a better way than bit blasting?
• Requirements– Provide same functionality as with bit blasting
• Must support all bit-vector operators
– Exploit word-level structure
– Improve on performance of bit blasting
• Current Approaches based on two core ideas:1. Simplification: Simplify input formula using word-level
rewrite rules and solvers
2. Abstraction: Can use automatic abstraction-refinement to solve simplified formula
S. A. Seshia 30
Bit-Vector SMT Solvers, circa Spr.’2009
Current Techniques with Sample Tools – Proof-based abstraction-refinement – UCLID [Bryant et al.,
TACAS ’07]
– Solver for linear modular arithmetic to simplify the formula –STP [Ganesh & Dill, CAV’07]
– Automatic parameter tuning for SAT– Spear [Hutter et al., FMCAD ’07]
– Rewrites, underapproximation, efficient SAT engine –Boolector [Brummayer & Biere, TACAS’09]
– Equality/constant propagation, logic optimization, special rulesfor non-linear ops - Beaver [Jha et al., CAV’09]
– DPLL(T) framework: Layered approach, rewriting – CVC3[Barrett et al.], MathSAT [Bruttomesso et al], Yices [Dutertre et al.], Z3 [de Moura et al]
16
S. A. Seshia 31
Abstraction-Refinement• Deciding Bit-Vector Arithmetic with Abstraction
[Bryant et al., TACAS ’07, STTT ’09]
– Use bit blasting as core technique
– Apply to simplified versions of formula: under and over approximations
– Generate successive approximations until a solution is found or formula shown unsatisfiable
– Inspired by McMillan & Amla’s proof-based abstraction for finite-state model checking
• Small Motivating Example:
(x + y y + x) ∧ (x * y y * x)– Sufficient to prove the left-hand conjunct unsat
S. A. Seshia 32
Approximations to Formula
• Example Approximation Techniques
– Underapproximating• Restrict word-level variables to smaller ranges of values
– Overapproximating• Replace subformula with Boolean variable
Original Formula
+Overapproximation + More solutions:
If unsatisfiable, then so is
Underapproximation−
−
Fewer solutions:Satisfying solution also satisfies
17
S. A. Seshia 33
Starting Iterations
• Initial Underapproximation
– (Greatly) restrict ranges of word-level variables
– Intuition: Satisfiable formula often has small-domain solution
1−
S. A. Seshia 34
First Half of Iteration
• SAT Result for 1−
– Satisfiable
• Then have found solution for – Unsatisfiable
• Use UNSAT proof to generate overapproximation 1+
1−If SAT, then done
1+
UNSAT proof:generate overapproximation
18
S. A. Seshia 35
Second Half of Iteration
• SAT Result for 1+
– Unsatisfiable: then have shown unsatisfiable
– Satisfiable: solution indicates variable ranges that must be expanded
• Generate refined underapproximation
1−
If UNSAT, then done1+
SAT:Use solution to generate refined underapproximation
2−
S. A. Seshia 36
Example
:= (x = y+2) ∧ (x2 > y2)
1− := (x[1] = y[1]+2) ∧ (x[1]2 > y[1]
2)
2− := (x[2] = y[2]+2) ∧ (x[2]2 > y[2]
2)
1+ := (x = y+2)
SAT, done.
UNSATLook at proof
SATx = 2, y = 0
19
S. A. Seshia 37
Iterative Behavior
• Underapproximations– Successively more precise
abstractions of – Allow wider variable ranges
• Overapproximations– No predictable relation
– UNSAT proof not unique
1−
1+
2−
k−
2+
k+
S. A. Seshia 38
Overall Effect
• Soundness– Only terminate with solution on
underapproximation
– Only terminate as UNSAT on overapproximation
• Completeness– Successive underapproximations
approach – Finite variable ranges guarantee
termination
• In worst case, get k−
SAT
UNSAT
1−
1+
2−
k−
2+
k+
20
S. A. Seshia 39
Summary of Key Ideas
S. A. Seshia 40C. Barrett & S. A. Seshia ICCAD 2009 Tutorial 40
Summary of Ideas: Lazy Methods
• Philosophy: Extend DPLL framework from SAT to SMT
• Literals assigned by SAT are sent to Theory Solver
• Theory Solver determines if literals are satisfiable in the theory
• Key optimizations: small explanations, early conflict detection, theory propagation
21
S. A. Seshia 41
Summary of Ideas: Eager Methods
• Philosophy: Constrain solution space with theory-specific solution bounds
• Small-domain encoding– Compute bounds that work for any formula in
the logic
• Abstraction-refinement of domains– Compute formula-specific small domains
• Rewrite rules: high level and bit level– Simplify formula before and after bit-blasting
S. A. Seshia 42
Challenges and Opportunities
• Solvers for new theories– Strings
– Non-linear arithmetic
– Can we exploit domain-specific structure?
• Parallel SMT
• Better support for quantifiers
• Better proof/interpolant generation