IBM Watson Research Center CPAIOR-2013 Workshop: Seeking Feasibility in Combinatorial Problems © 2013 IBM Corporation Branching Strategies and Restarts in SAT Solvers Ashish Sabharwal


IBM Watson Research Center

CPAIOR-2013 Workshop: Seeking Feasibility in Combinatorial Problems © 2013 IBM Corporation

Branching Strategies and Restarts in SAT Solvers

Ashish Sabharwal


Talk Outline

The SAT Problem, SAT Solvers

Conflict-Driven Systematic SAT Solvers

– Dramatic Progress

– Contrast with CP/MIP solvers

– “Everything” influenced by Learned Clauses and Conflict Analysis

Traditional Branching Heuristics

CDCL Solvers: Dynamic Heuristics and Associated Techniques

– Clause Learning

– Lazy Data Structures

– VSIDS Variable Selection Heuristic

– Restarts

Summary


SAT: Problem and Solvers


Boolean Satisfiability (SAT) : Basics

Variables with Boolean domain {T,F} or, equivalently, {1,0}

Constraints specified in the Conjunctive Normal Form (CNF)

E.g. (a or b) and (c or d or f) and (a or c or d)

SAT Solver: An algorithm (typically with an implementation) that, given a CNF formula F, finds a satisfying assignment for F, if there is one

– Complete SAT Solver: must terminate and output “unsatisfiable”, if F is unsat.

Dozens of (mostly Open Source) SAT Solvers available on the Internet

– 35+ solvers participated in SAT Competition 2006

– 65+ in 2011

clause = disjunction of literals

a : variable; a, ¬a : literals
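A minimal sketch of these definitions, assuming a DIMACS-style integer encoding (a positive integer for a variable, its negation for the negated literal). The encoding and the helper names `is_satisfied` and `brute_force_sat` are illustrative choices, not taken from the talk. The formula is the example above, with literal polarities exactly as they appear in this transcript.

```python
from itertools import product

# Variables a, b, c, d, f numbered 1..5; a negative integer would denote a negated literal.
A, B, C, D, F = 1, 2, 3, 4, 5

# (a or b) and (c or d or f) and (a or c or d)
formula = [[A, B], [C, D, F], [A, C, D]]


def is_satisfied(cnf, assignment):
    """True if every clause contains at least one literal made true by `assignment`.

    `assignment` maps variable number -> bool and must cover all variables.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def brute_force_sat(cnf, num_vars):
    """Complete but exponential: try all 2^n assignments, return a model or None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {v + 1: values[v] for v in range(num_vars)}
        if is_satisfied(cnf, assignment):
            return assignment
    return None
```

Returning `None` plays the role of the "unsatisfiable" answer that a complete solver must give; real solvers avoid this exhaustive enumeration.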


SAT Solvers: 3 Dominant Approaches

Local Search based stochastic algorithms

– Incomplete (do not prove unsatisfiability)

– Very effective on satisfiable Random instances, esp. near phase transition


Look-Ahead Based systematic solvers

– Complete search with careful selection of variables/values to branch on

• Spend time exploring “reduction” in complexity with various branching possibilities

• Local Learning: some local inference within a subtree is learned as “implication arrays”

• Theoretical Studies: autarkies, “reduction” measures based on probability distributions

– Very effective on unsatisfiable Random instances

– Also effective on Crafted and some Industrial instances

Conflict Directed Clause Learning (CDCL) solvers

– Complete search producing General Resolution proofs of unsatisfiability

– Very effective on Industrial instances, esp. large and highly interconnected ones


Look-Ahead vs. CDCL SAT Solvers

Complementary Regimes of Strength

– Plot shows dominating solver on Crafted and Industrial instances

– March: typical Look-Ahead solver (marked “+” in the plot)

– Minisat: typical CDCL solver

[Plot labels: Low Constraint Density; Low Diameter of Resolution Graph (two clauses have an edge if they clash in 1 literal)]

Credit: Heule & van Maaren, Handbook of SAT


CDCL SAT Solvers


Systematic SAT Solvers as Search Engines: Dramatic Progress in 20 Years

Started out with ~100 vars, ~200 constraints in early 1990’s

Now often easily handle over 1M vars, ~5M constraints

– Instances with 30M clauses being used in competitions!

Was it all just Moore’s Law? It helped, but not much…

– 2x faster computer does not solve 2x larger SAT instance

– Search difficulty does not scale linearly with problem size!

Key Development Drivers

– Academic: “Open” SAT Competitions, Races, and Challenges: Germany ’89, DIMACS ’93, China ’96, SAT-2002, …, SAT-2013

– Industrial: Verification: Backend of Model Checkers, SMT solvers

– Applications to Test Pattern Generation, Optimal Control, Protocol Design, Routers, Cryptography, E-Commerce (E-auctions & electronic trading agents), Bioinformatics (Haplotype Inference), etc.


SAT vs. CP/MIP Search: A Contrast

SAT Solvers, esp. CDCL Solvers, work in a very different setting and with very different design principles/goals:

Blackbox approach

– No notion of designing custom search / decompositions as in CP Opt. or CPLEX

– Expected to work “out of the box” with perhaps a little parameter tuning

Very little structure available to exploit

– Binary domains, CNF form – very “flat” representation

– Advantage: Standardization, Competitions, Simplicity

No objective function to estimate for guidance or use to assess progress

– Number of unsatisfied clauses can be a highly misleading indicator

Reliance on LOTS of branching, backtracking, learning, restarting, …all performed extremely fast. How fast?


But Note: 1M variable CNF formula given to CP or MIP solver will not fly


SAT Solvers as Fast Search Engines

CDCL SAT solvers have become really efficient at searching fast

E.g., on an IBM model checking instance from SAT Race 2006, with ~170k variables, 725k clauses, solvers such as Minisat and Rsat roughly

– Make 2000-5000 decisions/second

– Deduce 600-1000 conflicts/second

– Learn 600-1000 clauses/second (#clauses grows rapidly)

– Restart every 1-2 seconds (aggressive restarts)

Leading solvers such as Glucose have pushed Restarts even further

– Extremely aggressive restarts!

– Rely on techniques such as phase saving, “intelligent” clause deletion (based on LBD level), and dynamic context-based freezing of restarts to achieve success


SAT vs. CP/MIP: Branching “Tree” Structure

CP & MIP solvers traditionally explore a well-defined underlying search tree, albeit in different heuristic orders

– CP: typically binary/multi-way tree with DFS or LDS exploration order

– MIP: typically best-first style tree search with a frontier of Open nodes and “diving” to obtain feasible solutions quickly

Modern CDCL SAT solvers very far from building a traditional search tree!

– Branching is “uneven”

– Restarts are extremely frequent (context is retained using various techniques)

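[Figure: “Normal” tree branching X=0 / X=1, contrasted with CDCL branching X=0 followed by Y=1; the remaining labels are “smaller” and “?”]

• Under current context, X=0 ⇒ Y=0 by unit propagation (UP)

• Y is the 1-UIP variable in the last conflict analysis

• Note: Y=1 ⇒ X=1, but not necessarily Y=1 ⇒ X=1 by UP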


The Importance of Learned Clauses

“Everything” is influenced by Conflict Analysis and Learned Clauses!

– No need to “flip” value of the branched upon variable: 1-UIP learned clause automatically implies flipped value of the 1-UIP literal

– Enablement of Aggressive Restarts

• Safe, as context is preserved by learned clauses

– Conflict-directed Backjumping

– Necessity of Lazy Data Structures due to ~1000 clauses learned per second

• Fast but with a drawback: incomplete knowledge of current state of all clauses

• E.g., can no longer determine how many clauses are not yet satisfied!

– Branching heuristic: typical state-based heuristics cannot be computed anymore with lazy data structures: missing information about current state of all clauses

• VSIDS and variations for variable selection (more later)


It Wasn’t Always the Case… Traditional, State-Dependent and History-Independent Heuristics in SAT


Traditional Branching Heuristics

SAT Solvers, before Clause Learning became a must-have, had many variations of state-dependent heuristics similar to CSP solvers, e.g.:

1. DLCS: Dynamic Largest Combined Sum

maximize CP(x) + CN(x) (#unresolved clauses with x occurring pos and neg, resp.)

2. DLIS: Dynamic Largest Individual Sum

maximize max {CP(x), CN(x)}
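The two rules can be sketched as follows, assuming clauses are lists of signed integers and the partial assignment is a dict from variable to bool. The function names and the tie-breaking rule (smallest variable index) are illustrative assumptions. Both functions expect at least one unassigned variable to remain in an unresolved clause.

```python
from collections import Counter


def occurrence_counts(cnf, assignment):
    """Return (CP, CN): per unassigned variable, the number of unresolved
    clauses in which it occurs positively / negatively.

    A clause is unresolved if none of its literals is true under `assignment`.
    """
    cp, cn = Counter(), Counter()
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied, so it no longer counts
        for l in clause:
            if abs(l) not in assignment:
                (cp if l > 0 else cn)[abs(l)] += 1
    return cp, cn


def dlcs(cnf, assignment):
    """Dynamic Largest Combined Sum: maximize CP(x) + CN(x)."""
    cp, cn = occurrence_counts(cnf, assignment)
    variables = sorted(set(cp) | set(cn))
    return max(variables, key=lambda v: cp[v] + cn[v])


def dlis(cnf, assignment):
    """Dynamic Largest Individual Sum: maximize max{CP(x), CN(x)}."""
    cp, cn = occurrence_counts(cnf, assignment)
    variables = sorted(set(cp) | set(cn))
    return max(variables, key=lambda v: max(cp[v], cn[v]))
```

Note that both rules rescan every clause against the current assignment, which is exactly the kind of state-dependent bookkeeping that becomes unavailable with lazy data structures.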


Traditional Branching Heuristics… contd.

3. BOHM [Buro & Kleine Büning, 1992]

lexicographically maximize a vector of scores H_1(x), H_2(x), …, one per clause size i

where each H_i(x) is built from h_i(x) = #unresolved size-i clauses containing literal x

Intuitively, satisfy most small clauses or further reduce their size
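The slide's formulas are not preserved in this transcript. As the heuristic is usually stated in the literature, with h_i(x) the number of unresolved size-i clauses containing literal x, the score for clause size i is H_i(x) = α·max(h_i(x), h_i(¬x)) + β·min(h_i(x), h_i(¬x)), and the variable with the lexicographically largest vector (H_1, H_2, …) is chosen. The sketch below assumes that form and the commonly quoted constants α = 1, β = 2; both are assumptions relative to this transcript, as are the function names.

```python
def bohm_vector(cnf, assignment, var, alpha=1, beta=2):
    """Score vector (H_1, ..., H_n) for `var`; n is the longest clause length.

    alpha and beta default to commonly quoted values (an assumption here).
    """
    n = max(len(c) for c in cnf)
    pos = [0] * (n + 1)  # pos[i] = h_i(var)
    neg = [0] * (n + 1)  # neg[i] = h_i(not var)
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # satisfied clauses are resolved
        free = [l for l in clause if abs(l) not in assignment]
        size = len(free)  # current (reduced) size of the clause
        if var in free:
            pos[size] += 1
        if -var in free:
            neg[size] += 1
    return tuple(
        alpha * max(pos[i], neg[i]) + beta * min(pos[i], neg[i])
        for i in range(1, n + 1)
    )


def bohm(cnf, assignment):
    """Pick the unassigned variable with the lexicographically largest vector."""
    unassigned = sorted({abs(l) for c in cnf for l in c} - set(assignment))
    # Python compares tuples lexicographically, which is the order required.
    return max(unassigned, key=lambda v: bohm_vector(cnf, assignment, v))
```

Because the first vector entry counts the smallest clauses, a variable that appears in a unit or binary clause outranks one that only appears in many long clauses.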


Traditional Branching Heuristics… contd.

4. MOMS: Maximum Occurrences in clauses of Minimum Size

Many variations, e.g.: maximize a score based on the #unresolved smallest clauses containing literal x, with preference also to vars that appear as both pos & neg in the smallest clauses

5. Jeroslow-Wang [1990]

maximize J(l) = number of unresolved clauses literal l appears in, weighted inversely proportional to exp(clause size)

Two-sided version: maximize J(x) + J(¬x)
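A sketch of Jeroslow-Wang, assuming the usual weighting J(l) = Σ 2^(−|C|) over unresolved clauses C containing literal l, with |C| taken as the current number of unassigned literals in C. The base-2 weight, the clause-size convention, and the function names are assumptions; the transcript only says the weight is inversely proportional to exp(clause size).

```python
def jw_score(cnf, assignment, lit):
    """J(lit): sum of 2^(-size) over unresolved clauses containing `lit`."""
    total = 0.0
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # satisfied clause contributes nothing
        if lit in clause:
            size = sum(1 for l in clause if abs(l) not in assignment)
            total += 2.0 ** -size
    return total


def free_variables(cnf, assignment):
    return sorted({abs(l) for c in cnf for l in c} - set(assignment))


def jw_one_sided(cnf, assignment):
    """Return the unassigned literal l maximizing J(l)."""
    literals = [s * v for v in free_variables(cnf, assignment) for s in (1, -1)]
    return max(literals, key=lambda l: jw_score(cnf, assignment, l))


def jw_two_sided(cnf, assignment):
    """Pick the variable maximizing J(x) + J(not x); return its stronger literal."""
    def combined(v):
        return jw_score(cnf, assignment, v) + jw_score(cnf, assignment, -v)

    var = max(free_variables(cnf, assignment), key=combined)
    if jw_score(cnf, assignment, var) >= jw_score(cnf, assignment, -var):
        return var
    return -var
```

The two variants can disagree: a literal that dominates on its own may belong to a variable whose combined score is lower than that of a variable occurring in short clauses with both polarities.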


Key Techniques Inside Modern CDCL Solvers


DPLL Search as Implemented in Modern Solvers

Note: No “search tree” style search where we set x=0 and then later “flip” to x=1


Clause Learning: Conflict Graphs, etc.

Search tree behavior:

Branch: p=0, q=0, b=1

Detect conflict; learn, say, 1-UIP clause (¬a or t)

Backtrack to depth=2: assignment stack has p=0, q=0

Flip value of b to get b=0

Do nothing (not even state update) and simply observe t=1 is implied! Further, t=1 implies b=0 (under the current context)

[Figure labels: decisions p=0, q=0, b=1; learned clause (¬a or t); implied t=1]


Lazy Data Structures

SAT solvers (used to) spend 80% of their time doing unit propagation

Must make unit propagation efficient

– as more and more clauses are added (clause learning)

– as longer clauses are added (initial clauses tend to be mostly short)

Observation: Watching two un-falsified literals is sufficient, no matter how long the clause is!

– With 2 un-falsified literals, a clause is guaranteed to neither unit propagate nor be falsified

Can ignore processing most clauses unless the literal under consideration is being watched in them

Head and Tail Lists: SATO solver [1997]

Watched Literals: zChaff solver [2001]
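A compact sketch of two-watched-literal propagation, under the assumptions that clauses are lists of signed integers with at least two distinct literals each and that the watched literals are kept at positions 0 and 1 of each clause. The class and method names are hypothetical; this illustrates the idea rather than the zChaff or SATO implementation.

```python
from collections import defaultdict


class WatchedCNF:
    """CNF store with two watched literals per clause."""

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]
        self.watches = defaultdict(list)  # literal -> indices of clauses watching it
        for idx, clause in enumerate(self.clauses):
            self.watches[clause[0]].append(idx)
            self.watches[clause[1]].append(idx)
        self.value = {}   # variable -> bool
        self.trail = []   # assigned literals, in order
        self.qhead = 0    # next trail position to propagate

    def lit_value(self, lit):
        v = self.value.get(abs(lit))
        return None if v is None else v == (lit > 0)

    def assign(self, lit):
        self.value[abs(lit)] = lit > 0
        self.trail.append(lit)

    def decide(self, lit):
        """Assign `lit` and propagate; return a conflicting clause index or None."""
        self.assign(lit)
        return self.propagate()

    def propagate(self):
        while self.qhead < len(self.trail):
            false_lit = -self.trail[self.qhead]
            self.qhead += 1
            watchers = self.watches[false_lit]  # only these clauses are visited
            kept = []
            for pos, idx in enumerate(watchers):
                clause = self.clauses[idx]
                if clause[0] == false_lit:  # keep the falsified watch at position 1
                    clause[0], clause[1] = clause[1], clause[0]
                if self.lit_value(clause[0]) is True:
                    kept.append(idx)  # clause satisfied by its other watch
                    continue
                for k in range(2, len(clause)):
                    if self.lit_value(clause[k]) is not False:
                        # Found a replacement watch; move the clause to its list.
                        clause[1], clause[k] = clause[k], clause[1]
                        self.watches[clause[1]].append(idx)
                        break
                else:
                    kept.append(idx)
                    if self.lit_value(clause[0]) is False:
                        kept.extend(watchers[pos + 1:])
                        self.watches[false_lit] = kept
                        return idx  # conflict: every literal is false
                    self.assign(clause[0])  # unit clause: other watch is forced
            self.watches[false_lit] = kept
        return None

    def backtrack(self, trail_len):
        """Undo assignments only; the watch lists need no repair."""
        while len(self.trail) > trail_len:
            del self.value[abs(self.trail.pop())]
        self.qhead = trail_len
```

The inner scan over positions 2 and up is the cost mentioned for watched literals: detecting a unit literal can mean walking the whole clause, in exchange for doing no watch-list work at all in `backtrack`.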


H/T Lists vs. Watched Literals

WL structure needs:

– No pointer “trail” maintenance

– No work when backtracking

– But can mean exploring the whole clause to detect the unit literal

Credit: Marques-Silva, Lynce,& Malik; Handbook of SAT


Dynamic Variable Selection Heuristics

VSIDS: Variable State Independent Decaying Sum (zChaff solver)

– Fast heuristic: not extremely accurate but adaptive and informed by conflicts!

– A key ingredient to make SAT solvers work well on industrial instances

– Necessitated by lazy data structures: accurate information about reduced clause size no longer available

Maintain one score for each literal

Increase score of literals appearing in the conflict clause

Periodically divide all scores by 2

Several variations, e.g., Berkmin solver:

– One score for each variable, incremented for all vars appearing in 1-UIP analysis

– More importantly: variable chosen from most recently learned and yet-unsatisfied conflict clause!
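The basic VSIDS bookkeeping described above (one score per literal, bump on conflict, periodic halving) can be sketched as follows. The class name, the `decay_interval` parameter and its default, and the unit bump are assumptions for illustration; production solvers use more refined variants.

```python
class VSIDS:
    """One score per literal; bumped on conflicts and periodically halved."""

    def __init__(self, num_vars, decay_interval=256):
        self.score = {lit: 0.0 for v in range(1, num_vars + 1) for lit in (v, -v)}
        self.decay_interval = decay_interval  # hypothetical period, in conflicts
        self.conflicts = 0

    def on_conflict(self, learned_clause):
        """Bump every literal of the learned (conflict) clause."""
        for lit in learned_clause:
            self.score[lit] += 1.0
        self.conflicts += 1
        if self.conflicts % self.decay_interval == 0:
            for lit in self.score:
                self.score[lit] /= 2.0  # older conflicts gradually count less

    def pick(self, assignment):
        """Return the unassigned literal with the highest score, or None."""
        free = [l for l in self.score if abs(l) not in assignment]
        return max(free, key=self.score.__getitem__) if free else None
```

Nothing here inspects the clause database or the current clause sizes, which is why the heuristic remains usable alongside lazy data structures.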


Restarts: Without Clause Learning

Originally motivated by observations about runtime distributions of SAT solvers without conflict learning [Gomes et al, 1998]

Really effective when “heavy-tailed” behavior is present, with many short runs (“backdoors”) and many very long runs

The easy to grasp concept: key is the role of the exponential distribution (geometric distribution, really, for the discrete case)

– If probability of failure after time T

(a) decays faster than exponentially ⇒ hurts to restart

(b) decays exponentially ⇒ doesn’t matter (easy solution strategy: keep restarting!)

(c) decays slower than exponentially ⇒ should restart

[Figure labels: Standard Distribution (finite mean & variance); Power Law Decay; Exponential Decay]
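A small calculation illustrates the role of the geometric distribution. Assuming independent runs with a fixed restart cutoff c, the expected total time is E[min(T, c)] / P(T ≤ c), since the number of runs is geometric with success probability P(T ≤ c). The function name and the two example distributions are assumptions chosen for illustration.

```python
def expected_time_with_cutoff(pmf, cutoff):
    """Expected total steps when each run is abandoned after `cutoff` steps.

    `pmf(t)` is the probability that one independent run finishes at exactly
    step t, for t = 1, 2, ...
    """
    success = sum(pmf(t) for t in range(1, cutoff + 1))  # P(T <= cutoff)
    if success == 0:
        return float("inf")  # no run can ever finish within the cutoff
    # E[min(T, cutoff)]: finished runs cost t, abandoned runs cost `cutoff`.
    truncated_mean = (
        sum(t * pmf(t) for t in range(1, cutoff + 1))
        + cutoff * (1.0 - success)
    )
    return truncated_mean / success


def geometric(q):
    """Memoryless runtime: finish at each step with probability q."""
    return lambda t: q * (1.0 - q) ** (t - 1)


def two_point(t):
    """Toy heavy-tail stand-in: half the runs take 1 step, half take 1000."""
    return 0.5 if t in (1, 1000) else 0.0
```

For `geometric(0.1)` the result is 10 steps for any cutoff, matching case (b). For `two_point`, a cutoff of 1 gives an expected 2 steps, against 500.5 when runs are never cut short, matching case (c).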


Restarts: With Clause Learning

No clear empirical runtime distribution study (to my knowledge); however, large runtime variations often observed in practice and rapid restarts help!

– Safe: Context is kept through learned clauses and associated heuristics

Theoretical Justification/Intuition: Do we really need restarts?

– Stems from characterization of Clause Learning Proof System (CL) and its relation to General Resolution (RES)

– Full simulation of RES by CL known only in the presence of restarts!

1. CL (specific learning scheme, no restarts) exponentially more powerful than any “natural and proper” fragment of RES [2003]

2. CL** + lots of restarts = RES [2003]

3. F has a short RES proof ⇒ F’ has a short CL proof w/o restarts [2008]

4. CL + lots of restarts = RES [2009]

5. CL has short proofs of natural candidate formulas for separation [2012]


Summary

Dramatic Progress in CDCL SAT Solvers

– High Contrast with CP/MIP solvers w.r.t. “tree” structure

– “Everything” influenced by Learned Clauses and Conflict Analysis

Traditional Branching Heuristics

– Exist but no longer common (except in Look-Ahead SAT Solvers)

CDCL Solvers: Interesting Search Design, no clear “tree”

– Clause Learning

– Lazy Data Structures

– VSIDS Variable Selection Heuristic

– Aggressive (but careful) Restarts

Reference: Handbook of SAT

– 27 chapters: Everything from historical perspectives, theoretical foundations, practical solvers, applications, …


EXTRA SLIDES


Goal of This Talk

Highlight key advances in the design of DPLL-based SAT solvers that have made this scaling feasible

Note: it is not just the “simplicity” of the constraints per se

E.g., a CNF formula F given as a set of “clause constraints” to IBM/ILOG CP Solver or to a MIP solver would not scale up!

Several fundamental techniques make modern SAT solvers behave very differently from the traditional branch-and-backtrack search; e.g.

– there is no longer a clearly defined “search tree”, or even a search data structure that “tries both branches” / “flips variable value”

– they don’t even look at most of the clauses when branching and propagating

– they literally do nothing upon backtrack besides un-assigning variable values (no “state” to revert back to)


Basic DPLL Search for SAT
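For contrast with the CDCL techniques, a textbook-style sketch of branch-and-backtrack DPLL with unit propagation follows. It is not the slide's own pseudocode; the naive clause scan, the branching order (lowest-numbered variable, False first), and the function names are assumptions.

```python
def unit_propagate(cnf, assignment):
    """Extend `assignment` in place with forced values; return False on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return False  # every literal is false: conflict
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0  # unit clause forces a value
                changed = True
    return True


def dpll(cnf, assignment=None):
    """Return a satisfying assignment (dict variable -> bool) or None."""
    assignment = dict(assignment or {})
    if not unit_propagate(cnf, assignment):
        return None
    free_vars = sorted({abs(l) for c in cnf for l in c} - set(assignment))
    if not free_vars:
        return assignment  # all variables set and no clause falsified
    var = free_vars[0]
    for value in (False, True):  # the explicit "try both branches" step
        result = dpll(cnf, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

The explicit loop over both values of `var`, and the full clause scan in `unit_propagate`, are the two things modern CDCL solvers replace with learned clauses and watched literals.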


Key Techniques in Modern SAT Solvers

1. Clause learning (no-goods)

– Requires a “conflict analysis” mechanism: implication graph, graph cuts

– Motivates/necessitates efficient data structures (e.g., watched literals)

– Enables getting rid of traditional search tree

– Takes “restarts” to another level: very rapid and less risky

– Helps guide the solver in many ways

a) Conflict directed backjumping / non-chronological backtracking

b) Conflict directed variable selection: VSIDS

2. Lazy data structures: watched literals

– Motivated by SAT solvers spending ~80% of their time doing unit prop., and new clauses being added at a very rapid rate!

– Enables very efficient propagation: allows ignoring most clauses

– Enables “no work” upon backtracking

3. Very aggressive restarts

4. Assignment stack shrinking

5. Conflict clause minimization

6. Clause deletion (to save memory) [have to be careful about search tree]
