# investigating computational hardness through realistic ...· investigating computational hardness

Post on 14-Aug-2018

212 views

Embed Size (px)

TRANSCRIPT

InvestigatingComputationalHardnessThroughRealistic Boolean SAT DistributionsRealisticBooleanSATDistributions

ThomasLiu&SoumyaKambhampatiPeggyPayneAcademyMcClintockHighSchool

Investigating Computational Hardness through Realistic Boolean SAT Distributions

Abstract: A fundamental question in Computer Science is understanding when a specific class

of problems go from being computationally easy to hard. Because of its generality and

applications, the problem of Boolean Satisfiability (aka SAT) is often used as a vehicle for these

studies. A signal result from these studies is that the hardness of SAT problems exhibits a

dramatic easy-to-hard phase transition with respect to the problem constrainedness. Past studies

have however focused mostly on SAT instances generated using uniform random distributions,

where all constraints are independently generated, and the problem variables are all considered

of equal importance. These assumptions are unfortunately not satisfied by most real problem.

Our project aims for a deeper understanding of hardness of SAT problems that arise in practice.

We study two key questions: (i) How does easy-to-hard transition change with more realistic

distributions that capture neighborhood sensitivity and rich-get-richer aspects of real problems

and (ii) Whether the results on random SAT distributions shed light on the hardness of specific

problems such as Sudoku. Our results, based on extensive empirical studies with our own

instrumented SAT solver, provide important insights into the practical implications of phase

transition behavior on SAT problems.

Investigating Computational Hardness through Realistic Boolean SAT Distributions

Executive Summary: In this project, we aimed to understand how real-world computational

tasks go from being easy to being hard in terms of the time taken to solve them. We used the

problem of Boolean Satisfiability (SAT) as the focus of our study because many other problems

can be translated into SAT problems. SAT problems involve assigning values to a set of Boolean

(True/False) decision variables while satisfying all the given constraints. They have applications

ranging from scheduling software to decision making for robots.

Prior research has shown that random SAT problems exhibit a dramatic easy-to-hard

transition in problem hardness as their constrainedness (number of constraints relative to number

of decision variables) is varied. The main focus past work has however been on uniform random

distributions, in which clauses are generated independently of other clauses. Our project aimed

for a deeper understanding of hardness of SAT problems that arise in practice. We studied two

key questions: (i) How does easy-to-hard transition change with more realistic distributions that

capture neighborhood sensitivity and rich-get-richer aspects of real problems and (ii) Whether

the results on random SAT distributions shed light on the hardness of specific problems such as

Sudoku. Our results, backed by extensive empirical studies with our own instrumented SAT

solver, shed considerable light on these questions. With the power law, the centralization of the

problem around a few key variables that occurred a lot made the problem less hard to solve but

also less satisfiable. With the neighborhood distribution, localization of the problem to several

smaller problems also made it easier to solve, but the unsatisfiability of one sub-problem caused

the entire problem to fail, making it less satisfiable. With Sudoku, we found that harder Sudoku

problems were indeed harder for the SAT solver, but this was not correlated to constrainedness,

indicating that further research is needed on the origin of hardness in Sudoku.

1

Investigating Computational Hardness

through Realistic Boolean SAT

Distributions

Introduction

A fundamental question in Computer Science is understanding computational

hardness of problems. Most work on computational hardness characterizes worst case

complexity of problemsthat is, the amount of computational time taken to solve the

problem in the worst case. Worst case complexity is however not always a good indicator

of problem hardness. A class of problems may be hard in the worst case, and still be easy

to solve in most practical cases. Consequently, researchers are interested in finding

exactly where the problems go from being easy to hard.

Work on this question has been done in the context of Boolean satisfiability (or

SAT) problem [1-4]. Given a set of Boolean constraints on a set of propositions (aka

variables), the SAT problem is to find a True/False assignment to the variables so that all

the constraints are satisfied. Because SAT models the problem of choice making under

constraints, SAT solvers have become an integral part in many computational tasks,

ranging from scheduling software to decision making for robots. Thus, developing an

understanding of the difficulty of SAT problems in practice is important.

2

SAT problems take exponential time (in the number of variables) to solve in the

worst case, but are, however, easy to solve for most cases in practice. Research in the

early 90s [1] showed that k-SAT problems (SAT problems with a fixed number of

variables, k, in every clause) are easy to solve both when the constrainedness (the ratio

of the number of clauses to the number of variables) is low and when it is high, abruptly

transitioning from easy to hard in a very narrow region of constrainedness. Most of this

phase transition studies were done on SAT instances that follow uniform random

distribution. In such a distribution, variables take part in clauses with uniform probability,

and clauses are independent (uncorrelated).

The assumptions of uniform random distribution are not satisfied when we

consider SAT instances that result from the compilation of real problems. Our project

thus aims for a deeper understanding of the hardness of SAT problems that arise in

practice. In particular, we study these two key questions:

1. How does the phase transition behavior change with more realistic and natural

distributions of SAT problems?

2. Do SAT phase transitions allow us to predict hardness of other real world

problems that can be compiled (translated) into SAT?

In aid of the first, we considered SAT instances generated according to two non-uniform

distributions: (i) power-law or rich-get-richer distribution and (ii) neighborhood-sensitive

distributions. For the second question, we focused on SAT instances that come from

Sudoku problems. We developed our own SAT solver and conducted a large-scale

systematic empirical study on the phase transition behavior of these SAT instances. Our

3

hypothesis is that these studies will shed light on how easy-to-hard problem transition in

real-world problems differs from uniform random distribution.

Background

A Boolean Satisfiability Problem (SAT) is the problem of determining if the

variables in a Boolean formula can be set to True/False values so as to make the entire

Boolean formula return true. If this is possible, the problem is called satisfiable. If not,

the problem is unsatisfiable. For example, given a Boolean formula (P AND (P IMPLIES

Q)), the assignment P=True, Q=True is a satisfying assignment. Given n variables, there

are 2n

possible assignments, and in the worst case, a solver has to check through each

one of them. Indeed, SAT problems are known to be NP-Complete [1].

CNF Clauses: The constraints in a general Boolean satisfiability problem can be any

arbitrary propositional logic formula (involving negation, conjunction, disjunction and

implication). CNF, or Conjunctive Normal Form, is a simpler (normal) form where each

constraint (also called clause) is a disjunction of normal or negated variables (aka

literals). Any Boolean formula can be easily converted to an equivalent set of CNF

clauses [3]. For example, the constraint (P IMPLIES Q) can be converted to the clause

(~P OR Q) (where ~stands for negation). The length of the clause is the number of

variables in it.

k-SAT: A general SAT problem, converted to CNF form, might normally result in

clauses of varying lengths. A k-SAT problem instance is one where all the clauses are

exactly length k. Interestingly, while 1-SAT and 2-SAT are both easy to solve (worst-

case polynomial), from 3-SAT onwards, the worst case complexity becomes exponential.

4

Thus scientists often study 3-SAT problems as a stand-in for the general Boolean

satisfiability problems [4].

Phase Transition in SAT: While SAT problems are worst case exponential, it has long

been known that most random SAT instances are easy to solve. This brings up the

question: where are the hard SAT problems? Researchers have tried to answer this

question by experimenting with randomly generated SAT instances [1]. They have found

that SAT problems instances can be characterized by their constrainedness () which is

the ratio of the number of clauses (constraints) to the number of variables. Hard

problems, or problems that take SAT solvers the maximum effort to solve, occur in a