survey propagation revisited - cornell universitykroc/pub/surveypropuai07b.pdfsurvey propagation...

Survey Propagation Revisited

Lukas Kroc Ashish Sabharwal Bart Selman

Department of Computer Science, Cornell University, Ithaca, NY 14853-7501, U.S.A.∗

{kroc,sabhar,selman}@cs.cornell.edu

Abstract

Survey propagation (SP) is an exciting newtechnique that has been remarkably success-ful at solving very large hard combinatorialproblems, such as determining the satisfia-bility of Boolean formulas. In a promisingattempt at understanding the success of SP,it was recently shown that SP can be viewedas a form of belief propagation, computingmarginal probabilities over certain objectscalled covers of a formula. This explana-tion was, however, shortly dismissed by ex-periments suggesting that non-trivial coverssimply do not exist for large formulas. Inthis paper, we show that these experimentswere misleading: not only do covers exist forlarge hard random formulas, SP is surpris-ingly accurate at computing marginals overthese covers despite the existence of manycycles in the formulas. This re-opens a po-tentially simpler line of reasoning for under-standing SP, in contrast to some alternativelines of explanation that have been proposedassuming covers do not exist.

1 INTRODUCTION

Survey Propagation (SP) is a new exciting algorithmfor solving hard combinatorial problems. It was dis-covered by Mezard, Parisi, and Zecchina (2002), andis so far the only known method successful at solvingrandom Boolean satisfiability (SAT) problems with 1million variables and beyond in near-linear time in thehardest region. The SP method is quite radical in thatit tries to approximate certain marginal probabilitiesrelated to the set of satisfying assignments. It theniteratively assigns values to variables with the most

∗Research supported by Intelligent Info. Systems Instt.(IISI), Cornell Univ., AFOSR grant FA9550-04-1-0151.

extreme probabilities. In effect, the algorithm be-haves like the usual backtrack search methods for SAT(DPLL-based), which also assign variable values incre-mentally in an attempt to find a satisfying assignment.However, quite surprisingly, SP almost never has tobacktrack. In other words, the “heuristic guidance”from SP is almost always correct. Note that, interest-ingly, computing marginals on satisfying assignmentsis actually believed to be much harder than findinga single satisfying assignment (#P-complete vs. NP-complete). Nonetheless, SP is able to efficiently ap-proximate certain marginals and uses this informationto successfully find a satisfying assignment.

SP was derived from rather complex statistical physicsmethods, specifically, the so-called cavity method de-veloped for the study of spin glasses. Close connectionsto belief propagation (BP) methods were subsequentlydiscovered. In particular, it was discovered by Braun-stein and Zecchina (2004) (later extended by Maneva,Mossel, and Wainwright (2005)) that SP equations areequivalent to BP equations for obtaining marginalsover a special class of combinatorial objects, calledcovers. Intuitively, a cover provides a representativegeneralization of a cluster of satisfying assignments.The discovery of a close connection between SP andBP via the use of covers laid an exciting foundationfor explaining the success of SP. Unfortunately, subse-quent experimental evidence suggested that hard ran-dom 3-SAT formulas have, with high probability, onlyone (trivial) cover (Maneva et al., 2005). This wouldleave all variables effectively in an undecided state, andwould mean that marginals on covers cannot provideany useful information on how to set variables. SinceSP clearly sets variables in a non-trivial manner, itwas conjectured that there must be another explana-tion for the good behavior of SP; in particular, onethat is not based on the use of marginal probabilitiesof variables in the covers.

In this paper, we revisit the claim that hard random 3-SAT formulas do not have interesting non-trivial cov-

ers. In fact, we show that such formulas have largenumbers of non-trivial covers. The main contributionof the paper is the first clear empirical evidence show-ing that in random 3-SAT problems near the satisfi-ability and hardness threshold, (1) a significant num-ber of non-trivial covers exist; (2) SP is remarkablygood at computing variable marginals based on cov-ers; and (3) these cover marginals closely relate to so-lution marginals at least in the extreme values, whereit matters the most for survey inspired decimation. Asa consequence, we strongly suspect that explaining SPin terms of covers may be the correct path after all.

Note that (2) above is quite surprising for random3-SAT formulas because such formulas have manyloops. The known formal proof that SP computescover marginals only applies to tree-structured formu-las, which in fact have only a single (trivial) cover.Further, it’s amazing that while SP computes suchmarginals in a fraction of a second, the next best meth-ods of computing these marginals that we know of (viaexact enumeration, or sampling followed by “peeling”)require over 100 CPU hours.

Our experiments also indicate that cover marginalsare more “conservative” than solution marginals in thesense that variables that are extreme with respect tocover marginals are almost certainly also extreme withrespect to solution marginals, but not vice versa. Thissheds light on why it is safe to set variables with ex-treme cover marginals in an iterative manner, as isdone in the survey inspired decimation process for find-ing a solution using the marginals computed by SP.

In addition to these empirical results, we also revisitthe derivation of the SP equations themselves, with thegoal of presenting the derivation in an insightful formpurely within the realm of combinatorial constraintsatisfaction problems (CSPs). We describe how onecan reformulate in a natural step-by-step manner theproblem of finding a satisfying assignment into one offinding a cover, by considering related factor graphson larger state spaces. The BP equations for this re-formulated problem are exactly the SP equations forthe original problem, as shown in the Appendix.

2 COVERS OF CNF FORMULAS

We start by introducing the notation and the ba-sic concepts that we use throughout the paper. Weare concerned with Boolean formulas in ConjunctiveNormal Form or CNF, that is, formulas of the formF ≡ (l11 ∨ . . . ∨ l1k1) ∧ . . . ∧ (lm1 ∨ . . . ∨ lmkm), whereeach lik (called a literal) is a Boolean variable xj orits negation ¬xj . Each conjunct of F , which itself isa disjunction of literals, is called a clause. In 3-CNFor 3-SAT formulas, every clause has 3 literals. Ran-

dom 3-SAT formulas over n variables are generated byuniformly randomly choosing a pre-specified numberof clauses over these n variables. The Boolean satis-fiability problem is the following: Given a CNF for-mula F over n variables, find a truth assignment σ forthe variables such that every clause in F evaluates totrue; σ is called a satisfying assignment or a solutionof F . We identify true with 1 and false with 0.

A truth assignment to n variables can be viewed as astring of length n over the alphabet {0, 1}, and extend-ing this alphabet to include a third letter “∗” leads toa generalized assignment. A variable with the value ∗can be interpreted as being “undecided,” while vari-ables with values 0 or 1 can be interpreted as being“decided” on what they want to be. We will be inter-ested in certain generalized assignments called covers.Our formal definition of covers follows the one given byAchlioptas and Ricci-Tersenghi (2006). Let variable xbe called a supported variable under a generalized as-signment σ if there is a clause C such that x is theonly variable that satisfies C and all other literals ofC are false. Otherwise, x is called unsupported.

Definition 1. A generalized assignment σ ∈ {0, 1, ∗}n

is a cover of a CNF formula F iff

1. every clause of F has at least one satisfying literalor at least two literals with value ∗ under σ, and

2. σ has no unsupported variables assigned 0 or 1.

The first condition ensures that each clause of F iseither already satisfied by σ or has enough undecidedvariables to not cause any undecided variable to beforced to decide on a value (no “unit propagation”).The second condition says that each variable that isassigned 0 or 1 is set that way for a reason: thereexists a clause that relies on this setting in order tobe satisfied. For example, consider the formula F ≡(x ∨ ¬y ∨ ¬z) ∧ (¬x ∨ y ∨ ¬z) ∧ (¬x ∨ ¬y ∨ z). F hasexactly two covers: 111 and ∗ ∗ ∗. This can be verifiedby observing that whenever some variable is 0 or ∗,then all non-∗ variables are unsupported. Notice thatthe string of all ∗’s always satisfies the conditions inDefinition 1; we refer to this string as the trivial cover.

Covers were introduced by Maneva et al. (2005) asa useful concept to analyze the behavior of SP, buttheir combinatorial properties are much less knownthan those of solutions. A cover can be thought of asa partial assignment to variables, where the variablesassigned ∗ are considered unspecified. In this sense,each cover is a representative of a potentially large setof complete truth assignments, satisfying as well as notsatisfying. This motivates further differentiation:

Definition 2. A cover σ ∈ {0, 1, ∗}n of F is a truecover iff there exists a satisfying assignment τ ∈{0, 1}n of F such that σ and τ agree on all values where

σ is not a ∗, i.e., ∀i ∈ {1, . . . , n}(σi 6= ∗ =⇒ σi = τi).Otherwise, σ is a false cover.

A true cover thus generalizes at least one satisfyingassignment. True covers are interesting to study whentrying to satisfy a formula, because if there exists atrue cover with variable x assigned 0 or 1, then theremust also exist a satisfying assignment with the samesetting of x.

One can construct a true cover σ ∈ {0, 1, ∗}n of F bystarting with any satisfying assignment τ ∈ {0, 1}n ofF and generalizing it using a simple procedure called∗-propagation.1 The procedure starts by initiallysetting σ = τ . It then repeatedly chooses an arbitraryvariable unsupported under σ and turns it into a ∗,until there are no more unsupported variables. The re-sulting string σ is a true cover, which can be verified asfollows. The satisfying assignment τ already satisfiesthe first condition in Definition 1, and ∗-propagationdoes not destroy this property. In particular, a vari-able on which some clause relies is never turned into a∗. The second condition in Definition 1 is also clearlysatisfied when ∗-propagation halts, so that σ must bea cover. Moreover, since σ generalizes τ , it is a truecover. Note that ∗-propagation can, in principle, beapplied to an arbitrary generalized assignment. How-ever, unless we start with one that satisfies the firstcondition in the cover definition, ∗-propagation maynot lead to a cover.

We end with a discussion of two insightful propertiesof covers. The first relates to “self-reducibility” andthe second to covers for tree-structured formulas.

No self-reducibility. Consider the relation betweenthe decision and search versions of the problem of find-ing a solution of a CNF formula F . In the decision ver-sion, one needs an algorithm that determines whetheror not F has a solution, while in the search version,one needs an algorithm that explicitly finds a solu-tion. The problem of finding a solution for F is self-reducible, i.e., given an oracle for the decision version,one can efficiently solve the search version by itera-tively fixing variables to 1 or 0, testing whether thereis still a solution, and continuing in this way. Some-what surprisingly, this strategy does not work for theproblem of finding a cover. In other words, an oraclefor the decision version of this problem does not im-mediately provide an efficient algorithm for finding acover. (The lack of self-reducibility makes it very hardto find covers as we will see below.) As a concreteexample, consider the formula F described right afterDefinition 1. To construct a cover of F , we could ask

1This was introduced under different names as the peel-ing procedure or coarsening, e.g., by Maneva et al. (2005).

whether there exists a cover with x set to 1. Since111 is a cover (yet unknown to us), the decision oraclewould say yes. We could then fix x to 1, simplify theformula to (y∨¬z)∧(¬y∧z), and ask whether there isa cover with y set to 0. This residual formula indeedhas 00 as a cover, and the oracle would say yes. Withone more query, we will end up with 100 as the valuesof x, y, z, which is in fact not a cover of F .

Tree-structured formulas. For tree-structuredformulas without unit clauses, i.e., formulas whose fac-tor graph does not have a cycle, the only cover is thetrivial all-∗ cover. We argue this using the connec-tion between covers and SP shown by Braunstein andZecchina (2004), which says that when generalized as-signments have a uniform prior, SP on a tree formula Fprovably computes probability marginals of variablesbeing 0, 1, and ∗ in covers of F . Moreover, it can beverified from the iterative equations for SP that withno unit clauses, zero marginals for any variable being0 or 1, and full marginals for any variable being a ∗ isa fixed point of SP. Since SP provably has exactly onefixed point on tree formulas, it follows that the onlycover of such formulas is the trivial all-∗ cover.

3 PROBLEM REFORMULATION:

FROM SOLUTIONS TO COVERS

We now show that the concept of covers can be quitenaturally arrived at when trying to find solutions ofa CNF formula, thus motivating the study of coversfrom a purely generative perspective. Starting witha CNF formula F , we describe how F is transformedstep-by-step into the problem of finding covers of F ,motivating each step.

Although our discussion applies to any CNF formulaF , we will be using the following example formula with3 variables and 4 clauses to illustrate the steps:

(x ∨ y ∨ ¬z)︸︷︷︸

a

∧ (¬x ∨ y)︸︷︷︸

b

∧ (¬y ∨ z)︸︷︷︸

c

∧ (x ∨ ¬z)︸︷︷︸

d

Let N denote the number of variables, M the numberof clauses, and L the number of literals of F .

Original problem. The problem is to find an as-signment in the space {0, 1}

Nthat satisfies F . The fac-

tor graph for F has N variable nodes and M functionnodes, corresponding directly to the variables x, y, . . .and clauses a, b, . . . in F (see e.g. Kschischang et al.(2001)). The factor graph for the example formulais depicted below. Here factors Fa, Fb, . . . representpredicates ensuring that the corresponding clause hasat least one satisfying literal.

FdFcFbFa

x y z

Variable occurrences. The first step in the trans-formation is to start treating every variable occurrencexa, xb, ya, yb, . . . in F as a separate unit that can be ei-ther 0 or 1. This allows for more flexibility in the pro-cess of finding a solution, since a variable can decidewhat value to assume in each clause separately. Ofcourse, we need to add constraints to ensure that theoccurrence values are eventually consistent: for everyvariable x in F , we add a constraint Fx that all occur-rences of x have the same value. Now the search spaceis {0, 1}L, and the corresponding factor graph containsL variable nodes and M +N function nodes (the orig-inal clause factors Fa, Fb, . . . and the new constraintsFx, Fy, . . .).

Fx Fa Fb Fy Fc Fd Fz

zdzczaycybyaxdxbxa

At this point, we have not relaxed solutions to theoriginal problem F : solutions to the modified problemcorrespond precisely to the original solutions, becausevariable occurrences are forced to be consistent. How-ever, we moved this consistency check from the syn-tactic level (variables could not be inconsistent simplyby the problem definition) to the semantic level (wehave special constraints to guarantee consistency).

Relaxing assignments. The next step is to relaxthe problem by allowing variable nodes to assume thespecial value “∗”. The semantics of ∗ is “undecided,”meaning that the variable node is set neither to 0nor to 1. The new search space is {0, 1, ∗}

L, and we

must specify how our constraints handle the value ∗.Variable constraints Fx, . . . have the same meaning asbefore, namely, all variable nodes xa, xb, . . . have thesame value for every variable x. Clause constraintsFa, . . . now have a modified meaning: a clause is sat-isfied if it contains at least one satisfying literal or atleast two literals with the value ∗. The motivation hereis to either satisfy a clause or leave enough “freedom”in the form of at least two undecided variables. (Asingle undecided variable would be forced to take on aparticular value if all other literals in the clause werefalsified.) With this transformation, the factor graphremains structurally the same, while the set of possible

values for variable nodes changes.

The solutions to this modified problem do not neces-sarily correspond directly to solutions of the originalone. In particular, if there are no unit clauses and allvariables are set to ∗, the problem is already “solved”without providing any useful information.

Reducing freedom of choice. To distinguish vari-ables that could assume the value ∗ from those thattruly need to be fixed to either 0 or 1, we require thatevery non-∗ variable has a clause that needs the vari-able to be 0 or 1 in order to be satisfied. The searchspace does not change, but we need to add constraintsto implement the reduction in the freedom of choice.

Notice that this requirement is equivalent to “no un-supported variables” in the definition of a cover, andthat the first requirement in that definition is ful-filled by the clause constraints. Therefore, we are nowsearching for covers of F . A natural way to representthe “no unsupported variable” constraint in the fac-tor graph is to add for each variable x a new functionnode F ′x, connected to the variable nodes for x as wellas for all other variables sharing a clause with x. This,of course, creates many new links and introduces ad-ditional short cycles, even if the original factor graphwas acyclic. The following transformation step allevi-ates this issue.

Reinterpreting variable nodes. As the final step,we change the semantics of the variable nodes’ val-ues and of the constraints so that the “no unsup-ported variable” condition can be enforced without ad-ditional function nodes. The reasoning is that the sim-ple {0, 1, ∗} domain creates a bottleneck for how muchinformation can be communicated between nodes inthe factor graph. By altering the semantics of thevariable nodes’ values, we can improve on this.

The new value of a variable node xa will be a pair(ra→x, wx→a) ∈ {(0, 0), (0, 1), (1, 0)}, so that the sizeof the search space is still 3L. We interpret the valuera→x as a request from clause a to variable x with themeaning that a relies on x to satisfy it, and the valuewx→a as a warning from variable x to clause a that x isset such that it does not satisfy a. The values 1 and 0indicate presence and absence, resp., of the request orwarning. We can recover the original {0, 1, ∗} valuesfrom these new values as follows: if ra→x = 1 for somea, then x is set to satisfy clause a; if there is no requestfrom any clause where x appears, then x is undecided(a value of ∗ in the previous interpretation). The vari-able constraints Fx, . . . not only ensure consistency ofthe values of xa, xb, . . . as before, but also ensure thesecond cover condition as described below. The clauseconstraints Fa, . . . remain unchanged.

The variable constraint Fx is a predicate ensuring thatthe following two conditions are met:

1. if ra→x = 1 for any clause a where x appears,then wx→b = 0 for all clauses b where x appearswith the same sign as in a, and wx→b = 1 for allb where x appears with the opposite sign. Since xmust be set to satisfy a, this ensures that clausesthat are unsatisfied by x do receive a warning.

2. if ra→x = 0 for all clauses a where x appears, thenwx→a = 0 for all of them, i.e., no clause receivesa warning from x.

To evaluate Fx, values (ra→x, wx→a) are needed onlyfor clauses a in which x appears, which is exactly theset of variable nodes the factor Fx is connected to. No-tice that the case (ra→x, wx→a) = (1, 1) cannot happendue to condition 1 above. The conditions also implythat the variable occurrences of x are consistent, and inparticular that two clauses where x appears with oppo-site signs (say a and b) cannot simultaneously requestto be satisfied by x. This is because either ra→x = 0or rb→x = 0 must hold due to condition 1.

The clause constraint Fa is a predicate stating thatclause a issues a request to its variable x if and only if itreceives warnings from all its other variables: ra→x = 1iff wy→a = 1 for all variables y 6= x in a. Again, Facan be evaluated using exactly values from the variablenodes it is connected to.

When clause a issues a request to variable x (i.e.,ra→x = 1), x must be set to satisfy a, thus providing asatisfying literal for a. If a does not issue any request,then according to the condition of Fa, at least two ofa’s variables, say x and y, must not have sent a warn-ing. In this case, Fx and Fy state that each of x andy is either undecided or satisfies a. Thus the first con-dition in the cover definition holds in any solution ofthis new constraint satisfaction problem. The secondcondition also holds, because every variable x that isnot undecided must have received a request from someclause a, so that x is the only literal in a that is notfalse. Therefore x is supported.

Let us denote this final constraint satisfaction problemby P (F ). (It is a function of the original formula F .)Notice that the factor graph of P (F ) has the sametopology as the factor graph of F . In particular, ifF has a tree factor graph, so does P (F ). Further, bythe construction of P (F ) described above, its solutionscorrespond precisely to the covers of F .

3.1 INFERENCE OVER COVERS

This section discusses an approach for solving theproblem P (F ) with probabilistic inference using beliefpropagation (BP). It arrives at the survey propagation

equations for F by applying BP equations to P (F ).

Since the factor graph of P (F ) can be easily viewedas a Bayesian Network (cf. Pearl, 1988), one can com-pute marginal probabilities over the set of satisfyingassignments of the problem, defined as

Pr[xa = v | all constraints of P (F ) are satisfied]

for each variable node xa and v ∈ {(0, 0), (0, 1), (1, 0)}.The probability space here is over all assignments tovariable nodes with uniform prior.

Once these solution marginals are known, we knowwhich variables are most likely to assume a particularvalue, and setting these variables simplifies the prob-lem. A new set of marginals can be computed on thissimplified formula, and the whole process repeated.This method of searching for a satisfying assignmentis called the decimation procedure. The problem,of course, is to compute the marginals (which, in gen-eral, is much harder than finding a satisfying assign-ment). One possibility for computing marginals is touse the belief propagation algorithm (cf. Pearl, 1988).Although provably correct essentially only for formulaswith a tree factor graph, BP provides a good approxi-mation of the true marginals in many problem domainsin practice (Murphy et al., 1999). Moreover, as shownby Maneva et al. (2005), applying the BP algorithm tothe problem of searching for covers of F results in theSP algorithm. Thus, on formulas with a tree factorgraph, the SP algorithm provably computes marginalprobabilities over covers of F , which are equivalent tomarginals over satisfying assignments of P (F ). Whenthe formula contains loops, SP computes a loopy ap-proximation to the cover marginals. Specific details ofthe derivation of SP equations from the problem P (F )are deferred to the Appendix.

4 EXPERIMENTAL RESULTS

This section presents our main contributions. We be-gin by demonstrating that non-trivial covers do ex-ist in large numbers in random 3-SAT formula, andthen explore connections between SP, BP, and vari-able marginals computed from covers as well as so-lutions, showing in particular that SP approximatescover marginals surprisingly well.

4.1 EXISTENCE OF COVERS

Motivated by theoretical results connecting SP to cov-ers of formulas, Maneva et al. (2005) suggested anexperimental study to test whether non-trivial coverseven exist in random 3-SAT formulas. They proposeda seemingly good way to do this (the “peeling experi-ment”), namely, start with a uniformly random satisfy-

ing assignment of a formula F and, while it has unsup-ported variables, ∗-propagate the assignment. Whenthe process terminates, one obtains a (true) cover ofF . Unfortunately, what they observed is that this pro-cess repeatedly hits the trivial all-∗ cover, from whichthey concluded that non-trivial covers most likely donot exist for such formulas. However, it is known thatnear-uniformly sampling solutions of such formulas tostart with is a hard problem in itself and that mostsampling methods obtain solutions in a highly non-uniform manner (Wei et al., 2004). Consequently, onemust be careful in drawing conclusions from relativelyfew and possibly biased samples.

0 1000 2000 3000 4000 5000

020

040

060

080

010

00

Number of stars

Num

ber o

f uns

uppo

rted

varia

bles Solutions leading to the trivial cover

Solutions leading to non−trivial covers

Figure 1: The peeling experiment, showing the evolu-tion of the number of stars as ∗-propagation proceeds.

To understand this issue better, we ran the same peel-ing experiment on a 5000 variable random 3-SAT for-mula at clause-to-variable ratio 4.2 (which is close tothe hardness threshold for random 3-SAT problems),but used SampleSat (Wei et al., 2004) to obtain sam-ples, which is expected to produce fairly uniform sam-ples. Figure 1 shows the evolution of the number ofunsupported variables at each stage as ∗-propagationis performed starting from a solution. Here, thex-axis shows the number of stars, which monotoni-cally increases by ∗-propagation. The y-axis showsthe number of unsupported variables present at eachstage. As one moves from left to right following the∗-propagation process, one hits a cover if the num-ber of unsupported variables drops to zero (so that∗-propagation terminates). The two curves in the plotcorrespond to solutions that ∗-propagated to the triv-ial cover and those that did not. In our experiment,out of 500 satisfying assignments used, nearly 74% ledto the trivial cover; their average is represented by thetop curve. The remaining 26% of the sampled solu-tions actually led to non-trivial covers; their averageis represented by the bottom curve. Thus, when so-lutions are sampled near-uniformly, a substantial frac-tion of them lead to non-trivial covers.2

2 That this was not observed by Maneva et al. (2005)can be attributed to the fact that SP was used to findsatisfying assignments (Mossel, 2007), resulting in highlynon-uniform samples.

2.0 2.5 3.0 3.5 4.0 4.5

0.0

0.2

0.4

0.6

0.8

1.0

Clause−to−variable ratio

P[no

n−tri

vial

cov

er]

90 vars70 vars50 vars

2.0 2.5 3.0 3.5 4.0 4.5

020

4060

8010

0

Clause−to−variable ratio

Num

ber o

f non

−triv

ial c

over

s

90 vars70 vars50 vars

Figure 2: Non-trivial covers in random formulas. Left:existence probability. Right: average number.

An alternative method of finding covers is to create anew Boolean formula G whose solutions correspond gothe covers of F . It turned out to be extremely hardto solve G to find any non-trivial cover using state-of-the-art SAT solvers for number of variables as low as150. So we confined our experiments to small formu-las, with 50, 70 and 90 variables. We found all coversfor such formulas with varying clause-to-variable ra-tios α. The results are shown in Figure 2, where eachdata point corresponds to statistics obtained from 500formulas. The left pane shows the probability that arandom formula, for a given clause-to-variable ratio,has at least one non-trivial cover (either true or false).The figure shows a nice phase transition where cov-ers appear, at around α = 2.5, which is surprisinglysharp given the small formula sizes. Also, the regionwhere covers surely exist is widening on both sidesas the number of variables increases, supporting theclaim that non-trivial covers exist even in large for-mulas. The right pane of Figure 2 shows the actualnumber of non-trivial covers, with a clear trend thatthe number increases with the size of the formula, forall values of the clause-to-variable ratio. It is worthnoting that the number of covers is very small com-pared to the number of satisfying assignments; e.g.for 90 variables and α = 4.2, the expected number ofsatisfying assignments is 150, 000, while there are only8 covers on average. Somewhat surprisingly, the num-ber of false covers is almost negligible, around 2 at thepeak, and does not seem to be growing nearly as fastas the total number of covers. This might explain whySP, although approximating marginals over all covers,is successful in finding satisfying assignments.

We also consider how the number of solutions that leadto non-trivial covers changes for larger formulas, asthe number of variables N increases from 200 to 4000.The left pane of Figure 3 shows that an estimate ofthis number, in fact, grows exponentially with N . Foreach N , the estimate is obtained by averaging over200 formulas at ratio 4.2 the following quantity: thefraction p(N) of 20,000 sampled solutions that lead toa non-trivial cover, scaled up by the expected number

of solutions for N -variable formulas at this ratio, whichis (2 × (7/8)4.2)N ≈ 1.1414N .3 The resulting number,p(N) × 1.1414N , is plotted on the y-axis of the leftpane, with N on the x-axis. The right pane of Figure 3shows the data used to estimate p(N) along with itsfit on the y-axis, with N on the x-axis again.

200 500 1000 2000

1e+1

21e

+56

1e+1

44

Number of Vars. (log scale)

E[#s

ols w

/ non

tr. c

over

](lo

g sc

ale)

1000 2000 3000 4000

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Number of Variables

P[no

n tri

vial

cov

er]

Figure 3: Left: Expected number of solutions leadingto non-trivial covers (log-log scale). Right: Probabilityof a solution leading to a non-trivial cover.

Notice that the left pane is in log-scale for both axes,and clearly increases faster than a linear function.Thus, the estimated number of solutions that lead tonon-trivial covers grows super-polynomially. In fact,performing a best fit for this curve suggests that thisnumber grows exponentially, roughly as 1.1407N . Thisnumber is indeed a vanishingly small fraction of the ex-pected number of solutions (1.1414N ) as observed byManeva et al. (2005), but nonetheless exponentiallyincreasing. The existence of covers for random 3-SATalso aligns with what Achlioptas and Ricci-Tersenghi(2006) recently proved for k-SAT with k ≥ 9.

4.2 SP, BP, AND MARGINALS

We now study the behavior of SP and BP on a ran-dom formula in relation to solutions and covers ofthat formula. While theoretical work has shown thatSP, viewed as BP on a related combinatorial problem,provably computes cover marginals on tree-structuredformulas, we demonstrate that even on random 3-SATinstances, which are far from tree-like, SP approxi-mates cover marginals surprisingly well. We also showthat cover marginals, especially in the extreme range,are closely related to solution marginals in an intrigu-ing “conservative” fashion. The combination of thesetwo effects, we believe, plays a crucial role in the suc-

cess of SP. Our experiments also reveal that BP per-forms poorly at computing any marginals of interest.

Given marginal probabilities, we define the magneti-zation of a variable to be the difference between the

3 The version of the paper published in UAI-07 incor-rectly stated, as pointed out by Lenka Zdeborova, that thenumber of solutions of such formulas is known to be highlyconcentrated around its expectation.

marginals of the variable being positive and it beingnegative. For the rest of our experiments, we startwith a random 3-SAT formula F with 5000 variablesand 21000 clauses (clause-to-variable ratio of 4.2), andplot the magnetization of the variables of F in therange [−1,+1].4 The marginals for magnetization areobtained from four different sources, which are com-pared and contrasted against each other: (1) by run-ning SP on F till the iterations converge; (2) by run-ning BP on F but terminating it after 10,000 itera-tions because the equations do not converge; (3) bysampling solutions of F using SampleSat and comput-ing an estimate of the positive and negative marginalsfrom the sampled solutions (the solution marginals);and (4) by sampling solutions of F using SampleSat,∗-propagating them to covers, and computing an esti-mate of the positive and negative marginals from thesecovers (the cover marginals). Note that in (4), we aresampling true covers and obtaining an estimate. Analternative approach is to use SP itself on F to try tosample covers of F , but the issue here is that the prob-lem of finding (non-trivial) covers is not self-reducibleto the decision problem of whether covers exist, asshown in Section 2. Therefore, it is not clear whetherSP can be used to actually find a cover, despite it ap-proximating the cover marginals very well.

Recall that the SP-based decimation process works byidentifying variables with extreme magnetization, fix-ing them, and iterating. We will therefore be inter-ested mostly in what happens in the extreme magne-tization regions in these plots, namely, the lower leftcorner (−1,−1) and the upper right corner (+1,+1).

In the left pane of Figure 4 we plot the magnetizationcomputed by SP on the x-axis and the magnetizationobtained from cover marginals on the y-axis. The scat-ter plot has exactly 5000 data points, with one pointfor each variable of the formula F . If the magneti-zations on the two axes matched perfectly, all pointswould fall on a single diagonal line from the bottom-left corner to the top-right corner. The plot shows thatSP is highly accurate at computing cover marginals, es-

pecially in the extreme regions at the bottom-left andtop-right.

The middle pane of Figure 4 compares the magneti-zation based on cover marginals with the magnetiza-tion based on solutions marginals. This will providean intuition for why it might be better to follow covermarginals rather than solution marginals when lookingfor a satisfying assignment.5 We see an interesting “s-

4 For clarity, the plots show magnetizations for one suchformula, although the trend is generic.

5 Of course, if solution marginals could be computedperfectly, this would not be an issue. In practice, how-ever, the best we can hope is to approximately estimate

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0

SP Magnetization

Cove

r Mag

netiz

atio

n

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0

Cover Magnetization

Solu

tion

Mag

netiz

atio

n

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0

BP Magnetization

Solu

tion

Mag

netiz

atio

n

Figure 4: Magnetization plots. Left: SP vs. covers. Middle: covers vs. solutions. Right: BP vs. solutions.

shape” in this plot, which can be interpreted as follows:fixing variables with extreme cover magnetizations ismore conservative compared to fixing variables withextreme solution magnetizations. Which means thatvariables that are extreme w.r.t. cover-based magne-tization are also extreme w.r.t. solution-based magne-tization (but not necessarily vice-versa). Recall thatthe extreme region is exactly where decimation-basedalgorithms, that often fix a small set of extreme vari-ables per iteration, need to be correct. Thus, etimatesof cover marginals provide a safer heuristic for fixingvariables than estimates of solution marginals.

As a comparison with BP, the right pane of Figure 4shows BP magnetization vs. magnetization based onsolution marginals for the same 5000 variable, 21000clause formula. Since BP almost never converges onsuch formulas, we terminated BP after 10,000 itera-tions (SP took roughly 50 iterations to converge) andused the partially converged marginals obtained so farfor computing magnetization. The plot shows that BPprovides very poor estimates for the magnetizationsbased on solution marginals. (The points are equallyscattered when BP magnetization is plotted againstcover magnetization.) In fact, BP appears to identifyas extreme many variables that have the opposite so-lution magnetization. Thus, when magnetization ob-tained from BP is used as a heuristic for identifyingvariables to fix, mistakes are often made that eventu-ally lead to a contradiction, i.e. unsatisfiable reducedformula.

5 DISCUSSION

A comparison between left and right panes of Figure 4suggests that approximating statistics over covers (asdone by SP) is much more accurate than approximat-ing statistics over solutions (as done by BP). Thisappears to be because covers are much more coarse

marginals.

grained than solutions; indeed, even an exponentiallylarge cluster of solutions will have only a single coveras its representative. This cover still captures criticalproperties of the cluster necessary for finding solutions,such as backbone variables, which is what SP appearsto exploit.

We also saw that the extreme magnetization based oncover marginals is more conservative than that basedon solution marginals (as seen in the “s-shape” ofthe plot in the middle pane of Figure 4). This sug-gests that while SP, based on approximating covermarginals, may miss some variables with extreme mag-netization, when it does find a variable to have ex-treme magnetization, it is quite likely to be correct.This provides an intuitive explanation of why the dec-imation process based on extreme SP magnetizationsucceeds with high probability on random 3-SAT prob-lems without having to backtrack, while the decima-tion process based on BP magnetizations more oftenfails to find a satisfying assignment in practice.

We also note that BP and SP have been proven to com-pute exact marginals on solutions and covers, respec-tively, only for tree-structured formulas (with somesimple exceptional cases like formulas with a single cy-cle). For BP, solution marginals on tree formulas arealready non-trivial, and it is reasonable to expect it tocompute a fair approximation of marginals on loopynetworks (formulas). However, for SP, cover marginalson tree formulas are trivial: the only cover here is theall-∗ cover. Cover marginals become interesting onlywhen one goes to loopy formulas, such as random 3-SAT. In this case, as seen in our experiments, it is re-markable that the SP computes a good approximation

of non-trivial cover marginals for non-tree formulas.

We hope that our results have convincingly demon-strated that the study of the covers of formulas is veryfruitful and may well lead to a correct explanation ofthe success of SP.

References

D. Achlioptas and F. Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction prob-lems. In 38th STOC, pages 130–139, Seattle, WA, 2006.

A. Braunstein and R. Zecchina. Survey propagation as lo-cal equilibrium equations. J. Stat. Mech., P06007, 2004.URL http://lanl.arXiv.org/cond-mat/0312483.

A. Braunstein, M. Mezard, and R. Zecchina. Survey prop-agation: an algorithm for satisfiability. Random Struc-tures and Algorithms, 27:201, 2005.

F. R. Kschischang, B. J. Frey, and H. A. Loeliger. Fac-tor graphs and the sum-product algorithm. InformationTheory, IEEE Transactions on, 47(2):498–519, 2001.

E. N. Maneva, E. Mossel, and M. J. Wainwright. A newlook at survey propagation and its generalizations. In16th SODA, pages 1089–1098, Vancouver, Canada, 2005.

M. Mezard, G. Parisi, and R. Zecchina. Analytic and Al-gorithmic Solution of Random Satisfiability Problems.Science, 297(5582):812–815, 2002. doi: 10.1126/science.1073287.

E. Mossel. Personal communication, April 2007.

K. Murphy, Y. Weiss, and M. Jordan. Loopy belief prop-agation for approximate inference: An empirical study.In 15th UAI, pages 467–475, Sweden, July 1999.

R. E. Neapolitan. Learning Bayesian Networks. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2004. ISBN0130125342.

J. Pearl. Probabilistic Reasoning in Intelligent Systems:Networks of Plausible Inference. Morgan Kauf., 1988.

R Development Core Team. R: A language and envi-ronment for statistical computing. R Foundation forStatistical Computing, Vienna, Austria, 2005. URLhttp://www.R-project.org. ISBN 3-900051-07-0.

W. Wei, J. Erenrich, and B. Selman. Towards efficientsampling: Exploiting random walk strategies. In 19thAAAI, pages 670–676, San Jose, CA, July 2004.

APPENDIX: DERIVATION OF THESP EQUATIONS

Section 3 shows how to formulate a constraint satisfac-tion problem P (F ) such that its solutions are exactlythe covers of a formula F . Here we proceed to showhow the belief propagation formalism applied to P (F )(as described in Section 3.1) results in the survey prop-agation equations.

Review of BP. We assume familiarity with thegeneral form of BP equations, as used for exampleby Neapolitan (2004) in Theorem 3.2. In short, BPuses messages to communicate information betweennodes of the factor graph (between variable nodesxa, . . . and function nodes Fx, Fa, . . .). Each messageis a function of one argument, which takes on thesame values as the variable node end-point of the mes-sage. There are two kinds of messages: from vari-able nodes to function nodes (denoted by πx→F (.)),and from function nodes to variable nodes (denotedby λF→x(.)). In a two-level Bayesian Network, π

messages are computed by (piecewise) multiplying to-gether the λ messages received on all other links.The λ messages are more complicated: they are sumsacross all possible worlds (values for arguments of re-ceived π messages) of products of all-but-one π mes-sages with the chosen arguments. In case of a deter-ministic system (which is our case: every world hasprobability of either 1 or 0), this is equivalent to sumof products of π messages with arguments that arecompatible with each other as judged by the corre-sponding function node, Fx or Fa. Moreover, if a vari-able node only has two neighboring function nodes,then it merely passes received λ messages from oneneighbor to the other. Since all variable nodes inP (F ) have degree two, we can safely ignore the ex-istence of π messages and only focus on λ messages.Thus, every Fx node receives messages from Fa nodes(which we will denote by λa→x) and and every Fanode receives messages from Fx nodes (denoted byλx→a), both of which are functions of one argument,(ra→x, wx→a) ∈ {(0, 0), (0, 1), (1, 0)}. The BP equa-tions are constructed by considering the set of com-patible variable node values given the one fixed valuein the argument.

Let C(x) be the set of all clauses containing variablex, and V (a) the set of all variables appearing in clausea. Further, let Csa(x) be the set of all clauses otherthan a where x occurs with the same sign as in a.Similarly define Cua (x) to be the set of clauses wherex occurs with the opposite sign as in a. Note thatCsa(x) ∪ C

ua (x) ∪ {a} = C(x).

Equations for Fx. The equations for messages sentfrom a factor node Fx are given in Figure 5. For theargument value of (1, 0) the set of compatible values,as judged by Fx when xa = (1, 0), is one where therecan be requests from clauses where x appears with thesame sign as in a (and no warnings sent to them), butthere must be no requests from opposite clauses (andwarning must be sent). Similarly for (0, 1), but herethe roles of Csa(x) and C

ua (x) are exchanged, plus the

fact that x sends a warning to a means that it must bereceiving a request from some opposite clause (which isaccounted for by the “−” term). Finally, for the value(0, 0), there are two possibilities: either at least onerequest is received from Csa(x) (the first term in thesum, analogous to the expression for (0, 1)), or thereare no requests at all (the second term in the sum).

Equations for Fa. Figure 6 shows equations formessages sent from a factor node Fa. The argumentvalue of (1, 0) is the easiest: a sends out a request ifand only if all other variables send it a warning, so thatthe only compatible values are all (0, 1). The case of(0, 0) is the complement: a cannot receive a warning

http://lanl.arXiv.org/cond-mat/0312483http://www.R-project.org

λx→a(1, 0) =Y

b∈Csa(x)

(λb→x(0, 0) + λb→x(1, 0))Y

b∈Cua

(x)

λb→x(0, 1)

λx→a(0, 1) =Y

b∈Csa(x)

λb→x(0, 1)

2

4

Y

b∈Cua

(x)

(λb→x(0, 0) + λb→x(1, 0)) −Y

b∈Cua

(x)

λb→x(0, 0)

3

5

λx→a(0, 0) =

2

4

Y

b∈Csa(x)

(λb→x(0, 0) + λb→x(1, 0)) −Y

b∈Csa(x)

λb→x(0, 0)

3

5

Y

b∈Cua

(x)

λb→x(0, 1) +Y

b∈C(x)\a

λb→x(0, 0)

Figure 5: BP equations for Fx

λa→x(1, 0) =Y

y∈V (a)\x

λy→a(0, 1)

λa→x(0, 0) =Y

y∈V (a)\x

(λy→a(0, 0) + λy→a(0, 1)) −Y

y∈V (a)\x

λy→a(0, 1)

λa→x(0, 1) =Y

y∈V (a)\x

(λy→a(0, 0) + λy→a(0, 1)) −Y

y∈V (a)\x

λy→a(0, 1)

+X

y∈V (a)\x

(λy→a(1, 0) − λy→a(0, 0))Y

y′∈V (a)\{x,y}

λy′→a(0, 1)

Figure 6: BP equations for Fa

from all other variables, and since x does not send awarning, a does not send a request anywhere. The lastcase, (0, 1), is a little more complicated. The first partis the same as before, but there needs to be a correc-tion term to account for two extra possibilities: firstit is now possible that a issues a request to some yif all other variables also send a warning (the positiveterm in the sum), and second it is not possible thatall-but-one variable send a a warning and yet a doesnot issue a request to the last one (the negative termin the sum).

Deriving the SP equations. The expressionfor λa→x(0, 1) can be simplified by assuming thatλy→a(1, 0) = λy→a(0, 0) for all y, in which case itreduces to the expression for λa→x(0, 0). This as-sumption is crucial, but not very restrictive. Noticethat it then follows that λx→a(1, 0) = λx→a(0, 0) (byinspecting the appropriate expressions in Figure 5),and therefore the assumption keeps holding when iter-atively solving the BP equations, provided it was trueat the beginning.

The last step in the derivation is to rename and nor-malize the terms appropriately so as to “recognize” theSP equations in Figures 5 and 6. Let us define

ηa→x4=

Y

y∈V (a)\x

λy→a(0, 1)

λy→a(0, 0) + λy→a(0, 1)

and

Πux→a4= λx→a(0, 1)

Π0x→a4=

Y

b∈C(x)\a

λb→x(0, 0)

Πsx→a4= λx→a(0, 0) − Π

0x→a

Notice that ηa→x is just a rescaled λa→x(1, 0), andthat the scaling factor λy→a(0, 0) + λy→a(0, 1) equalsΠuy→a + Π

sy→a + Π

0

y→a. Rescaling λa→x(0, 0) andλa→x(0, 1) in the same way (and using the assumptionthat they are equal) yields λa→x(0, 0) = λa→x(0, 1) =1 − ηa→x. Finally, writing down the BP equations forλx→a(0, 1) and λx→a(0, 0) in terms of these new vari-ables results in the familiar SP equations establishedin Braunstein et al. (2005):

ηa→x =Y

y∈V (a)\x

Πuy→aΠuy→a + Πsy→a + Π0y→a

Πux→a =Y

b∈Csa(x)

(1 − ηb→x)

2

41 −Y

b∈Cua

(x)

(1 − ηb→x)

3

5

Πsx→a =Y

b∈Cua

(x)

(1 − ηb→x)

2

41 −Y

b∈Csa(x)

(1 − ηb→x)

3

5

Π0x→a =Y

b∈C(x)\a

(1 − ηb→x)

In addition, the expressions for marginal probabilitiescomputed by BP from a fixed point of the above equa-tions can be shown, in a similar way, to be equivalentto the SP “bias” expressions.

INTRODUCTIONCOVERS OF CNF FORMULASPROBLEM REFORMULATION: FROM SOLUTIONS TO COVERSINFERENCE OVER COVERS

EXPERIMENTAL RESULTSEXISTENCE OF COVERSSP, BP, AND MARGINALS

DISCUSSION

survey propagation revisited - cornell universitykroc/pub/surveypropuai07b.pdfsurvey propagation...

Documents