
Hierarchic Superposition With Weak Abstraction and the Beagle Theorem Prover

Peter Baumgartner, NICTA and ANU, Canberra

Uwe Waldmann, MPI für Informatik, Saarbrücken

Baumgartner/Waldmann Hierarchic superposition with weak abstraction

Goal

Automated deduction in hierarchic combinations of specifications

Lists over integers

(l ≈ nil) ∨ (l ≈ cons(head(l), tail(l)))
¬(cons(k, l) ≈ nil)
head(cons(k, l)) ≈ k
tail(cons(k, l)) ≈ l

The inRange predicate, e.g. inRange([1,0,5], 6)

inRange(l, n) ↔ (l ≈ nil ∨ (0 ≤ head(l) < n ∧ inRange(tail(l), n)))

Conjecture ∀ l:list n:int (¬(l ≈ nil) → (inRange(l, n) → inRange(cons(head(l), l), n)))

LIA + Lists/Arrays + Hypotheses ⊨ Conjecture ?

LIA + Lists/Arrays + Hypotheses ⊭ Conjecture ?
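The inRange definition and the conjecture can be read as executable checks. The following is a minimal Python sketch, assuming lists are modelled as tuples of integers; the names `in_range`, `conjecture` and `find_counterexample` are hypothetical and not part of Beagle. A bounded search like this can only find counterexamples; it does not establish entailment, which is the prover's job.

```python
from itertools import product

def in_range(l, n):
    """inRange(l, n) ↔ l ≈ nil ∨ (0 ≤ head(l) < n ∧ inRange(tail(l), n))."""
    if not l:                       # l ≈ nil
        return True
    return 0 <= l[0] < n and in_range(l[1:], n)

def conjecture(l, n):
    """¬(l ≈ nil) → (inRange(l, n) → inRange(cons(head(l), l), n))."""
    if not l or not in_range(l, n):
        return True                 # an implication with a false premise holds
    return in_range((l[0],) + l, n)

def find_counterexample(max_len=3, values=range(-1, 4), bounds=range(-1, 5)):
    """Try all lists up to max_len over `values` and all n in `bounds`."""
    for size in range(max_len + 1):
        for l in product(values, repeat=size):
            for n in bounds:
                if not conjecture(l, n):
                    return l, n
    return None

print(in_range((1, 0, 5), 6))
print(find_counterexample())
```

The search bounds are arbitrary; widening them makes the check slower but never turns it into a proof.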


Contents

Semantics

Hierarchic specifications

Sufficient completeness

Hierarchic superposition

Weak abstraction

Two kinds of variables

Definitions

The Beagle theorem prover

Overview

Experimental results

Hierarchic Specifications

Background (BG) specification consists of

sorts, e.g. { int }

operators, e.g. { 0, 1, -1, 2, -2, ..., -, +, >, ≥}

models, e.g. linear integer arithmetic

Foreground (FG) specification extends BG specification by

new sorts, e.g. { list }

new operators, e.g.

{ cons: int × list ↦ list, nil: list, length: list ↦ int, a: list }

first-order clauses, e.g.

{ length(a) ≥ 1, length(cons(x, y)) ≈ length(y)+1 }

Hierarchic Specifications

Assumption

We have a decision procedure for quantified formulas over the BG specification

Goal

Check whether the hierarchic combination has models or not, using the BG decision procedure as a subroutine

Question

What is a model of the hierarchic combination?

Hierarchic Specifications

Models of hierarchic specifications

Must satisfy the FG clauses, and

must leave the interpretation of the BG sorts and operators unchanged (conservative extension):

- distinct BG elements may not be identified (no confusion), and
- no new elements may be added to BG sorts (no junk)

Fundamental problem 1

Absence of junk is not recursively enumerable (r.e.)

⇒ Refutational completeness is only possible in certain cases

⇒ Require sufficient completeness


Sufficient Completeness

In every model of the FG clauses, every ground FG term that has a BG sort must be equal to some BG term

Example

(l ≈ nil) ∨ (l ≈ cons(head(l), tail(l)))
¬(cons(k, l) ≈ nil)
head(cons(k, l)) ≈ k
tail(cons(k, l)) ≈ l

is not sufficiently complete:

take BG domain ℤ ∪ { NaN } and evaluate head(nil) to NaN

Adding head(nil) ≈ 0 and tail(nil) ≈ nil makes it sufficiently complete
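The junk interpretation from the example can be checked on small instances. This is a minimal Python sketch, assuming lists are tuples and the junk element is the string "NaN"; all names are hypothetical. It confirms, on a bounded set of lists, that the four list axioms hold although head(nil) is not an integer, and that the interpretation violates the added clause head(nil) ≈ 0.

```python
from itertools import product

NAN = "NaN"   # junk element added to the BG domain ℤ
NIL = ()

def cons(k, l):
    return (k,) + l

def head(l):
    return l[0] if l else NAN    # head(nil) evaluates to the junk element

def tail(l):
    return l[1:]                 # tail(nil) is nil in this sketch

def list_axioms_hold(max_len=3, values=range(-1, 2)):
    """Check the four list axioms on all lists up to max_len over `values`."""
    lists = [l for n in range(max_len + 1) for l in product(values, repeat=n)]
    for l in lists:
        if not (l == NIL or l == cons(head(l), tail(l))):
            return False
        for k in values:
            if cons(k, l) == NIL:
                return False
            if head(cons(k, l)) != k or tail(cons(k, l)) != l:
                return False
    return True

print(list_axioms_hold())              # axioms hold on the sampled lists
print(isinstance(head(NIL), int))      # head(nil) is not a BG element
print(head(NIL) == 0)                  # the added clause head(nil) ≈ 0 fails here
```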


Hierarchic Specifications

Fundamental Problem 2

We can pass only finite sets of formulas to the BG decision procedure

Second Requirement for Completeness

Compactness: If an infinite set of BG formulas is unsatisfiable, then it has a finite unsatisfiable subset

LIA with global symbolic constants α (parameters) is not compact:

take { α ≥ 0, α ≥ 1, α ≥ 2, ... }

LIA without parameters is compact
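The non-compactness example can be made concrete: every finite subset of { α ≥ 0, α ≥ 1, α ≥ 2, ... } has a satisfying value for α, yet no single integer satisfies all constraints. A minimal Python sketch, with hypothetical helper names, assuming a constraint α ≥ b is represented by its bound b:

```python
def satisfies(alpha, bounds):
    """Does α := alpha satisfy every constraint α ≥ b for b in bounds?"""
    return all(alpha >= b for b in bounds)

def witness(bounds):
    """A value for α satisfying a finite, non-empty set of constraints."""
    return max(bounds)

def violated_bound(alpha):
    """A bound n ≥ 0 from the infinite set that the candidate alpha violates."""
    return max(alpha + 1, 0)

# Every finite prefix {α ≥ 0, ..., α ≥ k} is satisfiable.
for k in range(5):
    prefix = range(k + 1)
    print(k, witness(prefix), satisfies(witness(prefix), prefix))

# Every candidate value for α violates some member of the infinite set.
for alpha in (-3, 0, 7):
    print(alpha, violated_bound(alpha), satisfies(alpha, [violated_bound(alpha)]))
```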

Calculi for Hierarchic Reasoning

If the FG clauses are ground

DPLL(T) + Nelson-Oppen

(Neither sufficient completeness nor compactness poses problems)

If the FG clauses are not ground

DPLL(T) + Nelson-Oppen + instantiation heuristics (CVC4, Z3,...)

Hierarchic superposition [Bachmair Ganzinger Waldmann 1994, Althaus Weidenbach Kruglov 2009, Weidenbach Kruglov 2012]

Model evolution with LIA constraints [B Tinelli 2008, 2011]

Sequent calculus [Rümmer 2008]

Theory instantiation [Korovin 2006]

LASCA [Korovin Voronkov 2007]

Hierarchic superposition with weak abstraction [B Waldmann 2013]

Hierarchic Superposition with Weak Abstraction

Calculus Layout

Input clause set

Weak abstraction

Core calculus: Superposition, Close, Weak abstraction, Simplification, Splitting, Define


(Weak) Abstraction

Unification cannot detect "semantic equality" of BG terms

P(1+2)    ¬P(2+1)

?

- Abstraction extracts BG terms and replaces them by new variables
- FG literals are subject to superposition inferences
- BG clauses are passed to the BG solver, in Close inferences

ARI595=1.p



(Weak) Abstraction

P(1+2)    ¬P(2+1)

Abstraction: P(X) ∨ X≉1+2    ¬P(Y) ∨ Y≉2+1

Sup: X≉1+2 ∨ X≉2+1

Close: □


Weak Abstraction

Only non-variable BG terms that are direct subterms of non-BG terms are abstracted out

Concrete numbers (0, 1, -1, 2, -2, ...) are never abstracted out

Example (α and β are BG constants)

g(1, α, f(1)+(α+1), Z) ≈ β ↝ g(1, X, f(1)+ Y, Z) ≈ β ∨ X ≉ α ∨ Y ≉ (α+1)

Properties (in relation to [BGW 94])

- Extracts fewer terms: less detrimental to unification
- Shorter clauses: preserves unit clauses more often (good for rewriting)
- Always preserves sufficient completeness (see below)
- Inference rules can destroy WA, hence need explicit WA of conclusion
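The abstraction rule stated on this slide can be sketched on term trees. The following Python sketch is an illustration under stated assumptions, not Beagle's implementation: terms are nested tuples `(operator, arg, ...)`, integers are concrete numbers, strings starting with an upper-case letter are variables, `BG_OPS` and `BG_CONSTS` are hypothetical signature sets, and every fresh variable is an abstraction variable named X1, X2, and so on.

```python
from itertools import count

BG_OPS = {"+", "-", "*"}      # assumed BG operators
BG_CONSTS = {"α", "β"}        # assumed BG constants (parameters)

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def is_number(t):
    return isinstance(t, int)

def is_pure_bg(t):
    """True if t is built from BG operators, BG constants, numbers and variables only."""
    if is_number(t) or is_var(t):
        return True
    if isinstance(t, str):
        return t in BG_CONSTS
    op, *args = t
    return op in BG_OPS and all(is_pure_bg(a) for a in args)

def abstract(t, fresh, defs):
    """Replace non-variable, non-number BG terms directly below a non-BG term."""
    if not isinstance(t, tuple) or is_pure_bg(t):
        return t                          # top-level BG terms are left alone
    op, *args = t
    new_args = []
    for a in args:
        if is_pure_bg(a) and not is_var(a) and not is_number(a):
            v = next(fresh)
            defs.append((v, a))           # later emitted as the literal v ≉ a
            new_args.append(v)
        else:
            new_args.append(abstract(a, fresh, defs))
    return (op, *new_args)

def weak_abstraction(lhs, rhs):
    """Weakly abstract the equation lhs ≈ rhs; returns (lhs', rhs', [(var, term)])."""
    fresh = (f"X{i}" for i in count(1))
    defs = []
    return abstract(lhs, fresh, defs), abstract(rhs, fresh, defs), defs

# g(1, α, f(1)+(α+1), Z) ≈ β
lhs = ("g", 1, "α", ("+", ("f", 1), ("+", "α", 1)), "Z")
print(weak_abstraction(lhs, "β"))
```

On the slide's example the sketch replaces α and α+1 by fresh variables and leaves the numbers, the variable Z and the top-level β untouched, which matches g(1, X, f(1)+Y, Z) ≈ β ∨ X ≉ α ∨ Y ≉ (α+1) up to variable names.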


Two Kinds of Variables

Abstraction Variables X, Y, Z

Stand for BG terms

↝ Never unify with non-variable FG terms

Pro: BG terms are always smaller than FG terms

E.g., f(X) ≈ g(Y) is ordered from left to right if f > g

Con1: Subsumption does not work as expected

E.g., P(X) does not subsume P(f(Y))

Con2: Unexpectedly don't get refutations

E.g. f(nil) + 1 ≉ Y + 1

May even destroy sufficient completeness during abstraction

Ordinary variables fix these problems

Two Kinds of Variables

Ordinary Variables x, y, z

Stand for arbitrary BG-sorted terms

↝ May also unify with non-variable FG terms

Con: fewer ordered equations

E.g., f(x) ≈ g(y) is not ordered from left to right even if f > g

Pro1: subsumption works as expected

E.g., P(x) subsumes P(f(y))

Pro2: may get refutations even in the absence of sufficient completeness

E.g. f(nil) + 1 ≉ y + 1

Always preserves sufficient completeness during abstraction (use ordinary variables only if the abstracted term contains ordinary variables)

Two Kinds of Variables

What is the kind of variables in the input problem?

Ordinary variables: { x ≉ f(1) } has a refutation

Abstraction variables: { X ≉ f(1) } does not have a refutation

A: There is a trade-off, see above, so let the user decide.

In practice, most variables are abstraction variables, and ordinary variables are only used in additional lemmas:

Lemmas

Valid BG theory clauses, make BG knowledge available to FG reasoner

E.g. ¬(x<x), x+0 ≈ x, ¬(x<y) ∨ ¬(y<z) ∨ x<z

Used to simplify, e.g., f(1)<f(1), length(nil)+0

Definitions

In general, one cannot make an arbitrary hierarchic specification sufficiently complete by construction

We can, however, prevent a ground BG-sorted FG term t from being interpreted by a junk element:

- introduce a new parameter, i.e., a new BG constant α_t

- add the definition t ≈ α_t

This is a well-known preprocessing technique [Kruglov Weidenbach 12]

However, in hierarchic superposition ground terms can show up in the middle of the saturation process

↝ use introduction of definitions as an inference rule

f(X)>5 ∨ X≉1+2

f(X) ≈ α_{f(1+2)} ∨ X≉1+2


Main Theoretical Results [B Waldmann CADE 2013]

Completeness 1

HSPWA is refutationally complete for compact BG specifications and sufficiently complete input clause sets

Completeness 2

HSPWA is refutationally complete for input clause sets where every BG-sorted term is ground

The Beagle Prover

• Full implementation of the calculus above
– Lemmas, ordinary/abstraction variables, definitions, splitting
– Discount/otter loop, demodulation, subsumption
– LPO/KBO, W/A ratio
– Cautious and aggressive simplification, e.g. 1+(2+a) ≉ 1+x simplifies to 3+a ≉ 1+x
• Front-end for TPTP TFA and SMT-Lib languages
• Background reasoners
– LRA: Fourier/Motzkin, Simplex
– LIA: Cooper QE, Branch and bound
• Written in Scala, easy to install
http://users.cecs.anu.edu.au/~baumgart/systems/beagle/
• Team: PB, A Bauer, J Bax, T Cosgrove
• Companion system: SMTtoTPTP
http://users.cecs.anu.edu.au/~baumgart/systems/smttotptp/
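The simplification example 1+(2+a) ↝ 3+a amounts to collecting the numeric constants of a nested sum. The following Python sketch shows one way to do this on tuple-encoded terms; it is an assumption-laden illustration (binary "+" only, hypothetical function names) and says nothing about how Beagle distinguishes cautious from aggressive simplification.

```python
def flatten_sum(t):
    """Return (numeric constant, list of non-numeric summands) of a nested '+' term."""
    if isinstance(t, int):
        return t, []
    if isinstance(t, tuple) and t[0] == "+":
        c1, r1 = flatten_sum(t[1])
        c2, r2 = flatten_sum(t[2])
        return c1 + c2, r1 + r2
    return 0, [t]                  # any other term is an opaque summand

def simplify_sum(t):
    """Fold all numeric constants of a sum into a single leading constant."""
    const, rest = flatten_sum(t)
    if not rest:
        return const
    result = rest[-1]
    for s in reversed(rest[:-1]):
        result = ("+", s, result)
    return ("+", const, result) if const != 0 else result

print(simplify_sum(("+", 1, ("+", 2, "a"))))        # left side of 1+(2+a) ≉ 1+x
print(simplify_sum(("+", 1, "x")))                   # right side, already simplified
print(simplify_sum(("+", ("length", "nil"), 0)))     # uses the lemma x+0 ≈ x
```

The sketch relies on associativity and commutativity of + in the background theory, so it may reorder summands.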


User Experience

There is a number in [a,...,a+2] that is divisible by 3

$ beagle ARI595=1.p

This is beagle, version 0.7.1 (2/10/2013)

Input formulas
==============
¬((∀ Zᵃ:$int (((a ≤ Zᵃ) ∧ (Zᵃ ≤ (a + 2))) ⇒ p(Zᵃ))) ⇒ (∃ Xᵃ:$int p((3·Xᵃ))))

Clause set signature
====================
Background sorts: {ℚ, ℤ, ℝ}
Foreground sorts: {$i, $o, $tType}
Background operators:
  $greatereq: ℤ × ℤ → $o
  a: ℤ
Foreground operators:
  $true: $o
  p: ℤ → $o
  $false: $o


User Experience

Precedence among foreground operators: p > $true > $false

Clause set
==========
p(Zᵃ) ∨ ¬(Zᵃ ≤ (a + 2)) ∨ ¬(a ≤ Zᵃ)
¬p((3·Xᵃ))

Background sorts used in clause set: ℤ
Used background theory solver: cooper-clauses

Proving...
p(Zᵃ) ∨ ¬(Zᵃ ≤ (2 + a)) ∨ ¬(a ≤ Zᵃ)
¬p((3·Xᵃ))
¬((3·X_13ᵃ) ≤ (2 + a)) ∨ ¬(a ≤ (3·X_13ᵃ))
¬(3|a)
¬(3|(2 + a))
¬(3|(1 + a))

SZS status Theorem for ARI595=1.p

Inference rules
----------------------
...
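The last three derived clauses, ¬(3|a), ¬(3|(2 + a)) and ¬(3|(1 + a)), are jointly unsatisfiable over the integers, which is what closes the proof. A small Python sketch, with hypothetical function names, checks this by case analysis on a mod 3 and also tests the original statement on a range of values:

```python
def some_multiple_of_three(a):
    """Is one of a, a+1, a+2 divisible by 3?"""
    return any((a + d) % 3 == 0 for d in range(3))

def final_clauses_satisfiable():
    """Can ¬(3|a), ¬(3|(1+a)), ¬(3|(2+a)) all hold for some residue of a modulo 3?"""
    return any(all((r + d) % 3 != 0 for d in range(3)) for r in range(3))

print(all(some_multiple_of_three(a) for a in range(-30, 31)))
print(final_clauses_satisfiable())
```

Divisibility by 3 depends only on the residue of a modulo 3, so the three-case check covers all integers, whereas the range test is only a sanity check.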


Beagle on TPTP

LIA theorems in TPTP: 337

Full Abstraction [BGW94]: 242 proved
Weak Abstraction: 251 proved
WA + Definitions: 297 proved
WA + Definitions + Aggressive Simplification: 303 proved

Two more theorems can only be proved when BG axioms are added, but adding BG axioms is (obviously) a double-edged sword and not generally helpful


Proving Infinite Satisfiability [B Bax LPAR-19]

• Given the LIST axioms over integers
• Suppose a set HYP defining functions/relations on lists
  E.g. length, in, inRange, count, append
• Suppose we know that LIST ∪ HYP is satisfiable (by construction)
• Then, to disprove a conjecture CON, i.e.

  LIST ∪ HYP ⊭ CON

  it suffices to prove

  LIST ∪ HYP ⊨ ¬CON

• Same for ARRAY
• Use this method in the following result tables, for all provers
– Directly establishing countersatisfiability does not work at all


Experiments with LIST

A detailed proof of Lemma 3.1 is in the appendix. It proceeds by constructing a canonical (minimal) model of the ⇐-direction of Def_P, which always also is a model of the ⇒-direction. From a logic-programming angle, the user could as well give only the ⇐-direction of Def_P, and the system adds the completion (⇒-direction) for disproving purposes.

Example. Let inRange : ℤ × LIST be a predicate symbol. Consider the extension of Ax_LIST with the following (admissible) definition for P (the free variables are universally quantified with the obvious sorts).

inRange(n, l) ⇔ l ≈ nil ∨ ∃ h:ℤ t:LIST . (l ≈ cons(h, t) ∧ 0 ≤ h ∧ h < n ∧ inRange(n, t))

This example comes from a case study with the first-order logic model checker from [1]. The inRange predicate is used there to specify lists of “ordered items” handled in a purchase order process, which must all be in a range 0..N − 1, for some N ≥ 0.

The following table lists some sample problems together with the runtimes (in seconds) needed to disprove them with the provers mentioned.¹

Problem                                                       Beagle  Spass+T  Z3
inRange(4, cons(1, cons(5, cons(2, nil))))                    6.2     0.3      0.2
n > 4 ⇒ inRange(n, cons(1, cons(5, cons(2, nil))))            7.2     0.3      0.2
inRange(n, tail(l)) ⇒ inRange(n, l)                           3.9     0.3      0.2
∃ n:ℤ l:LIST . l ≉ nil ∧ inRange(n, l) ∧ n − head(l) < 1      2.7     0.3      0.2
inRange(n, l) ⇒ inRange(n − 1, l)                             8.2     0.3      >60
l ≉ nil ∧ inRange(n, l) ⇒ n − head(l) > 2                     2.8     0.3      0.2
n > 0 ∧ inRange(n, l) ∧ l′ = cons(n − 2, l) ⇒ inRange(n, l′)  4.5     5.2      0.2

We remark that none of these problems is solvable by either prover by directly trying to establish consistency of the axioms, definitions and the conjecture. Even if only the ⇐-direction is used, Z3 and Spass+T do not terminate. Because the universally quantified variables in the conjectures lead to Skolem constants, the resulting clause set is no longer sufficiently complete (see [3]), and a finite saturation obtained by Beagle does not allow one to conclude satisfiability.

Functions. Let Σ⁺ ⊇ Σ_LIST be a signature, s ∈ sorts(Σ) and f ∉ Σ⁺ a function symbol with arity ℤ × LIST ↦ s. Let Def_f be a set of (implicitly) universally quantified formulas of the form below, where k and h are ℤ-sorted and t is LIST-sorted:

f(k, nil) ≈ b[k] ⇐ B[k]   (f₀)
f(k, cons(h, t)) ≈ c₁[k, h, t, f(k, t)] ⇐ C₁[k, h, t, f(k, t)]   (f₁)
...
f(k, cons(h, t)) ≈ cₙ[k, h, t, f(k, t)] ⇐ Cₙ[k, h, t, f(k, t)]   (fₙ)

¹ Here and below, Beagle has been run with “cautious simplification on” and “ordinary variables on”; Z3, version 4.3.1 with the options “pull-nested-quantifiers”, “mbqi” and “macro-finder” on; SPASS+T used Yices as a theory solver. All timings obtained on reasonably recent computer hardware. The input problems are available on the Beagle website http://users.cecs.anu.edu.au/~baumgart/systems/beagle/.

Problems 5 and 7 require "ordinary variables" and "cautious simplification" (the most complete parameter setting)


Experiments with LIST

where B is a Σ⁺-formula of arity ℤ, each Cᵢ is a Σ⁺-formula of arity ℤ × ℤ × LIST × s, b is a Σ⁺-term of arity ℤ ↦ s, and each cᵢ is a Σ⁺-term with arity ℤ × ℤ × LIST × s ↦ s.

Lemma 3.2. Let D be a Σ⁺-domain with D_LIST = LIST. If for all 1 ≤ i < j ≤ n the formula

∀ k:ℤ h:ℤ t:LIST x:s . Cᵢ[k, h, t, x] ∧ Cⱼ[k, h, t, x] ⇒ cᵢ[k, h, t, x] ≈ cⱼ[k, h, t, x]

is valid in all Σ⁺-interpretations with domain D then Def_f is an admissible definition of f wrt. Σ⁺ and D.

The condition in the lemma statement is needed to make sure that all cases (fᵢ) and (fⱼ) for i ≠ j are consistent. For example, for f(cons(h, t)) ≈ 1 ⇐ h ≈ 1 and f(cons(h, t)) ≈ a ⇐ h ≈ 1 + a this is not the case. Indeed, ∀ h:ℤ . h ≈ 1 ∧ h ≈ 1 + a ⇒ 1 ≈ a is not valid. Notice that establishing the condition is a theorem proving task, which fits well with our method. In the examples below it is trivial.

Example. Let length : LIST ↦ ℤ, count : ℤ × LIST ↦ ℤ, append : LIST × LIST ↦ LIST and in : ℤ × LIST be operators. Consider the extension of Ax_LIST with the following (admissible) definitions, in the given order.

length(nil) ≈ 0
length(cons(h, t)) ≈ 1 + length(t)

append(nil, l) ≈ l
append(cons(h, t), l) ≈ cons(h, append(t, l))

count(k, nil) ≈ 0
count(k, cons(h, t)) ≈ count(k, t) ⇐ k ≉ h
count(k, cons(h, t)) ≈ count(k, t) + 1 ⇐ k ≈ h

in(k, l) ⇔ count(k, l) > 0

Here are some sample conjectures together with the times for disproving them.²

Problem                                                       Beagle     Spass+T  Z3
length(l1) ≈ length(l2) ⇒ l1 ≈ l2                             4.3        9.0      0.2
n ≥ 3 ∧ length(l) ≥ 4 ⇒ inRange(n, l)                         5.4        1.1      0.2
count(n, l) ≈ count(n, cons(1, l))                            2.5        0.3      >60
count(n, l) ≥ length(l)                                       2.7        0.3      >60
l1 ≉ l2 ⇒ count(n, l1) ≉ count(n, l2)                         2.4        0.8      >60
length(append(l1, l2)) ≈ length(l1)                           2.1        0.3      0.2
length(l1) > 1 ∧ length(l2) > 1 ⇒ length(append(k, l)) > 4    37         >60      >60
in(n1, l1) ∧ ¬in(n2, l2) ∧ l3 ≈ append(l1, cons(n2, l2))
  ⇒ count(n, l3) ≈ count(n, l1)                               >60 (6.2)  9.1      >60

4 Arrays

The signature Σ_ARRAY consists of sorts ARRAY and ℤ and the operators read : ARRAY × ℤ ↦ ℤ, write : ARRAY × ℤ × ℤ ↦ ARRAY, and init : ℤ ↦ ARRAY. The array axioms

² The time of 6.2 seconds for the last problem is with “ordinary variables off”.

The last problem is provable only with "abstraction variables"
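The list definitions above are directly executable, which makes it easy to see why some of the sample conjectures are false. The Python sketch below assumes lists are tuples of integers and searches a small, arbitrarily bounded domain for counterexamples; the helper names (`member` for `in`, `first_counterexample`, `small_lists`) are hypothetical. It only illustrates the conjectures and is not the method used on the slides, which proves ¬CON with a theorem prover.

```python
from itertools import product

def length(l):
    return 0 if not l else 1 + length(l[1:])

def append(l1, l2):
    return l2 if not l1 else (l1[0],) + append(l1[1:], l2)

def count(k, l):
    if not l:
        return 0
    return count(k, l[1:]) + (1 if k == l[0] else 0)

def member(k, l):
    """in(k, l) ⇔ count(k, l) > 0 (renamed because `in` is a Python keyword)."""
    return count(k, l) > 0

def small_lists(max_len=2, values=(0, 1)):
    for n in range(max_len + 1):
        yield from product(values, repeat=n)

def first_counterexample(conjecture, domains):
    """Return the first argument tuple falsifying the conjecture, or None."""
    for args in product(*domains):
        if not conjecture(*args):
            return args
    return None

LISTS = list(small_lists())
INTS = range(0, 3)

# length(l1) ≈ length(l2) ⇒ l1 ≈ l2
c1 = first_counterexample(lambda l1, l2: length(l1) != length(l2) or l1 == l2,
                          [LISTS, LISTS])
# count(n, l) ≥ length(l)
c2 = first_counterexample(lambda n, l: count(n, l) >= length(l), [INTS, LISTS])
# length(append(l1, l2)) ≈ length(l1)
c3 = first_counterexample(lambda l1, l2: length(append(l1, l2)) == length(l1),
                          [LISTS, LISTS])
print(c1, c2, c3)
```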


Experiments with ARRAY

Examples. Let the operators inRange : ARRAY × ℤ × ℤ, max, distinct be defined as follows (sorted and rev are as defined previously):

inRange(a, r, n) ⇔ ∀ i . (n ≥ i ∧ i ≥ 0) ⇒ (r ≥ read(a, i) ∧ read(a, i) ≥ 0)

distinct(a, n) ⇔ ∀ i, j . (n > i ∧ n > j ∧ j ≥ 0 ∧ i ≥ 0) ⇒ (read(a, i) ≈ read(a, j) ⇒ i ≈ j)

max(a, n) ≈ w ⇐ (∀ i . (n > i ∧ i ≥ 0) ⇒ w ≥ read(a, i)) ∧ (∃ i . n > i ∧ i ≥ 0 ∧ read(a, i) ≈ w)

Here are some sample conjectures together with the times for disproving them.³ Note that u indicates termination with a status “unknown”.

Problem                                 Beagle  Spass+T     Z3
n ≥ 0 ⇒ inRange(a, max(a, n), n)        1.40    0.16        u
distinct(init(n), i)                    0.98    0.15        u
read(rev(a, n + 1), 0) = read(a, n)     >60     >60 (0.27)  >60
distinct(a, n) ⇒ distinct(rev(a, n))    >60     0.11        0.36
∃ n:ℤ . ¬sorted(rev(init(n), m), m)     >60     0.16        u
sorted(a, n) ∧ n > 0 ⇒ distinct(a, n)   2.40    0.17        0.01

In addition, SPASS+T, Beagle and Z3 were used to prove the functionality condition in Lemma 4.2 for the max and rev operators. All provers verified the condition for max but only SPASS+T and Z3 verified that for rev.

5 Conclusions

The aim of this work is to provide a reasonably expressive language (in practical terms) that allows one to specify properties of data structures under consideration, like lists and arrays, and that supports disproving by existing theorem provers. The main idea is to capitalize on the strengths of these systems for theorem proving for solving disproving problems, instead of relying on their model-building capabilities. To this end we gave some examples and tested them with the theorem provers SPASS+T, Beagle and Z3. It turns out that the theorem provers work rather well, in the sense that all problems we tried could be solved, and in short time. In general, the first-order solvers Beagle and SPASS+T worked most reliably, possibly thanks to handling quantified formulas natively instead of relying solely on instantiation heuristics.

Our examples are inspired by case studies with the first-order model-checker described in [1]. Disprovable conjectures come up there not only by “faulty” conjectures, but also when trying to prove that two state-changing operators commute (for partial-order reduction). Clearly, more experiments are needed, also from different contexts.

³ SPASS+T used Yices as a theory solver. The time of 0.27s in the third problem is obtained by excluding the inRange definition.

The first and the last problems are provable only with "ordinary variables"
