goal-directed weakening of abstract interpretation...

39
39 Goal-Directed Weakening of Abstract Interpretation Results SUNAE SEO Korea Advanced Institute of Science and Technology HONGSEOK YANG and KWANGKEUN YI Seoul National University and TAISOOK HAN Korea Advanced Institute of Science and Technology One proposal for automatic construction of proofs about programs is to combine Hoare logic and abstract interpretation. Constructing proofs is in Hoare logic. Discovering programs’ invariants is done by abstract interpreters. One problem of this approach is that abstract interpreters often compute invariants that are not needed for the proof goal. The reason is that the abstract interpreter does not know what the proof goal is, so it simply tries to find as strong invariants as possible. These unnecessary invariants increase the size of the constructed proofs. Unless the proof-construction phase is notified which invariants are not needed, it blindly proves all the computed invariants. In this article, we present a framework for designing algorithms, called abstract-value slicers, that slice out unnecessary invariants from the results of forward abstract interpretation. The framework provides a generic abstract-value slicer that can be instantiated into a slicer for a particular abstract interpretation. Such an instantiated abstract-value slicer works as a post- processor to an abstract interpretation in the whole proof-construction process, and notifies to the S. Seo and T. Han were supported by Korea Ministry of Information and Communication under the Information Technology Research Center support program, supervised by the Institute of Infor- mation Technology Assessment (IITA-2005-C1090-0502-0031). H. Yang was supported by EPSRC and the Basic Research Program of the Korea-Science & Engineering Foundation (grant No. R08- 2003-000-10370-0). K. Yi was supported by Brain Korea 21 Project of Korea Ministry of Education and Human Resources, by IT Leading R&D Support Project of Korea Ministry of Information and Communication, by Korea Research Foundation grant KRF-2003-041-D00528, and by National Security Research Institute of Korea. Authors’ addresses: S. Seo and T. Han, Department of Computer Science, Korea Advanced Insti- tute of Science and Technology 373-1 Guseong-dong Yuseong-gu Daejeon 305-701, Korea; email: {saseo,han}@pllab.kaist.ac.kr; H. Yang, Department of Computer Science, Queen Mary, University of London, Mile End Road London E1 4NS UK; email: [email protected]; K. Yi, School of Computer Science and Engineering, Seoul National University, San 56-1 Shilim-dong Gwanak-gu Seoul 151-744 Korea; email: [email protected]. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]. C 2007 ACM 0164-0925/2007/10-ART39 $5.00 DOI 10.1145/1286821.1286830 http://doi.acm.org/ 10.1145/1286821.1286830 ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Upload: lythien

Post on 28-Apr-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39

Goal-Directed Weakening of AbstractInterpretation Results

SUNAE SEO

Korea Advanced Institute of Science and Technology

HONGSEOK YANG and KWANGKEUN YI

Seoul National University

and

TAISOOK HAN

Korea Advanced Institute of Science and Technology

One proposal for automatic construction of proofs about programs is to combine Hoare logic andabstract interpretation. Constructing proofs is in Hoare logic. Discovering programs’ invariants isdone by abstract interpreters.

One problem of this approach is that abstract interpreters often compute invariants that are notneeded for the proof goal. The reason is that the abstract interpreter does not know what the proofgoal is, so it simply tries to find as strong invariants as possible. These unnecessary invariantsincrease the size of the constructed proofs. Unless the proof-construction phase is notified whichinvariants are not needed, it blindly proves all the computed invariants.

In this article, we present a framework for designing algorithms, called abstract-value slicers,that slice out unnecessary invariants from the results of forward abstract interpretation. Theframework provides a generic abstract-value slicer that can be instantiated into a slicer for aparticular abstract interpretation. Such an instantiated abstract-value slicer works as a post-processor to an abstract interpretation in the whole proof-construction process, and notifies to the

S. Seo and T. Han were supported by Korea Ministry of Information and Communication under theInformation Technology Research Center support program, supervised by the Institute of Infor-mation Technology Assessment (IITA-2005-C1090-0502-0031). H. Yang was supported by EPSRCand the Basic Research Program of the Korea-Science & Engineering Foundation (grant No. R08-2003-000-10370-0). K. Yi was supported by Brain Korea 21 Project of Korea Ministry of Educationand Human Resources, by IT Leading R&D Support Project of Korea Ministry of Information andCommunication, by Korea Research Foundation grant KRF-2003-041-D00528, and by NationalSecurity Research Institute of Korea.Authors’ addresses: S. Seo and T. Han, Department of Computer Science, Korea Advanced Insti-tute of Science and Technology 373-1 Guseong-dong Yuseong-gu Daejeon 305-701, Korea; email:{saseo,han}@pllab.kaist.ac.kr; H. Yang, Department of Computer Science, Queen Mary, Universityof London, Mile End Road London E1 4NS UK; email: [email protected]; K. Yi, School ofComputer Science and Engineering, Seoul National University, San 56-1 Shilim-dong Gwanak-guSeoul 151-744 Korea; email: [email protected] to make digital or hard copies of part or all of this work for personal or classroom use isgranted without fee provided that copies are not made or distributed for profit or direct commercialadvantage and that copies show this notice on the first page or initial screen of a display alongwith the full citation. Copyrights for components of this work owned by others than ACM must behonored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,to redistribute to lists, or to use any component of this work in other works requires prior specificpermission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 PennPlaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]© 2007 ACM 0164-0925/2007/10-ART39 $5.00 DOI 10.1145/1286821.1286830 http://doi.acm.org/10.1145/1286821.1286830

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 2: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:2 • S. Seo et al.

next proof-construction phase which invariants it does not have to prove. Using the framework, wedesigned an abstract-value slicer for an existing relational analysis and applied it on programs.In this experiment, the slicer identified 62%–81% of the computed invariants as unnecessary, andresulted in 52%–84% reduction in the size of constructed proofs.

Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specifying andVerifying and Reasoning about Programs—Mechanical verification; D.2.4 [Software Engineer-ing]: Software/Program Verification—Correctness proofs, Formal methods; D.3.1 [ProgrammingLanguages]: Formal Definitions and Theory—Semantics

General Terms: Algorithms, Design, Languages, Verification

Additional Key Words and Phrases: Abstract interpretation, backward analysis, Hoare logic, pro-gram verification, static analysis

ACM Reference Format:Seo, S., Yang, H., Yi, K., and Han, T. 2007. Goal-directed weakening of abstract interpretationresults. ACM Trans. Program. Lang. Syst. 29, 6, Article 39 (October 2007), 39 pages. DOI =10.1145/1286821.1286830 http://doi.acm.org/ 10.1145/1286821.1286830

1. INTRODUCTION

Though Proof-Carrying Code (PCC) technologies [Necula and Schneck 2002;Necula and Rahul 2001; Appel 2001; Hamid et al. 2002] have been a convincingapproach for certifying the safety of code, how to achieve the code’s safety proofis still a matter for investigation. The existing proof construction process iseither not fully automatic, assuming that the program invariants should beprovided by the programmer [Necula 1997; Necula and Lee 1997; Necula andRahul 2001], or limited to a class of properties that are automatically inferrableby the current type system technologies [Hamid et al. 2002; Appel and Felty2000; Morrisett et al. 1998].

One proposal [Seo et al. 2003] for automatic construction of proofs for awide class of program properties is to combine abstract interpretation [Cousotand Cousot 1977; Cousot 1998] and Hoare logic [Hoare 1969]. Constructingproofs is in Hoare logic. Discovering program’s invariants, which is the mainchallenge in automatically constructing Hoare proofs, is done by abstract in-terpreters [Cousot and Cousot 1977; Cousot 1998]. An abstract interpreterestimates program properties (i.e., approximate invariants) of interest, andthe proof-construction method constructs a Hoare proof for those approximateinvariants. The gap between the estimated invariants of an abstract inter-preter and the preconditions “computed by Hoare-logic proof rules” is filled bythe soundness of the abstract interpreter only, without involving any theoremprovers. For instance, when the abstract interpreter’s results (i.e., approximateinvariants at program points—boxed properties here) are p x := E q , thesoundness proofs of the abstract interpreter are used to produce a proof that pimplies the weakest precondition of x := E for q.

This proof-construction method for PCC has several appealing features. Oncedesigned, an abstract interpreter can be used repeatedly for all programs of thetarget language, as long as their properties to verify and check remain the same.Furthermore, the proof-checking side (code consumer’s side) is insensitive to a

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 3: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:3

specific abstract interpreter. The code consumer does not have to know whichanalysis technique has been used to generate the proof. The assertion languagein Hoare logic is fixed to first-order logic for integers, into which we have totranslate abstract interpretation results. This translation procedure is definedby referencing the concretization formulas of the used abstract interpreter.Lastly, the proof-checking side is simple. Checking the Hoare proofs is simplyby pattern-matching the proof tree nodes against the corresponding Hoare logicrules. Checking if the proofs are about the accompanied code is straightforward,because the program texts are embedded in the Hoare proofs.

1.1 Problem

This work is motivated by one problem in the proof-construction method: ab-stract interpretation results are often unnecessarily informative for intendedHoare proofs. A (forward) abstract interpreter is usually designed to compute(approximate) program invariants that are as strong as possible, so that thecomputed invariants can verify a wide class of safety properties. Thus, whenthe abstract interpreter is used to verify one specific safety property, its resultsusually contain some (approximate) program invariants that are not necessaryto prove the safety property of interest, although those invariants might beneeded for some other safety properties. For instance, in our experiment withan existing relational analysis [Mine 2001], 62%–81% of the analysis resultswere not needed for the intended verification.1

The existence of such unnecessary invariants among the results of an ab-stract interpretation becomes a bottleneck for all the efforts to reduce the proofsize. When a Hoare proof of a safety property is constructed from the abstractinterpretation results, it consists of two kinds of subproofs: the ones that theabstract interpretation results are indeed (approximate) invariants, and theothers that those approximate invariants imply the safety property. The unnec-essarily informative analysis results mainly cause the first kind of subproofs to“explode”; they increase the number of such subproofs, by adding useless proofsubgoals.

Without addressing this problem, the proof-construction method often pro-duces unnecessarily big Hoare proofs, and hence is likely to be impracticalfor PCC. Big proofs accompanying mobile code degrade the code mobility ina network that usually has a limited bandwidth, or are impractical for codeconsumers that usually are small embedded systems with a limited memory.Note that no techniques for representing subproofs compactly by some cleverencoding can solve the problem, because they assume that all subproofs are nec-essary; it is not the purpose of such techniques to identify the useless subproofs.

Example 1.1. As an example where abstract interpretation results arestronger than necessary, consider the following assignment sequence with theparity abstract interpretation, which estimates whether each program variablecontains an even integer or an odd integer:

x := 4x; x := 2x.

1We explain this further in Section 5.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 4: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:4 • S. Seo et al.

The estimated invariants from the abstract interpretation for variable x are:

� x := 4x; even x := 2x even .

Suppose we are interested in constructing a proof that variable x at the endis an even integer. Then the invariant “even” after the first assignment, whichmeans x is an even integer, is stronger than needed; just � is enough. This isbecause for the second assignment, Hoare triple {true}x := 2x{∃n.x = 2n} can bederived

true ⇒ ∃n. 2x = 2n {∃n. 2x = 2n}x := 2x{∃n. x = 2n}{true}x := 2x{∃n.x = 2n} .

and this triple is enough to construct the intended proof:

{true}x := 4x{true} {true}x := 2x{∃n. x = 2n}{true}x := 4x; x := 2x{∃n. x = 2n} .

That is, the following invariants, weaker than the original results, are justenough for our proof goal:

� x := 4x; � x := 2x even .

Example 1.2. Similarly, as another example where useless invariants occurin the results of an abstract interpretation, consider the following program,again with the parity abstract interpretation.

x := 1; y:=2x.

The estimated invariants from the abstract interpretation for each variable ateach program point are as follows:

x �→�, y �→� x := 1; x �→odd, y �→� y:=2x x �→odd, y �→even .

Suppose we are interested in constructing a proof that variable y at the end isan even integer. Then, all invariants about x are useless. Just � for x is enoughto reach the proof goal:

{true}x := 1{true} true ⇒ (∃n. 2x = 2n) {∃n. 2x = 2n}y := 2x{∃n. y = 2n}{true}y := 2x{∃n. y = 2n}

{true}x := 1; y := 2x{∃n. y = 2n} .

Thus, the original analysis results can be weakened to the following, while stillproving that y is even at the end:

x �→�, y �→� x := 1; x �→�, y �→� y:=2x; x �→�, y �→even .

This example illustrates that the conventional program slicing techniquedoes not immediately provide a satisfactory solution for our problem. One naiveidea to eliminate the unnecessary information from the abstract interpretationresult is to apply first the program slicing and then an abstract interpretation.However, for this example, this approach cannot identify any useless infor-mation from the abstract interpretation result. When the program slicing isapplied to the example program with the slicing criterion “the value of variabley after y := 2x,” it cannot slice out any parts of the program, because both x := 1

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 5: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:5

Fig. 1. Annotated insertion sort before and after slicing.

and y := 2x affect the slicing criterion. As a result, the following parity abstractinterpretation is given the unmodified original program, thus getting no helpfrom the program slicing. Another idea might be to use dependency analysis inprogram slicing; to compute the dependency relationship between variables atdifferent program points, and then to use this relationship to slice the abstractinterpretation result. When this idea is applied to our example, it finds outthat x �→odd after y := 2x is not necessary, but it fails to discover that x �→oddafter x := 1 is not needed for verification. Given the slicing criterion “the valueof variable y after y := 2x,” dependency analysis finds out that the value of vari-able y after y := 2x is dependent upon that of variable x after x := 1. Thus, thefollowing slicing step does not delete x �→odd after x := 1.

Example 1.3. To see the problem in a “real world,” we consider a slightlymore realistic program—the insertion sort. Figure 1(a) shows the insertion sort,which is annotated with results of an abstract interpreter named “zone analy-sis” [Mine 2001]. Zone analysis estimates the upper and lower bounds of expres-sions x and x − y , for all program variables x and y . The insertion sort programtakes an array A and the size n of the array as an input, and sorts the array.

Suppose that we ran the abstract interpreter in order to verify the absenceof array index errors in the program. The annotations in the program provethis safety property.2

However, note that the annotations also contain unnecessary information.For instance, i ≤ n in the annotation marked by * neither is helpful for showing

2Here we assume that in “B1 and B2,” B2 is evaluated only when B1 is true.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 6: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:6 • S. Seo et al.

that the subsequent array access A[j+1] is within bounds, nor is used to implythe loop invariant (2 ≤ i) ∧ (0 ≤ j ≤ i + 2). Thus, it can be eliminated withoutbreaking the proof. In fact, half of the annotations in the program are notneeded. Figure 1 shows the program where all such useless invariants areeliminated.

1.2 Our Solution

In this article, we present an algorithm, called abstract-value slicer, that filtersout unnecessary invariants from the results of a forward abstract interpreter.The abstract-value slicer works as a post-processor to the abstract interpreter.Given an annotated program and a property of interest, the slicer approxi-mates all the annotations further, until all the information in each annotationcontributes to the verification of the property.

The main idea of the abstract-value slicer is to view an abstract interpreta-tion result at each program point as conjunction of formulas, and to find outwhich formulas in the conjunction are not necessary for verification. For in-stance, suppose that an abstract interpreter analyzed the assignment x := Efor an input abstract value that means p1 ∧ p2 ∧ p3, and that it producedan output abstract value that means p′

1 ∧ p′2. That is, the abstract interpreter

verified that if a pre state satisfies p1 ∧ p2 ∧ p3, then the post state after the as-signment satisfies p′

1 ∧ p′2. When the abstract-value slicer is given this analysis

result and it is notified that only p′1 is used for verification, the slicer computes

a subset P ⊆ {p1, p2, p3} such that (1)∧

P can be represented by some abstractvalue and (2) although

∧P is weaker than the original p1 ∧ p2 ∧ p3, it is still

strong enough to ensure that the assignment can achieve the goal p′1: if the pre

state satisfies∧

P , then the post state of the assignment satisfies p′1. Then,

the slicer filters out the formulas in {p1, p2, p3} − P that are not necessary forverification: the slicer replaces the original input abstract value “{p1, p2, p3}”by the abstract state that means

∧P .

A reader might feel that a better alternative approach for solving the problemof unnecessary invariants is to use “on-line,” goal-oriented backward abstractinterpreters that compute the underapproximation of the weakest precondi-tion, that is, a set of pre states from which a program always achieves the givengoal. Note that our approach is, on the other hand, in the reverse direction and“off-line” yet achieving the same effect. Given the results of forward abstractinterpreters, which are already underapproximations of the weakest precon-ditions, we weaken the underapproximate results and make them closer frombelow to the weakest preconditions. Please be reminded that our problem isto underapproximate the weakest preconditions under which a program mustalways satisfy the given goal property.

Although designing such an abstract interpreter (an underapproximatebackward precondition analyzer) is plausible, we pursue the idea of designingan abstract-value slicer over an existing forward abstract interpreter. That is,such an abstract-value slicer is not a new analysis for estimating goal-directedinvariants but an “off-line” method that reuses the results of existing abstractinterpreters to achieve goal-directed invariants.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 7: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:7

This separation (or, modularization) of the slicing from the analysis is mean-ingful for the reuse of the analysis. First, the analysis results can be reused.Computed analysis results for a program can be reused to achieve differentslices for different slicing criteria (proof goals). The analysis itself can be reusedtoo. For example, once an abstract interpreter is designed originally for detect-ing buffer-overrun errors by estimating the ranges of buffer-accessing indexes,it can be reused now to provide our slicer with invariant candidates for beingused in buffer-overrun nonexistence proofs. We consider forward abstract inter-preters because they are most common in design and practice, having a numberof realistic instances. Our method can be seen as a method for achieving goal-directed analysis results from the results of a usual, goal-independent, forwardabstract interpretation.

The contributions of the article are:

—We present a general framework for designing correct abstract-value slicers.The framework defines the generic abstract-value slicer, which we can instan-tiate into a specific slicer for a particular abstract interpreter, by providingparameters. The framework also specifies the soundness conditions for thoseparameters of the generic slicer; if the parameters satisfy these conditions,the resulting slicer filters out only the unnecessary parts from abstract in-terpretation results.

—We present methods for constructing parameters of the generic abstract-value slicer, and show when all the constructed parameters by these methodssatisfy the soundness conditions.

—Using our framework, we build a specific abstract-value slicer for zone anal-ysis [Mine 2001], and demonstrate its effectiveness in the context of proofconstruction. In our experiment, the slicer identified that 62%-81% of theabstract interpretation results are not necessary for the verification, andresulted in 52%-84% reduction in the size of constructed program proofs.

1.3 Related Work

Our abstract-value slicer is closely related to program slicing [Tip 1995; Rival2005a] and cone of influences [Clarke et al. 1999] in model checking. All thesetechniques, including ours, identify the irrelevant parts for achieving a givengoal, and slice them out. The objects that get sliced are, however, different: theabstract-value slicer works only on the abstract interpretation results, whileprogram slicing and cone of influence modify a program or a Kripke structurethat models the behavior of a program.

Another important difference lies in the computation of the irrelevant partsfor the goal. In order to detect the irrelevant parts, both program slicing tech-niques and cone of influences compute (an overapproximation of) the depen-dency between program variables at different program points. Intuitively, thecomputed dependency of y at l1 on x at l0 means that some concrete com-putation uses the value of x at l0 to compute the value of y at l1, so thatthe different values of x at l0 will make y have different values at l1. Re-cently, Rival [2005a] generalized this dependency in his abstract program

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 8: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:8 • S. Seo et al.

slicing, so that the dependency is now between facts about one variable, suchas x > 3 and y < 9, but it is still about the concrete computations. Theabstract-value slicer, on the other hand, computes the dependency in the ab-stract computation between general facts which might involve multiple vari-ables such as x ≤ z + 3 at l0 and z ≤ y ≤ z + 9 at l1. For instance, sup-pose that an abstract interpretation result of the assignment x := y − z is( y ≤ z + 1 ∧ v ≤ y) ∧ ( y ≤ v ∧ v ≤ z + 1) x := y − z x ≤ 1 ∧ v ≤ z + 1 , and

the abstract-value slicer is asked to check whether x ≤ 1 in the post abstractvalue depends on the first conjunct y ≤ z + 1 ∧ v ≤ y of the pre abstractvalue. If by the same abstract interpretation the other conjunct can result inthe same conclusion x ≤ 1, that is, y ≤ v ∧ v ≤ z + 1 x := y − z x ≤ 1 , thenthe abstract-value slicer reports that x ≤ 1 does not depend on the first conjuncty ≤ z + 1 ∧ v ≤ y . Otherwise, that is, if the abstract interpreter approximatestoo much that its result from y ≤ v ∧ v ≤ z + 1 does not imply x ≤ 1, then theslicer decides that x ≤ 1 depends on y ≤ z + 1 ∧ v ≤ y . This is so, althoughthe Hoare triple { y ≤ v ∧ v ≤ z + 1} x := y − z {x ≤ 1} holds in the concretesemantics and thus there is no such dependency in the concrete semantics.

The dependency between general facts is also considered in the work on theabstract noninterference [Giacobazzi and Mastroeni 2004]. However, unlike ourabstract-value slicers, the dependency in the abstract noninterference is aboutthe concrete computations, not about the abstract computations. Moreover, theexisting work on the abstract noninterference does not consider the algorithmfor computing the dependency relation, while the main focus of our work is tofind such an algorithm.

Another related line of research is the work on abstraction refinementand predicate abstraction [Graf and Saidi 1997; Clarke et al. 2000; Ball andRajamani 2001; Ball et al. 2001; Henzinger et al. 2003, 2002]. The analyz-ers [Ball and Rajamani 2001; Henzinger et al. 2003] based on these two tech-niques usually find an abstract domain that is as abstract as possible but stillprecise enough for verifying a particular property. However, for the problemof identifying unnecessary invariants, abstraction refinement and predicateabstraction are not widely applicable, because many existing abstract inter-pretations are not based on those techniques. Our abstract-value slicers, onthe other hand, can be applied for such abstract interpretations. We note thatthe analyzers based on abstraction refinement and the abstract-value slicerswork in opposite “directions.” Such analyzers start with naive analysis resultsand keep strengthening the results until they are sufficient to prove a propertyof interest. On the other hand, the slicers start with precise analysis results,which already prove the property of interest, and weaken the results, whilemaintaining the proof.

From the view point that our off-line backward abstract-value slicing afteran overapproximate forward post-condition analyzer can be simulated by asingle underapproximate backward precondition analyzer, related works arebackward abstract interpreters that underapproximate the weakest precon-ditions. Please note that, however, backward abstract interpreters in Rival[2005b], Cousot [2005], Cousot and Cousot [1999], Masse [2001], Hughes and

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 9: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:9

Launchbury [1992], Duesterwald et al. [1995] and Bourdoncle [1993], overap-proximate the weakest preconditions.3 Their backward abstract interpretersdiscover a superset of the pre states where a program might generate an er-ror. Thus such backward abstract interpreters are used in program debug-ging [Bourdoncle 1993] and alarm inspection [Rival 2005b]. On the other hand,abstract model checkers [Dams et al. 1997] can be seen as backward abstractinterpreters that underapproximate the weakest preconditions [Cousot 1981].

Projection analysis [Wadler and Hughes 1987; Hughes 1988; Davis andWadler 1990] in functional programs and mode analyses [King and Lu 2002;Howe et al. 2004] in logic programs both underapproximate the weakest precon-ditions sharing the same goal as our abstract-value slicer, but their techniquesare not directly applicable to the problem of this article. The projection analysisestimates a function that transforms the demand for the output to the one forthe input. The demand for the output specifies which part of the output will beneeded by the environment of the program (i.e., continuation), and the demandfor the input expresses which property of the input is sufficient for the programto produce the necessary part of the output. Mode analysis in logic programsis similar. It estimates context properties that, if satisfied by the initial query,guarantee that the program with the query never generate any moding error.However, the domains used in these backward analyses are not as general asthe ones used in our framework. Our abstract-value slicer works with com-plex domains that are infinite or relational, such as interval domain and zonedomain [Cousot and Cousot 1977; Mine 2001], can have nontrivial domain op-erations (e.g., closure operation in zone domain), or can require an accelerationmethod (e.g., widening) for quick convergence.

1.4 Organization

We start the article by reviewing the basics of the abstract interpretations inSection 2. In Section 3, we define a generic abstract-value slicer parameterizedby extractor domain and back-tracers for assignments and Boolean expressions.Intuitively, the extractor domain specifies the working space of the slicer, andthe back-tracers describe how the slicer treats each assignments and Booleanexpressions. In that section, we specify the soundness conditions for these twoparameters, and prove that the conditions ensure the correctness of the instan-tiated slicer. In the next section, we present methods for constructing parame-ters to the slicer, which satisfy the soundness requirements. There we describetwo techniques for constructing correct back-tracers. In Section 5, we explainthe experimental results about one specific abstract-value slicer in the contextof proof construction. Finally, we conclude the article in Section 6.

3This statement is true only for terminating deterministic programs, because those backwardabstract interpreters overapproximate so-called pre state-sets. The pre state-set of a command Cfor a postcondition p is the set of pre states from which C can output some state satisfying p.For terminating deterministic programs, the pre state-set of C and p is the same as the weakestprecondition of C and p.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 10: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:10 • S. Seo et al.

Fig. 2. An example program and its abstract interpretation result from zone analysis.

2. ABSTRACT INTERPRETATION

We consider programs represented by control flow graphs [Cousot and Cousot1977]. Let ATerm be the set of atomic terms, that is, all the inequalities E ≤ E ′,assignments x := E, and command skip. A program (V , E, ni, n f , L) is a fi-nite graph with nodes in V and edges in E, together with two special nodesni and n f , and a labeling function L: E → ATerm. A node in V represents aprogram point, and an edge in E a flow of control between program points;with this flow, an inequality or assignment is associated, and the labelingfunction L expresses this association. The special nodes ni and n f , respec-tively, denote the entry and exit of the program. We assume that in the pro-gram, no edges go into the entry node ni, and no edges come out of the exitnode n f . Figure 2(a) shows a program that represents code with a singlewhile loop. In this program, we label each edge with an atomic term, exceptwhen the atomic term is skip. Another thing to note is that the conditionfor exiting the loop, ¬(x1 − x2 ≤ 0), is expressed by an equivalent conditionx1 − x2 ≥ 1; these two conditions are equivalent since variables range overintegers.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 11: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:11

In the article, we consider abstract interpretations that consist of threecomponents: a join semilattice A = (A, , ⊥, �), the abstract semantics4

[[−]]: ATerm → (A →m A) of atomic terms, and a strategy for computing postfixpoints. Given a program (V , E, ni, n f , L) and an initial abstract state a0 ∈ A,the abstract interpretation first uses A and [[−]] to define “abstract step func-tion” F :

F :∏n∈V

A →m

∏n∈V

A

F ( f )(n) def={

a0 if n = ni⊔{[[L(mn)]]( f (m)) | mn ∈ E} otherwise.

Here∏

n∈V A is the product join semilattice, which consists of tuples f of ele-ments in A and inherits the order structure from A pointwise.5 The first twocomponents A and [[−]] of the abstract interpretation are designed so as toensure that all the post fixpoints of this function F correctly approximate con-crete program invariants. The next step of the abstract interpretation is tocompute a post fixpoint of F (i.e., some f with F ( f ) f ) using the post-fixpoint-computation strategy. This post fixpoint becomes the result of the ab-stract interpretation.

Throughout the article, we will use two abstract interpretations to explainthe main ideas. The first one, evenness analysis, will be mainly used to helpthe reader to understand the ideas themselves. The second, zone analysis, willbe used to illustrate the significance and subtlety of the ideas.

Example 2.1 (Evenness Analysis). The goal of evenness analysis is to dis-cover (at each program point) the variables that always store even numbers.Let EV be a poset {⊥e, even, �e} ordered by

⊥e e even e �e.

Each element in EV means a set of integers: ⊥e denotes the empty set, eventhe set of all even integers, and �e the set of all integers. Note that the posetEV is a join semilattice; the least element is ⊥e and the join operation �e picksthe bigger element among its arguments. The abstract domain P = (P, ⊥, )of evenness analysis is given below:

P def= [Vars → EV] a a′ def⇔ ∀x ∈ Vars. a(x) e a′(x)⊥ def= λx. ⊥e a � a′ def= λx. a(x) �e a′(x).

Intuitively, each abstract value a in P specifies which variables should haveeven numbers. Formally, the meaning of a is given by the following concretiza-tion map γ from P to States = [Vars → Ints]:

γ (a) def={{σ | ∀x. (a(x) = even⇒σ (x) is even)} if (∀x. a(x) = even ∨ a(x) = �e){} otherwise.

4We use the subscript m to express the monotonicity of functions. Thus, for all posets (C, ) and(C′, ′), C →m C′ is the poset of monotone functions from C to C′.5For all f , g in

∏n∈V A, f g iff ∀n ∈ V . f (n) g (n). The least element ⊥ and join f � g in this

join semilattice are, respectively, defined by λn.⊥ and λn. f (n) � g (n).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 12: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:12 • S. Seo et al.

For the abstract semantics of each atomic term, the analysis uses the followingdefinition:

[[x := 2E]]a def= a[x �→even][[x := y]]a def= a[x �→a( y)][[x := E]]a def= a[x �→�e] (for all the other assignments)

[[skip]]a def= a[[E ≤ E ′]]a def= a.

In addition, we consider a special atomic term even?(x) only for evenness anal-ysis, which tests whether variable x is even. Its abstract semantics is definedas follows:

[[even?(x)]]a def= if (even e a(x)) then a[x �→even] else a.

Note that in the semantics, the information “evenness” is created by x := 2E,propagated by x := y , and removed by all the other assignments. Thus, whenthe analysis is given (the control flow graph of) the code “x := 2x; y:=x; x := 1,”it returns the following annotation for the code:

x �→ �e

y �→ �ex := 2x;

x �→ even

y �→ �ey:=x;

x �→ even

y �→ evenx := 1

x �→ �e

y �→ even.

Example 2.2 (Zone Analysis). Zone analysis [Mine 2001] estimates the up-per and lower bounds on the difference x − y between two program variables,using so-called difference-bound matrices (in short, DBMs). In this article, wewill use a simplified version of zone analysis to illustrate our technique for therelational abstract interpretations. Although we use a simplified version, allthe definitions and algorithms in this example are essentially the ones by Mine[2001]. Let N be the number of the program variables in a given program, andlet x1, . . . , xN be an enumeration of all those variables. A DBM a for this pro-gram is an (N + 1) × (N + 1) matrix with integer values, −∞ or ∞. Intuitively,each aij entry denotes the upper bound of x j −xi (that is, x j −xi ≤ aij ). The rowand column of a DBM include an entry for an “auxiliary variable” x0 that neverappears in the program, and is assumed to have a fixed value 0. The main roleof x0 is to allow each DBM to express the range of all the other program vari-ables. For instance, a DBM a can store l in the i0-th entry (i.e., ai0 = l ) for eachi �= 0, to record that −l ≤ xi. A DBM a means the conjunction of x0 = 0 and allthe constraints x j − xi ≤ aij . Formally, the abstract domain is defined by thefollowing join semilattice M = (M , , ⊥, �) of the DBMs:

M def= {a | a is a DBM} a a′ def⇔ ∀i j . aij ≤ a′i j

⊥i jdef= −∞ [a � a′]i j

def= max(aij , a′i j ).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 13: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:13

Fig. 3. Abstract semantics of atomic terms in zone analysis.

The formal meaning of each DBM is given by a concretization map (i.e., meaningfunction) γ from M to the powerset of states:

Statesdef= [{x1, . . . , xN } → Ints]

γ (a) def= {σ ∈ States | ∀i j . σ [x0 �→0](x j ) − σ [x0 �→0](xi) ≤ aij },where σ [x0 �→0] means the extension of state σ with an additional component forx0: σ [x0 �→0](xi)

def= if (i = 0) then 0 else σ (xi). For instance, a0 in Figure 2(b)means the conjunction of five constraints for variables x1 and x2; these con-straints say that x1 and x2 are, respectively, in the intervals [1, 4] and [1, 3],and that x1 is at most as big as x2. Note that all the diagonal entries of a0 are∞, while those entries, meaning the upper bounds for xi − xi, could be tighterbound 0. In this article, we decide to use ∞, rather than 0, for diagonal entries ofDBMs, because both ∞ and 0 at the diagonal positions provide no informationabout concrete states and this is clarified by ∞.

The analysis classifies atomic terms into two groups, and defines the abstractsemantics of the terms in each group in a different style. The first group includesatomic terms whose execution can be precisely modelled by DBM transforma-tions. It consists of inequalities of the form xi ≤ c, xi ≥ c, xi − x j ≤ c, assign-ments of the form xi := xi + c, xi := x j + c, xi := c, and command skip. Foreach inequality E ≤ E ′ in the group, [[E ≤ E ′]] calculates the conjunction ofE ≤ E ′ and the constraints denoted by the input DBM; for each assignmentx := E in this group, [[x := E]] computes its strongest postcondition for theinput DBM; and [[skip]] is defined to be the identity function. The semantics ofthese atomic terms is shown in Figure 3. The abstract semantics of xi − x j ≤ cin Figure 3 implements the pruning of states, by updating the j i-th entry of theinput DBM a by min(aji, c). Note that the updated DBM means precisely theconjunction of xi −x j ≤ c and γ (a). Thus, among the states in (the concretizationof) the input DBM, [[xi − x j ≤ c]] filters out the states that violate the conditionxi −x j ≤ c, and returns a DBM for the remaining states. The abstract semantics[[xi := xi + c]] models the increment of xi by c. For every kl -th entry of a, if thecolumn index l is i, so the entry means the constraint involving xi, [[xi := xi +c]]increments the entry by c; and if the row index k is i, so the entry now meansthe constraint involving −xi, not xi, [[xi := xi + c]] decrements the entry by c.The case [[xi := x j + c]] in the figure is the most complex and interesting. Given

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 14: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:14 • S. Seo et al.

a DBM a, the semantic function [[xi := x j + c]] first transforms a, so that theDBM has the smallest element a∗ among the ones that mean the same state setas a: a∗ satisfies γ (a) = γ (a∗), and for all other such DBMs a′ (i.e., γ (a′) = γ (a)),a∗ a′. We call a∗ the closure of a. Zone analysis computes this closure usingthe Floyd-Warshall shortest path algorithm.6 Next, [[xi := x j + c]] eliminatesall the information in a∗ involving the old value of xi. Finally, it adds two facts,xi − x j ≤ c and x j − xi ≤ −c.

The atomic terms in the other group are interpreted “syntactically”: the se-mantics of an assignment xi := E in this group does not consider the expressionE, and transforms an input DBM a to the following xi-deleted DBM:

a∗([ki �→∞, ik �→∞]0≤k(�=i)≤N),

and the semantics of E ≤ E ′ in the group prunes nothing, and means theidentity function on the DBMs. A better alternative is to use interval analysis togive the semantics of atomic terms in the second group as shown by Mine [2001].In Section 5, we will discuss this better semantics and other improvements usedin the original zone analysis [Mine 2001].

Figure 2(c) shows a result of zone analysis in the form of DBMs and con-straints. The input to zone analysis is the program in Figure 2(a) and the DBMa0 in Figure 2(b). The result implies that when the program terminates, x2 isin the interval [1, 3] and it is equal to x1 − 1.

3. ABSTRACT-VALUE SLICER

An abstract-value slicer is an algorithm that filters out unnecessary informa-tion from the result of an abstract interpretation. When an abstract interpre-tation is used for verification, it usually computes stronger invariants thanneeded. This situation commonly happens, because an abstract interpretationis usually designed and implemented to blindly estimate best possible invari-ants at each program point without considering global goal of the intendedverification, normally assuming that every aspect of a program potentially con-tributes to the properties of interest. However, in the verification of a specificsafety property, only some aspects of the program are usually needed. As aresult, the abstract interpretation results are likely to contain unnecessary in-formation for such verification. Actually, this situation results from a designchoice too, because we want one abstract interpretation to estimate invariantsonce for multiple verifications of different safety properties.

The goal of an abstract-value slicer is to weaken the computed invariantsuntil no information in the invariants is unnecessary for a specific verification.Mathematically, the abstract-value slicer lifts the result f of an abstract in-terpretation: it computes a new post fixpoint f ′ of the abstract step functionF in Section 2, such that f f ′, but f ′ is still strong enough to prove theproperties of interest. Intuitively, the “difference” between f and f ′ representsthe information filtered out from f by the abstract-value slicer.

6When the shortest path algorithm is applied, program variables are considered nodes in the graphand each DBM entry aj i is regarded as the weight of the edge from node x j to node xi .

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 15: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:15

In this section, we define the abstract-value slicers, and prove their correct-ness. First, we introduce extractor domain and back-tracers for atomic terms,which are two main components of an abstract-value slicer. An extractor do-main determines the working space of an abstract-value slicer, that is, a posetwhere the abstract-value slicer does the fixpoint computation, and back-tracersspecify how the abstract-value slicer treats atomic terms: they describe how theslicer filters out unnecessary information from the abstract interpretation re-sults for atomic terms. Then, we define an abstract-value slicer, and prove itscorrectness. Throughout the section, we assume a fixed abstract interpreta-tion, and denote its abstract domain and abstract semantics of atomic terms byA = (A, , ⊥, �) and [[−]], respectively.

3.1 Extractor Domain

An extractor domain for the abstract interpretation (A, [[−]]) is a pair (E , ex)where E is a complete lattice ( E , ⊥E , �E , �E , �E ) with finite height7 and exis a monotone map from E to upper closure operators on A.8 Intuitively, eachelement e in E denotes an “information extractor” that selects some informationfrom abstract values a inA, which is to be saved/preserved, and ex(e)(a) extractsinformation (to be saved/preserved) from a based on the extractor e. Note thatwe require ex(e) to be an upper closure operator, that is, a monotone functionthat satisfies extensiveness and idempotency requirements. The extensivenessrequirement means that the extracting operation ex(e)(a) lifts the value a. Whenan extractor e is applied to the abstract value a, it does not insert any newinformation, but only selects some information from a; thus, ex(e)(a) shouldhave less information than a (i.e., a ex(e)(a)). The idempotency requirementformalizes that the extraction by ex(e) is done all at once. We also point outthat ex should be monotone with respect to the order E on extractors. Thismonotonicity condition ensures that the order E means the “strength” of theinformation extractors in the reverse direction: if e E e′, then e′ extracts lessinformation than e.

We call E extractor lattice and ex extractor application. We often omit the sub-script −E in the lattice operators, E , ⊥E , �E , �E , �E , when the missing subscriptcan be recovered from context.

Example 3.1. We use the following extractor domain for evenness analysis:

E def= ℘(Vars) (ordered by ⊇) and ex(e)(a) def= λx. if x ∈ e then a(x) else �e.

In this extractor domain, each abstract value a is regarded as the conjunctionof information “x �→a(x)” for all x ∈ Vars, and the extractors in E indicate whichinformation should be selected from such conjunction. For instance, an abstractvalue [x �→even, y �→even, z �→even] is regarded as even(x) ∧ even( y) ∧ even(z),where the predicate even(x) asserts that x is even, and the extractor {x, y} ex-presses that only the first and second conjuncts, even(x) ∧ even( y), should be

7We don’t need to require the completeness of E , since all lattices with finite height are complete.However, we put the completeness requirement explicitly here to simplify presentation.8An upper closure operator ρ on A is a monotone function on A such that ρ is extensive (i.e., id ρ)and idempotent (i.e., ρ ◦ ρ = ρ).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 16: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:16 • S. Seo et al.

selected. Note that the definition formalizes such selection using the top ele-ment �e of EV: all the unselected information is replaced by �e, while the otherselected information remains as it is. The extractor ex(e) for evenness analysisis an upper closure operator: it is extensive because ∀a ∈ A. ∀x ∈ Vars. a(x) ex(e)(a)(x), and idempotent because ∀a ∈ A. ex(e)(ex(e)(a)) = ex(e)(a).

Example 3.2. We construct an extractor domain (E , ex) for zone analysis,using a set of matrix indices as an information extractor. The idea is to use eachindex set e to specify which entries of the DBM matrices should be extracted.For each DBM matrix a, ex(e)(a) selects only the entries of a whose indices are ine, and it fills in the other missing entries by ∞. For example, when the extractor{(2, 1)} is applied to the DBM a0 in Figure 2(b), it filters out all entries exceptthe (2, 1)-th entry, and results in the below DBM, which means x1 − x2 ≤ 0.

x0 x1 x2

x0 ∞ ∞ ∞x1 ∞ ∞ ∞x2 ∞ 0 ∞

Let N be the number of program variables, so that the domain M of DBMsconsists of (N +1)×(N +1) matrices, and let I be the index set (N +1)×(N +1).The precise definition of E and ex is given below:

E def= 〈℘(I ), ⊇, I, ∅, ∩, ∪〉 and (ex(e)(a))i jdef=

{aij if i j ∈ e∞ otherwise.

Note that the extractor lattice uses the superset order; thus, a smaller extractorselects more matrix entries from the input DBM than a bigger one.

3.2 Back-Tracers

Let (E , ex) be an extractor domain for the abstract interpretation (A, [[−]]). Foreach atomic term t, define prepost(t) by

prepost(t) def= {(a, b) | a, b ∈ A ∧ [[t]]a b},which means the pre and post conditions of the triples {a}t{b} for t that can beproved by the abstract interpretation.

Definition 3.3 (Back-Tracer). A back-tracer k for an atomic term t is a func-tion of type prepost(t) → E → E that satisfies the following soundness condition:

∀(a, b) ∈ prepost(t). ∀e, e′ ∈ E . (kab(e) = e′) =⇒ [[t]](ex(e′)(a)) ex(e)(b).

The back-tracer kab at (a, b) transforms a post extractor e (for b) to a pre extrac-tor e′ (for a). The soundness condition ensures that the e′-part of a is sufficientto get the e-part of b in the abstract interpretation.

A back-tracer induces a map from {ex(e)(b) | e ∈ E} to {ex(e)(a) | e ∈ E}, whenit satisfies

ex(e)(b) = ex(e′)(b) =⇒ ex(kab(e))(a) = ex(kab(e′))(a).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 17: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:17

The domain and codomain of this map represent pieces of information fromb and a, respectively, so the map indicates that the back-tracer transforms apiece of information of b to another piece of information of a.

Note that the back-tracer kab for an atomic term t is not required to be mono-tone. Thus, it is relatively easy to design one correct back-tracer. For example,suppose that ex(⊥) is the identity function.9 Then, for every e ∈ E , there existse′ ∈ E such that

[[t]](ex(e′)(a)) ex(e)(b),

because ⊥ could be e′. We can now define kab(e) to be one such e′. However,designing a good back-tracer, not just correct one, is a nontrivial problem, andrequires insights about the abstract interpreter and the extractor domain.

Our next result, Proposition 3.6, is concerned with this issue of designinggood back-tracers. It gives a sufficient and necessary condition for the existenceof the best back-tracers. In Section 4, we will discuss this issue further, andprovide general techniques for implementing good back-tracers, including thebest ones.

Definition 3.4 (Best Back-Tracer). A back-tracer k for an atomic term t isbest if and only if it is the greatest back-tracer for t:10 for all back-tracers k′

for t,

∀(a, b) ∈ prepost(t). ∀e ∈ E . k′ab(e) kab(e).

LEMMA 3.5. A function k : prepost(t) → E → E is the best back-tracer for tif and only if for all (a, b) ∈ prepost(t) and all e, e′ ∈ E ,11

e′ kab(e) ⇐⇒ [[t]](ex(e′)(a)) ex(e)(b).

PROOF. First, we prove the only-if direction. Suppose that k is the best back-tracer for t, and choose arbitrary (a, b) ∈ prepost(t) and e, e′ ∈ E . We need toshow the equivalence:

e′ kab(e) ⇐⇒ [[t]](ex(e′)(a)) ex(e)(b).

Suppose that the left-hand side of the equivalence holds. Then,

[[t]](ex(e′)(a)) [[t]](ex(kab(e))(a)) (by the monotonicity of [[t]] and ex(−)(a)) ex(e)(b) (by the soundness of the back-tracer k).

Thus, the right-hand side of the equivalence holds as well. Now suppose thatthe right-hand side of the equivalence holds. Define a function k′ as follows:

k′ : prepost(t) → E → Ek′

cd (e0) def= if ((c, d , e0) = (a, b, e)) then e′ else kcd (e0).

9This supposition holds for all the examples in this paper.10This definition can be rephrased as follows. A function k is the best back-tracer for t if and onlyif k is the least upper bound of all back-tracers for t and k itself is a back-tracer for t.11The equivalence between the order relationships is reminiscent of Galois connection. We cannotuse Galois connection directly here, because kab: E → E and [[t]]:A → A are not type-correct forGalois connection. We resolve this type error using functions ex(−)(a) ex(−)(b) of type E → A.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 18: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:18 • S. Seo et al.

k′ is a back-tracer because it satisfies the condition for back-tracers: when thearguments are (a, b, e), k′

ab(e) = e′ holds and the right-hand side of the equiv-alence directly means the condition for back-tracers, and when the argumentsare not (a, b, e), k′ is the same as k, which is already a back-tracer. Since k is abest back-tracer, k′ k holds, which implies the left-hand side of the equiva-lence as follows:

e′ = k′ab(e) kab(e).

Next, we prove the if direction. Suppose that k satisfies the equivalence inthe lemma. Then, for all back-tracers k′ for t, and all (a, b) ∈ prepost(t) ande ∈ E ,

[[t]](ex(k′ab(e))(a)) ex(e)(b) by the soundness of back-tracer k′,

and so, by the equivalence in the lemma, k′ab(e) kab(e). We just have shown

that k′ is smaller than or equal to k. It remains to show that k is a back-tracerfor t, that is, it satisfies the soundness requirement for back-tracers for t. Toshow the requirement, consider arbitrary (a, b) ∈ prepost(t) and e ∈ E . Let e′ bekab(e). Since e′ kab(e), the equivalence in the lemma gives

[[t]](ex(e′)(a)) ex(e)(b).

Note that this order relationship is precisely the soundness requirement forback-tracers for t.

PROPOSITION 3.6. An atomic term t has the best back-tracer if for all a ∈ A,the function λe.[[t]](ex(e)(a)) : E →m A preserves all finite joins. Moreover, whenex(⊥) is the identity function on A, the converse holds as well.

PROOF. First, we prove the if direction. Suppose that for all a ∈ A, functionλe.[[t]](ex(e)(a)) of type E →m A preserves all finite joins. Define a function k asfollows:

k : prepost(t) → E → Ekab(e) def=

⊔{e0 | [[t]](ex(e0)(a)) ex(e)(b)}.

We will show that k satisfies the following equivalence in Lemma 3.5: for alle, e′ ∈ E ,

e′ kab(e) ⇐⇒ [[t]](ex(e′)(a)) ex(e)(b).

The right-to-left implication follows from the definition of k. For the left-to-rightimplication, it is sufficient (because λe.[[t]](ex(e)(a)) is monotone) to prove thefollowing condition:

e′ = kab(e) =⇒ [[t]](ex(e′)(a)) ex(e)(b).

Suppose that e′ = kab(e). Since the extractor domain has finite height, thereexists a finite nonempty subset E0 of {e0 | [[t]](ex(e0)(a)) ex(e)(b)} such that e′ =⊔

E0. Note that by the assumption of the if direction, function λe. [[t]](ex(e)(a))preserves finite joins. From this join preservation and the choice of E0, we

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 19: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:19

derived the required implication:

[[t]](ex(e′)(a)) = [[t]](ex(⊔

E0)(a)) (since e′ = ⊔E0)

= ⊔e0∈E0

[[t]](ex(e0)(a)) (by the join preservation of [[t]](ex(−)(a)))

ex(e)(b) (since ∀e0 ∈ E0. [[t]](ex(e0)(a)) ex(e)(b)).

Next, we show the only if direction, assuming that ex(⊥) is the identity onA. Suppose that t has the best back-tracer k. Then, it satisfies the equivalencein Lemma 3.5. Consider a finite family {ei}i∈I of extractors. Then, by the mono-tonicity of ex and [[t]], we have that

[[t]](

ex

( ⊔

i∈I

ei

)(a)

)!

i∈I

[[t]](ex(ei)(a)).

Thus, to show that the join in⊔

i∈I ei is preserved, we only need to prove theother order relationship. Let b be

⊔i∈I [[t]](ex(ei)(a)). Then,

∀i ∈ I. [[t]](ex(ei)(a)) b (by the definition of b)

⇐⇒ ∀i ∈ I. [[t]](ex(ei)(a)) ex(⊥)(b) (since ex(⊥) is the identity)

⇐⇒ ∀i ∈ I. ei kab(⊥) (by the equivalence in Lemma 3.5)

⇐⇒( ⊔

i∈I

ei

) kab(⊥)

⇐⇒ [[t]](

ex

( ⊔

i∈I

ei

)(a)

) ex(⊥)(b) (by the equivalence in Lemma 3.5)

⇐⇒ [[t]](

ex

( ⊔

i∈I

ei

)(a)

) b (since ex(⊥) is the identity).

We have just shown the required order relationship.

Example 3.7. We define a back-tracer for each atomic term for evennessanalysis. Recall the abstract domainP of evenness analysis in Example 2.1, andthe extractor domain (E , ex) for the analysis in Example 3.1. For a back-tracerfor an atomic term t, we use (|t|) for notational convenience. The back-tracer (|t|)for each atomic term t in this domain is defined as follows:

(|x := 2E|)ab(e) def= e−{x},(|x := y |)ab(e) def= if x ∈ e then (e−{x}) ∪ { y} else e,(|x := E|)ab(e) def= e−{x} (for all the other assignments),

(|skip|)ab(e) def= e,(|E ≤ E ′|)ab(e) def= e,

(|even?(x)|)ab(e) def= if (even e b(x)) then (e − {x}) else e.

The back-tracer for every assignment x := E has the same pattern. Given anextractor e, it first deletes x from e, and then adds (to the resulting extractor) thevariables used in the abstract semantics. Note that this is similar to the DEF-USE calculation in the conventional data-flow analysis. The main differenceis that the back-tracer computes the DEF-USE for the abstract semantics, not

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 20: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:20 • S. Seo et al.

for the concrete semantics. For instance, when the back-tracer (|x := y + z|)abis applied to the extractor {x, v}, it deletes x from the extractor and returns {v}.The variables y , z are not added to the extractor even though they are readby assignments in the concrete semantics. This is because y , z are not used bythe abstract semantics of the assignments. The back-tracer for evenness testeven?(x) uses the analysis results critically. The function [[even?(x)]] refines theinput a by replacing a(x) by the minimum of a(x) and even. Thus, for pre andpost conditions (a, b) ∈ prepost(even?(x)) (i.e., [[even?(x)]]a b), if even e b(x),then the x component of a is not necessary to obtain the x component of b. Theback-tracer (|even?(x)|)ab correctly captures this using the analysis result b; itfirst tests whether even e b(x), and if so, it deletes x from the given extractor e.

Example 3.8. Recall the abstract domain M for zone analysis in Exam-ple 2.2 and the extractor domain (E , ex) for the analysis in Example 3.2. Theback-tracer (|t|) for an atomic term t in this extractor domain should be a param-eterized index-set transformer that satisfies the following condition: for all preand post conditions (a, b) ∈ prepost(t) (i.e., [[t]]a b) and all extractors e for b,the computed index set (|t|)ab(e) contains (the indices of) all the entries of a thatare necessary for obtaining the e entries of b. We define such a back-tracer (|t|)as a two-step computation. First, (|t|)ab(e) deletes all the indices i j from e thatsatisfy bij = ∞ (i.e., e−{i j | bij = ∞}). Then, for each remaining i j -th entry of e,(|t|)ab(e) computes the entries of a that are needed for obtaining ([[t]]a)i j , collectsall the computed entries, and returns the set of the collected indices. Note thatall the deleted indices i j in the first step select only the empty information fromb: for all index sets e, we have that exb(e) = exb(e − {i j }). Thus, the first steponly makes e have a better representation e′ (i.e., e′ ⊆ e), without changing itseffect on b. Another thing to note is that the second step is concerned with onlya and [[t]]a, but not b. Here the second step exploits the fact that to get the e′

entries of b, we need only the e′ entries of [[t]]a.The actual implementation (|t|) for each atomic term t optimizes the generic

two-step computation, and it is shown in Figure 4. The most interesting partis the last case xi := E. The back-tracer (|xi := E|)ab(e) first checks whether theinput matrix a has a negative cycle, that is, a sequence k0k1 . . . kn of integers in[0, N ] such that k0 = kn, n ≥ 1, and

∑n−1m=0 akmkm+1 < 0. If a has a negative cycle,

(|xi := E|)ab(e) picks a shortest such cycle (i.e., one with smallest n), and returnsthe set of all the “edges” kmkm+1 in the cycle. If a does not have a negativecycle, (|xi := E|)ab(e) eliminates all the indices kl from e such that bkl = ∞ ori = k ∨ i = l . Then, for each remaining index kl in e, (|xi := E|)ab(e) selects asequence k0 . . . kn of integers in [0, N ] such that k0 = k, kn = l , and(

n−1∑m=0

akmkm+1

)≤ (b)kl .

The formula on the left-hand side of the above inequality computes an upperbound of (a∗)kl , and the inequality means that this upper bound is still tightenough to prove that (a∗)kl ≤ (b)kl . The set of all the “edges” kmkm+1 in theseselected paths is the result of the back-tracer (|xi := E|)ab(e).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 21: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:21

Fig. 4. Back-tracers for atomic terms in zone analysis.

When (|xi := E|)ab(e) chooses a path from k to l in the second step, it usuallypicks one with the minimum weight, denoted mPath(a, k, l ).12 However, whenakl ≤ bkl , (|xi := E|)ab(e) selects a possibly different and shorter path kl . Notethat the selected path for kl here might be different from the path that theabstract interpretation has used to compute the kl -th entry of b. This showsthat (|xi := E|)ab(e) does not necessarily denote the part of a that the abstractinterpretation has used to obtain the e-part of b; instead it means the part of athat the abstract interpretation can use to get the e-part of b.

To see how the back-tracer works more clearly, consider the following preand post conditions of x3 := 0:13

[[x3 := 0]]

⎛⎜⎜⎜⎜⎜⎜⎝

x0 x1 x2 x3

x0 ∞ 4 3 ∞x1 −1 ∞ 4 ∞x2 −1 0 ∞ ∞x3 ∞ ∞ ∞ ∞

⎞⎟⎟⎟⎟⎟⎟⎠

⎛⎜⎜⎜⎜⎜⎜⎝

x0 x1 x2 x3

x0 ∞ 5 ∞ 0x1 ∞ ∞ 2 ∞x2 ∞ ∞ ∞ ∞x3 0 ∞ ∞ ∞

⎞⎟⎟⎟⎟⎟⎟⎠

.

Let a and b be, respectively, the left and right DBMs of this relation-ship. When the back-tracer (|x3 := 0|)ab is given a post extractor e ={(0, 1), (1, 2), (2, 1), (3, 0)} it first gets rid of (2, 1) and (3, 0) from e, because the(2, 1)-th entry of b has ∞ and the (3, 0)-th entry of b is generated by the as-signment x3 := 0. Then, the back-tracer (|x3 := 0|)ab computes paths for (0, 1)

12mPath(a, k, l ) is a path k0 . . . kn such that k0 = k, kn = l and⎛⎝ n−1∑

m=0

akmkm+1

⎞⎠ = (a∗)kl .

13The second DBM can arise, for instance, when the assignment x3 := 0 is executed at the end ofthe false branch of a conditional statement and the analysis result of the true branch is a DBMwhose (0, 2), (1, 0), (2, 0), (2, 1) entries are ∞ and (0, 1) entry is 5.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 22: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:22 • S. Seo et al.

and (1, 2) separately; for (0, 1), it picks the path 0, 1, because a01 ≤ b01; and forthe other index (1, 2), condition a12 ≤ b12 does not hold, and so the back-tracercomputes a path from 1 to 2 with the minimum weight, which is the sequence1, 0, 2. Finally, it returns the index set {(0, 1), (1, 0), (0, 2)} that consists of allthe edges in the computed two paths.

Back-tracers for all atomic terms induce a back-tracer for an entire programP = (V , E, ni, n f , L). Assume that we are given back-tracers (|t|) for all atomicterms t. Suppose that f and g are maps from program points of P to abstractvalues in A (i.e., f , g ∈ ∏

n∈V A) such that g approximates the abstract one-stepexecution from f : F ( f ) g for the abstract step function F for P . For such fand g , we define the back-tracer (|P |) f g for P to be the following function:

(|P |) f g :

( ∏n∈V

E)

→( ∏

n∈V

E)

(|P |) f g (ε)(n) def= � {(|L(nm)|) f (n)g (m)(ε(m)) | nm ∈ E

},

where∏

n∈V E is the cartesian product of lattices E , ordered pointwise. We callε ∈ ∏

n∈V E extractor annotation. The back-tracer (|P |) f g for P takes a postextractor annotation ε for g , and computes a pre extractor annotation ε′ forf , by first running given (|L(nm)|) f (n)g (m), and then combining all the resultingextractors at each program node.

Recall that back-tracers (|L(nm)|) for atomic terms L(nm) take only thosesubscripts ab that satisfy [[L(nm)]]a b. We note that when (|P |) calls(|L(nm)|) f (n)g (m), it always uses correct subscripts; for each nm ∈ E,

[[L(nm)]] f (n) ⊔{[[L(n′m)]] f (n′) | n′m ∈ E} (since nm ∈ E)= F ( f )(m) (by the definition of F ) g (m) (since F ( f ) g ).

We define exall to be the application of extractor annotations:

exall :( ∏

n∈V

E)

→m

(( ∏n∈V

A)

→m

( ∏n∈V

A))

exall(ε)( f ) def= λn∈V . ex(ε(n))( f (n)).

The following lemma shows that the back-tracer (|P |) f g computes a correct preextractor annotation.

LEMMA 3.9. For all ε in∏

n∈V E , if (|P |) f g (ε) = ε′,

F (exall(ε′)( f )) exall(ε)(g ).

PROOF. To show the lemma, pick an arbitrary program point n from V . Then,for all m such that mn ∈ E,

([[L(mn)]] f (m)) ( ⊔

m′n∈E

([[L(m′n)]] f (m′)))

(since mn ∈ E)

= (F ( f )(n)) (by the definition of F )

g (n) (by the assumption F ( f ) g ).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 23: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:23

Let e′ be (|L(mn)|) f (m)g (n)(ε(n)). By what we have derived above and the definitionof back-tracers for L(mn), we have that

[[L(mn)]](ex(e′)( f (m))) ex(ε(n))(g (n)).

We now prove the required inequality as follows.

F (exall((|P |) f g (ε))( f ))(n)

=⊔

mn∈E

([[L(mn)]]((exall ((|P |) f g (ε))( f ))(m))) (by the definition of F )

=⊔

mn∈E

([[L(mn)]](ex((|P |) f g (ε)(m))( f (m))))

(since exall(ε′)( f )(m) = ex(ε′(m))( f (m)) for all ε′)

=⊔

mn∈E

([[L(mn)]]

(ex

(�{(|L(mn′)|) f (m)g (n′)(ε(n′)) | mn′ ∈ E})( f (m))

))(by the definition of (|P |) f g )

mn∈E

([[L(mn)]](ex((|L(mn)|) f (m)g (n)(ε(n)))( f (m))))

(since mn ∈ E, and [[L(mn)]] and ex(−)( f (m)) are monotone)

⊔mn∈E ex(ε(n))(g (n))

(since ∀e ∈ E . ex(e)(g (n)) ! [[L(mn)]](ex((|L(mn)|) f (m)g (n)(e))( f (m)))

= ex(ε(n))(g (n)).

3.3 Abstract-Value Slicer SL

We now define an abstract-value slicer, assuming that we are given two com-ponents of the slicer, namely, an extractor domain (E , ex) and back-tracers (| − |)for all atomic terms in this domain. Suppose that we are given a programP = (V , E, ni, n f , L). Let F be the abstract one-step execution of P in theabstract interpretation, and let postfix(F ) be { f | F ( f ) f }, the set of postfixpoints of F .

Definition 3.10. The abstract-value slicer SL for the program P is the func-tion defined as follows:

SL :(

postfix(F ) ×∏n∈V

E)

→( ∏

n∈V

E)

SL( f , ε) def= let Bf = λε′.(ε′ � (|P |) f f ε′) and

k = min{n | n ≥ 0 ∧ Bn

f (ε) = Bn+1f (ε)

}in Bk

f (ε).

Intuitively, the first input f to the slicer denotes the result of the abstractinterpretation, and the second ε specifies the part of f that is used for verifi-cation; although exall(ε)( f ) is weaker than f , it is still strong enough to verifythe property of interest. Given such f and ε, the slicer SL defines a reductive

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 24: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:24 • S. Seo et al.

function14 Bf on∏

n∈V E , and then computes its fixpoint ε′ such that ε′ ε,by repeatedly applying Bf from ε. Note that SL always succeeds in computingsuch ε′, because the domain

∏n∈V E of Bf has finite height. The result of the

slicer SL( f , ε) is this computed fixpoint ε′.15

The result SL( f , ε) of the abstract-value slicer satisfies the following twoimportant properties, which together ensure the correctness of the slicer:

(1) SL( f , ε) ε, and(2) exall(SL( f , ε))( f ) is a post fixpoint of F .

The first property means that SL( f , ε) extracts at least as much informationas ε, so that if a property of P can be verified by exall(ε)( f ), it can also be ver-ified by exall(SL( f , ε))( f ). The second property means that exall(SL( f , ε))( f ) isanother possible solution of the abstract interpretation, which could have beenobtained if the abstract interpretation used a different strategy for computingpost fixpoints. Note that the first property holds because Bf is reductive andthe fixpoint computation of the slicer starts from ε. For the second property, weprove a slightly more general lemma, by using the soundness of the back-tracer(|P |) (Lemma 3.9).

LEMMA 3.11. For all f ∈ postfix(F ) and all ε′ ∈ ∏n∈V E ,

Bf (ε′) = ε′ =⇒ F (exall(ε′)( f )) exall(ε′)( f ).

PROOF. To show the lemma, choose an arbitrary post fixpoint f of F and anextractor annotation ε′ for f such that Bf (ε′) = ε′. Then,

(|P |) f f ε′ ! Bf (ε′) (by the definition of Bf )= ε′ (by assumption).

Using what we have just shown above, we prove the required inequality asfollows:exall(ε′)( f ) ! F (exall((|P |) f f ε

′)( f )) (by Lemma 3.9)! F (exall(ε′)( f )) (since ε′ (|P |) f f ε

′, and F and exallare monotone).

We summarize what we have just proved in the following proposition.

PROPOSITION 3.12 (CORRECTNESS). For all f ∈ postfix(F ) and all ε ∈ ∏n∈V E ,

the slicer SL( f , ε) terminates (i.e., it is a well-defined total function), and itoutputs ε′ such that ε′ ε and F (exall(ε′)( f )) exall(ε′)( f ).

Example 3.13. Consider the following result from evenness analysis:

x �→ �e

y �→ �ey := 2y; x �→ �e

y �→ evenx := 2y; x �→ even

y �→ eveny:=x

x �→ even

y �→ even

14A function f on a poset C is reductive iff f (x) x for all x ∈ C.15In general, the result of SL( f , ε) is not the greatest fixpoint ε′ of Bf satisfying the condition ε′ ε.In fact, such greatest fixpoints ε′ might not even exist. However, if (|t|)ab is monotone for all atomicterms t, so that (|P |) f g is monotone, then the result of SL( f , ε) is the greatest fixpoint ε′ of Bf thatsatisfies the condition. In this case, the result of SL( f , ε) is the greatest fixpoint of λε′. ε � (|P |) f f (ε′).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 25: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:25

Suppose that we have used the analysis in order to verify that variable y storesan even integer at the end. The following extractor annotation expresses thisverification goal:

{} y := 2y; {} x := 2y; {} y := x { y}

When the abstract-value slicer for evenness analysis is given the precedinganalysis result and extractor annotation, it returns the extractor annotationfollowing:

{} y := 2y; {} x := 2y; {x} y := x { y}

Thus, the original analysis result is sliced to:

x �→ �e

y �→ �ey := 2y; x �→ �e

y �→ �ex := 2y; x �→ even

y �→ �ey := x

x �→ �e

y �→ even

Note that the sliced result correctly expresses that the only necessary informa-tion for the verification is the evenness of x after x := 2y and the verificationgoal at the end.

Example 3.14. Figure 5 shows the result of the abstract-value slicing forzone analysis. Figure 5(b) shows the input to the slicer; the DBMs in the sec-ond row describe the result of zone analysis, and the extractors in the first rowspecify that only the (2, 1), (1, 2)-th entries of the DBM at n4 are used for ver-ification. Figure 5(c) shows the sliced result; the first row describes the resultof the abstract-value slicer for this input, and the second row describes theapplication of the obtained extractors to the abstract interpretation result. Forthis example, this table indicates that among 19 non-∞ DBM entries in theabstract interpretation result, only 5 entries are needed to prove the propertyof interest. Finally, Figure 5(d) expresses the abstract interpretation result andits slice in the form of constraints.

4. METHODS FOR DESIGNING BACK-TRACERS FOR ATOMIC TERMS

In this section, we provide two methods for designing back-tracers for atomicterms. As explained in Section 3.2, it is relatively easy to define a correct back-tracer for an atomic term t. However, designing a good back-tracer for t isdifficult, and requires special knowledge about the abstract interpretation thatis used. The first method in the section aims at producing accurate back-tracersfor atomic terms: a slicer with the back-tracer that is produced usually filtersout more information from the abstract interpretation result, than the one withnaively designed back-tracers. The second method, on the other hand, aimsat a back-tracer with low cost on time and space. Throughout the section, weassume a fixed abstract interpretation that uses an abstract domain (A, , ⊥, �)and an abstract semantics [[−]] for atomic terms. We also assume that a fixedextractor domain (E , ex) is given for this abstract interpretation, and that E isfinite.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 26: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:26 • S. Seo et al.

Fig. 5. Result of abstract-value slicing for the program in Figure 2.

4.1 Best Back-Tracer Construction

The first method constructs the best back-tracer for each atomic term, consid-ered in Proposition 3.6. It is a slight modification of a rather well-known “re-versing technique” [Hughes and Launchbury 1992; Duesterwald et al. 1995].Let t be an atomic term, and let (a, b) be a pair of pre and post conditions inprepost(t). For these t, a, b, the method defines the back-tracer (|t|)ab as follows:

(|t|)ab(e) def=⊔

{e0 ∈ E | [[t]](ex(e0)(a)) ex(e)(b)}.ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 27: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:27

Intuitively, (|t|)ab(e) is the “conjunction” of all correct pre extractors: (|t|)ab(e)selects some information from a, precisely when all the correct pre extractorsselect the same information. Note that for every correct pre extractor e0, thecomputed extractor e′ = (|t|)ab(e) filters out at least as much information from aas e0 (i.e., ex(e0)(a) ex(e′)(a)), and so, it induces a better slice of a than e0.

In addition to the bestness of the method, we need to check whether themethod constructs a correct back-tracer. Unfortunately, this method does notalways construct a correct back-tracer; in general, the constructed (|t|)ab does notsatisfy the following soundness condition from the definition of back-tracers:

∀e, e′ ∈ E . (|t|)ab(e) = e′ =⇒ [[t]](ex(e′)(a)) ex(e)(b).

To make the above soundness condition hold, we should restrict the use of themethod for join-preserving functions as the solutions in other studies [Hughesand Launchbury 1992; Duesterwald et al. 1995]: we use the constructed (|t|)ab,only when [[t]] and ex(−)(a) preserve finite joins.

LEMMA 4.1. If [[t]] and λe. ex(e)(a) preserve finite joins for all a, then (|t|)satisfies the soundness condition for back-tracers for t. Moreover, in this case,(|t|)ab is the best back-tracer for t (which exists by Proposition 3.6).

PROOF. In this proof, we show that the method constructs a correct back-tracer by the join-preservation, and then show why it is a best back-tracer. Tosee why the join-preservation provides a solution, suppose that [[t]] and ex(−)(a)preserves finite joins. Then, their composition λe. [[t]](ex(e)(a)) also preservesfinite joins. So, for all extractors e, e′ ∈ E such that (|t|)abe = e′, we have that

[[t]](ex(e′)(a))

= [[t]](ex((|t|)ab(e)

)(a))

= [[t]](ex

( ⊔{e0 | [[t]](ex(e0)(a)) ex(e)(b)}

)(a)

)(by the definition of (|t|)ab)

=( ⊔

{[[t]](ex(e0)(a)) | [[t]](ex(e0)(a)) ex(e)(b)})

(by join preservation)

ex(e)(b).

This order relationship implies the correctness of (|t|)ab. Now, we prove that(|t|)ab is a best back-tracer. Consider a back-tracer k for t. Then, for all (a, b) ∈prepost(t) and e ∈ E ,

[[t]](ex(kab(e))(a)) ex(e)(b).

Thus, by the definition of (|t|), we have that kab(e) (|t|)ab(e), as required.

The very definition of (|t|)ab gives a default (usually inefficient) implementa-tion, if all the extractor applications are computable. When a post extractor efor b is given, the implementation calculates all the correct pre extractors fora, b, t, e; this is possible, because the extractor domain E is finite (by assump-tion) and [[t]] is computable. Then, the implementation returns the greatest onefrom the calculated extractors. This default implementation is, however, veryslow, and in many cases, it can be improved dramatically. We illustrate thisimprovement using zone analysis.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 28: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:28 • S. Seo et al.

Example 4.2. Consider zone analysis (M, , ⊥, �) and the extractor do-main (E , ex) in Example 3.2. In this case, the method in this section can beapplied to obtain the best back-tracer for an atomic term, if the term is eithera Boolean expression or an assignment of the form xi := xi + c; zone analysisinterprets all such atomic terms as join-preserving functions, and extractor ap-plication ex(−)(a) preserves finite joins for all a. In this example, we will explainhow to efficiently implement this best back-tracer.

Let t be an atomic term that is either a Boolean expression or an assignmentof the form xi := xi +c. Our implementation of the best back-tracer for t is basedon two important observations.

(1) First, no matter whether t is a Boolean expression or an assignment, the ab-stract semantics [[t]] of t is a pointwise transformation of DBM matrices; tocompute the i j -th entry of the output DBM, [[t]] uses at most the i j -th entryof the input DBM. More precisely, there exists a family { fi j }i j∈(N+1)×(N+1) ofmonotone functions on Ints ∪ {−∞, ∞} (ordered by ≤) such that

∀i j . ([[t]]a)i j = fi j (aij ).

(2) Second, when a function family { fi j }i j determines [[t]], it can be used tosimplify the “correctness condition” for pre and post extractors: for every(a, b) ∈ prepost(t), pre extractor e0 ∈ E and post extractor e ∈ E , we havethat

([[t]](ex(e0)(a)) ex(e)(b)) ⇐⇒ (∀i j . i j ∈ e ⇒ (i j ∈ e0 ∨ fi j (∞) ≤ bij )).

Intuitively, this simplified condition says that every index i j in the post ex-tractor e should belong to the pre extractor e0, except when t can “generate”the information bij (i.e., x j − xi ≤ bij ) without using the input DBM. To seethis, note that since fi j is monotone, fi j (∞) ≤ bij implies that for everyinput DBM a′, the i j -th entry of [[t]](a′) should be less than or equal to bij .Thus, no information from the input DBM is necessary for t to “produce”the i j -th entry of b.

We now use these two observations to optimize the best back-tracer for t:⊔

{e0 | [[t]](ex(e0)(a)) ex(e)(b)}=

⋂{e0 | [[t]](ex(e0)(a)) ex(e)(b)} (by the lattice structure of E)

=⋂

{e0 | ∀i j . i j ∈ e ⇒ (i j ∈ e0 ∨ fi j (∞) ≤ bij )} (by the second observation)

=⋂

{e0 | ∀i j . (i j ∈ e ∧ fi j (∞) �≤ bij ) ⇒ i j ∈ e0}=

⋂{e0 | {i j | i j ∈ e ∧ fi j (∞) �≤ bij } ⊆ e0

}= {i j | i j ∈ e ∧ fi j (∞) �≤ bij }= e − {i j | fi j (∞) ≤ bij }.Note that the obtained formula indicates the efficient implementation of thebest back-tracer (| f |)ab as a single set subtraction. Moreover, the subtracted setin the formula is a fixed set that does not depend on the post extractor e. Thisproperty can allow a further optimization of the set subtraction. In fact, the

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 29: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:29

back-tracers for Boolean expressions E ≤ E ′, assignments xi := xi + c andcommand skip in Figure 4 are such further optimizations.

Before finishing the discussion on the construction of best back-tracers, weconsider one special case that the extractor domain is an atomic lattice. Recall(from the standard lattice theory [Davey and Priestley 1990]) that an elementx in a lattice L is an atom if and only if it is the second smallest element in L:

x �= ⊥ ∧ (∀x ′ ∈ L. (x ′ x ∧ x ′ �= x) ⇒ x ′ = ⊥)),

and that a lattice L is atomic if and only if every element x in the lattice L canbe reconstructed by combining (by join) all the atoms x ′ such that x ′ x:

∀x ∈ L. x =⊔

{x ′ | x ′ x and x ′ is an atom}.The following proposition suggests that we can optimize the best back-tracer(|t|)ab when E is atomic.

PROPOSITION 4.3. When E is atomic, the best back-tracer (|t|)ab is identical tothe following function:

λe.⊔

{e0 ∈ E | [[t]](ex(e0)(a)) ex(e)(b) and e0 is an atom}.PROOF. Let kab be the function defined in the proposition. For all e0 ∈

{e0 ∈ E | [[t]](ex(e0)(a)) ex(e)(b) and e0 is an atom}, it follows e0 (|t|)ab(e) byLemma 3.5. Since kab(e) is the join of all such e0, we have that kab(e) (|t|)ab(e).For the other direction, consider an arbitrary e1 ∈ E such that

[[t]](ex(e1)(a)) ex(e)(b).

Note that such e1 can be (|t|)ab by the definition (Definition 3.3) of back-tracer.For all atoms e0 such that e0 e1,

[[t]](ex(e0)(a)

) [[t]](ex(e1)(a)) (by the monotonicity of [[ f ]] and ex(−)(a)) ex(e)(b) (by the choice of e1).

Thus, e0 kab(e). This implies that e1 kab(e) since E is atomic. So, (|t|)ab(e) kab(e).

4.2 Extension Method

The second method, called extension method, is a dual approach to the atomiclattice case considered in the previous section. Extension method assumestwo properties of the extractor domain. Recall (from the standard lattice the-ory [Davey and Priestley 1990]) that an element x in a lattice L is a dual atomif and only if it is the second biggest element in L:

x �= � ∧ (∀x ′ ∈ L. (x x ′ ∧ x ′ �= x) ⇒ x ′ = �)),

and that a lattice L is dual atomic if and only if every element x in the latticeL can be reconstructed by combining (by meet) all the dual atoms x ′ such thatx x ′:

∀x ∈ L. x =�{x ′ | x x ′ and x ′ is a dual atom}.ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 30: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:30 • S. Seo et al.

The first assumption of the extension method is that each extractor lattice Eis dual atomic, and the second assumption is that each extractor applicationpreserves all finite meets:

ex(�) = � and ex(e � e′) = ex(e) � ex(e′).Here � and � are the lattice operations for upper closure operators, and theyare defined pointwise.16 Intuitively, these two assumptions mean that everyextractor e in E represents a collection {e1, . . . , en} of dual atomic extractors; eextracts information from a by first applying each ei to a and then conjoiningall the resulting information:

ex(e)(a) = �i=1, . . . ,n

ex(ei)(a).

When the extractor domain satisfies the preceding assumptions, the exten-sion method provides a recipe for constructing correct back-tracers for all atomicterms. Let t be an atomic term and (a, b) a pair of pre and post conditions inprepost(t). The first step of the extension method is to define a partial back-tracer g : g is a partial function of type E ⇀ E such that (1) the domain of g isprecisely the set of dual atoms in E and (2) for all post extractors in dom(g ), gcalculates correct pre extractors:

∀e ∈ dom(g ). [[t]](ex(g (e))(a)) ex(e)(b).

The next step is to extend g to the following complete back-tracer:

(|t|)ab(e) def=�{g (e1) | e e1 and e1 is a dual atom}.The total back-tracer (|t|)ab(e) decomposes the post extractor e into dual atoms,and then applies g to all the obtained dual atoms; finally, it merges all theresulting pre extractors (by meet). Note that, since E is finite, the meet here isover the finite sets, and so, it is well defined. The following lemma shows thatthe constructed (|t|)ab is correct.

LEMMA 4.4. For all extractors e, e′ ∈ E , we have that

e′ = (|t|)ab(e) =⇒ [[t]](ex(e′)(a)) ex(e)(b).

PROOF. We prove the lemma as follows:

[[t]](ex((|t|)abe)(a))

= [[t]](ex

(�{g (e1) | e e1 and e1 is a dual atom})(a)

)(by the definition of (|t|)ab)

= [[t]](�{ex(g (e1))(a) | e e1 and e1 is a dual atom}

)(by the meet preservation of ex)

�{[[t]](ex(g (e1))(a)) | e e1 and e1 is a dual atom} (since [[t]] is monotone)

�{ex(e1)(b) | e e1 and e1 is a dual atom}(since [[t]](ex(g (e1))(a)) ex(e1)(b) for all dual atoms e1)

16Precisely, �(a) = � and (ex(e) � ex(e′))(a) = ex(e)(a) � ex(e′)(a).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 31: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:31

= ex(�{e1 | e e1 and e1 is a dual atom}

)(b)(by the meet preservation of ex)

= ex(e)(b) (since E is dual atomic).

The extension method has two advantages over the best back-tracer con-struction. First, the extension method usually provides a relatively efficientdefault implementation of the defined back-tracers. By “efficient,” we do notmean a linear-time algorithm, but simply mean a polynomial-time algorithm,instead of exponential-time algorithm. Suppose that we have defined a back-tracer (|t|)ab, by applying the extension method to a partial back-tracer g . Thedefault implementation of (|t|)ab uses an internal table T that records the graphof function g , and is defined as follows:

(* e is an input (a post extractor),e0 is an output (a pre extractor) *)

X := {e1 | e e1 and e1 is a dual atom};e0 := �;for each e1 ∈ Xdo

lookup e′1 s.t. (e1, e′

1) ∈ T ;e0 := e0 � e′

1od

Here we specify the implementation imperatively, in order to distinguish it fromthe definition of (|t|)ab. Note that this implementation simply follows the defini-tion of (|t|)ab without any special optimizations. However, the implementationis efficient. In the worst case, the set X contains all the dual atoms, and so,all the basic operations in the implementation, such as the look-up of table Tand the meet operation of extractors, are executed at most as many times asthe number of dual atoms in E . Even when the extractor lattice E is big, theycontain only smaller number of dual atoms; in many cases, even though thecardinality of an extractor lattice is exponential in the size of the program, thenumber of dual atoms in the lattice is polynomial in the program size. Thus, insuch cases, the implementation runs in polynomial time. Note that in the bestback-tracer construction, a naive implementation, which directly follows thedefinition, executes basic operations as many times as the size of the extractorlattice.

Second, the extension method is more widely applicable than the best back-tracer construction. The assumptions of the extension method are only aboutthe extractor domain, not about the abstract interpretation. Thus, once theextractor domain is well chosen, the method can be used to get back-tracers forall atomic terms. This contrasts with the best back-tracer construction, whichcan be applicable to an atomic term t only when [[t]] preserves finite joins.

Example 4.5. For zone analysis and the extractor domain (E , ex) in Exam-ple 3.2, the extension method can be applied to all atomic terms; E is a dualatomic lattice, whose dual atoms are singleton index sets {i j }, and the extractor

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 32: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:32 • S. Seo et al.

application ex preserves finite meets. Here we apply the method to constructback-tracers for assignments xi:= E that were not handled in Example 4.2,namely those that are not of the form xi := xi + c. In fact, for such assignmentsxi := E, we cannot apply the best back-tracer construction, because [[xi := E]]does not preserve some finite joins.

Let a, b be DBMs such that [[xi := E]]a b. To apply the extension method,we need to define a correct partial back-tracer g for xi := E at a and b. We definesuch g by doing the case analysis on the input DBM a. When a contains a cyclewith negative weight, we pick one such cycle pickNegCycle(a) = k0k1 . . . kn in a,and define g to be a constant function λe.{k0k1, k1k2, . . . , kn−1kn}. Otherwise,that is, when a does not contain a negative cycle, we define g as follows, usingpaths with minimum weight:

g ({kl }) def= if (bkl = ∞ ∨ k = i ∨ l = i)then {}else (if akl ≤ bkl then {kl } else edges(mPath(a, k, l ))).

Here we used the subroutine edges in Figure 4, which takes a path k0k1 . . . kn

and returns the set {k0k1, k1k2, . . . , kn−1kn} of all the edges in the path. Thedefined function g first checks whether [[t]] needs to use the input to generatebkl . If so, g returns the entries of the input a that are necessary for generatingbkl . Otherwise, g returns the empty set.

Now the extension method gives the back-tracer (|xi := E|)ab defined as fol-lows:

(|xi := E|)ab(e) def=�{g (e1) | e e1 and e1 is a dual atom}.This definition can be optimized to the back-tracer for xi := E in Figure 4. Whena contains a negative cycle,

(|xi := E|)ab(e)=�{g (e1) | e e1 and e1 is a dual atom}=�{edges(pickNegCycle(a)) | e e1 and e1 is a dual atom}

(by the definition of g )

= edges(pickNegCycle(a)).

When a does not contain a negative cycle,

(|xi := E|)ab(e)=�{g (e1) | e e1 and e1 is a dual atom}=

⋃{g (e1) | e e1 and e1 is a dual atom} (by the lattice structure of E)

=⋃kl∈e

g ({kl })(since (e1 = {kl } for some kl ∈ e) ⇐⇒ (e e1 and e1 is a dual atom))

=⋃kl∈e

if (bkl = ∞ ∨ i = k ∨ i = l )then {}else (if akl ≤ bkl then {kl } else edges(mPath(a, k, l )))

(by the definition of g )

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 33: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:33

= let e′ = e − {kl | bkl = ∞ ∨ k = i ∨ l = i}in

⋃kl∈e′ if akl ≤ bkl then {kl } else edges(mPath(a, k, l ))

= let e′ = e − {kl | bkl = ∞} − {ik, ki | 0 ≤ k ≤ N }in

⋃kl∈e′ if akl ≤ bkl then {kl } else edges(mPath(a, k, l )).

Note that in both cases, the optimized definition coincides with the back-tracerfor (|xk := E|) in Figure 4.

5. EXPERIMENTS

We designed an abstract-value slicer for the full zone analysis [Mine 2001], andtested the efficiency of the resulting slicer in the context of proof generation.

5.1 Abstract-Value Slicer for the Full Zone Analysis

The full zone analysis is different from our simplified version in Example 2.2in two aspects. First, the full analysis additionally applies the DBM closureoperator −∗ (in Example 2.2) before all DBM joins in the analysis. Second, ithas better abstract semantics of atomic terms. For all the assignments xi :=E, if E does not have the form c, xi + c, or x j + c, our simplified analysisreplaces xi := E by a random assignment xi := ?, which chooses an integernondeterministically and assigns the chosen number to xi; then, the simplifiedanalysis defines [[xi := E]] to be the strongest postcondition transformer ofxi := ?. The full version, on the other hand, does not do such a replacement, butdefines more accurate abstract semantics of xi := E using interval analysis.Given an input DBM a, the full analysis first applies the closure −∗ to a, justlike the simplified analysis. But then, instead of updating all xi-related entriesby ∞, the full analysis estimates the range of the right-hand side expressionE of xi := E, using interval analysis. It projects a∗ into the following abstractvalue prj(a∗) in interval analysis

prj(a∗) = λx j . [−(a∗) j 0, (a∗)0 j ],

and runs [[E]](prj(a∗)) in interval analysis to obtain the (approximate) range[n, m] of E. Finally, using this obtained range of E, the full analysis updatesthe i0 and 0i entries of the input a∗, and returns the following DBM:(

(a∗)[ki �→∞, ik �→∞]1≤k �=i≤N)[0i �→m, i0 �→−n].

We designed an abstract-value slicer for the full zone analysis, by modifyingthe slicer for the simplified version in Examples 3.2 and 3.8. To deal with theadditional uses of the closure operator −∗, we defined the back-tracer β for −∗

as follows. For all a, b such that a∗ b,

βab : E → Eβab(e) def= if (hasNegCycle(a) = true)

then edges(pickNegCycle(a))else let e′ = e − {kl | bkl = ∞}

in⋃

kl∈e′ (if akl ≤ bkl then {kl } else edges(mPath(a, k, l ))).

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 34: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:34 • S. Seo et al.

Table I. Number of Sliced DBM Entries

Number of DBM entriesProgram (1)totala Extractedb (2)removedc (2)/(1) Slicing time

Insertionsort 92 22 70 76% 0.07Partitiond 120 46 74 62% 0.03Bubblesort 217 42 175 81% 0.11KMPe 463 133 330 72% 0.28Heapsort 817 181 636 78% 0.29

aNumber of non-∞ DBM entries in the results of zone analysis.bNumber of the DBM entries in (1) that are extracted (i.e., not changed) by the slicer.cNumber of the DBM entries in (1) that are changed to ∞ by the slicer.d Partition function in Quicksort.eKnuth-Morris-Pratt pattern matching algorithm.

The defined β was, then, inserted into the old slicer in Example 3.8, in order toback-trace newly added closure applications in the full analysis. To handle themodified abstract semantics [[x := E]], we designed a slicer for interval anal-ysis, which contains a back-tracer (|E|) for expression E. Then, using (|E|), wechanged the old back-tracer (|x := E|), in order to account for the new improvedabstract semantics of x := E.

5.2 Experimental Results

We implemented the full zone analysis, the abstract-value slicer, and the proofconstruction algorithm using our previous work [Seo et al. 2003]. In our ex-periment, we first executed the analysis with five array accessing programs,and obtained approximate invariants which are strong enough to show the ab-sence of array bounds errors. Then, we ran the slicer for each of the computedabstract interpretation results, and measured the number of invariants (i.e.,DBM entries) in the results that have been eliminated (i.e., replaced by ∞). Fi-nally, we applied the proof construction algorithm to both the original abstractinterpretation results and their sliced versions, and measured how much theslicer reduced the size of the constructed proofs.

Table I shows the number of invariants that have been sliced out by theabstract-value slicer. The second column, labeled by “total,” contains the num-ber of all the nontrivial DBM entries (i.e., entries that are not ∞) in the re-sult of the abstract interpreter, and the fourth column, labeled by “removed,”shows how many of those nontrivial entries the slicer found unnecessaryfor verifying the absence of array bounds errors. The experimental resultshows that about 62% to 81% of computed invariants are not needed for theverification.

The reduction in the size of constructed proofs is shown in Table II. Theconstructed proofs are trees whose nodes express the application of Hoare logicrule or first-order logic rule. The nodes for first-order logic rules have differentsizes, depending on the first-order logic formulas that are contained in thenodes. Thus, for each constructed proof, we counted three entities: the nodesfor Hoare logic rules, the nodes for first-order logic rules, and the first-orderformulas. The abstract-value slicer did not reduce the number of Hoare logic

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 35: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:35

Table II. Reduction in the Proof Size

Before slicing After slicing Reduction inProgram (1)FOLa (2)formulasb (3)FOLc (4)formulasd (1)–(3)/(1) proof sizee

Insertionsort 248 2530 166 1122 33% 53%Partition 398 3866 201 1847 49% 52%Bubblesort 894 12230 389 2677 56% 76%KMP 1364 26898 653 7683 52% 70%Heapsort 2542 52370 1028 7936 60% 84%aNumber of nodes for first-order logic rules that appear in the proof tree for an original (unsliced) analysis result.bNumber of first-order formulas that appear in the proof tree for the original analysis result.cNumber of nodes for first-order logic rules that appear in the proof tree for a sliced analysis result.d Number of formulas that appear in the proof tree for the sliced analysis result.eHere the size of a proof counts all of applied Hoare logic rules, applied first-order logic rules, and first-orderlogic formulas in the proof.

rules, because Hoare rules are applied as many times as the number of programconstructs in the program, and the abstract-value slicer does not change theprogram. However, the slicer reduced the number of first-order logic rules andthe number of first-order formulas. In Table II, we show those numbers beforeand after slicing. The experimental result shows that in the proof trees for slicedanalysis results, about 33% to 60% less rules are used for showing implicationsbetween first-order logic formulas. In the seventh column of the table, we showthe reduction ratio in the size of the whole proofs. For each of the constructedproof trees, we add the number of the nodes and that of first-order formulas,and then, we compute the reduction ratio in this number. The experimentalresult shows that the proof trees for sliced analysis results are about 52% to84% smaller than those for original analysis results.

6. CONCLUSION

In this paper, we have presented a framework for abstract-value slicers thatweaken the abstract interpretation results. We have presented two designguides to define back-tracers for atomic terms that propagate the slicing in-formation of each atomic term backwards. In fact, designing a back-tracer is akey task in implementing an abstract-value slicer.

The motivating application of the slicer is to reduce the proof size in theproof construction method [Seo et al. 2003] that takes the program invariantscomputed by an abstract interpretation and produces a Hoare proof for theseinvariants. Since the slicer reduces the number of invariants to prove, it enablesus to have smaller proofs. In our experiment in constructing the proofs for theabsence of array bound violations in five small yet representative array-accessprograms, our slicing algorithm reduce the proofs’ sizes. In our experimentwith zone analysis, the slicer identified 62%–81% of the abstract interpretationresults as unnecessary, and resulted in 52%–84% reduction in the proof size.

Our abstract-value slicer has been targeted for one specific application, theconstruction of Hoare proofs from abstract interpretation results. For instance,our slicer guarantees that the sliced analysis results are post fixpoints ofthe abstract transfer function. Because of this guarantee, the following proof-construction phase does not have to call a (possibly expensive) theorem prover,

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 36: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:36 • S. Seo et al.

but it can instead rely on the soundness of the abstract interpretation only [Seoet al. 2003].

While this paper was being reviewed, Besson et al. [2007] have indepen-dently considered the problem of weakening abstraction interpretation resultsand developed an approach similar to our abstract value slicer. However, thefocus of their work is slightly different from ours. Their work emphasizes theissue of the existence of the weakest abstract interpretation results that provethe property of interest. On the other hand, this paper focuses on algorithmsfor weakening abstract interpretation results and a general framework for val-idating the soundness of such algorithms.

One interesting future direction is to disconnect the tie between the proof con-struction and our framework for abstract-value slicers, and revisit the frame-work. For instance, instead of asking the sliced analysis results to be post fix-points of abstract transfer functions, we might require them to be post fixpointsof concrete transfer functions. This might lead to a new formulation of abstract-value slicers, which is suitable for studying semantics-driven slicing.

Abstract-value slicers can be seen as algorithms for simplifying an abstractdomain without losing the abstract-interpretation based proof of a property ofinterest. Concretely, consider a join semi latticeD, a monotone function F :D →m

D, and abstract values d0, d ∈ D, such that

d0 d and F (d0) d0.

Here D represents an abstract domain for an entire program (not for a singleprogram point) and F an abstract transfer function. Abstract values d0 and ddenote an abstract-interpretation result and a property to verify, respectively,17

and the condition on d0 and d means that the abstract interpreter is able toprove d . In this setting, an abstract-value slicer can be considered to computean upper closure operator ρ on D, such that

ρ(d0) d and (ρ ◦ F )(ρ(d0)) ρ(d0) (equivalently, F (ρ(d0)) ρ(d0)).

That is, it simplifies the abstract domain D to ρ(D), such that the induced bestabstract transfer function ρ ◦ F in the simplified domain can still verify theproperty d , using ρ(d0). Moreover, the slicer attempts to make ρ as abstract aspossible.

The question about simplifying or compressing abstract domains has alreadybeen studied in the theory of abstract domain transformations [File et al. 1996;Giacobazzi and Ranzato 1997; Giacobazzi et al. 2000; Cortesi et al. 1998]. Itwould be interesting to see how the existing results can be used to give a newinsight for designing better abstract value slicers. We currently expect thatthe work on compressing abstract domains can answer when the most abstract(i.e., biggest) upper closure operator ρ satisfying the condition in the previousparagraph exists.

Finally, another interesting future direction is to develop more recipes forbuilding abstract-value slicers. In particular, by using the known techniques for

17Domain D amounts to∏

n∈V A in Section 2, and d0 and d correspond to f and exall(ε0, f ) inDefinition 3.10.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 37: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:37

constructing abstract interpretations systematically [Cousot and Cousot 1979;Giacobazzi et al. 2000; Giacobazzi and Ranzato 1999; Giacobazzi and Scozzari1998], we can provide corresponding systematic methods for building abstract-value slicers. Some preliminary results in this direction appear in Yang et al.[2006].

ACKNOWLEDGMENTS

We would like to thank David Schmidt, Alan Mycroft, Xavier Rival, DaejunPark and anonymous referees for their helpful comments.

REFERENCES

APPEL, A. W. 2001. Foundational proof-carrying code. In Proceedings of the IEEE Sympo-sium on Logic in Computer Science (LICS). IEEE Computer Society Press, Los Alamitos, 247–258.

APPEL, A. W. AND FELTY, A. P. 2000. A semantic model of types and machine instructions forproof-carrying code. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages (POPL). ACM Press, New York, 243–253.

BALL, T., MAJUMDAR, R., MILLSTEIN, T., AND RAJAMANI, S. K. 2001. Automatic predicate abstractionof C programs. In Proceedings of the ACM SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI). ACM Press, New York, 203–213.

BALL, T. AND RAJAMANI, S. K. 2001. Automatically validating temporal safety properties of in-terfaces. In Proceedings of the SPIN Workshop on Model Checking of Software. Lecture Notes inComputer Science (LNCS), vol. 2057. Springer-Verlag, 103–122.

BESSON, F., JENSEN, T., AND TURPHIN, T. 2007. Small witnesses for abstract interpretation-basedproofs. In Proceedings of the European Symposium on Programming (ESOP). Lecture Notes inComputer Science, vol. 4421. Springer-Verlag, 268–283.

BOURDONCLE, F. 1993. Abstract debugging of higher-order imperative languages. In Proceedingsof the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).ACM Press, New York, 46–55.

CLARKE, E. M., GRUMBERG, O., JHA, S., LU, Y., AND VEITH, H. 2000. Counterexample-Guidedabstraction refinement. In Proceedings of the International Conference on Computer-AidedVerification (CAV). Lecture Notes in Computer Science, vol. 1855. Springer-Verlag, 154–169.

CLARKE, E. M., GRUMBERG, O., AND PELED, D. A. 1999. Model Checking. The MIT Press.CORTESI, A., FILE, G., AND WINSBOROUGH, W. H. 1998. The quotient of an abstract interpretation.

Theor. Comput. Sci. 202, 1-2, 163–192.COUSOT, P. 1981. Semantic foundations of program analysis. In Program Flow Analysis: Theory

and Applications, S. Muchnick and N. Jones, Eds. Prentice-Hall, Inc., Englewood Cliffs, NJ,Chapter 10, 303–342.

COUSOT, P. 1998. The calculational design of a generic abstract interpreter. In Course notes for theNATO International Summer School Marktoberdorf (Germany) on Calculational System Design,M. Broy and R. Steinbruggen, Eds. NATO ASI Series F. IOS Press, Amsterdam.

COUSOT, P. 2005. Abstract interpretation. MIT course 16.399, http://web.mit.edu/16.

399/www/.COUSOT, P. AND COUSOT, R. 1977. Abstract interpretation: A unified lattice model for static anal-

ysis of programs by construction or approximation of fixpoints. In Proceedings of the ACMSIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM Press,New York, 238–252.

COUSOT, P. AND COUSOT, R. 1979. Systematic design of program analysis frameworks. In Pro-ceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages(POPL). ACM Press, New York, 269–282.

COUSOT, P. AND COUSOT, R. 1999. Refining model checking by abstract interpretation. Autom.Softw. Engin. 6, 1, 69–95.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 38: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

39:38 • S. Seo et al.

DAMS, D., GERTH, R., AND GRUMBERG, O. 1997. Abstract interpretation of reactive systems. ACMTrans. Program. Lang. Syst. 19, 2, 253–291.

DAVEY, D. A. AND PRIESTLEY, H. A. 1990. Introduction to Lattices and Order. Cambridge UniversityPress.

DAVIS, K. AND WADLER, P. L. 1990. Backwards strictness analysis: Proved and improved. In Func-tional Programming: Proceedings of the 1989 Glasgow Workshop. Springer-Verlag, 12–30.

DUESTERWALD, E., GUPTA, R., AND SOFFA, M. L. 1995. Demand-driven computation of interpro-cedural data flow. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages (POPL). ACM Press, New York, 37–48.

FILE, G., GIACOBAZZI, R., AND RANZATO, F. 1996. A unifying view of abstract domain design. ACMComput. Surv. 28, 2, 333–336.

GIACOBAZZI, R. AND MASTROENI, I. 2004. Abstract noninterference: parameterizing noninterfer-ence by abstract interpretation. In Proceedings of the ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages (POPL). ACM Press, New York, 186–197.

GIACOBAZZI, R. AND RANZATO, F. 1997. Refining and compressing abstract domains. In Proceedingsof the International Colloquium on Automata, Languages and Programming (ICALP). LectureNotes in Computer Science, vol. 1256. Springer-Verlag, 771–781.

GIACOBAZZI, R. AND RANZATO, F. 1999. The reduced relative power operation on abstract domains.Theor. Comput. Sci. 216, 1-2, 159–211.

GIACOBAZZI, R., RANZATO, F., AND SCOZZARI, F. 2000. Making abstract interpretations complete. J.ACM 47, 2, 361–416.

GIACOBAZZI, R. AND SCOZZARI, F. 1998. A logical model for relational abstract domains. ACM Trans.Program. Lang. Syst. 20, 5, 1067–1109.

GRAF, S. AND SAIDI, H. 1997. Construction of abstract state graphs with pvs. In Proceedings ofthe International Conference on Computer Aided Verification (CAV). Lecture Notes in ComputerScience, vol. 1254. Springer-Verlag, 72–83.

HAMID, N., SHAOI, Z., TRIFONOV, V., MONNIER, S., AND NI, Z. 2002. A syntactic approach to founda-tional proof-carrying code. In Proceedings of the IEEE Symposium on Logic in Computer Science(LICS). IEEE Computer Society Press, Los Alamitos, 89–100.

HENZINGER, T., JHALA, R., MAJUMDAR, R., AND SUTRE, G. 2002. Lazy abstraction. In Proceedingsof the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).ACM Press, New York, 58–70.

HENZINGER, T., JHALA, R., MAJUMDAR, R., AND SUTRE, G. 2003. Software verification with blast. InProceedings of the SPIN Workshop on Model Checking of Software. Lecture Notes in ComputerScience, vol. 2648. Springer-Verlag, 235–239.

HOARE, C. A. R. 1969. An axiomatic basis for computer programming. Comm. ACM 12, 10, 576–580.

HOWE, J. M., KING, A., AND LU, L. 2004. Analysing logic programs by reasoning backwards. InProgram Development in Computational Logic. Lecture Notes in Computer Science, vol. 3049.Springer-Verlag, 152–188.

HUGHES, J. 1988. Backwards analysis of functional programs. In Proceedings of the IFIP TC2Workshop on Partial Evaluation and Mixed Computation. Elsevier, 187–208.

HUGHES, J. AND LAUNCHBURY, J. 1992. Reversing abstract interpretations. In Proceedings of theEuropean Symposium on Programming (ESOP). Lecture Notes in Computer Science, vol. 582.Springer-Verlag, 269–286.

KING, A. AND LU, L. 2002. A backward analysis for constraint logic programs. Theory Prac. LogicProgr. 2, 4-5, 517–547.

MASSE, D. 2001. Combining forward and backward analyses of temporal properties. In Proceed-ings of the 2nd Symposium on Programs as Data Objects (PADO). Lecture Notes in ComputerScience, vol. 2053. Springer-Verlag, 103–116.

MINE, A. 2001. A new numerical abstract domain based on difference-bound matrices. In Pro-ceedings of the 2nd Symposium on Programs as Data Objects (PADO). Lecture Notes in ComputerScience, vol. 2053. Springer-Verlag, 155–172.

MORRISETT, G., WALKER, D., CRARY, K., AND GLEW, N. 1998. From System F to typed assembly lan-guage. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages (POPL). ACM Press, New York, 85–97.

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.

Page 39: Goal-Directed Weakening of Abstract Interpretation Resultsropas.snu.ac.kr/~kwang/paper/07-toplas-yaseyiha.pdf · Goal-Directed Weakening of Abstract Interpretation Results ... Goal-Directed

Goal-Directed Weakening of Abstract Interpretation Results • 39:39

NECULA, G. C. 1997. Proof-carrying code. In Proceedings of the ACM SIGPLAN-SIGACT Sym-posium on Principles of Programming Languages (POPL). ACM Press, New York, 106–119.

NECULA, G. C. AND LEE, P. 1997. Safe, untrusted agents using proof-carrying code. In SpecialIssue on Mobile Agent Security, G. Vigna, Ed. Lecture Notes in Computer Science, vol. 1419.Springer-Verlag, 61–91.

NECULA, G. C. AND RAHUL, S. P. 2001. Oracle-based checking of untrusted software. In Proceedingsof the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).ACM Press, New York, 142–154.

NECULA, G. C. AND SCHNECK, R. 2002. Proof-carrying code with untrusted proof rules. In SoftwareSecurity—Theories and Systems. Lecture Notes in Computer Science, vol. 2609. Springer-Verlag,283–298.

RIVAL, X. 2005a. Abstract dependences for alarm diagnosis. In Proceedings of the Asian Sym-posium on Programming Languages and Systems (APLAS). Lecture Notes in Computer Science,vol. 3780. Springer-Verlag, 347–363.

RIVAL, X. 2005b. Understanding the origin of alarms in ASTREE. In Proceedings of the In-ternational Static Analysis Symposium (SAS). Lecture Notes in Computer Science, vol. 3672.Springer-Verlag, 303–319.

SEO, S., YANG, H., AND YI, K. 2003. Automatic construction of Hoare proofs from abstract in-terpretation results. In Proceedings of the Asian Symposium on Programming Languages andSystems (APLAS). Lecture Notes in Computer Science, vol. 2895. Springer-Verlag, 230–245.

TIP, F. 1995. A survey of program slicing techniques. J. Program. Lang. 3, 3, 121–189.WADLER, P. AND HUGHES, R. J. M. 1987. Projections for Strictness Analysis. In Functional Pro-

gramming Languages and Computer Architecture, G. Kahn, Ed. Lecture Notes in ComputerScience, vol. 274. Springer, Berlin, 385–407.

YANG, H., SEO, S., YI, K., AND HAN, T. 2006. Off-line semantic slicing from abstract inter-pretation results. Tech. mem. ROPAS-2006-34, Programming Research Laboratory, School ofComputer Science & Engineering, Seoul National University. Available at http://ropas.

snu.ac.kr/lib/dock/YaSeYiHa2006.pdf.

Received December 2005; revised October 2006; accepted May 2007

ACM Transactions on Programming Languages and Systems, Vol. 29, No. 6, Article 39, Publication date: October 2007.