model checking from tools to theory rajeev alur university of pennsylvania 25mc, floc, august 2006

21
Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Upload: danielle-harrington

Post on 26-Mar-2015

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Model CheckingFrom Tools to Theory

Rajeev Alur

University of Pennsylvania

25MC, FLOC, August 2006

Page 2: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Automata/Logics

CS Theory

Algorithms/Complexity

Model Checking Databases Compilers Linguistic/NLP

How has model checking influenced basic theory?Connecting tree automata, fixpoint logics, parity gamesTemporal logics and automata over infinite wordsData structures: OBDDsTimed and hybrid automataMany more examples

Page 3: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Software Analysis

Analysis tool

Model checkingStatic analysisDeductive reasoningTestingRuntime monitoring

Product M

Specification SProgram P

Logics/automataAd-hoc patternsImplicit (built in tool)Program annotations

Page 4: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Checking Structured Programs Classical model checking: Both program and specification define

regular languages Control-flow requires stack, so program defines a context-free

language Algorithms exist for checking regular specifications against

context-free modelsEmptiness of pushdown automata is solvable

Product of a regular language and a context-free language is context-free

But, checking context-free spec against a context-free model is undecidable!

Context-free languages are not closed under intersection

Inclusion as well as emptiness of intersection undecidable

Existing software model checkers: pushdown models (Boolean programs) and regular specifications

SLAM, BLAST, F-SOFT, jMoped …

Even in absence of recursion, hierarchical structure retained for analysis

Page 5: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Are Context-free Specs Interesting? Classical Hoare-style pre/post conditions

If p holds when procedure A is invoked, q holds upon return

Total correctness: every invocation of A terminates

Integral part of emerging standard JML

Stack inspection properties (security/access control)

If setuuid bit is being set, root must be in call stack

Interprocedural data-flow analysis

All these need matching of calls with returns, or finding unmatched calls

Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free

Page 6: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Checking Context-free Specs Many tools exist for checking specific properties

Security research on stack inspection properties

Annotating programs with asserts and local variables

Inter-procedural data-flow analysis algorithms

What’s common to checkable properties?

Both model M and spec S have their own stacks, but the two stacks are synchronized, so product is possible

As a generator, program should expose the matching structure of calls and returns

Solution: Nested words and theory ofregular languages over nested words

Page 7: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Nested Words

Nested word: Linear sequence + well-nested edges

Positions labeled with symbols in

a1a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12

Positions classified as: Call positions: both linear and hierarchical successors

Return positions: both linear and hierarchical predecessors

Internal positions: otherwise

Page 8: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Program Executions as Nested Words

Program

global int x;bool P() { … x = 3; if Q x = 1 ; …}

bool Q () { local int y; … x = y; return (x==0);}

An executionas a word

Symbols: w : write xr : read xs : other

s

w

s

w

r

s

w

s

11

2

2

3

3

4

4

An executionas a nested word

s

w

s

w

r

s

w

s

1

2

3

4

Summary edges from calls to returns

Page 9: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

RNA as a Nested Word

Primary structure: Linear sequence of nucleotides (A, C, G, U)

Secondary structure: Hydrogen bonds between complementary nucleotides (A-U, G-C, G-U)

In literature, this is modeled as trees.

Algorithmic question: Find similarity between RNAs using edit distances

G

C

U

GA

A

U

AC

G C

G

C

U

C

G

Page 10: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Model for Linear Hierarchical Data Nested words: both linear and hierarchical structure is made

explicit. This seems natural in many applications

Executions of structured program

RNA: primary backbone is linear, secondary bonds are well-nested

XML documents: matching of open/close tags

Words: only linear structure is explicit

Pushdown automata add/discover hierarchical structure

Parantheses languages: implicit nesting edges

Ordered Trees: only hierarchical structure is explicit

Ordering of siblings imparts explicit partial order

Linear order is implicit, and can be recovered by infix traversal

Page 11: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Nested Word Automata (NWA)

a1 a2

a3 a4

a5 a6

a7 a8

a9

States Q, initial state q0, final states F

Starts in initial state, reads the word from left to right labeling edges with states, where states on the outgoing edges are determined from states of incoming edges

Transition function: c : Q x -> Q x Q (for call positions)

i : Q x -> Q (for internal positions)

r : Q x Q x -> Q (for return positions)

Nested word is accepted if the run ends in a final state

q8q7

q5q4

q3q2

q1q0

q9=r(q8,q29,a9)

q6=i(q5,a6)(q2,q29)=c(q1,a2)

q29

q47

Page 12: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Regular Languages of Nested Words

A set of nested words is regular if there is a finite-state NWA that accepts it

Nondeterministic automata over nested wordsTransition function: c: Qx->2QxQ, i :Q x -> 2Q, r:Q x Q x -> 2Q

Can be determinized

Appealing theoretical properties

Effectively closed under various operations (union, intersection, complement, concatenation, projection, Kleene-* …)

Decidable decision problems: membership, language inclusion, language equivalence …

Alternate characterization: MSO, syntactic congruences

Page 13: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Determinization

Goal: Given a nondeterministic automaton A with states Q, construct an equivalent deterministic automaton B

Intuition: Maintain a set of “summaries” (pairs of states)

State-space of B: 2QxQ

Initially, state contains q->q, for each q

At call, if state u splits into (u’,u’’), summary q->u splits into (q->u’,u’->u’’)

At return, summaries q->u’ and u’->w join to give q->u

Acceptance: must contain q->q’, where q is initial and q’ is final

q->qq’->q’…

q->uq’->v…

u’->u’’ v’-

>v’’…

u’->wu’->w’v’->w’’…

q->wq->w’q’->w’’…

q->u’q’->v’…

Page 14: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

MSO-based Characterization Monadic Second Order Logic of Nested Words

First order variables: x,y,z; Set variables: X,Y,Z…

Atomic formulas: a(x), X(x), x=y, x < y, x -> y

Logical connectives and quantifiers

Sample formula:

For all x,y. ( (a(x) and x -> y) implies b(y))

Every call labeled a is matched by a return labeled b

Thm: A language L of nested words is regular iff it is definable by an MSO sentence

Robust characterization of regularity as in case of languages of words and languages of trees

Page 15: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Application: Software Analysis

A program P with stack-based control is modeled by a set L of nested words it generates

Choice of depends on the intended application

Summary edges exposing call/return structure are added (exposure can depend on what needs to be checked)

If P has finite data (e.g. pushdown automata, Boolean programs, recursive state machines) then L is regular

Specification S given as a regular language of nested words

Verification: Does every behavior in L satisfy S ? Take product of P and complement of S and analyze

Runtime monitoring: Check if current execution is accepted by S (compiled as a deterministic automaton)

Model checking: Check if L is contained in S, decidable when P has finite data (no extra cost, as analysis still requires context-free reachability)

Page 16: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Temporal Logic of Nested Time: CaRet

Global paths, Local paths, Caller paths

Three versions of every temporal modality

Sample CaRet formulas:

(if p then local-next q) global-unless r

if p then caller-eventually q

Global-always (if p then local-eventually q)

Page 17: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Connection to Pushdown Automata

Note: First formalization of our ideas led to capturing the shape by typing of input symbols as calls, internals, and returns, and the class of Visibly Pushdown Languages (STOC’04) as a subclass of deterministic context-free languages with the same closure/decidability properties.

Pushdown AutomataWords Parse Trees

Nested Word Automata Nested wordsGenerator

Acceptor

Page 18: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

New Theory Problems

Congruences and minimization

First-order Temporal logics for nested time

Infinite nested words and -regular languages

Nested trees for branching-time verification (talk@CAV)

Fixpoint logics

Page 19: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Application: Document Processing

XML Document

<conference> <name> CAV 2006 </name> <location> <city> Seattle </city> <hotel> Sheraton </hotel> </location> <sponsor> MSR </sponsor> <sponsor> Cadence </sponsor></conference>

Model a document d as a nested word Nesting edges from <tag> to </tag>

Sample Query: Find documents related to conferences sponsored by Cadence inSeattle

Specify query as a regular language L of nested wordsAnalysis: Membership question Does document d satisfy query L ?

Use NWA instead of tree automata!(typically, no recursion, but only hierarchy)

Query Processing

Page 20: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Nested Words vs Ordered Trees

Why not use tree encoding and tree automata ?

Notion of regularity is basically the same in both views

Nested words are more flexible: can take prefix, suffix, word concatenation, being well-matched not a pre-req

Reading input from left to right for query processing is more natural with nested words

Many versions of tree automata with different properties

NWA have both top-down and bottom-up aspect, more succinct!

<a> <b> x y z </b> <c> </c></a>

a

x y z

b c

Page 21: Model Checking From Tools to Theory Rajeev Alur University of Pennsylvania 25MC, FLOC, August 2006

Recap

1. Program executions have both linear and hierarchical structureAllow specification logics/automata to refer to both structuresRobust foundation for more expressive specificationsAnalysis techniques for structured programs already based uopn computing summaries, so extra expressiveness comes for free

2. Nested words are worth theoretical investigationsNew way of looking at pushdown automataNested trees and fixpoint logicsTemporal modalities for nested time …

3. Modeling documents as nested words can be fruitfulAutomata have both top-down and bottom-up flavor

Better ways of querying streaming documents?

Talk based on joint work with P. Madhusudan

S. Chaudhuri, K. Etessami, M. Viswanathan …