model checking from tools to theory rajeev alur university of pennsylvania 25mc, floc, august 2006
TRANSCRIPT
Model CheckingFrom Tools to Theory
Rajeev Alur
University of Pennsylvania
25MC, FLOC, August 2006
Automata/Logics
CS Theory
Algorithms/Complexity
Model Checking Databases Compilers Linguistic/NLP
How has model checking influenced basic theory?Connecting tree automata, fixpoint logics, parity gamesTemporal logics and automata over infinite wordsData structures: OBDDsTimed and hybrid automataMany more examples
Software Analysis
Analysis tool
Model checkingStatic analysisDeductive reasoningTestingRuntime monitoring
Product M
Specification SProgram P
Logics/automataAd-hoc patternsImplicit (built in tool)Program annotations
Checking Structured Programs Classical model checking: Both program and specification define
regular languages Control-flow requires stack, so program defines a context-free
language Algorithms exist for checking regular specifications against
context-free modelsEmptiness of pushdown automata is solvable
Product of a regular language and a context-free language is context-free
But, checking context-free spec against a context-free model is undecidable!
Context-free languages are not closed under intersection
Inclusion as well as emptiness of intersection undecidable
Existing software model checkers: pushdown models (Boolean programs) and regular specifications
SLAM, BLAST, F-SOFT, jMoped …
Even in absence of recursion, hierarchical structure retained for analysis
Are Context-free Specs Interesting? Classical Hoare-style pre/post conditions
If p holds when procedure A is invoked, q holds upon return
Total correctness: every invocation of A terminates
Integral part of emerging standard JML
Stack inspection properties (security/access control)
If setuuid bit is being set, root must be in call stack
Interprocedural data-flow analysis
All these need matching of calls with returns, or finding unmatched calls
Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free
Checking Context-free Specs Many tools exist for checking specific properties
Security research on stack inspection properties
Annotating programs with asserts and local variables
Inter-procedural data-flow analysis algorithms
What’s common to checkable properties?
Both model M and spec S have their own stacks, but the two stacks are synchronized, so product is possible
As a generator, program should expose the matching structure of calls and returns
Solution: Nested words and theory ofregular languages over nested words
Nested Words
Nested word: Linear sequence + well-nested edges
Positions labeled with symbols in
a1a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12
Positions classified as: Call positions: both linear and hierarchical successors
Return positions: both linear and hierarchical predecessors
Internal positions: otherwise
Program Executions as Nested Words
Program
global int x;bool P() { … x = 3; if Q x = 1 ; …}
bool Q () { local int y; … x = y; return (x==0);}
An executionas a word
Symbols: w : write xr : read xs : other
s
w
s
w
r
s
w
s
11
2
2
3
3
4
4
An executionas a nested word
s
w
s
w
r
s
w
s
1
2
3
4
Summary edges from calls to returns
RNA as a Nested Word
Primary structure: Linear sequence of nucleotides (A, C, G, U)
Secondary structure: Hydrogen bonds between complementary nucleotides (A-U, G-C, G-U)
In literature, this is modeled as trees.
Algorithmic question: Find similarity between RNAs using edit distances
G
C
U
GA
A
U
AC
G C
G
C
U
C
G
Model for Linear Hierarchical Data Nested words: both linear and hierarchical structure is made
explicit. This seems natural in many applications
Executions of structured program
RNA: primary backbone is linear, secondary bonds are well-nested
XML documents: matching of open/close tags
Words: only linear structure is explicit
Pushdown automata add/discover hierarchical structure
Parantheses languages: implicit nesting edges
Ordered Trees: only hierarchical structure is explicit
Ordering of siblings imparts explicit partial order
Linear order is implicit, and can be recovered by infix traversal
Nested Word Automata (NWA)
a1 a2
a3 a4
a5 a6
a7 a8
a9
States Q, initial state q0, final states F
Starts in initial state, reads the word from left to right labeling edges with states, where states on the outgoing edges are determined from states of incoming edges
Transition function: c : Q x -> Q x Q (for call positions)
i : Q x -> Q (for internal positions)
r : Q x Q x -> Q (for return positions)
Nested word is accepted if the run ends in a final state
q8q7
q5q4
q3q2
q1q0
q9=r(q8,q29,a9)
q6=i(q5,a6)(q2,q29)=c(q1,a2)
q29
q47
Regular Languages of Nested Words
A set of nested words is regular if there is a finite-state NWA that accepts it
Nondeterministic automata over nested wordsTransition function: c: Qx->2QxQ, i :Q x -> 2Q, r:Q x Q x -> 2Q
Can be determinized
Appealing theoretical properties
Effectively closed under various operations (union, intersection, complement, concatenation, projection, Kleene-* …)
Decidable decision problems: membership, language inclusion, language equivalence …
Alternate characterization: MSO, syntactic congruences
Determinization
Goal: Given a nondeterministic automaton A with states Q, construct an equivalent deterministic automaton B
Intuition: Maintain a set of “summaries” (pairs of states)
State-space of B: 2QxQ
Initially, state contains q->q, for each q
At call, if state u splits into (u’,u’’), summary q->u splits into (q->u’,u’->u’’)
At return, summaries q->u’ and u’->w join to give q->u
Acceptance: must contain q->q’, where q is initial and q’ is final
q->qq’->q’…
q->uq’->v…
u’->u’’ v’-
>v’’…
u’->wu’->w’v’->w’’…
q->wq->w’q’->w’’…
q->u’q’->v’…
MSO-based Characterization Monadic Second Order Logic of Nested Words
First order variables: x,y,z; Set variables: X,Y,Z…
Atomic formulas: a(x), X(x), x=y, x < y, x -> y
Logical connectives and quantifiers
Sample formula:
For all x,y. ( (a(x) and x -> y) implies b(y))
Every call labeled a is matched by a return labeled b
Thm: A language L of nested words is regular iff it is definable by an MSO sentence
Robust characterization of regularity as in case of languages of words and languages of trees
Application: Software Analysis
A program P with stack-based control is modeled by a set L of nested words it generates
Choice of depends on the intended application
Summary edges exposing call/return structure are added (exposure can depend on what needs to be checked)
If P has finite data (e.g. pushdown automata, Boolean programs, recursive state machines) then L is regular
Specification S given as a regular language of nested words
Verification: Does every behavior in L satisfy S ? Take product of P and complement of S and analyze
Runtime monitoring: Check if current execution is accepted by S (compiled as a deterministic automaton)
Model checking: Check if L is contained in S, decidable when P has finite data (no extra cost, as analysis still requires context-free reachability)
Temporal Logic of Nested Time: CaRet
Global paths, Local paths, Caller paths
Three versions of every temporal modality
Sample CaRet formulas:
(if p then local-next q) global-unless r
if p then caller-eventually q
Global-always (if p then local-eventually q)
Connection to Pushdown Automata
Note: First formalization of our ideas led to capturing the shape by typing of input symbols as calls, internals, and returns, and the class of Visibly Pushdown Languages (STOC’04) as a subclass of deterministic context-free languages with the same closure/decidability properties.
Pushdown AutomataWords Parse Trees
Nested Word Automata Nested wordsGenerator
Acceptor
New Theory Problems
Congruences and minimization
First-order Temporal logics for nested time
Infinite nested words and -regular languages
Nested trees for branching-time verification (talk@CAV)
Fixpoint logics
Application: Document Processing
XML Document
<conference> <name> CAV 2006 </name> <location> <city> Seattle </city> <hotel> Sheraton </hotel> </location> <sponsor> MSR </sponsor> <sponsor> Cadence </sponsor></conference>
Model a document d as a nested word Nesting edges from <tag> to </tag>
Sample Query: Find documents related to conferences sponsored by Cadence inSeattle
Specify query as a regular language L of nested wordsAnalysis: Membership question Does document d satisfy query L ?
Use NWA instead of tree automata!(typically, no recursion, but only hierarchy)
Query Processing
Nested Words vs Ordered Trees
Why not use tree encoding and tree automata ?
Notion of regularity is basically the same in both views
Nested words are more flexible: can take prefix, suffix, word concatenation, being well-matched not a pre-req
Reading input from left to right for query processing is more natural with nested words
Many versions of tree automata with different properties
NWA have both top-down and bottom-up aspect, more succinct!
<a> <b> x y z </b> <c> </c></a>
a
x y z
b c
Recap
1. Program executions have both linear and hierarchical structureAllow specification logics/automata to refer to both structuresRobust foundation for more expressive specificationsAnalysis techniques for structured programs already based uopn computing summaries, so extra expressiveness comes for free
2. Nested words are worth theoretical investigationsNew way of looking at pushdown automataNested trees and fixpoint logicsTemporal modalities for nested time …
3. Modeling documents as nested words can be fruitfulAutomata have both top-down and bottom-up flavor
Better ways of querying streaming documents?
Talk based on joint work with P. Madhusudan
S. Chaudhuri, K. Etessami, M. Viswanathan …