end-user shape analysis national taiwan university – august 11, 2009 xavier rival inria/ens paris...
TRANSCRIPT
End-User Shape AnalysisEnd-User Shape Analysis
National Taiwan University – August 11, 2009
Xavier RivalINRIA/ENS
Paris
Bor-Yuh Evan Chang Bor-Yuh Evan Chang 張張博聿博聿
U of Colorado, BoulderU of Colorado, Boulder
George C. NeculaU of California,
Berkeley
Programming Languages Programming Languages Research at the University of Research at the University of
Colorado, Boulder Colorado, Boulder
3
Software errors cost a lotSoftware errors cost a lot
~$60 billion $60 billion annually (~0.5% of US GDP)– 2002 National Institute of Standards and Technology
report
total annual revenue of>>10x annual budget of >>
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
4
But there’s hope in program But there’s hope in program analysisanalysis
MicrosoftMicrosoft uses and distributesthe Static Driver VerifierStatic Driver Verifier
AirbusAirbus appliesthe Astrée Static AnalyzerAstrée Static Analyzer
Companies, such as Coverity and Fortify, market static source code analysis toolsBor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
5
Because program analysis canBecause program analysis caneliminate entire classes of bugseliminate entire classes of bugs
For example,
– Reading from a closed file:
– Reacquiring a locked lock:
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”
read( );read( );
acquire( );acquire( );
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
6
…code …// x now points to an unlocked lock
acquire(x);… code …
analysis state
Program analysis by example:Program analysis by example:Checking for double acquiresChecking for double acquires
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Simulate running program on “all inputs”
x
acquire(acquire(x););… code …
7
…code …// x now points to an unlocked lock in a linked listin a linked list
acquire(acquire(x));… code …
ideal analysis state
Program analysis by example:Program analysis by example:Checking for double acquiresChecking for double acquires
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Simulate running program on “all inputs”
x xx
or or or …
8
…code …// x now points to an unlocked lock in a linked listin a linked list
acquire(acquire(x));… code …
ideal analysis state analysis state
Must abstractMust abstract
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
x xx
or or or …
xFor decidability, must abstractabstract—“model all inputs” (e.g., merge objects)
For decidability, must abstractabstract—“model all inputs” (e.g., merge objects)
Abstraction too coarse or not precise not precise enough (e.g., lost x is always unlocked)
Abstraction too coarse or not precise not precise enough (e.g., lost x is always unlocked)
mislabels good code as buggy
9
To address the precision challengeTo address the precision challenge
TraditionalTraditional program analysis mentality:
“Why can’t developers write more specifications specifications for our analysisfor our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractionsdefault abstractions (perhaps coarse) that work hopefully most of the time.”
End-user approachEnd-user approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysisadapt the analysis to use those as specifications?”
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
10
Summary of overviewSummary of overview
Challenge in analysis: Finding a good abstraction
precise enough but not more than necessary
Powerful, generic abstractionsexpensive, hard to use and understand
Built-in, default abstractionsoften not precise enough (e.g., data structures)
End-user approachEnd-user approach:
Must involve the user in abstractionMust involve the user in abstractionwithout expecting the user to be a program
analysis expertBor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
11
Overview of contributionsOverview of contributions
Extensible Inductive Shape Analysis (Xisa)Precise inference of data structure properties
Able to check, for instance, the locking example
Targeted to software developersUses data structure checking code for guidanceTurns testing code into a specification for static Turns testing code into a specification for static
analysisanalysis
Efficient~10-100x speed-up over generic approachesBuilds abstraction out of developer-supplied Builds abstraction out of developer-supplied
checking codechecking code
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Extensible InductiveExtensible InductiveShape AnalysisShape Analysis
PrecisePrecise inference of data structure properties
PrecisePrecise inference of data structure properties
End-userEnd-user approachEnd-userEnd-user approach
…
13
Shape analysis is a fundamental Shape analysis is a fundamental analysisanalysisData structures are at the core of
– Traditional languages (C, C++, Java)– Emerging web scripting languages
Improves verifiers that try to– Eliminate resource usage bugs
(locks, file handles)– Eliminate memory errors (leaks, dangling pointers)– Eliminate concurrency errors (data races)– Validate developer assertions
Enables program transformations– Compile-time garbage collection– Data structure refactorings
…
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
14
Shape analysis by example:Shape analysis by example:Removing duplicatesRemoving duplicates
// l is a sorted doubly-linked list
for each node cur in list l {remove cur if duplicate;
}assert l is sorted, doubly-
linked with no duplicates;
Example/TestingExample/Testing Code Review/Static AnalysisCode Review/Static Analysis
“no duplicates”l
“sorted dl list”l
program-specificprogram-specific
l 2 2 44
l 2 44
cur
l 2 4
“sorted dl list”l“segment withno duplicates”
cur
intermediate state more
complicated
intermediate state more
complicated
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
15
Shape analysis is not yet practicalShape analysis is not yet practical
Choosing the heap abstraction difficult for precision
Parametric in high-level, developer-oriented predicates++ Extensible
++ Targeted at developers
Xisa
Built-in high-level predicates
-- Harder to extend
++ No additional user effort (if precise enough)
Parametric in low-level, analyzer-oriented predicates++ Very general and expressive
-- Harder for non-expert
89
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Some representative approachesSome representative approaches:
End-user approachEnd-user approach:
Space Invader [Distefano et
al.]
TVLA[Sagiv et al.]
16
Our approach: Executable specificationsOur approach: Executable specifications
Utilize “run-time checking codechecking code” as specification for static analysis.
assert(sorted_dll(l,…));
for each nodecurinlistl {removecurif duplicate;
}
assert(sorted_dll_nodup(l,…));
l
l
cur
l
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
h.dll(p) =if (h =null) then
trueelse
h!prev= p and h!next.dll(h)
checker
Contribution: Automatically generalize checkers for complicated intermediate states
Contribution: Automatically generalize checkers for complicated intermediate states
Contribution:Build the abstraction for analysis out of developer-specified checking code
Contribution:Build the abstraction for analysis out of developer-specified checking code
•p specifies where prev should point
17
Xisa is …Xisa is …
• Extensible and targeted for developers– Parametric in developer-supplied checkers—viewed as
inductive definitions in separation logic
• Precise yet compact abstraction for efficiency– Data structure-specific based on properties of interest
to the developer
An automated shape analysisshape analysis with a precise memory abstraction based around invariant invariant checkerscheckers.
Xisa
h.dll(p) =if (h = null) then
trueelse
h!prev = prev and h!next.dll(h)
checkers
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
18
SplittingSplitting of summaries
To reflect updates precisely
And summarizing summarizing for termination
Shape analysis is an abstract Shape analysis is an abstract interpretation on abstract memory interpretation on abstract memory descriptions with …descriptions with …
cur
l
cur
l
cur
l
cur
l
cur
l
cur
l
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
19
Roadmap: Components of XisaRoadmap: Components of Xisa
Xisa shape analyzer
abstract interpretation
splitting andinterpreting update
summarizing
level-typeinference
on checkerdefinitions
h.dll(p) =if (h = null) then
trueelse
h!prev = prev and h!next.dll(h)
checkers
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Learn information about the checker to use it as an abstraction
Learn information about the checker to use it as an abstraction
Compare and contrast manual code review and our automated shape analysis
Compare and contrast manual code review and our automated shape analysis
20
Overview: Split summariesOverview: Split summariesto interpret updates preciselyto interpret updates precisely
l
cur
l
cur
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.The example at a high-level: iterate using cur changing the doubly-linked list from purple to red.
l
cur
split at cur
update cur purple to red
l
cur
Challenge:How does the analysis “split” summaries and know where to “split”?
Challenge:How does the analysis “split” summaries and know where to “split”?
21
““Split forward”Split forward”by unfolding inductive definition by unfolding inductive definition
Çh.dll(p) =
if(h =null) thentrue
elseh!prev= p and
h!next.dll(h)
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
l
curget: cur!next
l
cur
null
p dll(cur, p)
l
cur
pdll(n, cur)
n
Analysis doesn’t forget the empty case
Analysis doesn’t forget the empty case
22
““Split backward” also possible and Split backward” also possible and necessarynecessary
h.dll(p) =if (h =null) then
trueelse
h!prev= p and h!next.dll(h)
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
l
cur
pdll(n, cur)
n
for each node cur in list l {
remove cur if duplicate;}assert l is sorted, doubly-linked with no duplicates;
“dll segment”
l
cur
p0dll(n, cur)
n“dll segment”
cur!prev!next= cur!next;
l
cur
dll(n, cur)nnull
get: cur!prev!next
Ç
Technical Details:How does the analysis do this unfolding?Why is this unfolding allowed?(Key: Segments are also inductively defined)
[POPL’08]
How does the analysis know to do this How does the analysis know to do this unfolding?unfolding?
Technical Details:How does the analysis do this unfolding?Why is this unfolding allowed?(Key: Segments are also inductively defined)
[POPL’08]
How does the analysis know to do this How does the analysis know to do this unfolding?unfolding?
23
Roadmap: Components of XisaRoadmap: Components of Xisa
Xisa shape analyzer
abstract interpretation
splitting andinterpreting update
summarizing
level-typeinference
on checkerdefinitions
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Contribution: Turns testing code into specification for static analysis
Contribution: Turns testing code into specification for static analysis
How do we decide where to unfold?
How do we decide where to unfold?
Derives additional information to guide unfolding
Derives additional information to guide unfolding
h.dll(p) =if (h = null) then
trueelse
h!prev = prev and h!next.dll(h)
checkers
… to be discussed this afternoon
… to be discussed this afternoon
24
Summary of interpreting updatesSummary of interpreting updates
Splitting of summaries needed for precision
Unfolding checkers is a natural way to do splitting
When checker traversal matches code traversal
Checker parameter type analysisUseful for guiding unfolding in difficult cases, for example, “back pointer” traversals
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
25
Results: PerformanceResults: Performance
Benchmark
Max. Num.
Graphs at a
Program Pt
Analysis
Time (msms)
singly-linked list reverse 1 1.0
doubly-linked list reverse 1 1.5
doubly-linked list copy 2 5.4
doubly-linked list remove 5 17.9
doubly-linked list remove and back 5 18.1
search tree with parent insert 3 16.6
search tree with parent insertand back
5 64.7
two-level skip list rebalance 1 11.7
Linux scull driver (894 loc) (char arrays ignored, functions inlined)
4 3969.6
Times negligible for data structure operations (often in sec or 1/10 sec)
Times negligible for data structure operations (often in sec or 1/10 sec)ExpressivenessExpressiveness:
Different data structures
ExpressivenessExpressiveness: Different data structures
Verified shape invariant as given by the checker is preserved across the operation.
Verified shape invariant as given by the checker is preserved across the operation.Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
TVLA: 850 msTVLA: 850 ms
TVLA: 290 msTVLA: 290 ms
Space Invaderonly analyzes lists (built-in)
Space Invaderonly analyzes lists (built-in)
26
Demo: Doubly-linked list reversalDemo: Doubly-linked list reversal
http://www.cs.colorado.edu/~bec/
Body of loop over the Body of loop over the elementselements:Swaps the next and prev fields of curr.
Body of loop over the Body of loop over the elementselements:Swaps the next and prev fields of curr.
Already reversed segmentAlready reversed segmentNode whose next and prev fields were swapped
Node whose next and prev fields were swappedNot yet reversed listNot yet reversed list
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
27
Experience with the toolExperience with the tool
Checkers are easy to writeCheckers are easy to write and try out– Enlightening (e.g., red-black tree checker in 6 lines)– Harder to “reverse engineer” for someone else’s code– Default checkers based on types useful
Future expressiveness and usability improvements– Pointer arithmetic and arraysPointer arithmetic and arrays (in progress)– More generic checkers:
polymorphic “element kind unspecified”
higher-order parameterized by other predicates
Future evaluation: user study
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
28
Near-term future work:Near-term future work:Exploiting common specification Exploiting common specification frameworkframeworkScenarioScenario: Code instrumented with lots of checker calls
(perhaps automatically with object invariants)
assert( mychecker(x) );// … operation on x …assert( mychecker(x) );
Can we prove partsparts statically?Static Analysis View:Hybrid checkingTesting View: Incrementalize invariant checking
ExampleExample: Insert in a sorted list
l v wu
Preservation of sortedness shown staticallyPreservation of sortedness shown statically
Emit run-time check for new element: u · v · wEmit run-time check for new element: u · v · w
• Very slow to execute• Hard to prove statically (in
general)
• Very slow to execute• Hard to prove statically (in
general)
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
29
ConclusionConclusion
Extensible Inductive Shape Analysisprecision demanding program analysis improved by novel user interaction
Developer: Gets results corresponding to intuitionAnalysis: Focused on what’s important to the developer
Practical precise tools for better software with an end-user approach!
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
Programming Languages Programming Languages Research at the University of Research at the University of
Colorado, Boulder Colorado, Boulder
31
Who we areWho we are
Faculty
Ph.D. Students
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
Amer Diwan Jeremy Siek Sriram SankaranarayananBor-Yuh Evan Chang
32
OutlineOutline
• Gradual Programming– A new collaborative project involving
Amer Diwan, Jeremy Siek, and myself
• Brief Sketches of Other Activities
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
Gradual Programming: Gradual Programming: Bridging the Semantic GapBridging the Semantic Gap
34
Have you noticed a time where your Have you noticed a time where your program is not optimized where you program is not optimized where you expect?expect?
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
“I need a map data structure”
“I need a map data structure”
Load class fileRun class initializationCreate hashtable
Load class fileRun class initializationCreate hashtable
ProblemProblem: Tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about
ProblemProblem: Tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about
… hampering programmer productivity, software reliability, and execution efficiency
… hampering programmer productivity, software reliability, and execution efficiency
semantic gap
ObservationObservation: A disconnect between programmer intent and program meaning
35
Example: Iteration OrderExample: Iteration Order
class OpenArray extends Object {private Double data[]; public boolean contains(Object
lookFor) {for (i = 0; i < data.length; i++) {
if (data[i].equals(lookFor)) return true;
}return false;
}}
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
Compiler cannot choose a different iteration order (e.g., parallelparallel)
Compiler cannot choose a different iteration order (e.g., parallelparallel)
Must specify an iteration Must specify an iteration orderorder even when it should not matter
Must specify an iteration Must specify an iteration orderorder even when it should not matter
36
Wild and Crazy Idea: Use Non-Wild and Crazy Idea: Use Non-DeterminismDeterminism
• Programmer starts with a potentially non-deterministic program
• Analysis identifies instances of “under-determinedness”
• Programmer eliminates “under-determinedness”
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
class OpenArray extends Object {private Double data[]; public boolean contains(Object lookFor)
{for (i = 0; i < data.length; i++) {if (data[i].equals(lookFor)) return
true; }return false;
}}
class OpenArray extends Object {private Double data[]; public boolean contains(Object lookFor)
{for (i = 0; i < data.length; i++) {if (data[i].equals(lookFor)) return
true; }return false;
}}
class OpenArray extends Object {private Double data[]; public boolean contains(Object lookFor)
{i 0 .. data.length-1 {
if (data[i].equals(lookFor)) return true;
}return false;
}}
class OpenArray extends Object {private Double data[]; public boolean contains(Object lookFor)
{i 0 .. data.length-1 {
if (data[i].equals(lookFor)) return true;
}return false;
}}
“over-determined”
“under-determined”
just right
starting point
QuestionQuestion: What does this mean? Is it “under-determined”?ResponseResponse: Depends, is the iteration order important?
QuestionQuestion: What does this mean? Is it “under-determined”?ResponseResponse: Depends, is the iteration order important?
37
Let’s try a few program variantsLet’s try a few program variants
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
public boolean contains(Object lookFor) { for (i = data.length-1; i >= 0; i--) {
if(data[i].equals(lookFor)) return true; } return false;}
public boolean contains(Object lookFor) { for (i = data.length-1; i >= 0; i--) {
if(data[i].equals(lookFor)) return true; } return false;}
public boolean contains(Object lookFor) { for (i = 0; i < data.length; i++) {
if(data[i].equals(lookFor)) return true; } return false;}
public boolean contains(Object lookFor) { for (i = 0; i < data.length; i++) {
if(data[i].equals(lookFor)) return true; } return false;}
public boolean contains(Object lookFor) { parallel_for (0, data.length-1) i => {
if(data[i].equals(lookFor)) return true; } return false;}
public boolean contains(Object lookFor) { parallel_for (0, data.length-1) i => {
if(data[i].equals(lookFor)) return true; } return false;}
Do they compute the same result?Do they compute the same result?
ApproachApproach: Try to verify equivalence of program variants up to a specification
ApproachApproach: Try to verify equivalence of program variants up to a specificationYes Pick any oneNo Ask user
Yes Pick any oneNo Ask user
What about here?What about here?
38
Surprisingly, analysis says no. Surprisingly, analysis says no. Why? Why?
Exceptions!
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
Need user interaction to refine specification that captures programmer intent
Need user interaction to refine specification that captures programmer intent
nulla.data=
a.contains( )left-to-right iterationreturns true
right-to-left iterationthrows NullPointerException
39
Proposal SummaryProposal Summary
• “Fix semantics per program”: Abstract constructs with many possible concrete implementations
• Apply program analysis to find inconsistent implementations
• Interact with the user to refine the specification
• Language designer role can enumerate the possible implementations
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
40
Bridging the Semantic GapBridging the Semantic Gap
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
“I need a map data structure”
“I need a map data structure”
“Looks like iterator order matters for your program”
“Looks like iterator order matters for your program”
“Yes, I need iteration in sorted order”
“Yes, I need iteration in sorted order”
“Let’s use a balanced binary tree (TreeMap)”
“Let’s use a balanced binary tree (TreeMap)”
Other ActivitiesOther Activities
42
Formal MethodsFormal Methods
Prof. Sriram Sankaranarayanan Sriram Sankaranarayanan (CS)Cyber-physical systems verification– hybrid automata theory, control systems
verification, analysis of Simulink and Stateflow diagrams
– advanced mathematical techniques:• convex optimization: linear and semi-definite• differential equations: set-valued analysis• SMT solvers over non-linear theories
– applications to automotive software (with NEC labs and GM labs)
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
Prof. Aaron Bradley Aaron Bradley (ECEE)Decision procedures, Model checking
Prof. Fabio Somenzi Fabio Somenzi (ECEE)
43
Programming Languages and Programming Languages and AnalysisAnalysisProf. Amer Diwan Amer Diwan (CS)
Performance analysis of computer systemsHow do we know that we have not perturbed our data?Using machine learning and statistical techniques to reason about
data
Tool-assisted program transformationsAlgorithmic optimizations for performanceProgram metamorphosis for improving code quality
Prof. Jeremy Siek Jeremy Siek (ECEE/CS)Gradual type checking: static (Java) dynamic (Python)
Meta-programming: programs that write programs
Compilers for optimizing scientific codes
Prof. Bor-Yuh Evan ChangBor-Yuh Evan Chang (CS)End-user program analysisPrecise analysis (shape, collections)Interactive analysis refinement (type checking + symbolic evaluation)
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
44
Applying to ColoradoApplying to Colorado
• Computer Science Department information
• Deadlines
• Graduate Advisor: Nicholas Vocatura
• Talk to me about application fee waiver
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder
http://www.cs.colorado.edu/grad/admission/http://www.cs.colorado.edu/grad/admission/
http://www.cs.colorado.edu/~bec/http://www.cs.colorado.edu/~bec/
[email protected]@colorado.edu
Dec 1 for Fall (Sep 1 for Spring)Dec 1 for Fall (Sep 1 for Spring)