5/2/2007 Kestrel Technology LLC

A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk

Arnaud Venet

Kestrel Technology

[email protected]

Please see Notes View for annotations.

Static Analysis

[Diagram: Code and Parameters feed a Static Analysis Tool, which outputs a list of defects:]

• Confirmed property violations

• Indeterminates (possible violations)

Can we find all defects?

• Classic undecidability results in Computer Science apply (halting problem)

• We require soundness (no defects are missed)
  – A conservative approach is acceptable

• Abstract Interpretation is the enabling theory in CodeHawk
  – Sound
  – Tunably precise
  – Scalable
  – “Generatively general”

Intuition

[Figure: diagram with three labeled regions: SW defects; all lines in the code; Abstract Interpretation]

Abstract Interpretation

• Computes an envelope of all data in the program

• Mathematical assurance

• Static analyzers based on Abstract Interpretation are difficult to engineer

• KT’s expertise: building scalable and effective abstract interpreters

Properties vs. defects

• An application might be free of programming language defects but not carry the desired property
  – resource issues (memory, execution time)
  – separation
  – range of output data

• Abstract Interpretation can address application-specific properties as well as language defects

Objectives

• Go over a detailed example
  – Understand how the technology works

• Achievements and challenges in the engineering of abstract interpreters
  – What it means to build an analyzer based on Abstract Interpretation

Detailed example

Buffer overflow

    for (i = 0; i < 10; i++) {
        if (message[i].kind == SHORT_CMD)
            allocate_space(channel, 1000);
        else
            allocate_space(channel, 2000);
    }

Can we exceed the channel’s buffer capacity?

Control Flow Graph

[Figure: control flow graph of the loop, with nodes i = 0; i < 10 ?; message[i].kind == SHORT_CMD ?; allocate_space(channel, 1000); allocate_space(channel, 2000); i++; stop]

Analytic model of the code

[Figure: the control flow graph rewritten as a model over i and M, with nodes M = 0; i = 0; i < 10 ?; M = M + 1000; M = M + 2000; i++; stop]

Analysis process intuition

• We mimic the execution of the program

• We collect all possible data values for all variables for all possible traces
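
For a model this small, "collecting all values for all traces" can be done by brute force, which gives a reference result to compare the abstract analysis against. The sketch below is only an illustration: the names `run_model`, `collect_all_traces` and `struct m_range` are hypothetical, and each trace is encoded as a 10-bit number whose bit k picks the branch taken in iteration k.

```c
#include <assert.h>
#include <limits.h>

/* Range of the model variable M observed at the "stop" node. */
struct m_range { long min; long max; };

/* Runs the analytic model once. Bit k of `choices` selects the branch taken
 * in iteration k: 0 means M = M + 1000, 1 means M = M + 2000. */
static long run_model(unsigned choices)
{
    long M = 0;
    for (int i = 0; i < 10; i++) {
        if ((choices >> i) & 1u)
            M = M + 2000;
        else
            M = M + 1000;
    }
    return M;
}

/* Enumerates all 2^10 traces of the model and records the extreme values of M. */
static struct m_range collect_all_traces(void)
{
    struct m_range r = { LONG_MAX, LONG_MIN };
    for (unsigned choices = 0; choices < (1u << 10); choices++) {
        long M = run_model(choices);
        if (M < r.min) r.min = M;
        if (M > r.max) r.max = M;
    }
    assert(r.min <= r.max);
    return r;
}
```

Enumeration works here only because the loop bound is a small constant; the number of traces doubles with every iteration, which is why the analysis works on an abstraction of the point cloud instead.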

Analyzing the model

[Figure sequence: the model (M = 0; i = 0; i < 10 ?; M = M + 1000 or M = M + 2000; i++; stop) annotated with plots of M against i. Slide titles in order: Analyzing the model; Initially; Loop initialization; Loop entry; Analyzing a branching (1); Analyzing a branching (2); Accumulating all possible values]

Abstraction of point clouds

• We want the analysis to terminate in reasonable time

• We need a tractable representation of point clouds in arbitrary dimensions

• Abstract Interpretation offers a broad choice of such representations

• Example: convex polyhedra
  – Compute the convex hull of a point cloud
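
As a rough illustration of the convex hull step in two dimensions (i, M), the sketch below uses Andrew's monotone chain algorithm. This is a textbook technique chosen for brevity, not necessarily what CodeHawk or any polyhedra library does; `struct point` and `convex_hull` are hypothetical names, and the input is assumed to hold at least three distinct points that are not all on one line.

```c
#include <assert.h>
#include <stdlib.h>

/* A point of the (i, M) plane. */
struct point { long long i; long long m; };

/* Orders points by i, then by m. */
static int cmp_point(const void *pa, const void *pb)
{
    const struct point *a = pa, *b = pb;
    if (a->i != b->i) return (a->i < b->i) ? -1 : 1;
    if (a->m != b->m) return (a->m < b->m) ? -1 : 1;
    return 0;
}

/* Cross product of (a - o) and (b - o): positive for a counter-clockwise turn. */
static long long cross(struct point o, struct point a, struct point b)
{
    return (a.i - o.i) * (b.m - o.m) - (a.m - o.m) * (b.i - o.i);
}

/* Sorts pts in place, writes the hull vertices in counter-clockwise order
 * into hull (capacity of at least 2 * n) and returns their number.
 * Points lying on an edge of the hull are dropped. */
static size_t convex_hull(struct point *pts, size_t n, struct point *hull)
{
    assert(n >= 3);
    qsort(pts, n, sizeof pts[0], cmp_point);
    size_t k = 0;
    for (size_t j = 0; j < n; j++) {              /* lower chain */
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[j]) <= 0) k--;
        hull[k++] = pts[j];
    }
    size_t lower = k + 1;
    for (size_t j = n - 1; j-- > 0; ) {           /* upper chain */
        while (k >= lower && cross(hull[k - 2], hull[k - 1], pts[j]) <= 0) k--;
        hull[k++] = pts[j];
    }
    return k - 1;                                 /* last vertex repeats the first */
}
```

Applied to the values reachable after two iterations of the model, (0, 0), (1, 1000), (1, 2000), (2, 2000), (2, 3000) and (2, 4000), the hull keeps only the three vertices (0, 0), (2, 2000) and (2, 4000): six points are summarized by a triangle whose lower and upper edges lie on M = 1000 * i and M = 2000 * i.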

Analyzing a branching

[Figure sequence: the model (M = 0; i = 0; i < 10 ?; M = M + 1000 or M = M + 2000; i++; stop) annotated with plots of M against i. Slide titles in order: Analyzing a branching; Convex hull; Iterating the loop analysis; Building the loop invariant; Analyzing a branching; Convex hull; Building the loop invariant; Keeping iterating…]

Passing to the limit

• We want this iterative process to end at some point

• We need to converge when analyzing loops

• After some iteration steps, we use a widening operation at loop entry to enforce convergence

Widening

• Let $a_1, a_2, \ldots, a_n, \ldots$ be a sequence of polyhedra; then the sequence
  – $w_1 = a_1$
  – $w_{n+1} = w_n \nabla a_{n+1}$
  is ultimately stationary

• The widening is a join operation, i.e.,
  $a \subseteq a \nabla b$ and $b \subseteq a \nabla b$

Widening for intervals

• $[a, b] \nabla [c, d] = [\text{if } c < a \text{ then } -\infty \text{ else } a,\ \text{if } b < d \text{ then } +\infty \text{ else } b]$

• Example:

  $[10, 20] \nabla [11, 30] = [10, +\infty]$
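
The interval widening above translates almost directly into code. The following is a minimal sketch, assuming that `LONG_MIN` and `LONG_MAX` are reserved as stand-ins for the infinite bounds; `struct interval`, `widen`, `NEG_INF` and `POS_INF` are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Sentinels standing in for the infinite bounds (an assumption of this sketch). */
#define NEG_INF LONG_MIN
#define POS_INF LONG_MAX

struct interval { long lo; long hi; };

/* Widening of x = [a, b] by y = [c, d]: a bound of x that y pushes outward
 * jumps straight to infinity, a bound that is stable is kept. */
static struct interval widen(struct interval x, struct interval y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    struct interval w;
    w.lo = (y.lo < x.lo) ? NEG_INF : x.lo;
    w.hi = (x.hi < y.hi) ? POS_INF : x.hi;
    return w;
}
```

With these definitions, widening [10, 20] by [11, 30] yields [10, POS_INF], the example above. Each bound can jump to infinity at most once, so any sequence of widenings on one interval changes its result at most twice.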

Widening for polyhedra

• We eliminate the faces of the computed convex envelope that are not stable

• Convergence is reached in at most N steps where N is the number of faces of the polyhedron at loop entry

Widening

[Figure sequence: the model annotated with plots of M against i, before and after the widening at loop entry. Slide titles in order: Widening; After widening]

Detecting convergence

• Abstract iteration sequence
  – $F_1 = P$ (initial polyhedron)
  – $F_{n+1} = F_n$ if $S(F_n) \subseteq F_n$, and $F_{n+1} = F_n \nabla S(F_n)$ otherwise
  where $S$ is the semantic transformer associated to the loop body

• Theorem: if there exists $N$ such that $F_{N+1} \subseteq F_N$, then $F_n = F_N$ for $n > N$.
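
The iteration scheme can be tried out on a one-dimensional analogue of the example. The sketch below tracks only the variable i and uses intervals instead of polyhedra, which is a simplifying assumption; `loop_body` plays the role of S for the loop `i < 10 ? ... i++`, and all names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NEG_INF LONG_MIN   /* stand-in for -infinity */
#define POS_INF LONG_MAX   /* stand-in for +infinity */

struct interval { long lo; long hi; };

/* Returns true when x is included in y. */
static bool included(struct interval x, struct interval y)
{
    return y.lo <= x.lo && x.hi <= y.hi;
}

/* Interval widening: unstable bounds are pushed to infinity. */
static struct interval widen(struct interval x, struct interval y)
{
    struct interval w;
    w.lo = (y.lo < x.lo) ? NEG_INF : x.lo;
    w.hi = (x.hi < y.hi) ? POS_INF : x.hi;
    return w;
}

/* S for the variable i: keep the values passing the guard i < 10, then i++.
 * Assumes a finite lower bound and a satisfiable guard. */
static struct interval loop_body(struct interval f)
{
    long hi = (f.hi > 9) ? 9 : f.hi;
    assert(f.lo != NEG_INF && f.lo <= hi);
    struct interval s = { f.lo + 1, hi + 1 };
    return s;
}

/* F_1 = p; F_{n+1} = F_n if S(F_n) is included in F_n, else F_n widened by S(F_n).
 * Stores in *steps the index n of the returned F_n. */
static struct interval iterate(struct interval p, int *steps)
{
    struct interval f = p;
    *steps = 1;
    for (;;) {
        struct interval s = loop_body(f);
        if (included(s, f))
            return f;
        f = widen(f, s);
        (*steps)++;
    }
}
```

Starting from i = [0, 0], the first pass computes S(F_1) = [1, 1], which is not included in F_1, so F_2 = [0, 0] widened by [1, 1] = [0, POS_INF]. The next pass gives S(F_2) = [1, 10], which is included in F_2, so the iteration stops at F_2. The result is sound but has no upper bound on i.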

Convergence

[Figure: the model annotated with the plot of M against i at convergence]

The computation has converged

We are not done yet…

• The analyzer has just proven that 1000 * i ≤ M ≤ 2000 * i

• But we have lost all information about the termination condition 0 ≤ i ≤ 10

• Since we have obtained an envelope of all possible values of the variables, if we run the computation again we still get such an envelope

• The point is that this new envelope can be smaller

• This refinement step is called narrowing
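
The refinement can be illustrated on the variable i alone, again with intervals rather than polyhedra (a simplifying assumption). The sketch assumes the widened envelope i in [0, +infinity] as the starting point and re-runs the loop once without widening; `join`, `loop_body` and `refine` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define POS_INF LONG_MAX   /* stand-in for +infinity */

struct interval { long lo; long hi; };

/* Interval join: the smallest interval containing both arguments. */
static struct interval join(struct interval x, struct interval y)
{
    struct interval j = { x.lo < y.lo ? x.lo : y.lo,
                          x.hi > y.hi ? x.hi : y.hi };
    return j;
}

/* Effect of one loop iteration on i: guard i < 10, then i++.
 * Assumes the guard is satisfiable for some value in f. */
static struct interval loop_body(struct interval f)
{
    long hi = (f.hi > 9) ? 9 : f.hi;
    assert(f.lo <= hi);
    struct interval s = { f.lo + 1, hi + 1 };
    return s;
}

/* One refinement pass: the values at loop entry are either the initial ones
 * or those produced by one more iteration starting from the envelope f. */
static struct interval refine(struct interval init, struct interval f)
{
    return join(init, loop_body(f));
}
```

Refining [0, POS_INF] with the initial value [0, 0] gives [0, 0] joined with [1, 10], that is [0, 10]: the bound 0 ≤ i ≤ 10 lost by the widening is recovered. A second pass returns [0, 10] again, so the refined envelope is stable.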

Refinement

[Figure sequence: the model (M = 0; i = 0; i < 10 ?; M = M + 1000 or M = M + 2000; i++; stop) annotated with plots of M against i during the refinement pass, with 9 and then 10 marked on the i axis. Slide titles in order: Refinement; Analyzing a branching; Convex hull; Back to loop entry; Narrowing; Refined loop invariant; Invariant at loop exit. On the last slide, 10 is marked on the i axis and 10,000 and 20,000 on the M axis]

Interpretation of the results

[Figure: axis of the space allocated in the buffer, with ticks at 5,000, 10,000, 15,000, 20,000 and 25,000 and a bracketed range on it]

Buffer size:
• Certain buffer overflow
• Possible buffer overflow
• No buffer overflow
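
The three verdicts follow from comparing the buffer size with the computed range of M. The sketch below assumes that an overflow means the allocated space strictly exceeds the buffer size, and takes the range 10,000 to 20,000 from the loop exit invariant; `classify` and `enum verdict` are hypothetical names.

```c
#include <assert.h>

enum verdict { NO_OVERFLOW, POSSIBLE_OVERFLOW, CERTAIN_OVERFLOW };

/* Compares the envelope [m_min, m_max] of the allocated space with the
 * buffer size. Overflow is taken to mean "allocated space > buffer size". */
static enum verdict classify(long m_min, long m_max, long buffer_size)
{
    assert(m_min <= m_max);
    if (m_max <= buffer_size)
        return NO_OVERFLOW;        /* every value in the envelope fits */
    if (m_min > buffer_size)
        return CERTAIN_OVERFLOW;   /* no value in the envelope fits */
    return POSSIBLE_OVERFLOW;      /* the envelope straddles the buffer size */
}
```

With the envelope 10,000 to 20,000, a buffer of 5,000 is a certain overflow, a buffer of 15,000 is an indeterminate (possible overflow), and a buffer of 25,000 is proven safe. Only the middle case depends on how precise the envelope is.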

Achievements and challenges

Assured static analyzers for NASA

• C Global Surveyor: verified array bound compliance for NASA mission-critical software
  – Mars Exploration Rovers: 550K LOC
  – Deep Space 1: 280K LOC
  – Mars Pathfinder: 140K LOC

• Pointer analysis:
  – International Space Station payload software (major bug found)

What we observe

• No scalable, precise, and general-purpose abstract interpreter

• PolySpace:
  – Handles all kinds of runtime errors
  – Decent precision (<20% indeterminates)
  – Doesn’t scale (topped out at 40K LOC)

• Customization is the key
  – Specialized for a property or a class of applications
  – Manually crafted by experts

CodeHawk™

• Abstract Interpretation development platform / static analyzer generator

• Automated generation of customized static analyzers
  – Leverage from pre-built analyzers
  – Directly tunable by the end-user

Where CodeHawk stands

Application:
• architecture
• environment
• input range
• usage protocol
• …

Abstract Interpretation:
• application independent
• abstract model
• algorithms difficult to implement
• needs a lot of expertise

[Figure label: CodeHawk]

Recent Applications

• Malware detection
  – Customized analyzers for specific kinds of malware
  – Naturally resistant to complex obfuscating transformations
  – Evaluated on DoD-supplied test case

• Library/Component analysis
  – Proof of absence of buffer overflow in OpenSSH’s dynamic buffer library

• Shared variables
  – Protection policies for shared variables
  – Evaluated on code from a large DoD prime contractor

Conclusions

• Promising and proven technology
  – key distinction for assurance: no false negatives
  – can verify application properties as well as detect defects
  – can be tailored for various domains (e.g., malware)

• Not a silver bullet... rather a bullet generator
  – each modeled domain offers leverage
  – required expertise for broad use must still be reduced