
Bias-Variance Tradeoffs in Program Analysis

Rahul Sharma (Stanford), Aditya V. Nori (MSR India), Alex Aiken (Stanford)

Observation

int i = 1, j = 0;
while (i <= 5) { j = j + i; i = i + 1; }

Invariant inference, with domains of increasing precision (a sketch follows the citation below):

Intervals

Octagons

Polyhedra

D. Monniaux and J. L. Guen. Stratified static analysis based on variable dependencies. Electr. Notes Theor. Comput. Sci. 2012
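As a concrete illustration, here is a minimal Python sketch of interval-domain inference with widening on this loop; this is textbook abstract interpretation, not the analyzer from the cited work. At the loop head it infers i in [1, +inf) and j in [0, +inf); a narrowing pass would tighten i to [1, 6].

INF = float('inf')

def join(a, b):
    # least upper bound of two intervals
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # jump any unstable bound to infinity to force termination
    return (old[0] if new[0] >= old[0] else -INF,
            old[1] if new[1] <= old[1] else INF)

def analyze():
    i, j = (1, 1), (0, 0)                  # abstract state at the loop head
    while True:
        gi = (i[0], min(i[1], 5))          # filter by the guard i <= 5
        bj = (j[0] + gi[0], j[1] + gi[1])  # j = j + i
        bi = (gi[0] + 1, gi[1] + 1)        # i = i + 1
        ni = widen(i, join(i, bi))         # join with entry state, then widen
        nj = widen(j, join(j, bj))
        if (ni, nj) == (i, j):             # fixpoint reached
            return i, j
        i, j = ni, nj

print(analyze())  # i in [1, +inf), j in [0, +inf); narrowing refines i to [1, 6]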

Another Example: Yogi

A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. ICSE (1), 2010.

The Problem

Increased precision sometimes causes worse results

Programs have unbounded behaviors

A program analysis must account for all behaviors, yet run for a finite time

In finite time, it observes only finitely many behaviors

So it needs to generalize

Generalization

Generalization is ubiquitous

Abstract interpretation: widening

CEGAR: interpolants

Parameter tuning of tools

Lots of folk knowledge, heuristics, …

Machine Learning

“It’s all about generalization”

Learn a function from observations

Hope that the function generalizes

A large body of work formalizes generalization

Our Contributions

Model the generalization process with the Probably Approximately Correct (PAC) model

Explain known observations by this model

Use this model to obtain better tools

(Image credit: http://politicalcalculations.blogspot.com/2010/02/how-science-is-supposed-to-work.html)

Why Machine Learning?

Interpolants are classifiers: an interpolant separates + points (reachable states) from − points (states that reach an error)

Rahul Sharma, Aditya V. Nori, Alex Aiken. Interpolants as Classifiers. CAV 2012.

[Figure: + and − points in the plane, separated by an interpolant acting as a classifier]
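A minimal sketch of the idea, using a plain perceptron on made-up points (the CAV 2012 paper itself uses SVMs): a linear separator between the two classes is a candidate interpolant.

def perceptron(points, epochs=100):
    # find w, b with label * (w . p + b) > 0 for all labeled points
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x, y), label in points:          # label is +1 or -1
            if label * (w[0] * x + w[1] * y + b) <= 0:
                w[0] += label * x
                w[1] += label * y
                b += label
    return w, b

# hypothetical program states: +1 = reachable, -1 = reaches an error
points = [((2, 3), 1), ((3, 4), 1), ((0, -1), -1), ((-1, 0), -1)]
w, b = perceptron(points)
print(w, b)   # the candidate interpolant is the half-space w . s + b > 0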

PAC Learning Framework

Assume an arbitrary but fixed distribution D; draw i.i.d. samples from D; each sample is an example x with a label c(x) ∈ {+, −} given by the target concept c

[Figure: + and − points drawn from D, with the target concept c separating them]

PAC Learning Framework

[Figure: the same + and − points with a candidate hypothesis]

Empirical error of a hypothesis h on m labeled samples: the fraction it misclassifies, êrr(h) = |{ i : h(x_i) ≠ c(x_i) }| / m

PAC Learning Framework

[Figure: the + and − points, the target concept c, and several candidate hypotheses]

Empirical risk minimization (ERM): given a set H of possible hypotheses (the precision), select the h ∈ H that minimizes empirical error
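A minimal sketch of ERM over a tiny hypothesis class of threshold classifiers on the real line; the samples and thresholds are illustrative, not from the talk.

def empirical_error(h, samples):
    # fraction of labeled samples (x, c(x)) that h misclassifies
    return sum(1 for x, y in samples if h(x) != y) / len(samples)

def erm(hypotheses, samples):
    # select the hypothesis with minimum empirical error
    return min(hypotheses, key=lambda h: empirical_error(h, samples))

samples = [(0.5, '-'), (1.5, '-'), (2.5, '+'), (3.5, '+')]
# h_t labels x as '+' iff x > t
hypotheses = [lambda x, t=t: '+' if x > t else '-' for t in range(5)]
best = erm(hypotheses, samples)
print(empirical_error(best, samples))   # 0.0, attained at threshold t = 2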

PAC Learning Framework

Generalization error: the probability that h mislabels a new sample, err(h) = Pr_{x ~ D}[ h(x) ≠ c(x) ]

Goal: relate generalization error to empirical error and precision

Precision

Capture precision by VC dimension (VC-d)

Higher precision → more possible hypotheses

VC(H) = d means H shatters some set of d points: for any arbitrary labeling of those points, some hypothesis in H realizes it

[Figure: the same points under two arbitrary labelings (+ + − +), each realized by a hypothesis]

VC-d Example

[Figure: every +/− labeling of a small point set is realized by some hypothesis, so the set is shattered; a larger set admits a labeling (+ − − +) that no hypothesis realizes]
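A minimal sketch that makes the definition concrete for one simple class, 1-D intervals [lo, hi] (not the 2-D class in the figure): two points can be shattered, three cannot, so the VC dimension is 2.

from itertools import product

def realizable(points, labeling, bounds):
    # does some interval [lo, hi] contain exactly the points labeled True?
    return any(all((lo <= p <= hi) == lab for p, lab in zip(points, labeling))
               for lo in bounds for hi in bounds)

def shatters(points):
    # candidate bounds: the points themselves plus values outside their range
    bounds = sorted(points) + [min(points) - 1, max(points) + 1]
    return all(realizable(points, lab, bounds)
               for lab in product([True, False], repeat=len(points)))

print(shatters([1, 2]))     # True: every labeling of 2 points is realizable
print(shatters([1, 2, 3]))  # False: the labeling + - + is not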

Regression Example

[Figure: three fits to the same (X, Y) data: low precision underfits, high precision overfits, and intermediate precision gives a good fit]

Main Result of PAC Framework

Generalization error is bounded by the sum of

Bias: empirical error of the best available hypothesis

Variance: O(VC-d)

[Figure: as precision (the number of possible hypotheses) increases, bias falls, variance rises, and generalization error follows a U-shaped curve]
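For reference, one standard textbook form of this bound (not stated on the slide): with probability at least 1 − δ over m i.i.d. samples, every h in a class of VC dimension d satisfies

\[
\mathrm{err}(h) \le \widehat{\mathrm{err}}(h) + O\left(\sqrt{\frac{d \log(m/d) + \log(1/\delta)}{m}}\right)
\]

The first term plays the role of bias and the second of variance: increasing precision (larger d) shrinks the first and grows the second.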

Example Revisited

int i = 1, j = 0;
while (i <= 5) { j = j + i; i = i + 1; }

Invariant inference

Intervals

Octagons

Polyhedra

Intuition

What goes wrong with excess precision?

Fit polyhedra to program behaviors via transfer functions, join, and widening

Too many candidate polyhedra: the analysis can make a wrong choice

int i = 1, j = 0;
while (i <= 5) { j = j + i; i = i + 1; }

Intervals: …  Polyhedra: …

Abstract Interpretation

J. Henry, D. Monniaux, and M. Moy. PAGAI: a path sensitive static analyser. Electr. Notes Theor. Comput. Sci., 2012.

Yogi

A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. ICSE (1), 2010.

Case Study

Parameter tuning of program analyses

Overfitting? Generalization on new tasks?

P. Godefroid, A. V. Nori, S. K. Rajamani, and S. Tetali. Compositional may-must program analysis: unleashing the power of alternation. POPL 2010.

Benchmark set (2490 verification tasks) → train → tuned Yogi, with test length = 500, …

Cross Validation

How to set the test length in Yogi?

Split the benchmark set (2490 verification tasks) into a training set (1743) and a test set (747)

Train each candidate Yogi_50, …, Yogi_i, …, Yogi_k on the training set, then evaluate it on the test set
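A minimal sketch of this hold-out tuning; run_time is a hypothetical stand-in for running Yogi on one verification task with a given test-length parameter, and the split mirrors the one on the slide.

import random

def tune(tasks, candidate_lengths, run_time):
    # split roughly as on the slide: 1743 train / 747 test out of 2490
    random.shuffle(tasks)
    cut = int(0.7 * len(tasks))
    train, test = tasks[:cut], tasks[cut:]
    # pick the test length with the best total time on the training set
    best = min(candidate_lengths,
               key=lambda n: sum(run_time(t, n) for t in train))
    # report held-out performance as a check against overfitting
    return best, sum(run_time(t, best) for t in test)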

Cross Validation on Yogi

[Figure: performance on the test set of each tuned Yogi_i, comparing test lengths 350 and 500]

Comparison

On 2106 new verification tasks

40% performance improvement!

Yogi in production suffers from overfitting

Recommendations

Keep separate training and test sets: let the training set govern tool design, and use the test set as a check

SVCOMP: all benchmarks are public, so test tools on some new benchmarks too

Increase Precision Incrementally

R. Jhala and K. L. McMillan. A practical and complete approach to predicate refinement. TACAS 2006.

The model suggests increasing precision incrementally, to find a sweet spot where generalization error is low (a sketch follows below)
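A minimal sketch of the incremental strategy; the analyses are hypothetical stand-ins, ordered by increasing precision, each returning True when the property is proved.

def verify_incrementally(program, analyses):
    # analyses: (name, analyze) pairs, e.g. intervals, octagons, polyhedra
    for name, analyze in analyses:
        if analyze(program):          # property proved at this precision
            return name               # stop at the sweet spot
    return None                       # ran out of precision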

More in the paper

VC-d of TCMs (template constraint matrices): intervals, octagons, etc.

Templates: arrays, separation logic

More expressive abstract domains → higher VC-d

VC-d can help choose abstractions

Inapplicability

No generalization → no bias-variance tradeoff, e.g.:

Certain classes of type inference

Abstract interpretation without widening

Loop-free and recursion-free programs

Verifying one particular program (e.g., seL4), where overfitting to that one important program is fine

Conclusion

A model to understand generalization: bias-variance tradeoffs

These tradeoffs do occur in program analysis

Understanding these tradeoffs leads to better tools