IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

PRESTO: Program Analyses and Software Tools Research Group, Ohio State University IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries Atanas (Nasko) Rountev Mariana Sharp Guoqing (Harry) Xu Ohio State University Supported by NSF Career grant CCF-0546040 and IBM Eclipse Innovation grant

Post on 03-Jan-2016


Page 1: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

PRESTO: Program Analyses and Software Tools Research Group, Ohio State University

IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Atanas (Nasko) Rountev
Mariana Sharp
Guoqing (Harry) Xu
Ohio State University

Supported by NSF Career grant CCF-0546040 and IBM Eclipse Innovation grant

Page 2: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Interprocedural Analysis with Large Libraries

All programs are built with reusable components
- Standard libraries in C++, Java, C#
- Domain-specific libraries

Whole-program analysis: complete client program C, together with all libraries it uses
- Solutions for all program points in C and in the libraries

Summary-based analysis: pre-analyze the library and record reusable library summary information
- Solutions for all program points in C

Goal: reduce the cost without losing any precision
- e.g., the solutions inside C should be the same

This may be low-hanging fruit

Page 3: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Talk Outline

Interprocedural distributive environment (IDE) dataflow analysis problems
- Definition; precise whole-program analysis
- Examples: dependence analysis and type analysis

Generation of library summaries for IDE problems
- Intra/interprocedural analysis in the library
- Handling the possible effects of unknown clients
- Filtering away details that are irrelevant for clients

Experimental evaluation
- Entire Java 1.4.2 libraries; 20 client programs

Page 4: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Interproc. Distributive Environment Problems

Defined by Sagiv, Reps, and Horwitz [TheorCompSci96]

- Subsumes the interprocedural finite distributive subset (IFDS) problems from their [POPL95] work

- Versions of constant propagation, slicing, alias analysis, side-effect analysis, reaching definitions, liveness, etc.

An environment is a map $e : D \to L$; $e \in \mathrm{Env}(D, L)$
- $D$ is a set of symbols, $L$ is a meet semi-lattice
- Environment meet: $(e_1 \sqcap e_2)(d) = e_1(d) \sqcap e_2(d)$

Environment transformer $t : \mathrm{Env}(D, L) \to \mathrm{Env}(D, L)$
- Distributive: e.g. $t(e_1 \sqcap e_2) = t(e_1) \sqcap t(e_2)$

Page 5: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Dependence Analysis and Type Analysis for Java

Dependencies: for a local variable v at CFG node n, which formal parameters of n's method influence v?
- Restricted form of dependence analysis; useful for SDG building
- $D = \{ v_1, \ldots, v_k \}$: locals $v_i$
- $L$ = powerset of $\{ f_1, \ldots, f_m \}$: formals $f_j$; meet is $\cup$
- Transformer for v1:=f2: $t(e) = e[v_1 \mapsto \{f_2\}]$
- Transformer for v1:=v2+v3: $t(e) = e[v_1 \mapsto e(v_2) \cup e(v_3)]$
- Call v1:=meth(v2): composition of v2-to-formal, valid same-level paths in meth, return-to-v1

0-CFA type analysis: $D = \{ v_1, \ldots, v_k, fld_1, \ldots, fld_m \}$: locals and fields; $L$ = powerset of the set of types
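The dependence-analysis environments and the two assignment transformers can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `DepEnv`, its method names, and the choice of `Map<String, Set<String>>` as the environment type are hypothetical and do not come from the talk. A local that is missing from the map is treated as having the empty dependence set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: an environment maps each local to a set of formals; meet is set union. */
final class DepEnv {
    private DepEnv() {}

    /** Deep copy, so that transformers never mutate their input environment. */
    static Map<String, Set<String>> copy(Map<String, Set<String>> e) {
        Map<String, Set<String>> out = new HashMap<>();
        e.forEach((d, s) -> out.put(d, new TreeSet<>(s)));
        return out;
    }

    /** Pointwise meet: (e1 meet e2)(d) = e1(d) union e2(d). */
    static Map<String, Set<String>> meet(Map<String, Set<String>> e1, Map<String, Set<String>> e2) {
        Map<String, Set<String>> out = copy(e1);
        e2.forEach((d, s) -> out.computeIfAbsent(d, k -> new TreeSet<>()).addAll(s));
        return out;
    }

    /** Transformer for "target := formal": t(e) = e[target -> {formal}]. */
    static UnaryOperator<Map<String, Set<String>>> assignFormal(String target, String formal) {
        return e -> {
            Map<String, Set<String>> out = copy(e);
            out.put(target, new TreeSet<>(Set.of(formal)));
            return out;
        };
    }

    /** Transformer for "target := a + b": t(e) = e[target -> e(a) union e(b)]. */
    static UnaryOperator<Map<String, Set<String>>> assignBinary(String target, String a, String b) {
        return e -> {
            Set<String> deps = new TreeSet<>(e.getOrDefault(a, Set.of()));
            deps.addAll(e.getOrDefault(b, Set.of()));
            Map<String, Set<String>> out = copy(e);
            out.put(target, deps);
            return out;
        };
    }

    public static void main(String[] args) {
        Map<String, Set<String>> e = new HashMap<>();
        e.put("v2", Set.of("f1"));
        e.put("v3", Set.of("f2"));
        // v1 := v2 + v3 makes v1 depend on everything v2 and v3 depend on.
        System.out.println(assignBinary("v1", "v2", "v3").apply(e).get("v1")); // [f1, f2]
    }
}
```

Because each transformer only copies or unions sets, applying it to a meet of two environments gives the same result as meeting the two transformed environments, which is the distributivity property the IDE framework requires.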

Page 6: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Representation of Environment Transformers

Key issue for any summary-based analysis: how do we represent and manipulate dataflow functions?
- For IDE: composition/meet of environment transformers

Sagiv et al.: a transformer can be represented by a bipartite directed graph with 2(|D|+1) nodes
- Edges labeled with functions $L \to L$

[Figure: bipartite graph representations, over the symbols d1, ..., dn, of three example transformers: $t(env) = env$; $t(env) = env[d_1 \mapsto \{f\}]$; $t(env) = env[d_2 \mapsto env(d_1) \cup env(d_3)]$]

Page 7: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Composition of Transformers

Graph reachability + composition of edge labels

[Figure: the graphs for $t(env) = env[d_1 \mapsto \{f\}]$ and $t(env) = env[d_2 \mapsto env(d_1) \cup env(d_3)]$ over d1, d2, d3, and the graph of their composition]
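A minimal Java sketch of composition by graph reachability, specialized to the dependence analysis, might look like the following. Everything here is an assumption made for illustration rather than the authors' data structure: the class `TransformerGraph`, the `EdgeFn` label encoding (an edge label maps l to a constant set, optionally unioned with l itself), the `<Lambda>` name for the extra node that feeds constant labels, and the order of composition used in `main`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of transformer graphs for the dependence analysis (meet is set union). */
final class TransformerGraph {
    static final String LAMBDA = "<Lambda>"; // extra node; its value is always the empty set

    /** Edge label l -> (pass ? l : {}) union constant. Identity is (true, {}). */
    static final class EdgeFn {
        static final EdgeFn ID = new EdgeFn(true, new TreeSet<>());
        final boolean pass;
        final Set<String> constant;

        EdgeFn(boolean pass, Set<String> constant) { this.pass = pass; this.constant = constant; }

        static EdgeFn constantFn(String... formals) {
            return new EdgeFn(false, new TreeSet<>(Arrays.asList(formals)));
        }
        /** Label composition: apply first, then this. */
        EdgeFn after(EdgeFn first) {
            Set<String> c = new TreeSet<>(constant);
            if (pass) c.addAll(first.constant);
            return new EdgeFn(pass && first.pass, c);
        }
        /** Pointwise meet (union) of two labels on the same edge. */
        EdgeFn meet(EdgeFn other) {
            Set<String> c = new TreeSet<>(constant);
            c.addAll(other.constant);
            return new EdgeFn(pass || other.pass, c);
        }
        Set<String> apply(Set<String> l) {
            Set<String> out = new TreeSet<>(constant);
            if (pass) out.addAll(l);
            return out;
        }
    }

    // edges.get(dst).get(src) is the label on the edge src -> dst
    private final Map<String, Map<String, EdgeFn>> edges = new HashMap<>();

    static TransformerGraph identity(String... symbols) {
        TransformerGraph g = new TransformerGraph();
        g.put(LAMBDA, LAMBDA, EdgeFn.ID);
        for (String d : symbols) g.put(d, d, EdgeFn.ID);
        return g;
    }

    /** Adds edge src -> dst, meeting labels if the edge already exists. */
    void put(String src, String dst, EdgeFn f) {
        edges.computeIfAbsent(dst, k -> new HashMap<>()).merge(src, f, EdgeFn::meet);
    }

    /** Replaces all edges into dst, which models an assignment to dst. */
    TransformerGraph assign(String dst, Map<String, EdgeFn> incoming) {
        edges.put(dst, new HashMap<>(incoming));
        return this;
    }

    /** Graph for "this, then next": two-step reachability through the shared middle nodes. */
    TransformerGraph then(TransformerGraph next) {
        TransformerGraph out = new TransformerGraph();
        next.edges.forEach((dst, ins) -> ins.forEach((mid, g) ->
            edges.getOrDefault(mid, Map.of()).forEach((src, f) -> out.put(src, dst, g.after(f)))));
        return out;
    }

    /** Evaluates the transformer: the value at dst is the meet over its incoming edges. */
    Map<String, Set<String>> apply(Map<String, Set<String>> env) {
        Map<String, Set<String>> out = new HashMap<>();
        edges.forEach((dst, ins) -> {
            if (dst.equals(LAMBDA)) return;
            Set<String> v = new TreeSet<>();
            ins.forEach((src, f) -> v.addAll(f.apply(env.getOrDefault(src, Set.of()))));
            out.put(dst, v);
        });
        return out;
    }

    public static void main(String[] args) {
        TransformerGraph first = identity("d1", "d2", "d3")
            .assign("d1", Map.of(LAMBDA, EdgeFn.constantFn("f")));          // d1 := {f}
        TransformerGraph second = identity("d1", "d2", "d3")
            .assign("d2", Map.of("d1", EdgeFn.ID, "d3", EdgeFn.ID));        // d2 := d1 union d3
        Map<String, Set<String>> env = Map.of("d1", Set.of("g"), "d3", Set.of("h"));
        System.out.println(first.then(second).apply(env));
    }
}
```

In this sketch, evaluating the composed graph gives the same environment as applying the two graphs one after the other, which is the property that lets a summary function stand in for the code it summarizes.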

Page 8: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Precise Whole-Program Analysis

Graph reachability along valid interprocedural paths

Phase 1: a summary function for each CFG node n
- Represents the solution at n as a function of the solution at the entry of the procedure containing n
- Computed through composition and meet of transformers
- The summary function at the procedure exit is used at call sites to that procedure
- Partial functions: defined only for the subset of the domain that is relevant to callers of n's procedure

Phase 2: top-down propagation of actual environments (e.g., dependence sets, type sets)

Adapt to library summary generation?

Page 10: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Phase 1: Intraprocedural Summary Generation

Produce a set of summary functions, one for each pair of nodes (n, m) such that
- n is the entry or a call site
- m is the exit or a call site
- there exists a call-free path from n to m

Similar to the per-node summary functions from the whole-program analysis, but
- complete functions instead of partial functions
- all possible compositions and meets of transformers (as graph operations), until a fixed point is reached

After this, some elements of D are filtered away
- e.g., for dependence analysis: locals that are not actuals of calls and are not assigned the return values from calls

Page 11: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Example

    class DateFormat {
        String format(Date f1) {
            DateFormat r0; Date r1; StringBuffer r2, r3;
            r0 = this; r1 = f1; r2 = new StringBuffer();
            cs1: r3 = r0.format(r1,r2);
            cs2: String r4 = r3.toString();
            return r4;
        }
    }

[Figure: summary-function graphs for entry to cs1 and for rs2 to exit, first over all symbols (this, f1, r0, r1, r2, r3, r4, ret) and then after filtering (this, f1, r0, r1, r2 for entry to cs1; r4, ret for rs2 to exit)]

Page 12: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Phase 2: Interprocedural Summary Generation

[Figure: for the DateFormat.format example shown earlier, the summary for toString at cs2 (a graph from r3 to r4) is composed with the rs2-to-exit graph over r4, ret; the result is a graph over r3, ret for rs1 to exit]

Page 13: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Phase 2: Interprocedural Summary Generation

Fixed call site: has exactly one possible target
- Cannot be a site that calls back client methods
  - Check type hierarchy for possible overriding in clients
- Cannot have multiple target methods
  - Static calls; constructor calls; final classes/methods
  - Intraprocedural 0-CFA type analysis: in the summary function, the only edge reaching x should be $\Lambda \to x$

Fixed method: has only fixed calls (or no calls), and this also holds for all methods reachable from it

Bottom-up traversal of the SCC-DAG of fixed methods; composition and filtering

In non-fixed methods: instantiate fixed calls to fixed methods; composition and filtering
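The definition of fixed methods can be sketched in Java as a greatest-fixed-point computation over a call graph. The sketch rests on assumptions that are not in the talk: the names `FixedMethods` and `CallSite`, the flag `clientMayOverride` (standing in for the type-hierarchy check), and the string method identifiers are all hypothetical. The bottom-up SCC-DAG traversal that would then compose the summaries is not shown.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: compute which library methods are "fixed". */
final class FixedMethods {
    private FixedMethods() {}

    /** A call site: its possible targets, and whether a client class could override the target. */
    static final class CallSite {
        final Set<String> targets;
        final boolean clientMayOverride;

        CallSite(Set<String> targets, boolean clientMayOverride) {
            this.targets = targets;
            this.clientMayOverride = clientMayOverride;
        }
        /** Fixed call site: exactly one target and no possible callback into client code. */
        boolean isFixed() { return targets.size() == 1 && !clientMayOverride; }
    }

    /** Maps each library method to its call sites; returns the set of fixed methods. */
    static Set<String> fixedMethods(Map<String, List<CallSite>> callSitesOf) {
        // Candidates: methods whose call sites are all fixed (methods with no calls qualify).
        Set<String> fixed = new HashSet<>();
        callSitesOf.forEach((m, sites) -> {
            if (sites.stream().allMatch(CallSite::isFixed)) fixed.add(m);
        });
        // Repeatedly drop candidates that call a non-candidate, until nothing changes.
        // Starting from all candidates keeps recursive cycles of fixed methods intact.
        boolean changed = true;
        while (changed) {
            Set<String> drop = new HashSet<>();
            for (String m : fixed) {
                for (CallSite cs : callSitesOf.get(m)) {
                    if (!fixed.containsAll(cs.targets)) drop.add(m);
                }
            }
            changed = fixed.removeAll(drop);
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Simplified model of the running example: toString is treated as making no calls.
        Map<String, List<CallSite>> lib = Map.of(
            "StringBuffer.toString", List.of(),
            "DateFormat.format(Date)", List.of(
                new CallSite(Set.of("DateFormat.format(Date,StringBuffer)"), true),  // cs1
                new CallSite(Set.of("StringBuffer.toString"), false)));              // cs2
        System.out.println(fixedMethods(lib)); // [StringBuffer.toString]
    }
}
```

In this simplified model, format(Date) is not fixed because cs1 may dispatch to an overriding method, while the call at cs2 is a fixed call to a fixed method and is therefore a candidate for instantiation inside the non-fixed caller.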

Page 14: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Example: Final Summary for format

[Figure: final summary for the DateFormat.format example shown earlier: one graph for entry to cs1 over this, f1, r0, r1, r2, and one graph for rs1 to exit over r3, ret]

Page 16: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Summary Generation

Libraries: 10238 classes, 77190 methods

0-CFA type analysis + dependence analysis [w/ Soot]
- Both data and control dependencies
- Simple optimizations: def-use chains, sparse graphs

Cost: 90 minutes time, 1.2GB memory
- Includes all Soot-related costs and all I/O

Final summary on disk: 18MB

Measurements: number of edges in the graph representation of transformers
- [1]: before any composition or meet
- [2]: after intraprocedural composition and meet
- [3]: after [2] and intraprocedural filtering: remove elements that are irrelevant for callers and callees

Page 17: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Intraprocedural Propagation

[Figure: two bar charts of the number of transformer-graph edges at measurement points [1], [2], and [3]]
- Dependence analysis: reduction in # edges from [2] to [3]: 53%
- Type analysis: reduction in # edges from [2] to [3]: 55%

Page 18: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Interprocedural Propagation for Dep. Analysis

Fixed methods: 25490 (33% of all library methods); 7195 of them (9% of all library methods) are eliminated because their only callers are in the library

Summary functions for fixed methods
- Instantiate at fixed calls within non-fixed methods: eliminates 21% of all library call sites
- Additional intraprocedural propagation and filtering

[Figure: bar chart of the number of edges at measurement points [1], [2], [3], and [4]]
- Reduction in # edges from [3] to [4]: 32%

Page 19: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Summary-Based Analysis of Clients

[Figure: bar chart, per client program, of the reduction in start-to-end time (IR building, type analysis + call graph, dependence analysis); vertical axis from 0% to 70%. Client programs: compress, db, fractal, jack, javac, javacup-0.10j, jb-6.1, jess, jflex-1.4.1, jlex-1.2.6, jtar-1.21, mindterm-1.1.5, mpegaudio, muffin-0.9.3a, rabbit2, raytrace, sablecc-2.18.2, socksecho, socksproxy, violet]

Page 20: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Only Dependence Analysis

Reduction in analysis time: actual analysis and a hypothetical best case with no library dependencies

[Figure: bar chart, per client program, of the reduction in dependence analysis time for the actual analysis and for the hypothetical best case; vertical axis from 0% to 90%. Client programs: compress, db, fractal, jack, javac, javacup-0.10j, jb-6.1, jess, jflex-1.4.1, jlex-1.2.6, jtar-1.21, mindterm-1.1.5, mpegaudio, muffin-0.9.3a, rabbit2, raytrace, sablecc-2.18.2, socksecho, socksproxy, violet]

Page 21: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Overview of Results

Start-to-end cost: IR, type analysis, dep. analysis
- Average time reduction 51%
- Average memory reduction 33%

Only dependence analysis
- Average time reduction 69%
- Average memory reduction 90%
- Very close to a conservative upper bound

Conclusions
- Summary generation has reasonable cost
- Summary size is small (# edges and total disk size)
- Significant savings for analysis running time and memory usage, compared to whole-program analysis

Page 22: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Future Work

This is a very preliminary study
- Promising initial results, but just the tip of the iceberg

More IDE analyses, with different characteristics
- e.g. points-to analysis, side-effect analysis, constant propagation, typestate properties, etc.

Beyond IDE analyses
- e.g. recent [POPL08] paper by Yorsh et al.

Better handling of callbacks and polymorphic calls
- e.g. take advantage of behavioral subtyping

Reusable API for storing and retrieving summary information – generality for many different analyses
- Open-source API implementation based on Soot

Page 23: IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Questions?