carol v. alexandru, sebastiano panichella, sebastian...

91
Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. Gall Software Evolution and Architecture Lab University of Zurich, Switzerland {alexandru,panichella,proksch,gall}@ifi.uzh.ch 26.03.2018 59th CREST Open Workshop Centre for Research on Evolution, Search and Testing University College London, London, United Kingdom

Upload: others

Post on 26-Feb-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. GallSoftware Evolution and Architecture Lab

University of Zurich, Switzerland{alexandru,panichella,proksch,gall}@ifi.uzh.ch

26.03.2018

59th CREST Open WorkshopCentre for Research on Evolution, Search and Testing

University College London, London, United Kingdom

Page 2: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The Problem Domain

• Static analysis (e.g. #Attr., McCabe, coupling...)

1

Page 3: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

2

• Static analysis (e.g. #Attr., McCabe, coupling...)

Page 4: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

2

• Static analysis (e.g. #Attr., McCabe, coupling...)

• Many revisions, fine-grained historical data

Page 5: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A Typical Analysis Process

www

clone

select project

3

Page 6: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A Typical Analysis Process

www

clone

checkout

select project

select revision

3

Page 7: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A Typical Analysis Process

www

clone

checkout

Purpose-built,language specific tool

Res

select project

select revision

applytool

storeanalysis

results

3

Page 8: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A Typical Analysis Process

www

clone

checkout

Purpose-built,language specific tool

Res

select project

select revision

applytool

storeanalysis

results

more revisions?

3

Page 9: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A Typical Analysis Process

www

clone

checkout

Purpose-built,language specific tool

Res

select project

select revision

applytool

storeanalysis

results

more revisions?

more projects?

3

Page 10: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

4

Redundancies in historical code analysis

Impact on Code Study Tools

Page 11: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Across RevisionsImpact on Code

Study Tools

4

Page 12: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

4

Page 13: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Changes may not even affect results

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

4

Page 14: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

4

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Page 15: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Re-implementing identical analyses

Generalizability is expensive

4

Page 16: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Redundancies all over...

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Re-implementing identical analyses

Generalizability is expensive

5

Page 17: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

#1: Avoid Checkouts

Page 18: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

7

clone

Page 19: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

7

clone

checkout

read write

Page 20: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

7

clone

checkout

read

read write

analyze

Page 21: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

7

clone

checkout

read

For every file: 2 read ops + 1 write opCheckout includes irrelevant filesNeed 1 CWD for every revision to be analyzed in parallel

read write

analyze

Page 22: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

8

clone analyze

read

Page 23: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

read

Page 24: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

read

Page 25: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

}

read

Page 26: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Avoid checkouts

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

}

read

9

Page 27: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

#2: Use a multi-revision representationof your sources

Page 28: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

10

Page 29: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4

11

rev. 1

Merge ASTs

Page 30: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4

12

rev. 2

Merge ASTs

Page 31: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

13

rev. 3

Page 32: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

14

rev. 4

Page 33: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

15

Page 34: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

16

rev. range [1-4]

rev. range [1-2]

Page 35: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

17

Page 36: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

18

Page 37: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

19

Page 38: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

#3: Store AST nodes only ifthey're needed for analysis

Page 39: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}

}

}

What's the complexity (1+#forks) and name for each method and class?

20

Page 40: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

140 AST nodes(using ANTLR)

parse

What's the complexity (1+#forks) and name for each method and class?

20

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}

}

}

Page 41: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

140 AST nodes(using ANTLR)

parseCompilationUnit

TypeDeclaration

Modifiers

public

Members

Method

Modifiers

public

Name

run

Name

Demo

Parameters ReturnType

PrimitiveType

VOID

Body

Statements

...

...

What's the complexity (1+#forks) and name for each method and class?

20

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}

}

}

Page 42: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

What's the complexity (1+#forks) and name for each method and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

21

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}

}

}

Page 43: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for eachmethod and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

22

Page 44: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for eachmethod and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

23

Page 45: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

#4: Use non-duplicative data structuresto store your results

Page 46: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4

24

Page 47: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4

24

Page 48: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4

24

[1-1]

label

#attr

mcc

[4-4]

label

#attr

mcc

InnerClass

0 4

1 2 4

[2-3]

label

#attr

mcc

Page 49: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

rev. 1

rev. 2

rev. 3

rev. 4 [1-1]

label

#attr

mcc

[4-4]

label

#attr

mcc

InnerClass

0 4

1 2 4

[2-3]

label

#attr

mcc

25

Page 50: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

LISA also does:#5: Parallel Parsing#6: Asynchronous graph computation#7: Generic graph computationsapplying to ASTs from compatiblelanguages

26

Page 51: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

A light-weight view onmulti-language analysis

Page 52: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Typical solutions

• Toolchains / Frameworks

• Integrate language-specific tooling

• Lots of engineering required

• Meta-models

• Translate language code to some common

representation

• Significant overhead / rigid models

52

Page 53: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Structure matters most

• Complexity?

• # of Functions / Attributes etc.

• Coupling between Classes

• Call graphs

53

if (true) {

if (true) { }

} # CYCLO: 3

if (true) { }

if (true) { }

# CYCLO: 4

Page 54: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Relative structure is similar

54

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

Page 55: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Relative structure is similar

55

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

Page 56: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Relative structure is similar

56

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

class

function

for

if

print

Page 57: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Entities are different

57

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

class

function

for

if

print

CLASS_DECL

METHOD_DECL

FOR_STATIF_STAT

METHOD_INVOK

classdef

funcdefforstat

ifelsestatatomexpr

Page 58: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Can filter irrelevant nodes

58

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

class

function

for

if

print

Page 59: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Can filter irrelevant nodes

59

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}}}}

class Demo:

def run():

for i in range(1, 100):

if i % 3 == 0 or i % 5 == 0:

print(i)

class

function

for

if

print

Page 60: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

How LISA handlesMultiple languages

Page 61: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

61

Page 62: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

62

class

function

Page 63: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

63

class

functionfunc-count signal

1

1 1

signal

Page 64: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

64

class

function

collect

func-count signal

fc = fc+1+1 = 2

Page 65: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

65

class

functionfunc-count signal

1

signal

Page 66: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

66

class

function

collect

func-count signal

fc = fc+1 = 3

Page 67: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Analysis formulation

• Signal/Collect (like “Google Pregel” for Scala)

• Graph vertices send information packets

(signals) and do something when receiving

(collecting) signals.

• Analyses use generic lables for entities

67

Page 68: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

68

Page 69: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

69

Page 70: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Light-weight entity mappings

70

Page 71: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Light-weight entity mappings

71

Lables used by Analyses

Page 72: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Light-weight entity mappings

72

Lables used by the parser or

grammar

Page 73: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Light-weight entity mappings

73

Page 74: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Parsing

• AST is kept as the parser supplies it

• ANTLRv4 integration

• Filtering can be enabled

• Only AST nodes that correspond to a label

used in an analysis are kept

• Reduces graph size by a factor of 10 or more

74

Page 75: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Adding new languages

1. Integrate a parser (or generate one)

• Graph interface allows adding nodes/edges

2. Write a node mapping

3. Re-use existing analyses on new ASTs

75

Page 76: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

76

Page 77: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

77

Page 78: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

To Summarize...

Page 79: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The LISA Analysis Process

27

www

clone

select project

Page 80: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

select project

Language Mappings(Grammar to Analysis)

determines which ASTnodes are loaded

ANTLRv4Grammar used by

Generates Parser

Page 81: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

Res

select project

storeanalysis

results

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines which ASTnodes are loaded

determines whichdata is persisted

ANTLRv4Grammar used by

Generates Parser

runson graph

Page 82: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

Res

select project

storeanalysis

results

more projects?

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines which ASTnodes are loaded

determines whichdata is persisted

ANTLRv4Grammar used by

Generates Parser

runson graph

Page 83: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

How well does it work, then?

Page 84: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Marginal cost for +1 revision

41.670

4.6330.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033

0

5

10

15

20

25

30

35

40

45

50

1 10 100 1000 2000 3000 4000 5000 6000 7000

Average Parsing+Computation time per Revision when analyzing n revisions of AspectJ (10 common metrics)

Seco

nd

s

# of revisions

28

Page 85: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Overall Performance Stats

29

Language Java C# JavScript Python

#Projects 1000 1000 1000 1000

#Revisions 2 Million 1.4 Million 1.5 Million 2.3 Million

#Files (parsed!) 10 Billion 3 Billion 380 Thousand 1.2 Billion

#Lines (parsed!) 1.6 Trillion 0.6 Trillion 43 Billion 0.3 Trillion

Total Runtime (RT)¹ 2d 5h 3d 23h 2d 4h

Median RT¹ 8.35s 40.5s 14.4s 24.5s

Tot. Avg. RT per Rev.²

97ms 183ms 57ms 83ms

Med. Avg. RT per Rev.²

41ms 88ms 25ms 48ms

¹ Including cloning and persisting results² Excluding cloning and persisting results

Page 86: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

What's the catch?(There are a few...)

Page 87: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

30

Page 88: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

• Moved files/methods etc. add overhead

• Uniquely identifying files/entities is hard

• (No impact on results, though)

30

Page 89: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Parser performance matters

31

Javascript C# Java Python

Page 90: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

LISA is E X TREME

32

complexfeature-richheavyweight

simplegeneric

lightweight

Page 91: Carol V. Alexandru, Sebastiano Panichella, Sebastian ...crest.cs.ucl.ac.uk/cow/59/slides/cow59_Alexandru.pdf · Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald

Thank you for your attention

59th CREST Open Workshop, London, 26.03.2018

Read the paper: http://t.uzh.ch/Fj(upcoming EMSE publication is much more detailed)

Try the tool: http://t.uzh.ch/Fk

Get the slides: http://t.uzh.ch/NR

Contact me: [email protected]