understanding source code with deep learning

Microsoft Research Cambridge miltos1 https://miltos.allamanis.com


Post on 04-Nov-2021


Page 1: Understanding Source Code with Deep Learning

Microsoft Research Cambridge

miltos1

https://miltos.allamanis.com

Page 3

https://visualstudio.microsoft.com/services/intellicode/

http://www.eclipse.org/recommenders/

Page 4

http://jsnice.org/

Deep Learning Type Inference

V. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis. 2018

Predicting Program Properties from “Big Code”

V. Raychev, M. Vechev, A. Krause. 2015

Page 5

[Figure: Defined Types; labels “string”, “string”]

Page 7: Research in ML+Code

Page 9: Target Task

Page 11: Programs as Graphs: Key Idea

int int int
int
return
for (int i = 0; ___ < ___; ___++)
    if (___[___] > 0)
        ___ += ___[___];

int int int
int
return
for (int i = 0; i < lim; i++)
    if (arr[i] > 0)
        sum += arr[i];
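One way to produce a blanked view like the first for-loop above is to keep keywords, literals and declared names, and replace every later use of a variable with a slot marker. The sketch below is an illustration under stated assumptions: the tiny C-like tokenizer, the keyword list, the helper `mask_variable_uses` and the `___` marker are all hypothetical, not taken from the talk.

```python
import re

# Hypothetical tokenizer for a small C-like subset: identifiers, integers,
# a few multi-character operators, then any other single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|[+\-*/<>=!]=|&&|\|\||\S")
KEYWORDS = {"for", "if", "while", "return", "int"}
TYPES = {"int"}  # a name right after a type keyword is a declaration


def mask_variable_uses(code, slot="___"):
    """Replace every variable *use* with `slot`; keep keywords and declarations."""
    out = []
    prev = None
    for tok in TOKEN.findall(code):
        is_identifier = tok[0].isalpha() or tok[0] == "_"
        if is_identifier and tok not in KEYWORDS and prev not in TYPES:
            out.append(slot)   # a use of a variable: blank it out
        else:
            out.append(tok)    # keyword, literal, operator or declared name
        prev = tok
    return " ".join(out)


print(mask_variable_uses("for (int i = 0; i < lim; i++)"))
# → for ( int i = 0 ; ___ < ___ ; ___ ++ )
print(mask_variable_uses("sum += arr[i];"))
# → ___ += ___ [ ___ ] ;
```

Each `___` is then a slot where a model has to decide which in-scope variable belongs.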

Page 12: Programs as Graphs: Syntax

Assert.NotNull(clazz);

[Figure: tokens Assert . NotNull ( … chained by Next Token edges; AST nodes ExpressionStatement, InvocationExpression, MemberAccessExpression and ArgumentList connected by AST Child edges]
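To make the two syntactic edge types concrete, the sketch below builds a graph with AST Child and Next Token edges. The talk's examples are C#, which has no parser in the Python standard library, so this assumes Python's own `ast` and `tokenize` modules as a stand-in: for `Assert.NotNull(clazz)` Python produces `Expr`, `Call` and `Attribute` nodes, roughly corresponding to ExpressionStatement, InvocationExpression and MemberAccessExpression. The function name and edge-label strings are illustrative, and unlike the slide the sketch does not link tokens to their AST parents.

```python
import ast
import io
import tokenize

# Token types that carry layout rather than program text.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def syntax_graph(source):
    """Return (labels, edges); edges are (src_id, dst_id, edge_type) triples."""
    labels, edges = [], []

    def add_node(label):
        labels.append(label)
        return len(labels) - 1

    # AST nodes (pre-order ids), linked parent -> child by "AST Child" edges.
    def visit(node):
        node_id = add_node(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):  # skip Load/Store markers
                continue
            edges.append((node_id, visit(child), "AST Child"))
        return node_id

    visit(ast.parse(source))

    # Token nodes, linked left -> right by "Next Token" edges.
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        tok_id = add_node(tok.string)
        if prev is not None:
            edges.append((prev, tok_id, "Next Token"))
        prev = tok_id
    return labels, edges


labels, edges = syntax_graph("Assert.NotNull(clazz)\n")
for src, dst, kind in edges:
    print(f"{labels[src]:>9} --{kind}--> {labels[dst]}")
```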

Page 13: Programs as Graphs: Data Flow

(x, y) = Foo();
while (x > 0)
    x = x + y;

Edge types: Last Write, Last Use, Computed From
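The data-flow edges can be computed with a forward data-flow analysis iterated to a fixpoint; the iteration matters because the `while` loop feeds the write in `x = x + y` back into the loop condition. The sketch below hand-encodes the slide's snippet as numbered variable occurrences plus a control-flow successor map. The encoding, the ids and the exact edge semantics are assumptions for illustration: here Last Use links an occurrence to the previous read or write of the same variable, and Computed From links an assignment target to the variables on its right-hand side.

```python
from collections import defaultdict

# Hand-encoded occurrences of the slide's snippet (ids are illustrative):
#   (x, y) = Foo();        x:1 (write)  y:2 (write)
#   while (x > 0)          x:3 (read)
#       x = x + y;         x:4 (write)  x:5 (read)  y:6 (read)
# Listed in evaluation order: the right-hand side is read before x:4 is written.
OCCURRENCES = {1: ("x", True), 2: ("y", True), 3: ("x", False),
               5: ("x", False), 6: ("y", False), 4: ("x", True)}
# Control flow between occurrences, including the loop back-edge 4 -> 3.
SUCCESSORS = {1: [2], 2: [3], 3: [5], 5: [6], 6: [4], 4: [3]}
# Assignment target -> occurrences its new value is computed from.
ASSIGNMENTS = {4: [5, 6]}


def data_flow_edges(occurrences, successors, assignments):
    preds = defaultdict(list)
    for src, dsts in successors.items():
        for dst in dsts:
            preds[dst].append(src)

    # State: ("write" | "use", variable) -> ids that may be the most recent one.
    in_state = {n: {} for n in occurrences}
    out_state = {n: {} for n in occurrences}
    changed = True
    while changed:                                # iterate to a fixpoint
        changed = False
        for n, (var, is_write) in occurrences.items():
            merged = defaultdict(set)
            for p in preds[n]:                    # join over all predecessors
                for key, ids in out_state[p].items():
                    merged[key] |= ids
            in_state[n] = dict(merged)
            new_out = dict(merged)
            new_out[("use", var)] = {n}           # n is now the latest use
            if is_write:
                new_out[("write", var)] = {n}     # ... and the latest write
            if new_out != out_state[n]:
                out_state[n] = new_out
                changed = True

    edges = set()
    for n, (var, _) in occurrences.items():
        for w in in_state[n].get(("write", var), ()):
            edges.add((n, w, "LastWrite"))
        for u in in_state[n].get(("use", var), ()):
            edges.add((n, u, "LastUse"))
    for target, sources in assignments.items():
        for s in sources:
            edges.add((target, s, "ComputedFrom"))
    return edges


for edge in sorted(data_flow_edges(OCCURRENCES, SUCCESSORS, ASSIGNMENTS)):
    print(edge)
```

For example, the read of `x` in the loop condition gets Last Write edges to both the initial tuple assignment and the write inside the loop body.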

Page 14: Programs as Graphs

int int int
int
return
for (int i = 0; ___ < ___; ___++)
    if (___[___] > 0)
        ___ += ___[___];

~900 nodes/graph, ~8k edges/graph
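A back-of-the-envelope check using only the two approximate sizes on this slide shows how sparse these graphs are, which is one reason graph implementations typically work with edge lists rather than dense adjacency matrices:

```latex
\frac{|E|}{|V|^2} \approx \frac{8\,000}{900^2} = \frac{8\,000}{810\,000} \approx 0.99\%,
\qquad
\frac{|E|}{|V|} \approx \frac{8\,000}{900} \approx 8.9 \text{ edges per node}.
```

Under these averages, fewer than 1% of possible node pairs are connected.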

Page 15: Graph Representation for Variable Misuse

[Figure: example graph with nodes A, B, C, D, E, F, G]


Page 17: Vector Space Representations
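Before any message passing, each graph node needs an initial vector. One common choice for code, assumed here rather than taken from the slide, is to split an identifier such as `NotNull` into sub-tokens and average their embeddings. The sketch uses a hypothetical, lazily filled random table (`EMBEDDINGS`) in place of learned parameters; `split_subtokens` and `embed_label` are illustrative names.

```python
import random
import re

DIM = 4
_rng = random.Random(0)
EMBEDDINGS = {}  # hypothetical sub-token table; real models learn these vectors


def split_subtokens(name):
    """Split camelCase / PascalCase / snake_case identifiers into lowercase parts."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts] or [name.lower()]


def subtoken_vector(subtoken):
    """Look up (or lazily create) the random vector for one sub-token."""
    if subtoken not in EMBEDDINGS:
        EMBEDDINGS[subtoken] = [_rng.uniform(-1, 1) for _ in range(DIM)]
    return EMBEDDINGS[subtoken]


def embed_label(label):
    """Initial node state: mean of the label's sub-token embeddings."""
    vectors = [subtoken_vector(s) for s in split_subtokens(label)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(split_subtokens("NotNull"))  # → ['not', 'null']
print(embed_label("NotNull"))
```

Sharing sub-tokens means that, for example, `NotNull` and `isNull` start out with related vectors.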

Page 18: Graph Neural Networks

[Figure: example graph with nodes A, B, C, D, E, F, G]

Li et al. (2015). Gated Graph Sequence Neural Networks.
Gilmer et al. (2017). Neural Message Passing for Quantum Chemistry.


Page 20: Graph Neural Networks: Message Passing

[Figure: nodes E, D, F; example graph with nodes A, B, C, D, E, F, G]

Li et al. (2015). Gated Graph Sequence Neural Networks.
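A single gated message-passing step in the style of Li et al. (2015) can be sketched as follows: every edge sends its source node's state through a weight matrix chosen by the edge type, incoming messages are summed per node, and each node updates its state with a GRU cell. This is a simplified sketch assuming random weights, no bias terms and no reverse edges; `GGNNLayer` and the helper functions are illustrative names, and plain Python lists stand in for a tensor library.

```python
import math
import random


def matvec(W, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(items) for items in zip(*vectors)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GGNNLayer:
    """One gated message-passing step (simplified: no biases, no reverse edges)."""

    def __init__(self, dim, edge_types, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One message matrix per edge type, e.g. "Next Token" or "Last Write".
        self.W_msg = {k: rand_matrix(dim, dim, rng) for k in edge_types}
        # GRU parameters: update gate z, reset gate r, candidate state.
        self.Wz, self.Uz = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.Wr, self.Ur = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.Wh, self.Uh = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)

    def step(self, h, edges):
        """h: node -> state vector; edges: (src, dst, edge_type) triples."""
        # 1. Typed messages along each edge, summed at the destination node.
        m = {v: [0.0] * self.dim for v in h}
        for src, dst, kind in edges:
            m[dst] = add(m[dst], matvec(self.W_msg[kind], h[src]))
        # 2. GRU update: h' = (1 - z) * h + z * candidate.
        new_h = {}
        for v, hv in h.items():
            z = sigmoid(add(matvec(self.Wz, m[v]), matvec(self.Uz, hv)))
            r = sigmoid(add(matvec(self.Wr, m[v]), matvec(self.Ur, hv)))
            reset_h = [ri * hi for ri, hi in zip(r, hv)]
            cand = [math.tanh(a) for a in add(matvec(self.Wh, m[v]), matvec(self.Uh, reset_h))]
            new_h[v] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, hv, cand)]
        return new_h


layer = GGNNLayer(dim=4, edge_types=["Next Token", "Last Write"])
h = {0: [1.0, 0.0, 0.0, 0.0], 1: [0.0] * 4, 2: [0.0] * 4}
new_h = layer.step(h, [(0, 1, "Next Token")])
print(new_h[1])  # node 1 received a message from node 0
print(new_h[2])  # no incoming edge and a zero state, so it stays zero
```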


Page 23: Graph Neural Networks: Unrolling

Li et al. (2015). Gated Graph Sequence Neural Networks.
Gilmer et al. (2017). Neural Message Passing for Quantum Chemistry.

• node selection
• node classification
• graph classification

https://github.com/Microsoft/gated-graph-neural-network-samples
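Unrolling repeats the propagation step a fixed number of times with shared weights, so after T steps a node's state can depend on nodes up to T edges away; a task-specific readout then turns the final states into an output for each of the three task types listed above. This sketch assumes a single `tanh` update in place of the GRU cell and hypothetical readout weights; it illustrates the idea and is not the code in the linked repository.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unroll(h, edges, W, steps):
    """Run `steps` rounds of message passing; the same W is reused every round."""
    for _ in range(steps):
        msgs = {v: [0.0] * len(hv) for v, hv in h.items()}
        for src, dst in edges:
            msgs[dst] = [a + b for a, b in zip(msgs[dst], matvec(W, h[src]))]
        # Simplified update (tanh of state + messages) instead of a GRU cell.
        h = {v: [math.tanh(a + b) for a, b in zip(h[v], msgs[v])] for v in h}
    return h


def node_selection(h, candidates, w):
    """Distribution over candidate nodes (e.g. variables that could fill a slot)."""
    return softmax([sum(a * b for a, b in zip(w, h[c])) for c in candidates])


def node_classification(h, W_out):
    """Per-node distribution over classes."""
    return {v: softmax(matvec(W_out, hv)) for v, hv in h.items()}


def graph_classification(h, W_out):
    """Sum node states into one graph vector, then classify the whole graph."""
    graph_vec = [sum(column) for column in zip(*h.values())]
    return softmax(matvec(W_out, graph_vec))


# A chain 0 -> 1 -> 2: node 2 only receives information from node 0 after two steps.
W = [[0.5, 0.0], [0.0, 0.5]]
h0 = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
chain = [(0, 1), (1, 2)]
print(unroll(h0, chain, W, steps=1)[2])  # → [0.0, 0.0]
print(unroll(h0, chain, W, steps=2)[2])  # first entry is now non-zero
```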


Page 25: Quantitative Results – Variable Misuse

Accuracy (%)      BiGRU   BiGRU+Dataflow   GGNN
Seen Projects     50.0    73.7             85.5
Unseen Projects   28.9    60.2             78.2

Seen Projects: 24 F/OSS C# projects (2060 kLOC), used for training and testing
Unseen Projects: 3 F/OSS C# projects (228 kLOC), used only for testing
3.8 type-correct alternative variables per slot (median 3, σ = 2.6)
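The accuracy figures can be read as the percentage of slots for which the model's top-scoring candidate is the variable that originally appeared there. A minimal sketch of that evaluation, assuming each test example is a (slot, candidates, correct variable) triple and that the model exposes a hypothetical `score(slot, candidate)` function:

```python
from typing import Callable, Hashable, Sequence, Tuple

# (slot, candidate variables, correct variable); the layout is an assumption.
Example = Tuple[Hashable, Sequence[str], str]


def misuse_accuracy(examples: Sequence[Example],
                    score: Callable[[Hashable, str], float]) -> float:
    """Percentage of slots where the highest-scoring candidate is the correct one."""
    correct = 0
    for slot, candidates, gold in examples:
        predicted = max(candidates, key=lambda c: score(slot, c))
        correct += predicted == gold
    return 100.0 * correct / len(examples)


# Toy usage with a hypothetical scorer that simply prefers longer names.
toy = [("slot1", ["i", "lim"], "lim"), ("slot2", ["i", "arr", "sum"], "i")]
print(misuse_accuracy(toy, lambda slot, c: len(c)))  # → 50.0
```

Reading the table this way, the drop from seen to unseen projects is 21.1 points for BiGRU, 13.5 for BiGRU+Dataflow and 7.3 for GGNN.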

Page 26: What the model sees…

bool string string out string
var
while null
if return true
null
return false

Page 27

UI/UX

ML Capabilities

Metrics

Low resources

Page 28: Learning Signals

[Figure: input data 𝑥 → model of problem 𝑓𝜃(𝑥) → prediction, shown with the target]

• Given a dataset $(x_1, y_1), \dots, (x_N, y_N)$
• Minimize the loss $\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\big(f_\theta(x_i), y_i\big)$
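Because the objective is an average of per-example losses, it can be minimized by gradient descent. A minimal sketch, assuming a one-feature linear model f_θ(x) = θ0 + θ1·x, squared loss L(p, y) = (p − y)² and a toy dataset generated by y = 2x + 1; all three are illustrative choices, not taken from the talk.

```python
def model(theta, x):
    """Hypothetical one-feature linear model f_theta(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def mean_loss(theta, data):
    """Empirical loss (1/N) * sum_i L(f_theta(x_i), y_i) with squared loss L."""
    return sum((model(theta, x) - y) ** 2 for x, y in data) / len(data)


def gradient(theta, data):
    """Analytic gradient of mean_loss with respect to (theta0, theta1)."""
    n = len(data)
    g0 = sum(2 * (model(theta, x) - y) for x, y in data) / n
    g1 = sum(2 * (model(theta, x) - y) * x for x, y in data) / n
    return g0, g1


def fit(data, lr=0.05, steps=2000):
    """Plain gradient descent from theta = (0, 0)."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g0, g1 = gradient(theta, data)
        theta = [theta[0] - lr * g0, theta[1] - lr * g1]
    return theta


data = [(x, 2 * x + 1) for x in range(4)]  # toy dataset: y = 2x + 1
theta = fit(data)
print(theta, mean_loss(theta, data))  # theta close to [1.0, 2.0], loss close to 0
```

In the talk's setting f_θ is a neural network over code and the gradient comes from automatic differentiation, but the objective being minimized has the same averaged form.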
