with a graph-structured cache open vocabulary learning on …12-16-00)-12-16-40-4504... · next...

10
Open Vocabulary Learning on Source Code with a Graph-Structured Cache Milan Cvitkovic Caltech, Amazon Web Services Badal Singh Amazon Web Services Anima Anandkumar Caltech ICML, 2019-6-12

Upload: others

Post on 29-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Milan CvitkovicCaltech, Amazon Web Services

Badal SinghAmazon Web Services

Anima AnandkumarCaltech

ICML, 2019-6-12

Page 2: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Open Vocabulary Learning

Standard, closed vocabulary model Open vocabulary

1 of 400k word embeddings → 1 of 400k words Any words → Any words

Goal: Models that can reason over flexible sets of inputs and outputs

Page 3: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Open Vocabulary Learning

Motivation: Tasks on source codeExample: Variable naming

Needs an open vocabularyIn our data, 28% of variable names contain out–of–vocabulary word

Inputint <NAME-ME> = assertArraysAreSameLength(expected, actuals, header); for (int i = 0; i < <NAME-ME>; i++) { Object expected = Array.get(expected, i);

Output

‘expected_length’

Page 4: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Strategy: Represent distinct words and usages with graph structure, process with GNN

Graph-Structured Cache

Original input def get_jupyter_addr():

jupyter_addr = ‘localhost’ if is_serving() else Nonereturn jupyter_addr

Same input, represented using a Graph-Structured Cache

getjupyter addr serving

Edge Indicating Word Use

<word> <word> <word> <word>

<word>

<word>

<word> <word> <word> <word> <word> <word> <word>

<word><word> Edge Indicating Next Word

Page 5: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Full Model for Tasks on Source Code

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Input

/** SomeFile.java

public void addFoo(Foo foo){ this.myBaz.add(foo); }

Augment AST with semantic information

Parse code into AST

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Last Use

Field Reference

Next Node

. . .

Strategy from recent work [1]

[1] Allamanis et al. “Learning to Represent Programs with Graphs.” ICLR 2018

Page 6: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Full Model for Tasks on Source Code

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Input

/** SomeFile.java

public void addFoo(Foo foo){ this.myBaz.add(foo); }

Augment AST with semantic information

Parse code into AST

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Last Use

Field Reference

Next Node

. . .

Add Graph-Structured Cache

foo

add

my

baz

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Last Use

Field Reference

Next Node

. . .

Word Use

Our main contribution to prior work

Page 7: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Full Model for Tasks on Source Code

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Input

/** SomeFile.java

public void addFoo(Foo foo){ this.myBaz.add(foo); }

Augment AST with semantic information

Parse code into AST

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Last Use

Field Reference

Next Node

. . .

Add Graph-Structured Cache

foo

add

my

baz

. . .

Method Declaration

Parameter Code Block

Method Call

add Foo

myBaz

add

foo

Name Expr

foo

Field Access

Last Use

Field Reference

Next Node

. . .

Word Use

Convert all nodes to vectors,process with GNN

Output(Depends on task)

Page 8: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

● Full-name reproduction accuracy (and top 5 accuracy):

Experiment: Variable Naming Task

For other tasks and experiments, see our poster or paper

Page 9: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Takeaways

Graph-Structured Caches are an appealing strategy for open vocabulary learning

○ Whatever your current embedding strategy, GSC + GNN can augment it

○ No free lunch! About 30% training slowdown.

○ But helps in all cases we tried, sometimes significantly

Page 10: with a Graph-Structured Cache Open Vocabulary Learning on …12-16-00)-12-16-40-4504... · Next Node. . . Add Graph-Structured Cache foo add my baz. . . Method Declaration Parameter

Acknowledgments

● Badal Singh, Anima Anandkumar

● Miltos Allamanis

● Hyokun Yun

● Haibin Lin

Our code, for use on your code

https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor

https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache