top-down tree long short-term memory networks · top-down tree long short-term memory networks...

47
Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh 12th June, 2016 Zhang et al., 2016 Tree LSTM 12th June, 2016 1 / 18

Upload: others

Post on 23-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Top-down Tree Long Short-Term Memory Networks

Xingxing Zhang, Liang Lu, Mirella Lapata

School of Informatics, University of Edinburgh

12th June, 2016

Zhang et al., 2016 Tree LSTM 12th June, 2016 1 / 18

Page 2: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Sequential Language Models

P(S = w1,w2, . . . ,wn) =n∏

i=1

P(wi |w1:i−1) (1)

State of the Art

based on Long Short Term Memory Network Language Model(Hochreiter and Schmidhuber, 1997; Sundermeyer et al., 2012)Billion word benchmark results reported in Jozefowicz et al., (2016)

Models PPLKN5 67.6LSTM 30.6LSTM+CNN INPUTS 30.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 2 / 18

Page 3: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Will tree structures help LMs?

Probably yes

LMs based on Constituency Parsing (Chelba and Jelinek, 2000; Roark,2001; Charniak, 2001)LMs based on Dependency Parsing (Shen et al., 2008; Zhang, 2009;Sennrich, 2015)

Zhang et al., 2016 Tree LSTM 12th June, 2016 3 / 18

Page 4: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Will tree structures help LMs?

Probably yes

LMs based on Constituency Parsing (Chelba and Jelinek, 2000; Roark,2001; Charniak, 2001)LMs based on Dependency Parsing (Shen et al., 2008; Zhang, 2009;Sennrich, 2015)

Zhang et al., 2016 Tree LSTM 12th June, 2016 3 / 18

Page 5: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

LSTMs + Dependency Trees = TreeLSTMs

+

Why?

Sentence Length N v.s. Tree Height log(N)

How?

Top-down GenerationBreadth-first searchreminiscent of Eisner (1996)

Zhang et al., 2016 Tree LSTM 12th June, 2016 4 / 18

Page 6: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

LSTMs + Dependency Trees = TreeLSTMs

+

Why?

Sentence Length N v.s. Tree Height log(N)

How?

Top-down GenerationBreadth-first searchreminiscent of Eisner (1996)

Zhang et al., 2016 Tree LSTM 12th June, 2016 4 / 18

Page 7: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 8: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 9: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 10: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 11: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 12: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 13: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Zhang et al., 2016 Tree LSTM 12th June, 2016 5 / 18

Page 14: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

P(S) =n∏

i=1

P(wi |w1:i−1) (2)

P(S |T ) =∏

w∈BFS(T )\root

P(w |D(w)) (3)

D(w) is the Dependency Path of w .

D(w) is a generated sub-tree.

Works on projective and unlabeled dependency trees.

Zhang et al., 2016 Tree LSTM 12th June, 2016 6 / 18

Page 15: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 16: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 17: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 18: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 19: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 20: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 21: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 7 / 18

Page 22: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

One Limitation of Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 8 / 18

Page 23: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Left Dependent Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 9 / 18

Page 24: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Left Dependent Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 9 / 18

Page 25: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Left Dependent Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 9 / 18

Page 26: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Left Dependent Tree LSTM

Zhang et al., 2016 Tree LSTM 12th June, 2016 9 / 18

Page 27: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Experiments

Zhang et al., 2016 Tree LSTM 12th June, 2016 10 / 18

Page 28: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

MSR Sentence Completion Challenge

Training set: 49 million words (around 2 million sentences)

development set: 4000 sentences

test set: 1040 completion questions.

Zhang et al., 2016 Tree LSTM 12th June, 2016 11 / 18

Page 29: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Zhang et al., 2016 Tree LSTM 12th June, 2016 12 / 18

Page 30: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Zhang et al., 2016 Tree LSTM 12th June, 2016 12 / 18

Page 31: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Zhang et al., 2016 Tree LSTM 12th June, 2016 12 / 18

Page 32: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Zhang et al., 2016 Tree LSTM 12th June, 2016 12 / 18

Page 33: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Dependency Parsing Reranking

Rerank 2nd Order MSTParser (McDonald and Pereira, 2006)

We train TreeLSTM and LdTreeLSTM as language models.

We only use words as input features; POS tags, dependency labels orcomposition features are not used.

Zhang et al., 2016 Tree LSTM 12th June, 2016 13 / 18

Page 34: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Dependency Parsing Reranking

NN: Chen & Manning, 2014; S-LSTM: Dyer et al., 2015Zhang et al., 2016 Tree LSTM 12th June, 2016 14 / 18

Page 35: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Dependency Parsing Reranking

NN: Chen & Manning, 2014; S-LSTM: Dyer et al., 2015Zhang et al., 2016 Tree LSTM 12th June, 2016 14 / 18

Page 36: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Dependency Parsing Reranking

NN: Chen & Manning, 2014; S-LSTM: Dyer et al., 2015Zhang et al., 2016 Tree LSTM 12th June, 2016 14 / 18

Page 37: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Dependency Parsing Reranking

NN: Chen & Manning, 2014; S-LSTM: Dyer et al., 2015Zhang et al., 2016 Tree LSTM 12th June, 2016 14 / 18

Page 38: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Left? No!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 39: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Right? Yes!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 40: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Right? Yes!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 41: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Next Right? No!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 42: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Left? Yes!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 43: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Left? Yes!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 44: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Next Left? No!

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 45: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Four binary classifiers:

Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states andword embeddings

Classifiers Accuracies

Add-Left 94.3Add-Right 92.6Add-Nx-Left 93.4Add-Nx-Right 96.0

Zhang et al., 2016 Tree LSTM 12th June, 2016 15 / 18

Page 46: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Tree Generation

Zhang et al., 2016 Tree LSTM 12th June, 2016 16 / 18

Page 47: Top-down Tree Long Short-Term Memory Networks · Top-down Tree Long Short-Term Memory Networks Xingxing Zhang, Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh

Conclusions

Syntax can help language modeling.

Predicting tree structures with Neural Networks is possible.

Next Steps:

Sequence to Tree ModelsTree to Tree Models

code available:https://github.com/XingxingZhang/td-treelstm

Thanks & Questions?

Zhang et al., 2016 Tree LSTM 12th June, 2016 17 / 18