powerpoint presentation€¦ · title: powerpoint presentation author: rohan gupta created date:...

14
End-To-End Memory Networks Sainbayar Sukhbaatar, Arthur Szlam, Jason Wetson, Rob Fergus Dept. Of Computer Science Courant Institute, NYU & Facebook AI Research New York

Upload: others

Post on 03-Oct-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

End-To-End Memory Networks

Sainbayar Sukhbaatar, Arthur Szlam, Jason Wetson, Rob Fergus Dept. Of Computer Science

Courant Institute, NYU &

Facebook AI Research New York

Page 2: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Outline

• Motivation • Model • Experiments • Results • Conclusion

Page 3: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Motivation

• Make a model that can perform many computational steps to answer a question.

• Make a model that describes dependencies in sequential data.

• I.E. sequential reasoning • Lightweight & easily trainable

Page 4: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Motivation over MemNN

• End-To-End Trainable • Far less supervision • More generalizable

Page 5: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Overview of Model

• Variables: – Discrete set of inputs (𝑥𝑖) – A query (q) – Produce an answer (a)

• Static Memory Bank • Multiple Hops

Page 6: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

𝑝𝑖 = Softmax 𝑢𝑇𝑚𝑖 𝑜 = 𝑝𝑖𝑐𝑖𝑖

𝑎 = Softmax(𝑊 𝑜 + 𝑢 )

Page 7: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM
Page 8: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Weight Tying

• Adjacent: – 𝐴𝑘+1= 𝐶𝑘 – 𝑊𝑇 = 𝐶𝐾 – B = 𝐴1

• Layer-wise (RNN-like): – 𝐴1 = … = 𝐴𝑘, 𝐶1= … = 𝐶𝑘 – 𝑢𝑘+1= 𝐻𝑢𝑘 + 𝑜𝑘

Page 9: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Sentence Representation

• Bag-of-words – 𝑚𝑖 = 𝐴𝑥𝑖𝑗𝑗

• Position Encoding (PE) – 𝑚𝑖 = 𝑙𝑗 ∙ 𝐴𝑥𝑖𝑗𝑗

• Temporal Encoding (TE) – 𝑚𝑖 = 𝐴𝑥𝑖𝑗𝑗 + 𝑇𝐴(𝑖)

Page 10: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Synthetic QA Experiments

Page 11: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Similarity to Attention

NOTE: This model does not use the “support” label during training

Page 12: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Results

Page 13: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Language Modeling

Page 14: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM

Conclusion

• Outperforms all baselines with the same level of supervision (LSTMs etc.)

• Slightly worse than a strongly supervised Memory Network, but it was trained without supporting facts, so it can be easily trained in more general settings.

• On language modeling, outperforms RNNs and LSTMs