Research Talk
Anna Goldie
Overview
● Natural Language Processing
  ○ Conversational Modeling (Best Paper Award at ICML Language Generation Workshop, EMNLP 2017)
  ○ Open-source tf-seq2seq framework (4,000+ stars, 1,000+ forks) and exploration of NMT architectures (EMNLP 2017, 100+ citations)
● Deep Dive: ML for Systems
  ○ Device Placement with Deep Reinforcement Learning (ICLR 2018)
Tell me a story about a bear...
Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models, EMNLP 2017
Tell me a story about a bear...
a. "I don't know."
Tell me a story about a bear...
a. "I don't know."
b. "A bear walks into a bar to get a drink, then another bear comes and sits in his room with the bear thought he was a wolf."
Motivation: Generate Informative and Coherent Responses
● Address shortcomings of sequence-to-sequence models
  ○ Short/generic responses that receive high likelihood under MLE in virtually any context
    ■ "I don't know."
  ○ Incoherent and redundant responses when forced to elaborate through explicit length-promoting heuristics
    ■ "I live in the center of the sun in the center of the sun in the center of the sun..."
Method Overview

● Generate segment by segment
  ○ Inject diversity early in the generation process
  ○ Computationally efficient form of target-side attention
● Stochastic beam search
  ○ Rerank segments using negative sampling
Self-Attention for Coherence

● Glimpse Model: a computationally efficient form of self-attention
● The memory capacity of the decoder LSTM is a bottleneck
● So, let the decoder also attend to the previously generated text
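The idea can be sketched as follows. Dot-product scoring and the softmax combination are illustrative assumptions; for efficiency, the paper's Glimpse Model attends over fixed-length segments of the target rather than the full history.

```python
import numpy as np

def target_side_attention(decoder_state, prev_states):
    """Let the decoder attend over its own previously generated hidden
    states, relieving the LSTM memory bottleneck.

    decoder_state: (d,) current decoder hidden state
    prev_states:   (t, d) hidden states of previously generated tokens
    Returns a context vector summarizing the history, plus the weights.
    """
    scores = prev_states @ decoder_state        # (t,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over the history
    context = weights @ prev_states             # (d,) history summary
    # The context vector is combined with the current state before
    # predicting the next token (e.g., concatenation + projection).
    return context, weights
```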
Stochastic Beam Search with Segment Reranking
(Jiwei Li et al., 2015)
Evaluation
Sample Conversation Responses
Overview
● Natural Language Processing
  ○ Conversational Modeling (Best Paper Award at ICML Language Generation Workshop, EMNLP 2017)
  ○ Open-source tf-seq2seq framework (4,000+ stars, 1,000+ forks) and exploration of NMT architectures (EMNLP 2017, 100+ citations)
● Deep Dive: ML for Systems
  ○ Device Placement with Deep Reinforcement Learning (ICLR 2018)
tf-seq2seq: a general-purpose encoder-decoder framework for TensorFlow
Massive Exploration of Neural Machine Translation Architectures, EMNLP 2017
Goals for the Framework

● Generality: Machine Translation, Summarization, Conversational Modeling, Image Captioning, and more!
● Usability: Train a model with a single command. Several types of input data are supported, including standard raw text.
● Reproducibility: Training pipelines and models configured using YAML files
● Extensibility: Code is modular and easy to build upon
○ E.g., adding a new type of attention mechanism or encoder architecture requires only minimal code changes.
● Documentation:
○ All code is documented using standard Python docstrings
○ Guides to help you get started with common tasks.
● Performance:
○ Fast enough to cover almost all production and research use cases
○ Supports distributed training
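Such a YAML-driven configuration looks roughly like the sketch below; the keys and class paths are approximate and should be checked against the tf-seq2seq documentation.

```
# Illustrative tf-seq2seq-style training config (keys are approximate)
model: AttentionSeq2Seq
model_params:
  embedding.dim: 512
  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
  decoder.class: seq2seq.decoders.AttentionDecoder
  optimizer.name: Adam
  optimizer.learning_rate: 0.0001
```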
Reception
● Featured in AMTA Panel on “Deploying Open Source Neural Machine Translation (NMT) in the Enterprise”
● Used in dozens of papers from top industry and academic labs
Takeaways

● LSTM cells consistently outperformed GRU cells.
● Parameterized additive attention outperformed multiplicative attention.
● Large embeddings with 2048 dimensions achieved the best results, but only by a small margin.
● A well-tuned beam search with length penalty is crucial. Beam widths of 5 to 10 together with a length penalty of 1.0 seemed to work well.
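The beam-search takeaway can be made concrete with GNMT-style length normalization (assuming that is the penalty form used; the exact formula should be checked against the paper):

```python
def length_penalized_score(log_prob, length, alpha=1.0):
    """Length-normalized beam score in the GNMT style:

        score = log P(Y|X) / lp(Y),  lp(Y) = ((5 + |Y|) / 6) ** alpha

    With alpha = 0 this reduces to the raw log-probability; alpha = 1.0
    is the length-penalty setting the takeaways report working well.
    """
    lp = ((5.0 + length) / 6.0) ** alpha
    return log_prob / lp
```

Without the penalty, beam search systematically favors shorter hypotheses, since every extra token adds a negative log-probability term.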
Overview
● Natural Language Processing
  ○ Conversational Modeling (Best Paper Award at ICML Language Generation Workshop, EMNLP 2017)
  ○ Open-source tf-seq2seq framework (4,000+ stars, 1,000+ forks) and exploration of NMT architectures (EMNLP 2017, 100+ citations)
● Deep Dive: ML for Systems
  ○ Device Placement with Deep Reinforcement Learning (ICLR 2018)
Systems ML
In the past decade, systems and hardware have transformed ML.
A Hierarchical Model For Device Placement, ICLR 2018
Now, it’s time for ML to transform systems.
Problems in Computer Systems

Design
● Computer architecture exploration
  ○ Architectural specification tuning
  ○ MatMul tiling optimization
● ML engines like TensorFlow
● Chip design
  ○ Verification
  ○ Logic synthesis
  ○ Placement
  ○ Manufacturing

Operation
● Resource allocation
  ○ Model parallelism (e.g., TPU Pods)
  ○ Compiler register allocation
● Resource provisioning
  ○ Network demand forecasting
  ○ Memory forecasting
● Scheduling
  ○ TensorFlow op scheduling
  ○ Compiler instruction scheduling
ML for Systems Brain Moonshot
Device Placement
What is device placement and why is it important?

Trend towards many-device training, bigger models, and larger batch sizes:
● Google Neural Machine Translation ('16): 300 million parameters, trained on 128 GPUs
● BigGAN ('18): 355 million parameters, trained on 512 TPU cores
● Sparsely Gated Mixture of Experts ('17): 130 billion parameters, trained on 128 GPUs
Standard practice for device placement
● Often based on greedy heuristics
● Requires deep understanding of devices: nonlinear FLOPs, bandwidth, and latency behavior
● Requires modeling parallelism and pipelining
● Does not generalize well
ML for device placement
● ML is repeatedly replacing rule-based heuristics
● We show how RL can be applied to device placement
  ○ Effective search across large state and action spaces to find optimal solutions
  ○ Automated learning from the underlying environment, based only on a reward function (e.g., the runtime of a program)
Posing device placement as an RL problem
[Diagram: the RL model takes a neural model and the set of available devices (CPU, GPU) as input; its policy outputs an assignment of the ops in the neural model to devices.]
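The setup can be sketched as a minimal REINFORCE loop. This is a toy: the "runtime" is a stand-in function supplied by the caller, and an independent softmax policy per op replaces the paper's hierarchical Grouper/Placer networks.

```python
import math
import random

def reinforce_placement(num_ops, num_devices, runtime_fn, steps=500, lr=0.2):
    """Toy REINFORCE for device placement.

    One softmax policy per op over devices; the reward is the negative
    runtime of the sampled placement. runtime_fn(placement) -> seconds
    plays the role of the environment (measured execution).
    """
    # logits[i][d]: preference of op i for device d
    logits = [[0.0] * num_devices for _ in range(num_ops)]
    baseline = None  # moving-average baseline to reduce variance
    for _ in range(steps):
        placement, cache = [], []
        for row in logits:
            m = max(row)
            exps = [math.exp(x - m) for x in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            d = random.choices(range(num_devices), weights=probs)[0]
            placement.append(d)
            cache.append((d, probs))
        r = runtime_fn(placement)
        baseline = r if baseline is None else 0.9 * baseline + 0.1 * r
        adv = r - baseline  # advantage: negative means faster than usual
        for row, (d, probs) in zip(logits, cache):
            for j in range(num_devices):
                indicator = 1.0 if j == d else 0.0
                # descend on adv * grad log pi, i.e. minimize runtime
                row[j] -= lr * adv * (indicator - probs[j])
    return [row.index(max(row)) for row in logits]
```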
[Diagram, continued: the predicted placement is then executed and its runtime is evaluated, providing the feedback signal for the RL model.]
An end-to-end hierarchical placement model
Training with REINFORCE

Objective: minimize the expected runtime J(θg, θd) of the predicted placement d, where:
● J(θg, θd): expected runtime
● θg: trainable parameters of the Grouper
● θd: trainable parameters of the Placer
● Rd: runtime for placement d
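The objective itself was rendered as an image in the original deck; a reconstruction consistent with the legend above is:

```latex
% Expected runtime over the stochastic group assignment g and placement d
J(\theta_g, \theta_d)
  = \mathbb{E}_{\,g \sim p(g;\theta_g),\; d \sim p(d \mid g;\theta_d)}
    \left[ R_d \right]
```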
Training with REINFORCE
Probability of predicted group assignment of operations
Training with REINFORCE
Probability of predicted device placement conditioned on grouping results
Gradient update for Grouper
Derivative w.r.t. parameters of Grouper
Gradient update for Placer
Derivative w.r.t. parameters of Placer
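Both updates follow the standard REINFORCE (score-function) estimator. A reconstruction consistent with the slides, with m sampled placements per step and B a baseline (an assumption; a moving average of observed runtimes is a common choice):

```latex
\nabla_{\theta_g} J(\theta_g, \theta_d)
  \approx \frac{1}{m} \sum_{i=1}^{m} (R_{d_i} - B)\,
          \nabla_{\theta_g} \log p(g_i;\theta_g)
\qquad
\nabla_{\theta_d} J(\theta_g, \theta_d)
  \approx \frac{1}{m} \sum_{i=1}^{m} (R_{d_i} - B)\,
          \nabla_{\theta_d} \log p(d_i \mid g_i;\theta_d)
```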
Results (runtime in seconds)
Learned placements on NMT
Profiling placement on NMT
Learned placement on Inception-V3
Profiling placement on Inception-V3
Overview
● Natural Language Processing
  ○ Conversational Modeling (Best Paper Award at ICML Language Generation Workshop, EMNLP 2017)
  ○ Open-source tf-seq2seq framework (4,000+ stars, 1,000+ forks) and exploration of NMT architectures (EMNLP 2017, 100+ citations)
● Deep Dive: ML for Systems
  ○ Device Placement with Deep Reinforcement Learning (ICLR 2018)
Questions?