Post on 04-Jul-2020

Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game

Haichao Zhang, Haonan Yu, and Wei Xu

Baidu Research - Institute of Deep Learning, Sunnyvale, USA

Presenter: Zhexiong Liu

Outline

1. Introduction

2. Related work

3. Conversational game

4. Proposed approach

5. Experimental results

6. Conclusion

Introduction: Task description

Train an intelligent agent that can both communicate with and learn from humans.

Communication

One-shot Learning

Introduction: Motivation from humans

• Humans learn from the consequences of their responses, in the form of verbal and behavioral feedback

• Humans have shown an ability to learn new concepts from a small amount of data

• An agent should likewise be able to actively seek and memorize information, and develop a one-shot learning ability

Related work: Achievement & limitation

Supervised Language Learning

• Achievement: Captures the statistics of the training data

• Limitation: Less flexible for acquiring new knowledge without retraining

Reinforcement Learning for Sequences

• Achievement: Selects actions from a set of candidate sequences

• Limitation: Does not learn to generate a new sequence action

Related work: Achievement & limitation

Communication and Emergence of Language

• Achievement: Uses a guesser-responder setting to achieve a goal

• Limitation: Does not aim to obtain a transferable speaking and one-shot learning ability

One-shot Learning and Active Learning

• Achievement: One-shot learning has been investigated in image classification

• Limitation: Has not targeted language and one-shot learning via conversational interaction

Related work: Challenges of recent approaches

• Not flexible enough to acquire new knowledge without inefficient retraining or catastrophic forgetting

• Many applications require rapid learning from a small dataset

Conversational Game: Game Rules

• Participants: teacher & learner/agent

• Strategies: teach & learn a concept

Conversational Game: Teacher Speaks

The teacher randomly selects an object and interacts with the learner about the object by:

• posing a question

• saying nothing

• making a statement
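One teacher turn can be sketched as follows. This is a minimal illustration under stated assumptions: the object list and the sentence templates ("what is it?", "this is ...") are hypothetical placeholders, since the slides do not give the real objects or wording, and choosing the object and the interaction type uniformly at random is also our assumption.

```python
import random
from dataclasses import dataclass

# Hypothetical objects and sentence templates; the slides do not list the real ones.
OBJECTS = ["cat", "dog", "horse", "sheep"]
INTERACTIONS = ["question", "silence", "statement"]


@dataclass
class TeacherTurn:
    obj: str       # object the teacher selected
    kind: str      # one of INTERACTIONS
    sentence: str  # what the teacher says ("" when silent)


def teacher_speaks(rng: random.Random) -> TeacherTurn:
    """Pick an object and an interaction type uniformly at random (assumed)."""
    obj = rng.choice(OBJECTS)
    kind = rng.choice(INTERACTIONS)
    if kind == "question":
        sentence = "what is it?"
    elif kind == "statement":
        sentence = f"this is {obj}"
    else:  # silence: the teacher says nothing
        sentence = ""
    return TeacherTurn(obj, kind, sentence)


rng = random.Random(0)
print(teacher_speaks(rng))
```

Passing an explicit `random.Random` instance keeps simulated sessions reproducible for a fixed seed.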

Conversational Game: Learner Reward

The learner interacts with the teacher and is rewarded according to its response:

• Raise a question → reward (+1)

• Correct statement → reward (+1)

• Incorrect response → reward (-1)

• Say nothing/silence → reward (-1)
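The reward table above maps directly onto a small function. This sketch encodes only the four cases listed on the slide; how a statement is judged correct is not specified there, so it is passed in as a boolean (an assumption), and the `Response` enum and function name are hypothetical.

```python
from enum import Enum


class Response(Enum):
    QUESTION = "question"
    STATEMENT = "statement"
    SILENCE = "silence"


def learner_reward(kind: Response, statement_correct: bool = False) -> int:
    """Reward values as listed on the slide; `statement_correct` is assumed given."""
    if kind is Response.QUESTION:
        return 1
    if kind is Response.SILENCE:
        return -1
    # Remaining case: a statement, rewarded +1 if correct and -1 otherwise.
    return 1 if statement_correct else -1


print(learner_reward(Response.STATEMENT, statement_correct=True))
```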

Proposed approach: Imitation & Reinforce

Imitation Function:

• The agent perceives sentences and images, and saves the extracted information for later use

Reinforce Function:

• The agent leverages feedback from the teacher to converse adaptively by adjusting its action policy

Proposed approach: Network Structure

Proposed approach: Imitation Learning

Imitation is achieved by predicting the probability of the teacher's next sentence given the image and the conversation history.

This probability equals the product of the predicted probabilities of the individual words in the sentence,

where each word is conditioned on the last state of the RNN, which summarizes the conversation history.
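Written out, the factorization described above might look as follows. The notation is ours, not the slide's (the original symbols were lost): $w^t = (w^t_1, \dots, w^t_n)$ is the teacher's sentence at turn $t$, $v^t$ is the image, and $h^{t-1}$ is the last RNN state, summarizing the conversation history up to turn $t-1$.

```latex
p_\theta\left(w^t \mid w^{1:t-1}, v^{1:t}\right)
  = \prod_{i=1}^{n} p_\theta\left(w^t_i \mid w^t_{1:i-1},\, h^{t-1},\, v^t\right),
\qquad
\log p_\theta\left(w^t \mid w^{1:t-1}, v^{1:t}\right)
  = \sum_{i=1}^{n} \log p_\theta\left(w^t_i \mid w^t_{1:i-1},\, h^{t-1},\, v^t\right).
```

Taking logs turns the product into a sum, so maximizing this likelihood is the same as minimizing the summed per-word negative log-likelihood, a standard way to train such an imitation objective.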

Proposed approach: Imitation Learning

The next word is generated adaptively from:

• the predictive distribution of the next word

• the information in the external memory

where the two terms represent the probability of the next word predicted by the RNN and the probability of a word retrieved from the external memory, respectively.
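The slide does not say how the two sources are combined. A common choice, assumed here, is a convex mixture controlled by a scalar fusion gate `g` in [0, 1]; the gate name and the mixing rule are ours. The sketch mixes two next-word distributions over the same toy vocabulary.

```python
from typing import List


def fuse_next_word(p_rnn: List[float], p_mem: List[float], g: float) -> List[float]:
    """Convex mixture of two next-word distributions over the same vocabulary.

    p_rnn: distribution predicted by the RNN
    p_mem: distribution over words retrieved from the external memory
    g: hypothetical fusion gate in [0, 1]; a larger g trusts the memory more
    """
    if len(p_rnn) != len(p_mem):
        raise ValueError("distributions must cover the same vocabulary")
    if not 0.0 <= g <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    # A convex combination of two distributions is again a distribution.
    return [(1.0 - g) * a + g * b for a, b in zip(p_rnn, p_mem)]


vocab = ["this", "is", "cat", "dog"]
p = fuse_next_word([0.5, 0.4, 0.05, 0.05], [0.0, 0.0, 0.9, 0.1], g=0.8)
best = vocab[max(range(len(p)), key=p.__getitem__)]
print(best)
```

With a large gate the word held in memory ("cat") wins even though the RNN alone gives it low probability, which is the kind of behavior one-shot recall of a newly taught name would need.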

Proposed approach: Reading Memory

A visual encoder implemented as a CNN encodes the visual image into a visual key.

The visual key is used to read from the external memory, which consists of separate memories for the visual and sentence modalities.
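A cross-modal read of this kind can be sketched as soft attention: compare the visual key with every slot of the visual memory, normalize the scores, and use the weights to average the paired slots of the sentence memory. The dot-product similarity, the softmax, and the slot-by-slot pairing of the two memories are assumptions for illustration; a plain list stands in for the CNN output.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def read_memory(k_v, M_v, M_s):
    """Cross-modal read (assumed form).

    k_v: visual key, a list of d_v floats (stands in for the CNN output)
    M_v: visual memory, a list of slots, each a list of d_v floats
    M_s: sentence memory, same number of slots, each a list of d_s floats
    Slot j of M_v is assumed to pair with slot j of M_s.
    Returns (attention weights over slots, retrieved sentence content).
    """
    alpha = softmax([dot(k_v, slot) for slot in M_v])
    d_s = len(M_s[0])
    r = [sum(a * slot[k] for a, slot in zip(alpha, M_s)) for k in range(d_s)]
    return alpha, r


# Two stored concepts; the query key matches the first visual slot.
M_v = [[1.0, 0.0], [0.0, 1.0]]
M_s = [[10.0, 0.0], [0.0, 10.0]]
alpha, r = read_memory([5.0, 0.0], M_v, M_s)
print(alpha, r)
```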

Proposed approach: Writing memory

Writing to memory is similar to reading, but uses a content importance gate that controls whether the content should be written into memory.

The write operation takes the memory, the content, and the gate as its inputs.

Content is computed separately for the visual and sentence modalities; the sentence content is computed from the word embeddings and an attention vector produced by a BiLSTM.
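A minimal sketch of a gated write, under assumptions that are ours rather than the slides': the sentence content is an attention-weighted sum of word embeddings (the attention weights would come from a BiLSTM, here they are given), and the memory is a hypothetical fixed-size ring buffer in which the gate blends the new content into the next slot.

```python
def sentence_content(word_embs, eta):
    """Attention-weighted sum of word embeddings (assumed form).

    word_embs: list of n word embeddings, each a list of d floats
    eta: n attention weights (e.g. from a BiLSTM), assumed to sum to 1
    """
    d = len(word_embs[0])
    return [sum(w * emb[k] for w, emb in zip(eta, word_embs)) for k in range(d)]


class GatedSlotMemory:
    """Hypothetical fixed-size memory written slot by slot, in ring-buffer order."""

    def __init__(self, num_slots, dim):
        self.slots = [[0.0] * dim for _ in range(num_slots)]
        self.next_slot = 0

    def write(self, content, g):
        """Blend `content` into the next slot with importance gate g in [0, 1].

        g = 0 leaves the slot unchanged; g = 1 overwrites it.
        The pointer always advances, so memories of different modalities
        written in the same step keep the same slot index.
        """
        if not 0.0 <= g <= 1.0:
            raise ValueError("gate must lie in [0, 1]")
        old = self.slots[self.next_slot]
        self.slots[self.next_slot] = [(1.0 - g) * o + g * c for o, c in zip(old, content)]
        self.next_slot = (self.next_slot + 1) % len(self.slots)


mem = GatedSlotMemory(num_slots=2, dim=2)
c_s = sentence_content([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
mem.write(c_s, g=1.0)         # important content: stored in slot 0
mem.write([9.0, 9.0], g=0.0)  # unimportant content: slot 1 stays zero
print(mem.slots)
```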

Proposed approach: Reinforcement Learning

The agent's response is generated by sampling from a distribution over all possible sequences

The policy shares its parameters with the imitation network and uses a modulator so that it can learn in an adaptive manner

The policy is adjusted by maximizing the expected future reward
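The slide states the objective (maximize the expected future reward) but not the estimator. Purely as an illustration, the sketch below uses plain REINFORCE with an optional constant baseline, which may differ from what the authors actually use; the function names, the discount `gamma`, and the baseline are our assumptions. `log_probs[t]` is the log-probability the policy assigned to the sentence it produced at turn `t`, i.e. the sum of its word log-probabilities.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99, baseline=0.0):
    """Surrogate loss whose gradient is the REINFORCE policy-gradient estimate.

    Minimizing it raises the log-probability of responses followed by
    above-baseline returns and lowers it for the rest.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * (G - baseline) for lp, G in zip(log_probs, returns))


# One hypothetical three-turn episode with rewards +1, -1, +1.
loss = reinforce_loss([-0.5, -1.0, -2.0], [1, -1, 1], gamma=0.5)
print(loss)
```

With plain floats this only evaluates the loss; in an automatic-differentiation framework the same expression, built from the policy's log-probabilities, would be minimized with respect to the policy parameters.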

Experiments: Dataset

• Text dataset

Experiments: Dataset

• The animal dataset contains 40 classes and 408 images in total, with about 10 images per class.

• The fruit dataset contains 16 classes and 48 images in total, with 3 images per class.

• Models are trained on the animal dataset and tested on the fruit dataset.

Experiments: Setup

The training algorithm is implemented on the deep learning platform PaddlePaddle.

• batch size is 16

• learning rate is 1 × 10⁻⁵

• weight decay rate is 1.6 × 10⁻³

• word embedding dimension d is 1024

• visual image size is 32×32
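The hyperparameters listed above can be collected in a single configuration object. This is a framework-neutral sketch: the field names are ours, and it is not PaddlePaddle's configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the setup slide (field names are hypothetical)."""
    batch_size: int = 16
    learning_rate: float = 1e-5
    weight_decay: float = 1.6e-3
    word_embedding_dim: int = 1024  # d on the slide
    image_height: int = 32
    image_width: int = 32


cfg = TrainConfig()
print(cfg)
```

Making the dataclass frozen prevents a run from silently mutating its settings after they have been logged.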

Experiments: Baseline models

• Reinforce: the same network structure as the proposed model, trained using RL only

• Imitation: the same network structure as the proposed model, trained using imitation only

• Imitation + Gaussian + RL: a joint imitation and reinforcement method using a Gaussian policy

Experiments: Learning on Words

• The proposed approach reaches the highest success rate (97.4%) and average reward (+1.1)

Experiments: Learning with Image Variations

• Objective: test the impact of within-class image variations on one-shot learning

Model trained without (a, c) and with (b, d) image variations on the Animal dataset.

Experiments: Learning on Sentences

• The learner must extract useful information that can appear at different positions in the sentence

• The learner has to adaptively fuse information from the RNN and the external memory to generate a complete sentence

Experiments: Proposed Approach

Conclusion

• Contribution: The authors presented an effective approach for grounded language acquisition with one-shot visual concept learning through joint imitation and reinforcement learning.

• Limitation: While it offers flexibility in training, a synthetic task has a limited amount of variation compared to real-world scenarios with natural language.