Visual Dialog - web.stanford.edu/class/archive/cs/cs224n/cs224n...

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M.F. Moura, Devi Parikh, Dhruv Batra

Presented by: Alan Luo


Post on 17-Aug-2020


TRANSCRIPT

Page 1:

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M.F. Moura, Devi Parikh, Dhruv Batra

Presented by: Alan Luo


Page 2:

Introduction

Natural Language Processing + Computer Vision

● Aiding visually impaired users in understanding their surroundings or social media content

● Interacting with an AI assistant


Page 3:

Related Work

Image/Video Captioning

Image Captioning

Video Captioning


Page 4:

Related Work

Visual-Semantic Alignments

Datasets


Page 5:

Related Work

Visual Q&A

Page 6:

Contributions

1. Propose a new AI task: Visual Dialog

2. Develop a novel two-person chat data-collection protocol and introduce a new dataset

3. Introduce a family of neural encoder-decoder models for Visual Dialog


Page 7:

Technical Details

With Late Fusion Encoder
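
The slide names the late fusion encoder without describing it. As described in the Visual Dialog paper, the question and the dialog history are encoded separately, their representations are concatenated with the image representation, and the result is linearly transformed into a joint representation. A minimal sketch of just that fusion step follows. The function names, the toy dimensions, the random weights, and the tanh nonlinearity are assumptions made for illustration and are not taken from the slides; in the paper the question and history vectors come from LSTMs and the image vector from a CNN.

```python
import math
import random


def linear(x, weights, bias):
    """Return weights @ x + bias using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def late_fusion(image_vec, question_vec, history_vec, weights, bias):
    """Concatenate separately encoded inputs, then project to a joint vector.

    The tanh squashing is an assumption of this sketch.
    """
    fused = image_vec + question_vec + history_vec  # list concatenation
    return [math.tanh(v) for v in linear(fused, weights, bias)]


# Toy stand-ins for the three encoder outputs (hypothetical sizes).
rng = random.Random(0)
image_vec = [rng.uniform(-1, 1) for _ in range(4)]
question_vec = [rng.uniform(-1, 1) for _ in range(3)]
history_vec = [rng.uniform(-1, 1) for _ in range(3)]

joint_dim = 5
in_dim = len(image_vec) + len(question_vec) + len(history_vec)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
           for _ in range(joint_dim)]
bias = [0.0] * joint_dim

joint = late_fusion(image_vec, question_vec, history_vec, weights, bias)
print(len(joint))
```

The fusion is "late" because each input is encoded on its own and the inputs only meet at the concatenation, so any of the three encoders can be swapped out without touching the others.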


Page 8:

Dataset

VisDial

Qualitative

Quantitative

Page 9:

Results

Qualitative Results

Quantitative Results