Visual Dialog - web.stanford.edu/class/archive/cs/cs224n/cs224n...

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M.F. Moura, Devi Parikh, Dhruv Batra

Presented by: Alan Luo

Posted on 17-Aug-2020

Introduction

Natural Language Processing + Computer Vision

● Aiding visually impaired users in understanding their surroundings or social media content

● Interacting with an AI assistant

Related Work

Image/Video Captioning

● Image Captioning

● Video Captioning

Related Work

Visual-Semantic Alignments

Datasets

Related Work

Visual Q&A

Contributions

1. Propose a new AI task: Visual Dialog

2. Develop a novel two-person chat data-collection protocol and introduce a new dataset

3. Introduce a family of neural encoder-decoder models for Visual Dialog

Technical Details

With Late Fusion Encoder
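
As a rough illustration of what a late fusion encoder does, the sketch below assumes the image, the current question, and the dialog history have each already been encoded into fixed-size vectors by separate encoders. It concatenates the three vectors and applies one linear layer to obtain a joint embedding. The names `late_fusion`, `weights`, and `bias`, the toy dimensions, the hand-set weights, and the tanh nonlinearity are all assumptions made for this example, not details taken from the slides.

```python
import math


def late_fusion(image_vec, question_vec, history_vec, weights, bias):
    """Fuse three pre-computed feature vectors into one joint embedding.

    Assumption: each input is already a fixed-size list of floats produced
    by its own encoder; the encoders themselves are out of scope here.
    """
    # "Late" fusion: the modalities only meet here, after separate encoding.
    fused = list(image_vec) + list(question_vec) + list(history_vec)

    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused vector length")
    if len(bias) != len(weights):
        raise ValueError("bias must have one entry per output unit")

    # Linear projection to the joint embedding size, then tanh (an assumed
    # choice of nonlinearity for this sketch).
    return [
        math.tanh(sum(w * x for w, x in zip(row, fused)) + b)
        for row, b in zip(weights, bias)
    ]


# Toy example: 2-d image, 2-d question, 1-d history -> 2-d joint embedding.
image_vec = [1.0, 0.0]
question_vec = [0.0, 1.0]
history_vec = [0.5]
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],   # picks out the first image feature
    [0.0, 0.0, 0.0, 1.0, -2.0],  # question feature minus 2 * history feature
]
bias = [0.0, 0.0]

joint = late_fusion(image_vec, question_vec, history_vec, weights, bias)
```

In a full model the weights would be learned, and the joint embedding would be handed to a decoder that produces or ranks candidate answers.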

Dataset

VisDial

● Qualitative

● Quantitative

Results

Qualitative Results

Quantitative Results
