combining image and language to predict and understand the...

Back Ground Results • Cast the prediction problem to Regression and Classification • Make usefulness prediction by combing two neural networks • CNN on an image associated with the review • Attention based Convolutional RNN on the text of the review • Analyze Attention weights to gain insight into what make a review useful Using 10.2K Reviews - Class 0: Reviews have 0 useful votes - Class 1: Reviews have >9 useful votes - A subset of images from the 200,000 business pictures provided by yelp # of ConvRNN Layer V.S. Classiﬁcation Accuracy 71% 73.25% 75.5% 77.75% 80% 0 1 2 3 4 5 # of ConvRNN Layer V.S. Regression Accuracy (RMS) 2.79 2.805 2.82 2.835 2.85 0 1 2 3 4 Classiﬁcation Results (All Model ) Accuracy 50% 57.5% 65% 72.5% 80% Image+ConvRNN ConvRNN RNN+Attention Bi-RNN SVM Regression Results (All Model ) RMSE 2.6 2.75 2.9 3.05 3.2 Image+ConvRNN ConVRNN RNN+Attention Bi-RNN Linear Regression Combining Image and Language to Predict and Understand the Usefulness of Yelp Reviews David Z Liu ([email protected]) Challenge • Predicting a reviews usefulness is important and challenging • Knowing the usefulness of a review in advance, businesses can recommend high quality and fresh reviews to customers and gain business insight • For reviews with the same length, No obvious features that directly indicate the usefulness of a document • Only a small amount of data have significant number of useful votes Approach Data Set Link Image With Text - Link a business photo to a review written about that business by match words in the image caption with words in the review text - Pick the image with the highest Jaccard similarity Model • Input to the model: • Written review to the RNN. Text are converted to 300 dimensional GloVe • Associated image to the CNN, converted to 64X64X3 • Output from the model: • Classification or regression prediction 64 x 64 x 3 32 x 32 x 8 16 x 16 x 16 8 x 8 x 64 4 x 4 x 64 I Love food and Images 300x300x1 200x200x16 100x100x16 100 (0 = bidirectional RNN without attention) Visualizing Attention Weight A typical review with the top 25 words greened base on attention intensity. Image + ConvRNN Bidirectional RNN Without Attention Bidirectional RNN With Attention Prediction ConvRNN Size corresponds to the attention weight

Upload: others

Post on 12-Jun-2020

6 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Combining Image and Language to Predict and Understand the ...cs231n.stanford.edu/reports/2017/posters/816.pdf · Combining Image and Language to Predict and Understand the Usefulness

Back Ground

Results

• Cast the prediction problem to Regression and Classification

• Make usefulness prediction by combing two neural networks

• CNN on an image associated with the review

• Attention based Convolutional RNN on the text of the review

• Analyze Attention weights to gain insight into what make a review useful

Using 10.2K Reviews - Class 0: Reviews have 0

useful votes- Class 1: Reviews have

>9 useful votes- A subset of images from

the 200,000 business pictures provided by yelp

# of ConvRNN Layer V.S. Classification Accuracy

71%

73.25%

75.5%

77.75%

80%

0 1 2 3 4 5# of ConvRNN Layer V.S.

Regression Accuracy (RMS)

2.79

2.805

2.82

2.835

2.85

0 1 2 3 4

Classification Results (All Model )

Accu

racy

50%

57.5%

65%

72.5%

80%

Image+ConvRNN

ConvRNN

RNN+Attention

Bi-RNN

SVMRegression Results

(All Model )

RMSE

2.6

2.75

2.9

3.05

3.2

Image+ConvRNN

ConVRNN

RNN+Attention

Bi-RNN

Linear Regression

Combining Image and Language to Predict and Understand the Usefulness of Yelp ReviewsDavid Z Liu ([email protected])

Challenge

• Predicting a reviews usefulness is important and challenging

• Knowing the usefulness of a review in advance, businesses can recommend high quality and fresh reviews to customers and gain business insight

• For reviews with the same length, No obvious features that directly indicate the usefulness of a document

• Only a small amount of data have significant number of useful votes

Approach

Data Set

Link Image With Text - Link a business photo to a review

written about that business by match words in the image caption with words in the review text

- Pick the image with the highest Jaccard similarity

Model

• Input to the model:• Written review to the RNN.

Text are converted to 300 dimensional GloVe

• Associated image to the CNN, converted to 64X64X3

• Output from the model:• Classification or regression

prediction

64x64x3

32x

32x8

16x

8x8x

4x4x

I Love food and Images

300x300x1

200x200x16

100x100x16

100

(0 = bidirectional RNN without attention)

Visualizing Attention Weight A typical review with the top 25 words greened base on attention intensity.

Image + ConvRNNBidirectional RNN Without Attention

Bidirectional RNN With Attention

Prediction

ConvRNN

Size corresponds to the attention weight

mailto:[email protected]

1 PSION+: Combining logical topology and physical layout ... · During generation of the logical topology we are unable to accurately predict important physical characteristics, e.g

Shedding Light on Philippine Poverty · National Economic Development Authority (NEDA) Jean, Neal, et al. "Combining satellite imagery and machine learning to predict poverty." Science

Predictive Models Lab RNR/Geog 420/520 Spring. Predictive Models zImportant to understand what we are attempting to predict yThese models predict location

Combining physical and statistical models to predict ... · Environmental Space-Time Fields, X X massively multivariate:over time × space × species, often discretized: time may

Comparing and combining deterministic and ensemble forecasts: How to predict rainfall ... · 2018. 2. 13. · precipitation everywhere and not just in mountainous regions, observed

INTRODUCTION. AIM TO UNDERSTAND HOW CHINA DEVELOPED AS A STRONG COUNTRY AND PREDICT THE FUTURE PERFORMANCE

Using Topographic and Soils Data to Understand and Predict Field

Combining Knowledge and Data Mining to Understand Sentiment

Jo elian dimitrius - How to understand people and predict their behavior

Combining measurements and models to understand emissions ... · to understand emissions, fate, exposure and pharmacokinetics of POPs Matthew MacLeod ... 11. Challenge the robustness

Combining mesocosm and ﬁeld experiments to predict ...trevorcaughlin.com/wp-content/uploads/2017/02/...been lacking. Bayesian hierarchical models (BHMs) can facilitate the integration

Day 11-20 Ms. Burns. KBAD understand why we predict

Combining Homologous Recombination and Phosphopeptide … · 2018-10-31 · DNA Damage and Repair Combining Homologous Recombination and Phosphopeptide-binding Data to Predict the

Combining satellite imageryand machine learning to predict … · 2018. 2. 13. · RESEARCH ARTICLES ECONOMICS Combining satellite imageryand machine learning to predict poverty Neal

Goal 1: Study, understand, model & predict the impacts of land use & climate variability

Combining historic growth and climate data to predict growth response to climate change in balsam fir in the Acadian Forest region Elizabeth McGarrigle

Combining Chemoinformatics with Bioinformatics: In ... · combines chemoinformatics and bioinformatics analyses, to predict the ‘‘missing links’’ between compounds of interest

Combining Remote Sensing and Biological Data to Predict the Consequences of Climate Change on Hummingbird Diversity Catherine Graham, Susan Wethington,

COMBINING LOCAL KNOWLEDGE AND GLOBAL EXPERIENCE - … · understand the scope and specific requirements of every brief, combining business planning with engineering to ensure the

Problem: Understand/predict the dynamics of aquatic microbial populations. Aquatic Application: Goals and Approaches Approaches: Wireless, networked sensing

Combining structure and genomics to understand

SE313: Computer Graphics and Visual Programming - …€¦ · Web viewMake sure you understand both directions of thinking about combining transformations. Make sure you understand

Combining Machine Learning and Homology-Based ...Bioinformatics Combining Machine Learning and Homology-Based Approaches to Accurately Predict Subcellular Localization in Arabidopsis1[C][W][OA]

Science applications (monitor, detect, understand and predict) : CO 2 uptake by the ocean

Neural Networks to Predict Stock Market Price - iaeng. · PDF filetechnical analysis of stock market, ... economic and financial theory, combining technical ... Kalman filter for forecasting

A METHOD TO PREDICT AND UNDERSTAND FISH SURVIVAL … · a method to predict and understand fish survival under dynamic chemical stress using standard ecotoxicity data roman ashauer,*yzpernille

Spatial proteomics Combining experimental and annotation ...€¦ · Spatial proteomics Combining experimental and annotation data to predict protein sub-cellular localisation. Laurent

Assessing Long-Term Wind Conditions by Combining … Measure-Correlate-Predict Algorithms Preprint ... stations, the concurrent wind speed and direction measurement was allocated to

Combining resource selection and movement behavior to predict

The Lean Business Model Gameboard: Deeply Understand and Predict Customer Behavior

Novel approaches to understand & predict non-obvious

ESI for: Combining EXAFS spectroscopy and Molecular ... · ESI for: Combining EXAFS spectroscopy and Molecular Dynamics simulations to understand the structural and dynamic properties

Wireless Communication Understand the trend of wireless communication Understand WAP protocol, PCS, GSM and CDMA Understand the power of combining mobile

Goals of Psychology Describe Control Understand Predict D-CUP

Using waterswap to predict and understand binding affinities