supplementary material reasoning about fine...

Supplementary Material

Reasoning about Fine-grained Attribute Phrases using Reference Games

1. Annotation interface for user study

We gathered responses of human annotators for the task ofthe listener in the RG on Amazon mechanical turk usingthe interface shown in Fig. 1. Annotators are asked to se-lect if the description refers to the “Left image”, “Right im-age”, or “I’m not sure”. Each worker is paid $0.10 to anno-tate a single group consisting of 10 descriptions generatedby speakers. Three workers are independently recruited foreach task.

Instructions:

Please check if the description refers to the airplane in the left image or right image.If you are not sure which of two then click "I'm not sure".If there are more than one airplanes in an image consider the most prominent one.

This property "commercial" refers to:Left image Right image I'm not sure

page 1 of 10

Previous Next

Figure 1. MTurk interface for user annotations

2. Additional dataset details

Some more details of the dataset are provided. Most ofthe attribute phrases have two words, and the longest is 12words long. The histogram of the phrase lengths in thetraining set is shown in Figure 2. Additional examples ofannotations are shown in Figure 3. Table 1 shows the top 20most frequent attribute phrases, and attribute phrase pairs.

3. Additional results

Visualizing attribute phrases. Here we show more visu-alizations of the space of the attribute phrases. Figure 4 isthe detailed version of Figure 6 in the paper. The embeddedspace of the contrastive phrases “P1 vs. P2” obtained us-ing our discerning listener DL model in Figure 5. Figure 6shows the embedding of images obtained by the SL. Phrases

Figure 2. Histogram of the length of the attribute phrases in thetraining set of our dataset.

with similar semantic meanings, or images with same at-tributes, are clustered together.

Image retrieval with descriptive attributes. Figure 7shows additional image retrieval results using SL (extensionof Figure 7 in the paper.)

Comparing speakers Figure 8 for more examples thatcompare the simple and the pragmatic speakers. The lastimage pair is a challenging example where two images arevery similar and the target image is misleading (the propel-lor looks like being on the wings but is in fact on the nose).SS fails on this case with most of generated phrases to betrue to both images. DS successfully describes the majordifference of wings and number of seats, and SLr improvesthe ordering.

Attribute-based explanations for differences between

two categories. Figure 9 shows additional examples ofattributes generated as differences between two categories(more examples of Figure 8 in the paper.) The first and sec-ond example show that different phrases are generated forone category when it is compared to different categories.When compared with “A380”, “Falcon 900” is consideredsmall (DS generates “less windows”); When compared with“DR-400”, “Falcon 900” is considered large (DS generates“large plane”). It reveals that DS has learnt the relative na-ture of phrases.

1

IndustrialGrounded

GreyPropellers

Few Windows

CommercialFlying

Green and whiteEngines

Many windows

vs.vs.vs.vs.vs.

Blue colorSmall in size

Facing forwardPropeller engine

Without logo

White colorBig in size

Facing left sideTurbofan engine

With logo

vs.vs.vs.vs.vs.

YellowOn the ground

Facing leftEnclosed cockpitOne set of wings

vs.vs.vs.vs.vs.

RedFlying

Facing rightOpen cockpit

Two sets of wings

gray colortwo propellers

flying in airlarge plane

lots of windows on side

large passenger jetlinermany windows on body

rounded nosefour jet engines

moveable landing gear

vs.vs.vs.vs.vs.

vs.vs.vs.vs.vs.

open cockpiton the ground

privately owned planeno weaponsblue color

vs.vs.vs.vs.vs.

yellow and purple colorone propellersitting in grass

small planeone window for pilot

small privately owned planefew windows on body

pointed noseone propeller

fixed landing gear

closed cockpitin flight

military planevisible weapons

grey color

Figure 3. More example annotations from our dataset.

Phrases Freq. Phrase pairs Freq.1 facing left 1258 facing right VS facing left 6032 facing right 1214 facing left VS facing right 5403 on the ground 785 on the ground VS in the air 1984 private plane 647 in the air VS on the ground 1655 small plane 550 commercial plane VS private plane 1586 commercial plane 516 private plane VS commercial plane 1557 in the air 458 large plane VS small plane 1108 white color 402 on the ground VS flying in the air 1049 white 376 propellor engine VS turbofan engine 98

10 turbofan engine 328 small plane VS big plane 9211 propellor engine 310 big plane VS small plane 9112 propeller engine 291 flying in the air VS on the ground 9013 single engine 289 small VS large 8714 on ground 288 in air VS on ground 8515 flying in the air 281 outside VS inside 8516 military plane 252 turbofan engine VS propellor engine 8417 small 240 large VS small 8318 large plane 238 small plane VS large plane 8119 jet engine 233 on ground VS in air 8120 big plane 233 inside VS outside 68

Table 1. Top 20 attribute phrases and contrastive attribute phrases from the training set in our dataset.

The last example is a challenging one with two very similarcategories. The model fails in a pattern of describing undis-tinguishable attributes (engine, stabilizer) and attributes ir-relative with categories (color, on ground or not). It alsoemphasizes that “757-200” is smaller than “A310” , but infact they have similar size.

Figure 4. t-SNE embedding of attribute phrases from our simple listener (SL) model.

Figure 5. t-SNE embedding of contrastive attribute phrases, e.g. “P1 vs. P2”, from our discerning listener (DL) model.

3/22/2017 TensorBoard

http://localhost:6006/ 1/1

TensorBoard SCALARS IMAGES AUDIO GRAPHS DISTRIBUTIONS HISTOGRAMS EMBEDDINGS

DATA

Checkpoint: ./embedding.ckpt

Metadata: metadata_image.tsv

T-SNE PCA CUSTOM

Dimension 2D

Perplexity 30

Learning

rate10

Re‑run Stop

Iteration: 2016

How to use t-SNE effectively.

Show AllData

Isolate 198points

Clearselection

1 tensor found

image_feat

Label by

Name

Color by

No color map

Sphereize data

3D

Points: 200 Dimension: 1024

Search

0 .*

by

Name

BOOKMARKS (1)

State 0

Figure 6. t-SNE embedding of 200 randomly selected images using our simple listener (SL) model. Images have same attributes areclustered together. For example, highlighted five boxes in the figure have the attribute “private plane on grass”, “private plane in the air”,“passenger plane in the air”, passenger plane on runway”, and “in hangar”.

propeller plane; in hangar wing on top stabilizer on top of the tail

Figure 7. Top 18 images ranked by the listener for various attribute phrases as queries (shown on top). We rank the images by the scoresfrom the simple listener SL on the concatenation of the attribute phrases. The images are ordered from top to bottom, left to right.

SS + SLr:✔ fighter plane✔ small plane✔ military plane✔ no windows on body✔ on the ground✔ on ground? facing right✔ grey✔ jet engine? outside

DS + SLr:✔ single engine✔ two seater plane? propellor engine✔ small✔ small plane✔ small plane✔ military plane✔ few windows? facing right✔ grey color

SS + SLr:✔ jet fighter✔ fighter jet✔ jet engine✔ military plane✔ military? turbofan engine✔ facing left✘ on the ground✘ single engine✘ facing right

DS + SLr:✔ jet engine✔ on runway✔ black in color✔ military plane✔ turbofan engine✔ army plane✔ facing left ✔ gray in color ✔ gray color✘ facing right

SS + SLr:? military plane? facing left? open cockpit? military? propellor engine? in flight✘ single person aircraft? _UNK between front wheels✔ no cone on nose? single propellor engine

DS + SLr:? twin propellor✔ 2 seater✔ 2 seater? facing the left✔ two seater✔ single wing✘ three wheels? no fan in nose✔ one set of wings in front? green and yellow

SS:? facing right✔ military plane✔ on ground✔ no windows on body✔ on the ground✔ small plane? outside✔ grey✔ jet engine✔ fighter plane

DS:? facing right✔ small plane✔ small ✔ grey color✔ military plane✔ small plane? propellor engine✔ single engine✔ few windows✔ two seater plane

SS:✔ facing left✔ jet engine✔ jet fighter✔ military✘ on the ground✔ fighter jet✔ military plane? turbofan engine✘ single engine✘ facing right

DS:✔ army plane✘ facing right✔ military plane✔ jet engine✔ gray color✔ facing left✔ on runway✔ turbofan engine✔ gray in color✔ black in color

SS:? military plane? propellor engine? facing left? in flight? open cockpit? _UNK between front wheels? single propellor engine✘ single person aircraft✔ no cone on nose? military

DS:? facing the left✘ three wheels? twin propellor✔ two seater? green and yellow✔ one set of wings in front✔ 2 seater? no fan in nose✔ single wing✔ 2 seater

Figure 8. More pragmatic speaker results. Given the image pairs in the left as input, we use SS and DS to generate phrases, and then useSLr to rerank them. SLr only takes the descriptions targeted at images in green boxes as input. Green checks mean human listener pickscorrect image with majority vote, X marks mean human listener picks opposite image with majority vote, and question marks mean humanlistener is uncertain which image is referred to.

DR-400 Falcon 900

propeller enginesmall plane

single enginepropellor engine

prop planeprivate planefew windowssmaller plane

propellerpropeller plane

commercial planetwin engine

turbofan enginejet engine

medium planetwo engineslarge plane

turbofan engineswhite

jet plane

large planeengines under wings

stabilizer on bottom of tailrounded nosemore windows

largewhite with blue and red

low wingcommercial

more windows on body

medium planeless windowsprivate plane

high wingengines next to the tail

fewer windowsmedium

fewer windows on bodystabilizer on top of tail

pointed nose

A380 Falcon 900

757-200 A310

medium planetwo enginessmall plane

fewer windows on bodyon the groundsmaller plane

red white and bluestabilizer on top of tail

on groundsmall commerical plane

commercial planelarge planebig plane

stabilizer on bottom of tailstabilizer on the bottom of the tail

air francelargewhite

twin enginein the air

Figure 9. More examples of attribute-based explanations for visual differences between two categories. Phrases are generated by DS andsorted by their occurrence frequency.

supplementary material reasoning about fine...

Documents