
Post on 06-May-2018


Multimodal Deep Learning

Zeynep Akata

Zero-Shot Learning

Latent Embeddings for Zero-Shot Image Classification

Xian et al., CVPR’16 & CVPR’17

[Figure: a single matrix W with a large error (left); two matrices W₁ and W₂ (right).]

Linear compatibility function: large errors (left).

Piecewise-linear: significantly improves results (right).
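The contrast between the two compatibility functions can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the linear function is a bilinear score x^T W y between an image embedding x and a class embedding y, and that the piecewise-linear variant takes the maximum of that score over several matrices W₁, W₂, and so on. The function names and the toy numbers are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear(x: Vector, W: Matrix, y: Vector) -> float:
    """Linear compatibility x^T W y between image embedding x and class embedding y."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def latent_compatibility(x: Vector, Ws: Sequence[Matrix], y: Vector) -> float:
    """Piecewise-linear compatibility (assumed form): best score over all matrices."""
    return max(bilinear(x, W, y) for W in Ws)


def predict(x: Vector, Ws: Sequence[Matrix], class_embeddings: Sequence[Vector]) -> int:
    """Index of the class whose embedding is most compatible with x."""
    scores = [latent_compatibility(x, Ws, y) for y in class_embeddings]
    return scores.index(max(scores))


# Toy data, made up for illustration: 2-d embeddings and two matrices.
W1 = [[1.0, 0.0], [0.0, 0.0]]
W2 = [[0.0, 0.0], [0.0, 1.0]]
classes = [[1.0, 0.0], [0.0, 1.0]]
x = [0.2, 0.9]

print(predict(x, [W1], classes))      # single matrix only
print(predict(x, [W1, W2], classes))  # maximum over both matrices
```

With a single matrix the score is linear in x; taking the maximum over several matrices makes it piecewise linear, so different regions of the image space can use different mappings.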

Multi-Cue Zero-Shot Learning with Strong Supervision

Akata et al., CVPR’16

[Figure: Image Embedding and Class Embedding linked by F; example classes Blue Jay and Albatross; example attributes: black, long tail, blue, cone beak.]

Attributes: costly but good, W2V: cheap but weak.

Strong visual supervision: to compensate weak W2V.

Learning Deep Representations of Fine-Grained Visual Descriptions

Reed et al., CVPR’16

[Plot: Zero-Shot in CUB. x-axis: # of train sentences per image (2 to 10). y-axis: Top-1 Acc. (in %), 30 to 60. Curves: Ours (word), Ours (char), LSTM, TCNN, Attributes.]

CNN-RNN: fast + models sequence of words or characters

With >4 sentences: outperforms SoA with attributes

Gaze Embeddings for Zero-Shot Image Classification

Karessli et al., CVPR’17

[Figure: gaze pipeline. Panels: Original image, Gaze points, Gaze heatmap, Gaze data collection, Outlier removal, Raw gaze data. Three encodings, each producing an embedding per class: Gaze histogram (GH), Gaze Features with Grid (GFG), Gaze Features without Grid (GFS). Example gaze counts shown on the grid: 0 22 0, 0 13 0, 0 2 0.]
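As a rough illustration of the gaze histogram (GH) named in the figure, the sketch below counts gaze points per cell of a grid laid over the image. The 3 × 3 grid, the row-by-row flattening, the handling of out-of-image points, and the function name are assumptions made for this example; the slide does not specify them.

```python
from typing import List, Sequence, Tuple


def gaze_histogram(points: Sequence[Tuple[float, float]],
                   width: float, height: float,
                   rows: int = 3, cols: int = 3) -> List[int]:
    """Count gaze points per grid cell, flattened row by row."""
    hist = [0] * (rows * cols)
    for x, y in points:
        if not (0 <= x < width and 0 <= y < height):
            continue  # assumption: points outside the image are ignored
        # Map the coordinate to a cell index; clamp to guard against rounding.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        hist[row * cols + col] += 1
    return hist


# Hypothetical gaze points on a 90 x 90 image.
points = [(10.0, 10.0), (50.0, 50.0), (55.0, 45.0), (95.0, 95.0)]
print(gaze_histogram(points, width=90.0, height=90.0))
```

A per-class embedding could then be formed by aggregating such vectors over the images of a class, which is again an assumption about the pipeline rather than something the slide states.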

Generating: Vision + Language

Generative Adversarial Text to Image Synthesis

Reed et al., ICML’16

GAN conditioned on sentences: real/fake, matching/not
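One way to read "real/fake, matching/not" is that the discriminator scores three kinds of pairs: a real image with its matching sentence, a real image with a mismatched sentence, and a generated image with the conditioning sentence. The sketch below writes that as a loss to minimize, with the two negative cases averaged. The function name, argument names, and the small epsilon are assumptions for illustration, and the scores are taken to be probabilities in (0, 1).

```python
import math


def matching_aware_d_loss(s_real_match: float,
                          s_real_mismatch: float,
                          s_fake_match: float,
                          eps: float = 1e-12) -> float:
    """Discriminator loss over three (image, sentence) pair types.

    s_real_match:    score for a real image with its own sentence (target 1)
    s_real_mismatch: score for a real image with a wrong sentence (target 0)
    s_fake_match:    score for a generated image with its sentence (target 0)
    """
    positive = math.log(s_real_match + eps)
    negatives = 0.5 * (math.log(1.0 - s_real_mismatch + eps)
                       + math.log(1.0 - s_fake_match + eps))
    return -(positive + negatives)


# A discriminator that separates the three cases well gets a lower loss
# than one that outputs 0.5 for everything.
print(matching_aware_d_loss(0.9, 0.1, 0.1))
print(matching_aware_d_loss(0.5, 0.5, 0.5))
```

The mismatched-sentence term is what forces the discriminator to judge whether the image fits the text, not only whether the image looks real.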

a tiny bird, with a tiny beak, tarsus and feet, a blue crown, blue coverts, and black cheek patch

this small bird has a yellow breast, brown crown, and black superciliary

an all black bird with a distinct thick, rounded bill.

this bird is different shades of brown all over with white and black spots on its head and back

the gray bird has a light grey head and grey webbed feet

[Figure: image panels labeled GT, GAN, GAN - CLS, GAN - INT, GAN - INT - CLS.]

Generates pixels from characters: intuitive

Language compensates lack of large # training images

Learning What and Where to Draw

Reed et al., NIPS’16

This bird has a yellow head, black eyes, a gray pointy beak and orange lines on its breast.

This water bird has a long white neck, black body, yellow beak and black head.

This bird is large, completely black, with a long pointy beak and black eyes.

This bird is completely red with a red and cone-shaped beak, black face and a red nape.

This white bird has gray wings, red webbed feet and a long, curved and yellow beak.

This small bird has a blue and gray head, pointy beak and a white belly.

[Figure: six image panels, each labeled GT.]

Generating Visual Explanations

Hendricks et al., ECCV’16

D: this bird has a white breast black wings and a red spot on its head.

E: this is a white bird with a black wing and a black and white striped head.

D: this bird has a white breast black wings and a red spot on its head.

E: this is a black and white bird with a red spot on its crown.

This is a Downy Woodpecker because... This is a Downy Woodpecker because...

Class + image conditional LSTM & Reinforcement Loss

Learns to mention class-specific and visible properties