multimodal deep learning - github pages · 2021. 4. 11. · interpretability in multimodal deep...

34
Multimodal Deep Learning Purvanshi Mehta [email protected]

Upload: others

Post on 29-Aug-2021

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Multimodal Deep Learning

Purvanshi [email protected]

Page 2: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Table of contents

● Introduction● Challenges

○ Representation○ Translation○ Alignment○ Fusion○ Co-Learning

● Interpretability in Multimodal Deep Learning

Page 3: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Aim of the presentation

● Identify challenges particular to Multimodal Learning● Popular research topics in the field● Brief of the problem I have been working on - Interpretability in

Multimodal Deep Learning

Page 4: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Multimodal Learning - Heterogeneous Information Sources

Page 5: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Challenge - 1) Representation

● How to combine the information from multiple sources?● How to deal with different levels of noise?● How to deal with missing data?

Page 6: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

1) Representation - Ways of learning

Page 7: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Coordinated Representation DeViSE — a deep visual-semantic embedding - Similarity model

Page 8: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Challenge - 2) Translation

Page 9: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Translation

(mapping) from one modality to another

Retrieval based models

Generative

Page 10: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

How to evaluate translations

Page 11: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Challenge - 3) Alignment

1) Identify the direct relations between (sub)elements from two or more different modalities.

Page 12: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Alignment

Given an image and a caption we want to find the areas of the image corresponding to the caption’s words or phrases

Page 13: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Alignement problems

● Few datasets with explicitly annotated alignments● It is difficult to design similarity metrics between modalities● Existence of multiple possible alignments and not all elements

in one modality have correspondences in another

Page 14: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Challenge - 4) Fusion

Page 15: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Fusion

● Signals not temporarily aligned● Lack of interpretability of where is the prediction coming from

- (This is what I am working on)

Page 16: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Challenge - 5) Co-Learning

● Aiding the modeling of a (resource poor) modality by exploiting knowledge from another (resource rich) modality.

● When one modality has lack of annotated data, noisy inputs and unreliable labels.

Page 17: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Colearning - Zero Shot learningUsing text embeddings to classify unseen classes of images

Page 18: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Interpretability in Multimodal Deep Learning

Problem statement -

Not every modality has equal contribution to the prediction

Page 19: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Interpretability in Multimodal Deep Learning

Solution -

We give different weights to different modalities

Page 20: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Real data experiments - Multimodal Sentiment Analysis(MOSI Dataset)

Page 21: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

MUltimodal Sentiment Analysis

Page 22: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

P1 and P2 contribution of each modality

Page 23: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Θ

Θ

Page 24: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Modified loss function with 𝞫 weight given to each modality

Page 25: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Modified loss function with 𝞫 weight given to each modality

Let’s make 𝞫 trainable

Page 26: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Modified loss function with 𝞫 weight given to each modality

Using the Lemma

Page 27: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Final Optimization problem

Page 28: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Tensor fusion

To allow interaction between modalities - take tensor product between modalities.

Page 29: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Problem with interpretability and tensorfusion - Degree Inflation

Beta weights came to be higher for higher dimension weights - Always high for tensor fusion.

Page 30: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Degree Inflation - reason

When one modality learns constant features.

The information from the first modality can be

expressed in the tensorfusion.

Relevant source-with information

Irrelevant source-constant

Page 31: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Iterative batch normalisation

We derived a batch norm which does not allow the lower dimension information to be represented in higher dimension(TensorFusion).

This method helps to remove noise from the data and give better accuracies.

Page 32: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Synthetic data experiment

● Relevant data - Sequences of length 100 (composed of 4 letters - ATGC) with a signature which defines the label

● Irrelevant data - Random 100 length sequences

Page 33: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Example - 1 as relevant source

Page 34: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction

Thank You