gabriel kreiman ://cbmm.mit.edu/.../bmm_11aug2020_kreiman.pdf · virtual bmm 2020...

39
Recurrent computations to the rescue Scan to download papers+data+code Gabriel Kreiman http://klab.tch.harvard.edu Virtual BMM 2020 [email protected]

Upload: others

Post on 18-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Recurrent computations to the rescue

Scan to download papers+data+code

Gabriel Kreiman

http://klab.tch.harvard.edu

Virtual BMM 2020

[email protected]

Page 2: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

What is there?

Who are they?

Where are

they?

What are they doing?

What is their relationship?

What are they looking

at?

What happened

before?

Where is Obama’s

foot

Search for all

the mirrors

How many shoes are

there?

An image is worth a million words

Page 3: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

State of the Art in Image Captioning

Kreiman and Serre, 2020Beyond the feedforward sweep

Page 4: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Visual cognition: a sequence of routines*

1. Extract initial sensory map à Call VisualSampling

What is there?

What are they doing?

What are they looking

at?What will

happen next?

Where is Obama’s

foot?Where

are they?

Search for all

the mirrors

* Shimon Ullman. Visual Routines. 1984

Divide et impera: break down task into simpler, reusable visual routines

2. Propose image gist à Call FastPeriphAssess

3. Propose foveal objects à Call FovealRecognition

4. Inference à Call PatternCompletion

5. Working memory à Call Visual Buffer

6. Task-dependent sampling à Call EyeMovement

7. Response à Call DecisionMaking

Page 5: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Standing on the shoulders of giants:Mesoscopic connectivity of the primate visual system

Felleman and Van Essen. Cerebral Cortex 1991

Retina

Primary visual cortex

~ Deep convolutional neural networks

Krizhevsky et al, NIPS 2012

Page 6: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Visual cognition: a sequence of routines*

1. Extract initial sensory map à Call VisualSampling Retina

What is there?

What are they doing?

What are they looking

at?What will

happen next?

Where is Obama’s

foot?Where

are they?

Search for all

the mirrors

* Shimon Ullman. Visual Routines. 1984

Divide et impera: break down task into simpler, reusable visual routines

2. Propose image gist à Call FastPeriphAssess V1, V2, V4

3. Propose foveal objects à Call FovealRecognition ITC

4. Inference à Call PatternCompletion Ventral Stream

5. Working memory à Call Visual Buffer PFC

6. Task-dependent sampling à Call EyeMovement Dorsal Stream

7. Response à Call DecisionMaking PFC

Page 7: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Outline

Each additional processing step takes ~10 to 15 ms

Schmolesky et al 1998

Page 8: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Visual cognition: a sequence of routines*

1. Extract initial sensory map à Call VisualSampling Retina ~50 ms

What is there?

What are they doing?

What are they looking

at?What will

happen next?

Where is Obama’s

foot?Where

are they?

Search for all

the mirrors

* Shimon Ullman. Visual Routines. 1984

Divide et impera: break down task into simpler, reusable visual routines

2. Propose image gist à Call FastPeriphAssess V1, V2, V4 ~100 ms

3. Propose foveal objects à Call FovealRecognition ITC ~150 ms

4. Inference à Call PatternCompletion Ventral Stream ~200 ms

5. Working memory à Call Visual Buffer PFC ~300 ms

6. Task-dependent sampling à Call EyeMovement Dorsal Stream ~300 ms

7. Response à Call DecisionMaking PFC ~300-1000 ms

Page 9: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Outline

1. Pattern completion: making inferences from partial information

2. Putting vision in context

3. Active sampling: extracting information via eye movements

Page 10: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Outline

1. Pattern completion: making inferences from partial information

2. Putting vision in context

3. Active sampling: extracting information via eye movements

Tang et al, Neuron 2014Tang et al, PNAS 2018

Martin Schrimpf

HanlinTang

Bill Lotter

Page 11: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Pattern completion is a hallmark of intelligence

A

1, 2, 3, 5, 7, 11,

Even though it was raining heavily, John decided to go out without an…

V_S_A_ R_C_G_I_I_N

I

13

Visual Recognition

Umbrella

Also: Reading a storySocial interactions

C E G

Page 12: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Visual recognition of occluded objects

Page 13: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Visual inference from partial information

Page 14: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Strong robustness to limited visibility

Tang et al, Neuron 2014Tang et al, PNAS 2018

Robust recognition despite heavy occlusion

It “costs” ~50 ms extra to visually complete patterns

Page 15: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Peeking inside the human brain

Page 16: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Neural selectivity is retained despite strong occlusion

Tang et al, Neuron 2014Tang et al, PNAS 2018

Neural mechanisms of pattern completion

Neural responses are delayed by ~50 ms

Page 17: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Rumination through horizontal connections

1. Behavior: 50 ms cost.

2. Neurophysiology: 50 ms delay

3. Neuroanatomy: slow horizontal connection

Page 18: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Neurobiological recipe: add horizontal recurrent connections to deep convolutional networks

Page 19: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Outline

1. Pattern completion: making inferences from partial information

2. Putting vision in context

3. Active sampling: extracting information via eye movements

Mengmi ZhangMartin Schrimpf

Zhang et al, CVPR 2020

Page 20: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

The role of context in vision: Example 1

Page 21: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

The role of context in vision: Example 2

++

Page 22: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Investigating the role of context in visual recognition

Page 23: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

What, where, and when of contextual modulation

Page 24: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Context enhances visual recognition

Page 25: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Attention Predictionon context𝛼!"

�̂�!Feature Extraction

using VGG16(weights shared)

𝐼#

ℎ!

Predicted Class Labels

“pizza”

ℎ$

𝐼"

𝑎#

𝑎"

𝛼$"

Attention Predictionon object

𝛼!# 𝛼$#

LSTM1 �̂�$ LSTM2

“cake”

ℎ! ℎ$Predicted

Class Labels( '𝑧!#, '𝑧!") ( '𝑧$#, '𝑧$")

Time

CATNet: Context-aware Attention Two-Stream Network

Page 26: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Context enhances visual recognition in the model

Page 27: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

What, where, and when of contextual modulation

Page 28: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Outline

1. Pattern completion: making inferences from partial information

2. Putting vision in context

3. Active sampling: extracting information via eye movements

Mengmi Zhang

Zhang et al, Nature Communications 2018

Page 29: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

High-resolution fovea, low-resolution periphery

Paper

Page 30: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

We are legally blind outside the fovea

DO NOT MOVE YOUR EYES!

Page 31: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Eye movements are critical for scene understanding

Page 32: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Four key properties of visual search

1. Selectivity[Distinguishing target from distractors]

2. Invariance [Finding target irrespective of changes in appearance]

3. Efficiency [Rapid search, avoiding exhaustive exploration]

4. Generalization [No training required]

Waldo, Wally,CharlieWalter

Page 33: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Active sampling during visual search

Zhang et al, 2018

Mengmi Zhang

Page 34: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Invariant Visual Search Network (IVSN)

VGG16 (Simonyan et al 2014)Visual search circuitry: e.g. Bichot et al (2015)

Scan to download paper+code+data

Zhang et al. Nature Communications 2018

Page 35: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Neurally inspired model captures sampling behavior

Finding any Waldo: zero-shot invariant and efficient visual search.Zhang et al, 2018

Page 36: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Summary

Attractor-based recurrent neural networks can perform pattern completion

Top-down signals can guide eye movements during visual search

Recurrent connections and eccentricity dependence help incorporate context

Divide and Conquer Strategy: A sequence of flexible, reusable, interchangeable visual routines

Algorithm discovery: neuroscience + behavior + computational models

Scan to download papers+data+code

http://[email protected]

Page 37: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

Recurrent computations to the rescue

Scan to download papers+data+code

Gabriel Kreiman

http://klab.tch.harvard.edu

Virtual BMM 2020

[email protected]

Page 38: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

A long and exciting way forward

Microsoft Captioning Bothttps://www.captionbot.ai/

Kreiman and Serre, 2020Beyond the feedforward sweep

Page 39: Gabriel Kreiman ://cbmm.mit.edu/.../BMM_11Aug2020_Kreiman.pdf · Virtual BMM 2020 gabriel.kreiman@tch.harvard.edu. What is there? Who are they? Where are they? What are they doing?

The what, where, and when of contextual modulation