
Page 1: Why Consider Probabilistic Models? Computational Reasons

1

Probabilistic Models of Cortical Computation

Rajesh P. N. Rao
Dept. of Computer Sci. and Engineering & Neurobio. and Behavior Program
University of Washington, Seattle, WA

Lab website: http://neural.cs.washington.edu

November, 2004

Funding: Sloan Foundation, Packard Foundation, ONR, and NSF

Page 2: Why Consider Probabilistic Models? Computational Reasons

2

Why Consider Probabilistic Models? Computational Reasons

Sensory measurements are typically ambiguous, e.g., the projection from 3D to 2D in vision

Biological sensors and processing elements are noisy

Animal’s knowledge of the world is usually incomplete

There thus appears to be a need to represent, learn, and reason about probabilities

Page 3: Why Consider Probabilistic Models? Computational Reasons

3

Example 1: Ambiguity of Stimuli

Is it an oval-shaped or a circular object?

[Figure: eyes viewing an object and the resulting retinal image]

Page 4: Why Consider Probabilistic Models? Computational Reasons

4

Bayesian Model: The Likelihood Function

(From Geisler & Kersten, 2002)

Retinal Image I

Likelihood = P(I | Slant, Aspect ratio)

Page 5: Why Consider Probabilistic Models? Computational Reasons

5

Bayesian Model: The Posterior

(From Geisler & Kersten, 2002)

Posterior = k × Likelihood × Prior

(k = normalization constant)
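As a concrete illustration of this slide's rule (Posterior = k × Likelihood × Prior), here is a minimal numerical sketch on a discrete grid of slants and aspect ratios. The Gaussian likelihood, the prior favoring circular objects, and all numeric values are illustrative assumptions, not the actual Geisler & Kersten (2002) model.

```python
# Minimal sketch of Posterior = k * Likelihood * Prior on a discrete grid.
# Likelihood/prior shapes and all numbers are invented for illustration.
import numpy as np

slants = np.linspace(0, 80, 81)        # candidate surface slants (deg)
aspects = np.linspace(0.4, 1.0, 61)    # candidate aspect ratios of the object

observed_aspect = 0.7                  # aspect ratio of the ellipse on the retina
sigma = 0.05                           # assumed measurement noise

# Each (slant, aspect) pair predicts a retinal aspect ratio: aspect * cos(slant).
S, A = np.meshgrid(slants, aspects, indexing="ij")
predicted = A * np.cos(np.deg2rad(S))

likelihood = np.exp(-(observed_aspect - predicted) ** 2 / (2 * sigma ** 2))
prior = np.exp(-(A - 1.0) ** 2 / (2 * 0.1 ** 2))   # prior favoring circular objects

posterior = likelihood * prior
posterior /= posterior.sum()           # k = normalization constant

i, j = np.unravel_index(posterior.argmax(), posterior.shape)
print(f"MAP estimate: slant = {slants[i]:.0f} deg, aspect ratio = {aspects[j]:.2f}")
```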

Page 6: Why Consider Probabilistic Models? Computational Reasons

6

What is this image depicting?

Example 2: Noise and Incomplete Knowledge

Page 7: Why Consider Probabilistic Models? Computational Reasons

7

Bayesian Model

Likelihood: P(I | θ)

Prior probability: P(θ)  (sample: dog … street … Okinawa beach)

Posterior probability: P(θ | I) = P(I | θ) P(θ) / P(I)

Input Image → ??? (Bayesian decision)

Page 8: Why Consider Probabilistic Models? Computational Reasons

8

Bayesian Model with “Top-Down” Bias

Likelihood: P(I | θ)

Prior probability: P(θ), biased top-down (sample: dog)

Posterior probability: P(θ | I) = P(I | θ) P(θ) / P(I)

Input Image → “Dog” (Bayesian decision)

Hypotheses θ: dog … street … Okinawa beach
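A hedged sketch of the decision shown on the last two slides: with a flat prior the ambiguous image yields no clear winner, while a top-down prior biased toward "dog" flips the Bayesian (MAP) decision. The likelihood values and prior weights are invented for illustration.

```python
# Sketch of how a top-down prior biases the Bayesian decision over
# interpretations of a noisy image. Numbers are hypothetical.
import numpy as np

labels = ["dog", "street", "Okinawa beach"]
likelihood = np.array([0.30, 0.35, 0.35])   # P(I | label) for one ambiguous image

def decide(prior):
    posterior = likelihood * prior
    posterior /= posterior.sum()            # P(label | I) = P(I|label)P(label)/P(I)
    return labels[posterior.argmax()], np.round(posterior, 2)

# Flat prior: the decision follows the likelihood alone.
print(decide(np.array([1/3, 1/3, 1/3])))
# Top-down bias toward "dog" (e.g., from context) flips the decision.
print(decide(np.array([0.6, 0.2, 0.2])))
```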

Page 9: Why Consider Probabilistic Models? Computational Reasons

9

Psychophysical Evidence for Bayesian Perception

Motion from cast shadows (Kersten et al., 1996)

Surface perception based on texture (Knill, 1998)

Inferring 3D shape from 2D images (Mamassian et al., 2002)

Color perception (Bloj et al., 1999)

Cue combination for depth perception (Jacobs, 2002)

Motion illusions (Weiss et al., 2002)

Motor Control (Körding and Wolpert, 2004)

Page 10: Why Consider Probabilistic Models? Computational Reasons

10

Other Results: Contextual Modulation in V1

(Zipser et al., 1996)

Page 11: Why Consider Probabilistic Models? Computational Reasons

11

Attentional Modulation in V2 and V4

(Reynolds et al., 1999)

Page 12: Why Consider Probabilistic Models? Computational Reasons

12

Decision Neurons in Areas LIP and FEF


(Roitman and Shadlen, 2002)

Page 13: Why Consider Probabilistic Models? Computational Reasons

13

Rev. Thomas Bayes (1702-1761)

Can a network of neurons perform Bayesian inference?

• How is prior knowledge about the world (prior probabilities and likelihoods) stored in a network?

• How are posterior probabilities of states computed?

Page 14: Why Consider Probabilistic Models? Computational Reasons

14

Generative Models for Bayesian Inference

Fundamental Idea: Inputs received by an organism are caused by external “states” of the world (hidden “causes”)

Goal: Estimate the probability of these causes (or states or “interpretations”) based on the inputs received thus far

Page 15: Why Consider Probabilistic Models? Computational Reasons

15

Example: Linear Generative Models

Page 16: Why Consider Probabilistic Models? Computational Reasons

16

Linear Generative Model

Spatial Generative Model: I(t) = U r(t) + n(t)

r(t) = representation vector, n = zero-mean Gaussian white noise with covariance Σ

Temporal Dynamics for Time-Varying Processes: r(t) = V r(t−1) + m(t−1)

V = transition matrix, m = zero-mean Gaussian white noise with covariance Σ_m

Goal: Find optimal representation vector r(t) given inputs I(t), I(t-1), …, I(1).
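The following sketch simply samples an image sequence from this generative model, with arbitrary dimensions, a random U, and diagonal noise covariances standing in for Σ and Σ_m.

```python
# Sampling from the linear generative model:
#   I(t) = U r(t) + n(t),   r(t) = V r(t-1) + m(t-1).
# Dimensions and noise levels are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_causes, T = 64, 10, 50

U = rng.standard_normal((n_pixels, n_causes))    # generative (feedback) weights
V = 0.95 * np.eye(n_causes)                      # state transition matrix
sigma_n, sigma_m = 0.1, 0.05                     # noise std devs (Sigma, Sigma_m)

r = np.zeros(n_causes)
images = []
for t in range(T):
    r = V @ r + sigma_m * rng.standard_normal(n_causes)    # temporal dynamics
    I = U @ r + sigma_n * rng.standard_normal(n_pixels)    # spatial generation
    images.append(I)

print(np.array(images).shape)   # (T, n_pixels): synthetic image sequence
```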

Page 17: Why Consider Probabilistic Models? Computational Reasons

17

Optimization Functions

Find optimal r(t) by Minimizing Prediction Errors for all t:

E = Σ_i (I_i − (U r)_i)² + Σ_i (r_i − r̄_i)²

r̄ = mean of r before measurement of I

Generalize to Weighted Least Squares Function:

E = (I − U r)^T Σ^-1 (I − U r) + (r − r̄)^T M^-1 (r − r̄)

M = covariance of r before measurement of I
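One simple way to minimize E for a single image is gradient descent on r; the sketch below assumes isotropic covariances and a random U (the closed-form solution is the Kalman filter correction derived two slides later).

```python
# Gradient descent on the weighted least-squares function E for one image.
# Isotropic covariances, random U, and the learning rate are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_causes = 64, 10
U = rng.standard_normal((n_pixels, n_causes))
Sigma_inv = np.eye(n_pixels) / 0.5**2     # inverse input-noise covariance
M_inv = np.eye(n_causes)                  # inverse covariance of r before measurement
r_bar = np.zeros(n_causes)                # mean of r before measurement of I

r_true = rng.standard_normal(n_causes)
I = U @ r_true + 0.5 * rng.standard_normal(n_pixels)

r, lr = r_bar.copy(), 5e-4
for _ in range(2000):
    # dE/dr = -2 U^T Sigma^-1 (I - U r) + 2 M^-1 (r - r_bar)
    grad = -2 * U.T @ Sigma_inv @ (I - U @ r) + 2 * M_inv @ (r - r_bar)
    r -= lr * grad

print("error in recovered causes:", np.linalg.norm(r - r_true))
```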

Page 18: Why Consider Probabilistic Models? Computational Reasons

18

Minimizing E = Maximizing Posterior Probability

Minimizing E is equivalent to Maximizing log P(r|I) which is equivalent to Maximizing Posterior Probability P(r|I)

log P(r | I) = log P(I | r) + log P(r) − log P(I)
             = −(I − U r)^T Σ^-1 (I − U r) − (r − r̄)^T M^-1 (r − r̄) + k
             = −E + k      (k = constant independent of r)

Page 19: Why Consider Probabilistic Models? Computational Reasons

19

Optimal Estimation and Kalman Filtering

Setting dE/dr = 0 and solving for the optimal r yields the Kalman Filter:

r̂(t) = r̄(t) + K(t) [ I(t) − U r̄(t) ]

r̄(t) = V r̂(t−1)

K(t) = “Kalman gain” matrix = N(t) U^T Σ^-1

N(t) = covariance of r after measurement of I(t) = (U^T Σ^-1 U + M(t)^-1)^-1

M(t) = V N(t−1) V^T + Σ_m
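A direct implementation of these update equations, run on synthetic data from the linear generative model above; the dimensions and noise levels are arbitrary illustrative choices.

```python
# Kalman filter for the linear generative model, following the slide's equations.
import numpy as np

rng = np.random.default_rng(2)
n_pixels, n_causes, T = 64, 10, 100
U = rng.standard_normal((n_pixels, n_causes))
V = 0.95 * np.eye(n_causes)
Sigma = 0.1**2 * np.eye(n_pixels)          # input noise covariance
Sigma_m = 0.05**2 * np.eye(n_causes)       # state noise covariance
Sigma_inv = np.linalg.inv(Sigma)

r_true = np.zeros(n_causes)
r_hat = np.zeros(n_causes)                 # estimate of r after each measurement
N = np.eye(n_causes)                       # covariance of r after measurement
for t in range(T):
    # Generate the next image from the hidden causes.
    r_true = V @ r_true + 0.05 * rng.standard_normal(n_causes)
    I = U @ r_true + 0.1 * rng.standard_normal(n_pixels)

    # Prediction: r_bar(t) = V r_hat(t-1), M(t) = V N(t-1) V^T + Sigma_m
    r_bar = V @ r_hat
    M = V @ N @ V.T + Sigma_m

    # Correction: N(t) = (U^T Sigma^-1 U + M^-1)^-1, K(t) = N(t) U^T Sigma^-1
    N = np.linalg.inv(U.T @ Sigma_inv @ U + np.linalg.inv(M))
    K = N @ U.T @ Sigma_inv
    r_hat = r_bar + K @ (I - U @ r_bar)    # new estimate = prediction + gain * error

print("final estimation error:", np.linalg.norm(r_hat - r_true))
```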

Page 20: Why Consider Probabilistic Models? Computational Reasons

20

A Simplified Kalman Filter

If Σ is diagonal and equal to σ²I, K(t) = (N(t)/σ²) U^T = G(t) U^T

Kalman filter equation is of the form:

New Estimate = Prediction + Gain x Prediction Error

UT = Feedforward Matrix

U = Feedback Matrix

V = Recurrent Matrix (Lateral Connections)

r̂(t) = r̄(t) + G(t) U^T [ I(t) − U r̄(t) ]

Prediction: r̄(t) = V r̂(t−1)

Page 21: Why Consider Probabilistic Models? Computational Reasons

21

Neural Implementation via Predictive Coding

(Rao & Ballard, 1997,1999; Rao, 1999)

Predictive Coding Model: Feedback = Prediction, Feedforward = Prediction Error

Page 22: Why Consider Probabilistic Models? Computational Reasons

22

Clues from Cortical Anatomy

Higher Area

Lower Area

Page 23: Why Consider Probabilistic Models? Computational Reasons

23

Hierarchical Organization of the Visual Cortex

Lower

Higher

Page 24: Why Consider Probabilistic Models? Computational Reasons

24

Hierarchical Generative Model (Rao & Ballard, 1999)

Original Generative Model: I = U r + n

Hierarchical Generalization: r = Uh rh + nh

rh = representation at a higher level

With Temporal Dynamics: r(t) = V r(t−1) + Uh rh(t−1) + m(t−1)

Can derive Kalman filter equations for each level. This yields a hierarchical model for predictive coding.

[Diagram: rh → r → I]
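Below is a much-simplified two-level sketch in the spirit of this hierarchical model: level 1 explains the image, level 2 explains level 1's representation, and both are updated by gradient descent on their prediction errors. The fixed learning rate replaces the Kalman gain, and the weights are random rather than learned.

```python
# Simplified two-level predictive coding sketch (Rao & Ballard, 1999, in spirit).
# Covariances and the Kalman gain are replaced by a fixed learning rate.
import numpy as np

rng = np.random.default_rng(3)
n_pixels, n1, n2 = 64, 16, 4
U  = rng.standard_normal((n_pixels, n1)) / np.sqrt(n_pixels)  # level-1 generative weights
Uh = rng.standard_normal((n1, n2)) / np.sqrt(n1)               # level-2 generative weights

I = rng.standard_normal(n_pixels)       # a (random) input image
r, rh = np.zeros(n1), np.zeros(n2)
lr = 0.1
for _ in range(500):
    e0 = I - U @ r        # bottom-level prediction error (carried feedforward)
    e1 = r - Uh @ rh      # error between r and the top-down prediction Uh rh
    r  += lr * (U.T @ e0 - e1)   # driven by bottom-up error, corrected by top-down error
    rh += lr * (Uh.T @ e1)       # higher level explains the lower representation

print("residual image error:", np.linalg.norm(I - U @ r))
print("residual top-down error:", np.linalg.norm(r - Uh @ rh))
```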

Page 25: Why Consider Probabilistic Models? Computational Reasons

25

Hierarchical Predictive Coding Model

[Diagram: input I encoded by representation r via weights U; top-down prediction r̄ = Uh rh from the higher level]

(Rao & Ballard, 1997, 1999)

Page 26: Why Consider Probabilistic Models? Computational Reasons

26

The Predictive Coding Hypothesis

Feedback connections from higher areas convey predictions of expected activity in lower areas

Feedforward connections convey the errors between actual and predicted responses

Model Prediction

Since feedforward connections to higher areas originate from layer 2+3, responses of layer 2+3 neurons should be interpretable as prediction errors

Page 27: Why Consider Probabilistic Models? Computational Reasons

27

Results from the Classic Studies of Hubel and Wiesel (1960s)

Page 28: Why Consider Probabilistic Models? Computational Reasons

28

“Endstopping” in Cortical Neurons

Page 29: Why Consider Probabilistic Models? Computational Reasons

29

Contextual Modulation in Visual Cortex

(Zipser et al., 1996)

Page 30: Why Consider Probabilistic Models? Computational Reasons

30

Example Network for Predictive Coding

Page 31: Why Consider Probabilistic Models? Computational Reasons

31

Natural Images used for Training

Page 32: Why Consider Probabilistic Models? Computational Reasons

32

Synaptic Weights after Learning

Page 33: Why Consider Probabilistic Models? Computational Reasons

33

Endstopping as a Predictive Error Signal

Page 34: Why Consider Probabilistic Models? Computational Reasons

34

Comparison with Layer 2+3 Cortical Neuron

Page 35: Why Consider Probabilistic Models? Computational Reasons

35

Why Does Endstopping Occur in the Model? Orientation-Dependent Correlations in Natural Images

Page 36: Why Consider Probabilistic Models? Computational Reasons

36

Other Contextual Effects in the Model

Page 37: Why Consider Probabilistic Models? Computational Reasons

37

Support for Predictive Coding from an Imaging Study

(Murray et al., 2002)

Page 38: Why Consider Probabilistic Models? Computational Reasons

38

Predictive Coding in the Retina

From: Nicholls et al., 1992

Response of a retinal ganglion cell can be interpreted as the difference (error) between center pixel values and their prediction based on surrounding pixels (Srinivasan et al., 1982)


Receptive Fields

On-center off-surround

Off-center on-surround
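A toy 1D version of this idea, assuming the surround prediction is just the mean of neighboring pixels: the error signal has much lower variance than the raw (spatially correlated) signal. The signal and neighborhood size are arbitrary stand-ins, not the Srinivasan et al. (1982) fit.

```python
# Sketch of retinal predictive coding: response = center pixel minus a prediction
# formed from the surrounding pixels (here, their mean over a 1D neighborhood).
import numpy as np

rng = np.random.default_rng(4)
image_row = np.cumsum(rng.standard_normal(100))   # smooth, spatially correlated signal

def center_surround_response(x, i, radius=3):
    surround = np.r_[x[i - radius:i], x[i + 1:i + radius + 1]]
    prediction = surround.mean()        # prediction of the center from the surround
    return x[i] - prediction            # error signal = what the surround cannot predict

responses = [center_surround_response(image_row, i) for i in range(3, 97)]
print("signal variance:", image_row.var(), " error variance:", np.var(responses))
```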

Page 39: Why Consider Probabilistic Models? Computational Reasons

39

Predictive Coding in the LGN

Temporal Receptive Field of LGN X-cell

From: Dan et al., 1996

LGN cell responses

Response of LGN cell can be interpreted as the difference (error) between current pixel values and their prediction based on past pixel values

Page 40: Why Consider Probabilistic Models? Computational Reasons

40

Summary for Part I

Computational and experimental studies point to the need for probabilistic models of brain function

Probabilistic models typically rely on generative models of sensory (and motor) processes

We examined a simple linear generative model and its hierarchical generalization: Bayesian inference via Kalman filtering, whose neural implementation allows hierarchical predictive coding

Feedback connections convey predictions; feedforward connections convey errors in prediction

Hierarchical predictive coding explains endstopping and other contextual surround effects based on natural image statistics

Page 41: Why Consider Probabilistic Models? Computational Reasons

41

Break

Questions to Ponder over:

1. Can we go beyond linear generative models and Gaussian distributions?

2. Can a neural population encode an entire probability distribution rather than simply the mean or mode?

Page 42: Why Consider Probabilistic Models? Computational Reasons

42

Generative Models II: Graphical Models

Graphical models depict the generative process as a graph: nodes denote random variables (states), edges denote dependencies

Example: If states are continuous, linear generative model: I = U r + n

P(I | r) = N(I; Ur, Σ)

[Example graphs: r → I; Earthquake → Alarm ← Burglar, Earthquake → Radio]

Page 43: Why Consider Probabilistic Models? Computational Reasons

43

Continuous versus Discrete States

Unimodal (e.g., Normal N(x; μ, σ)) vs. multimodal distributions

Discrete Approximation: probabilities over discrete states 1, …, i, …, M

Page 44: Why Consider Probabilistic Models? Computational Reasons

44

The Belief Propagation Algorithm

If states are discrete, probabilities of random variables can be calculated through “belief propagation” (Pearl, 1988): each node j sends a “message” (probability density) to every neighbor i; the message to neighbor i depends on the messages received from all other neighbors

m_ij(x_i) = Σ_{x_j} ψ_ij(x_i, x_j) ψ_j(x_j) Π_{X_k ∈ N(X_j)\X_i} m_kj(x_j)

[Example graph: Earthquake → Alarm ← Burglar, Earthquake → Radio]
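A small numerical instance of the message equation above, for a three-node chain of binary variables with made-up unary and pairwise potentials (not one of Pearl's examples).

```python
# Sum-product message passing on a chain x1 - x2 - x3 of binary variables.
import numpy as np

psi_pair = np.array([[0.9, 0.1],       # psi(x_i, x_j): prefers neighbors to agree
                     [0.1, 0.9]])
psi_unary = {1: np.array([0.7, 0.3]),  # psi_j(x_j): local evidence at each node
             2: np.array([0.5, 0.5]),
             3: np.array([0.2, 0.8])}

# m_ij(x_i) = sum_{x_j} psi(x_i, x_j) psi_j(x_j) * product of messages into j (excluding i).
m_32 = psi_pair @ (psi_unary[3] * 1.0)   # leaf node: no other incoming messages
m_21 = psi_pair @ (psi_unary[2] * m_32)

belief_1 = psi_unary[1] * m_21           # belief at node 1 given all evidence
belief_1 /= belief_1.sum()
print("P(x1 | evidence):", belief_1)
```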

Page 45: Why Consider Probabilistic Models? Computational Reasons

45

An Example: Hidden Markov Models (HMMs)

A Simple but Powerful Graphical Model for Temporal Data: the observed world can be in one of M states θ^1, θ^2, …, θ^M

The state θ_t at time step t depends only on the previous state θ_{t−1} and is given by the transition probabilities P(θ_t = θ^i | θ_{t−1} = θ^j) (written P(θ_t^i | θ_{t−1}^j) for convenience)

The input I_t at time t is given by P(I_t | θ_t = θ^j)

Graphical Model for an HMM: states θ_{t−2} → θ_{t−1} → θ_t, each generating an input I_{t−2}, I_{t−1}, I_t

Page 46: Why Consider Probabilistic Models? Computational Reasons

46

Inference in HMMs

P(θ_t^i, I_t | I_1, …, I_{t−1}) = P(I_t | θ_t^i) P(θ_t^i | I_1, …, I_{t−1})

= P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t−1}^j) P(θ_{t−1}^j, I_{t−1} | I_1, …, I_{t−2})

P(I_t | θ_t^i) = likelihood of θ_t^i at time t; the sum over j = prediction for θ_t^i at time t

[Graphical model: states θ_{t−2} → θ_{t−1} → θ_t with inputs I_{t−2}, I_{t−1}, I_t]

Page 47: Why Consider Probabilistic Models? Computational Reasons

47

Equivalence to Belief Propagation for HMMs

Equivalent to on-line (“forward”) belief propagation through time:

m_ij(x_i) = Σ_{x_j} ψ_ij(x_i, x_j) ψ_j(x_j) Π_{X_k ∈ N(X_j)\X_i} m_kj(x_j)

For the HMM this becomes:

m_{t,t+1}^i = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t−1}^j) m_{t−1,t}^j

[Graphical model: states θ_{t−2} → θ_{t−1} → θ_t with inputs I_{t−2}, I_{t−1}, I_t]
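The same forward message recursion written out directly for a toy HMM; the transition and emission probabilities and the observation sequence are invented for illustration.

```python
# Forward ("message") recursion for an HMM:
# m[i] = P(I_t | theta_t = i) * sum_j P(theta_t = i | theta_{t-1} = j) * m_prev[j].
import numpy as np

M = 3                                              # number of hidden states
trans = np.array([[0.8, 0.1, 0.1],                 # P(theta_t = i | theta_{t-1} = j)
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])
emission = np.array([[0.7, 0.2, 0.1],              # P(I_t = o | theta_t = i), row i
                     [0.2, 0.6, 0.2],
                     [0.1, 0.2, 0.7]])

observations = [0, 0, 1, 2, 2]
m = np.ones(M) / M                                 # initial belief over states
for obs in observations:
    m = emission[:, obs] * (trans @ m)             # likelihood * prediction
    m /= m.sum()                                   # normalize -> P(theta_t | I_1..I_t)
print("posterior over states after the sequence:", m)
```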

Page 48: Why Consider Probabilistic Models? Computational Reasons

48

Can a network of neurons perform this computation?

m_{t,t+1}^i = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t−1}^j) m_{t−1,t}^j

Page 49: Why Consider Probabilistic Models? Computational Reasons

49

Recurrent Network Model

Leaky Integrator Equation for Output Firing Rate v:

dv/dt = −v + W I + R v

(output decay + input + feedback; W = feedforward synaptic weights, I = input, R = recurrent connections)

Page 50: Why Consider Probabilistic Models? Computational Reasons

50

Discrete Implementation

v_i(t+1) = v_i(t) + ( −v_i(t) + w_i·I(t) + Σ_j R_ij v_j(t) )

i.e.  v_i(t+1) = w_i·I(t) + Σ_j r_ij v_j(t)

(New activity = Input + Prior activity)

Page 51: Why Consider Probabilistic Models? Computational Reasons

51

Can this equation implement Belief Propagation for HMMs?

v_i(t+1) = w_i·I(t) + Σ_j r_ij v_j(t)

?

m_{t,t+1}^i = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t−1}^j) m_{t−1,t}^j

Page 52: Why Consider Probabilistic Models? Computational Reasons

52

Consider Belief Propagation in Log Domain

Equation for a recurrent network:   v_i(t+1) = w_i·I(t) + Σ_j r_ij v_j(t)

Belief propagation in the log domain:   log m_{t,t+1}^i = log P(I_t | θ_t^i) + log Σ_j P(θ_t^i | θ_{t−1}^j) m_{t−1,t}^j

Page 53: Why Consider Probabilistic Models? Computational Reasons

53

Bayesian Inference in a Recurrent Network

Network can perform Bayesian inference using:

w_i·I(t) = log P(I_t | θ_t^i)

Σ_j r_ij v_j(t) ≈ log Σ_j P(θ_t^i | θ_{t−1}^j) m_{t−1,t}^j

and  v_i(t+1) = w_i·I(t) + Σ_j r_ij v_j(t) = log m_{t,t+1}^i

(log posterior = log likelihood + log prior + normalization)
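A sketch of this recurrent-network computation. The feedforward term supplies the log likelihood; here the recurrent contribution is computed exactly as log Σ_j P(θ^i | θ^j) m_j rather than approximated with fixed weights r_ij as in the model, and the filters and inputs are toy stand-ins.

```python
# Recurrent-network belief propagation in the log domain (simplified sketch).
import numpy as np

rng = np.random.default_rng(5)
M, n_pixels, T = 4, 25, 20
W = rng.standard_normal((M, n_pixels))           # feedforward filters w_i (toy)
trans = np.full((M, M), 0.05) + 0.8 * np.eye(M)  # P(theta_i | theta_j)
trans /= trans.sum(axis=0, keepdims=True)

def log_likelihood(I):
    # Toy stand-in for w_i . I(t) = log P(I_t | theta_i): normalized filter responses.
    a = W @ I
    return a - np.log(np.exp(a).sum())

v = np.log(np.ones(M) / M)                       # v_i = log m_i, flat initial belief
true_state = 2
for t in range(T):
    I = W[true_state] + 0.5 * rng.standard_normal(n_pixels)   # noisy input favoring state 2
    recurrent = np.log(trans @ np.exp(v))        # log prediction from the previous belief
    v = log_likelihood(I) + recurrent            # new activity = log likelihood + log prior
    v -= np.log(np.exp(v).sum())                 # normalization term

print("posterior over states:", np.exp(v))
```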

Page 54: Why Consider Probabilistic Models? Computational Reasons

54

Example 1: Orientation Discrimination Task

Feedforward weights w_i (= F(θ_i)): a set of 36 oriented filters spanning orientations θ_i = 0°, 5°, 10°, …, 175°

Transition probabilities P(θ_t^i | θ_{t−1}^j) = 1 if i = j, 0 otherwise

Input images = oriented edge plus additive Gaussian noise

t = 1 t = 2 t = 3 t = 4 t = 5 t = 6

Page 55: Why Consider Probabilistic Models? Computational Reasons

55

Demo: Orientation Discrimination

Input Image Sequence

Log likelihood computed from Feedforward Weights

Posterior computed by the Network over time

Orientation Estimation: pick the preferred orientation of the neuron with the maximum response → Maximum a Posteriori (MAP) estimation


Page 56: Why Consider Probabilistic Models? Computational Reasons

56

Example 2: Motion Detection Task

• The Task: Guess the direction of motion of the coherently moving dots (UP/DOWN or LEFT/RIGHT)

Coherence of the dots controls task difficulty. Widely used to study decision making in humans and monkeys (e.g., Shadlen and Newsome, 2001)

Example Stimuli: 5% coherence, 50% coherence

Page 57: Why Consider Probabilistic Models? Computational Reasons

57

Network for Motion Detection

Let θ_ij encode (stimulus location i, motion direction j)

We can create a network for detection of 1D motion direction by selecting appropriate transition probabilities P(θ_ij | θ_kl)

P(θ_iR | θ_kR): rightward selective;  P(θ_kL | θ_jL): leftward selective

Input image

F(i)

Page 58: Why Consider Probabilistic Models? Computational Reasons

58

Feedforward Weights

Spatial Location

F(1) F(2) … F(15)

Page 59: Why Consider Probabilistic Models? Computational Reasons

59

Recurrent Weights

Recurrent weights chosen such that:   Σ_j r_ij log m_j ≈ log Σ_j P(x_i | x_j) m_j

[Matrix plots: transition probabilities (state at t−1 → state at t) and the corresponding recurrent weights (from neuron j to neuron i), for rightward- and leftward-selective neurons]

Page 60: Why Consider Probabilistic Models? Computational Reasons

60

Network Output for Moving Inputs

Rightward Moving Input Leftward Moving Input

[Plots for right-selective and left-selective neurons: log likelihoods log P(I_t | θ_t^i), log posteriors, and posteriors]

Page 61: Why Consider Probabilistic Models? Computational Reasons

61

Solving the Random Dots Task

Neurons in the network compute log posterior probabilities:   log P(x_i, L | I_1, …, I_t)  and  log P(x_i, R | I_1, …, I_t)

Random dots task: need to decide whether the majority of dots are moving Left or Right

Compute the posterior probability of L and R by summing over all locations x_i (marginalize over x_i):

P(L | I_1, …, I_t) = Σ_i P(x_i, L | I_1, …, I_t)

P(R | I_1, …, I_t) = Σ_i P(x_i, R | I_1, …, I_t)
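A rough sketch of the decision stage only: marginalize a joint posterior over locations to get P(L) and P(R), and let the "decision neurons" accumulate the log posteriors over time. The joint posteriors here are randomly generated with a slight leftward bias rather than computed by the motion network.

```python
# Decision-stage sketch: marginalize over locations, accumulate log posteriors.
import numpy as np

rng = np.random.default_rng(6)
n_locations, T_steps = 15, 40
log_dL, log_dR = 0.0, 0.0       # accumulated log posteriors for Left / Right
for t in range(T_steps):
    # Placeholder joint posterior P(x_i, direction | I_1..I_t), biased leftward.
    joint = rng.random((n_locations, 2)) * np.array([1.1, 1.0])
    joint /= joint.sum()
    p_L = joint[:, 0].sum()     # P(L | I_1..I_t) = sum_i P(x_i, L | I_1..I_t)
    p_R = joint[:, 1].sum()     # P(R | I_1..I_t) = sum_i P(x_i, R | I_1..I_t)
    log_dL += np.log(p_L)
    log_dR += np.log(p_R)

print("decision:", "LEFT" if log_dL > log_dR else "RIGHT")
```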

Page 62: Why Consider Probabilistic Models? Computational Reasons

62

Probabilistic Motion Detection in a Model Network

Demo 1: Activities in a model network for noisy motion. Activities represent posterior probabilities of left/rightward motion

Demo 2: Activities of model “decision” neurons. Decision neurons sum up log posterior probabilities over time. Solid line = leftward motion, dotted line = rightward motion

Demo 3: Effect of making the stimulus more noisy. Longer decision times for noisier stimuli

Page 63: Why Consider Probabilistic Models? Computational Reasons

63

Reaction Time depends on Coherency

Rate of evidence accumulation depends on stimulus coherency

Reaction Time (decision-making time)

Shorter reaction times for more coherent stimuli

40% coherency 60% coherency 80% coherency

Page 64: Why Consider Probabilistic Models? Computational Reasons

64

Two Brain Areas involved in Visual Decision Making

Page 65: Why Consider Probabilistic Models? Computational Reasons

65

“Decision Neurons” in cortical area LIP

Monkey deciding direction of motion in random dots task

Plot shows average response in LIP to stimuli with different noise levels

Model neuron responses resemble LIP activities

Slower rise to threshold for noisier stimuli


(Roitman and Shadlen, 2002)

Page 66: Why Consider Probabilistic Models? Computational Reasons

66

“Decision” Neurons in Frontal Cortex

Monkey making an eye movement to an “odd-ball” target among a field of distractors

Monkey’s reaction time distribution can be predicted from threshold crossings!

Data from (Schall & Thompson, 1999)

Page 67: Why Consider Probabilistic Models? Computational Reasons

67

Distribution of Reaction Times in the Model

[Histograms of reaction times (number of time steps) at 60% and 90% coherence]

Page 68: Why Consider Probabilistic Models? Computational Reasons

68

What if we increase the prior for Leftward motion?

Higher prior for L


(Based on www.physiol.cam.ac.uk/staff/carpente/recinormal.htm)

Page 69: Why Consider Probabilistic Models? Computational Reasons

69

Model Prediction: Increasing Prior for Left Motion

[Histograms of reaction times (number of time steps) at 60% coherence: Left/Right equally probable vs. Left more probable than Right]

Distribution shifts: shorter reaction times for Left trials

Page 70: Why Consider Probabilistic Models? Computational Reasons

70

What if speed is more important than accuracy?

Lower threshold for making faster decisions

(Based on www.physiol.cam.ac.uk/staff/carpente/recinormal.htm)

Page 71: Why Consider Probabilistic Models? Computational Reasons

71

Model Prediction: Imposing an “Urgency” Constraint

[Histograms of reaction times (number of time steps) with decision threshold T = 0.03 vs. T/2 = 0.015]

Distribution shifts: shorter reaction times

Page 72: Why Consider Probabilistic Models? Computational Reasons

72

What about Spikes?

Recall the leaky integrator equation:

dv_i/dt = −v_i + w_i·I(t) + Σ_j R_ij v_j(t)

Assume v_i is linearly related to the membrane potential of neuron i as follows:

V_i^m = k v_i + T

For the standard integrate-and-fire model with additive noise, one can show (Plesser & Gerstner, 2000; Gerstner, 2000):

P(spike_i(t+1) | V_i^m(t+1)) = e^{(V_i^m(t+1) − T)/k} = e^{v_i(t+1)} = m_{t,t+1}^i = posterior probability of θ^i
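A tiny numerical check of this relationship: with the membrane potential an affine function of v_i (the log posterior), the spiking probability exp((V_m − T)/k) recovers the posterior itself. The values of k, T, and the log-posterior trace are arbitrary.

```python
# Membrane potential as scaled log posterior; spike probability recovers the posterior.
import numpy as np

rng = np.random.default_rng(7)
k, T = 2.0, -1.0                                   # assumed scale and threshold
log_posterior = np.log(np.clip(rng.random(10), 0.05, 1.0))   # v_i(t+1) = log m_i

V_m = k * log_posterior + T                        # membrane potential V_i^m = k v_i + T
p_spike = np.exp((V_m - T) / k)                    # = exp(v_i) = posterior probability
spikes = rng.random(10) < p_spike                  # sample spikes with this probability

print("posterior:", np.round(p_spike, 2))
print("spikes:   ", spikes.astype(int))
```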

Page 73: Why Consider Probabilistic Models? Computational Reasons

73

Example

Membrane Potential (log posterior):   V_i^m(t+1) ∝ log P(θ_t^i, I(t) | I(1), …, I(t−1))

Sampled Spikes:   P(spike_i(t+1) | V_i^m(t+1)) = e^{(V_i^m(t+1) − T)/k}

Postsynaptic Membrane Potential (decoded log posterior): recipient neuron with an alpha synapse

Page 74: Why Consider Probabilistic Models? Computational Reasons

74

What about Top-Down Information?

Hypothesis: Top-down priors influence lower-level probability estimates

Page 75: Why Consider Probabilistic Models? Computational Reasons

75

Probabilistic Graphical Model → Hierarchical Network

• Top-down feedback conveys prior probability for spatial locations

• Posterior probability at lower level computed from prior & image

Hierarchical Belief Propagation in Cortical Networks

(Rao, NIPS, 2004)

Page 76: Why Consider Probabilistic Models? Computational Reasons

76

Attention can restore V4 responses in the presence of distractors (Reynolds et al., 1999)

Reference stimulus only

Reference and probe (No Attention)

Reference and probe (with Attention)

Example: Modeling Spatial Attention in V4

Page 77: Why Consider Probabilistic Models? Computational Reasons

77

Attentional Restoration of Responses in the Model

Reference only Ref. and probe Ref. and probe with attention

(Rao, NIPS, 2004)

Page 78: Why Consider Probabilistic Models? Computational Reasons

78

Related Work on Probabilistic Models

Linear Generative Model: sparse coding models (Olshausen & Field, 1996, 1997); ICA (Bell & Sejnowski, 1997)

Hierarchical Models: MacKay, 1956; Mumford, 1992; Kawato et al., 1993; Dayan et al., 1995; Lee & Mumford, 2003; Friston, 2003; Hawkins, 2004

Encoding Uncertainty and Belief Propagation with Neurons: Anderson & Van Essen, 1994; Zemel et al., 1998; Pouget et al., 2000; Deneve, NIPS, 2004; Yu & Dayan, NIPS, 2004; Zemel et al., NIPS, 2004

Page 79: Why Consider Probabilistic Models? Computational Reasons

79

Summary and Conclusions (“Posterior” for this lecture)

There is growing evidence that the brain utilizes probabilistic principles such as Bayesian inference

This lecture explored two neural models for Bayesian inference:
Predictive Coding: feedback connections convey predictions while feedforward connections carry errors
Belief Propagation: the membrane potential encodes the log posterior probability via belief propagation; the spiking probability equals the posterior probability of the state encoded by the neuron

Some broad predictions of the models:
Cortical architecture implements a graphical model of the sensory (and motor) environment
Cortical networks perform hierarchical Bayesian inference
Corticocortical feedback conveys predictions or prior probabilities

Page 80: Why Consider Probabilistic Models? Computational Reasons

80

(http://employees.csbsju.edu/tcreed/pb/pdoganim.html)

Future Directions / Open Problems:
Synaptic plasticity: role of STDP and short-term plasticity in Bayesian models
Neural implementation of sensorimotor Bayesian models
Incorporating rewards (Pavlovian conditioning, etc.) …