neurobiological models of instrumental conditioning

Neurobiological Models of Instrumental Conditioning Matthew J. Crossley Department of Psychological and Brain Sciences University of California, Santa Barbara, 93106

Upload: matthew-crossley

Post on 18-Jul-2015


TRANSCRIPT

Page 1: Neurobiological Models of Instrumental Conditioning

Neurobiological Models of Instrumental Conditioning

Matthew J. Crossley

Department of Psychological and Brain Sciences University of California, Santa Barbara, 93106

Page 2: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 3: Neurobiological Models of Instrumental Conditioning

Why Instrumental Conditioning?

• The Ashby lab's bread and butter is category learning

• Information-Integration category-learning is a procedural skill

• Appetitive Instrumental Conditioning is a procedural skill

Page 4: Neurobiological Models of Instrumental Conditioning

Procedural Skills

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Page 5: Neurobiological Models of Instrumental Conditioning

Procedural Skills

Where are the tumors?

Page 6: Neurobiological Models of Instrumental Conditioning

Procedural Skills

TUMORS!

Page 7: Neurobiological Models of Instrumental Conditioning

Procedural Skills Depend on the Basal Ganglia

• Basal ganglia are a collection of subcortical nuclei

• Interconnect with cortex in well-defined circuits

• Striatum is a major input structure

Page 8: Neurobiological Models of Instrumental Conditioning

Cortex Excites the Striatum

Page 9: Neurobiological Models of Instrumental Conditioning

Striatum Inhibits the GPi

Page 10: Neurobiological Models of Instrumental Conditioning

GPi Inhibits the Thalamus

High baseline firing rate

Page 11: Neurobiological Models of Instrumental Conditioning

Striatum Disinhibits the Thalamus

Page 12: Neurobiological Models of Instrumental Conditioning

Thalamus Excites Cortex
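
The chain of connections on the preceding slides (cortex excites striatum, striatum inhibits GPi, GPi inhibits thalamus, thalamus excites cortex) can be illustrated with a toy firing-rate sketch. This is not the spiking model presented later in the talk; the function name, the weights, and the baseline values are all hypothetical and chosen only to show disinhibition.

```python
def relu(x):
    """Rectify: firing rates cannot be negative."""
    return x if x > 0.0 else 0.0


def direct_pathway(cortex, w_ctx_msn=1.0, w_msn_gpi=1.0, w_gpi_thal=1.0,
                   gpi_baseline=1.0, thal_drive=1.0):
    """Toy rate model of the cortex -> striatum -> GPi -> thalamus loop.

    All parameter values are illustrative assumptions, not fitted values.
    """
    striatum = relu(w_ctx_msn * cortex)                 # cortex excites striatum
    gpi = relu(gpi_baseline - w_msn_gpi * striatum)     # striatum inhibits GPi
    thalamus = relu(thal_drive - w_gpi_thal * gpi)      # GPi inhibits thalamus
    return {"striatum": striatum, "gpi": gpi, "thalamus": thalamus}


if __name__ == "__main__":
    for ctx in (0.0, 0.5, 1.0):
        print(ctx, direct_pathway(ctx))
```

With no cortical input the GPi fires at its high baseline and holds the thalamus at zero; as cortical input grows, striatal activity suppresses the GPi and the thalamus is released.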

Page 13: Neurobiological Models of Instrumental Conditioning

Dopamine Modulates Activity

Page 14: Neurobiological Models of Instrumental Conditioning

Procedural Learning Depends on the Striatum

• Single-cell recordings: Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies: Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992

• Neuropsychological patient studies: Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996

• Neuroimaging: Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Page 15: Neurobiological Models of Instrumental Conditioning

Striatal Neurons

• Medium Spiny Projection Neurons (MSNs): 96%

• GABA Interneurons: 2%

• TANs (Cholinergic Interneurons): 2%

Page 16: Neurobiological Models of Instrumental Conditioning

The TANs are of Particular Interest

• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Get major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward (requires dopamine)

Page 17: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 18: Neurobiological Models of Instrumental Conditioning

Model Architecture

Ashby and Crossley (2011)

Page 19: Neurobiological Models of Instrumental Conditioning

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse

Ashby and Crossley (2011)

Page 20: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Early Trial

Page 21: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Early Trial

Page 22: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

Page 23: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

Page 24: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

SMA

Page 25: Neurobiological Models of Instrumental Conditioning

Response and Feedback

• Model responds if SMA crosses threshold

• Model is given feedback after every trial

Page 26: Neurobiological Models of Instrumental Conditioning

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse

Ashby and Crossley (2011)

Page 27: Neurobiological Models of Instrumental Conditioning

CTX-MSN Synaptic Modification Requires a TANs Pause

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Page 28: Neurobiological Models of Instrumental Conditioning

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Page 29: Neurobiological Models of Instrumental Conditioning

DA Encodes Reward Prediction Error (RPE)

• Elevated after unexpected reward

• Depressed after unexpected no-reward

• Stays at baseline when the outcome is as expected

Bayer & Glimcher (2005)

Page 30: Neurobiological Models of Instrumental Conditioning

Computing RPE

Obtained feedback on trial n:

\[ R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases} \]

Predicted feedback on trial n:

\[ P_n = P_{n-1} + \alpha \, (R_{n-1} - P_{n-1}) \]

RPE on trial n:

\[ \mathrm{RPE}(n) = R_n - P_n \]
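
A minimal sketch of the trial-by-trial computation on this slide. The learning-rate name `alpha`, its value of 0.1, and the initial prediction `p0 = 0.0` are assumptions made for illustration; the slide does not give them.

```python
def run_rpe(feedback, alpha=0.1, p0=0.0):
    """Return (predictions, rpes) for a sequence of 0/1 feedback values.

    P_n = P_{n-1} + alpha * (R_{n-1} - P_{n-1});  RPE(n) = R_n - P_n.
    alpha and p0 are illustrative assumptions, not values from the slides.
    """
    predictions, rpes = [], []
    p = p0
    prev_r = None
    for r in feedback:
        if prev_r is not None:
            # The prediction is updated from the previous trial's outcome.
            p = p + alpha * (prev_r - p)
        predictions.append(p)
        rpes.append(r - p)
        prev_r = r
    return predictions, rpes


if __name__ == "__main__":
    # 20 rewarded trials followed by 5 unrewarded (extinction) trials.
    preds, rpes = run_rpe([1] * 20 + [0] * 5)
    for n, (p, e) in enumerate(zip(preds, rpes), start=1):
        print(f"trial {n:2d}  P = {p:.3f}  RPE = {e:+.3f}")
```

During the rewarded trials the RPE shrinks toward zero as the prediction catches up; when reward is withheld the RPE turns negative.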

Page 31: Neurobiological Models of Instrumental Conditioning

DA Released on Trial n

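\[
\mathrm{DA}(n) =
\begin{cases}
1 & \text{if } \mathrm{RPE} > 1 \\
0.8\,\mathrm{RPE} + 0.2 & \text{if } -0.25 < \mathrm{RPE} \le 1 \\
0 & \text{if } \mathrm{RPE} < -0.25
\end{cases}
\]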
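
The piecewise mapping from RPE to dopamine release can be written as a short function. This sketch assumes the lower breakpoint is -0.25 (the sign is hard to read in the transcript) and assigns the boundary value RPE = -0.25 to the zero branch; the function name is hypothetical.

```python
def dopamine_release(rpe):
    """Piecewise-linear DA release as a function of RPE.

    Saturates at 1 for large positive RPE, equals 0.2 at RPE = 0,
    and bottoms out at 0 for sufficiently negative RPE.
    """
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0


if __name__ == "__main__":
    for rpe in (-1.0, -0.25, 0.0, 0.5, 1.0, 2.0):
        print(rpe, dopamine_release(rpe))
```

Because the baseline (0.2) sits closer to the floor than to the ceiling, DA has more room to rise after unexpected reward than to dip after unexpected reward omission.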

Page 32: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

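\[
\begin{aligned}
w_{K,J}(n+1) = {} & w_{K,J}(n) \\
& + \alpha_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D(n) - D_{\mathrm{base}}]^+ \, [1 - w_{K,J}(n)] \\
& - \beta_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D_{\mathrm{base}} - D(n)]^+ \, w_{K,J}(n) \\
& - \gamma_w I_K(n) \, [\theta_{\mathrm{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\mathrm{AMPA}}]^+ \, w_{K,J}(n).
\end{aligned}
\]

Highlighted: Presynaptic Activity, I_K(n), in the Synaptic Strengthening and Synaptic Weakening terms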
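
The weight update on this slide can be sketched as a single function, reading `[x]^+` as rectification (x if x > 0, else 0). Every parameter value below (`alpha_w`, `beta_w`, `gamma_w`, the two thresholds, and `d_base`) is a placeholder chosen for illustration, and the function and argument names are hypothetical; none of them come from the slides.

```python
def relu(x):
    """[x]^+ : x if x > 0, otherwise 0."""
    return x if x > 0.0 else 0.0


def update_weight(w, pre, post, da, *, alpha_w=0.1, beta_w=0.05, gamma_w=0.01,
                  theta_nmda=0.5, theta_ampa=0.1, d_base=0.2):
    """One-trial update of a CTX-MSN weight w in [0, 1].

    pre  : presynaptic (cortical) activity, I_K(n)
    post : postsynaptic (MSN) activation, S_J(n)
    da   : dopamine released on this trial, D(n)
    All parameter defaults are illustrative assumptions.
    """
    # Strengthening: strong pre, post above NMDA threshold, DA above baseline.
    ltp = alpha_w * pre * relu(post - theta_nmda) * relu(da - d_base) * (1.0 - w)
    # Weakening: strong pre, post above NMDA threshold, DA below baseline.
    ltd_da = beta_w * pre * relu(post - theta_nmda) * relu(d_base - da) * w
    # Weakening: post between the AMPA and NMDA thresholds, regardless of DA.
    ltd_weak = gamma_w * pre * relu(theta_nmda - post) * relu(post - theta_ampa) * w
    return w + ltp - ltd_da - ltd_weak


if __name__ == "__main__":
    print(update_weight(0.5, pre=1.0, post=1.0, da=1.0))  # elevated DA
    print(update_weight(0.5, pre=1.0, post=1.0, da=0.0))  # depressed DA
    print(update_weight(0.5, pre=1.0, post=1.0, da=0.2))  # baseline DA
```

The `(1 - w)` and `w` factors keep the weight bounded between 0 and 1 as long as the rate constants are small.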

Page 33: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

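\[
\begin{aligned}
w_{K,J}(n+1) = {} & w_{K,J}(n) \\
& + \alpha_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D(n) - D_{\mathrm{base}}]^+ \, [1 - w_{K,J}(n)] \\
& - \beta_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D_{\mathrm{base}} - D(n)]^+ \, w_{K,J}(n) \\
& - \gamma_w I_K(n) \, [\theta_{\mathrm{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\mathrm{AMPA}}]^+ \, w_{K,J}(n).
\end{aligned}
\]

Highlighted: Postsynaptic Activation, [S_J(n) - \theta_{\mathrm{NMDA}}]^+, in the Synaptic Strengthening and Synaptic Weakening terms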

Page 34: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

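\[
\begin{aligned}
w_{K,J}(n+1) = {} & w_{K,J}(n) \\
& + \alpha_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D(n) - D_{\mathrm{base}}]^+ \, [1 - w_{K,J}(n)] \\
& - \beta_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D_{\mathrm{base}} - D(n)]^+ \, w_{K,J}(n) \\
& - \gamma_w I_K(n) \, [\theta_{\mathrm{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\mathrm{AMPA}}]^+ \, w_{K,J}(n).
\end{aligned}
\]

Highlighted: Elevated DA, [D(n) - D_{\mathrm{base}}]^+, in the Synaptic Strengthening term; Depressed DA, [D_{\mathrm{base}} - D(n)]^+, in the Synaptic Weakening term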

Page 35: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Late Trial

Page 36: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Late Trial

Page 37: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

Page 38: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

Page 39: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

SMA

Page 40: Neurobiological Models of Instrumental Conditioning

Model Accounts for Electrophysiological Recordings from TANs

Ashby and Crossley (2011)

Page 41: Neurobiological Models of Instrumental Conditioning

Model Accounts for Electrophysiological Recordings from MSNs

Ashby and Crossley (2011)

Page 42: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 43: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition

Ashby and Crossley (2011)

Fast reacquisition is evidence that extinction did not erase initial learning

Page 44: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition Mechanics

TANs quickly stop pausing, and thereby protect cortico-striatal synapses

Page 45: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition Mechanics

Page 46: Neurobiological Models of Instrumental Conditioning

Partial Reinforcement Extinction (PRE)

Extinction is slower when acquisition is trained with partial reinforcement

Page 47: Neurobiological Models of Instrumental Conditioning

PRE Mechanics

TANs take longer to stop pausing under partial reinforcement

Page 48: Neurobiological Models of Instrumental Conditioning

Slowed Reacquisition

Phase \ Condition   Ext2              Ext8              Prf2            Prf8
Acquisition         VI-30 sec         VI-30 sec         VI-30 sec       VI-30 sec
Extinction          No Reinforcement  No Reinforcement  Lean Schedule   Lean Schedule
Reacquisition       VI-2 min          VI-8 min          VI-2 min        VI-8 min

Woods and Bouton (2007)

Page 49: Neurobiological Models of Instrumental Conditioning

Behavioral Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 50: Neurobiological Models of Instrumental Conditioning

Modeling Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 51: Neurobiological Models of Instrumental Conditioning

TANs don’t stop pausing during extinction in Prf Conditions

CTX-MSN Synapse Pf-TAN Synapse

Page 52: Neurobiological Models of Instrumental Conditioning

Renewal - Basic Design

Phase \ Condition      ABA            AAB            ABC
Acquisition            Environment A  Environment A  Environment A
Extinction             Environment B  Environment A  Environment B
Renewal (Extinction)   Environment A  Environment B  Environment C

Bouton et al. (2011)

Page 53: Neurobiological Models of Instrumental Conditioning

Renewal

Page 54: Neurobiological Models of Instrumental Conditioning

Model Architecture

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 55: Neurobiological Models of Instrumental Conditioning

Synaptic Plasticity at ALL Pf-TAN Synapses

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 56: Neurobiological Models of Instrumental Conditioning

Renewal

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 57: Neurobiological Models of Instrumental Conditioning

ABA Mechanics

Crossley, Horvitz, Balsam, & Ashby (in prep)

Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
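
The averaging rule stated on this slide can be illustrated with a few lines of code. The input names (`cue`, `context_A`, `context_B`) and the weight values are invented for illustration and are not taken from the model; the sketch only shows how averaging over the currently active synapses lets the net weight differ between environments.

```python
def net_pf_tan_weight(weights, active_inputs):
    """Mean weight over the Pf-TAN synapses whose inputs are currently active."""
    active = [weights[name] for name in active_inputs]
    if not active:
        raise ValueError("at least one Pf input must be active")
    return sum(active) / len(active)


# Hypothetical weights after acquisition in environment A and
# extinction in environment B (values are illustrative only).
weights = {"cue": 0.6, "context_A": 0.9, "context_B": 0.1}

in_b = net_pf_tan_weight(weights, ["cue", "context_B"])  # tested in B
in_a = net_pf_tan_weight(weights, ["cue", "context_A"])  # returned to A

if __name__ == "__main__":
    print("net weight in B:", in_b)
    print("net weight in A:", in_a)
```

Under these assumed values the net weight is higher on return to environment A than in environment B, which is the kind of difference an ABA renewal account would need.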

Page 58: Neurobiological Models of Instrumental Conditioning

Instrumental Conditioning Summary

• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.

Page 59: Neurobiological Models of Instrumental Conditioning

Untested Physiological Predictions

• Development of TANs pause precedes development of category-specific responses in MSNs

• TANs should stop pausing during extinction

Page 60: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference (TD) model of DA

Page 61: Neurobiological Models of Instrumental Conditioning

Putting TD into the model

We want to replace the discrete-trial model of DA with a continuous-time model

Page 62: Neurobiological Models of Instrumental Conditioning

The TD Prediction Error

[Figure: Prediction Error plotted over Trial and Time Step]

Page 63: Neurobiological Models of Instrumental Conditioning

The TD Prediction Error

\[ \delta_t = r_t + \gamma V(t+1) - V(t) \]

\[ r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases} \]

Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947
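
A minimal sketch of the TD prediction error, assuming one learned value per time step within a trial and a value of zero after the last step. The learning rate `lr`, the discount `gamma`, the trial length, and the reward time are all illustrative assumptions, and this tabular toy is not the spiking implementation used in the talk.

```python
def td_errors(values, rewards, gamma=1.0):
    """delta_t = r_t + gamma * V(t+1) - V(t); V past the last step is taken as 0."""
    deltas = []
    for t, r in enumerate(rewards):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * v_next - values[t])
    return deltas


def run_trials(n_trials, n_steps=10, reward_step=7, gamma=1.0, lr=0.3):
    """Learn V(t) over repeated trials; return final values and per-trial deltas.

    Values are updated once per trial from that trial's prediction errors.
    All defaults are illustrative assumptions.
    """
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        rewards = [1.0 if t == reward_step else 0.0 for t in range(n_steps)]
        deltas = td_errors(values, rewards, gamma)
        for t, d in enumerate(deltas):
            values[t] += lr * d
        history.append(deltas)
    return values, history


if __name__ == "__main__":
    values, history = run_trials(50)
    for trial in (0, 1, 5, 49):
        print(trial, [round(d, 2) for d in history[trial]])
```

On the first trial the error appears only at the reward step; over trials it moves to earlier time steps and the error at reward time shrinks.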

Page 64: Neurobiological Models of Instrumental Conditioning

Model Architecture

Spiking Neuron Driven by TD prediction error:

\[ \delta_t = r_t + \gamma V(t+1) - V(t) \]

TANs were removed for initial TD applications

Page 65: Neurobiological Models of Instrumental Conditioning

We Need Modified Learning Equations

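\[
\begin{aligned}
w_{K,J}(n+1) = {} & w_{K,J}(n) \\
& + \alpha_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D(n) - D_{\mathrm{base}}]^+ \, [1 - w_{K,J}(n)] \\
& - \beta_w I_K(n) \, [S_J(n) - \theta_{\mathrm{NMDA}}]^+ \, [D_{\mathrm{base}} - D(n)]^+ \, w_{K,J}(n) \\
& - \gamma_w I_K(n) \, [\theta_{\mathrm{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\mathrm{AMPA}}]^+ \, w_{K,J}(n).
\end{aligned}
\]

Labeled terms: Synaptic Strengthening; Synaptic Weakening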

DA is no longer modeled on a discrete trial-by-trial basis!

Page 66: Neurobiological Models of Instrumental Conditioning

A Cortico-Striatal Synapse

Page 67: Neurobiological Models of Instrumental Conditioning

CaMKII, PP-1 and Striatal Plasticity

Page 68: Neurobiological Models of Instrumental Conditioning

Learning Equations

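\[
\begin{aligned}
w(n+1) = {} & w(n) \\
& + \alpha_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+ \, [D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+ \, [w_{\max} - w(n)] \, dt \\
& - \beta_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+ \, [D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+ \, w(n) \, dt
\end{aligned}
\]

Highlighted: CaMKII Activity, [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+, in the Synaptic Strengthening and Synaptic Weakening terms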
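
The continuous-time learning rule can be approximated by replacing each integral with a sum over sampled time steps. In this sketch the two traces are assumed to be sampled at a fixed step `dt` within one trial; the function name and every default value (`alpha_w`, `beta_w`, the two baselines, `w_max`) are hypothetical placeholders, not values from the slides.

```python
def relu(x):
    """[x]^+ : x if x > 0, otherwise 0."""
    return x if x > 0.0 else 0.0


def update_weight_continuous(w, camkii_trace, pp1_trace, dt, *,
                             alpha_w=1.0, beta_w=1.0,
                             camkii_base=0.2, d_base=0.2, w_max=1.0):
    """One-trial weight update with the integrals approximated by Riemann sums.

    camkii_trace : samples of S_CaMKII(t) within the trial
    pp1_trace    : samples of D_PP-1(t) within the trial
    All parameter defaults are illustrative assumptions.
    """
    ltp_integral = 0.0
    ltd_integral = 0.0
    for s, d in zip(camkii_trace, pp1_trace):
        gate = relu(s - camkii_base)            # CaMKII activity above baseline
        ltp_integral += gate * relu(d - d_base) * dt
        ltd_integral += gate * relu(d_base - d) * dt
    # w(n) is constant within a trial, so it factors out of both integrals.
    return w + alpha_w * (w_max - w) * ltp_integral - beta_w * w * ltd_integral


if __name__ == "__main__":
    high = [1.0] * 10
    low = [0.0] * 10
    print(update_weight_continuous(0.5, high, high, dt=0.01))  # strengthening
    print(update_weight_continuous(0.5, high, low, dt=0.01))   # weakening
```

Because the rule integrates over time within a trial, the weight change depends on how long the two signals overlap, not only on their peak values.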

Page 69: Neurobiological Models of Instrumental Conditioning

Learning Equations

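\[
\begin{aligned}
w(n+1) = {} & w(n) \\
& + \alpha_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+ \, [D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+ \, [w_{\max} - w(n)] \, dt \\
& - \beta_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+ \, [D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+ \, w(n) \, dt
\end{aligned}
\]

Highlighted: PP-1 Activity, [D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+ in the Synaptic Strengthening term and [D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+ in the Synaptic Weakening term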

Page 70: Neurobiological Models of Instrumental Conditioning

Acquisition and Extinction

[Figure: two panels, Proportion Responses Emitted vs. Trial and CTX-MSN Synaptic Strength vs. Trial]

Page 71: Neurobiological Models of Instrumental Conditioning

MSN and SNc

[Figure: two panels, MSN Output and SNc Output, each plotted over Trial and Time Step]

Page 72: Neurobiological Models of Instrumental Conditioning

CaMKII and PP-1

DA model learns very quickly that reward is taken away

[Figure: two panels, each plotted over Trial and Time Step]

Page 73: Neurobiological Models of Instrumental Conditioning

Extinction under noncontingent reward delivery

[Figure: two panels, Proportion Responses Emitted vs. Trial and CTX-MSN Synaptic Strength vs. Trial]

Page 74: Neurobiological Models of Instrumental Conditioning

MSN and SNc

[Figure: two panels, MSN Output and SNc Output, each plotted over Trial and Time Step]

Page 75: Neurobiological Models of Instrumental Conditioning

MSN and SNc

Noncontingent reward delivery keeps DA surprised

[Figure: two panels, each plotted over Trial and Time Step]

Page 76: Neurobiological Models of Instrumental Conditioning

CaMKII and PP-1

Noncontingent reward delivery keeps DA surprised

[Figure: two panels, each plotted over Trial and Time Step]

Page 77: Neurobiological Models of Instrumental Conditioning

Summary and Future Directions

• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward

• TD model might need to be modified once the TANs are included and post-extinction effects are examined

Page 78: Neurobiological Models of Instrumental Conditioning

Acknowledgments

Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam

Funding:

NIMH Grant MH3760-2, Todd Wilkinson