
Neurobiological Models of Instrumental Conditioning

Matthew J. Crossley

Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA 93106

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of the model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Why Instrumental Conditioning?

• The Ashby lab's bread and butter is category learning

• Information-integration category learning is a procedural skill

• Appetitive instrumental conditioning is a procedural skill

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Procedural Skills

Where are the tumors?


TUMORS!

Procedural Skills Depend on the Basal Ganglia

• The basal ganglia are a collection of subcortical nuclei

• They interconnect with cortex in well-defined circuits

• The striatum is a major input structure

Cortex Excites the Striatum

Striatum Inhibits the GPi

GPi Inhibits the Thalamus

High baseline firing rate

Striatum Disinhibits the Thalamus

Thalamus Excites Cortex

Dopamine Modulates Activity
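The slide sequence above builds up the direct pathway as a chain of excitatory and inhibitory links. A minimal steady-state rate sketch, assuming rectified-linear units and unit weights, shows why striatal activity disinhibits the thalamus. The function `direct_pathway` and all of its parameters are hypothetical illustrations, not part of the model, and dopamine modulation is deliberately left out.

```python
def relu(x: float) -> float:
    """Rectify a firing rate so it cannot go negative."""
    return max(0.0, x)


def direct_pathway(cortex: float,
                   gpi_baseline: float = 1.0,
                   thalamus_drive: float = 1.0,
                   w_ctx_str: float = 1.0,
                   w_str_gpi: float = 1.0,
                   w_gpi_thal: float = 1.0) -> dict:
    """Steady-state rates for a toy cortex -> striatum -> GPi -> thalamus chain.

    All parameter values are hypothetical and chosen only for illustration.
    """
    striatum = relu(w_ctx_str * cortex)                 # cortex excites the striatum
    gpi = relu(gpi_baseline - w_str_gpi * striatum)     # striatum inhibits a tonically active GPi
    thalamus = relu(thalamus_drive - w_gpi_thal * gpi)  # GPi inhibits the thalamus
    return {
        "striatum": striatum,
        "gpi": gpi,
        "thalamus": thalamus,
        "cortex_feedback": thalamus,                    # thalamus excites cortex (unit gain here)
    }


if __name__ == "__main__":
    print(direct_pathway(0.0))  # no cortical input: GPi at baseline, thalamus silenced
    print(direct_pathway(1.0))  # cortical input: GPi suppressed, thalamus released
```

The high GPi baseline is what makes this disinhibition rather than excitation: the thalamus is held silent until the striatum removes the GPi brake.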

Procedural Learning Depends on the Striatum

• Single-cell recordings: Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies: Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992

• Neuropsychological patient studies: Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996

• Neuroimaging: Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Striatal Neurons

Medium Spiny Projection Neurons (MSNs)

GABA Interneurons: 2%

TANs (Cholinergic Interneurons): 2%

The TANs are of Particular Interest

• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Receive major input from the CM-Pf complex of the thalamus

• Learn to pause to stimuli that predict reward (requires dopamine)

I. A neurobiological model of appetitive instrumental conditioning

Model Architecture

Ashby and Crossley (2011)

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse

Network Dynamics: Early Trial


Response and Feedback

• The model responds when SMA activity crosses a threshold

• The model receives feedback after every trial


CTX-MSN Synaptic Modification Requires a TAN Pause

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000) Calabresi, Pisani, Mercuri, & Bernardi (1996) Reynolds & Wickens (2002)

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)

• Synaptic Strengthening:

- Elevated DA levels

• Synaptic Weakening:

- Depressed DA levels


DA Encodes Reward Prediction Error (RPE)

• Elevated after unexpected reward

• Depressed after unexpected no-reward

• Stays at baseline when the outcome is as expected

Bayer & Glimcher (2005)

Computing RPE

Obtained feedback on trial n:

$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$

Predicted feedback on trial n:

$$P_n = P_{n-1} + \alpha\,(R_{n-1} - P_{n-1})$$

RPE on trial n:

$$\mathrm{RPE}(n) = R_n - P_n$$
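The prediction recursion on this slide is a simple delta rule, so a few lines are enough to compute the trial-by-trial RPE. This is a minimal sketch; the learning rate `alpha` and the initial prediction `p_initial` are illustrative assumptions, since the slide does not give their values, and `rpe_sequence` is a hypothetical helper name.

```python
def rpe_sequence(feedback, alpha=0.1, p_initial=0.0):
    """Trial-by-trial predictions P_n and reward prediction errors R_n - P_n.

    feedback: list of R_n values (1 = positive feedback, 0 otherwise).
    alpha and p_initial are illustrative assumptions, not fitted values.
    """
    predictions, rpes = [], []
    p = p_initial
    for n, r in enumerate(feedback):
        if n > 0:
            # P_n = P_{n-1} + alpha * (R_{n-1} - P_{n-1})
            p = p + alpha * (feedback[n - 1] - p)
        predictions.append(p)
        rpes.append(r - p)  # RPE(n) = R_n - P_n
    return predictions, rpes


if __name__ == "__main__":
    preds, errs = rpe_sequence([1, 1, 1, 0, 0], alpha=0.5)
    print(preds)
    print(errs)
```

Note that P_n uses only feedback up to trial n - 1, so the RPE on trial n reflects genuine surprise: repeated positive feedback drives P_n toward 1 and the RPE toward 0, and the first omitted reward then produces a negative RPE.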

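DA Released on Trial n

$$D(n) = \begin{cases} 1 & \text{if } \mathrm{RPE} > 1 \\ 0.8\,\mathrm{RPE} + 0.2 & \text{if } -0.25 < \mathrm{RPE} \le 1 \\ 0 & \text{if } \mathrm{RPE} \le -0.25 \end{cases}$$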
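The piecewise mapping from RPE to DA release translates directly into code. One consequence worth noting: an RPE of exactly 0 yields 0.2, which is consistent with reading 0.2 as the baseline DA level used in the learning equations (that reading is our interpretation). `dopamine` is a hypothetical function name.

```python
def dopamine(rpe: float) -> float:
    """Piecewise-linear map from reward prediction error to DA release on a trial."""
    if rpe > 1.0:
        return 1.0              # saturate for large positive RPEs
    if rpe > -0.25:
        return 0.8 * rpe + 0.2  # linear range; RPE = 0 gives the 0.2 baseline
    return 0.0                  # floor for RPE <= -0.25


if __name__ == "__main__":
    for rpe in (-1.0, -0.25, 0.0, 0.5, 1.0, 2.0):
        print(rpe, dopamine(rpe))
```

The three pieces meet at RPE = -0.25 (value 0) and RPE = 1 (value 1), so the function is continuous, and positive RPEs have more room to move DA (from 0.2 up to 1) than negative ones (from 0.2 down to 0).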

Updating Synapses in the Model

$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w\, I_K(n)\, [S_J(n) - \theta_{\text{NMDA}}]^+ \, [D(n) - D_{\text{base}}]^+ \, [1 - w_{K,J}(n)] \\
&- \beta_w\, I_K(n)\, [S_J(n) - \theta_{\text{NMDA}}]^+ \, [D_{\text{base}} - D(n)]^+ \, w_{K,J}(n) \\
&- \gamma_w\, I_K(n)\, [\theta_{\text{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\text{AMPA}}]^+ \, w_{K,J}(n)
\end{aligned}
$$

where [x]^+ = max(x, 0).

• Presynaptic activity: I_K(n), the activation of cortical unit K

• Postsynaptic activation: S_J(n), the activation of MSN J, compared against the NMDA and AMPA thresholds

• Synaptic strengthening (first term): requires elevated DA, D(n) > D_base

• Synaptic weakening (second term): requires depressed DA, D(n) < D_base

• Synaptic weakening (third term): postsynaptic activation above the AMPA threshold but below the NMDA threshold
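The three-term update rule for the cortical-to-MSN weight can be written as one function per trial. This is a sketch under assumptions: the learning rates, thresholds, and D_base default below are hypothetical placeholders (D_base = 0.2 matches the DA level at zero RPE), and `update_ctx_msn_weight` is an illustrative name, not the authors' code.

```python
def pos(x: float) -> float:
    """[x]^+ : the positive part of x."""
    return max(0.0, x)


def update_ctx_msn_weight(w, i_k, s_j, d, *,
                          alpha_w=0.1, beta_w=0.1, gamma_w=0.1,
                          theta_nmda=1.0, theta_ampa=0.5, d_base=0.2):
    """One trial's update of a cortical (K) -> MSN (J) synaptic weight.

    w: current weight w_{K,J}(n); i_k: presynaptic activity I_K(n);
    s_j: postsynaptic activation S_J(n); d: DA level D(n).
    Parameter defaults are hypothetical placeholders, not fitted values.
    """
    # Strong pre + strong post (above NMDA threshold) + elevated DA -> strengthen
    strengthen = alpha_w * i_k * pos(s_j - theta_nmda) * pos(d - d_base) * (1.0 - w)
    # Strong pre + strong post + depressed DA -> weaken
    weaken_low_da = beta_w * i_k * pos(s_j - theta_nmda) * pos(d_base - d) * w
    # Post between the AMPA and NMDA thresholds -> weaken regardless of DA
    weaken_weak_post = gamma_w * i_k * pos(theta_nmda - s_j) * pos(s_j - theta_ampa) * w
    return w + strengthen - weaken_low_da - weaken_weak_post


if __name__ == "__main__":
    print(update_ctx_msn_weight(0.5, i_k=1.0, s_j=2.0, d=1.0))   # elevated DA
    print(update_ctx_msn_weight(0.5, i_k=1.0, s_j=2.0, d=0.0))   # depressed DA
    print(update_ctx_msn_weight(0.5, i_k=1.0, s_j=0.75, d=0.2))  # weak postsynaptic drive
```

The (1 - w) and w factors keep the weight inside [0, 1] for small learning rates, and every term is multiplied by I_K(n), so a silent presynaptic unit never changes its synapses.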

Network Dynamics: Late Trial


Model Accounts for Electrophysiological Recordings from TANs

Model Accounts for Electrophysiological Recordings from MSNs

II. Applications of the model

Fast Reacquisition

Fast reacquisition is evidence that extinction did not erase initial learning

Fast Reacquisition Mechanics

During extinction, the TANs quickly stop pausing and thereby protect the cortico-striatal synapses from further change
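The protection mechanism can be illustrated with a deliberately simplified toy, which is not the published model: a Pf-TAN weight `v` tracks reward quickly, the TANs pause only while `v` exceeds a threshold, and the CTX-MSN weight `w` may change only during a pause. The update rules, the function `simulate`, and the parameters `lr_tan`, `lr_ctx`, and `pause_threshold` are all hypothetical stand-ins chosen to show the gating logic.

```python
def simulate(phases, gated=True, lr_tan=0.3, lr_ctx=0.1, pause_threshold=0.5):
    """Toy model in which CTX-MSN learning happens only while the TANs pause.

    phases: list of (n_trials, reward) pairs, e.g. acquisition then extinction.
    gated=False lets CTX-MSN learning run on every trial, for comparison.
    Returns the CTX-MSN weight at the end of each phase.
    All update rules and parameters are hypothetical stand-ins.
    """
    v = 0.0  # Pf-TAN weight: how strongly the TANs pause to the cue
    w = 0.0  # CTX-MSN weight: the learned response tendency
    history = []
    for n_trials, reward in phases:
        for _ in range(n_trials):
            tans_paused = v > pause_threshold
            if tans_paused or not gated:
                w += lr_ctx * (reward - w)  # plasticity allowed only during a pause
            v += lr_tan * (reward - v)      # Pf-TAN weight tracks reward quickly
        history.append(w)
    return history


if __name__ == "__main__":
    acq_ext = [(50, 1.0), (50, 0.0)]
    print("gated:  ", simulate(acq_ext))
    print("ungated:", simulate(acq_ext, gated=False))
```

With gating, extinction erodes `w` for only the few trials before the TANs stop pausing, so most of the acquisition learning survives; without gating, `w` decays toward zero. When reward returns, the gated `w` resumes from its retained value, which is the qualitative signature of fast reacquisition.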


Partial Reinforcement Extinction (PRE)

Extinction is slower when acquisition is trained with partial reinforcement

PRE Mechanics

TANs take longer to stop pausing under partial reinforcement

Slowed Reacquisition

Condition       Ext2               Ext8               Prf2            Prf8
Acquisition     VI-30 sec          VI-30 sec          VI-30 sec       VI-30 sec
Extinction      No reinforcement   No reinforcement   Lean schedule   Lean schedule
Reacquisition   VI-2 min           VI-8 min           VI-2 min        VI-8 min

Woods and Bouton (2007)

Behavioral Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Modeling Results

In the Prf conditions, the TANs do not stop pausing during extinction

Panels: CTX-MSN synapse; Pf-TAN synapse

Renewal - Basic Design

Condition                    ABA             AAB             ABC
Acquisition                  Environment A   Environment A   Environment A
Extinction                   Environment B   Environment A   Environment B
Renewal test (extinction)    Environment A   Environment B   Environment C

Bouton et al. (2011)

Renewal

Model Architecture

Synaptic Plasticity at ALL Pf-TAN Synapses


ABA Mechanics

Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
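The averaging rule can be sketched with one Pf-TAN synapse per input (the cue and each context) and a net weight equal to the mean over the active synapses. The learning rule here is a hypothetical delta rule applied to every active synapse and clipped at zero; it stands in for the model's actual Pf-TAN plasticity, and the names `net_pf_tan_weight` and `train` are illustrative.

```python
from statistics import mean


def net_pf_tan_weight(weights, active):
    """Net Pf-TAN weight: the average over the currently active synapses."""
    return mean(weights[name] for name in active)


def train(weights, active, reward, n_trials, lr=0.1):
    """Hypothetical delta-rule update applied to every active Pf-TAN synapse.

    Weights are clipped at zero so they stay non-negative.
    """
    for _ in range(n_trials):
        error = reward - net_pf_tan_weight(weights, active)
        for name in active:
            weights[name] = max(0.0, weights[name] + lr * error)
    return weights


# ABA design: acquire in context A, extinguish in context B, then test.
weights = {"cue": 0.0, "A": 0.0, "B": 0.0}
train(weights, ("cue", "A"), reward=1.0, n_trials=200)
train(weights, ("cue", "B"), reward=0.0, n_trials=200)
renewal_in_A = net_pf_tan_weight(weights, ("cue", "A"))
test_in_B = net_pf_tan_weight(weights, ("cue", "B"))
print(f"net weight in A: {renewal_in_A:.3f}, in B: {test_in_B:.3f}")
```

In this toy, extinction in B drives the cue synapse toward zero but never touches the context-A synapse, so returning to A restores a substantial net Pf-TAN weight (about half its acquisition value). That is the ABA pattern: the TANs pause again and CTX-MSN learning is exposed.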

Instrumental Conditioning Summary

• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.

Untested Physiological Predictions

• The TAN pause should develop before category-specific responses develop in MSNs

• The TANs should stop pausing during extinction

III. Temporal-Difference (TD) model of DA

Putting TD into the model

We want to replace the discrete-trial model of DA with a continuous-time model.

The TD Prediction Error

[Figure axes: trial, time step]

$$\delta_t = r_t + \gamma V(t+1) - V(t), \qquad r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}$$

Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947.
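To see how the TD error behaves within a trial, here is a minimal tabular TD(0) sketch, similar in spirit to a one-value-per-time-step stimulus representation but not the spiking implementation used in the model. Assumptions: 10 time steps per trial, reward at step 6, learning rate 0.2, discount γ = 1.0, and an unpredictable pre-stimulus state fixed at value 0. The function `run_trial` is a hypothetical helper.

```python
def run_trial(V, reward_step, rewarded, alpha=0.2, gamma=1.0):
    """One trial of tabular TD(0) learning.

    V[t] is the predicted value t time steps after stimulus onset. The
    pre-stimulus state is assumed unpredictable and fixed at value 0.
    Returns the TD errors: index 0 is the error at stimulus onset and
    index t + 1 is delta_t for time step t.
    """
    T = len(V)
    pre_stimulus_value = 0.0
    deltas = [gamma * V[0] - pre_stimulus_value]    # transition into stimulus onset
    for t in range(T):
        r = 1.0 if (rewarded and t == reward_step) else 0.0
        v_next = V[t + 1] if t + 1 < T else 0.0     # value is 0 after the trial ends
        delta = r + gamma * v_next - V[t]           # delta_t = r_t + gamma V(t+1) - V(t)
        V[t] += alpha * delta
        deltas.append(delta)
    return deltas


V = [0.0] * 10
for _ in range(200):                                # acquisition: reward at step 6
    acquired = run_trial(V, reward_step=6, rewarded=True)
extinction = run_trial(V, reward_step=6, rewarded=False)  # first unrewarded trial
print("onset / reward-time error after acquisition:", round(acquired[0], 3), round(acquired[7], 3))
print("reward-time error on first extinction trial:", round(extinction[7], 3))
```

After training, the positive error has moved from the time of reward to stimulus onset, and omitting the reward produces a negative error at the expected reward time: the same elevated/depressed DA pattern as the discrete-trial RPE, now resolved in time.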

Model Architecture

Spiking neuron driven by the TD prediction error:

$$\delta_t = r_t + \gamma V(t+1) - V(t)$$

TANs were removed for initial TD applications

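We Need Modified Learning Equations

$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w\, I_K(n)\, [S_J(n) - \theta_{\text{NMDA}}]^+ \, [D(n) - D_{\text{base}}]^+ \, [1 - w_{K,J}(n)] \\
&- \beta_w\, I_K(n)\, [S_J(n) - \theta_{\text{NMDA}}]^+ \, [D_{\text{base}} - D(n)]^+ \, w_{K,J}(n) \\
&- \gamma_w\, I_K(n)\, [\theta_{\text{NMDA}} - S_J(n)]^+ \, [S_J(n) - \theta_{\text{AMPA}}]^+ \, w_{K,J}(n)
\end{aligned}
$$

The DA terms in this rule, including the synaptic-weakening term, use D(n), a single DA value per trial. DA is no longer modeled on a discrete trial-by-trial basis!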

A Cortico-Striatal Synapse

CaMKII, PP-1 and Striatal Plasticity

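Learning Equations

$$
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{\text{CaMKII}}(t) - S_{\text{CaMKII,base}}]^+ \, [D_{\text{PP-1}}(t) - D_{\text{base}}]^+ \, [w_{\max} - w(n)] \, dt \\
&- \beta_w \int [S_{\text{CaMKII}}(t) - S_{\text{CaMKII,base}}]^+ \, [D_{\text{base}} - D_{\text{PP-1}}(t)]^+ \, w(n) \, dt
\end{aligned}
$$

• CaMKII activity: S_CaMKII(t), relative to its baseline S_CaMKII,base

• PP-1 activity: governed by the DA term D_PP-1(t), relative to D_base

• Synaptic weakening: the β_w term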
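The integrals in these learning equations run over the time steps of a trial while w(n) stays fixed, so a left Riemann sum gives a direct numerical sketch. The sampled traces, the step size `dt`, and the defaults for `alpha_w`, `beta_w`, `camkii_base`, `d_base`, and `w_max` are hypothetical placeholders, and `update_weight_within_trial` is an illustrative name.

```python
def pos(x: float) -> float:
    """[x]^+ : the positive part of x."""
    return max(0.0, x)


def update_weight_within_trial(w, camkii, da_pp1, dt, *,
                               alpha_w=0.1, beta_w=0.1,
                               camkii_base=0.0, d_base=0.2, w_max=1.0):
    """Riemann-sum approximation of the two integrals over one trial.

    camkii and da_pp1 are equally spaced samples of S_CaMKII(t) and D_PP-1(t).
    w(n) is held fixed inside the integrals, as in the equation.
    Parameter defaults are hypothetical placeholders.
    """
    ltp_integral = 0.0
    ltd_integral = 0.0
    for s, d in zip(camkii, da_pp1):
        gate = pos(s - camkii_base)                 # CaMKII activity above baseline
        ltp_integral += gate * pos(d - d_base) * dt  # DA above baseline
        ltd_integral += gate * pos(d_base - d) * dt  # DA below baseline
    return w + alpha_w * ltp_integral * (w_max - w) - beta_w * ltd_integral * w


if __name__ == "__main__":
    camkii = [0.0, 1.0, 1.0, 0.0]
    print(update_weight_within_trial(0.5, camkii, [0.2, 1.0, 1.0, 0.2], dt=0.5))  # DA burst
    print(update_weight_within_trial(0.5, camkii, [0.2, 0.0, 0.0, 0.2], dt=0.5))  # DA dip
```

Because both terms share the CaMKII gate, a DA burst or dip changes the weight only during the time steps when CaMKII activity is above baseline, which is what makes timing within the trial matter.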

Acquisition and Extinction

[Figure panels: MSN and SNc activity (axes: trial, time step); CaMKII and PP-1 activity]

The DA model learns very quickly that reward has been taken away

Extinction under noncontingent reward delivery

[Figure panels: MSN and SNc activity (axes: trial, time step); CaMKII and PP-1 activity]

Noncontingent reward delivery keeps DA surprised
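Why noncontingent reward keeps DA surprised can be seen in a tabular TD(0) sketch (again not the spiking model): when the reward arrives at an unpredictable time step, values can only learn the average expected reward, so the TD error at each delivery stays positive; when it always arrives at the same step, the error at delivery shrinks to about zero. The trial length, learning rate, seed, and the function `reward_time_td_errors` are illustrative assumptions. For brevity it starts from zero values rather than from a trained acquisition state; after a few hundred trials the starting values have little influence.

```python
import random
from statistics import mean


def reward_time_td_errors(n_trials, T=10, alpha=0.2, gamma=1.0, seed=0,
                          fixed_step=None):
    """Tabular TD(0) with one reward per trial.

    If fixed_step is None the reward arrives at a random time step
    (noncontingent, unpredictable delivery); otherwise it always arrives at
    fixed_step. Returns the TD error at every reward delivery.
    """
    rng = random.Random(seed)
    V = [0.0] * T
    errors_at_reward = []
    for _ in range(n_trials):
        reward_step = rng.randrange(T) if fixed_step is None else fixed_step
        for t in range(T):
            r = 1.0 if t == reward_step else 0.0
            v_next = V[t + 1] if t + 1 < T else 0.0
            delta = r + gamma * v_next - V[t]   # delta_t = r_t + gamma V(t+1) - V(t)
            V[t] += alpha * delta
            if t == reward_step:
                errors_at_reward.append(delta)
    return errors_at_reward


random_times = reward_time_td_errors(500)
fixed_time = reward_time_td_errors(500, fixed_step=6)
print(f"mean TD error at reward, random times: {mean(random_times[-200:]):.2f}")
print(f"mean TD error at reward, fixed time:   {mean(fixed_time[-200:]):.2f}")
```

With random delivery times, the error at reward settles near 1 - 1/T on average, so DA keeps bursting at each delivery instead of falling back to baseline as it does with predictable reward.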

Summary and Future Directions

• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward

• The TD model may need to be modified once the TANs are included and post-extinction effects are examined

Acknowledgments

Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam

Funding:

NIMH Grant MH3760-2, Todd Wilkinson
