Instrumental Conditioning: Motivational Mechanisms
Contingency-Shaped Behaviour
• Uses three-term contingency
• Reinforcement schedule (e.g., FR10) imposes contingency
• Seen in non-humans and humans
Rule Governed Behaviour
• Particularly in humans
• Behaviour can be varied and unpredictable
• Invent rules or use (in)appropriate rules across conditions (e.g., language)
• Age-dependent; primary vs. secondary reinforcers; experience
Role of Response in Operant Conditioning
• Thorndike
  – Performance of the response is necessary
• Tolman
  – Formation of an expectation
• McNamara, Long & Wike (1956)
  – Maze
  – Running rats or riding rats (in a cart)
  – Association is what is needed
Role of the Reinforcer
• Is reinforcement necessary for operant conditioning?
• Tolman & Honzik (1930): latent learning
  – Not necessary for learning
  – Necessary for performance
Results
[Figure: average errors per day for three groups (food, no food, no food until day 11), with day 11 marked]
Associative Structure in Instrumental Conditioning
• Basic forms of association
  – S = stimulus, R = response, O = outcome
• S-R
  – Thorndike, Law of Effect
  – Role of reinforcer: stamps in the S-R association
  – No R-O association acquired
Hull and Spence
• Law of Effect, plus a classical conditioning process
• Stimulus evokes response via Thorndike’s S-R association
• Also, an S-O association creates an expectancy of reward
• Two-process approach
  – Classical and instrumental are different
One-Process or Two-Processes?
• Are instrumental and classical the same (one process) or different (two processes)?
• Omission control procedure
  – US presentation depends on non-occurrence of the CR
  – No CR: CS ---> US
  – CR: CS ---> no US
Omission Control
[Figure: CS/US/CR timelines for a trial with a CR (the CR cancels the US) and a trial without a CR (the CS is followed by the US)]
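The omission contingency is simple enough to state as code. A minimal sketch (the function name and boolean CR measure are illustrative, not from any standard package):

```python
def omission_trial(cr_occurred: bool) -> bool:
    """Omission control: the US is delivered only if no CR occurred.

    Returns True when the US is presented on this trial.
    """
    return not cr_occurred

# No CR: CS ---> US
assert omission_trial(cr_occurred=False) is True
# CR: CS ---> no US
assert omission_trial(cr_occurred=True) is False
```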
Gormezano & Coleman (1973)
• Eyeblink conditioning with rabbits
• US = shock, CS = tone
• Classical group: 5 mA shock on each trial, regardless of response
• Omission group: making an eyeblink CR to the CS prevents delivery of the US
• One-process prediction:
  – CR acquisition faster and stronger for the Omission group
  – Reinforcement for the CR is shock avoidance
  – In the Classical group the CR will be present because it somehow reduces shock aversiveness
• BUT…
  – CR acquisition was slower in the Omission group
  – Resembles classical-conditioning extinction (not all CSs followed by the US)
• Supports two-process theory
Classical in Instrumental
• Classical conditioning process provides motivation
• Stimulus substitution: S acquires properties of O
  – rg = fractional anticipatory goal response
• Response leads to feedback
  – sg = sensory feedback
• rg-sg constitutes the expectancy of reward
Timecourse
[Figure: timeline S ---> R ---> O, with rg-sg occurring between S and O]
Through stimulus substitution, S elicits rg-sg, giving a motivational expectation of reward.
Prediction
• According to rg-sg theory, the CR should occur before the operant response, but it doesn’t always
• Dog lever pressing on FR33 ---> PRP (post-reinforcement pause)
• Lever pressing is low early in the ratio, then higher; but salivation appears only later
[Figure: magnitude of lever pressing and of salivation as a function of time from the start of the trial]
Modern Two-Process Theory
• Classical conditioning within instrumental conditioning
• Neutral stimulus ---> elicits motivation
• Central Emotional State (CES)
• CES is a characteristic of the nervous system (“mood”)
• CES won’t produce only one response
  – Somewhat inconvenient for predicting a specific effect
Prediction
• Rate of operant responding is modified by presentation of a CS
• A CES develops to motivate the operant response
• A CS from classical conditioning also elicits a CES
• Therefore, presenting a CS during instrumental conditioning should alter the CES that motivates the instrumental response
“Explicit” Predictions
• Emotional states

  CS    Appetitive US (e.g., food)   Aversive US (e.g., shock)
  CS+   Hope                         Fear
  CS-   Disappointment               Relief

• Behavioural predictions (aversive US)

  Instrumental schedule    CS+ (fear)   CS- (relief)
  Positive reinforcement   decrease     increase
  Negative reinforcement   increase     decrease
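The behavioural predictions form a 2x2 lookup that can be encoded directly. A sketch for the aversive-US case (key and label names are mine; the cell values are the slide’s):

```python
# Predicted change in operant response rate when a Pavlovian CS
# is presented during instrumental responding (aversive US).
predictions = {
    ("positive reinforcement", "CS+"): "decrease",  # fear suppresses appetitive responding
    ("positive reinforcement", "CS-"): "increase",  # relief releases it
    ("negative reinforcement", "CS+"): "increase",  # fear boosts avoidance responding
    ("negative reinforcement", "CS-"): "decrease",  # relief reduces it
}

print(predictions[("positive reinforcement", "CS+")])  # decrease
```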
R-O and S(R-O)
• Earlier interpretations had no response-reinforcer association
• Intuitively, though, that is the natural explanation: perform the response to get the reinforcer
Colwill & Rescorla (1986)
• R-O association
• Devalue the reinforcer post-conditioning
• Does the operant response decrease?
• Bar pushed right or left for different reinforcers
  – Food or sucrose
[Figure (Testing of Reinforcers): mean responses per minute across blocks of extinction trials; responding declines for the response whose reinforcer was devalued but is maintained for the normal reinforcer]
Interpretation
• Can’t be S-R
  – No reinforcer in this model
• Can’t be S-O
  – Two responses, same stimuli (the bar), but only one response affected
• Conclusion
  – Each response associated with its own reinforcer
  – R-O association
Hierarchical S-(R-O)
• The R-O model lacks a stimulus component
• A stimulus is required to activate the association
• Really, Skinner’s (1938) three-term contingency
• An old idea; recent empirical testing
Colwill & Delamater (1995)
• Rats trained on pairs of S+
• Biconditional discrimination problem
  – Two stimuli
  – Two responses
  – One reinforcer
• Match the correct response to the stimulus to be reinforced
• Training, reinforcer devaluation, testing
• Training
  – Tone: lever --> food; chain --> nothing
  – Noise: chain --> food; lever --> nothing
  – Light: poke --> sucrose; handle --> nothing
  – Flash: handle --> sucrose; poke --> nothing
• Aversion conditioning
• Testing: marked reduction in the previously reinforced response
  – Tone: lever press vs. chain
  – Noise: chain vs. lever
  – Light: poke vs. handle
  – Flash: handle vs. poke
Analysis
• Can’t be S-O
  – Each stimulus associated with the same reinforcer
• Can’t be R-O
  – Each response reinforced with the same outcome
• Can’t be S-R
  – Due to devaluation of the outcome
• Each S activates a corresponding R-O association
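The hierarchical S-(R-O) account can be sketched as a nested mapping in which each stimulus gates its own response-outcome relation. A simplified sketch (only the reinforced contingencies from the training table are shown; names and the helper function are illustrative):

```python
# S-(R-O): each stimulus activates one response-outcome association.
s_ro = {
    "tone":  {"lever": "food"},
    "noise": {"chain": "food"},
    "light": {"poke": "sucrose"},
    "flash": {"handle": "sucrose"},
}

def predicted_response(stimulus, devalued_outcome):
    """Emit the response gated by this stimulus, unless its outcome
    has been devalued (aversion-conditioned)."""
    (response, outcome), = s_ro[stimulus].items()
    return None if outcome == devalued_outcome else response

# After devaluing sucrose: the tone still evokes lever pressing...
assert predicted_response("tone", "sucrose") == "lever"
# ...but the light no longer evokes poking.
assert predicted_response("light", "sucrose") is None
```

Note that S-O alone cannot produce this pattern (tone and noise signal the same food outcome, yet gate different responses), and R-O alone cannot either (the same response maps to the same outcome under different stimuli), which is the point of the biconditional design.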
Reinforcer Prediction, A Priori
• Simple definition
  – A stimulus that increases the future probability of a behaviour
  – A circular explanation
• It would be nice if we could predict beforehand
Need Reduction Approach
• Primary reinforcers reduce biological needs
• Biological needs: e.g., food, water
• Not biological needs: e.g., sex, saccharin
• Undetectable biological needs: e.g., trace elements, vitamins
Drive Reduction
• Clark Hull
• Homeostasis
  – Drive systems
• Strong stimuli are aversive
• A reduction in stimulation is the reinforcer
  – Drive is reduced
• Problems
  – Objective measurement of stimulus intensity
  – Cases where stimulation doesn’t change, or even increases!
Trans-situationality
• A stimulus that is a reinforcer in one situation will be a reinforcer in others
• Subsets of behaviour
  – Reinforcing behaviours
  – Reinforceable behaviours
• Often works with primary reinforcers
• Problems with other stimuli
Primary and Incentive Motivation
• Where does the motivation to respond come from?
• Primary: biological drive state
• Incentive: from the reinforcer itself
But… Consider:
• What if we treat a reinforcer not as a stimulus or an event, but as a behaviour in and of itself?
• Fred Sheffield (1950s)
• Consummatory-response theory
  – E.g., not the food, but the eating of the food, is the reinforcer
  – E.g., saccharin has no nutritional value and can’t reduce drive, but it is reinforcing because it can be consumed
Premack’s Principle
• Reinforcing responses occur more often than the responses they reinforce
• H = high-probability behaviour
• L = low-probability behaviour
• If L ---> H, then H reinforces L
• But if H ---> L, then L does not reinforce H
• “Differential probability principle”
• No fundamental distinction between reinforcers and operant responses
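The differential probability principle reduces to a one-line comparison. A sketch, assuming baseline probabilities have already been measured in an unconstrained phase:

```python
def reinforces(p_contingent: float, p_operant: float) -> bool:
    """Premack's differential probability principle: access to a
    contingent behaviour reinforces the operant response only if the
    contingent behaviour has the higher baseline probability."""
    return p_contingent > p_operant

# Eating (baseline p = 0.6) can reinforce pinball playing (p = 0.2)...
assert reinforces(p_contingent=0.6, p_operant=0.2) is True
# ...but pinball playing cannot reinforce eating for the same child.
assert reinforces(p_contingent=0.2, p_operant=0.6) is False
```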
Premack (1965)
• Two alternatives
  – Eat candy, play pinball
• Phase I: determine individual behaviour probabilities (baseline)
  – Gr1: pinball (operant) to eat (reinforcer)
  – Gr2: eating candy (operant) to play pinball (reinforcer)
• Phase II (testing)
  – T1: play pinball (operant) to eat (reinforcer); only Gr1 kids increased the operant
  – T2: eat (operant) to play pinball (reinforcer); only Gr2 kids increased the operant
Premack in Brief
Any activity could be a reinforcer if it is more probable (“preferred”) than the operant response.
Response Deprivation Hypothesis
• Restrict access to the reinforcer response
• Theory:
  – Impose response deprivation
  – Now even low-probability responses can reinforce high-probability responses
• Instrumental procedures withhold the reinforcer until the response is made; in essence, the organism is deprived of access to the reinforcer
• The reinforcing effect is produced by the operant contingency itself
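One common formalization of this idea (the response-deprivation condition, after Timberlake & Allison, 1974) compares the schedule’s required ratio with the baseline ratio. A sketch; variable names are mine:

```python
def deprives(i_required: float, c_allowed: float,
             o_instrumental: float, o_contingent: float) -> bool:
    """Response-deprivation condition: a schedule demanding i_required
    units of the operant per c_allowed units of the contingent behaviour
    deprives the organism of the contingent behaviour when the schedule
    ratio exceeds the baseline ratio o_instrumental / o_contingent."""
    return (i_required / c_allowed) > (o_instrumental / o_contingent)

# Baseline: 10 min running, 30 min drinking. A schedule demanding
# 10 min of running for only 5 min of drinking restricts drinking:
assert deprives(10, 5, 10, 30) is True
# A generous schedule (10 min running buys 60 min drinking) does not:
assert deprives(10, 60, 10, 30) is False
```

On this view, whether a behaviour reinforces depends on the schedule, not on any fixed preference ordering, which is how a low-probability response can come to reinforce a high-probability one.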
Behavioural Regulation
• Physiological homeostasis
• An analogous process operates in behavioural regulation
• Preferred/optimal distribution of activities
• Stressors move the organism away from its optimal behavioural state
• The organism responds in ways that return it to the ideal state
Behavioural Bliss Point
• Unconstrained condition: activities are distributed in the preferred way
• Behavioural bliss point (BBP)
  – The relative frequency of all behaviours in the unconstrained condition
• Across conditions
  – The BBP shifts
• Within a condition
  – The BBP is stable across time
Imposing a Contingency
• Puts pressure on the BBP
• The organism acts to defend the BBP against challenges
• But the requirements of the contingency may make achieving the BBP impossible
• Compromise required
• Redistribute responses so as to get as close to the BBP as possible
Minimum Deviation Model
• Behavioural regulation
• Due to the imposed contingency:
  – Redistribute behaviour
  – Minimize the deviation of responses from the BBP
  – Get as close as you can
[Figure: time spent drinking vs. time spent running (0-40 min on each axis), with schedule lines for restricted running and restricted drinking]
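Minimum deviation can be computed directly for the two-behaviour case. A sketch, assuming Euclidean deviation and a simple ratio schedule (the contingency forces reinforcer time to be operant time divided by the ratio); the function and its parameters are illustrative:

```python
def minimum_deviation(bliss: tuple, ratio: float) -> tuple:
    """Project the behavioural bliss point onto the schedule line.

    bliss: (operant_time, reinforcer_time) in the unconstrained baseline.
    ratio: required units of operant time per unit of reinforcer time,
           i.e. the schedule forces reinforcer_time = operant_time / ratio.
    Returns the feasible allocation closest to the bliss point.
    """
    bx, by = bliss
    m = 1.0 / ratio                      # slope of the constraint line y = x / ratio
    t = (bx + m * by) / (1.0 + m * m)    # orthogonal projection onto y = m * x
    return (t, m * t)

# Bliss point: 10 min running, 30 min drinking. A schedule requiring
# equal running and drinking time (ratio = 1) forces a compromise
# at (20, 20): more running and less drinking than preferred.
x, y = minimum_deviation((10.0, 30.0), ratio=1.0)
assert abs(x - 20.0) < 1e-9 and abs(y - 20.0) < 1e-9
```

The compromise point increases the operant above baseline and lowers the reinforcer activity below baseline, which is exactly the redistribution the model predicts under an imposed contingency.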
Strengths of BBP Theory
• Reinforcers are not special stimuli or responses
• No difference between operant and reinforcer
• Explains the new allocation of behaviour
• Fits with findings on cognition concerning cost:benefit optimization