Instrumental Conditioning: Motivational Mechanisms
Contingency-Shaped Behaviour
• Uses three-term contingency
• Reinforcement schedule (e.g., FR10) imposes contingency
• Seen in non-humans and humans
Rule Governed Behaviour
• Particularly in humans
• Behaviour can be varied and unpredictable
• Invent rules or use (in)appropriate rules across conditions (e.g., language)
• Age-dependent; primary vs. secondary reinforcers; experience
Role of Response in Operant Conditioning
• Thorndike
  – Performance of the response is necessary
• Tolman
  – Formation of an expectation
• McNamara, Long & Wike (1956)
  – Maze
  – Running rats or riding rats (in a cart)
  – Association is what is needed
Role of the Reinforcer
• Is reinforcement necessary for operant conditioning?
• Tolman & Honzik (1930): latent learning
  – Not necessary for learning
  – Necessary for performance
Results
[Figure: average errors per day for three groups (food, no food, no food until day 11), with day 11 marked]
Associative Structure in Instrumental Conditioning
• Basic forms of association
  – S = stimulus, R = response, O = outcome
• S-R
  – Thorndike, Law of Effect
  – Role of reinforcer: stamps in the S-R association
  – No R-O association acquired
Hull and Spence
• Law of Effect, plus a classical conditioning process
• Stimulus evokes response via Thorndike’s S-R association
• Also, an S-O association creates an expectancy of reward
• Two-process approach
  – Classical and instrumental are different
One-Process or Two-Processes?
• Are instrumental and classical the same (one process) or different (two processes)?
• Omission control procedure
  – US presentation depends on non-occurrence of the CR
  – No CR: CS ---> US
  – CR: CS ---> no US
Omission Control
[Figure: CS/US/CR timelines for a trial with a CR (the CR cancels the US) and a trial without a CR (the CS is followed by the US)]
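The omission contingency is simple enough to state as code. A minimal sketch (the function name and boolean CR measure are illustrative, not from any standard package):

```python
def omission_trial(cr_occurred: bool) -> bool:
    """Omission control: the US is delivered only if no CR occurred.

    Returns True when the US is presented on this trial.
    """
    return not cr_occurred

# No CR: CS ---> US
assert omission_trial(cr_occurred=False) is True
# CR: CS ---> no US
assert omission_trial(cr_occurred=True) is False
```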
Gormezano & Coleman (1973)
• Eyeblink conditioning with rabbits
• US = shock, CS = tone
• Classical group: 5 mA shock on each trial, regardless of response
• Omission group: making an eyeblink CR to the CS prevents delivery of the US
• One-process prediction:
  – CR acquisition faster and stronger for the Omission group
  – Reinforcement for the CR is shock avoidance
  – In the Classical group the CR will be present because it somehow reduces shock aversiveness
• BUT…
  – CR acquisition was slower in the Omission group
  – Resembles classical-conditioning extinction (not all CSs followed by the US)
• Supports two-process theory
Classical in Instrumental
• Classical conditioning process provides motivation
• Stimulus substitution: S acquires properties of O
  – rg = fractional anticipatory goal response
• Response leads to feedback
  – sg = sensory feedback
• rg-sg constitutes the expectancy of reward
Timecourse
[Figure: timeline S ---> R ---> O, with rg-sg occurring between S and O]
Through stimulus substitution, S elicits rg-sg, giving a motivational expectation of reward.
Prediction
• According to rg-sg theory, the CR should occur before the operant response, but it doesn’t always
• Dog lever pressing on FR33 ---> PRP (post-reinforcement pause)
• Lever pressing is low early in the ratio, then higher; but salivation appears only later
[Figure: magnitude of lever pressing and of salivation as a function of time from the start of the trial]
Modern Two-Process Theory
• Classical conditioning within instrumental conditioning
• Neutral stimulus ---> elicits motivation
• Central Emotional State (CES)
• CES is a characteristic of the nervous system (“mood”)
• CES won’t produce only one response
  – Somewhat inconvenient for predicting a specific effect
Prediction
• Rate of operant responding is modified by presentation of a CS
• A CES develops to motivate the operant response
• A CS from classical conditioning also elicits a CES
• Therefore, presenting a CS during instrumental conditioning should alter the CES that motivates the instrumental response
“Explicit” Predictions
• Emotional states

  CS    Appetitive US (e.g., food)   Aversive US (e.g., shock)
  CS+   Hope                         Fear
  CS-   Disappointment               Relief

• Behavioural predictions (aversive US)

  Instrumental schedule    CS+ (fear)   CS- (relief)
  Positive reinforcement   decrease     increase
  Negative reinforcement   increase     decrease
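The behavioural predictions form a 2x2 lookup that can be encoded directly. A sketch for the aversive-US case (key and label names are mine; the cell values are the slide’s):

```python
# Predicted change in operant response rate when a Pavlovian CS
# is presented during instrumental responding (aversive US).
predictions = {
    ("positive reinforcement", "CS+"): "decrease",  # fear suppresses appetitive responding
    ("positive reinforcement", "CS-"): "increase",  # relief releases it
    ("negative reinforcement", "CS+"): "increase",  # fear boosts avoidance responding
    ("negative reinforcement", "CS-"): "decrease",  # relief reduces it
}

print(predictions[("positive reinforcement", "CS+")])  # decrease
```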
R-O and S(R-O)
• Earlier interpretations had no response-reinforcer association
• Intuitively, though, that is the natural explanation: perform the response to get the reinforcer
Colwill & Rescorla (1986)
• R-O association
• Devalue the reinforcer post-conditioning
• Does the operant response decrease?
• Bar pushed right or left for different reinforcers
  – Food or sucrose
[Figure (Testing of Reinforcers): mean responses per minute across blocks of extinction trials; responding declines for the response whose reinforcer was devalued but is maintained for the normal reinforcer]
Interpretation
• Can’t be S-R
  – No reinforcer in this model
• Can’t be S-O
  – Two responses, same stimuli (the bar), but only one response affected
• Conclusion
  – Each response associated with its own reinforcer
  – R-O association
Hierarchical S-(R-O)
• The R-O model lacks a stimulus component
• A stimulus is required to activate the association
• Really, Skinner’s (1938) three-term contingency
• An old idea; recent empirical testing
Colwill & Delamater (1995)
• Rats trained on pairs of S+
• Biconditional discrimination problem
  – Two stimuli
  – Two responses
  – One reinforcer
• Match the correct response to the stimulus to be reinforced
• Training, reinforcer devaluation, testing
• Training
  – Tone: lever --> food; chain --> nothing
  – Noise: chain --> food; lever --> nothing
  – Light: poke --> sucrose; handle --> nothing
  – Flash: handle --> sucrose; poke --> nothing
• Aversion conditioning
• Testing: marked reduction in the previously reinforced response
  – Tone: lever press vs. chain
  – Noise: chain vs. lever
  – Light: poke vs. handle
  – Flash: handle vs. poke
Analysis
• Can’t be S-O
  – Each stimulus associated with the same reinforcer
• Can’t be R-O
  – Each response reinforced with the same outcome
• Can’t be S-R
  – Due to devaluation of the outcome
• Each S activates a corresponding R-O association
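The hierarchical S-(R-O) account can be sketched as a nested mapping in which each stimulus gates its own response-outcome relation. A simplified sketch (only the reinforced contingencies from the training table are shown; names and the helper function are illustrative):

```python
# S-(R-O): each stimulus activates one response-outcome association.
s_ro = {
    "tone":  {"lever": "food"},
    "noise": {"chain": "food"},
    "light": {"poke": "sucrose"},
    "flash": {"handle": "sucrose"},
}

def predicted_response(stimulus, devalued_outcome):
    """Emit the response gated by this stimulus, unless its outcome
    has been devalued (aversion-conditioned)."""
    (response, outcome), = s_ro[stimulus].items()
    return None if outcome == devalued_outcome else response

# After devaluing sucrose: the tone still evokes lever pressing...
assert predicted_response("tone", "sucrose") == "lever"
# ...but the light no longer evokes poking.
assert predicted_response("light", "sucrose") is None
```

Note that S-O alone cannot produce this pattern (tone and noise signal the same food outcome, yet gate different responses), and R-O alone cannot either (the same response maps to the same outcome under different stimuli), which is the point of the biconditional design.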
Reinforcer Prediction, A Priori
• Simple definition
  – A stimulus that increases the future probability of a behaviour
  – A circular explanation
• It would be nice if we could predict beforehand
Need Reduction Approach
• Primary reinforcers reduce biological needs
• Biological needs: e.g., food, water
• Not biological needs: e.g., sex, saccharin
• Undetectable biological needs: e.g., trace elements, vitamins
Drive Reduction
• Clark Hull
• Homeostasis
  – Drive systems
• Strong stimuli are aversive
• A reduction in stimulation is the reinforcer
  – Drive is reduced
• Problems
  – Objective measurement of stimulus intensity
  – Cases where stimulation doesn’t change, or even increases!
Trans-situationality
• A stimulus that is a reinforcer in one situation will be a reinforcer in others
• Subsets of behaviour
  – Reinforcing behaviours
  – Reinforceable behaviours
• Often works with primary reinforcers
• Problems with other stimuli
Primary and Incentive Motivation
• Where does the motivation to respond come from?
• Primary: biological drive state
• Incentive: from the reinforcer itself
But… Consider:
• What if we treat a reinforcer not as a stimulus or an event, but as a behaviour in and of itself?
• Fred Sheffield (1950s)
• Consummatory-response theory
  – E.g., not the food, but the eating of the food, is the reinforcer
  – E.g., saccharin has no nutritional value and can’t reduce drive, but it is reinforcing because it can be consumed
Premack’s Principle
• Reinforcing responses occur more often than the responses they reinforce
• H = high-probability behaviour
• L = low-probability behaviour
• If L ---> H, then H reinforces L
• But if H ---> L, then L does not reinforce H
• “Differential probability principle”
• No fundamental distinction between reinforcers and operant responses
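The differential probability principle reduces to a one-line comparison. A sketch, assuming baseline probabilities have already been measured in an unconstrained phase:

```python
def reinforces(p_contingent: float, p_operant: float) -> bool:
    """Premack's differential probability principle: access to a
    contingent behaviour reinforces the operant response only if the
    contingent behaviour has the higher baseline probability."""
    return p_contingent > p_operant

# Eating (baseline p = 0.6) can reinforce pinball playing (p = 0.2)...
assert reinforces(p_contingent=0.6, p_operant=0.2) is True
# ...but pinball playing cannot reinforce eating for the same child.
assert reinforces(p_contingent=0.2, p_operant=0.6) is False
```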
Premack (1965)
• Two alternatives
  – Eat candy, play pinball
• Phase I: determine individual behaviour probabilities (baseline)
  – Gr1: pinball (operant) to eat (reinforcer)
  – Gr2: eating candy (operant) to play pinball (reinforcer)
• Phase II (testing)
  – T1: play pinball (operant) to eat (reinforcer); only Gr1 kids increased the operant
  – T2: eat (operant) to play pinball (reinforcer); only Gr2 kids increased the operant
Premack in Brief
Any activity could be a reinforcer if it is more probable (“preferred”) than the operant response.
Response Deprivation Hypothesis
• Restrict access to the reinforcer response
• Theory:
  – Impose response deprivation
  – Now even low-probability responses can reinforce high-probability responses
• Instrumental procedures withhold the reinforcer until the response is made; in essence, the organism is deprived of access to the reinforcer
• The reinforcing effect is produced by the operant contingency itself
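One common formalization of this idea (the response-deprivation condition, after Timberlake & Allison, 1974) compares the schedule’s required ratio with the baseline ratio. A sketch; variable names are mine:

```python
def deprives(i_required: float, c_allowed: float,
             o_instrumental: float, o_contingent: float) -> bool:
    """Response-deprivation condition: a schedule demanding i_required
    units of the operant per c_allowed units of the contingent behaviour
    deprives the organism of the contingent behaviour when the schedule
    ratio exceeds the baseline ratio o_instrumental / o_contingent."""
    return (i_required / c_allowed) > (o_instrumental / o_contingent)

# Baseline: 10 min running, 30 min drinking. A schedule demanding
# 10 min of running for only 5 min of drinking restricts drinking:
assert deprives(10, 5, 10, 30) is True
# A generous schedule (10 min running buys 60 min drinking) does not:
assert deprives(10, 60, 10, 30) is False
```

On this view, whether a behaviour reinforces depends on the schedule, not on any fixed preference ordering, which is how a low-probability response can come to reinforce a high-probability one.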
Behavioural Regulation
• Physiological homeostasis
• An analogous process operates in behavioural regulation
• Preferred/optimal distribution of activities
• Stressors move the organism away from its optimal behavioural state
• The organism responds in ways that return it to the ideal state
Behavioural Bliss Point
• Unconstrained condition: activities are distributed in the preferred way
• Behavioural bliss point (BBP)
  – The relative frequency of all behaviours in the unconstrained condition
• Across conditions
  – The BBP shifts
• Within a condition
  – The BBP is stable across time
Imposing a Contingency
• Puts pressure on the BBP
• The organism acts to defend the BBP against challenges
• But the requirements of the contingency may make achieving the BBP impossible
• Compromise required
• Redistribute responses so as to get as close to the BBP as possible
Minimum Deviation Model
• Behavioural regulation
• Due to the imposed contingency:
  – Redistribute behaviour
  – Minimize the deviation of responses from the BBP
  – Get as close as you can
[Figure: time spent drinking vs. time spent running (0-40 min on each axis), with schedule lines for restricted running and restricted drinking]
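Minimum deviation can be computed directly for the two-behaviour case. A sketch, assuming Euclidean deviation and a simple ratio schedule (the contingency forces reinforcer time to be operant time divided by the ratio); the function and its parameters are illustrative:

```python
def minimum_deviation(bliss: tuple, ratio: float) -> tuple:
    """Project the behavioural bliss point onto the schedule line.

    bliss: (operant_time, reinforcer_time) in the unconstrained baseline.
    ratio: required units of operant time per unit of reinforcer time,
           i.e. the schedule forces reinforcer_time = operant_time / ratio.
    Returns the feasible allocation closest to the bliss point.
    """
    bx, by = bliss
    m = 1.0 / ratio                      # slope of the constraint line y = x / ratio
    t = (bx + m * by) / (1.0 + m * m)    # orthogonal projection onto y = m * x
    return (t, m * t)

# Bliss point: 10 min running, 30 min drinking. A schedule requiring
# equal running and drinking time (ratio = 1) forces a compromise
# at (20, 20): more running and less drinking than preferred.
x, y = minimum_deviation((10.0, 30.0), ratio=1.0)
assert abs(x - 20.0) < 1e-9 and abs(y - 20.0) < 1e-9
```

The compromise point increases the operant above baseline and lowers the reinforcer activity below baseline, which is exactly the redistribution the model predicts under an imposed contingency.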
Strengths of BBP Theory
• Reinforcers are not special stimuli or responses
• No difference between operant and reinforcer
• Explains the new allocation of behaviour
• Fits with findings on cognition concerning cost:benefit optimization