
Chapter 5 – Instrumental Conditioning: Foundations

• Outline 1
– Comparison of Classical and Instrumental Conditioning
– Early Investigations of Instrumental Conditioning
• Thorndike
– Chicks and mazes
– Cats and puzzle box
– Modern approaches to the study of instrumental conditioning
• Discrete-trial procedures
• Free-operant procedures
– Magazine training and shaping
– Response rate as a measure of operant behavior
– Instrumental Conditioning Procedures
• Positive Reinforcement
• Negative Reinforcement (Escape or Avoidance)
• Positive Punishment (Punishment)
• Negative Punishment (Omission Training/DRO)

• Comparison of Classical and Instrumental Conditioning
– Classical = S-S relationship
• Light-shock – elicits fear
• Tone-food – elicits salivation
• In Classical Conditioning there is no response requirement
– Instrumental = R-S relationship
• We will refer to it as R-O
• Behavior (Response) is instrumental in producing the outcome
» Press lever – get food
» Pull lever – get money

• In Instrumental Conditioning a particular response is required

• Keep in mind: Classical and Instrumental conditioning are approaches to understanding learning
– Not completely different kinds of learning
– Many learning situations could be described by either approach
• Child touches hot stove
– CS, US, UR, CR?
– Pavlovian = fear of stove
– Instrumental = less likely to approach
• Conditioned Taste Aversion
– S-S = Taste – LiCl
– R-O = Drink liquid – get sick (punished)

• Early Investigations of Instrumental Conditioning.

• Edward Lee Thorndike (American)
– Same time as Pavlov
• Late 1800s; early 1900s
– Interested in animal intelligence
• Late 19th century: many people believed that animals reasoned like people do (Romanes)
– Stories of amazing abilities of animals
• Biased reporting?
– Report interesting behavior
– Ignore stupid behavior

• Thorndike (1898): “Dogs get lost hundreds of times and no one ever notices it or sends an account of it to a scientific magazine, but let one find his way home from Brooklyn to Yonkers and the fact immediately becomes a circulating anecdote. Thousands of cats on thousands of occasions sit helplessly yowling, and no one takes thought of it or writes to his friend, the professor; but let one cat claw at the knob of a door supposedly as a signal to be let out, and straightway this cat becomes representative of the cat-mind in all the books…In short, the anecdotes give really the…supernormal psychology of animals.”

• Thorndike attempted to understand normal or ordinary animal intelligence
– Chicks in a maze
– Cats in a box

• Puzzle box video

• Thorndike tested many animals – chicks, cats, dogs, fish, and monkeys
– Little evidence for reasoning
• Instead, learning seemed to result from trial and accidental success

• Modern approaches to the study of instrumental conditioning
– Discrete-trial procedures
• Like Thorndike’s work
• Simpler mazes, though
– Figure 5.3
» Straight alleyway (running speed)
» T-maze (errors)
– Later
» Radial arm maze
» Morris water maze
– Note
• Each run is separated by an intertrial interval
• Just like in Pavlovian conditioning

8-arm Maze

Morris Water Maze

• Free-operant procedures
– There are no “trials”
– The animal is allowed to behave freely
• Skinner box
– An automated method for gathering data from animals

• Skinner used these boxes to study operant conditioning
– Operant
• Any response that “operated” on the environment
– Defined in terms of the effect it has on the environment
• Pressing a lever for food
– Doesn’t matter how the rat does it
• Right paw, left paw, tail
• As long as it actuates the switch
– Similar to opening a door
• Doesn’t matter which hand you use
• Or foot (carrying groceries)
• Just as long as the result is achieved

• Magazine training and shaping
– First have to train the animals about the availability of food
– Training by successive approximations (shaping)
• Shaping a rat

• Response rate as a measure of operant behavior
– In a free-operant situation you do not have measures such as percent correct or errors
– Skinner used response rate as a primary measure
– We will see later that various schedules of reinforcement affect response rate in various ways

• Instrumental Conditioning Procedures
• First let’s get some terminology down
– Positive
• Behave = stimulus applied
– Negative
• Behave = stimulus removed
– Reinforcement
• Behavior increases
– Punishment
• Behavior decreases
• 2 × 2 table

                 Positive                  Negative
Reinforcement    Positive Reinforcement    Negative Reinforcement
Punishment       Positive Punishment       Negative Punishment
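The 2 × 2 terminology can be sketched as a small lookup. This is an illustrative sketch; the function and argument names are our own, not from the book:

```python
# Classify an instrumental conditioning procedure from two facts:
# whether the behavior produces or removes a stimulus, and whether
# the behavior becomes more or less frequent as a result.

def classify_procedure(stimulus_applied: bool, behavior_increases: bool) -> str:
    sign = "Positive" if stimulus_applied else "Negative"
    effect = "Reinforcement" if behavior_increases else "Punishment"
    return f"{sign} {effect}"

# Lever press produces food, pressing goes up:
print(classify_procedure(True, True))    # Positive Reinforcement
# Lever press turns off shock, pressing goes up:
print(classify_procedure(False, True))   # Negative Reinforcement
# Head banging loses attention, banging goes down:
print(classify_procedure(False, False))  # Negative Punishment
```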

• Application
– Box 5.2 in book
• Mentally disabled woman
• Head-banging behavior
• Possibly to get attention – a reward
– Change the contingencies
• Ignore head banging
• Social rewards when not head banging
– Procedure?
• Negative Punishment
– Differential Reinforcement of Other Behavior (DRO)

ABAB design

• Your book uses somewhat different terminology
– Positive Reinforcement
• Same
– Escape or Avoidance
• Negative Reinforcement
– Punishment
• Positive Punishment
– Omission Training/DRO
• Negative Punishment
• We will use my terminology


• Outline 2
– Fundamental Elements of Instrumental Conditioning
• The Response
– Behavioral variability vs. stereotypy
– Relevance or belongingness
– Behavior systems and constraints on instrumental conditioning
• The Reinforcer
– Quantity and quality of the reinforcer
– Shifts in reinforcer quality or quantity
• The Response-Outcome Relation
– Temporal relation
– Contingency
– Skinner’s superstition experiment
– The triadic design
– Learned helplessness hypothesis

• Behavioral variability vs. stereotypy
– Thorndike emphasized the fact that reinforcement increases the likelihood of the behavior being repeated in the future
• Uniformity – stereotypy
• This is often true
– You can increase response variability, however, by requiring it
• Page & Neuringer, 1985
• Two keys (50 trials per session)
– Novel group
» Peck 8 times
» Do not repeat a pattern that was used in the last 50 trials
» LRLRLRLR
» RLRLRLRL
» LLRRLLRR
» LLLLRRRR
– Control group
» RF for 8 pecks
» Doesn’t matter how they do it
• Figure 5.8
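The Novel-group contingency can be sketched in code: reinforce an 8-peck sequence only if it did not occur in any of the last 50 trials. This is an illustrative sketch of the rule described above; the names and the rolling-window implementation are our own assumptions, not Page & Neuringer’s apparatus:

```python
# A minimal model of the variability contingency: an 8-peck left/right
# sequence earns reinforcement only if it differs from every sequence
# produced in the preceding 50 trials.
from collections import deque

def make_novelty_check(window: int = 50):
    recent = deque(maxlen=window)          # sequences from the last `window` trials
    def reinforce(sequence: str) -> bool:  # e.g. "LRLRLRLR"
        novel = sequence not in recent
        recent.append(sequence)
        return novel
    return reinforce

reinforce = make_novelty_check()
print(reinforce("LRLRLRLR"))  # True  - first occurrence, reinforced
print(reinforce("LRLRLRLR"))  # False - repeats a recent trial
print(reinforce("LLRRLLRR"))  # True  - novel pattern, reinforced
```

The Control group corresponds to dropping the novelty test and reinforcing any 8-peck sequence.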

Reward and originality (creativity)? Box 5.3

• Relevance or Belongingness
• We have discussed this in Classical Conditioning
– Bright, noisy, tasty water
– Peck differently for water and grain
• It has also been studied extensively in the instrumental literature
• Originally noted by Thorndike
– Cat in the puzzle box
– Train cat to yawn to escape
– Or scratch itself
– Did not go well

• The Brelands’ “The Misbehavior of Organisms” (1961)
• A play on Skinner’s “The Behavior of Organisms” (1938)
• Students of Skinner – trained animals to do tricks as advertising gimmicks
• Raccoon and coin(s)
– Shaped to pick up coin
– Then to place it in a bank
– Then 2 coins
• Pig and wooden coin
• Instinctive drift
– Arbitrarily established responses drift toward more innately organized behavior (instinct)
• The arbitrary operant
– Place coin in bank
• Instinctive drift
– Species-specific behaviors related to food consumption
• Wash food
• Root for food

Behavior Systems and Constraints on Instrumental conditioning

• Behavior Systems (Timberlake)
– The response that occurs in a learning episode is related to the particular behavioral system that is active at the time
• If you are a hungry rat and food is the reward
– Behaviors related to foraging will increase
• If you are a male quail maintained on a light cycle that indicates mating season and access to a female is being offered
– Mating behaviors will be elicited

Behavior systems continued

• The effectiveness of any procedure for increasing an instrumental response will depend on the compatibility of that response with the behavioral system currently activated
– Rats pressing levers for food?
– Pigeons pecking keys for food?
– Very easy to train
• It even works for fish
– Easy
• Bite a stimulus associated with a rival male
• Swim through hoops for a stimulus associated with a female
– Difficult
• Bite a stimulus associated with access to a female
• Swim through hoops for access to a rival male

The Reinforcer

• How do qualities of the reinforcer affect instrumental conditioning?
– Quantity and quality of the reinforcer
• Just like in Pavlovian conditioning
– More intense US = better conditioning
• Trosclair-Lassere et al. (2008)
– Taught an autistic child to press a button for social reward
» Praise, hugs, stories
– Social reward durations
» 10 s
» 105 s
» 120 s
– Progressive ratio
• Magnitude of RF and drug abstinence
– Perhaps not surprising
• The more you pay, the better addicts do


• Shifts in Reinforcer Quality or Quantity
• How well a reinforcer works depends on what subjects are used to receiving
– Mellgren (1972)
• Straight alleyway
– Phase 1
» Half found 2 pellets (low reward)
» Half found 22 pellets (high reward)
– Phase 2
» Half from each group switched to the opposite condition
» Other half stayed the same
• 4 conditions
– H-H: high-RF control
– L-L: low-RF control
– L-H: positive contrast
– H-L: negative contrast

Results – Figure 5.10

[Note: Domjan changed it to Small (S) and Large (L) rewards in the text]

• Response-Outcome Relation
– Temporal relation?
• Contiguity
– Causal relation?
• Contingency
• The two are independent
– Contiguous doesn’t mean contingent
• You wake up – the sun rises
– Contingent is not always contiguous
• Submit tax returns
» Wait a few weeks for the money

• Effects of the temporal relationship
– Hard to get animals to respond if there is a long delay between response and reward
– Delay experiments are tough to run using a free-operant procedure
• Allow barpressing during the “delay”?
– Some barpresses will likely fall close to RF
• Enforce no barpressing after the initial barpress?
– That delays RF even further
• In the study in the book (Fig. 5.11) each bar press resulted in RF at a specified set time later
• For some animals the delay was short
– 2-4 s
• For some it was long
– 64 s
• Example (16 s delay)
– Bar press 1 at 1 s (RF1 at 17 s)
– Bar press again at 3 s (RF2 at 19 s)
• Only 14 s before RF1
– Bar press again at 12 s (RF3 at 28 s)
• Only 5 s before RF1
– There will still be some “accidental” contiguity
• The graph shows how responding is affected by the actually experienced delays
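The worked example can be sketched as a short calculation. The names are our own; the key point is that the delay the animal actually experiences is the time from a press to the next reinforcer of any origin, not to the reinforcer that press itself scheduled:

```python
# Delayed-reinforcement bookkeeping for a nominal 16 s delay:
# each bar press schedules its own RF 16 s later, but the
# experienced delay is the time to the FIRST reinforcer after the press.

NOMINAL_DELAY = 16.0

press_times = [1.0, 3.0, 12.0]                       # bar presses, in seconds
rf_times = [t + NOMINAL_DELAY for t in press_times]  # RF1=17, RF2=19, RF3=28

def experienced_delay(press: float, rf_times: list) -> float:
    """Time from a press to the first reinforcer delivered after it."""
    return min(rf - press for rf in rf_times if rf > press)

for press in press_times:
    print(press, experienced_delay(press, rf_times))
# Press at 1 s waits the full 16 s; press at 3 s meets RF1 after only
# 14 s; press at 12 s meets RF1 after only 5 s - accidental contiguity.
```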

• Why are animals so sensitive to the delay between response and outcome?
– Delay makes it difficult to determine which behavior actually caused RF
• Press lever (RF contingent, after 20 s)
– Scratch ear
– Dig in bedding on the floor
– Rear up
– Clean face
» Reward
• All of the other behaviors are more contiguous with RF than is lever pressing
– A marking procedure can help maintain responding over a long delay
• Provide a light or click after the “target” response
– Lever press – click ... 20 s ... food
• Helps animals bridge the gap

• Response-Reinforcer Contingency
– Skinner thought contiguity was more important than contingency
– Superstition experiment
– Superstition and bowling video from YouTube
• Relevant content begins at 3:12

• Reinterpretation of the Superstition Experiment
• Behavioral systems again
– Different kinds of responses occur when periodic RF is used
• Focal search
– Behaviors near the food cup as the time for RF approaches
» “I know it’s coming”
• Post-food focal search
– Again, activity near the cup
» “Did I miss any?”
• General search
– Move away from the cup
» This is probably when Skinner saw the turning and head-tossing behaviors
» “I have to wait, might as well look around”
» “I am also a bit frustrated”
– There is evidence for these patterns of responding
• Staddon and Simmelhag (1971)
– There is also evidence that slightly different patterns emerge with food vs. water RF

• Effects of the controllability of reinforcers
• Is having control a good thing?
• We briefly mentioned the Brady executive monkey study earlier
– That study implied having operant control over outcomes could be bad for the animal
– That study was confounded
• Better evidence for the effects of control over outcomes comes from the learned helplessness literature

• Learned helplessness hypothesis
– Animals learn that they have no control over the shock
– This creates an expectation
• Expect shock regardless of their behavior
• Implications
– 1. This expectation reduces motivation to make an instrumental response
– 2. Even if they do escape/avoid shock, it is harder for them to associate their behavior with the outcome
• Shock was independent of behavior in the past
• Similar to the US-preexposure effect
