the power of one reinforcer: the ... - behavior explorer€¦ · item was the dice, which measured...

16
The power of one reinforcer: The effect of a single reinforcer in the context of shaping MARY HUNTER AND JESÚS ROSALES-RUIZ UNIVERSITY OF NORTH TEXAS During shaping, if the organism is engaged in behaviors other than the current approximation, the amount of time between reinforcers increases. In these situations, the shaper may resort to what is referred to as a desperation-driven click.That is, after a period of no reinforcement, the shaper delivers one reinforcer for a nontarget approximation. Reports from professional animal trainers sug- gest that the animal may continue performing this new behavior, even if it is reinforced only once. This study attempted to model this phenomenon with college students. Results from the study demonstrated that a desperation-driven click situation can be reliably produced in a controlled setting. When partici- pants received one reinforcer for interacting with a new object following a period of no reinforcement, they interacted with the new object for a longer or equal amount of time as compared to an object that had a longer history of reinforcement. The results of this study have implications for the understanding of how reinforcement controls behavior. Key words: acquisition, reinforcement, stimulus control, extinction, humans In 1952, William Schoenfeld observed, We dont even know the effect of a single rein- forcer presentation on an individual response(Mechner, 1994, p. 55), and Mechner argued that Shoenfelds statement continued to be valid in 1994. Even today, there is still truth in this statement. While some studies have looked at the effect of a single reinforcer, questions remain regarding the effect of a sin- gle reinforcer during the acquisition, mainte- nance, and reinstatement of behavior. Several studies have examined the effect of a single reinforcer by reinforcing only one response during the entire experiment. For example, Skinner (1933) discovered that a substantial extinction curve could be pro- duced when just a single response was reinforced. A rat was habituated to the oper- ant chamber and learned to approach the magazine when it was sounded. Then, the lever was introduced. After about 20 min, the rat pressed the lever for the rst time, and this was reinforced with food. After this, the maga- zine was disconnected. During the 40 min that followed, the rat pressed the lever more than 60 times. This began with a sustained high rate of responding, which was followed by pauses and short bursts of responding. Eventually, pauses became longer until the overall rate of responding was reduced to near zero. This demonstrated that even one reinforcer could have a large effect on the occurrence of a new behavior. In another study with rats, Iversen and Mogensen (1988) investigated spatial and tem- poral patterns of responding after the delivery of a single reinforcer in a vertical holeboard apparatus. Each rat received a single 10-min session in an apparatus that contained a matrix of 45 or 54 holes on one side. After 5 min, a single food pellet was presented in one hole. Nose poking through the holes was measured both before and after the delivery of the food pellet. Iversen and Mogensen observed low levels of nose pokes before the delivery of the pellet. After the delivery of the pellet, they reported an increase in nose pokes with the majority of visits occurring in the hole where the pellet had been delivered or the surrounding holes. This increase occurred during the rst 2 min after the presentation of the pellet with most rats returning to near baseline levels for the nal 3 min. In a similar study, Mal, McCall, Newland, and Cummins (1993) used a wooden grid with Mary Hunter, Department of Behavior Analysis, Univer- sity of North Texas; Jesús Rosales-Ruiz, Department of Behavior Analysis, University of North Texas. Correspondence concerning this article should be addressed to: Mary Hunter, Department of Behavior Analysis, 1155 Union Circle, Box 311070, University of North Texas, Denton, TX 76201. E-mail: mary@behavior explorer.com; Jesús Rosales-Ruiz, Department of Behavior Analysis, 1155 Union Circle, Box 311070, University of North Texas, Denton, TX 76201. E-mail: Jesus.Rosales- [email protected] doi: 10.1002/jeab.517 JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 2019, 116 © 2019 Society for the Experimental Analysis of Behavior 1

Upload: others

Post on 10-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

The power of one reinforcer: The effect of a single reinforcer in thecontext of shaping

MARY HUNTER AND JESÚS ROSALES-RUIZ

UNIVERSITY OF NORTH TEXAS

During shaping, if the organism is engaged in behaviors other than the current approximation, theamount of time between reinforcers increases. In these situations, the shaper may resort to what isreferred to as a “desperation-driven click.” That is, after a period of no reinforcement, the shaperdelivers one reinforcer for a nontarget approximation. Reports from professional animal trainers sug-gest that the animal may continue performing this new behavior, even if it is reinforced only once. Thisstudy attempted to model this phenomenon with college students. Results from the study demonstratedthat a desperation-driven click situation can be reliably produced in a controlled setting. When partici-pants received one reinforcer for interacting with a new object following a period of no reinforcement,they interacted with the new object for a longer or equal amount of time as compared to an object thathad a longer history of reinforcement. The results of this study have implications for the understandingof how reinforcement controls behavior.Key words: acquisition, reinforcement, stimulus control, extinction, humans

In 1952, William Schoenfeld observed, “Wedon’t even know the effect of a single rein-forcer presentation on an individual response”(Mechner, 1994, p. 55), and Mechner arguedthat Shoenfeld’s statement continued to bevalid in 1994. Even today, there is still truth inthis statement. While some studies havelooked at the effect of a single reinforcer,questions remain regarding the effect of a sin-gle reinforcer during the acquisition, mainte-nance, and reinstatement of behavior.Several studies have examined the effect of

a single reinforcer by reinforcing only oneresponse during the entire experiment. Forexample, Skinner (1933) discovered that asubstantial extinction curve could be pro-duced when just a single response wasreinforced. A rat was habituated to the oper-ant chamber and learned to approach themagazine when it was sounded. Then, thelever was introduced. After about 20 min, therat pressed the lever for the first time, and this

was reinforced with food. After this, the maga-zine was disconnected. During the 40 min thatfollowed, the rat pressed the lever more than60 times. This began with a sustained high rateof responding, which was followed by pausesand short bursts of responding. Eventually,pauses became longer until the overall rate ofresponding was reduced to near zero. Thisdemonstrated that even one reinforcer couldhave a large effect on the occurrence of a newbehavior.

In another study with rats, Iversen andMogensen (1988) investigated spatial and tem-poral patterns of responding after the deliveryof a single reinforcer in a vertical holeboardapparatus. Each rat received a single 10-minsession in an apparatus that contained amatrix of 45 or 54 holes on one side. After5 min, a single food pellet was presented inone hole. Nose poking through the holes wasmeasured both before and after the deliveryof the food pellet. Iversen and Mogensenobserved low levels of nose pokes before thedelivery of the pellet. After the delivery of thepellet, they reported an increase in nose pokeswith the majority of visits occurring in the holewhere the pellet had been delivered or thesurrounding holes. This increase occurredduring the first 2 min after the presentation ofthe pellet with most rats returning to nearbaseline levels for the final 3 min.

In a similar study, Mal, McCall, Newland,and Cummins (1993) used a wooden grid with

Mary Hunter, Department of Behavior Analysis, Univer-sity of North Texas; Jesús Rosales-Ruiz, Department ofBehavior Analysis, University of North Texas.Correspondence concerning this article should be

addressed to: Mary Hunter, Department of BehaviorAnalysis, 1155 Union Circle, Box 311070, University ofNorth Texas, Denton, TX 76201. E-mail: [email protected]; Jesús Rosales-Ruiz, Department of BehaviorAnalysis, 1155 Union Circle, Box 311070, University ofNorth Texas, Denton, TX 76201. E-mail: [email protected]: 10.1002/jeab.517

JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 2019, 1–16

© 2019 Society for the Experimental Analysis of Behavior1

Page 2: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

40 compartments to deliver one reinforcer toweanling horses halfway through a 10-min ses-sion. Their results were similar to thosedescribed by Iversen and Mogensen (1988).After finding and consuming the food, thehorses visited the apparatus more frequentlyand visits were clustered around the compart-ment where food had been delivered. Thesethree studies all support the same conclusion:A single reinforcer delivery for a response thathas not been previously reinforced can cause amomentary, but substantial, increase in thefrequency of that response.Other researchers have studied the effect of

a single reinforcer by delivering a free rein-forcer after the response has beenextinguished. This research differs from theprevious three studies because it examines theeffect of a single reinforcer on a behavior thathad previously been reinforced. For example,Reid (1958) conducted research with pigeons,rats, and students. For each subject, an initialresponse was trained and then extinguished.When the response decreased to a minimallevel, a single free reinforcer was given. Thisreinforcer was “free” in the sense that it wasdelivered independent of the organism’sbehavior. In each case, most subjects emittedthe previously reinforced response at leastonce during the observation period thatfollowed. Reid concluded that the single freereinforcer produced this effect because it alsofunctioned as a stimulus, recreating conditionssimilar to those present during the acquisitionof the initial response. That is, the delivery ofthe reinforcer served as a discriminative stimu-lus to reestablish the previously taught behav-ior. Similarly, in research on operantreinstatement, a reinforcer is presented after aperiod of extinction. Exposing the organismagain to the reinforcer leads to an increase inthe previously taught response (see Vurbic &Bouton, 2014).Recent research has also examined how

reinforcement signals upcoming contingen-cies. This type of research differs from thestudies cited above because it attempts to mea-sure the effect of reinforcement during con-current schedules and after much longerexposure to the contingencies. Cowie and col-leagues (Cowie, Davison, & Elliffe, 2011, 2014;Cowie, Elliffe, & Davison, 2013), for example,have shown that a reinforcer for Response Amay result in an immediate increase in

Response B, if the reinforcer predicts that thenext reinforcer will more likely be deliveredfor Response B. According to Cowie andDavison (2016), it may be better to conceptu-alize reinforcement as a stimulus controleffect, rather than as a strengthening effect(see also Baum, 2012). These studies haveextended Reid’s (1958) finding that a rein-forcer may increase behavior because of its dis-criminative properties.

Many of studies described above examinedthe effect of giving a single reinforcer for anew response (Iversen & Mogensen, 1988; Malet al. 1993; Skinner, 1933), the effect of givinga free reinforcer after extinguishing aresponse (Reid 1958), or the effect of rein-forcement in a choice situation with two alter-native responses (e.g., Cowie & Davison,2016). Little research has examined the effectof a single reinforcer on another responseafter the extinction of the original response.In his investigations regarding superstitiousbehavior, Skinner (1948) noted that a newbehavior could be learned as the result of justone reinforcer. As well, Sidman (2010) dis-cussed that new learning can happen immedi-ately, in one trial, when the learner haspreviously mastered all of the necessary pre-requisite skills.

In applied settings, animal trainers haveobserved increases in a response after the deliv-ery of a single reinforcer (Bailey, 2013). Whenshaping novel or complex behaviors, trainerssometimes have difficulty moving from oneapproximation to the next. This can happenbecause, when the trainer changes to a new cri-terion, the animal repeatedly emits behaviorappropriate only for a prior criterion. This canlead to extended periods without reinforce-ment, sometimes eliciting counterproductiveemotional responding in the animal (e.g., adog may begin barking or whining, or leave thecontext entirely). Sometimes the trainer, in aneffort to increase the rate of reinforcement,reinforces a response that does not meet thenew criterion. Animal trainer Bob Bailey(2013) called this situation a “desperation-driven click.” Anecdotal reports from Baileyand other trainers suggest that the animalrepeatedly performs the behavior that wasfollowed by one reinforcer even in the absenceof further reinforcement for that behavior. Ifthis behavior persists, it may interfere with theacquisition of the terminal performance.

MARY HUNTER and JESÚS ROSALES-RUIZ2

Page 3: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

This study examined the effect of one rein-forcer in the context of shaping by creating aperiod of no reinforcement followed by a sin-gle reinforcer for a new response. In particu-lar, we investigated whether, during a 1-minperiod of extinction, human participants spentmore time performing a new behavior thatreceived a single reinforcer after a brief periodof no reinforcement or if they spent moretime performing a behavior that had beenreinforced multiple times.

Method

Participants, Setting, and MaterialsParticipants included five undergraduate stu-

dents, two males and three females, from alarge public university. Participants were rec-ruited using flyers placed in university buildingsand by announcements made in undergradu-ate behavior analysis classes. Each participantreviewed an informed consent form at thebeginning of the study and was informed thatparticipation was voluntary. The consent formincluded information about the general natureof the study, compensation ($5 per day), andpotential risks. Participants were told that theexperiment would be filmed, but that onlytheir hands would be visible on the video foot-age and that their identity would be kept confi-dential. All procedures and the informedconsent form were reviewed and approved bythe university’s Institutional Review Board.

ApparatusThe game PORTL (Portable Operant

Research and Teaching Lab) served as theresponse apparatus (Rosales-Ruiz & Hunter,2016). The experiment was conducted in asmall room on a folding card table. The partici-pant sat on one side of the table, and theexperimenter sat on the opposite side of thetable. Stimuli were placed on the table betweenthe experimenter and the participant.Materials included a collection of 12 small

objects, a StarMark dog training clicker, greenwooden blocks, and a small blue ceramic dish.The green wooden blocks measured 1 cm x0.8 cm x 0.8 cm. The 12 small objects wereused as manipulanda. They varied in shape,color, and size and consisted of a purple Legoblock, a yellow rubber duck, a dice, a red bot-tle cap, a pink rubber stamp with a picture of

a dog, a gray key, a red pencil sharpener, ared domino-shaped wooden rectangle, a greendomino-shaped wooden rectangle, an off-whiterectangular eraser, a white foam ring, and ablue plastic toy chair (Fig. 1). The smallestitem was the dice, which measured 1.5 cm x1.5 cm x 1.5 cm, and the largest was the bluechair, which measured 2.8 cm x 2.8 cmx 5 cm.

A Canon digital camera attached to a tripodwas positioned to the side of the table betweenthe experimenter and the participant. Thecamera was aimed so that only the table andthe participant’s hands and arms were visibleon the video. In addition, a stopwatch applica-tion on an Android phone was used to moni-tor the passage of time.

Dependent Variables and MeasurementThe main dependent measure was the time

spent interacting with each object. Interactionwith an object was defined as any time that theparticipant’s hand or finger(s) contacted theobject. This could include any part of the par-ticipant’s hand below the wrist. Adding orswitching fingers or moving the object in thehand counted as a continuation of the sameinteraction, as long as part of the hand wasconstantly in contact with the object. Theinteraction ended when the participant’s handwas no longer in contact with the object. If theparticipant touched two or more objects at thesame time, each object was recorded as a sepa-rate interaction. Finally, if the object was outof view, either because it was off-screen,blocked by the participant’s hand, or blockedby other objects, an interaction was notrecorded until the observer could see part ofthe hand touching the object. The exceptionto this rule was if the observer could see theobject move and there was no other possibleway for the object to move except by contactwith the participant’s hand.

Data were collected from the recordedvideos. Videos were scored frame by frameusing the program QuickTime. Data wererecorded from the time the experimenter said“start” until the participant had received 10 rein-forcers, except for the one-click condition dur-ing which recording ended 1 min after theparticipant touched the 10th object. Durationof time spent interacting with each object wasmeasured as the cumulative number of seconds

3THE POWER OF ONE REINFORCER

Page 4: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

during which the participant contacted eachobject. An interaction was recorded for eachtime stamp on the video if the participant began,continued, or ended an interaction during thatsecond. Interactions that lasted less than a sec-ond were counted as a full second. Becauseinteractions were rounded to a second, if twoseparate interactions with the same objectoccurred within the same time stamp, this wascounted as a duration of only one second.

ProceduresPreliminary training. Preliminary training

included instructions regarding the rules ofthe game PORTL and an explanation of theclicker and green blocks. First, the participantwas told: “Today we are going to play a gamewith the objects you see sitting on the table. Ina minute, I will spread out the objects in thecenter of the table. Then, when I say ‘start,’you can touch or interact with any of theobjects in any way you wish. Sometimes, how-ever, when you do certain actions or interactwith certain objects, you will hear a clicksound and I will hand you a little green block.Your goal today will be to earn as many littlegreen blocks as possible.” The participant was

also instructed to place the blocks in the bluedish. In addition, the participant was asked touse only one hand when interacting with theobjects and was informed that sometimes heor she would have access to all of the objectsand other times he or she would have accessto only certain objects. Participants were givenno further instructions regarding whether theyshould interact with certain objects or howthey should behave with respect to the objects.The experimenter then demonstrated sound-ing the clicker and handing the participant ablock several times.

The clicker and blocks are used in PORTLto create a behavior chain that is analogous tothe chain of responses that occur during rein-forcement delivery in the operant chamber.The sound of the clicker is functionally similarto the sound of the magazine. It serves as aconditioned reinforcer for correct responsesand also as a discriminative stimulus for theparticipant to hold out a hand to receive agreen block. Receiving a block is a discrimina-tive stimulus for the participant to place theblock in the dish. This is functionally similarto a consummatory response in an operantchamber (e.g., picking up a food pellet andplacing it in the mouth).

Fig. 1. The 12 objects that were used as experimental stimuli.

MARY HUNTER and JESÚS ROSALES-RUIZ4

Page 5: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

In PORTL, the participant can use only onehand to interact with the objects, and blocksare always delivered to this hand. If the partici-pant still has an object in his or her hand afterthe click, the experimenter holds out theblock, but waits for the participant to put theobject down before giving the block. Thismeans that the participant’s hand must breakcontact with the object(s) to collect the blockand place it in the dish, just as the rat mustleave the lever to collect a food pellet fromthe magazine. If blocks were not used, the par-ticipant might continue to hold an object inhis or her hand after the click. The use of theblocks allows each response to have a clearbeginning and end, which creates a repeatableunit of behavior.Experiment structure. Each participant took

part in the experiment on two separate days.The participant experienced the same set ofconditions in the same order on each day.The experimental conditions were dividedinto smaller units that were called rounds. Atthe beginning of each round, the experi-menter spread out the objects in front of theparticipant and then said “start.” Each roundconsisted of the delivery of 10 clicks andblocks. The exception to this was the one-clickcondition, during which only two clicks andblocks were delivered. At the end of eachround, the experimenter informed the partici-pant that it was time for a break and removedthe objects. Each break lasted approximately1-2 min. This allowed the experimenter to addor remove certain objects.Experimental conditions. There were five

conditions, the single object training condi-tion, the multiple object training condition,the target object condition, the one-click con-dition, and the reinforce-all condition. On

each day, each condition was implemented forone or more rounds in the order listed above(see Fig. 2). The first two conditions were usedto acclimate participants to the apparatus andto teach participants to perform differentactions with a single object.

Single object training condition. During the sin-gle object training condition, 10 clicks andblocks were delivered during each round. Onlyone object, the training object, was available.The participant received clicks and blocks forperforming a succession of three differentactions with this object. Examples of actionsincluded responses such as touching the object,picking up the object, turning the object on itsside, pushing the object across the table, turn-ing the object upside down, and twirling theobject in the hand. At the beginning of theround, the experimenter delivered clicks andblocks contingent upon the first action the par-ticipant performed with the object. If the par-ticipant performed other actions, theexperimenter waited for the participant toreturn to this initial action before deliveringanother click. After three to four clicks andblocks had been delivered, the experimenterstopped clicking for this action and waited forthe participant to emit another action. Whenthe participant did so, the experimenter clickedand delivered the next block. Two additionalclicks and blocks were then delivered only forthis second action. Finally, the experimenterwaited for the participant to emit a third actionwith the object. The final three to four clicksand blocks were delivered when the participantemitted this action. Having the participant per-form different actions with the object wasmeant to simulate shifting between differentreinforcement criteria during shaping, thuspromoting variations within the response class.

Fig. 2. The order of conditions that was used for each participant.

5THE POWER OF ONE REINFORCER

Page 6: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Multiple object training condition. The multi-ple object training condition was implementedfor the second round on each day. Ten clicksand blocks were delivered during this round.The multiple object training condition wassimilar to the single object training condition,except all 12 objects were available. Onceagain, the participant had to emit three differ-ent actions with the training object, changingto a new action after the delivery of everythree or four clicks and blocks. The partici-pant was allowed to repeat actions that hadproduced clicks in the previous round. All-owing actions to be repeated deviates from atypical shaping procedure, in which a new cri-terion would be reinforced at each step. How-ever, this simplification was made becausethere was a limited number of actions thatcould be performed with some objects. Thisalso made it easier for the experimenter tokeep track of the actions that had been per-formed and monitor the participant’s behaviorin real time. During this condition, if the par-ticipant interacted with objects other than thetraining object, the experimenter did nothing.If the participant then returned to the trainingobject and performed an action that no longerearned clicks and blocks, the experimenterdid nothing. However, if the participant thenleft the training object again, but later ret-urned to it, the experimenter clicked anddelivered a block. This was done to encouragemore interaction with the training object.Target object condition. The target object con-

dition was meant to simulate reinforcing par-ticular criteria during a shaping session. Thiscondition was also used to establish a historyof reinforcement with a particular object, sothat responding to this object could be com-pared later to responding to an object thatreceived only one reinforcer. During thisround and all subsequent rounds on a givenday, the training object was removed, leavingjust 11 objects. Each round in the target objectcondition consisted of the delivery of 10 clicksand blocks. At the beginning of the first roundof the target object condition, the experi-menter waited for the participant to touch the10th object. That is, the experimenter pri-vately tracked which objects the participanthad touched and then clicked when the partic-ipant touched the penultimate object. Thisobject became the target object for the rest ofthe day. For the remainder of this round,

clicks and blocks were delivered only for inter-actions with the target object. Similar to thetraining conditions, the participant had to per-form three different actions with the targetobject, changing to a new action after everythree to four clicks.

Response criteria for beginning the one-click con-dition. Two criteria were used to determine ifthe target object condition was repeated formore than one round. If the participant didnot emit three different actions with the targetobject, the experimenter implemented theback-up procedure with the target object dur-ing the following round. Each round of theback-up procedure was identical to the targetobject condition, except fewer objects werepresent. During the first round of the back-upprocedure, the participant was given the targetobject and two other objects. If the participantsuccessfully performed three different actionswith the target object, the target object condi-tion was then repeated during the next roundwith all 11 objects present. If the participantstill did not perform three different actionswith the target object when only three objectswere present, the target object was presentedalone in the following round. This was contin-ued for multiple rounds, if necessary, until theparticipant performed three different actionswith the target object when only the targetobject was present. After this, the experi-menter implemented one round of the targetobject condition with three objects presentand then an additional round of the targetobject condition with all 11 objects present.

Once the participant had performed threedifferent actions with the target object withina single round with all 11 objects present, theexperimenter observed if the participant ret-urned immediately to the target object at thebeginning of the next round to determinewhat condition to implement. If the partici-pant began by interacting with an object otherthan the target object, the target object condi-tion was continued during this round. If theparticipant began by interacting with the tar-get object at the very beginning of the round,the experimenter implemented the one-clickcondition during this round.

One-click condition. The one-click conditionwas implemented for one round on each day.This was the only round that consisted of thedelivery of 2 clicks and blocks, rather than 10.When the participant began an interaction

MARY HUNTER and JESÚS ROSALES-RUIZ6

Page 7: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

with the target object at the very beginning ofthe round, the experimenter clicked and deliv-ered the first block. Next, the experimenterstopped delivering clicks and watched as theparticipant interacted with the other objects.When the participant contacted the 10thobject (the second to last object), the experi-menter clicked and delivered a second block.This object became the one-click object. Bothof the clicks in this condition were deliveredcontingent on the beginning of any type ofinteraction with the specified object.Following the delivery of this second click and

block, the experimenter checked the time onthe timer and withheld all clicks and blocks fora period of 1 min. The round was finished whenthis 1-min period had elapsed. This extinctionperiod was limited to 1 min because pilot partici-pants displayed long pauses (without objectinteraction) and emotional responding (vocalsounds, throwing objects at the experimenter)when longer periods were tried during the ini-tial development of the procedures.The one-click condition modeled a

desperation-driven click situation because a clickand block were delivered for a class of responseswith a history of reinforcement (interactionswith the target object). Then, after a period dur-ing which no clicks or blocks were delivered,one click and block were delivered for a differ-ent response (interacting with the one-clickobject). The following 1 min of extinction wasthen used to assess whether participants spentmore time interacting with the one-click object,the target object, or other objects.Reinforce-all condition. The reinforce-all con-

dition, the final condition, tested whether theparticipant was more likely to interact with thetarget object, the one-click object, or someother object after the one-click condition. Dur-ing this round, the experimenter clicked anddelivered a block whenever the participantbegan any sort of interaction with any object.The participant did not have to performcertain actions with the objects. This roundended after 10 clicks and blocks had beendelivered.Experimental design. The main comparison

was between the duration of time spent inter-acting with the target object after the deliveryof the final click and block for the targetobject at the beginning of the one-click condi-tion and the duration of time spent inter-acting with the one-click object after the

delivery of the final click and block later inthe one-click condition. To determine if wecould replicate the effect, each participant ret-urned on a second day and all five conditionswere repeated in the same order.

Interobserver agreement. Five trainedobservers collected interobserver agreement(IOA) data on object contacts. Training con-sisted of explaining the study and data collec-tion procedures. The observer then practicedcoding two 20-s video clips under the guidanceof the experimenter. Finally, the observercoded a test video of four 20-s video clips with-out aid from the experimenter. The observerwas considered trained if his or her resultsfrom the test video showed 90% or greateragreement with data the experimenter col-lected from the same video clip. IOA datawere collected for the entire experiment.However, IOA data were collected only for thetraining object, the target object, and the one-click object. IOA was calculated for eachobject during each session using the formulaA/(A + D)*100, where A was agreements andD was disagreements. IOA was 99.1% for Par-ticipant 1 (range 98.4-100%), 98.7% for Partic-ipant 2 (range 96.2-99.5%), 90.7% forParticipant 3 (range 85.7-97.8%), 99.1% forParticipant 4 (range 97.8-100%), and 95.2%for Participant 5 (range 75.3-100%).

Results

Results were graphed as the cumulativenumber of seconds during which the partici-pant interacted with each object. On eachgraph, time, as well as the cumulative line foreach object, resets to zero at the beginning ofeach round. In addition, for the one-click con-dition, the lines for each object and the timereset to zero when the participant received thefinal click and block. The graph was reset atthis point to make it easier to compare theamount of time spent interacting with eachobject after the last click for the target objectto the amount of time spent interacting witheach object after the single click for the one-click object. On each graph, the one-clickobject is always displayed as a solid blackline, the target object is displayed as adashed black line, and the training object isdisplayed as a dotted black line. The othernine objects are displayed in solid lines ofshades of gray.

7THE POWER OF ONE REINFORCER

Page 8: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Individual results for the five participantsare shown in Figures 3–7. The top and bottomgraphs depict data from each participant’s firstand second day, respectively. Each graphshows the five conditions implemented foreach participant, the single object trainingcondition, multiple object training condition,target object condition, one-click condition,and reinforce-all condition.On their first day, each participant inter-

acted repeatedly with the training object dur-ing the single object training condition andmultiple object training condition. When theother objects were first present at the begin-ning of the multiple object training condition,most participants interacted some with them.However, when each participant returned tointeracting with the training object, this pro-duced clicks and blocks, and responding tothe rest of the objects decreased to near zero.The exception to these patterns was Partici-

pant 5 (Fig. 7), who interacted with all of theobjects during the single object training condi-tion. During this round, she picked up thetraining object twice, receiving a click andblock both times. Then, however, she beganreaching across the table and picking up theother objects that were piled next to theexperimenter, moving them one by one to herside of the table. Once the participant hadmoved all 11 of the other objects to her sideof the table, she returned to interacting withthe training object, and only rarely interactedwith any of the other objects. During the mul-tiple object training condition, Participant5 interacted with only the training object,something not observed with any of the otherparticipants on the first day.During the target object condition, each

participant interacted primarily with the targetobject across both days and rounds. The targetobject condition was continued for a secondround for Participant 3 on Day 1 (Fig. 5)because this participant did not immediatelyreturn to the target object after the first roundof the target object condition. For the samereason, the target object condition wasrepeated on Day 2 for Participants 1, 4, and 5.During the one-click condition, each partici-

pant received one click and block at the verybeginning of the round for interacting withthe target object. After receiving this first clickand block, each participant continued to inter-act with the target object for some time before

beginning to interact with the rest of theobjects. When the participant touched the10th object, the second click and block weredelivered. The time between receiving thesetwo clicks varied between participants. For Day1, the average was 37.2 s, and the range was30-49 s (Table 1). During this time interval, allparticipants spent more time interacting withthe target object than with any other object.

After receiving the second click and block,no more clicks and blocks were given for thefollowing 1 min. On the first day, most partici-pants interacted repeatedly with the one-clickobject at the beginning of this period,switching later to interacting with the targetobject and, to a lesser extent, the otherobjects. Participant 1 (Fig. 3) and Participant3 (Fig. 5) spent more time interacting with theone-click object (27 s and 19 s, respectively)than with any other object. For these two par-ticipants, the second largest amount of timewas spent interacting with the target object.Participant 2 (Fig. 4) spent 11 s interactingwith both the one-click object and the targetobject and 8 s interacting with a third object.This participant did not spend more than 4 sinteracting with any of the other objects. Par-ticipant 4 (Fig. 6) started out interacting withthe one-click object, then switched back andforth between this object and the target object.This participant spent a total of 16 s interactingwith the one-click object and 18 s interactingwith the target object, which was more thandouble the amount of time spent with any ofthe other objects. Participant 5 (Fig. 7) startedout interacting with the one-click object, thenswitched to interacting mainly with anotherobject. These two objects received the mostinteraction. This participant was the only partic-ipant who interacted only rarely with the targetobject during this part of the experiment.

The final round for each participant wasthe reinforce-all condition. A summary of theobjects that participants interacted with dur-ing this condition is displayed in Table 2. Onthe first day, Participant 1 (Fig. 3) and Partici-pant 5 (Fig. 7) interacted only with the targetobject, and Participant 3 (Fig. 5) interactedonly with the one-click object. Participant2 (Fig. 4) interacted with eight differentobjects, and Participant 4 (Fig. 6) interactedexclusively with a single object, which wasneither the target object nor the one-clickobject.

MARY HUNTER and JESÚS ROSALES-RUIZ8

Page 9: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Fig. 3. The cumulative number of seconds that Participant 1 spent interacting with each of the objects during Day1 (top graph) and Day 2 (bottom graph).

9THE POWER OF ONE REINFORCER

Page 10: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Fig. 4. The cumulative number of seconds that Participant 2 spent interacting with each of the objects during Day1 (top graph) and Day 2 (bottom graph).

MARY HUNTER and JESÚS ROSALES-RUIZ10

Page 11: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Fig. 5. The cumulative number of seconds that Participant 3 spent interacting with each of the objects during Day1 (top graph) and Day 2 (bottom graph).

11THE POWER OF ONE REINFORCER

Page 12: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Fig. 6. The cumulative number of seconds that Participant 4 spent interacting with each of the objects during Day1 (top graph) and Day 2 (bottom graph).

MARY HUNTER and JESÚS ROSALES-RUIZ12

Page 13: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Fig. 7. The cumulative number of seconds that Participant 5 spent interacting with each of the objects during Day1 (top graph) and Day 2 (bottom graph).

13THE POWER OF ONE REINFORCER

Page 14: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

Participants showed similar patterns ofresponding when the experimental conditionswere repeated on the second day. During themultiple object training condition and targetobject condition, participants interacted pri-marily with the training object or target objectand rarely with the other objects.During the one-click condition, response pat-

terns were generally similar to what wasobserved for each participant on his or her firstday. This was particularly true for Participant1, who showed very similar patterns ofresponding during this condition on both days.Participants 3 and 5 still spent a considerableamount of time interacting with the one-clickobject. However, the effect was slightly less, ascompared to their first day. Participants 2 and4 seemed to spend slightly more time inter-acting with the one-click object, as compared toDay 1. For all participants, the time betweenthe two clicks during the one-click conditionwas shorter on the second day (Table 1). Dur-ing this time, each participant interactedmainly with the target object (range: 10-22 s),interacting minimally with the other objects.During the reinforce-all condition on the

second day, Participant 1 (Fig. 3) and Partici-pant 3 (Fig. 5) interacted with only the targetobject, and Participant 2 (Fig. 4) interactedwith only the one-click object (Table 1). Partic-ipant 4 (Fig. 6) interacted with nine differentobjects, and Participant 5 (Fig. 7) interactedwith six different objects.

Discussion

The results from this study showed that,when participants received a single reinforcerfor interacting with a new object following aperiod of no reinforcement, they spent asmuch or more time, during extinction, inter-acting with this new object than another objectfor which interaction had been reinforced far

more often. Also, response allocation substan-tially favored this new response over otherresponses that had never been reinforced.

Although a single reinforcer increased thetime spent interacting with the one-click object,its effects did not eliminate the participants’ his-tory of reinforcement with the target object. Astime elapsed without any additional reinforcers,most participants reverted to interacting with thetarget object more frequently than with theother objects. Thus, like Epstein (1983), wefound that a history of reinforcement remainsimportant in determining how behavior is allo-cated in the face of a change in contingencies.

One main pattern of responding wasobserved during the 1 min of extinction thatbegan after the one click and block were deliv-ered for interacting with the one-click object.The participant first repeatedly interacted withthe one-click object. Then, as responding tothe one-click object diminished, the partici-pant began interacting again with all11 objects. However, the participant interactedmore frequently with the target object thanwith the other nine objects, for which interac-tions had never been reinforced. We observedtwo variations to this pattern of responding. Inthe first variation (seen during Participant 4’ssecond day and during Participant 5’s firstday), the participant first repeatedly interactedwith the one-click object. As responding to theone-click object began to decrease, the partici-pant interacted for approximately the sameduration of time with the other objects. How-ever, the participant was still responding tothe one-click object at the end of the minute.Possibly, the participant would have interactedmore with the target object if the extinctionperiod had lasted for longer than 1 min. Inthe second variation (seen on Participant 4’sfirst day and Participant 5’s second day), theparticipant interacted nearly equally with theone-click object and the target object

Table 1

Time Interval between the two clicks in the One-ClickCondition

Participant Day 1 Day 2

1 35 s 31 s2 34 s 26 s3 49 s 34 s4 30 s 20 s5 38 s 23 s

Table 2

Objects touched during the Reinforce-All Condition

Participant Day 1 Day 2

1 Target object Target object2 8 objects One-click object3 One-click object Target object4 Dice 9 objects5 Target object 6 objects

MARY HUNTER and JESÚS ROSALES-RUIZ14

Page 15: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

throughout the 1 min of extinction. In thesetwo cases, the participant returned muchfaster to interacting with the target object, yetstill continued to interact with the one-clickobject.The present study demonstrated that the

effect of a single reinforcer could be observedin the context of behavior acquisition. Similarto Skinner (1933), Iversen and Mogensen(1988), and Mal et al. (1993), our data showedthat a single reinforcer could momentarilyincrease the frequency of a new behavior. Thismomentary increase can be observed when noother responses are reinforced during theexperiment, as was demonstrated in thesethree previous experiments. It can also beseen in the context of acquisition when a pre-vious behavior has been reinforced, as wasshown in this experiment. Studying this phe-nomenon in the context of acquisition allowedus to compare whether participants spentmore time performing a previously reinforcedbehavior or a new behavior that received a sin-gle reinforcer.These results have important implications

for therapists, teachers, and animal trainers.An accidental reinforcer may occur if a profes-sional makes an error while applying a proce-dure, if a professional does not have well-defined criteria for what to reinforce, or if aprofessional reinforces an unrelated behaviorin an effort to increase the rate of reinforce-ment. Research in applied settings has exam-ined what level of treatment integrity isneeded for procedures to be effective (seeDiGennaro Reed & Codding, 2014). Practi-tioners generally accept 80% as an acceptablelevel of treatment integrity. Recently, someresearchers have found that even this level ofintegrity may interfere with learning (seeBergmann, Kodak, & LeBlanc, 2017). How-ever, this current study suggests that even oneaccidental reinforcer may result in an increasein an unrelated behavior, which could make itdifficult to bring the learner back to approxi-mations to the goal behavior.The current results differ from what might

be expected, based on the Law of Effect(Skinner, 1938; Thorndike, 1911), whichwould favor the target object. During the firstpart of the experiment, the participantreceived multiple reinforcers for interactingwith the target object. If these repeated rein-forcers strengthened this behavior, it could be

predicted that the participant would spend sig-nificantly more time interacting with thisobject during extinction, rather than inter-acting with an object that received only a sin-gle reinforcer. However, this was not what wasobserved.

Instead, the results could be interpreted asa stimulus control effect (see Baum, 2012;Cowie & Davison, 2016; Reid, 1958). In the ini-tial parts of the present experiment, when theparticipant received the first reinforcer forinteracting with a new object, this was followedby at least 10 more reinforcers for interactingwith that object, as well as reinforcement forperforming different actions with the object.Additionally, the participant received rein-forcement for interacting with only one objectduring each round. The participant had expe-rienced these conditions with both the train-ing object and the target object. Because ofthis, when the participant received the firstreinforcer for interacting with a new object,this would signal that more reinforcementwould likely be forthcoming for interactingwith this object. This should lead the partici-pant to interact repeatedly with the newobject, as was observed in this study. Oneinterpretation of the present results could bethat the delivery of the reinforcer for inter-acting with the one-click object created condi-tions that were similar to the conditions thatwere present when the first reinforcer wasgiven for interacting with the training objectand again when the first reinforcer was givenfor interacting with the target object.

Similar to Reid (1958), in this study a rein-forcer was given after a period of extinction.However, Reid reported an increase in theoriginal behavior after the delivery of the freereinforcer (see also Neuringer, 1970), whereasthe present study observed an immediateincrease in a new behavior. Yet, Reid’s experi-ment does not report the rates of respondingfor alternative behaviors. It could be that thefree reinforcer in Reid’s experiment led to alarge increase in a new behavior and also toan increase in the original behavior, with onlythe latter being measured. If that was the case,the stimulus control effect from Reid’s experi-ment and the effect observed in this studywould be similar.

Future research should investigate what fac-tors contribute to the single reinforcer effectobserved in this experiment. It would be

15THE POWER OF ONE REINFORCER

Page 16: The power of one reinforcer: The ... - Behavior Explorer€¦ · item was the dice, which measured 1.5 cm x 1.5 cm x 1.5 cm, and the largest was the blue chair, which measured 2.8

worthwhile to investigate if the same effectcould be obtained if the participant had a veryminimal history of reinforcement for inter-acting with the target object. For example, thetwo training conditions and the target objectcondition could be omitted, and the partici-pant could receive just a single reinforcer forinteracting with the target object at the begin-ning of the one-click condition. If the increasein responding to the one-click object is pro-duced as a result of stimulus control, theamount of responding to the one-click objectwould be expected to be much less. This isbecause a different stimulus control rulewould be established. That is, an interactionwith a particular object will be reinforced onlyonce. In summary, this study extended theknowledge about the effect of a single rein-forcer in the context of shaping. These resultshave important implications for practitionersconcerned with the effects of individual rein-forcers during the acquisition of behavior.They also have implications for basicresearchers concerned with how behavior iscontrolled by reinforcement. Even a singlereinforcer may direct the learner away fromthe target behavior. As Davison and Baum(2003) suggest, every reinforcer counts.

References

Bailey, B. (2013, March). What did Bailey do, and why: Atraining potpourri. Presentation at the 5th Annual Artand Science of Animal Training Conference, Den-ton, TX.

Baum, W. M. (2012). Rethinking reinforcement: Alloca-tion, induction, and contingency. Journal of the Experi-mental Analysis of Behavior, 97(1), 101-124. https://doi.org/10.1901/jeab.2012.97-101

Bergmann, S. C., Kodak, T. M., & LeBlanc, B. A. (2017).Effects of programmed errors of omission and com-mission during auditory-visual conditional discrimina-tion training with typically developing children. ThePsychological Record, 67(1), 109-119. https://doi.org/10.1007/s40732-016-0211-2

Cowie, S., & Davison, M. (2016). Control by reinforcersacross time and space: A review of recent choiceresearch. Journal of the Experimental Analysis of Behavior,105(2), 246-269. https://doi.org/10.1002/jeab.200

Cowie, S., Davison, M., & Elliffe, D. (2011). Reinforce-ment: Food signals the time and location of futurefood. Journal of the Experimental Analysis of Behavior, 96(1), 63-86. https://doi.org/10.1901/jeab.2011.96-63

Cowie, S., Davison, M., & Elliffe, D. (2014). A model forfood and stimulus changes that signal time-based con-tingency changes. Journal of the Experimental Analysis ofBehavior, 102(3), 289-310. https://doi.org/10.1002/jeab.105

Cowie, S., Elliffe, D., & Davison, M. (2013). Concurrentschedules: Discriminating reinforcer-ratio reversals ata fixed time after the previous reinforcer. Journal ofthe Experimental Analysis of Behavior, 100(2), 117-134.https://doi.org/10.1002/jeab.43

Davison, M., & Baum, W. M. (2003). Every reinforcercounts: Reinforcer magnitude and local preference.Journal of the Experimental Analysis of Behavior, 80(1),95-129. https://doi.org/10.1901/jeab.2003.80-95

DiGennaro Reed, F. D., & Codding, R. (2014). Advance-ments in procedural fidelity assessment and interven-tion: Introduction to the special issue. Journal ofBehavioral Education, 23, 1-18. https://doi.org/10.1007/s10864-013-9191-3

Epstein, R. (1983). Resurgence of previously reinforcedbehavior during extinction. Behavior Analysis Letters, 3,391-397.

Iversen, I. H., & Mogensen, J. (1988). A multipurpose ver-tical holeboard with automated recording of spatialand temporal response patterns for rodents. Journal ofNeuroscience Methods, 25(3), 251-263. https://doi.org/10.1016/0165-0270(88)90140-9

Mal, M. E., McCall, C. A., Newland, C., & Cummins, K. A.(1993). Evaluation of a one-trial learning apparatus totest learning ability in weanling horses. Applied AnimalBehaviour Science, 35(4), 305-311. https://doi.org/10.1016/0168-1591(93)90082-Z

Mechner, F. (1994). The revealed operant: A way to study thecharacteristics of individual occurrences of operant responses(3rd ed.). Cambridge, MA: Cambridge Center forBehavioral Studies.

Neuringer, A. J. (1970). Superstitious key pecking afterthree peck-produced reinforcements. Journal of theExperimental Analysis of Behavior, 13(2), 127-134.https://doi.org/10.1901/jeab.1970.13-127

Reid, R. L. (1958). The role of the reinforcer as a stimu-lus. British Journal of Psychology, 49(3), 202-209.https://doi.org/10.1111/j.2044-8295.1958.tb00658.x

Rosales-Ruiz, J., & Hunter, M. (2016). PORTL: Your porta-ble Skinner box. Operants, Quarter IV, 34-36.

Sidman, M. (2010). Errorless learning and programmedinstruction: The myth of the learning curve. EuropeanJournal of Behavior Analysis, 11, 167-180. https://doi.org/10.1080/15021149.2010.11434341

Skinner, B.F. (1933). “Resistance to extinction” in the pro-cess of conditioning. Journal of General Psychology, 9, 420-429. https://doi.org/10.1080/00221309.1933.9920945

Skinner, B.F. (1938). The behavior of organisms: An experi-mental analysis. New York, NY: Appleton-Century-Crofts.

Skinner, B. F. (1948). Superstition in the pigeon. Journal ofExperimental Psychology, 38, 168-172. https://doi.org/10.1037/h0055873

Thorndike, E. L. (2000/1911). Animal intelligence. NewBrunswick, NJ: Transaction Publishers.

Vurbic, D., & Bouton, M. E. (2014). A contemporarybehavioral perspective on extinction. InF. K. McSweeney and E. S. Murphy (Eds.), The WileyBlackwell handbook of operant and classical conditioning(pp. 53-76). Malden, MA: John Wiley & Sons.

Received: December 2, 2017Final Acceptance: April 11, 2019

Editor in Chief: Amy L. OdumAssociate Editor: Iser G. DeLeon

MARY HUNTER and JESÚS ROSALES-RUIZ16