Lauwereyns & Wisnewski Reward­oriented bias 1

A Reaction­Time Paradigm to Measure

Reward­Oriented Bias in Rats

Johan Lauwereyns & Regan G. Wisnewski

Victoria University of Wellington

Running head: Reward­oriented bias

Journal of Experimental Psychology: Animal Behavior Processes

In press

Lines of text (main text + references): 414

Correspondence: Johan Lauwereyns

School of Psychology

Victoria University of Wellington

P. O. Box 600

Wellington 6006

New Zealand

E-mail: [email protected]

Phone: ++64-4-463-5042

Fax: ++64-4-463-5402

Abstract

A nose­poke task with asymmetric position­reward mapping was devised to distinguish between

effects of bias and sensitivity in reaction times of rats. In all trials, the rats had to poke their nose

in the hole to the left or to the right of center, corresponding to the side where four lights were

illuminated, ignoring distracters on the other side. Reaction times were faster for large­reward

trials than for small­reward trials. In large­reward trials, there was no influence of the number of

distracters, whereas in small­reward trials, distracters produced an increase of reaction time.

Analysis of reaction­time distributions according to a linear model of decision making suggested

that most of the systematic variability was due to a reward­oriented bias.

(118 words)

Key words:

Nose poke, rat, reward, bias, reaction time

A Reaction­Time Paradigm to Measure Reward­Oriented Bias in Rats

Successful behavior requires the ability to predict and exploit opportunities that lead to desirable

outcomes (Dickinson & Balleine, 1994). Recent behavioral research has focused on the ability of

animals such as rats to detect reward rates (e.g., Gallistel, Mark, Adam, & Latham, 2001; Gharib,

Gade, & Roberts, 2004). Concurrently, there has been a veritable explosion of

electrophysiological research on the reward­related activities of single neurons in rats and

monkeys (e.g., Kobayashi et al., 2002; Lauwereyns et al., 2002a,b; Pan, Schmidt, Wickens, &

Hyland, 2005; Platt & Glimcher, 1999; Pratt & Mizumori, 2001; Schmitzer­Torbert & Redish,

2004; Schultz, Dayan, & Montague, 1997). Here, we present a new behavioral paradigm with

rats that will be particularly suited to examine the covariation between neurophysiological

measures and behavioral indices of reward­oriented perception and action.

Arguably the most promising approach to marry the two research fields would be to

capitalize on the information that can be read out of reaction­time distributions (Luce, 1986).

Indeed, this proposal has already been put forward by several researchers (Carpenter, 2004;

Smith & Ratcliff, 2004). One of the principal advantages of reaction times as a dependent

measure, rather than a more discontinuous measure such as spatial choice or percent correct, is

the increase in statistical power when comparing the variability of this behavioral measure with

variability in neuronal spike trains on a trial­by­trial basis. This increased statistical power is

crucial when one considers that optimal recording conditions for individual neurons are difficult

to maintain for even a single session with an experimental animal.

Quite apart from logistical considerations, however, analyses of reaction-time distributions

may also be suitable to distinguish between alternative mechanisms of decision making. Taking

the view that the decision­making process consists of a “decision signal” that linearly rises to a

“decision threshold,” one can model reaction­time data using several parameters reflecting the

slope of the decision signal, the starting point of the decision signal, and the level of the

threshold (i.e., the Linear Approach to Threshold with Ergodic Rate or LATER model;

Carpenter, 2004; Carpenter & Williams, 1995; Reddi & Carpenter, 2000; see Figure 1a).

Changes to the slope of the decision signal would then be distinguishable from changes to the

starting point of the decision signal in the shapes of reaction­time distributions (see Method for

LATER analysis). This approach gives researchers very valuable computational tools in the

study of neuronal activity. Single­unit recordings in the frontal eye field of monkeys have

already shown slopes in the rise of neuronal activity that closely correlate with eye movement

latency, as if the activity builds up toward a fixed decision threshold for movement initiation

(Hanes & Schall, 1996; see also a comprehensive discussion of these ideas in Gold & Shadlen,

2001). Thus, analyses of reaction times and neurophysiological measures can be used as

convergent operations in the study of the mechanisms underlying decision making.

With respect to the reward factor in decision making, the typical reduction of reaction

times observed for large-reward trials as compared to small-reward trials (Watanabe, K. et al.,

2003; Watanabe, M. et al., 2001) may be underpinned by two different mechanisms: sensitivity

on the one hand, and bias on the other. Sensitivity refers to the quality of decision­making as a

function of the ratio between signal and noise, and would correspond to the slope of the decision

signal. The prospect of reward may improve the signal­to­noise ratio (and lead to a steep rise of

the decision signal) for stimuli associated with a high reward value (see Figure 1a, left panel). In

contrast, bias refers to the a priori likelihood of making one response rather than another,

regardless of incoming perceptual information. The prospect of reward may create a bias by

increasing the likelihood of making a response with a high reward value, and would correspond

to moving the starting point of the decision signal closer to the decision threshold (see Figure 1a,

right panel). According to the LATER model, effects of reward­oriented sensitivity or bias

would leave different signatures in the reaction­time distributions. Our concrete aim in the

present study, then, was to develop a reaction­time paradigm with rats that would enable us to

examine these signatures.

Nose­poke paradigms may be the most appropriate for measuring reaction times in rats,

and have been used successfully with location­cueing tasks (Ward & Brown, 1996) and five­

choice serial reaction­time tasks (Robbins, 2002). For the present study, we developed a nose­

poke paradigm with a single spatial­choice task under an asymmetric reward schedule. Rats were

required to poke their nose in the hole adjacent to the center, corresponding to the side where

four lights were illuminated. To do so, they had to ignore distracters on the other side. For each

rat, one side was always associated with a large reward, whereas the other side was associated

with a small reward. We expected that reactions would be faster in large­reward than in small­

reward trials. If rats were biased to respond to the large­reward side, their reaction times should

be at maximum speed in that direction, regardless of the level of visual stimulation on the other

side. In contrast, if the behavior was mainly determined by the efficiency of visuospatial

processing, the reaction times in large­reward trials should be affected by the number of

distracters, with slower reaction times as the signal­to­noise ratio decreases (i.e., due to an

increase in the number of distracters). Reaction­time analyses according to the LATER model

would enable us to independently evaluate the same hypothesized mechanisms.

Method

Subjects, Housing and General Procedures

Subjects were 12 male Sprague-Dawley rats, weighing 190-275 g, approximately 3 months

old at the commencement of training. They were housed individually in home cages containing

untreated wood shavings, renewed on a regular basis. Water was available ad libitum. Subjects

were fed individually after completion of each testing session to preserve their 85% free­feeding

body weight throughout the duration of the experiment. The housing room was maintained at an

ambient temperature (22 °C) and humidity (74%), with a reversed 12-hr light/dark cycle (7.30

a.m. to 7.30 p.m.) to ensure that experimental sessions were conducted in darkened conditions, as

this is when rats are mainly active. The experiments were performed in adherence to the legal

laboratory animal care principles of the Victoria University of Wellington Animal Breeding

Facility, and the Victoria University of Wellington Animal Ethics Committee.

Behavioral Apparatus for Nose Poking

Two 9­hole boxes (MED­NP9L­B1; MED Associates, St Albans, VT) with dimensions

measuring 53.3 cm long x 34.9 cm wide x 26.0 cm high were used to conduct the experimental

procedure. Each chamber was fitted within a sound­attenuating box. All events were scheduled

and recorded by a Dell personal computer running MED­PC software (MED Associates, St

Albans, VT). The front and rear walls of each chamber were constructed of metal. The left and

right walls and the ceiling were constructed of transparent plexiglas. The left wall also

functioned as the entrance to the chamber. The floor of the chamber was constructed of

horizontal metal rods spaced 1 cm apart. Both boxes contained an arc of 9 contiguous apertures

set into the curved front wall. Each aperture was 2.5 cm x 2.5 cm square and 2.2 cm deep. Light­

emitting diodes (LED) at the rear of each hole could be turned on and off automatically to

provide visual cues specific to each hole. Vertical infrared detectors at the front of each nose­

poke hole allowed the recording of the response latencies and locations. A 0.1 ml reinforcer

(20% sucrose solution; 400 g caster sugar: 1600 ml water) was delivered via a metal dipper

centered in the rear wall. The light in the food aperture was illuminated when the reinforcer was

delivered and was extinguished when the reinforcer was collected.

Training

Rats were assigned to a specific experimental chamber where they participated individually in all

sessions of 30 min duration. First, using a sequential “auto-shaping” program over a period of 1-2

weeks, the rats were trained to respond by nose poking to visual stimuli. Then, task parameters

were changed gradually, including the number of visual stimuli, the nose­poke duration and the

reward schedule, until the rats were able to perform at least 80 correct trials of the complete

asymmetric reward paradigm (as described below) in a session of 30 min for at least 3

consecutive sessions. After 5 weeks of training, 2 rats had not yet met this criterion. At this

point, the data collection for the present study commenced with the remaining 10 subjects.

Asymmetric Reward Paradigm (ARP)

Sessions were conducted daily, for a maximum of 200 trials or until 30 min had elapsed within a

session. The ARP comprised the following sequence of events (see also Figure 1b).

Centering. A trial started when only the center hole light was illuminated. This light

signalled that the rat was required to make a nose­poke response immediately and sustain it for a

duration of 500 ms. This requirement ensured that a rat always started a trial from the same

position (i.e., centered at the front wall of the chamber). If the rat did not make a nose­poke

response within 10 s, or if it did not keep its nose in the central hole for 500 ms, the light was

extinguished. After a delay of 30 s, the light in the front center hole was re­illuminated to give

the rat a new opportunity to proceed with the trial.

Peripheral Stimulus Presentation. Once a nose­poke response had been sustained for 500

ms in the central hole, the central light was extinguished and peripheral stimuli were presented.

In each trial, the rat was required to respond with a nose poke to the hole adjacent to the center

hole in the direction where 4 LEDs were illuminated. The target side, then, was defined as the

side with 4 illuminated LEDs. On the other side, there could be between 0 and 3 LEDs illuminated.

These were termed ‘distracters.’ The distracter formation was always organized as a linear array

from center to periphery, making sure that there were no gaps (i.e., the LEDs that were not

illuminated were always further in the periphery than the distracters). In this way there were 8

possible stimulus configurations, consisting of 2 possible target sides combined with 4 possible

distracter arrangements.
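
The eight stimulus configurations described above can be enumerated in a few lines. The following is only an illustrative sketch: the hole numbering (0 to 8 from left to right, with hole 4 as the center) and the names `led_pattern` and `correct_hole` are assumptions introduced here, not part of the original apparatus software.

```python
from itertools import product

# Hypothetical indexing: holes 0-8 from left to right, hole 4 is the center.
N_HOLES = 9
CENTER = 4


def led_pattern(target_side, n_distracters):
    """Return a tuple of 9 booleans (True = LED on) for one configuration."""
    if target_side not in ("left", "right"):
        raise ValueError("target_side must be 'left' or 'right'")
    if not 0 <= n_distracters <= 3:
        raise ValueError("n_distracters must be between 0 and 3")
    lit = [False] * N_HOLES  # the center light is off during this phase
    step = -1 if target_side == "left" else 1
    # Target side: all 4 LEDs are illuminated.
    for k in range(1, 5):
        lit[CENTER + step * k] = True
    # Other side: distracters fill from the center outward, leaving no gaps.
    for k in range(1, n_distracters + 1):
        lit[CENTER - step * k] = True
    return tuple(lit)


def correct_hole(target_side):
    """The rewarded hole is the one adjacent to the center on the target side."""
    return CENTER - 1 if target_side == "left" else CENTER + 1


configs = {
    (side, d): led_pattern(side, d)
    for side, d in product(("left", "right"), range(4))
}

for (side, d), pattern in configs.items():
    print(side, d, "".join("#" if on else "." for on in pattern))
```

Filling the distracters from the center outward is what guarantees the "no gaps" property of the array.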

The moment at which the rat broke away from the central hole following the peripheral

stimulus presentation was registered as the break time. Reaction time (RT) was defined as the

time between the onset of peripheral stimulus presentation and the moment at which the rat

reached the correct hole on the target side. The rat had to sustain this nose poke for a duration of

at least 200 ms. Note that in this procedure, the rat is not punished for poking its nose in different

holes than the one defined as the correct hole. Effectively, then, the procedure cannot induce

erroneous choice trials, even though the rat might take a very long time (theoretically, until

infinity) to make the correct response. In this way, our experimental paradigm accommodates

one of the most controversial features of the LATER model (Smith & Ratcliff, 2004), in that the

model is not capable of producing errors.
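
As an illustration of how break time and RT could be scored from recorded pokes, consider the sketch below. The `Poke` record, the function name `score_trial`, and the hole numbering are hypothetical. The sketch also assumes that RT is taken at the entry of the first correct-hole poke that is held for at least 200 ms, which is one possible reading of the definition above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HOLD_MS = 200  # minimum duration of the peripheral nose poke


@dataclass
class Poke:
    hole: int        # hole index (hypothetical numbering)
    enter_ms: float  # time at which the nose entered the hole
    exit_ms: float   # time at which the nose left the hole


def score_trial(
    onset_ms: float,
    center_exit_ms: float,
    pokes: List[Poke],
    correct_hole: int,
) -> Tuple[float, Optional[float]]:
    """Return (break_time, reaction_time) in ms relative to stimulus onset.

    Pokes into other holes are simply ignored, so a trial can only end
    with a correct response; reaction_time is None while none has occurred.
    """
    break_time = center_exit_ms - onset_ms
    for poke in sorted(pokes, key=lambda p: p.enter_ms):
        held_long_enough = poke.exit_ms - poke.enter_ms >= HOLD_MS
        if poke.hole == correct_hole and held_long_enough:
            return break_time, poke.enter_ms - onset_ms
    return break_time, None


example = score_trial(
    onset_ms=1000.0,
    center_exit_ms=1210.0,
    pokes=[
        Poke(hole=3, enter_ms=1600.0, exit_ms=1900.0),  # wrong side, ignored
        Poke(hole=5, enter_ms=2000.0, exit_ms=2100.0),  # too brief, ignored
        Poke(hole=5, enter_ms=2500.0, exit_ms=2800.0),  # counts as the response
    ],
    correct_hole=5,
)
print(example)
```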

On each trial, the stimulus configuration was determined by a quasi­random sequence

with the constraints that, for every block of 16 trials, there was an equal number of trials for each

magnitude of reward, and no more than 4 consecutive trials with the same reward value.
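
One simple way to produce a sequence satisfying these constraints is rejection sampling within each block. The sketch below is not the program used in the experiment. It assumes that the limit of 4 consecutive trials also applies across block boundaries and that the number of distracters is drawn independently and uniformly on each trial; both are assumptions, and all function names are hypothetical.

```python
import random

BLOCK_SIZE = 16  # trials per block
MAX_RUN = 4      # maximum number of consecutive trials with the same reward


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 0
    prev = None
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        longest = max(longest, run)
    return longest


def make_block(rng, tail=()):
    """Shuffle 8 large + 8 small trials until no run exceeds MAX_RUN.

    `tail` holds the last rewards of the previous block so that runs are
    also limited across block boundaries (an assumption of this sketch).
    """
    block = ["large"] * (BLOCK_SIZE // 2) + ["small"] * (BLOCK_SIZE // 2)
    while True:
        rng.shuffle(block)
        if max_run_length(list(tail) + block) <= MAX_RUN:
            return list(block)


def make_session(n_blocks, seed=None):
    """Return a list of (reward, n_distracters) pairs for one session."""
    rng = random.Random(seed)
    rewards = []
    for _ in range(n_blocks):
        rewards.extend(make_block(rng, tail=rewards[-MAX_RUN:]))
    # Distracter count: independent uniform draw from 0-3 (assumption).
    return [(reward, rng.randint(0, 3)) for reward in rewards]


session = make_session(n_blocks=13, seed=1)
print(session[:8])
```

Because the target side determines the reward for a given rat, the reward label in each pair maps directly onto a left or right target under that rat's position-reward assignment.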

Asymmetric reward. To investigate the influence of incentive, an asymmetric reward

schedule was used. Once the rat completed the peripheral nose poke, all LEDs were

extinguished. A reinforcer was delivered in accordance with the particular reward schedule at the

rear of the chamber. In order to minimize temporal dynamics in the mechanisms of reward

expectation, rats were permanently assigned to a particular position­reward mapping condition

throughout training and all experimental sessions. Specifically, for 5 of 10 rats, the left target

side was always worth 0.3 ml of reinforcer (3 x 0.1 ml dipper: large reward condition) and the

right target side was always worth 0.1 ml of reinforcer (1 x 0.1 ml dipper: small reward

condition). For the remaining 5 rats, the reward schedule was reversed, with the right target side

always delivering the large reward and the left target side always delivering the small reward.

Thus, before the experimental sessions started, a rat had acquired a fixed position­reward

association for the ARP task, but during any experimental session, it was impossible for the rat to

predict on a particular trial whether the target side would actually correspond to the position

associated with the large reward.

Analysis of Variance and LATER Analysis

For each rat, mean RTs were computed for each of the 2 x 4 conditions on the basis of the data

from 3 consecutive sessions, immediately following the 5 weeks of training. As preliminary

analyses showed no effects from the order of the sessions, the data from the 3 sessions were

combined. The mean RTs for each rat were then submitted to a repeated measures analysis of

variance (ANOVA) with Reward (Large or Small) and Distracters (0, 1, 2, or 3) as within­

subject variables. The same analysis was also performed for break times, and for the number of

trials completed.

The same data were used for LATER analysis, following the framework proposed by

Carpenter and colleagues (Carpenter, 2004; Carpenter & Williams, 1995; Reddi & Carpenter,

2000). It is suggested that reaction times obey a simple stochastic law: The reciprocal of latency

follows a Gaussian distribution. Plotting cumulative latency distributions on a probit scale as a

function of reciprocal latency (a reciprobit plot) should therefore yield a straight line. The

LATER model postulates a decision signal $S$ associated with a particular response. When an

appropriate stimulus appears, $S$ starts to rise linearly from an initial level $S_0$ at a rate $r$; when it

reaches a pre-specified threshold $S_T$, the response is triggered. If the variation of $r$ is Gaussian

with mean $\mu$ and variance $\sigma^2$, the reaction time is $(S_T - S_0)/r$ on any one trial and its distribution

will fall on a straight line on the reciprobit plot. This straight line will have a median of

$(S_T - S_0)/\mu$. It will intercept the infinity axis at $I = \mu/(\sigma\sqrt{2})$, a value that is independent of $S_T$ and $S_0$.

According to this model, varying the amount or quality of information for stimulus

discrimination would affect the rate $r$, as would be the case if perceptual sensitivity led to improved

processing of stimuli associated with reward. On the other hand, a reward-oriented bias in this

scheme would correspond to a change in the initial level $S_0$, so that the distance to the threshold

level would be smaller when the action is associated with a large reward.
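
To make the two mechanisms concrete, the following sketch simulates reaction times under the LATER assumptions. All parameter values are arbitrary and chosen only for illustration, the function name `simulate_later` is hypothetical, and trials in which the sampled rate is not positive (so that the threshold would never be reached) are simply redrawn, which is an assumption of this sketch.

```python
import random
import statistics


def simulate_later(n, mu, sigma, s0=0.0, s_t=1.0, seed=None):
    """Simulate n reaction times as (s_t - s0) / r with r ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    distance = s_t - s0
    rts = []
    while len(rts) < n:
        rate = rng.gauss(mu, sigma)
        if rate > 0:  # a non-positive rate never reaches threshold; redraw
            rts.append(distance / rate)
    return rts


# Baseline, bias (higher starting point), and sensitivity (higher mean rate).
baseline = simulate_later(20000, mu=1.0, sigma=0.2, s0=0.0, seed=0)
biased = simulate_later(20000, mu=1.0, sigma=0.2, s0=0.5, seed=0)
sensitive = simulate_later(20000, mu=2.0, sigma=0.2, s0=0.0, seed=0)

for name, rts in (("baseline", baseline), ("bias", biased), ("sensitivity", sensitive)):
    print(name, round(statistics.median(rts), 3))
```

With these values, both manipulations roughly halve the median reaction time, in line with the median of $(S_T - S_0)/\mu$; the two mechanisms can therefore not be told apart from the median alone, only from the shape of the distribution.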

The appeal of the LATER model derives from its clear quantitative prediction about what

should happen under these two cases. If reward expectation leads to a reward­oriented bias, that

is, an elevation of $S_0$, the reciprobit plot should swivel about a fixed infinite-time intercept, $I$.

This follows from the LATER model since $I$ is determined by the parameters $\mu$ and $\sigma$, but not

$S_0$. In other words, the plot should show a shallower slope for the distribution of reaction times

in trials with a large reward than in trials with a small reward. In contrast, if the change in

reaction time is due to improved perceptual processing with a large reward as compared to a

small reward, the change should be reflected in r, and so the line on the reciprobit plot would

undergo a parallel shift, the slope remaining constant. Thus trials with a large reward would

merely be shifted to the left, toward shorter reaction times.
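
The swivel and the parallel shift can be derived in a few steps from the model assumptions. The sketch below introduces its own notation, which is not in the original text: $\Delta$ for the distance between starting point and threshold, $x$ for the reciprocal latency, and $z$ for the probit of the cumulative probability that the reciprocal latency does not exceed $x$.

```latex
% Assumed notation: \Delta = S_T - S_0, x = 1/T, z = probit of P(1/T \le x).
\begin{align*}
T &= \frac{\Delta}{r}, \qquad r \sim \mathcal{N}(\mu, \sigma^2)
  \;\Longrightarrow\;
  x = \frac{1}{T} = \frac{r}{\Delta}
  \sim \mathcal{N}\!\left(\frac{\mu}{\Delta}, \frac{\sigma^2}{\Delta^2}\right) \\
z(x) &= \frac{x - \mu/\Delta}{\sigma/\Delta}
      = \frac{\Delta}{\sigma}\,x - \frac{\mu}{\sigma} \\
\text{slope} &= \frac{\Delta}{\sigma} = \frac{S_T - S_0}{\sigma},
  \qquad z(0) = -\frac{\mu}{\sigma}
\end{align*}
```

Under these conventions, raising $S_0$ reduces the slope while leaving the value at infinite time, $z(0)$, unchanged (a swivel), whereas raising $\mu$ changes $z(0)$ while leaving the slope unchanged (a parallel shift). The quantity $z(0)$ is proportional to $\mu/\sigma$ and so differs from the intercept $I$ only by the sign and scaling convention adopted here.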

The LATER analysis was conducted on the aggregated data from all rats, as well as on

the data from each rat individually. To evaluate the predictions from the LATER model

statistically, we computed the slope of each linear least­squares fit (i.e., the reciprobit line) for

each rat, and submitted these slopes to a repeated measures ANOVA with Reward (Large or

Small) and Distracters (0, 1, 2, or 3) as within­subject variables. The hypothesis of sensitivity

predicted no differences in the slopes, whereas the hypothesis of bias predicted shallower slopes

for responses associated with a large reward than for responses associated with a small reward.
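
A reciprobit slope of this kind can be obtained with an ordinary least-squares fit of probit-transformed cumulative probabilities on reciprocal latencies. The sketch below uses simulated data with arbitrary parameters; the plotting position $(i + 0.5)/n$, the function names, and the simulation itself are assumptions made for illustration and are not necessarily the procedure used for the analyses reported here.

```python
import random
from statistics import NormalDist


def reciprobit_fit(rts_ms):
    """Least-squares line through (1/RT, probit of cumulative probability).

    Returns (slope, intercept), where the intercept is the probit value at
    1/RT = 0, that is, at infinite reaction time.
    """
    n = len(rts_ms)
    xs = sorted(1.0 / rt for rt in rts_ms)
    # Plotting positions (i + 0.5) / n keep the probabilities inside (0, 1).
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


def simulate_rts(n, mu, sigma, distance, seed):
    """LATER-style reaction times in ms: distance / rate, rate ~ Normal."""
    rng = random.Random(seed)
    rts = []
    while len(rts) < n:
        rate = rng.gauss(mu, sigma)
        if rate > 0:  # redraw rates that would never reach threshold
            rts.append(distance / rate)
    return rts


# Arbitrary illustrative parameters (not estimates from the rats' data).
base = reciprobit_fit(simulate_rts(5000, mu=1.0, sigma=0.2, distance=1000.0, seed=1))
bias = reciprobit_fit(simulate_rts(5000, mu=1.0, sigma=0.2, distance=500.0, seed=1))
sens = reciprobit_fit(simulate_rts(5000, mu=2.0, sigma=0.2, distance=1000.0, seed=1))

for name, (slope, intercept) in (("baseline", base), ("bias", bias), ("sensitivity", sens)):
    print(f"{name}: slope = {slope:.0f}, intercept = {intercept:.2f}")
```

In this simulation, halving the distance to threshold halves the slope and leaves the intercept in place, whereas doubling the mean rate leaves the slope unchanged and moves the intercept.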

Results

The 10 rats completed a total of 4,624 trials, or an average of 154 trials per 30 min session. A

repeated measures ANOVA, with the factors of Reward and Distracters as within­subjects

variables, on the number of trials completed showed no significant differences, all F values < 1.

This result confirms that the current paradigm maximizes variability in the domain of latency,

without any possibility for speed­accuracy trade­off. The same repeated measures ANOVA was

performed with break time as dependent measure, that is, the time at which the rat breaks

fixation from the central hole following the onset of peripheral stimulation. The repeated

measures showed no significant effects, all F values < 1.8. The average break time for all 8 types

of trial was 206 ms, with a standard deviation of 72.5 ms. All remaining analyses were therefore

concentrated on reaction times.

ANOVA on Mean RT

The mean RTs and standard deviations are presented in Figure 2a. A repeated measures ANOVA

on RT showed that there was a highly significant effect of the factor Reward, F(1,9) = 229.08,

MSE = 183614, p < .001, with faster reaction times in the direction associated with a large

reward (636 ms) than in the direction associated with a small reward (2086 ms). There was also a

very reliable main effect of the factor Distracters, F(3,27) = 41.27, MSE = 43371, p < .001, with

slower reaction times as the number of distracters increased: 929 ms for 0 distracters, 1348 ms

for 1 distracter, 1548 ms for 2 distracters, and 1620 ms for 3 distracters. Finally, there was also a

highly significant interaction between Reward and Distracters, F(3,27) = 13.35, MSE = 53274, p

< .005.

To gain further insight into the nature of the interaction, we conducted repeated measures

ANOVAs on the data for small­ and large­rewards separately, using the number of distracters as

the single factor. With the data from the large­reward conditions, the effect of Distracters was

not significant, F < 1. With the data from the small­reward conditions, the effect of Distracters

was statistically reliable, F(3,27) = 39.97, MSE = 86665, p < .001. Post­hoc Tukey HSD tests

with alpha at .05 showed that, in small­reward trials, reaction times were faster without

distracters (1277 ms) than in any of the 3 types of small­reward trial with distracters (2044 ms

for 1 distracter, 2414 ms for 2 distracters, and 2611 ms for 3 distracters). The small­reward

condition with 1 distracter also produced significantly faster reaction times than the conditions

with 2 or 3 distracters. There was no significant difference in the reaction times between the

conditions with 2 versus 3 distracters.
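
As a consistency check on the reported means, the large-reward cell means can be inferred from the marginal means for Distracters and the small-reward cell means. The values computed below are implied values, not values reported in the text: the sketch assumes that each marginal mean is the unweighted average of its two cells and ignores rounding.

```python
# Reported means (ms), copied from the Results text.
small_reward = {0: 1277, 1: 2044, 2: 2414, 3: 2611}
by_distracters = {0: 929, 1: 1348, 2: 1548, 3: 1620}

# Implied large-reward cell means, assuming unweighted averaging of the cells.
implied_large = {d: 2 * by_distracters[d] - small_reward[d] for d in small_reward}

mean_small = sum(small_reward.values()) / len(small_reward)
mean_large = sum(implied_large.values()) / len(implied_large)

print(implied_large)
print(mean_small, mean_large)
```

Under these assumptions, the implied large-reward means average to the reported 636 ms and vary by roughly 100 ms across distracter levels, compared with a range of more than 1300 ms in the small-reward conditions.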

LATER Analysis

The aggregated reciprocal reaction time data from all 10 rats are plotted in the form of

cumulative percentage probability, on a probit scale, in Figure 2b. A total of 4,624 individual

trials are plotted separately for each of the 8 conditions, along with the linear least­squares fit.

The four distributions from conditions with a large reward (Figure 2b, data in black, indicated as

‘a’) appeared to have a shallower slope than the distributions from conditions with a small

reward (Figure 2b, data in gray, indicated as ‘b’ and ‘c’). Among the small­reward conditions,

the distribution from the condition without distracters (‘b’) appeared to have a shallower slope

than the three distributions from conditions with distracters (‘c’).

A repeated measures ANOVA on the slopes of the reciprobit lines for each condition, for

each rat, showed a significant main effect of the factor Reward, F(1,9) = 50.85, MSE = 1576694,

p < .001, with shallower slopes in the direction associated with a large reward (1424) than in the

direction associated with a small reward (3427). There was also a reliable main effect of the

factor Distracters, F(3,27) = 20.80, MSE = 333979, p < .001, with shallower slopes for conditions

with fewer than two distracters: 1723 for 0 distracters, 2156 for 1 distracter, 2946 for 2 distracters,

and 2877 for 3 distracters. Finally, there was also a significant interaction between Reward and

Distracters, F(3,27) = 31.87, MSE = 308667, p < .001.

To gain further insight into the nature of the interaction, we conducted repeated measures

ANOVAs on the data for small­ and large­rewards separately, using the number of distracters as

the single factor. With the data from the large­reward conditions, the effect of Distracters was

not significant, F < 1. With the data from the small­reward conditions, the effect of Distracters

was statistically reliable, F(3,27) = 32.51, MSE = 502993, p < .001. Post­hoc Tukey HSD tests

with alpha at .05 showed that, in small­reward trials, slopes were shallower without distracters

(1747) than in any of the 3 types of small­reward trial with distracters (3119 for 1 distracter,

4502 for 2 distracters, and 4339 for 3 distracters). The small­reward condition with 1 distracter

also produced significantly shallower slopes than the conditions with 2 or 3 distracters. There

was no significant difference in the slopes between the conditions with 2 versus 3 distracters.

Discussion

Ten rats performed in the nose­poke paradigm with a single spatial­choice task under an

asymmetric reward schedule. In sessions of 30 min, the rats were able to complete an average of

more than 150 nose­poke responses, providing data for reaction­time analysis with sufficient

statistical power not only to observe significant differences between means of distributions, but

also to consider the shapes of distributions. From a logistical viewpoint, then, the current paradigm

may be particularly appealing for investigations such as those in the areas of neurophysiology

and psychopharmacology, which require the collection of the largest possible amount of data in

short time periods.

Over and above this practical merit, however, the current paradigm enables researchers to

address theoretical questions on the mechanisms that underlie reward­oriented behavior.

Replicating previous studies using spatial choice tasks under asymmetric reward schedules with

monkeys (Watanabe K. et al., 2003; Watanabe M. et al., 2001), we found that rats responded

faster in trials with a large reward than in trials with a small reward. In addition, by varying the

number of distracters, we obtained a conspicuous interaction effect between the level of reward

and the number of distracters: In trials with a large reward, reaction times were unaffected by the

number of distracters, whereas in trials with a small reward, reaction times increased with more

distracters. In particular, the absence of a distracter effect in large-reward trials is consistent with

the hypothesis that rats were biased to respond to the large­reward side. The result suggests that

the rats’ reaction times were at maximum speed in the direction associated with a large reward,

regardless of the level of visual stimulation on the other side.

The hypothesis of reward­oriented response bias was corroborated by the LATER

analysis. The slopes of the reciprobit lines were shallower for the distributions from large­reward

conditions than for those from small­reward conditions. Thus, the lines appeared to swivel, as

was predicted by a change in the starting point, $S_0$, of the decision signal, not a change in the rate

$r$ of the linear rise to the decision threshold. This result again suggests that the rats were biased

to respond to the large­reward side. Taken together, then, the effects of reward and distracters on

mean RT and the shape of RT distributions make a strong case for the operation of a reward­

oriented bias in the rats’ behavior.

As such, the current data may also shed new light on neurophysiological data that were

previously obtained with a similar asymmetric reward paradigm in monkeys (Lauwereyns et al.,

2002a). In that study, dorsal striatal (caudate nucleus) neurons increased their activity in advance

of a peripheral visual cue, but only when the contralateral side (i.e., the hemifield opposite to the

recording site) was associated with a large reward. It was argued that these neurons created a

reward­oriented spatial bias that was responsible for the reward effect as observed in the

monkeys’ spatial reaction times (i.e., eye movements). In terms of the LATER model, the

activity of the dorsal striatal neurons would represent the change in the starting point of the

decision signal. However, in the neurophysiological study no behavioral analysis was presented

to support the proposal that response bias produced the reward effect in reaction times. In contrast,

the current paradigm provides such a behavioral analysis. Thus, the present data

raise the question of whether similar dorsal striatal activity may be the basis for the reward-oriented

bias observed in rats. This line of reasoning illustrates that the combination of behavioral­

analytic techniques on the basis of reaction times with single­unit recording may lead to a fuller

understanding of how reward expectation influences brain mechanisms for decision making and

voluntary control of action.

The fact that reaction times in small-reward trials were affected by the number of

distracters, however, may be due to processes in addition to reward-oriented bias. Particularly

interesting in this regard is the observation that RTs in small-reward trials without a distracter

were markedly faster than RTs in small-reward trials with one or more distracters. One

possibility is that in small-reward trials a complementary mechanism was needed to counteract

response bias and initiate a movement in the direction associated with a small reward. Such a

complementary mechanism has already been recorded with an asymmetric reward paradigm in

neurons of the centromedian nucleus in the thalamus (Minamimoto, Hori, & Kimura, 2005; for

discussion in relation to response bias, see Lauwereyns, 2006). In the present nose-poke

paradigm with rats, it seems plausible that the complementary mechanism would be activated

faster in small-reward trials without a distracter than in small-reward trials with one or more

distracters. In the no-distracter case, the visual stimulation on the small-reward side would

suffice to activate the complementary mechanism. When there is at least one distracter on the

large-reward side, however, an additional perceptual-decision mechanism may be required to

confirm that the visual stimulation on the large-reward side does not fit the profile of the target

side (i.e., four illuminated LEDs).

Since rats were not punished for poking their nose in holes other than the one defined

as the correct hole in the present paradigm, it is possible that, on a proportion of small-reward

trials, they first poked their nose into the hole adjacent to the center that is associated with a large

reward, particularly in small-reward trials with distracters. In this regard, it is also interesting to

note that, in small-reward conditions, the faster portions of the RT distributions appeared to

deviate from the reciprobit lines, consistent with previous observations using the LATER

analysis (Carpenter & Williams, 1995; Reddi & Carpenter, 2000). This component forms only a

small proportion of the whole, but is made conspicuous by the reciprobit plot that exaggerates

the first and last 5% of the cumulative distribution. Similarly, in large-reward trials the shallower

slope of the main component reveals a third, long-latency component that may also be present in

small-reward trials. Again, all of these observations suggest that, in the present asymmetrical

reward paradigm, there might be other processes at work in addition to reward-oriented bias. A

promising line for future research will be to elucidate the behavioral and neurophysiological

mechanisms that are complementary to, or counteract, response bias. This can be done, for

instance, by comparing the current version of the paradigm with one that introduces punishments

or more complicated reward schedules depending on the rats’ spatial choice. In doing so,

however, the LATER analysis may become problematic, because the model cannot produce

erroneous decisions.

Further limitations, inherent to the LATER model, should be noted. For instance, the

assumption of a linear rise to threshold may be a particularly vulnerable one in many real-life

and even laboratory settings. Also, in the LATER model as presented here, perceptual processes

are not dissociated from response processes. Thus, claims that sensitivity effects pertain to

perceptual processes, and that bias effects pertain to response processes, remain unchecked in the

present data. It will be a continuing task, then, to search for models that best fit reaction-time

distributions in different experimental situations that implicate different behavioral processes (for

an enlightening overview of existing RT models, see Smith & Ratcliff, 2004). Nevertheless, the

LATER model has an undeniable appeal because of its simplicity, and the relative ease with

which it can be translated into predictions with respect to underlying neural mechanisms. Thus, it

may be a fruitful starting point for such investigations.

With the above caveats in mind, the current data with an asymmetric reward paradigm do

succeed in implicating a reward-oriented bias mechanism in the advantage of large-reward trials

over small-reward trials. Neurophysiological investigations may already benefit from the current

version of the paradigm, by correlating putative neural signals of response bias with RTs in

large-reward trials. More generally, the current data extend an invitation to researchers in the

field of behavioral analysis and neurophysiology to consider the statistical and computational, as

well as the logistical, advantages of studying reaction-time distributions in nose-poke tasks with

rats.

Acknowledgments

We thank Dave Harper, Debbie Whare, Doug Drysdale and Richard Moore for technical

assistance. The research was supported by grant 0313-PG of the Neurological Foundation of

New Zealand and grant 04-VUW-052 of the Royal Society of New Zealand Marsden Fund.

Correspondence concerning this article may be sent to J. Lauwereyns, School of Psychology,

Victoria University of Wellington, P.O. Box 600, Wellington 6006, New Zealand (

[email protected]).

References

Carpenter, R. H. S. (2004). Contrast, probability, and saccadic latency: Evidence for independence of detection and decision. Current Biology, 14, 1576–1580.

Carpenter, R. H. S., & Williams, M. L. L. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377, 59–62.

Dickinson, A., & Balleine, B. (1994). Motivational control of goal-directed action. Animal Learning & Behavior, 22, 1–18.

Gallistel, C. R., Mark, T. A., King, A. P., & Latham, P. E. (2001). The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. Journal of Experimental Psychology: Animal Behavior Processes, 27, 354–372.

Gharib, A., Gade, C., & Roberts, S. (2004). Control of variation by reward probability. Journal of Experimental Psychology: Animal Behavior Processes, 30, 271–282.

Gold, J. I., & Shadlen, M. N. (2001). Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences, 5, 10–16.

Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation. Science, 274, 427–430.

Kobayashi, S., Lauwereyns, J., Koizumi, M., Sakagami, M., & Hikosaka, O. (2002). Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. Journal of Neurophysiology, 87, 1488–1498.

Lauwereyns, J. (2006). Voluntary control of unavoidable action. Trends in Cognitive Sciences, 10, 47–49.

Lauwereyns, J., Watanabe, K., Coe, B., & Hikosaka, O. (2002a). A neural correlate of response bias in monkey caudate nucleus. Nature, 418, 413–417.

Lauwereyns, J., Takikawa, Y., Kawagoe, R., Kobayashi, S., Koizumi, M., Coe, B., Sakagami, M., & Hikosaka, O. (2002b). Feature-based anticipation of cues that predict reward in monkey caudate neurons. Neuron, 33, 463–473.

Luce, R. D. (1986). Response Times: Their Role in Inferring Elementary Mental Organization. London, UK: Oxford University Press.

Minamimoto, T., Hori, Y., & Kimura, M. (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798–1801.

Pan, W.-X., Schmidt, R., Wickens, J. R., & Hyland, B. I. (2005). Dopamine cells respond to predicted events during classical conditioning: Evidence for eligibility traces in the reward-learning network. Journal of Neuroscience, 25, 6235–6242.

Platt, M. L., & Glimcher, P. W. (1999). Neuronal correlates of decision variables in parietal cortex. Nature, 400, 233–238.

Pratt, W. E., & Mizumori, S. J. (2001). Neurons in rat medial prefrontal cortex show anticipatory rate changes to predictable differential rewards in a spatial memory task. Behavioural Brain Research, 123, 165–183.

Reddi, B. A. J., & Carpenter, R. H. S. (2000). The influence of urgency on decision time. Nature Neuroscience, 3, 827–830.

Robbins, T. W. (2002). The 5-choice serial reaction time task: Behavioural pharmacology and neurochemistry. Psychopharmacology, 163, 362–380.

Schmitzer-Torbert, N., & Redish, A. D. (2004). Neuronal activity in the rodent dorsal striatum in sequential navigation: Separation of spatial and reward responses on the multiple T-task. Journal of Neurophysiology, 91, 2259–2272.

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.

Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168.

Ward, N. M., & Brown, V. J. (1996). Covert orienting of attention in the rat and the role of striatal dopamine. Journal of Neuroscience, 16, 3082–3088.

Watanabe, K., Lauwereyns, J., & Hikosaka, O. (2003). Effects of motivational conflicts on visually elicited saccades in monkeys. Experimental Brain Research, 152, 361–367.

Watanabe, M., Cromwell, H. C., Tremblay, L., Hollerman, J. R., Hikosaka, K., & Schultz, W. (2001). Behavioral reactions reflecting different reward expectations in monkeys. Experimental Brain Research, 140, 511–518.

Figure legends

Figure 1. A reaction-time paradigm for rats. a) Two alternative hypotheses for the

mechanism that underlies the effect of reward magnitude on reaction time. Decision making is

conceived as a linear rise of a “decision signal” (indicated as a gray line) to a “decision

threshold” (m, for a movement to the position associated with a large reward; –m, for a

movement to the position associated with a small reward). According to the hypothesis of

“sensitivity” (left panel), faster reaction times for positions associated with a large reward as

compared to a small reward would be due to changes in the slope of the decision signal (r, for a

decision associated with a large reward; r’, for a decision associated with a small reward).

According to the hypothesis of “bias” (right panel), faster reaction times for positions associated

with a large reward as compared to a small reward would be due to a positive bias (b > 0), which

brings the decision signal closer to the decision threshold m, but further away from the decision

threshold –m, even before the onset of the peripheral target. b) Schematic representation of the

sequence of events in a single trial. The trial started with the onset of the center LED. The rat

was required to poke its nose in the corresponding hole, and stay in this position for 500 ms. At

this time, the peripheral stimulation was presented and the center LED was extinguished. The

trial ended when the rat poked its nose and stayed for 200 ms in the hole adjacent to the center

hole, corresponding to the side where four LEDs were illuminated. Break time was defined as

the time duration between onset of peripheral stimulation and the moment when the rat broke

away from fixation in the center hole. Reaction time was defined as the time duration between

onset of peripheral stimulation and the moment when the rat poked its nose in the correct

response hole, provided that it stayed there for 200 ms.
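
The two timing measures defined in panel b can be computed from a trial's event log along the following lines. This is a minimal sketch, not the authors' acquisition software: the `Poke` record, the hole labels, and the millisecond timestamps are hypothetical, and only the 200 ms hold requirement and the two definitions are taken from the legend.

```python
from dataclasses import dataclass
from typing import List, Optional

HOLD_MS = 200.0  # required stay in the response hole (from the legend)


@dataclass
class Poke:
    """One nose poke; field names and hole labels are hypothetical."""
    hole: str        # e.g. "left" or "right"
    enter_ms: float  # time the nose entered the hole
    exit_ms: float   # time the nose left the hole


def break_time(stimulus_onset_ms: float, center_exit_ms: float) -> float:
    """Time from peripheral stimulus onset to leaving the center hole."""
    return center_exit_ms - stimulus_onset_ms


def reaction_time(stimulus_onset_ms: float,
                  pokes: List[Poke],
                  correct_hole: str) -> Optional[float]:
    """Time from stimulus onset to the first poke in the correct hole
    that is held for at least HOLD_MS; None if no poke qualifies."""
    for poke in sorted(pokes, key=lambda p: p.enter_ms):
        if poke.enter_ms < stimulus_onset_ms:
            continue  # pokes before stimulus onset do not count
        held_long_enough = poke.exit_ms - poke.enter_ms >= HOLD_MS
        if poke.hole == correct_hole and held_long_enough:
            return poke.enter_ms - stimulus_onset_ms
    return None


if __name__ == "__main__":
    # Hypothetical trial: a brief visit to the wrong side, then a valid response.
    trial = [Poke("left", 1300.0, 1350.0), Poke("right", 1600.0, 1900.0)]
    print(break_time(1000.0, 1180.0))
    print(reaction_time(1000.0, trial, "right"))
```

Because pokes into other holes are simply skipped, a trial with an initial visit to the wrong side still yields a reaction time, measured to the first qualifying poke in the correct hole.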

Figure 2. Data obtained with the reaction-time paradigm for rats. a) Mean reaction times

(ms) and standard deviations as a function of the number of distracters (abscissa) and the reward

magnitude (large reward, black line; small reward, gray line). RTs were faster for large reward

than for small reward. In large-reward trials, RTs were unaffected by distracters, whereas

distracters produced an increase in RT in small-reward trials. b) Reciprobit plot of reaction

times according to the LATER model (i.e., plotting the cumulative RT distributions on a probit

scale as a function of reciprocal RT). For each of the 8 conditions (2 Reward levels × 4 Distracter

levels), the actual data points (marked as small “+” symbols) are superimposed on the

least-squares fit line. The large-reward conditions are shown in black; the small-reward conditions are

shown in gray. Three groups of data distributions can be distinguished. Group a consists of the

four large-reward conditions (i.e., with 0, 1, 2, or 3 distracters), and thus shows four least-squares

fit lines (and their actual data points). These four data distributions are not significantly different

from each other. Group b contains only the small-reward condition without distracters; it shows

just one least-squares fit line and its actual data points. This data distribution is significantly

different from all other small-reward conditions. Finally, Group c consists of the three remaining

small-reward conditions (i.e., with 1, 2, or 3 distracters); shown in this group are three

least-squares fit lines and their actual data points. The differences between the distributions are

consistent with swiveling rather than parallel shifts. This suggests that the effects in RT are due

to changes in the starting point (or b) rather than the slope (r or r’) of the linear rise to threshold.
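
To illustrate how a reciprobit line is obtained and why a change in starting point swivels it whereas a change in rate shifts it in parallel, the sketch below simulates a LATER-style process and fits the line by least squares. It assumes that the rate of rise is normally distributed across trials and that RT equals the distance from starting point to threshold divided by the rate. The function names and all parameter values are hypothetical, and the plotting position (i - 0.5)/n and the use of 1/RT on the abscissa are choices made for this sketch. The fitting and significance tests used for the actual data are not reproduced.

```python
import random
from statistics import NormalDist


def simulate_later(distance, mu, sigma, n=2000, seed=1):
    """Simulate RTs as distance / rate, with rate ~ Normal(mu, sigma) per trial."""
    rng = random.Random(seed)
    rts = []
    while len(rts) < n:
        rate = rng.gauss(mu, sigma)
        if rate > 0:  # a non-positive rate would never reach threshold
            rts.append(distance / rate)
    return rts


def reciprobit_points(rts):
    """Return (x, z) pairs: x = 1/RT, z = probit of the cumulative RT proportion."""
    ordered = sorted(rts)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, rt in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position, strictly inside (0, 1)
        points.append((1.0 / rt, std_normal.inv_cdf(p)))
    return points


def least_squares_line(points):
    """Ordinary least-squares fit z = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


def fit_condition(distance, mu, sigma):
    """Simulate one condition and return (slope, intercept) of its reciprobit line."""
    return least_squares_line(reciprobit_points(simulate_later(distance, mu, sigma)))


if __name__ == "__main__":
    conditions = {
        "baseline": (1.0, 5.0, 1.0),
        "bias (starting point closer to threshold)": (0.8, 5.0, 1.0),
        "sensitivity (higher mean rate)": (1.0, 6.0, 1.0),
    }
    for label, params in conditions.items():
        slope, intercept = fit_condition(*params)
        print(f"{label}: slope={slope:.3f}, intercept at 1/RT=0: {intercept:.3f}")
```

With these settings, the bias condition should share the baseline intercept (the point at 1/RT = 0) while having a shallower slope, whereas the sensitivity condition should keep the baseline slope while its intercept moves.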
