INTEGRATION OF SENSORY AND REWARD INFORMATION

DURING PERCEPTUAL DECISION-MAKING IN LATERAL

INTRAPARIETAL CORTEX (LIP)

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF NEUROBIOLOGY

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Alan Edward Rorie

March 2011

http://creativecommons.org/licenses/by-nc-nd/3.0/us/

This dissertation is online at: http://purl.stanford.edu/pw673yq6957

© 2011 by Alan E Rorie. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.



I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

William Newsome, Primary Adviser


Brian Knutson


Tirin Moore


Krishna Shenoy

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

The work presented in this dissertation primarily focuses on

decision-related activity in the lateral intraparietal area (LIP) and,

secondarily, the dorsolateral prefrontal cortex (DLPFC). In Chapter 1 we

review the previous independent investigations indicating that these areas

are separately modulated by sensory information, value information and

choice appropriate to represent decisions. We argue that when both

sensory and value information must be simultaneously integrated to make

choices, it is unknown if, how and when these areas integrate these

factors. We present a behavioral paradigm in which animal subjects must

combine sensory and value information, on a trial-to-trial basis, to make

optimal choices. This paradigm is based on a well-known motion

discrimination task; however, in our task the magnitude of the reward

associated with each option varies from trial to trial. On some trials both

options are worth equally large or small rewards. On other trials one

option’s reward is greater than that of the other. In Chapter 2, we

demonstrate that in the unequal reward conditions subjects’ choices are

consistently biased towards the greater magnitude option. Additionally,

we will show that this bias is independent of the motion stimulus strength

and its magnitude is nearly optimal. In Chapter 3, we observe that single

neurons in cortical area LIP consistently, simultaneously and dynamically

represent both sensory and value information. We will argue that this

representation supports an integrator model of decision making, in which

sensory information is accumulated until the decision is resolved by a

threshold crossing. Our results support an interpretation of this model in

which value information adjusts the likelihood of a threshold crossing by

raising or lowering the accumulator's initial state. In Chapter 4, we

present a preliminary comparison between LIP and DLPFC activity, under

identical conditions, suggesting they play fundamentally different roles in

decision making. In Chapter 5, we discuss future lines of research.

Table of Contents

Chapter 1
1.1 Foreword
1.2 The neurobiology of decisions
1.3 The neurophysiological study of decisions
1.3.1 Decision-related signals in LIP
1.3.1.1 Evaluation of sensory evidence in LIP
1.3.1.2 Representation of value in LIP
1.4 Integration of sensory and value information in a common currency
1.5 Studying the integration of sensory and value information
Chapter 2
2.1 Introduction
2.2 Methods
2.2.1 A Motion Discrimination Task With Multiple Reward Contingencies
2.2.2 Subjects
2.2.3 Procedures
2.3 Results
2.3.1 Relative Reward Biases Choice
2.3.2 Estimating the optimal bias
2.3.3 Modeling caveats
2.3.4 “No Choice” Analysis
2.3.5 Saccade Latency
2.4 Discussion
2.4.1 Sensory and value information are additive
2.4.2 Monkeys are capable of near-optimal performance
2.5 Summary
Chapter 3
3.1 Introduction
3.2 Methods
3.2.1 Subjects
3.2.2 Physiological Recordings
3.2.3 Cell selection
3.3 Results
3.3.1 Activity during delayed saccades
3.3.2 The representation of choice, absolute value, relative value and motion coherence in LIP
3.3.2.1 Representation of choice: qualitative description
3.3.2.2 Representation of absolute value: qualitative description
3.3.2.3 Representation of relative value: qualitative description
3.3.2.4 Representation of motion coherence: qualitative description
3.3.2.5 Quantifying LIP dynamics: absolute value, relative value, motion coherence and choice
3.3.2.6 Quantifying LIP dynamics: absolute value, relative value and motion coherence within choice
3.3.2.7 Quantifying coherence within reward condition
3.3.3 Do individual LIP neurons integrate sensory and value information?
3.3.4 Common Currency
3.3.5 Population heterogeneity
3.4 Discussion
3.4.1 The dynamic representation of absolute value, relative value, coherence and choice
3.4.2 Relation to the integrator/accumulator model of decision making
3.4.2.1 Relative value imposes an additive offset to the accumulator’s initial state
3.4.2.2 Coherence effects are consistent with the integrator model
3.4.3 Does LIP integrate sensory and value information in a common currency?
3.4.4 Representation of value and probability of choice in LIP
3.5 Summary
Chapter 4
4.1 Introduction
4.2 Methods
4.2.1 Physiological Recordings
4.3 Results
4.3.1 Cell selection and delayed saccade task
4.3.2 Population Response
4.3.3 Population heterogeneity
4.3.4 Discussion
Chapter 5
5.1 Summary and conclusions
5.2 Future directions
5.2.1 Common currency
5.2.2 Reaction time discrimination
5.2.3 Mapping utility with additional reward magnitudes
References

List of figures

Figure 1. A two-alternative, forced-choice, motion discrimination task with multiple reward contingencies
Figure 2a-d. Relative reward biases choice
Figure 3a-b. Bias is consistent across all experiments and coherence
Figure 4a-d. Harvesting efficiency is a function of bias
Figure 4e-f. Monkeys’ bias is greater than the optimal bias, which is a function of psychophysical sensitivity and specific coherence values
Figure 4g-i. Despite over-bias, monkeys harvest a majority of rewards
Figure 5a-b. PMF slopes are independent of reward conditions, and bias is similar for both relative reward conditions
Figure 6a-b. Fraction of no-choice trials varies with task epoch
Figure 7a-b. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs
Figure 8a-b. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs for most coherences
Figure 9a-b. Effect of motion coherence and reward condition on saccade latency
Figure 10a-b. Delayed saccade task used to identify LIP response fields
Figure 10c-d. Mean response to the delayed saccade task
Figure 10e. In the discrimination experiments one response target was positioned within the RF of the neuron under study
Figure 11a-b. LIP represents the absolute value of the option in the RF
Figure 12a-b. LIP represents the relative value of the option in the RF
Figure 13a-b. LIP represents the relative value of the option in the RF
Figure 14a-b. Effect of motion coherence for monkey A and monkey T
Figure 15a-b. Quantifying the dynamics of absolute value, relative value, motion coherence and choice
Figure 16a-b. Quantifying the dynamics of absolute value, relative value within choice for monkey A
Figure 16c. Quantifying the dynamics of coherence within choice for monkey A
Figure 17a-b. Quantifying the dynamics of absolute value, relative value within choice for monkey T
Figure 17c. Quantifying the dynamics of coherence within choice for monkey T
Figure 18a-b. The effect of motion coherence is independent of reward condition
Figure 19a-f. Individual LIP neurons integrate sensory and value information
Figure 20a-b. To calculate nEVS we modeled the log odds of a spike occurring with a logistic regression model without a factor for choice
Figure 21a-b. Venn diagrams depicting a larger percentage of neurons is simultaneously and significantly modulated by absolute reward
Figure 22a-b. LIP integrates reward information and motion coherence in a common currency
Figure 23a-b. Examples of single cells with responses nearly identical to their population average
Figure 23c-d. A single cell demonstrating no choice-related activity in the discrimination task, despite being well tuned in the delayed saccade task
Figure 23e. A single cell demonstrating choice-related activity only
Figure 24a-b. Anatomical magnetic resonance imaging PFC recording site
Figure 25a-b. In contrast to LIP, neurons in the PFC tended to be less selective, responding similarly to targets positioned anywhere within the contralateral hemifield
Figure 26. Average DLPFC response
Figure 27. A single DLPFC neuron
Figure 28. A single DLPFC neuron
Figure 29. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch
Figure 30. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch

Chapter 1

1.1 Foreword

This dissertation is about decisions. Specifically, it is about how

individual factors combine to generate choice. While I write this

dissertation, the people of the United States of America are deliberating a

momentous decision, one widely believed to be one of the most significant

in contemporary history: who will be the next President. At few other

points in history have so many people been asked to integrate such a vast

amount of varied information to make a single decision.

Ultimately, the decision is based upon information which predisposes

or biases voters to one or the other candidate. For example, one is from the

Democratic party, one is from the Republican party; one is young, one is

old; one is pale-skinned, one is dark-skinned; one has a military background,

one an academic background. As powerful and as useful as these biases can

be when making decisions, we know that to make an optimal decision, we

should not be unduly influenced by our biases. Instead, we should wisely

integrate them with the current information we have about the state of this

country. For example, a voter may inherently prefer an academically trained

candidate but in war time might choose the militarily trained one. Another

voter might be biased against the younger candidate but choose him as a

symbol of change.

Our capacity to incorporate multiple sources of information permits

dynamic decisions, in which prior experience is integrated with the current

situation. Without this capacity, our choices would calcify with history’s

deposition or slavishly meander with the moment. Understanding the

biological foundation and limitation of this capacity ultimately elucidates

how people resolve complex decisions into the choices, political and

otherwise, which define our behavior and shape our world.

1.2 The neurobiology of decisions

A core question in neuroscience is how brains generate behavior—that

is, how they select physical responses appropriate for the current

environment. Obviously the brain must detect, represent and parse relevant

information originating in the environment and respond by activating the

relevant organs, be it the stomach or the eye muscles. Indeed, sensing and

acting comprise the two sides of an arch defining all behaviors. The historic,

central focus of behavioral neuroscience is reconstructing this arch by

tracing the neurophysiological correlates of these sensory and motor systems

inward from the periphery.

This process delineated sets of cortical areas, traditionally viewed as

exclusively part of sensory or motor systems. The areas within the sensory

system were defined by their specific responses to specific features (e.g.

pressure or leftward motion) present in the environment. Those along the

motor arc were defined by their responses in anticipation of, or concurrently

with, actions, and encoded their specific features (e.g. velocity and duration).

Research on both arcs, however, converged on several “association” cortical

regions, notably the frontal and parietal areas, whose activity is neither

clearly sensory nor motor. For example, neurons in the lateral intraparietal

area (LIP) and the dorsolateral prefrontal cortex (DLPFC) respond to

stimuli but continue responding when the stimulus is gone; they indicate

what general action will be taken long before the action is initiated. These

areas also represent aspects of a behavior that are neither sensory nor motor, such as

the magnitude of a reward expected for an action.

The historic tendency to parse cortical areas as either sensory or motor

has hindered the clear articulation of this long-recognized link between

sensation and action. In the past ten years, however, research on the neural

basis of decisions has allowed remarkable progress towards articulating this

link by revealing that: 1) decisions provide a conceptual and computational

framework useful for parameterizing how sensations are evaluated into

actions; 2) decision-based models of neural activity are potentially capable

of capturing the wide variety of information encoded in several “association”

areas; and 3) the demarcations between sensing, deciding and motor

planning (particularly deciding and motor planning) are blurry and behavior

dependent (for review see 19, 24, 57, 61, 71).

Decisions are deliberative evaluations of information about options that

may be difficult to distinguish, be ambiguous, have variable payoffs, or

require prior knowledge to resolve (24, 61). Decision-related signals must,

therefore, represent the evaluation of information, in contrast to the

information itself. For example, if you decide a noisy motion stimulus

moved leftward, a decision signal must represent a leftward evaluation, even

if the motion truly went rightward. Decisions are resolved into choices,

which in turn guide action. Thus, decision-related signals must precede, or

concur with, choice-signals and represent all factors ultimately influencing

choice (e.g. sensory information, value information, prior probabilities, etc.)

(27).

1.3 The neurophysiological study of decisions

While the neurophysiological basis of decision-making has been

explicitly investigated in a range of contexts, previous studies focused

largely on either sensory or value-based decisions (see 24 for review).

Studies of sensory decisions typically require subjects to perform

comparisons or discriminations of sensory stimuli. These stimuli typically

span psychophysical thresholds, creating a range of ambiguities requiring

resolution through a decision. In general, these studies consistently find

that: 1) a majority of neurons in traditional sensory areas primarily encode

dimensions of the stimulus itself (5, 29); 2) activity in prefrontal and parietal

areas represents the graded, decision-related evaluation of the sensory

information (38, 55-56, 65); 3) traditional motor areas (i.e. FEF and MPC)

can also exhibit decision-related activity but only when a sensory stimulus is

directly linked to a specific motor response (23, 30).

For example, in a tactile discrimination task, monkeys are trained to

report which of two vibrations, applied sequentially to the finger, is higher in

frequency (51, for review see 24 and 57). Single neurons in sensory area S1

monotonically increase their activity with increasing frequency, representing

each vibration’s frequency in turn (29). In contrast, some neurons in parietal

area S2, the lateral prefrontal cortex and medial premotor cortex have

temporally dynamic responses representing the emerging difference between

the frequencies (55-56). This difference is a decision variable, and its

representation emerges during the second vibration, when the decision is

made. A similar pattern of responses is observed during a visual motion

discrimination task, discussed in detail below. Additionally, in visual search

tasks, when monkeys must decide which target in an array of distractors has

a specific conjunction of features, neurons in the frontal eye fields, a region

of the prefrontal cortex involved in moving the eyes, distinguish the target

from distractors and represent distractors in a graded fashion based on

their similarity (59-60).

Value-based decisions are studied in the context of “free choice” tasks

permitting subjects to choose options volitionally. In these studies, an

option’s reward statistics (typically reward magnitude or probability) are

dynamically manipulated by the experimenter, and these manipulations are

hidden from the subjects. From the subject’s perspective, this generates

ambiguities in regard to the options. Subjects resolve these ambiguities

through a decision process of valuation. Reward statistics in free choice

tasks have been manipulated through foraging behavior (18, 31, 42, 70) and

competitive games (3, 13, 43-44, 63, 68). These electrophysiological

studies, and a host of human fMRI studies, have revealed signals in frontal

and parietal cortical areas capable of supporting valuation (36, 39, 50, 75).

Importantly, free choice tasks contrast with instructed choice tasks

(discussed below). Instructed choice tasks are not useful for studying

value-based decisions, but are widely used to demonstrate the representation

of value in a range of frontal and parietal areas (1, 45, 52, 73-74).

An example of a foraging-based, free-choice task is the “matching”

paradigm in which monkeys were permitted to freely choose between two

options with probabilistic reward baiting (42, 70). When this probability

was adjusted across blocks of trials, the monkeys adjusted their probability of

choosing a given option to “match” the fraction of rewards recently

experienced from that option. A computational analysis of the behavior

revealed that monkeys base their valuation on the temporal integration of

prior rewards. Electrophysiological recordings revealed that parietal area

LIP represented the resulting valuation (70).
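As a rough illustration of valuation by temporal integration of prior rewards, the sketch below keeps a leaky running tally of the rewards recently earned from each option and sets the probability of choosing an option to its fractional share of that tally. The function names, the exponential form of the filter and the time constant tau are assumptions made for illustration; they are not the fitted model from the studies cited above.

```python
def update_income(income, chosen, rewarded, tau=10.0):
    """Leaky integration of reward history.

    income   -- list of two running reward tallies, one per option
    chosen   -- index (0 or 1) of the option chosen on this trial
    rewarded -- True if the choice was rewarded
    tau      -- hypothetical time constant, in trials
    """
    decay = 1.0 - 1.0 / tau
    # Both tallies decay on every trial, so old rewards count for less.
    new_income = [value * decay for value in income]
    if rewarded:
        # Only the chosen option is credited with the new reward.
        new_income[chosen] += 1.0 / tau
    return new_income


def choice_probability(income):
    """Probability of choosing option 0: its share of recent reward income."""
    total = income[0] + income[1]
    if total == 0:
        return 0.5  # no reward history yet, so choose at chance
    return income[0] / total
```

Under this sketch, an option that has supplied three quarters of the recent reward income is chosen on about three quarters of trials, which is the sense in which choice fractions "match" reward fractions.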

1.3.1 Decision-related signals in LIP

LIP, located within the lateral bank of the intraparietal sulcus, has been

widely studied in the context of both sensory and value-based decisions.

Within LIP is a population of neurons typically defined by their tendency to

become active when a particular region of the visual field, called the

response field (RF), is the focus of attention (10-11, 26) or the target of an

upcoming saccadic eye movement (2, 20, 67).

Early investigation of this “into the RF” activity was largely framed by

attempts to associate this activity with either an attentional, sensory signal,

or an intentional, motor signal (19). Subsequent experiments (52, 65),

however, suggested this activity is better defined by a decision process

capable of bridging the gap between sensation and action. These

experiments demonstrated that LIP independently represents the evaluation

of the sensory evidence supporting a choice into the RF and the value of

the option in the RF.

1.3.1.1 Evaluation of sensory evidence in LIP

An extensive series of experiments on decision-related signals in LIP,

including those in this thesis, were conducted in the context of a two-

alternative, forced-choice motion discrimination task. On each trial of this

task monkeys observe a noisy, random-dot motion stimulus and report

which of two possible directions of motion was present by making a

saccadic eye movement to a target. On a trial-to-trial basis the difficulty of

this decision is manipulated by adjusting the proportion of dots moving

coherently in one of the two directions. When the coherence is greater, the

decision is easier and the monkeys make fewer errors.
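The relationship between coherence and accuracy is often summarized with a psychometric function. The sketch below uses a logistic form, in which signed coherence (positive for one direction, negative for the other) sets the probability of choosing the positive direction. The logistic form, the function names and the slope value are assumptions for illustration, not parameters taken from this task.

```python
import math


def p_choose_positive(signed_coherence, slope=10.0):
    """Probability of reporting the positive direction.

    signed_coherence -- fraction of coherently moving dots, in [-1, 1];
                        the sign gives the direction of motion
    slope            -- hypothetical sensitivity parameter
    """
    return 1.0 / (1.0 + math.exp(-slope * signed_coherence))


def p_correct(coherence, slope=10.0):
    """Probability of a correct report at a given unsigned coherence."""
    return p_choose_positive(abs(coherence), slope)
```

At zero coherence the function returns 0.5 (chance), and the probability of a correct report rises toward 1 as coherence increases.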

Using this task, investigators initially identified neurons in two visual

cortical areas, MT and MST, representing the motion information itself, in

a manner necessary and sufficient for perception (5-7, 58, 64). Neurons in

these regions responded to specific motion directions, reflecting the amount

of motion energy present in that direction, and ceased their activity when the

stimulus was removed.

Subsequently, Shadlen and Newsome (64-65) demonstrated that LIP

neurons, in contrast to MT, represent a decision about the motion stimulus.

During motion discrimination, LIP begins encoding the choice to saccade

into the RF. This response, however, is initially modulated by the motion

coherence. This modulation is not motor related because it is independent of

the specific parameters of the motor response. Additionally, the modulation

is not a sensory response because, unlike MT, it reflects the monkey’s

erroneous choices. The authors reasoned that this modulation represents the

monkey’s evaluation of the motion evidence supporting a choice to saccade

to the target in the RF.

Based on these and other subsequent results, Shadlen and colleagues

modeled LIP as an evidence accumulator or integrator (48, 65). In this

model LIP accumulates a decision variable, which can be thought of as the

weight of evidence supporting a choice, to a threshold. When this threshold

is crossed the decision resolves to a choice. In the motion discrimination

task the decision variable is the accumulated difference between opposing

motion sensors in MT. Additionally, Gold and Shadlen (21) suggest that

this decision variable is proportional to the logarithm of an option’s

likelihood ratio (discussed in greater detail below) and thus capable of

additively incorporating additional factors such as option value.
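The accumulate-to-threshold idea can be sketched as a noisy running sum that stops when it reaches either of two bounds. In the sketch below, drift stands in for the net motion evidence and start for an additive offset to the initial state. All names, parameter values, the symmetric bounds and the Gaussian noise are assumptions for illustration, not the published model.

```python
import random


def accumulate_to_bound(drift, start=0.0, bound=1.0, noise_sd=1.0,
                        dt=0.005, max_time=5.0, rng=None):
    """Simulate one trial of a bounded accumulator.

    Returns (choice, decision_time). choice is +1 if the upper bound is
    reached first, -1 if the lower bound is reached first, and 0 if
    neither bound is reached before max_time.
    """
    if rng is None:
        rng = random.Random()
    x = start
    t = 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with the square root of dt
    while t < max_time:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if x >= bound:
            return 1, t
        if x <= -bound:
            return -1, t
    return 0, t


def fraction_upper(drift, start, n_trials=500, seed=1):
    """Fraction of simulated trials that end at the upper bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        choice, _ = accumulate_to_bound(drift, start, rng=rng)
        if choice == 1:
            hits += 1
    return hits / n_trials
```

With zero drift, moving the starting point toward the upper bound raises the fraction of upper-bound choices, which illustrates how an additive offset could bias choice without changing the evidence itself.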

This model is well developed computationally (22-24, 48) and predicts

LIP responses to several manipulations of the motion discrimination task

including: allowing the monkey to respond freely (54), additional

alternatives (9) and variable motion durations (37). Additionally, the model

is general enough to accommodate non-sensory decision variables, such as

elapsed time (34, 46).

Additional support for the model has come from intracortical

microstimulation during the motion discrimination task. Microstimulation

of MT (12, 58) has confirmed predicted changes in choice and reaction time

resulting from manipulations of the sensory representation of motion

direction. Similarly, microstimulation of LIP influences the decision process

by introducing an additive offset to the integrator resulting in a small bias in

choice and a larger effect on reaction time (28). Finally, Gold and Shadlen

(23) stimulated the frontal eye fields while monkeys evaluated the motion

evidence and demonstrated that decisions are represented in the evolution of

oculomotor commands, consistent with the integrator model.

1.3.1.2 Representation of value in LIP

The same population of LIP neurons implicated in the evaluation of

sensory evidence is also modulated by the value of the option within the

RF. These modulations are observed in the free-choice, matching paradigm

discussed above as well as in an instructed choice task. Platt and Glimcher

(52) instructed monkeys to saccade to targets associated with either large or

small magnitude rewards placed within an RF. They reported that while the

monkey awaits the saccade’s instruction, LIP represents an option’s greater

reward magnitude with a greater firing rate. In an additional manipulation,

they found that LIP encoded the probability that a saccade into the RF would

be instructed and thus rewarded.

1.4 Integration of sensory and value information in a common currency

As discussed above, single neurons in LIP represent sensory and value

signals supporting decisions about an option within the RF. If LIP activity

represents decisions, then it should encode all factors ultimately influencing

choice. It has been proposed that LIP integrates the multitude of factors

momentarily influencing shifts in gaze (choices) or visual attention in a

“common currency” (21, 70-71). A common currency implies that diverse

information is encoded in a “currency,” or scale, dependent on its “common”

influence on behavior. Evidence of a common currency for reward signals

has been demonstrated in rats capable of trading-off combinations of natural

rewards with an artificial reward signal introduced through microstimulation

(for review see 66). This suggests the natural reward signals are scaled and

converge to a singular representation before they are traded off with the

artificial reward.

The idea of a common currency is also fundamental to the integrator

model of LIP. Gold and Shadlen (21) posit the logarithm of the likelihood

ratio (logLR) as a neural common currency for combining sensory and value

information, and suggest that a quantity proportional to logLR is the

decision variable represented by LIP. An option’s likelihood ratio (LR)

describes the likelihood that the current evidence would be observed if that

option were correct, relative to the likelihood that it would be observed if the

alternative were correct. Through multiplication the LR can be updated to

include other factors including additional evidence, prior probabilities and

value. Importantly, Gold and Shadlen point out that by taking the logarithm

of the LR (the logLR), these factors can be accumulated additively.
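The step from multiplication to addition can be written out explicitly. The notation below is introduced here for illustration: h_1 and h_2 are the two alternatives, and e_1, ..., e_n are pieces of evidence assumed to be conditionally independent given each alternative. Only evidence and prior probability are shown; the passage above does not specify the form of the value term, so none is assumed here.

```latex
% h_1, h_2: the two alternatives; e_1, ..., e_n: pieces of evidence.
% Assumption: the e_i are conditionally independent given each alternative.
\[
\mathrm{LR}(e) = \frac{P(e \mid h_1)}{P(e \mid h_2)}
\]
% Bayes' rule in odds form: the prior odds are multiplied by each LR.
\[
\frac{P(h_1 \mid e_1,\dots,e_n)}{P(h_2 \mid e_1,\dots,e_n)}
  = \frac{P(h_1)}{P(h_2)} \prod_{i=1}^{n} \frac{P(e_i \mid h_1)}{P(e_i \mid h_2)}
\]
% Taking logarithms turns the product into a sum.
\[
\log \frac{P(h_1 \mid e_1,\dots,e_n)}{P(h_2 \mid e_1,\dots,e_n)}
  = \log \frac{P(h_1)}{P(h_2)} + \sum_{i=1}^{n} \log \mathrm{LR}(e_i)
\]
```

In this form each new piece of evidence adds one term to a running total, which is the kind of quantity an accumulator could carry.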

1.5 Studying the integration of sensory and value information

The work presented in this dissertation primarily focuses on decision-

related activity in the lateral intraparietal area (LIP) and, secondarily, the

dorsolateral prefrontal cortex (DLPFC). As discussed above, independent

investigations indicate these areas are separately modulated by sensory

information, value information and choice appropriate to represent decisions.

When both sensory and value information must be simultaneously integrated

to make choices, however, it is unknown if, how and when these areas

integrate these factors. It is important to understand how these areas process

decisions requiring the dynamic combination of sensory and value

information because a majority of real-world decisions are based on both

factors.

To investigate this we have developed a behavioral paradigm in which

animal subjects must combine sensory and value information, on a trial-to-

trial basis, to make optimal choices. This paradigm is based on the well-

known motion discrimination task, discussed above, in which subjects must

view a noisy motion stimulus and report which of two opposed directions

they perceived. In our task the magnitude of the reward associated with

each option varies from trial to trial. On some trials both options are worth

equally large or small rewards. On other trials one option’s reward is greater

than that of the other.

In Chapter 2, we first demonstrate that in the unequal reward

conditions subjects’ choices are consistently biased towards the greater

magnitude option. Additionally, we will demonstrate this bias is independent

of the motion stimulus strength and its magnitude is nearly optimal. In

Chapter 3, we will demonstrate that single neurons in cortical area LIP

consistently, simultaneously and dynamically represent both sensory and

value information. We will argue that this representation supports an

integrator model of decision making, in which sensory information is

accumulated until the decision is resolved by a threshold crossing. Our

results support an interpretation of this model in which value information

adjusts the likelihood of a threshold crossing by raising or lowering the

accumulator’s initial state. In Chapter 4, we present a preliminary

comparison between LIP and DLPFC activity, under identical conditions,

suggesting they play fundamentally different roles in decision making. In

Chapter 5, we discuss future lines of research.

Chapter 2

2.1 Introduction

The goal of the following experiments is to study the behavior of rhesus

monkeys in a decision-making paradigm requiring the dynamic combination

of sensory and value information. To accomplish this, our behavioral

paradigm borrows elements from paradigms previously used to isolate either

the sensory or value components of a decision.

The sensory component of our task is based on a two-alternative,

forced-choice, direction discrimination task used to study sensory-based

decisions (28, 32, 64). On each trial monkeys observed a noisy, random-dot

motion stimulus and reported which of two possible directions of motion

was present by making a saccadic eye movement to one of two

corresponding targets. On a trial-to-trial basis the difficulty of this decision

is manipulated by adjusting the proportion of dots moving coherently in one

of the two directions. When the coherence is greater, the decision is easier

and the monkeys make fewer errors. As discussed above, this task has been

used to identify decision related neural signals corresponding to the weight

of evidence supporting a decision. This paradigm arms us with strong

predictions of how the monkeys’ choices are influenced by task parameters,

and an array of well-honed analysis tools to quantify this behavior.

To this sensory decision we have added a very simple value element by

changing the magnitude of the reward associated with correct choices of

each option. On each trial we overtly inform the monkey, with a visual cue,

what volume of juice reward he will receive for correctly choosing (as

defined by the motion coherence) each option. As mentioned in Chapter 1,

in the context of a free-choice, “matching” paradigm, monkeys allocate their

responses in proportion to a target’s subjective, relative value. That is, they

are biased towards choosing targets of greater relative value. However, this

and other studies of value have been limited in only addressing the influence

of relative value, as opposed to absolute reward value. We have designed

our behavioral paradigm to vary both relative and absolute reward value.

The ideas of absolute and relative value are simple and fundamental to

our behavioral paradigm. Absolute value refers to an option’s value

independent of other potential options, while relative value refers to an

option’s value in relation to alternatives. For example, consider two options

offering an equally small reward: each has a small absolute value, but

because neither option is more valuable than the other, neither has any

relative value. Similarly, two options offering an equally large reward each

have a larger absolute value (compared to the two equally small offers), but

again, no relative value. When, however, one option offers a large reward

and the other a small reward, along with their absolute values, each option

has a relative value that is larger or smaller than the alternative.

We know monkeys will be biased toward choosing options with greater

relative value and we know they will also be more likely to choose

options supported by greater coherence. In the following behavioral

experiment, we ask how monkeys integrate these factors on a trial-to-trial

basis. To encourage the monkeys to consider both of these factors on every

trial, we have incorporated these multiple reward contingencies into our

motion discrimination task so that on some trials they conflict.

For example, on some trials in which the options differ in relative value,

the motion coherence is towards the lower value target. In these conditions

the monkeys could ignore the motion stimulus and always choose the option

with greater relative value. Alternatively, the monkeys could ignore the

greater reward and choose the option supported by greater motion coherence.

As we will demonstrate below, however, these extreme-bias behaviors are

sub-optimal for reward harvesting. Optimal behavior requires a bias of

moderate size which is dependent on the magnitude of the relative reward,

the range of coherences presented, and the monkey’s capacity to

discriminate the motion’s direction.

Along with determining how monkeys integrate motion coherence and

relative reward value, and whether they do so optimally, we will also be able

to determine the extent to which absolute reward value influences

performance. While we do not anticipate a bias in response to changes in

absolute reward, it is possible that when presented with two options, each

with a small value, the monkeys are less sensitive to the motion coherence

and make more errors. Additionally, when a small reward is certain

monkeys might be less likely to engage in the task overall.

2.2 Methods

2.2.1 A Motion Discrimination Task With Multiple Reward Contingencies

On each behavioral trial the monkeys observed a noisy random-dot

motion stimulus and reported which of two possible directions of motion was

present with a saccadic eye movement to the corresponding target. The

motion stimulus is composed of white dots, viewed through a circular

aperture, on a dark computer screen. On each trial a variable proportion of

the dots moved coherently in one of two opposite directions while the

remaining dots were flashed transiently at random locations and times (for a

detailed description see 5). The difficulty of discrimination was varied

parametrically, from trial-to-trial, by adjusting the percentage of the dots in

coherent motion: the task was easy if many of the dots moved coherently

(i.e. 50% or 100% coherence), but became progressively more difficult as

the coherence decreased.

Importantly, the coherence only describes the strength of the motion,

not its direction. In the data figures that follow, the direction of coherent

motion is indicated by “signing” the coherence. Thus +25% coherence and

–25% coherence are equally strong motion signals, but move in opposite

directions. Typically, the animals viewed a range of signed coherences

spanning psychophysical threshold. The animals were always rewarded for

indicating the correct direction of motion, except at 0% coherence where

they were rewarded randomly (50% probability) irrespective of their choice.

Figure 1 illustrates the sequence of events comprising a typical trial of

the motion discrimination task. From left to right, trials began with the onset

of a small dot that the monkey must visually fixate for 150 ms. Next, two

saccade targets appeared (hollow gray circles) for 250 ms. The two targets

were 10 degrees eccentric from the visual fixation point and 180 degrees

apart from each other. The targets were positioned in-line with the axis of

motion being discriminated. By convention, the target corresponding to

positive coherence is target 1 (T1) while the other is target 2 (T2). At the

end of the trial, the monkey reported his decision by making a saccadic eye

movement to one of these targets.

After 250 ms the targets changed color, indicating the magnitude of

reward available to the monkey for correctly choosing that

target. A blue target indicated a low magnitude (L) reward (1 unit, ~0.12 ml

of juice), while a red target indicated a high magnitude (H) reward (2 units).

As there were two reward magnitudes (H and L) to be assigned to each of two target

locations (T1 and T2), there were four reward conditions overall,

schematized by the vertical row of panels in Figure 1: 1) the LL condition in

which both targets were blue, 2) the HH condition in which both targets

were red, 3) the HL condition, in which T1 was red and T2 was blue, and 4)

the LH condition which was the mirror image of the HL condition.

The colored targets were visible for 250 ms before onset of the visual

motion stimulus which appeared for 500 ms, centered on the fixation point.

[Figure 1 schematic: trial epochs, from left to right, are Fixate, Targets, Reward, Motion, Delay, Go!]

Figure 1. A two-alternative, forced-choice, motion discrimination task with multiple reward contingencies. The sequence of events comprising a typical trial of the motion discrimination task. From left to right, trials begin with the onset of a fixation point. Two saccade targets appear and then change color indicating the magnitude of reward available for correctly choosing that target. A blue target indicates a low magnitude (L) reward, while a red target indicates a high magnitude (H) reward. There are four reward combinations, LL, HH, LH and HL, respectively depicted vertically. The visual motion stimulus appears centered on the fixation point. Following offset of the motion stimulus, subjects maintain fixation for a variable delay period after which the fixation point disappears, cueing the subjects to report their decisions with a saccade to the target corresponding to the perceived direction of motion. If the subjects choose the correct direction of motion, they receive the reward indicated by the color of the chosen target.

Following offset of the motion stimulus, the monkey was required to

maintain fixation for a variable delay period (300-550 ms) after which the

fixation point disappeared, cueing the monkey to report his decision with a

saccade to the target corresponding to the perceived direction of motion. If

the monkey chose the correct direction of motion, he received the reward

indicated by the color of the chosen target.
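The reward rule described above can be sketched as follows. This is an illustrative reconstruction, not the Rex code used in the experiments: the function name `reward_units`, the two-letter condition strings, and the `rng` argument are hypothetical, and the sketch assumes that a rewarded 0% coherence trial pays the amount indicated by the chosen target.

```python
import random

# Reward sizes in arbitrary units, as stated in the text: L = 1 unit, H = 2 units.
REWARD_UNITS = {"L": 1, "H": 2}


def reward_units(choice, signed_coh, condition, rng=random):
    """Return the reward (in units) earned on one trial.

    choice     -- "T1" or "T2"
    signed_coh -- signed coherence (%); positive means motion toward T1
    condition  -- two letters giving the T1 and T2 rewards, e.g. "HL"
    rng        -- object with a random() method (hypothetical hook for testing)
    """
    offered = {"T1": REWARD_UNITS[condition[0]], "T2": REWARD_UNITS[condition[1]]}
    if signed_coh == 0:
        # No correct answer exists: reward with 50% probability, whatever the choice.
        correct = rng.random() < 0.5
    else:
        # A choice is correct when it matches the sign of the coherence.
        correct = (choice == "T1") == (signed_coh > 0)
    return offered[choice] if correct else 0
```

For example, `reward_units("T1", 25, "HL")` returns 2 units, while the same choice at -25% coherence returns 0.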

Fixation was enforced throughout the trial by requiring the monkey to

maintain its eye position within an electronic window (1.25 degrees radius)

centered on the fixation point. Inappropriate breaks of fixation were

punished by aborting the trial and enforcing a time-out period before onset

of the following trial. Psychophysical decisions were identified by detecting

the time of arrival of the monkey’s eye in one of two electronic windows

(1.25 degrees radius) centered on the two choice targets (T1 and T2).
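A minimal sketch of this window logic, assuming eye position is available as (time, x, y) samples in degrees relative to the fixation point. The function names and the sample format are hypothetical; only the 1.25 degree radius and the 10 degree target eccentricity come from the text.

```python
import math

WINDOW_RADIUS_DEG = 1.25  # electronic window radius from the text
TARGET_ECC_DEG = 10.0     # target eccentricity from the text


def target_positions(axis_deg):
    """T1 and T2 lie 10 degrees from fixation, 180 degrees apart, on the motion axis."""
    a = math.radians(axis_deg)
    t1 = (TARGET_ECC_DEG * math.cos(a), TARGET_ECC_DEG * math.sin(a))
    return {"T1": t1, "T2": (-t1[0], -t1[1])}


def in_window(eye, center, radius=WINDOW_RADIUS_DEG):
    """True when the eye position lies within the circular window."""
    return math.hypot(eye[0] - center[0], eye[1] - center[1]) <= radius


def first_choice(samples, targets):
    """Return (time_ms, target name) for the first sample inside a target window.

    samples -- iterable of (time_ms, x_deg, y_deg); returns None if no window is entered.
    """
    for t, x, y in samples:
        for name, center in targets.items():
            if in_window((x, y), center):
                return t, name
    return None
```

The same `in_window` check, centered on the fixation point, would serve to detect breaks of fixation.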

All trials were presented pseudo-randomly in block-randomized order.

For monkey A, we employed 12 signed coherences, 0% coherence and four

reward conditions, yielding 52 conditions overall. For monkey T we

eliminated two of the lowest motion coherences because this animal’s

psychophysical thresholds were somewhat higher than those of monkey A.

Thus monkey T was tested for 36 conditions overall. We attempted to

acquire 40 trials for each condition, enabling us to characterize a full

psychometric function for each of the four reward conditions. Because these

behavioral data were obtained simultaneously with electrophysiological

recordings, however, we did not always acquire the full 40 trials for each

condition (the experiment typically ended when single unit isolation was

lost). For the data reported in this paper, the number of repetitions obtained

for each experiment ranged from 19 to 40 with a mean of 36. The full data

set analyzed in this paper consists of 35 experiments from monkey A and 26

experiments from monkey T.
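The condition counts above can be checked with a short sketch. The coherence magnitudes listed here are an assumption: the text gives only the number of coherences, so the values 6% and 24% in particular are hypothetical. Block randomization is read as presenting every condition once, in shuffled order, within each block.

```python
import random

REWARD_CONDITIONS = ("LL", "HH", "HL", "LH")

# Hypothetical coherence magnitudes (%); only the counts are given in the text.
MAGNITUDES_A = (1.5, 3, 6, 12, 24, 48)
MAGNITUDES_T = MAGNITUDES_A[2:]  # the two lowest magnitudes are dropped for monkey T


def build_conditions(magnitudes):
    """Cross every signed coherence (plus 0%) with the four reward conditions."""
    signed = [-m for m in magnitudes] + [0] + list(magnitudes)
    return [(coh, rew) for coh in signed for rew in REWARD_CONDITIONS]


def block_randomized(conditions, n_blocks, seed=0):
    """Shuffle the full condition list independently within each block."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        trials.extend(block)
    return trials
```

With these assumed magnitudes, `build_conditions(MAGNITUDES_A)` yields 52 conditions and `build_conditions(MAGNITUDES_T)` yields 36, matching the totals reported above.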

2.2.2 Subjects

Two adult male rhesus monkeys, A and T (12 and 14 kg), were trained on a

two-alternative, forced-choice, motion discrimination task with multiple

reward contingencies. Daily access to fluids was controlled during training

and experimental periods to promote behavioral motivation. Before training,

the monkeys were prepared surgically with a head-holding device (14) and a

scleral search coil for monitoring eye position (35). All surgical, behavioral,

and animal care procedures complied with National Institutes of Health

guidelines and were approved by the Stanford University Institutional

Animal Care and Use Committee.

2.2.3 Procedures

During both training and experimental sessions monkeys sat in a primate

chair at a viewing distance of 57 cm from a color monitor. Visual stimuli

were presented on the monitor under computer control. The monkeys’ heads

were positioned stably using the head-holding device, and eye position was

monitored throughout all experimental sessions by means of a magnetic

search coil apparatus (0.1° resolution; CNC Engineering, Seattle, WA).

Behavioral control and data acquisition were managed by a PC-compatible

computer running the QNX Software Systems (Ottawa, Canada) real-time

operating system.

The experimental paradigm was implemented in the NIH Rex

programming environment (Hays, Richmond, & Optican, 1982). Visual

stimuli were generated by a second PC-compatible computer and displayed

using the Cambridge Research Systems VSG (Kent, UK) graphics card and

accompanying software development tools. Liquid rewards were delivered

to the animals through a gravity-fed juice tube placed near the animal’s

mouth, activated by a computer-controlled solenoid valve. Subsequent data

analyses and computer simulations were performed on Apple Macintosh

(Cupertino, CA) computers in the Mathworks MATLAB (Natick, MA)

programming environment.

2.3 Results

2.3.1 Relative Reward Biases Choice.

Figures 2a-d depict psychometric functions (PMFs) describing each

monkey’s probability of choosing T1 (ordinate) as a function of motion

coherence (abscissa). As mentioned above, motion coherence is denoted

with a magnitude indicating the strength of the motion, and a sign indicating

its direction. Thus, +48% and -48% denote coherences of equal strength but

opposite direction. Positive coherence denotes motion towards T1 while

negative coherence denotes motion towards T2. A separate PMF is plotted

for each of the four reward conditions. The HH condition is plotted in red;

the LL in blue; the HL in black; and the LH in green. The circles depict the

observed proportion of T1 choices for each combination of coherence and

reward condition. The sigmoidal curves are fit quantitatively with logistic

regression. Figures 2a and 2b depict data from a representative experiment

for monkey A and monkey T respectively. Figures 2c and 2d depict the

average PMF across all behavioral sessions for monkeys A (n=35) and T

(n=25) respectively.

Our logistic regression model describes the log-odds-ratio of choosing

T1 as a function of the linear sum of several factors. In this model we have

included a factor for the coherence of the motion stimulus and the values of

each of the two targets, as described by equation 1:

Equation 1:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_{coh}(\mathrm{COH}) + \beta_{t1}(\mathrm{T1val}) + \beta_{t2}(\mathrm{T2val})$$

Where p is the observed probability of choosing T1; βcoh, βt1 and βt2 are

the fit coefficients representing the effect of motion coherence and target

value on this probability. β0 represents any global bias the monkey has

towards choosing T1. COH is a factor assigned for the coherence of the

motion stimulus, in fractional units of the maximum coherence and signed to

signify the direction as described above. Thus, COH has a range from -1 to

1, where -1 represents -48% coherence and +1 represents +48% coherence.

T1val and T2val are assigned either +1, if the target was H, or -1 if the target

was L. For example, on HL trials in which the motion coherence was -12%,

COH=-0.25, T1val=+1 and T2val=-1. Constraining these factors to be in the same

range (-1 to 1) allows us to directly compare the values of the fit coefficients.

Equation 1 can be rearranged to Equation 2, which is used to generate the

sigmoid functions seen in Figure 2.

Equation 2:

$$P = \frac{1}{1 + e^{-\left(\beta_0 + \beta_{coh}(\mathrm{COH}) + \beta_{t1}(\mathrm{T1val}) + \beta_{t2}(\mathrm{T2val})\right)}}$$
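As an illustrative sketch of how Equations 1 and 2 map a trial onto a choice probability: the coefficient values in `BETAS` are hypothetical numbers chosen for the example, not fitted values from this study, and the function names are introduced here.

```python
import math

MAX_COH = 48.0  # maximum coherence (%), used to scale COH into [-1, 1]

# Hypothetical coefficients for illustration only; not fitted values from the study.
BETAS = {"b0": 0.0, "coh": 5.0, "t1": 0.8, "t2": -0.8}


def log_odds_t1(coh_pct, condition, betas=BETAS):
    """Equation 1: log-odds of a T1 choice for a signed coherence and reward condition."""
    coh = coh_pct / MAX_COH                       # fractional, signed coherence
    t1val = 1 if condition[0] == "H" else -1      # +1 for a high-value T1, -1 for low
    t2val = 1 if condition[1] == "H" else -1      # same coding for T2
    return (betas["b0"] + betas["coh"] * coh
            + betas["t1"] * t1val + betas["t2"] * t2val)


def p_t1(coh_pct, condition, betas=BETAS):
    """Equation 2: probability of a T1 choice."""
    return 1.0 / (1.0 + math.exp(-log_odds_t1(coh_pct, condition, betas)))
```

With these assumed coefficients, an HL trial at -12% coherence has log-odds of 5(-0.25) + 0.8 + 0.8 = 0.35, so `p_t1(-12, "HL")` is about 0.59: T1 is chosen slightly more than half the time even though the motion favors T2.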

[Figure 2, panels a-d: % T1 Choices (ordinate, 0 to 100) plotted against % Coherence (abscissa, -48 to 48) for Monkey A and Monkey T.]

Figure 2. Relative reward biases choice. a-d Psychometric functions (PMF) describing each monkey’s probability of choosing T1 as a function of motion coherence. Motion coherence is denoted with a magnitude indicating the strength of the motion and a sign indicating its direction. Positive coherence denotes motion towards T1 while negative coherence denotes motion towards T2. Separate PMFs are plotted for each reward condition (HH, red; LL, blue; HL, black; LH, green). Circles depict the observed proportion, and sigmoidal curves are fit quantitatively with logistic regression. a-b Results from one representative experiment for monkey A and monkey T, respectively. c-d Average PMF across all behavioral sessions for monkeys A (n=35) and T (n=25), respectively.

Several features of the data presented in Figure 2 are notable. To begin,

consider the single behavioral session from monkey A plotted in Figure 2a.

First, the observed behavior for the HH and LL reward conditions (red and

blue circles, respectively) is nearly identical, indicating that the monkey’s

probability of choosing T1 is unaffected by changes in absolute reward.

Second, for each coherence the black and green circles, representing the

observed probabilities of a T1 choice for the HL (black) and LH (green)

conditions, are shifted vertically in relation to the HH and LL conditions.

The upward vertical shift in the HL condition (black) indicates that, across

all coherences, the monkey was more likely to choose T1, the higher value

target. The downward vertical shift in the LH (green) condition indicates the

opposite; the monkey was less likely to choose T1, the lower value target,

and more likely to choose T2, the higher value target. These data indicate

that the monkey’s choices are biased towards the target with higher relative

value. This bias results in corresponding leftward and rightward shifts of the

logistic model fit to the data for the HL (black) and LH (green) reward

conditions, respectively.

Figure 2c depicts the average (± s.e.m) behavior for monkey A, across

all behavioral experiments, in an identical style to Figure 2a. Note that this

average behavior is similarly fit with Equation 2 and shows similar results.

Across all behavioral experiments the monkey’s probability of choosing T1

is affected only by the motion coherence and changes in relative reward.

Figure 2d depicts the same results for monkey T. Thus, the bias resulting

from changes in relative reward is highly robust and reproducible both

within and across the two monkeys.

To quantify the magnitude of this bias, we measured the horizontal shift

between the HL and LL and between the LH and LL PMFs. This quantity,

the “behavioral equivalent visual stimulus” (bEVS), is in units of motion

coherence and corresponds to the amount of visual stimulus that would

produce an increase (or decrease) in T1 choices equal to that produced by

the increase (or decrease) in relative reward. Note that this approach to

quantifying the effect of relative reward is identical to that taken by Salzman

and colleagues to quantify the effects of MT microstimulation (58). bEVS is

defined by equation 3:

Equation 3:

$$\mathrm{bEVS} = \frac{\beta_{t1} - \beta_{t2}}{\beta_{coh}}$$

For example, in the PMF plotted in Figure 2a the bEVS for the HL

condition is 12.3% coherence. This means that increasing the motion

coherence towards T1 by 12.3% coherence and increasing the value of T1,

relative to T2, by one unit of reward, both exert equivalent effects on the

probability of a T1 choice. For the data shown in Figure 2b, bEVS=15.7%

coherence; for Figure 2c, bEVS=14.7% coherence; and for Figure 2d,

bEVS=16.3% coherence. Figures 3a and 3b depict population data as

frequency histograms of the bEVS measured in each behavioral experiment

in monkey A and monkey T, respectively. The solid red line in each figure

indicates the mean of the distribution (monkey A: ±15.4% coh; monkey T:

±17% coh), and the dotted lines indicate the s.e.m (monkey A: ±0.9393;

monkey T: ±1.3206). Both monkeys exhibited considerable day-to-day

variation in the size of the bias.

Note that in the logistic model this reward bias is expressed by the

addition of βt1 and βt2 (equations 1 and 2) and is independent of the

coherence term. To verify this we recomputed this logistic model with two

interaction terms to capture any coherence-dependent effect of reward.

Monkey A had a significant interaction between coherence and T1val in 6

(17.65%) behavioral sessions and between coherence and T2val in 8

(23.53%) sessions. Monkey T had a significant interaction between

coherence and T1val in 3 (12%) behavioral sessions and between coherence

and T2val in 8 (32%) sessions. Figures 3c and 3d plot the mean βt1 (blue)

and βt2 (red) coefficients for each coherence, after incorporating any

significant effects of coherence as mediated by significant interaction terms.

The flat lines in Figures 3c and 3d demonstrate that for monkey A and T,

βt1 and βt2 did not systematically vary, on average, as a function of

coherence. This suggests that, in mechanistic terms, the bias is equally

effective across all coherences. The appearance of a smaller bias effect at

larger coherences in the plots in Figure 2 is due to saturation of “percentage

T1 choices” (bound between 0 and 100) at high coherences. If the same data

were plotted in the log-odds space generated by Equation 1, the fits would be a

series of straight, parallel lines with the additive bias causing a single offset

across all coherences.
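The relationship between the additive reward terms, the bEVS, and the parallel lines in log-odds space can be illustrated with a short sketch. The coefficients are hypothetical values chosen for the example, not fitted values, and the rescaling of the bEVS ratio by the 48% maximum coherence is an assumption that follows from COH being expressed as a fraction of the maximum.

```python
MAX_COH = 48.0  # maximum coherence (%); COH in Equation 1 is a fraction of this

# Hypothetical coefficients for illustration only; not fitted values from the study.
B0, B_COH, B_T1, B_T2 = 0.0, 5.0, 0.8, -0.8


def bevs_pct(b_coh, b_t1, b_t2, max_coh=MAX_COH):
    """bEVS as the coefficient ratio (bt1 - bt2) / bcoh, rescaled to % coherence."""
    return (b_t1 - b_t2) / b_coh * max_coh


def log_odds(coh_pct, t1val, t2val):
    """Log-odds of a T1 choice under the additive logistic model."""
    return B0 + B_COH * coh_pct / MAX_COH + B_T1 * t1val + B_T2 * t2val


# Vertical offset between the HL and LL fits at several coherences.
cohs = [-48, -12, 0, 12, 48]
offsets = [log_odds(c, +1, -1) - log_odds(c, -1, -1) for c in cohs]
```

In this sketch every entry of `offsets` equals 1.6 log-odds units, so the HL and LL fits are parallel lines, and `bevs_pct(B_COH, B_T1, B_T2)` evaluates to 15.36% coherence, the horizontal shift that produces the same offset.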

2.3.2 Estimating the optimal bias

Intuitively we can understand that excessive bias, say always choosing

the high value target, results in fewer rewards earned, as the monkey makes

choices that are at odds with clear sensory evidence on high coherence trials.

Similarly, an under-bias increases the chance of selecting the low value

target under great uncertainty on low coherence trials. Some intermediate

bias level will be optimal for harvesting rewards, and this

optimal bias will depend on the monkey’s capacity to discriminate the

direction of motion. A perfect motion discriminator would always know

which choice is correct and should show no bias at all, whereas a very poor

discriminator, facing great uncertainty, should exhibit a larger bias.

Similarly, the optimal bias should also depend on the overall difficulty of the

motion stimulus set. A set largely composed of difficult stimuli requires a

larger bias than a set of easy stimuli.

[Figure 3, panels a-d: a-b, Count (ordinate) against Bias (% coherence, abscissa) for Monkey A and Monkey T; c-d, Reward Coefficient (ordinate, -1 to 1) against % Coherence (abscissa) for Monkey A and Monkey T.]

Figure 3. Bias is consistent across all experiments and coherences. a-b Frequency histograms of monkeys’ bias (% coherence) in each behavioral experiment; each distribution’s mean (±s.e.m.) is demarked in red (monkey A: 15.4±0.9393 % coh; monkey T: 17±1.3206 % coh). c-d Mean value of βt1 (blue) and βt2 (red) coefficients for each coherence after incorporating any significant effects of coherence as mediated by significant interaction terms.

How close do our monkeys come to establishing an optimal bias? To

address this question quantitatively we calculated the percentage of rewards

(in drops of juice) that a subject could harvest in principle across a range of

behavioral biases and relative reward ratios, given each monkey’s average

sensitivity to the visual stimulus. For each behavioral bias, assuming no

spatial bias, the probability of choosing T1 is given by:

Equation 4:

$$P = \frac{1}{1 + e^{-\left(\beta_{coh}(\mathrm{COH}) + B\right)}}$$

Where βcoh is a fit coefficient from equation 1, which defines the slope of

the average PMF from the normative LL condition (Figs. 2c and d, blue

curves), and B is a specific choice bias in units of percentage coherence.

COH denotes the actual motion coherence values experienced by each

monkey as described in Methods. Equation 5 incorporates equation 4 to

define Harvesting Efficiency (HE), our quantity of interest.

Equation 5:

$$HE(B, T1, T2) = \frac{\sum_{coh>0} P_{coh,B}\,T1 + \sum_{coh<0} \left(1 - P_{coh,B}\right) T2 + 0.5\left(P_0\,T1 + (1 - P_0)\,T2\right)}{N_{coh>0}\,T1 + N_{coh<0}\,T2 + 0.5\,(T1 + T2)}$$

The numerator is the total number of rewards (in drops of juice)

obtained by the hypothetical subject in a hypothetical experiment; the

denominator is the total number of rewards that became available during the

experiment. Figures 4a and 4b show the model results for monkeys A and

T, respectively. Here we plot the harvesting efficiency (color-coded surface)

as a function of choice bias (bEVS in %coh) on the abscissa, and T1:T2 ratio

on the ordinate. The monkeys in our experiments only experienced two of

the T1:T2 ratios in the plots of Figure 4 (1:1 and 2:1), but examination of the

entire surface is useful for understanding how reward ratio, bias, and

harvesting efficiency interact.
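A minimal sketch of this calculation follows; it is not the MATLAB code used for the figures. It assumes one trial per signed coherence, a hypothetical coherence set and βcoh value, and it converts a bias expressed in coherence units into log-odds by multiplying by βcoh, which is one reading of Equation 4. The optimal bias is located by a simple grid search.

```python
import math


def p_t1(coh, bias, b_coh):
    """Equation 4, with the bias entered in coherence units (an assumption)."""
    return 1.0 / (1.0 + math.exp(-b_coh * (coh + bias)))


def harvesting_efficiency(bias, t1, t2, cohs, b_coh):
    """Equation 5: expected reward earned divided by reward available."""
    earned = available = 0.0
    for coh in cohs:
        p = p_t1(coh, bias, b_coh)
        if coh > 0:            # T1 is correct
            earned += p * t1
            available += t1
        elif coh < 0:          # T2 is correct
            earned += (1.0 - p) * t2
            available += t2
        else:                  # 0% coherence: rewarded at random, 50% of trials
            earned += 0.5 * (p * t1 + (1.0 - p) * t2)
            available += 0.5 * (t1 + t2)
    return earned / available


def optimal_bias(t1, t2, cohs, b_coh, step=0.001):
    """Grid search for the bias (fraction of maximum coherence) that maximizes HE."""
    grid = [i * step for i in range(-1000, 1001)]
    return max(grid, key=lambda b: harvesting_efficiency(b, t1, t2, cohs, b_coh))


# Hypothetical inputs: coherence magnitudes (%) scaled by the 48% maximum, and a slope.
MAGNITUDES = (1.5, 3, 6, 12, 24, 48)
COHS = [s * m / 48.0 for m in MAGNITUDES for s in (-1, 1)] + [0.0]
B_COH = 5.0
```

Under these assumptions `optimal_bias(1, 1, COHS, B_COH)` is zero, while `optimal_bias(2, 1, COHS, B_COH)` is positive, that is, shifted toward the higher-value target.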

The surfaces exhibit two important features. First, as the reward ratio

increases, the optimal bias (peak HE) grows positively away from 0%

coherence. Thus, to maximize harvesting efficiency, the monkey must bias

its choices toward T1, and the amplitude of the bias should increase as the

reward ratio increases. The second feature is the striking asymmetry in HE

between large positive and negative biases. A larger-than-optimal bias is

punished less severely, in terms of harvesting efficiency, than a smaller-

than-optimal bias. Strategically, therefore, an animal should err on the side

of an over-bias.

These features are perhaps better appreciated in Figures 4c and 4d,

which plot for each monkey two horizontal slices through the surface in

Figures 4a and 4b—the two reward ratios actually experienced by each

animal. The blue and green horizontal curves are slices through T1:T2=1

and T1:T2=2, respectively, as experienced in the HH/LL and HL/LH

conditions. HE is plotted on the ordinate, as a function of choice bias

(bEVS) on the abscissa for both T1:T2 values.

[Figure 4, panels a-d: a-b, T1:T2 ratio (ordinate) against Bias (% coherence, abscissa, -40 to 40) for Monkey A and Monkey T; c-d, Harvesting efficiency (ordinate, 0 to 1) against Bias (% coherence, abscissa, -40 to 40) for Monkey A and Monkey T.]

Figure 4a-d. Harvesting efficiency is a function of bias. a-b Color-coded surfaces depicting harvesting efficiency (HE) as a function of bias and reward ratio. Maximizing harvesting efficiency requires a bias to T1 whose amplitude increases with the reward ratio. c-d HE plotted as a function of choice bias. Curves are horizontal slices through the surface in a and b at the two reward ratios used (T1:T2=1, HH/LL, blue and T1:T2=2, HL/LH, green). Blue vertical lines demark peak HE for T1:T2=1 (monkey A: 77.16%; monkey T: 84.75%). Green vertical lines demark peak HE for T1:T2=2 (monkey A: 80.02%; monkey T: 86.32%). The shifted peak indicates it is optimal to bias choices towards T1 (monkey A: 9.2%; monkey T: 6.8% coherence). Black vertical lines depict observed average biases (monkey A: 14.7%; monkey T: 16.3%).

When T1:T2=1 the peak harvesting efficiency (monkey A: 77.16%;

monkey T: 84.75%; blue vertical line) is achieved with no bias. When

T1:T2=2 (the HL condition, plotted in green) the peak of the HE curve

(green vertical line) is both elevated and shifted. The elevation means that,

if the bias is optimal, peak HE increases to 80.02% and 86.32% harvested

rewards for monkeys A and T, respectively. The shift of the peak indicates

that in this reward condition it is optimal to bias choices towards T1. From

these plots, which are derived from each animal’s average behavior (Fig. 2),

we can determine that the optimal bias for monkeys A and T is 9.2% and

6.8% coherence, respectively. However, as stated above, the observed

average biases (black vertical lines in Figures 4c and 4d) are larger: 14.7%

coherence for monkey A and 16.3% coherence for monkey T. In fact, both

monkeys exhibit a consistent over-bias across all the behavioral experiments.

This can be clearly seen in Figure 4e which plots the observed biases (%coh),

on the ordinate, and the calculated optimal bias (%coh) on the abscissa, for

monkey A (circles) and T (pluses) for all experiments.

Note that for a given reward ratio the optimal bias is a function of two

related factors. First, the slope of the PMF, defined by βcoh, which varies

across experimental sessions, and second, the specific set of coherences

experienced by each monkey. Figure 4f plots the optimal bias, on the

ordinate, as a function of βcoh, on the abscissa, for T1:T2=2. The curved

red line is for monkey A, the blue for monkey T. The straight vertical lines

demark the βcoh fit with Equation 1 for monkey A (red) and monkey T

(blue). Note that the red curve is above the blue, indicating that across all

βcoh monkey A’s optimal bias is greater than monkey T’s. This results

from the fact that monkey A experienced four stimulus conditions (-3%,

-1.5%, 1.5% and 3% coherence) that monkey T did not (see Methods).

Because these extra conditions are very low coherence, monkey A faced

more uncertainty than did monkey T, requiring a larger bias for optimal

performance.

Although both monkeys exhibit reward biases larger than optimal, they

do not pay much of a penalty for the over-bias. Inspection of Figures 4c and

4d, for example, reveals that the over-bias results in HEs of 79.40% and

85.8% for monkeys A and T respectively, which represents an HE penalty of

only 0.62% and 1.74% relative to the optimal bias. Figure 4g, which plots

the observed HE (ordinate) as a function of the optimal HE (abscissa),

demonstrates this point for all the behavioral experiments from monkey A

(circles) and T (pluses). Figures 4h and 4i depict frequency histograms

showing the distribution of the percentage of the optimal HE achieved by each

monkey in each experiment. Although the monkeys have a consistent

over-bias, they are still harvesting, on average, 98.6% (A) and 97% (T) of the

optimal HE.

[Figure 4, panels e-f: e, Actual bias (% coh, ordinate) against Optimal bias (% coh, abscissa); f, Optimal bias (% coh, ordinate) against βcoh (abscissa).]

Figure 4e-f. Monkeys’ bias is greater than the optimal bias, which is a function of psychophysical sensitivity and specific coherence values. e Observed bias plotted against calculated optimal bias for monkey A (circles) and T (pluses) for all experiments. f Optimal bias for T1:T2=2 as a function of βcoh, which varies across experimental sessions, and defines the PMF’s slope. Red and blue curves are for monkeys A and T, respectively. The straight vertical lines demark the βcoh fit with Equation 1 for monkey A (red) and monkey T (blue). Across all βcoh values, monkey A’s optimal bias is greater than monkey T’s because A experienced four additional coherences (-3%, -1.5%, 1.5% and 3% coherence) of greater uncertainty.

2.3.3 Modeling caveats

It is important to note that the model described by equation 1, which we

have used to quantify the monkeys’ behavior for all the above analyses,

constrains the resulting sigmoidal fits in two relevant ways. First, because

there is no interaction term between βcoh and either βt1 or βt2, the slope of

the PMF, which describes how accurately the monkey discriminates the

direction of motion, is constrained to be equal for each of the four reward

conditions. That is, we are assuming that the monkeys are equally willing

and able to discriminate the direction of motion in all four reward conditions.

This assumption can be tested directly by modeling the behavior from each

of the four reward conditions separately using equation 6.

Equation 6:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_{coh}(COH)$$

Equation 6 is similar to equation 1 but lacks terms for the target values, which are irrelevant since we are modeling within a reward condition. Figures 5a and 5b depict histograms of the resulting βcoh values from monkeys A and T, respectively. A one-way ANOVA revealed no significant difference in psychophysical sensitivity (βcoh) across the four reward conditions for monkey A (p=0.7980), but a weakly significant difference was detected in monkey T (p=0.06). Clearly, monkey A is equally able and willing to discriminate the direction of motion in all four reward conditions. The difference detected in monkey T is a weak trend toward slightly lower βcoh values (less psychophysical sensitivity) for the LL condition (Fig. 5b, blue bars; mean βcoh=5.0) in comparison to all other conditions (mean βcoh=5.8). We reran the optimality analysis to determine whether the slight reduction in psychophysical sensitivity for the LL condition affected the outcome. The effect was minimal: only a 1.2% coherence increase in the estimate of the optimal bias. We conclude that the assumption of equal psychophysical sensitivity across reward conditions is generally legitimate, and that small departures from equal sensitivity had little effect on our results.

Figure 4g-i. Despite over-bias, monkeys harvest a majority of rewards. g Observed harvesting efficiency plotted against optimal harvesting efficiency for all behavioral experiments from monkey A (circles) and T (pluses). Monkeys do not pay much of a penalty for their over-bias. h-i Frequency histograms showing the distribution of the percent of the optimal harvesting efficiency achieved by each monkey in each experiment. Although the monkeys have a consistent over-bias, they are still harvesting, on average, 98.6% (A) and 97% (T) of the optimal.
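As a rough illustration of the per-condition fit with Equation 6, the sketch below estimates β0 and βcoh by maximum likelihood using Newton's method on grouped choice data. It is a stand-in under stated assumptions rather than the original fitting code: the function name, the data layout (signed coherence as a fraction, number of trials, number of T1 choices) and the synthetic counts are all hypothetical.

```python
import math

def fit_equation6(cohs, n_trials, n_t1, iters=25):
    """Fit ln(p/(1-p)) = b0 + b_coh*COH by Newton's method on grouped data."""
    b0, b_coh = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for c, n, k in zip(cohs, n_trials, n_t1):
            p = 1.0 / (1.0 + math.exp(-(b0 + b_coh * c)))
            w = n * p * (1.0 - p)          # binomial variance weight
            g0 += k - n * p                # gradient with respect to b0
            g1 += (k - n * p) * c          # gradient with respect to b_coh
            h00 += w
            h01 += w * c
            h11 += w * c * c
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b_coh += (h00 * g1 - h01 * g0) / det
    return b0, b_coh

# Synthetic grouped data built from known parameters (hypothetical values).
# Expected (fractional) counts are used, so the fit should recover the generating parameters.
cohs = [-0.48, -0.24, -0.12, -0.06, 0.06, 0.12, 0.24, 0.48]
n_trials = [100] * len(cohs)
true_b0, true_b_coh = 0.3, 5.0
n_t1 = [n / (1.0 + math.exp(-(true_b0 + true_b_coh * c))) for c, n in zip(cohs, n_trials)]

b0_hat, b_coh_hat = fit_equation6(cohs, n_trials, n_t1)
print(f"b0 = {b0_hat:.3f}, b_coh = {b_coh_hat:.3f}")
```

Running the same fit once per reward condition would give one βcoh estimate per condition and session, which is the quantity compared across conditions in the text.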

Second, because the target value factors (T1val and T2val) differ only

in their sign, the magnitude of the lateral shift resulting from the behavioral

bias towards the high value target is constrained to be equal for the HL and

LH reward conditions. We can test this assumption by modeling each of the

reward conditions independently using equation 4. Figures 5c and 5d show

the absolute value of the bias in the LH condition as a function of the bias in

the HL condition for each behavioral experiment. A paired t-test of these

data reveals no significant difference in the distributions of the bias term in the

two reward conditions (monkey A: p=0.44; monkey T: p=0.24). The

choice bias differed substantially from experiment to experiment, but was

fairly reliable within an experiment, as evidenced by the positive correlation in Figure 5c. Even after dropping two outlier data points (filled points), the bias terms were significantly correlated in both monkeys (monkey A: r=0.565, p<0.001; monkey T, after dropping two outliers: r=0.482, p<0.05).

Figure 5. PMF slopes are independent of reward conditions, and bias is similar for both relative reward conditions. a-b Frequency histograms of βcoh for each reward condition (LH, HL, LL, HH) modeled separately using equation 6, for monkey A (a) and monkey T (b). There is no significant difference in psychophysical sensitivity (βcoh) across reward conditions for monkey A (one-way ANOVA p=0.7980). A weak but significant trend toward slightly lower values (less sensitivity) for the LL condition was detected in monkey T (one-way ANOVA p=0.06). c-d Absolute value of LH bias (% coh) plotted against the HL bias (% coh) for each behavioral experiment. A paired t-test revealed no significant difference in the distributions (monkey A: p=0.44; monkey T: p=0.24). Biases differed between experiments but were reliable within experiments (monkey A: r=0.565, p<0.001; monkey T, filled points dropped: r=0.482, p<0.05).
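The paired comparison of HL and LH biases can be sketched as follows. This is a minimal illustration, assuming one HL bias and one absolute LH bias per experiment; the values are hypothetical placeholders, not the recorded data, and the function name is an illustrative choice. Only the t statistic and degrees of freedom are computed here; a p-value would come from the t distribution (for example via `scipy.stats.ttest_rel`).

```python
import math

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)   # sample variance of differences
    return mean_d / math.sqrt(var_d / n), n - 1

# Hypothetical per-experiment biases in % coherence (not the recorded data).
hl_bias = [11.0, 14.0, 18.0]
lh_bias_abs = [10.0, 12.0, 15.0]
t_stat, dof = paired_t(hl_bias, lh_bias_abs)
print(f"t = {t_stat:.3f}, df = {dof}")
```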

2.3.4 “No Choice” Analysis

The preceding analyses were concerned exclusively with successfully

completed trials in which the monkey unambiguously chose T1 or T2. Recall

that successful completion of a trial required the monkey to: 1) maintain

fixation within an electronically defined “fixation window” until receipt of

the “go” signal, 2) initiate the operant saccade within 1 second of the

disappearance of the fixation point, and 3) execute a saccade that terminates

within the detection window surrounding the chosen target. On some trials,

however, these conditions are not met.

For example, the monkey’s eye position might leave the fixation

window before the fixation point disappears, or the monkey might not look

at one of the targets when the fixation point does disappear. These trials are

considered “no-choice” trials; such trials are aborted immediately and are

not included in our standard analyses of behavioral and electrophysiological

data. No-choice trials comprise roughly 9% of all trials (monkey A:

mean=9.17%; s.e.m±0.433; monkey T: mean=9.62%; s.e.m±0.99).

Although a no-choice trial can result from several different behaviors

including eye-blinks, eye-drift and errant or early explicit saccades, they all

reflect some sort of failure to engage the task in a sufficiently precise

manner. We analyzed these no-choice trials to determine whether they were

modulated by parameters of the behavioral paradigm, which might yield

additional insight into how the monkeys were influenced by motion

coherence and reward information.

Figures 6a and 6b plot the mean fraction of no-choice trials (mean in

white, ±s.e.m in black) across all experiments (ordinate) as a function of trial

time (aligned to target onset—abscissa) for monkeys A and T respectively.

Comparison of these two plots reveals that the monkeys show very similar

patterns of no-choices. During the reward cue epoch (250-500 ms), in which

the monkey first learns the reward condition for that trial, the fraction of no-

choices increases transiently. As the monkeys enter the motion epoch (500-

1000 ms) the fraction of no-choice trials is low but then increases as the

viewing period progresses. A similar trend is evident in the delay period:

the fraction of no-choices is low during the early delay epoch (1000-1300

ms) but rises during the late delay epoch (1300-1550 ms). Clearly, the

likelihood of generating a no-choice trial is modulated by task

epoch. We next analyze whether the fraction of no-choice trials is further

influenced by parameters such as the reward condition and motion

coherence.
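Assigning each aborted trial to a task epoch is a simple binning step. The sketch below is illustrative only: it assumes the epoch boundaries shown on the Figure 6 time axis (0, 250, 500, 1000, 1300 and 1550 ms from target onset), expresses the result as a fraction of all trials, and uses hypothetical abort times and function names.

```python
from bisect import bisect_right
from collections import Counter

# Epoch boundaries in ms from target onset, read from the Figure 6 time axis.
EDGES = [0, 250, 500, 1000, 1300, 1550]
NAMES = ["target", "reward cue", "motion", "early delay", "late delay"]

def epoch_of(t_ms):
    """Return the task epoch containing time t_ms, or None if outside the trial."""
    if t_ms < EDGES[0] or t_ms >= EDGES[-1]:
        return None
    return NAMES[bisect_right(EDGES, t_ms) - 1]

def no_choice_fraction_by_epoch(abort_times_ms, n_trials):
    """Fraction of all trials that were aborted within each epoch."""
    counts = Counter(epoch_of(t) for t in abort_times_ms)
    return {name: counts.get(name, 0) / n_trials for name in NAMES}

# Hypothetical abort times (ms) from a session of 20 trials.
fractions = no_choice_fraction_by_epoch([300, 420, 980, 1500], n_trials=20)
print(fractions)
```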

We selected two task epochs for further analysis: the motion cue

period and the delay period. The bar graphs in Figures 7a-d depict for each

epoch the mean fraction (±s.e.m) of no-choice trials within that epoch for

each reward condition. For each epoch and monkey we performed a one-

way ANOVA and a post-hoc pairwise comparison test (corrected for multiple

comparisons using THD) to identify differences in no-choice frequency

among reward conditions.
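The omnibus step of this comparison can be sketched as a one-way ANOVA F statistic computed across the four reward conditions. The numbers below are hypothetical per-session no-choice fractions chosen for illustration, and the function name is an assumption; the post-hoc pairwise comparison is omitted. In practice a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical no-choice fractions per session for each reward condition.
no_choice = {
    "HH": [0.08, 0.10, 0.12],
    "LL": [0.18, 0.20, 0.22],
    "HL": [0.09, 0.10, 0.11],
    "LH": [0.08, 0.10, 0.12],
}
f_stat, df_b, df_w = one_way_anova_f(list(no_choice.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```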

Figures 7a and 7b plot data from the motion stimulus period for

monkeys A and T, respectively. In this epoch both monkeys generated

significantly more no-choice trials in the LL reward condition than in any of

the reward conditions in which a high value target was present (monkey A:

p<0.001; monkey T: p<0.001). This trend was also present during the delay period for both monkey A (p < 0.001) and monkey T (p < 0.001) as seen in Figures 7c and 7d, respectively.

Figure 6. Fraction of no-choice trials varies with task epoch. a-b The mean fraction of no-choice trials (white, ±sem in black) as a function of time aligned to target onset for all experiments, for monkey A (a) and monkey T (b). Time axis ticks: 0, 250, 500, 1000, 1300 and 1550 ms; epochs are labeled target, reward, motion, early delay and late delay. The likelihood of generating a no-choice trial is modulated by task epoch.

Figure 7. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs. a-d Bar graphs depicting the mean fraction (±sem) of no-choice trials within the motion epoch (a and c) and delay epoch (b and d) for each reward condition. There are significantly more no-choice trials in the LL condition than in any condition with a high-value target (one-way ANOVA and post-hoc comparison test, monkey A: p<0.001; monkey T: p<0.001). This trend was also present during the delay period for both monkey A (p < 0.001) and monkey T (p < 0.001).

To investigate the influence of motion coherence on the generation of

no-choices, we further analyzed the no-choice trials from the motion

stimulus and delay epochs. Figures 8a-d plot the mean (±s.e.m) fraction of no-

choice trials within these task epochs, separately for each of the reward

conditions, on the ordinate, as a function of motion coherence, on the abscissa.

For this analysis we have combined data from both monkeys to gain

statistical power.

Figures 8a and 8b plot the results from the motion stimulus and delay

period, respectively, for the HH (red) and LL (blue) conditions. Note in

both epochs the blue line lies above the red line across almost all motion

coherences, indicating that the likelihood of a no-choice trial is greater on LL

trials. Thus, while absolute reward value does not affect either monkey’s

probability of choosing T1 (Fig. 2) it does affect both monkeys’ probability

of completing a trial successfully. Figures 8c and 8d plot results for the HL

(black) and LH (green) conditions for the stimulus and delay epochs,

respectively. Note that in both epochs the black line (HL) lies above the

green (LH) on the left half of the graph while the green (LH) line lies above

the black (HL) on the right half. This indicates that in the relative reward

conditions, both monkeys were more likely to generate a no-choice trial

when the motion coherence was towards the low value target.

2.3.5 Saccade Latency

Another aspect of the monkeys’ behavior that is potentially modulated

by task parameters is basic saccade metrics. Here we consider the effect of reward condition and motion coherence on the mean saccade latency, defined as the time between fixation offset and saccade initiation. Figures 9a and 9b plot the mean (±s.e.m) latency across all behavioral experiments as a function of unsigned motion coherence for the HH (red), LL (blue), HL (black) and LH (green) reward conditions. The open circles denote the observed average latencies, the solid lines are regression lines individually fit to the data, and the dashed lines are the 95% confidence intervals for the regression lines. For clarity, the latency measurements in Figure 9 are combined for directions of equal coherence.

Figure 8. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs for most coherences. a-d Mean (±sem) fraction of no-choice trials for the HH (red), LL (blue), HL (black) and LH (green) reward conditions within the motion (a and c) and delay (b and d) epochs as a function of motion coherence (%). For this analysis we have combined data from both monkeys to gain statistical power.

Consider first the results from Monkey A in Figure 9a. The most

striking feature is a highly typical (28, 65) dependence of mean latency on

coherence, with higher coherence resulting in shorter latencies. This result

was quantitatively confirmed by the linear regression model that produced

significantly negative slopes for all four reward conditions (p<0.0001 for all

conditions). The results for monkey T, plotted in Figure 9b, are less striking.

For monkey T, all regression coefficients were negative; however, they were

significant only in the HH (p=1.5×10⁻⁵) and LH (p=0.01) conditions.
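The per-condition regression of latency on unsigned coherence can be sketched with an ordinary least-squares line. The latencies below are hypothetical values chosen to lie on a line, not measurements from either monkey, and the function name is illustrative; testing whether the slope differs from zero would additionally require the standard error of the slope, which is omitted here.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical mean latencies (ms) at each unsigned coherence (%).
coherence = [0.0, 12.0, 24.0, 48.0]
latency = [160.0, 157.0, 154.0, 148.0]
slope, intercept = fit_line(coherence, latency)
print(f"slope = {slope:.3f} ms per % coherence, intercept = {intercept:.1f} ms")
```

A negative slope corresponds to the pattern described above, with higher coherence producing shorter latencies.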

2.4 Discussion

2.4.1 Sensory and value information are additive

The most important result of these behavioral experiments is the

systematic lateral shifts and identical PMF slopes for relative reward

conditions as compared to absolute reward conditions. These data indicate

that relative value exerts a simple additive effect on current sensory evidence

in the formation of perceptual decisions, implying that we may see additive

effects at the neural level as well.

Gold and Shadlen (21) have posited a theoretical framework, based on

signal detection theory (SDT), in which sensory and value information can

be incorporated into a single decision variable through addition. In this

framework an option is chosen if its likelihood ratio (LR) is greater than

unity. An option’s LR describes the likelihood that the current evidence

would be observed if that option were correct, relative to the likelihood that

it would be observed if the alternative were correct. Through multiplication

the LR can be updated to include other factors including additional evidence,

prior probabilities and relative value. Importantly, Gold and Shadlen point

out that by taking the logarithm of the LR (logLR), these factors can be accumulated additively. They further posit the logLR as a common neural currency for combining sensory and value information, and suggest that a quantity proportional to logLR is represented by LIP neurons. While we are unable to address the issue of a common neural currency with these behavioral data, we will consider the matter further in the next chapter when we discuss our physiological experiments in area LIP.

Figure 9. Effect of motion coherence and reward condition on saccade latency. a-b Mean (±sem) response latency (ms) across all behavioral experiments as a function of unsigned motion coherence (%) for the HH (red), LL (blue), HL (black) and LH (green) reward conditions, for monkey A (a) and monkey T (b). Circles demark observed average latencies, solid lines demark individually fit regressions and the dashed lines demark the 95% confidence intervals for the regression lines. For clarity, the latency measurements are combined for directions of equal coherence.
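The step from multiplicative updating of the LR to additive accumulation of the logLR can be written out explicitly. In the sketch below the symbols are illustrative assumptions rather than Gold and Shadlen's own notation: e denotes the sensory evidence, h_1 and h_2 the two alternatives, P(h_i) their prior probabilities and V_i the value associated with a correct choice of each.

```latex
\begin{align*}
\mathrm{LR}_{\text{total}}
  &= \frac{P(e \mid h_1)}{P(e \mid h_2)} \times \frac{P(h_1)}{P(h_2)} \times \frac{V_1}{V_2} \\
\log \mathrm{LR}_{\text{total}}
  &= \log\frac{P(e \mid h_1)}{P(e \mid h_2)} + \log\frac{P(h_1)}{P(h_2)} + \log\frac{V_1}{V_2}
\end{align*}
```

In this form h_1 is chosen when the total LR exceeds unity, or equivalently when the total logLR exceeds zero. A fixed value ratio then contributes a constant offset to the decision variable regardless of the strength of the evidence, which is consistent with a lateral shift of the PMF without a change in slope.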

Additionally, SDT predicts that absolute value has no effect on the

likelihood of selecting an option. While our results confirm this, we assume

there are only two options, one of which must be selected. However, even

in a two-alternative, forced-choice task such as ours, there is always at least

one other course of action -- to choose neither option. This truly represents a

third option, whose value is not under behavioral control. Our analysis of

no-choice trials (Figs. 6, 7 and 8), in which the monkey failed to choose

either of the two targets, reveals that this third option is reflected reliably in

both monkeys’ behavior.

2.4.2 Monkeys are capable of near-optimal performance

As discussed in Results, the optimal amount of bias in response to

increases in relative reward depends on the relative value of the options, the

monkey’s perceptual sensitivity and the set of coherences employed in the

experiment. Our analysis demonstrates that both monkeys’ performance is

nearly optimal, harvesting on average 98% of the maximum available

rewards. Departures from optimality result from a consistent over-bias

(Figure 4e), for which there are several potential explanations.

Given the asymmetry of the HE surface (Figs. 4a-d) clearly the

consequences of excessive positive bias are less severe than the

consequences of a negative bias. One is therefore tempted to attribute the

observed over-bias to a conservative strategy on the monkey’s part, but this

explanation is unsatisfactory upon closer scrutiny. Negative bias is not a

realistic option a priori—it is extremely unlikely that a monkey would ever

exhibit a bias toward the target of lesser relative reward magnitude! Given

the improbability of a negative bias within our experimental design, and

given that neither monkey ever demonstrated negative biases it is perhaps

more reasonable to look elsewhere for an explanation of the observed over-

bias.

A more likely explanation for the observed over-bias is that the

monkeys’ valuation of the relative reward is nonlinear. Although the

objective value of our relative reward is only one unit of juice, the subjective

value of that increase to the monkey may be greater than one unit.

Economists and behavioral ecologists have long been familiar with such

nonlinear transformations in the subjective value, or “utility”, of increases in

reward magnitude (19, 66, 69). Utilities that are larger than would be

expected from the objective magnitudes are a hallmark of positive utility

functions, which are frequently observed in animals that have not yet

achieved daily requirements of food or water (69). This is analogous to the

situation of our monkeys, who enter each experiment needing to work to

obtain their daily fluid allotment. We speculate, therefore, that our monkeys

exhibit over-biases because of positive utility functions associated with a

highly motivated desire for fluids. We are unable to define our monkeys’

actual utility functions as that would require at least one additional reward

magnitude (e.g., 3:1). Nevertheless, if we assume a positive utility function,

we can consult the plots in Figures 4a and 4b to determine quantitatively

how the monkey subjectively values the one unit increase in objective value

provided in our experiments. We accomplish this by identifying the reward

ratio for which each monkey’s observed bias would be optimal. The

analysis reveals that monkey A valued a one unit increase in reward as

though it were in fact a 1.99 unit increase, while monkey T valued the one

unit increase as though it were a 3.78 unit increase.
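The logic of finding the reward ratio for which an observed bias would be optimal can be sketched in closed form under a simplified model. This is an assumption-laden illustration, not the procedure applied to Figures 4a and 4b: it assumes a logistic PMF with slope βcoh, a symmetric hypothetical coherence set expressed as fractions, no zero-coherence trials, and a low reward of one unit. Under those assumptions, setting the derivative of expected reward with respect to bias to zero gives the implied ratio directly.

```python
import math

def pmf_slope(x):
    """Derivative of the logistic function evaluated at log-odds x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def implied_reward_ratio(bias, beta_coh, cohs):
    """Reward ratio T1:T2 at which `bias` is a stationary point of expected reward.

    Expected reward is r * sum_{c>0} P(T1|c) + sum_{c<0} (1 - P(T1|c)); setting its
    derivative with respect to bias to zero and solving for r gives the ratio below.
    """
    toward_t2 = sum(pmf_slope(beta_coh * (c + bias)) for c in cohs if c < 0)
    toward_t1 = sum(pmf_slope(beta_coh * (c + bias)) for c in cohs if c > 0)
    return toward_t2 / toward_t1

# Hypothetical coherence set (fractions) and sensitivity, for illustration only.
cohs = [-0.48, -0.24, -0.12, -0.06, 0.06, 0.12, 0.24, 0.48]
for bias in (0.0, 0.1, 0.2, 0.3):
    ratio = implied_reward_ratio(bias, 5.0, cohs)
    print(f"bias {100 * bias:.0f}% coh -> implied ratio {ratio:.2f}:1")
```

With a low reward of one unit, the subjective increase implied by a ratio r is r - 1 units, so a larger observed bias maps onto a larger subjective increase.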

2.5 Summary

In this chapter we investigated the behavior of rhesus monkeys in a

decision-making paradigm requiring the dynamic combination of sensory

and value information. The sensory component of our task is based on a

two-alternative, forced-choice, direction discrimination task used to study

sensory-based decisions. To this sensory decision we have added a very

simple value element by changing the relative and absolute value of the

reward associated with correct choices of each option.

The most important result of these behavioral experiments is the

systematic lateral shifts and identical PMF slopes for relative reward

conditions (HL and LH) as compared to absolute reward conditions (HH and

LL). These data indicate that relative value exerts a simple additive effect

on current sensory evidence in the formation of perceptual decisions,

implying that we may see additive effects at the neural level as well. Our

analysis demonstrates that both monkeys’ performance is nearly optimal,

harvesting on average 98% of the maximum available rewards. Departures

from optimality likely result from a consistent over-bias because of positive

utility functions associated with a highly motivated desire for fluids.

In Chapter 3 we will investigate this behavior at the neural level with a

series of neurophysiological recordings within cortical area LIP. As

discussed in Chapter 1, in the context of decision-making, single LIP

neurons are modulated by both the strength of motion coherence and

reward value. Thus, it is an ideal place to begin an investigation into where

and how the two disparate sources of information in our task, motion

coherence and target value, are integrated at the neural level.

Chapter 3

3.1 Introduction

The behavioral data presented in Chapter 2 demonstrated that monkeys

engaged in a motion discrimination task with multiple reward contingencies

integrate motion coherence and reward value in a near-optimal fashion. Our

analysis shows that first, this integration occurs on a trial-to-trial basis and

second, that across all motion coherences there is an additive bias towards

targets of greater relative value. This additive bias can be quantified with

the bEVS metric, which expresses the lateral shift of the PMF in terms of

motion coherence.

To investigate this behavior at the neural level we performed a series

of neurophysiological recordings within cortical area LIP, located on the

lateral bank of the intraparietal sulcus. Within LIP, we further focused our

investigation by selecting for study a subset of neurons that carry signals

generally thought to be relevant for decisions to move the eyes. These

neurons are usually identified by their increased activity when there is either

a shift of attention to, or in anticipation of a saccade to, a specific region of

space, referred to as the neuron’s response field (RF). Following procedures

established by several laboratories, we selected these eye movement related

LIP neurons using a delayed saccade task in which a visual target is

presented within a neuron’s RF while the monkey awaits a cue to saccade to

the target for a reward. Specifically, we selected neurons demonstrating a

persistent increase in activity during the delay between the presentation of

the target and the time of the saccade.

Early studies of these LIP neurons discussed their activity in terms of

either attention to the RF (4, 10-11, 26) or a motor plan to move the eyes

into the RF (2, 20, 67). However, subsequent investigations demonstrated that

this “into RF” versus “out-of RF,” or “choice,” activity is in fact graded and

highly modulated by several cognitive factors including the weight of

evidence supporting the decision (65), the prior probability the saccade will

be instructed (52), the relative magnitude of reward associated with a

saccade (52) and the relative subjective value of a saccade (70). Most

relevant to our study are modulations correlated with the weight of evidence,

specifically motion coherence, and modulations correlated with relative

target value.

In the context of a simple two-alternative, forced-choice, motion

discrimination task, these LIP neurons are modulated by the weight of

evidence (the strength of the motion coherence) supporting decisions into

and out of the RF. In these experiments, as in ours, two opposing saccade

targets are presented in line with the axis of motion coherence, with one

target located within the RF of the neuron under study (28, 54, 65). If a

decision to choose the target in the RF is based on strong evidence (a highly

coherent motion stimulus) favoring that target, the delay period activity is

greater than if it is based on weaker evidence (low coherence). Conversely,

if a decision to choose the target outside the RF is based on strong evidence

favoring that target, delay period activity is less than if it is based on weaker

evidence.

LIP delay period activity is also finely modulated by the relative value

of the target in the RF with greater relative values generally producing

greater delay period activity. This has been shown both for conditions in

which the target’s value is explicitly signaled (52), as it is in our experiment,

and for conditions in which the monkey must generate an internal estimate

of the targets’ value based on previous experience (70). In these previous

studies, however, target value was only defined in relative terms: a target in

the RF was always of greater or lesser value than a target out of the RF.

These studies were thus unable to determine if LIP delay-period activity is

also modulated by differences in absolute target value.

Given the host of factors shown to modulate these neurons, it has been

proposed that LIP integrates information from multiple sources that

momentarily inform the behavioral relevance of a stimulus in the RF. Taken

as a whole, these LIP neurons would comprise a map of the visual field that

could be used for the allocation of attention or to direct saccades.

Furthermore, it has been posited that information converging on LIP is

integrated in a “common currency” (21-22, 70-71). By common currency we

mean that information is encoded in a “currency,” or scale, that depends on

its “common” influence on behavior. A common currency predicts that two

disparate factors (such as motion coherence and target value) that have an

equivalent influence on a behavior relevant to LIP (such as the probability of

saccade generation) would modulate LIP activity equivalently.

Thus, LIP is a logical place to begin an investigation into where and

how the two disparate sources of information in our task, motion coherence

and target value, are integrated. By placing one of our targets within the RF

of an LIP neuron, we will be able to: 1) reveal the extent to which LIP

represents absolute and relative reward, 2) determine if and how single

neurons are modulated by both motion coherence and target value, 3)

investigate the dynamics of integration as behaviorally relevant

information is presented sequentially and 4) determine whether this

information is integrated in a common currency.

3.2 Methods

3.2.1 Subjects

The same two adult male rhesus monkeys that participated in the behavioral

experiments presented in Chapter 2 were used in the following physiological

experiments. Before physiological recordings, each monkey underwent an

additional surgical procedure to place a recording chamber above the

intraparietal sulcus.

3.2.2 Physiological Recordings

Area LIP was identified by a combination of stereotactic location,

characteristic physiological activity and anatomical magnetic resonance

imaging. Single neurons were isolated and their activity recorded with

extracellular microelectrodes. Monkey T received a single craniotomy that

matched the dimensions of the recording cylinder. For monkey A, the

cylinder was placed on intact skull protected with a thin layer of dental

acrylic. For this animal, a 3 mm “burr-hole” was drilled, under surgical

conditions, one day before beginning recordings at a given location within

the recording cylinder.

For monkey A, neurophysiological recording was accomplished with

quartz/platinum-tungsten (Thomas Recording, Giessen, Germany) electrodes

that were positioned and manipulated daily with a 5-channel single electrode

system (“Mini Matrix,” Thomas Recording, Giessen, Germany). For

monkey T, we employed tungsten electrodes (FHC Inc., Bowdoin, Maine)

positioned with a Crist grid (Crist Instruments Co., Inc., Hagerstown,

Maryland) and manipulated with a Narishige single electrode drive

(Narishige Co., LTD, East Meadow, New York).

Real time experimental control was implemented in the Rex software

environment for the QNX operating system (QNX software, Ontario, Canada)

running on a PC compatible computer. Visual stimuli were generated using

a VSG graphics card (Cambridge Graphics, UK) and presented on a CRT

display. After amplification, single unit spiking activity was identified and

collected along with digitized task events and eye position traces using the

Plexon (Plexon Inc., Dallas, Texas) data acquisition system operating in

conjunction with Rex. All data were subsequently analyzed offline with

custom scripts written in the MATLAB (The MathWorks, Inc., Natick,

Massachusetts) programming language, running on Apple Macintosh (Apple

Computer, Inc., Cupertino, California) computers.

3.2.3 Cell selection

As mentioned above, we limited our study to LIP neurons identified as

having persistent delay-period activity during a delayed saccade task. We

employed a variant of the delayed saccade task that has been used

extensively to identify these neurons. The temporal structure of this task is

illustrated in Figure 10a. From left to right, trials began with the onset of a

small fixation target. After the monkey acquired and fixated the target for

150 ms, a single saccade target appeared for a variable delay period (250-

800 ms). At the end of the delay period the fixation point disappeared,

cueing the monkey to saccade to the target. For monkey A the saccade

target was always blue, indicating a low magnitude (L) reward (1 unit, ~0.12

ml of juice); for monkey T the target had a 50% probability of being red,

indicating a high magnitude (H) reward (2 units, ~0.24 ml of juice).

Fixation was enforced throughout the trial by requiring the monkey to

maintain its eye position within an electronic window (1.25° radius)

centered on the fixation point. Aborting the trial and enforcing a time-out

period before the onset of the following trial punished inappropriate breaks

of fixation. Completed trials were identified by detecting the time of arrival

52

Fixate Targets Delay Go

6000

80

0Time fromtarget onset(ms)

Time fromsaccade (ms)

Meanresponse(spikes/sec.)

Figure 10a-b. Delayed saccade task used to identify LIP response fields.a The temporal structure of the delayed saccade task. Trials began with the onset of a small fixation point. After fixating for 150 ms, a single saccade target appeared for a variable delay period (250-800 ms) before the fixation point disappeared, cueing the saccade to the target. For monkey A the sac-cade target was always blue, indicating a low-magnitude (L) reward; for monkey T the target could also be red, with a 50% probability, indicating a high-magnitude (H) reward (2 units, ~0.24 ml of juice). b An example LIP neuron during the delayed saccade task. Each plot depicts a mean response as a function of time for one of the six saccade directions; activity is aligned to target onset in the left panels and to saccade time in the right panels.

a

b

53

of the monkey’s eye in an electronic window (1.25 radius) centered on the

target. The saccade target was typically presented at six locations in

pseudorandom order—all 10 degrees eccentric and separated by equal polar

angles (Fig. 10b). Eccentricities and angles were sometimes varied to locate

the sensitive region of a given neuron’s RF.
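
The target layout described above can be sketched in a few lines. This is a minimal illustration only: the function name `target_positions` is hypothetical, and the starting angle of 0° is an assumption, since the text specifies only the eccentricity and the equal angular spacing.

```python
import math

def target_positions(n_targets=6, eccentricity_deg=10.0, start_angle_deg=0.0):
    """Return (x, y) target positions in degrees of visual angle.

    Targets share one eccentricity and are separated by equal polar angles.
    start_angle_deg is an assumption; the text does not state it.
    """
    step = 360.0 / n_targets
    positions = []
    for i in range(n_targets):
        theta = math.radians(start_angle_deg + i * step)
        positions.append((eccentricity_deg * math.cos(theta),
                          eccentricity_deg * math.sin(theta)))
    return positions

positions = target_positions()
```

With these defaults the fourth target falls at a polar angle of 180°. Passing other values for `eccentricity_deg` or `start_angle_deg` corresponds to varying the layout to probe a neuron's RF.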

3.3 Results

3.3.1 Activity during delayed saccades

Figure 10b illustrates data from an example LIP neuron during the

delayed saccade task. Each plot depicts mean firing rate, as a function of

time, for one of the six saccade directions; neural activity is aligned to target

onset in the left panel of each plot through the time of the saccade in the

right panel. Note that this neuron responded only when a target was presented at

180° and that its activity was sustained throughout the delay period. Elevated

activity defines this spatial location as being within this neuron’s RF. We

recorded neural responses from 51 neurons with spatially selective, elevated

delay period activity from the right hemisphere of monkey A and from 31

such neurons from the left hemisphere of monkey T.

Figure 10c depicts the mean FR (±s.e.m) of the 51 neurons from

monkey A as a function of time, when a target was placed within the RF (in-

RF, red traces) and when a target was placed 180° away from the RF (out-

RF, blue traces). As in Figure 10b, the left panel responses are aligned to

target onset, while in the right panel they are aligned to the time of the

saccade. Figure 10d depicts similar data from the 31 neurons from monkey

T. As mentioned above, for monkey T the targets could be red or blue

with equal probability, indicating a high magnitude (H) or low magnitude (L)

reward, respectively. In this plot the red and magenta lines are in-RF

Figure 10e. In the discrimination experiments one response target was positioned within the RF of the neuron under study. e T1 is the target within the RF of the neuron under study, as illustrated by the purple, dashed circle, while T2 is positioned 180° away, in the opposite hemifield. The axis of stimulus motion was defined by these two target positions so that motion discrimination choices corresponded to saccades into or out of the RF. We denote choices into the RF as T1 choices and those to the opposite target as T2 choices. [Panel labels: T1 and T2 targets shown for the LL, HH, HL and LH reward conditions.]

responses for the H and L targets, respectively; while the blue and cyan lines

are out-RF responses from the H and L targets, respectively.

While the selected population of LIP neurons from both monkeys is

clearly spatially selective, there are several notable differences between the

two animals. Neurons from monkey A exhibited higher average firing rates,

a faster and more pronounced transient response to the target onset and an

additional transient response at the time of the saccade. These differences

will also be evident in data acquired during the discrimination task. Note

that in Figure 10d, the red and magenta lines are superimposed, as are the

blue and cyan lines, indicating that these LIP neurons are not, on average,

modulated by target value in the context of a simple delayed saccade task.

3.3.2 The representation of choice, absolute value, relative value and

motion coherence in LIP

In the following sections we address the representation of choice,

absolute value, relative value and motion coherence in our sample of LIP

neurons. In the first four sections, we qualitatively examine the dynamic

effects of each of these factors on LIP activity. In the remaining sections, we

present quantitative analyses that capture these dynamic effects.

In all discrimination experiments we positioned one response target

(T1) within the RF of the neuron under study, as illustrated in Figure 10e

(purple dashed circle), while positioning the other target (T2) 180° away in

the opposite hemifield. The axis of stimulus motion was defined by these

two target positions so that motion discrimination choices corresponded to

saccades into or out of the RF. In the following sections, we denote choices

into the RF as T1 choices and those to the opposite target as T2 choices.

This design allows us to study responses of single LIP neurons to all

combinations of reward condition, motion coherence and behavioral

response.

3.3.2.1 Representation of choice: qualitative description

Figures 11a (monkey A) and 11b (monkey T) depict mean LIP firing

rate, averaged across all recorded cells, as a function of time for all

successfully completed trials in the HH (red) and LL (blue) reward

conditions. Data are plotted separately for trials in which the monkey chose

T1 (in-RF, solid lines) and T2 (out-RF, dashed lines). Both 11a and 11b

consist of two panels: a left panel with responses aligned to the time of

target onset and a right panel with responses aligned to the time of the

saccade. The black vertical lines in both figures denote relevant task epochs:

0-250 ms is the target epoch in which the blank targets are presented; 250-

500 ms is the reward epoch in which the targets change color to

cue the reward condition; 500-1000 ms is the motion epoch in which the

random-dot motion stimulus is presented; 1000-1250 ms is the early

segment of the delay epoch; -350 to 0 ms (in the right panel) is the late delay

epoch immediately preceding the saccade.
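
The epoch boundaries listed above can be written down as a small lookup table, together with a helper that computes a mean firing rate within an epoch. This is a sketch under stated assumptions: the names `EPOCHS` and `epoch_rate`, the half-open window convention, and the example spike times are all hypothetical and are not taken from the recordings.

```python
# Task epochs from the text, in ms. The first four are measured from target
# onset; the late delay epoch is measured from the saccade.
EPOCHS = {
    "target": ("target_onset", 0, 250),
    "reward": ("target_onset", 250, 500),
    "motion": ("target_onset", 500, 1000),
    "early_delay": ("target_onset", 1000, 1250),
    "late_delay": ("saccade", -350, 0),
}

def epoch_rate(spike_times_ms, event_time_ms, start_ms, end_ms):
    """Mean firing rate (spikes/s) in [start_ms, end_ms) relative to an event."""
    n = sum(1 for t in spike_times_ms
            if start_ms <= t - event_time_ms < end_ms)
    return n / ((end_ms - start_ms) / 1000.0)

# Hypothetical trial: target onset at 0 ms, saccade at 1600 ms.
spikes = [120, 260, 300, 480, 700, 900, 1100, 1300, 1450, 1590]
events = {"target_onset": 0, "saccade": 1600}
rates = {name: epoch_rate(spikes, events[ref], lo, hi)
         for name, (ref, lo, hi) in EPOCHS.items()}
```

Keeping the alignment event alongside each window makes explicit that the late delay epoch is defined backward from the saccade rather than forward from target onset.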

Note first that in both 11a and 11b, the solid and dashed lines are

initially identical (for each color), diverging approximately 200 ms into

the motion period. Thus, shortly after the onset of the motion stimulus, LIP

neurons in both monkeys begin to signal choice: whether the monkey will

choose T1 or T2. This result is not surprising. We explicitly selected for

study neurons that responded differentially to oppositely directed eye

movements in the delayed saccade task. It is well known from previous work

that such LIP neurons typically exhibit “choice predictive” activity during a

variety of tasks. The data in Figure 11 simply confirm that in our task, our

Figure 11a-b. LIP represents the absolute value of the option in the RF. a-b Mean LIP firing rate, for all cells, as a function of time, for the HH (red) and LL (blue) reward conditions. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the red and blue curves indicates LIP represents the absolute value of the option in the RF. [Panel a: Monkey A. Panel b: Monkey T. Axes: time from target onset (ms), time from saccade (ms); the target, reward and late delay epochs are marked.]

sample of LIP neurons exhibits choice predictive activity when decisions are

based on a combination of visual motion and reward information. The effect

of behavioral choice in our data is strong, consistent across neurons and

monkeys and present for all reward conditions as demonstrated below.

3.3.2.2 Representation of absolute value: qualitative description

As discussed in the context of behavioral data in Chapter 2, any

differences in performance or in neural activity between the HH and LL

conditions indicate an effect of absolute reward value. By comparing the red

and blue lines in Figure 11 we can see the extent to which LIP represents

absolute reward value. Consider first the data from monkey A in Figure 11a.

The solid red and blue traces (T1 choices) separate with very short latency

following presentation of the reward cues at 250 ms. Thus monkey A’s LIP

population rapidly encodes the absolute value of T1, producing elevated

firing rates when a high value target is presented within the RF. Following

their initial separation, the red and blue traces converge briefly near the

beginning of the motion epoch, but then separate again for the duration of

the trial. Qualitatively, then, except for a brief interval near the onset of the

motion stimulus, LIP neurons from monkey A encode a signal concerning

the absolute value of the reward available in the RF throughout the trial.

Note that a similar and more robust representation of absolute value is

present for T2 choices as well (dashed traces).

Figure 11b shows a similar pattern of activity for the LIP population

recorded from monkey T. Even though LIP activity in monkey T does not

respond as rapidly or robustly as in monkey A (consistent with the delayed

saccade data—Fig. 10c, d), all major features of the absolute value signal

observed in monkey A are replicated in monkey T: 1) the effect of absolute

value begins during the reward cue period, 2) greater absolute value is

represented by higher firing rates, 3) the effect is maintained until the end of

the trial and 4) the effect is present for T2 choice trials as well. A minor

difference is that the absolute reward signal does not “disappear” at any

point in the trial for monkey T.

3.3.2.3 Representation of relative value: qualitative description

As revealed by the behavioral data in Chapter 2, the relative reward

value of the two targets exerts a substantial impact on choice behavior. We

can examine the extent to which LIP represents relative value by comparing

LIP responses in the HH and HL reward conditions. In these conditions, the

value of T1 is constant (high value) while the value of T2 differs (high in

HH, low in HL). Thus, any LIP modulation between these two conditions

indicates a relative effect of T2 value on the response to the high value target

present in the RF. Figures 12a and 12b depict LIP responses for monkeys A

and T, respectively, to the HH (red traces) and HL (black traces) reward

conditions. The format of these figures is identical to Figures 11a and 11b,

and the red curves are the same as in Figure 11.

In Figure 12a, the black and red traces separate late in the reward cue

epoch (black arrow), with the average firing rate being higher for the HL

condition. This difference indicates that on average, LIP

neurons respond more strongly to a target in the RF (T1) when it has a larger

value relative to that of the T2 target. This “relative value” signal is present

throughout the motion epoch but disappears early in the delay epoch, after

the choice has presumably been determined. The same dynamics are evident

both for T1 and T2 choices (solid and dashed lines, respectively).

A similar pattern of activity is present for the population data from

Figure 12a-b. LIP represents the relative value of the option in the RF. a-b Mean LIP firing rate, for all cells, as a function of time, for the HH (red) and HL (black) reward conditions. HH curves are the same as in Figure 11a-b. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the red and black curves indicates LIP represents the relative value of the option in the RF.

monkey T, illustrated in Figure 12b. As for monkey A, the relative reward

signal emerges late in the reward cue epoch (black arrow), with average

firing rate being higher for larger relative value. For monkey T, however,

the relative reward signal fades more rapidly than for monkey A.

Additionally, for T1 choices, the relative reward signal inverts during the

second half of the motion epoch and remains inverted throughout the delay

epoch. This inversion is not present for T2 choices, however.

By comparing the LL and LH reward conditions, we acquire a second

look at the effects of relative reward on LIP activity. As in the previous

comparison of HH and HL trials, the value of T1 is identical (low) for the

LL and LH conditions. The two conditions differ only in the value of T1

relative to the value of T2, which is equal in the LL condition but lower in the

LH condition. Again, any modulation of LIP activity between these two

conditions comprises a signal of relative reward value.

Figures 13a and 13b, plotted in an identical manner to Figures 11 and

12, compare average LIP responses in the LL (blue traces) and LH (green

traces) conditions for monkeys A and T, respectively. Note that the blue

curves in these figures are the same as the blue curves in Figures 11a and

11b. The data for monkey A show an effect of relative reward similar to

that seen in Figure 12a. The green trace drops below the blue trace during

the reward cue epoch (black arrow), indicating again that average LIP firing

rates fall as the relative value of the target in the RF decreases. The green

and blue traces converge again during the motion period and remain together

throughout the delay period, indicating a diminished representation of

relative reward. As shown in Figure 13b, the effect of relative reward is

similar, although weaker, in monkey T (black arrow).

Figure 13a-b. LIP represents the relative value of the option in the RF. a-b Mean LIP firing rate, for all cells, as a function of time, for the LL (blue) and LH (green) reward conditions. LL curves are the same as in Figure 11a-b. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the blue and green curves indicates LIP represents the relative value of the option in the RF.


3.3.2.4 Representation of motion coherence: qualitative description

To assess qualitatively the effect of motion coherence on LIP activity,

we separately plotted the response to individual motion coherences for the

HH reward condition. Figures 14a-b depict the mean LIP firing rate as a

function of time for monkey A and monkey T, respectively. This plot

format differs somewhat from the previous three figures. First, time begins

at the onset of the motion stimulus (500 ms, left edge). Second, the three

colors now represent three different motion coherences—48%, 6% and 0%

for monkey A (Fig. 14a), and 48%, 12% and 0% for monkey T (Fig. 14b).

Finally, to avoid confounding motion coherence effects with behavioral

choice, we plot data from correct choices only, for nonzero coherences.

Thus the solid lines (T1 choices) derive from positive coherences (except at

0% coherence) while the dashed lines (T2 choices) derive from negative

coherences.
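
The trial selection used for these coherence plots can be expressed as a simple filter. The sketch below is illustrative only: the function name `keep_for_coherence_plot` and the example trials are hypothetical, and it assumes the sign convention implied above, in which positive coherence denotes motion toward T1.

```python
def keep_for_coherence_plot(signed_coh_pct, chose_t1):
    """Return True if a trial enters the coherence plot.

    signed_coh_pct: motion coherence in percent; positive values denote
        motion toward T1, negative values motion toward T2.
    chose_t1: True for a T1 (in-RF) choice, False for a T2 choice.
    """
    if signed_coh_pct == 0:
        # No direction is correct at 0% coherence, so both choices are kept.
        return True
    # For nonzero coherence keep correct choices only.
    return (signed_coh_pct > 0) == chose_t1

# Hypothetical trials: (signed coherence in %, chose T1?)
trials = [(48, True), (48, False), (-6, False), (-6, True), (0, True), (0, False)]
kept = [t for t in trials if keep_for_coherence_plot(*t)]
```

After filtering, every nonzero-coherence T1 trial has positive coherence and every nonzero-coherence T2 trial has negative coherence, so coherence magnitude and choice are no longer confounded within a trace.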

For clarity and brevity we are only presenting the effects of coherence

for the HH reward condition. While the results from the other reward

conditions are comparable, they are qualitatively less compelling. Indeed,

while the following trends are qualitatively weak, they are all confirmed by

our regression models (discussed below). At the start of the motion epoch

(500 ms) for both monkey A and monkey T, all the lines are collapsed

together. The initial response to motion onset is a commonly observed (28,

54, 65) “dip” in activity (gray arrow), which is sometimes interpreted as the

initialization of the motion integration process. Following this dip, the solid

traces rise above the dashed, consistent with the choice predictive activity of

LIP neurons documented in previous studies and in Figures 11-13. Within

these diverging responses the black, blue and red curves also separate.

As discussed in the introduction, we expect the weight of evidence

Figure 14a-b. Effect of motion coherence for monkey A and monkey T. a Mean LIP firing rate for the HH reward condition as a function of time from the start of the motion epoch, for three motion coherences: 0% (black), 6% (blue) and 48% (red). We plot data from correct choices only, for non-zero coherences. Grey arrow marks the response “dip”; black arrow marks the graded coherence trend and cyan arrow marks the absence of a coherence trend. b Similar for monkey T, but for 0% (black), 12% (blue) and 48% (red) coherence. [Panel a: Monkey A, HH reward condition. Panel b: Monkey T, HH reward condition. The late delay epoch is marked.]

supporting a decision (the coherence) to modulate the T1 and T2 responses,

with greater coherence producing greater modulation. Thus, we expect

responses to the highest coherence (48%, red) to produce the greatest

activity when the monkey chose T1 (solid lines) and the least activity when

he chose T2 (dashed lines). Furthermore, we expect responses to 6% (12%,

for monkey T) and 0% to be progressively reduced for T1 choices and

increased for T2 choices. This trend is clearly visible (black arrow) for both

T1 (solid) and T2 (dashed) responses.

Finally, note the effects of coherence are predominantly visible during

the second half of the motion epoch. Once the motion epoch ends (1000 ms)

and the delay epochs begin, the consistent effects of coherence are greatly

diminished and by the late delay epoch they appear to be entirely absent

(cyan arrows). This indicates that as the motion epoch ends, LIP is

modulated by the impending choice but not by the sensory evidence that

supported it.

3.3.2.5 Quantifying LIP dynamics: absolute value, relative value,

motion coherence and choice

As the preceding sections demonstrated, the responses of both LIP

populations are highly dynamic, representing different behaviorally relevant

factors to varying degrees at various times. The qualitative assessment

above indicates that on average LIP neurons multiplex the absolute value,

relative value and motion coherence signals. Additionally, LIP is strongly

modulated by the impending choice. To quantify these trends we have

applied a multiple-variable, linear regression model to LIP activity over a

sliding temporal window, in order to determine if and how absolute value,

relative value, motion coherence and choice are modulating LIP as a

function of time. The model is described in Equation 7.

Equation 7:

FR(t) = β0 + βcoh(COH) + βt1(T1val) + βt2(T2val) + βchoice(CHOICE)

where FR(t) is the mean firing rate over a given temporal epoch and trial;

β0 is a constant term; βcoh, βt1, βt2 and βchoice are the fit coefficients

representing the effects of motion coherence, target value and choice on this

firing rate. COH is an

assigned factor for the coherence of the motion stimulus on that trial, in

fractional units of the maximum coherence and signed to signify the

direction as described above. Thus, COH has a range from -1 to 1, where -1

represents -48% coherence and +1 represents +48% coherence. T1val and

T2val are assigned either +1, if the target was H, or -1, if the target was L.

For example, on HL trials in which the motion coherence was -12%,

COH=-0.25, T1val=+1 and T2val=-1. CHOICE is assigned a value of +1 for T1

choices and -1 for T2 choices. Constraining these factors to be in the same

range (-1 to 1) allows us to directly compare the values of the fit coefficients

and determine which have greater impact on FR. Note that Equation 7 is very

similar to Equation 1, which was used to model the probability of a T1

choice, with the addition of a choice factor.

For each LIP neuron we apply this model to the average firing rate over

a 50 ms window that is progressively slid, in 1 ms intervals, across the

duration of a trial. This generates a time vector of coefficients (βcoh, βt1,

βt2 and βchoice) for each neuron in the population describing that factor’s

influence on the mean firing rate of that neuron at that time point.

Figures 15a and 15b plot the mean (±s.e.m) coefficient, across neurons,

for βcoh (black), βt1 (red), βt2 (blue) and βchoice (green) as a function of

time for monkey A and T respectively. The format of Figure 15 is similar to

Figures 11, 12 and 13. When interpreting these results keep in mind, first,

that it is the sum of βt1 and βt2 that fully captures how given reward

conditions (HH, LL, HL and LH) modulate FR. Second, for each reward

condition, before addition, the coefficients are multiplied by the appropriate

factor values (T1val and T2val). For example, as pointed out above, in the

HL reward condition the factor for βt1 is +1 while the factor for βt2 is -1.

Thus, although βt2 might have a negative value, in this condition it actually

has a positive influence on FR. Third, βt1 models the effect of the target

within the RF and thus absolute reward, while βt2 models the effect of the

target outside the RF and thus relative reward.
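
The factor coding and sliding-window fit of Equation 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: all function names are hypothetical, the least-squares fit uses the normal equations with a small Gaussian-elimination solver to stay dependency-free, and the spike trains are synthetic and noise-free, built from made-up weights so that the recovered coefficients can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(signed_coh_pct, t1_high, t2_high, chose_t1, max_coh_pct=48.0):
    """Regressors [1, COH, T1val, T2val, CHOICE], each coded in [-1, 1] as in the text."""
    return [1.0,
            signed_coh_pct / max_coh_pct,
            1.0 if t1_high else -1.0,
            1.0 if t2_high else -1.0,
            1.0 if chose_t1 else -1.0]


def window_rate(spike_times_ms, t0_ms, width_ms=50):
    """Firing rate (spikes/s) in the window [t0_ms, t0_ms + width_ms)."""
    n = sum(1 for t in spike_times_ms if t0_ms <= t < t0_ms + width_ms)
    return n / (width_ms / 1000.0)


def sliding_betas(rows, spike_trains, t_first_ms, t_last_ms, width_ms=50, step_ms=1):
    """Fit the model once per window start; returns a list of coefficient vectors."""
    return [ols(rows, [window_rate(s, t0, width_ms) for s in spike_trains])
            for t0 in range(t_first_ms, t_last_ms + 1, step_ms)]


def reward_term(beta_t1, beta_t2, t1_high, t2_high):
    """Combined reward contribution beta_t1*T1val + beta_t2*T2val for one condition."""
    return beta_t1 * (1 if t1_high else -1) + beta_t2 * (1 if t2_high else -1)


# Synthetic, noise-free check: every combination of the factors, with spike
# counts in the first 50 ms window constructed from known (made-up) weights.
rows, trains = [], []
for coh in (-48, -12, 0, 12, 48):
    for t1 in (True, False):
        for t2 in (True, False):
            for ch in (True, False):
                row = design_row(coh, t1, t2, ch)
                count = round(10 + 4 * row[1] + 2 * row[2] - row[3] + 3 * row[4])
                rows.append(row)
                trains.append([i + 0.5 for i in range(count)])

betas = sliding_betas(rows, trains, 0, 0)[0]   # one window starting at 0 ms
```

In this toy example the fitted βt2 is negative, yet `reward_term` for the HL condition is larger than for HH, because T2val is -1 on HL trials. This is the sign reversal described in the preceding paragraph.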

Consider the red line in Figure 15a, which is the mean (±s.e.m) βt1

value for Monkey A. This coefficient rises rapidly during the reward epoch,

diminishes as the motion epoch begins but then quickly rebounds during the

motion epoch. As the motion epoch ends, the coefficient again diminishes

but stabilizes throughout the delay epochs. This confirms the representation

of absolute reward value we observed in the average FR (Figure 11a). The

red curve in Figure 15b, which plots mean (±s.e.m) βt1 for Monkey T,

follows a similar rise and fall, indicating that both LIP populations have

similar, temporal representations of absolute value.

The results for monkey T, while following the same trend as monkey A,

differ in two main respects: the coefficients are smaller and they are more

variable. The smaller value coefficients result from the lower overall firing

rate in monkey T’s population compared to monkey A’s (cf. Figs. 10-13).

The greater variance likely results from the smaller sample size (monkey A:

n=51; monkey T: n=31).

The blue lines in Figure 15 show the average βt2 value, capturing the

effect of relative value by modeling the influence of the target outside the

RF. Like absolute value, the influence of relative value begins at the onset

of the reward-cue epoch, but it grows more slowly than its counterpart and

peaks at the start of the motion epoch. As the motion epoch unfolds,

however, the effect of relative value diminishes. Note that while the effect

of relative reward persists throughout the delay period, it is much smaller than

the effect of absolute value (red). Also note that the average βt2 coefficient

changes its sign at the end of the motion epoch, implying that as the motion

epoch ends, LIP inverts its representation of relative value. This trend is

visible in the average firing rates of Figures 12a and 13a. The blue line in

Figure 15b plots similar βt2 results for Monkey T. Note that monkey T’s

population inverts its representation of relative value midway through the

motion period, much earlier than monkey A.

The black lines in Figure 15 depict the average value of the βcoh

coefficient, capturing the effect of motion coherence on LIP FR. In both

Figures 15a and 15b the black lines begin to rise approximately 200 ms into

the motion epoch, reach their peak approximately 400 ms after motion onset,

after which they decline to zero. For monkey A the effect of motion

coherence then reemerges at a very low level during the delay period, while

for monkey T motion coherence has no effect on the firing rate during the

delay period.

The average value of βchoice, representing the effect of choice

outcome on LIP, is depicted by the green lines in Figures 15a and 15b. The

green lines in both figures follow a very similar and straightforward trend,

emerging from zero about 200 ms into the motion period. As the

effects of other factors diminish, the effect of choice continues to grow

throughout the delay period, reaching its peak immediately preceding the

saccade. Note that for both monkeys, the peak effects of choice are nearly

equal to the peak effects of absolute value.

3.3.2.6 Quantifying LIP dynamics: absolute value, relative value and

motion coherence within choice

While the quantitative analysis presented above captures the obvious

and subtle effects of absolute value, relative value, motion coherence and

choice on LIP activity, it does not capture any differences that might exist in

how value and motion coherence are represented within a given choice.

Considering our qualitative assessment above, we know there are likely to

be significant differences between T1 and T2 choices. Additionally, the

preceding model (Equation 7) had two factors, choice and coherence, which

are highly correlated. Including correlated factors as co-regressors in

regression models can produce inaccurate results. To address both these

issues, we dropped the choice factor from the model (Equation 7) and then

applied this abbreviated model separately to trials resulting in T1 choices

and to trials resulting in T2 choices.
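
The collinearity concern and the split-by-choice remedy can be illustrated with a toy data set. The choice counts below are hypothetical, chosen only to resemble a psychometric function, and the helper names are not from the original analysis. Each trial is a (COH, CHOICE) pair using the Equation 7 coding.

```python
import math

# Hypothetical counts of T1 choices out of 100 trials at each signed coherence (%).
T1_CHOICES_PER_100 = {-48: 2, -12: 25, 0: 50, 12: 75, 48: 98}

def make_trials(max_coh_pct=48.0):
    """Expand the table into (COH, CHOICE) pairs; CHOICE is +1 for T1, -1 for T2."""
    trials = []
    for coh_pct, n_t1 in T1_CHOICES_PER_100.items():
        coh = coh_pct / max_coh_pct
        trials += [(coh, 1)] * n_t1 + [(coh, -1)] * (100 - n_t1)
    return trials

def pearson_r(pairs):
    """Pearson correlation between the two elements of each pair."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def split_by_choice(trials):
    """Separate trials ending in T1 choices from those ending in T2 choices."""
    t1 = [t for t in trials if t[1] == 1]
    t2 = [t for t in trials if t[1] == -1]
    return t1, t2

trials = make_trials()
r = pearson_r(trials)                      # correlation of COH with CHOICE
t1_trials, t2_trials = split_by_choice(trials)
```

For these made-up counts the correlation between COH and CHOICE is roughly 0.67. Within each subset CHOICE is constant, so it cannot compete with COH for variance, while COH still takes all five values and its coefficient remains estimable.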

The results of this analysis for monkey A and monkey T are presented

in Figures 16 and 17, respectively, in a format similar to Figure 15. Figure

16a depicts for monkey A, the average (± s.e.m) value of βt1 for T1 choice

(solid) and T2 choice (dashed). These lines are identical until the start of the

delay period, after which they differ significantly. The representation of

absolute value is larger for T2 (dashed) than for T1 choices, an effect clearly

visible in the average firing rates depicted in Figure 11a.

Figure 16b depicts for monkey A, the average (± s.e.m) results for βt2,

which represents relative value, for T1 choices (solid) and T2 choices

(dashed). The results for T1 and T2 choices differ significantly during

several points in the trial. They first diverge at the peak representation of

relative value, when the motion epoch begins. This indicates that the effect

of relative value is greater on T2 choices than on T1 choices. Close

examination of mean firing rates in Figures 12a and 13a, however, reveals no

difference in the representation of relative value for T1 and T2 choices. It is

possible the model is capturing the larger effect of the LH reward conditions

relative to the HL conditions. The LH condition is composed more of T2

than T1 choices. The representation of relative value also differs during the

delay period. For T1 choices, the representation of relative value converges

on zero before the end of the motion epoch and remains at zero throughout

the delay epoch. For T2 choices, however, the representation inverts at the

end of the motion period and remains significantly positive through the

delay period.

The effects of motion coherence also differ for T1 and T2 choices.

Figure 16c plots the average (± s.e.m) value of βcoh for T1 (solid) and T2

(dashed) choices. As expected, the effect of motion coherence is greater for

T2 choices during the second half of the motion epoch. Note also that for

T1 choices the average coefficient significantly dips below zero midway

through the motion epoch, while for T2 choices it dips below zero at the start

of the delay epoch. In fact, these trends are visible upon close inspection of

the average firing rates, depicted in Figures 14a-b. Despite these odd

gyrations, however, the effects of coherence are weakly present throughout

the delay period for both T1 and T2 choices.

Figures 17a-c plot the results of this analysis for Monkey T. Figure 17a

depicts the average (± s.e.m) value of βt1 for T1 choice (solid) and T2

choice (dashed). Except for a brief period at the end of the motion epoch,

Monkey T’s LIP population does not represent absolute value differently for

T1 and T2 choices. Figure 17b depicts the average (± s.e.m) value of βt2

for T1 choice (solid) and T2 choice (dashed). Like monkey A, the

representation of relative value inverts as the trial progresses. However,

unlike monkey A, the inversion for Monkey T occurs for T1 choices midway

through the motion period. Similarly, Monkey T shows a greater effect of

coherence for T1 choices than T2 choices as depicted in Figure 17c, which

depicts the average (± s.e.m) value of βcoh for T1 choice (solid) and T2

choice (dashed).

3.3.2.7 Quantifying coherence within reward condition

Recall that in Section 2.3.3, we demonstrated that the monkeys show no

significant difference in psychophysical sensitivity across the four reward

conditions. This indicated that relative value results in the observed

behavioral bias by contributing an additive offset on neurophysiological

accumulation of current sensory evidence. As discussed in greater detail in

Section 3.4.2, it is possible, however, that relative or absolute value affects

the rate at which sensory evidence is accumulated. To determine if the

effect of motion coherence on LIP activity depends on reward condition (HH,

LL, HL or LH), we modeled the effect of coherence separately for each

reward condition while controlling for choice. The preceding analysis (Figs.

15a and 15b) indicates that the effect of motion coherence on LIP activity is

confined to the second half of the motion epoch (750 ms to 1000 ms from

target onset). Therefore we focused our analysis on this temporal window.

We applied a modified version of the model presented in Equation 7 to the

mean firing rate in the 250 ms time window at the end of the motion epoch.

In this model, summarized in Equation 8, we have removed the factors for

T1val and T2val.

Equation 8:

FR(t) = β0 + βcoh(COH) + βchoice(CHOICE)

Frequency histograms of the resulting βcoh values are plotted in Figure

18a (monkey A) and 18b (monkey T). Results are plotted for the HH (red),

LL (blue), HL (green) and LH (black) reward conditions, for T1 choices

(upper histogram) and T2 choices (lower histogram). For each monkey and

choice we performed a one-way ANOVA to identify differences in βcoh

frequency among reward conditions. For both monkey A and monkey T, we

detected no significant effect of reward condition on βcoh frequency for

either T1 or T2 choices (monkey A: T1, p=0.3865, T2, p=0.1353; monkey

T: T1, p=0.5883, T2, p=0.7675).

3.3.3 Do individual LIP neurons integrate sensory and value

information?

The results of this model confirm that, on average, both LIP

populations are similarly and dynamically representing absolute value,

relative value and motion coherence, and that most of the trends visible in

the average firing rate can be verified quantitatively with our linear

regression model. Each of these factors, however, might be encoded

exclusively by separate sub-populations within our selected LIP population.

To determine if single neurons are modulated by all three factors, we asked

what percentage of neurons within a given task epoch had significant

coefficients for the various combinations of these three factors.

Figures 19a-f plot a series of Venn diagrams depicting the possible

intersections of these three sets of coefficients. In these figures the red circle

represents the βt1 set, the blue circle represents the βt2 set, while the black

circle represents the set of βcoh coefficients. The overlapping areas of these

circles demarcate elements that these sets have in common. Within each of

these areas, we report the percentage of neurons belonging to this subset

(Equation 7 applied to average activity in each epoch, with βt1, βt2 and

βcoh significantly different from 0). Note that these percentages do not sum

to 100%, as some neurons are not significantly modulated by any of these

Figure 18a-b. The effect of motion coherence is independent of reward condition. [Panel a: Monkey A. Panel b: Monkey T. Axes: βcoh and count, for the HH, LL, HL and LH reward conditions.]

Figure 19a-f. Individual LIP neurons integrate sensory and value information. a-f Venn diagrams depicting the possible intersections of three sets of coefficients ... areas we have reported the percentage of neurons belonging to this subset. [Columns: Monkey A, Monkey T. Circle labels: βt1, βt2, βcoh. Row labels: Motion epoch, Early delay epoch, Late delay epoch.]

factors within a specific epoch.

Figures 19a and 19b plot the results for monkeys A and T, respectively,

for the reward cue period. In this epoch we can see that 64.1% of neurons in

monkey A’s population represented both T1val and T2val while only 6.4%

of monkey T’s population represented both factors. While a small fraction

of monkey A’s population was modulated by either only T1val or only T2val,

a large portion (51.6%) of monkey T’s population was modulated by only

T1val. Figures 19c and 19d similarly plot the results for the motion stimulus

epoch. Note that in this epoch, while some neurons are encoding a single

factor, a large portion (Monkey A: 75.2%; Monkey T: 48.2%) are encoding

combinations of two or more factors. In the late delay epoch, Figures 19e

and 19f, many neurons (Monkey A: 45%; Monkey T: 32.1%) in both

populations continued to represent two or more factors. These results

indicate that most neurons are multiplexing absolute reward, relative reward

and motion coherence signals.
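As an illustration of how the percentages in such Venn diagrams can be tallied, the following minimal Python sketch assigns each neuron to a region according to which of its coefficients are significant, and then sums the regions in which two or more factors overlap. The function names, the factor labels and the ten-neuron population are hypothetical; they are not the thesis data or the original analysis code.

```python
from collections import Counter

def venn_percentages(significant):
    """Percentage of neurons in each Venn region.

    `significant` holds one entry per neuron: the set of factor labels
    ('t1', 't2', 'coh') whose coefficient differed significantly from 0
    in the epoch of interest. An empty set means no significant factor.
    """
    counts = Counter(frozenset(factors) for factors in significant)
    n = len(significant)
    return {region: 100.0 * k / n for region, k in counts.items()}

def percent_multiplexing(percentages, min_factors=2):
    """Sum the regions in which at least `min_factors` factors overlap."""
    return sum(p for region, p in percentages.items()
               if len(region) >= min_factors)

# Hypothetical population of 10 neurons (illustrative only).
population = [
    {"t1", "t2", "coh"}, {"t1", "t2"}, {"t1", "t2"}, {"t1"}, {"t1", "coh"},
    {"coh"}, set(), set(), {"t2"}, {"t1", "t2", "coh"},
]

pct = venn_percentages(population)
multi = percent_multiplexing(pct)
print(f"two or more factors: {multi:.1f}%")
print(f"no significant factor: {pct[frozenset()]:.1f}%")
```

Because each neuron falls in exactly one region (including the empty region), the regions here sum to 100%; the percentages drawn inside the circles sum to less than 100% whenever some neurons carry no significant coefficient.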

3.3.4 Common Currency

The preceding behavioral analysis (Chapter 2) revealed that on relative

reward trials (HL and LH), the monkeys are biased towards choosing the

target with the greater relative value. Additionally, our bEVS analysis

quantified the magnitude of this bias in units of motion coherence, thereby

establishing a quantified equivalence between relative value and motion

coherence. Simply put, it revealed that a relative increase of 1 unit of reward

is, on average, behaviorally equivalent to 14.7% (monkey A) and 16.3%

(monkey T) motion coherence. If information converging on LIP is

integrated into a scale dependent on its common influence on behavior, we

should be able to uncover an equivalence between relative reward value and

motion coherence on the neural level comparable to the one observed

behaviorally. This predicts that the modulation of LIP from relative rewards

should be equal to the modulation produced by 14.7% coherence. Thus, if

LIP encodes information in a common currency, the neural equivalent visual

stimulus (nEVS) should be equal to the behavioral equivalent visual

stimulus (bEVS).

Given that our physiological model (Equation 7) and our behavioral

model (Equation 1) are very similar, we would like to define nEVS as we

defined bEVS (Equation 3) and then compare the two. However, two

reasons prevent us from directly comparing these models’ coefficients. First,

Equation 1 is a logistic model of the log odds of a T1 choice, while Equation

7 is a linear model of mean FR. Second, Equation 7 has a factor for choice

that is not present in Equation 1. Thus, each model’s coefficients represent

fundamentally different quantities. We addressed the first issue by modeling

the log odds of an occurrence of a spike with logistic regression rather than

modeling the average firing rate with linear regression. We addressed the

second issue by dropping the choice factor from the model. The scientific

justifications for dropping this term from our model, and the implications

thereof, are addressed in greater detail in the discussion (Section 3.4.3); they

may significantly affect the interpretation of the following results.

Despite this caveat, dropping choice from the model gives us two logistic

models, one for the behavior and one for the physiology. Because these

models have similar factors, we can similarly define and directly compare

nEVS and bEVS. This new model is defined in Equation 9:

Equation 9:

$$\ln\left(\frac{s}{1-s}\right) = \beta_0 + \beta_{coh}(COH) + \beta_{t1}(T1val) + \beta_{t2}(T2val)$$

where s is the observed probability of a spike occurring, and βcoh, βt1 and βt2

are the fit coefficients representing the effect of motion coherence and target

value on this probability.

Applying this model over a 50 ms window progressively slid, in 1 ms

intervals, across the duration of a trial generates a time vector of coefficients

(βcoh, βt1, βt2) for each neuron in the population describing which factors

influence activity at that time point. In Figures 20a and 20b we see this model’s

results for monkey A and T respectively, starting at the motion epoch, in a

similar fashion to Figure 15. In these figures the red curves represent the

average βt1 value, the blue the average βt2 value and black the average βcoh

value.

The consequences of removing choice from the physiological model

can be seen by comparing Figures 20a and 20b with Figures 15a and 15b,

respectively. While we cannot directly compare the magnitude of these two

sets of coefficients, we can compare both their magnitudes relative to each

other and their general time-course. Note that βcoh (black) in this model

(Equation 9) continues to influence LIP activity through the delay epoch,

unlike in the model (Equation 7) containing a choice factor. This indicates that

the coherence term now captures a portion of the variance previously

captured by choice. The βt2 term represents relative value (blue), which in

this model (Equation 9) continues to influence LIP through the delay epoch,

indicating it captures a portion of the variance previously captured by choice.

Removing the choice term has little effect on the βt1 term (red), which

influences LIP activity through the motion and delay epoch in both models.
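The sliding-window fit of Equation 9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used for this work. It assumes that each 1 ms bin inside the window is treated as an independent spike/no-spike observation, that COH is coded as a signed fraction, and that T1val and T2val are coded as 1 (low) or 2 (high). The function names, the simulated spike trains and the "true" coefficients are all hypothetical. The model is fit by Newton-Raphson using only the standard library, and the demonstration uses a coarse 25 ms step to stay fast, where the text describes 1 ms steps.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, spikes, n_bins, n_iter=15):
    """Fit ln(s / (1 - s)) = X . beta by Newton-Raphson.

    Each trial contributes `n_bins` spike/no-spike observations, of which
    `spikes[i]` contained a spike. Rows of X are [1, COH, T1val, T2val].
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(X, spikes):
            s = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += (y - n_bins * s) * x[i]
                for j in range(k):
                    hess[i][j] += n_bins * s * (1.0 - s) * x[i] * x[j]
        beta = [b + d for b, d in zip(beta, solve(hess, grad))]
    return beta

def sliding_fit(trains, X, win=50, step=1):
    """Return one coefficient vector per window position (in ms bins)."""
    n_time = len(trains[0])
    betas = []
    for t0 in range(0, n_time - win + 1, step):
        counts = [sum(train[t0:t0 + win]) for train in trains]
        betas.append(fit_logistic(X, counts, win))
    return betas

# Simulated data: hypothetical coefficients [beta0, beta_coh, beta_t1, beta_t2].
random.seed(0)
TRUE_BETA = [-3.0, 1.5, 0.5, -0.5]
X, trains = [], []
for _ in range(400):
    coh = random.choice([-0.48, -0.24, -0.12, 0.0, 0.12, 0.24, 0.48])
    x = [1.0, coh, float(random.choice([1, 2])), float(random.choice([1, 2]))]
    s = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(TRUE_BETA, x))))
    X.append(x)
    trains.append([1 if random.random() < s else 0 for _ in range(100)])

betas = sliding_fit(trains, X, win=50, step=25)  # coarse step for the demo
for t0, b in zip(range(0, 51, 25), betas):
    print(t0, [round(v, 2) for v in b])
```

In this simulation the spike probability is constant within a trial, so the recovered coefficients should be roughly the same in every window; with recorded data the coefficient vectors would trace out time courses like those plotted for βcoh, βt1 and βt2.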

Figure 20a-b. To calculate nEVS we modeled the log odds of a spike occurring with a logistic regression model without a factor for choice. a Monkey A. b Monkey T. Ordinate: mean coefficient.

The factors in Equation 9 are defined in a manner identical to those in

Equation 1, allowing us to define nEVS with Equation 10:

Equation 10:

$$nEVS = \frac{\beta_{t1} - \beta_{t2}}{\beta_{coh}}$$

The preceding analysis indicates that LIP neurons multiplex absolute

reward, relative reward and motion coherence (Figure 19). Although the

same is true when this analysis is repeated with Equation 8 rather than

Equation 7, the percentage of neurons simultaneously and significantly

modulated by absolute reward, relative reward and motion coherence in the

late delay period is larger (Monkey A: 56.6%, Monkey T: 35.48%).

These results are depicted in Venn diagrams plotted in Figures 21a and 21b.

In the following analysis we will focus on this subset of our LIP populations.

Figure 21a-b. Venn diagrams depicting that a larger percentage of neurons is simultaneously and significantly modulated by absolute reward, relative reward and motion coherence in the late delay epoch. Venn diagrams are similar to those in Figure 19. a Monkey A. b Monkey T.

Figures 22a (Monkey A) and 22b (Monkey T) plot frequency

histograms of the nEVS values for this sub-population of LIP neurons. The

means (±s.e.m) of these distributions are denoted with solid green lines. The

mean (±s.e.m) bEVS for the behavioral data collected with these neurons is

denoted with red lines. For Monkey A, the mean nEVS was 22.35% (s.e.m,

±4.08) coherence, while the mean bEVS was 15.35% (s.e.m, ±1.05)

coherence. For Monkey T the mean nEVS was 26.61% (s.e.m, ±6.47)

coherence and the mean bEVS was 18.71% (s.e.m, ±1.84) coherence. A

paired t-test of each monkey’s data reveals no significant difference

(Monkey A: p=0.3732; Monkey T: p=0.2543) between the bEVS and

nEVS means. Assuming we are justified in removing choice from our

model (see discussion Section 3.4.3), these results indicate that this LIP sub-

population integrates reward information and motion coherence in a

common currency.
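The comparison described in this section can be sketched in a few lines of Python: nEVS is computed for each neuron from its fit coefficients (Equation 10) and then compared with the session's bEVS through a paired t statistic. All coefficient and bEVS values below are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative. Only the t statistic and its degrees of freedom are computed here, since the Python standard library has no t distribution; a statistics package (for example SciPy's `scipy.stats.ttest_rel`) would supply the p-value.

```python
import math
import statistics

def nevs(beta_t1, beta_t2, beta_coh):
    """Equation 10: relative-value effect expressed in units of coherence."""
    if beta_coh == 0:
        raise ValueError("beta_coh must be non-zero")
    return (beta_t1 - beta_t2) / beta_coh

def mean_sem(values):
    """Mean and standard error of the mean."""
    return (statistics.mean(values),
            statistics.stdev(values) / math.sqrt(len(values)))

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d, sem_d = mean_sem(diffs)
    return mean_d / sem_d, len(diffs) - 1

# Hypothetical per-neuron coefficients (beta_t1, beta_t2, beta_coh), with
# beta_coh expressed per 1% coherence so that nEVS is in % coherence.
coeffs = [(0.10, -0.05, 0.01), (0.30, -0.30, 0.02), (0.25, -0.25, 0.02),
          (0.05, -0.05, 0.02), (0.40, -0.40, 0.02)]
nevs_values = [nevs(*c) for c in coeffs]   # roughly 15, 30, 25, 5, 40
bevs_values = [14.0, 16.0, 15.0, 17.0, 18.0]  # hypothetical session bEVS

t_stat, df = paired_t(nevs_values, bevs_values)
print("mean nEVS, s.e.m.:", mean_sem(nevs_values))
print("mean bEVS, s.e.m.:", mean_sem(bevs_values))
print("paired t =", round(t_stat, 3), "with", df, "degrees of freedom")
```

Note that a non-significant paired difference, as reported above, indicates only that the data do not distinguish the two means; with a wide nEVS distribution it does not by itself establish that they are equal.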

3.3.5 Population heterogeneity

While the preceding analysis focused on LIP average activity, it is

important to note that within these populations, there is a small degree of

heterogeneity. Here we will present some single cell examples from both

LIP populations, representing some of the unique activity profiles we have

observed. The following data are presented in identical manner to Figures

11, 12 and 13. However, in these figures we are presenting the results from

all four reward conditions (HH in red, LL in blue, HL in black and LH in

green) and both choices (into RF, solid lines; out of RF, dashed lines)

simultaneously.

Figure 23a plots a single cell example from Monkey A, with a response

profile nearly identical to the population average. Note that the

representation of absolute value (HH, red vs. LL, blue) that emerges soon

after the presentation of the reward cue diminishes as the motion cue period

begins and reemerges later in the trial. Also, note that the representation of

relative value (HH, red vs. HL, black; and LL, blue vs. LH, green) emerges

at the end of the reward cue period, grows through the beginning of the

motion cue period and then diminishes as the trial progresses. Figure 23b

plots a similar cell from Monkey T’s population. Note, in this neuron the

representation of absolute reward (difference between HH, red line and LL,

blue line) persists relatively equally through the entire trial for T1 choices

while fluctuating slightly for T2 choices.

Figure 22a-b. LIP integrates reward information and motion coherence in a common currency. a-b Frequency histograms of the nEVS values resulting from Equation 10. The distribution means are denoted with the solid green lines (±sem, dashed). The mean bEVS, for the behavioral data collected with these neurons, is denoted with the solid red line (±sem, dashed). For Monkey A the mean nEVS was 22.35% (sem ±4.08) coherence and the mean bEVS was 15.35% (sem ±1.05) coherence. For Monkey T the mean nEVS was 26.61% (sem ±6.47) coherence and the mean bEVS was 18.71% (sem ±1.84) coherence. A paired t-test of each monkey’s data reveals no significant difference (Monkey A: p=0.3732; Monkey T: p=0.2543) between the bEVS and nEVS means. a Monkey A. b Monkey T. Axes: nEVS (% coherence) and count.

Figure 23c depicts a neuron from Monkey A’s population which is not

modulated by choice. This neuron, however, like all others in the population,

was selected based on its choice predictive activity during the delayed

saccade task (presented above). Figure 23d plots this neuron’s activity

during the delayed saccade task. Here we are plotting this neuron’s mean

response during the delay epoch (radius) as a function of target position

(angle). This neuron fired at ~40 spikes per second when a target was

presented at 180° (where T1 was placed during the discrimination task), but

fired at ~10 spikes per second when the target was presented at 0° (where T2

was placed during the discrimination task). Thus, while this neuron was

apparently modulated by choice in the delayed saccade task, it was not in the

context of the discrimination task. Note, however, this neuron briefly

represented absolute value. During the delay epoch, its response to all

reward conditions is ~35 spikes per second, the same level of activity

observed in the delayed saccade task. This indicates the response in the

delayed saccade task was driven by the target’s value. Figure 23e depicts a

neuron from Monkey A’s population with an opposite pattern of activity.

This neuron represents neither absolute nor relative reward during the

motion discrimination task, but is instead modulated by choice.

Figure 23a-b. Examples of single cells with responses nearly identical to their population average. a-b Mean LIP firing rate as a function of time, for the HH (red) and HL (black) reward conditions. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels responses are aligned to the target onset, while in the right responses are aligned to saccade time. Panel titles: a Monkey A 110804_2a; b Monkey T 111507_2a. Epoch labels: Target epoch, Reward epoch, Late delay epoch.

Figure 23c-d. A single cell demonstrating no choice-related activity in the discrimination task, despite being well tuned in the delayed saccade task. c Mean LIP firing rate as a function of time, similar to 23a-b. Note the solid and dashed lines are overlapping, indicating this neuron did not represent the impending choice during the discrimination task. It did, however, have strong delay period activity in the delayed saccade task. d Mean response (radius) of this same neuron during the delay epoch of the delayed saccade task as a function of target position (angle). Panel title: Monkey A 120803_2a. Time axes in c: Time from target onset (ms), Time from saccade (ms).

3.4 Discussion

The primary goal of these experiments was to determine if and when

single LIP neurons represent relative reward value, absolute reward value

and motion coherence. We further endeavored to determine if LIP integrates

these factors in a common currency. We have modeled the firing rates of

single LIP neurons as a function of these factors and successively applied

this model across the duration of the experimental trial. This analysis has

revealed that LIP neurons simultaneously represent these factors, that this

representation is highly dynamic and might occur in the context of a

common currency.

Figure 23e. A single cell demonstrating choice-related activity only. This neuron represents neither absolute nor relative reward during the motion discrimination task, but is instead modulated by choice during the late delay epoch. Panel title: Monkey A 12003_1b. Epoch labels: Target epoch, Reward epoch, Late delay epoch.

3.4.1 The dynamic representation of absolute value, relative value,

coherence and choice.

The preceding analysis demonstrates that LIP neurons initially respond

to our task with a rapid representation of the absolute value of the target in

the response field. Within 200 ms, this representation is then augmented by

the value of the target outside the response field and LIP comes to

additionally represent the target’s relative value. Targets of greater absolute

and relative value are represented in LIP with greater firing rates.

Importantly, the representation of relative value is clearest at the start of the

motion epoch and therefore ideally positioned to affect the integration of the

forthcoming motion information (discussed below). As the motion epoch

develops, however, the representation of both relative and absolute value

fades. As these value signals fade, LIP neurons become strongly modulated

by the monkeys’ forthcoming choice.

This representation of choice quickly dominates LIP responses and

persists through the time of the saccade. Within this representation of choice,

neurons are modulated by the specific coherence of the motion stimulus.

This modulation is brief and largely confined to the second half of the

motion epoch. As the motion epoch ends, the representation of relative

value is largely gone, but the representation of absolute value remains. The

delay epoch’s LIP activity represents the absolute value of the target in the

response field and predominantly represents choice, irrespective of the

coherence or relative value supporting it.

3.4.2 Relation to the integrator/accumulator model of decision making

These results are very consistent with the integrator model of decision

making presented by Mazurek and Shadlen (48). In this model, LIP

accumulates a decision variable up to a threshold. This decision variable can

be thought of as the motion coherence, or the weight of evidence supporting

each alternative. As discussed in Chapter 2, Gold and Shadlen (21) also

suggest that this decision variable is proportional to an option’s logLR and is

thus capable of incorporating factors additively. In the context of a two-

alternative, forced-choice, motion discrimination task, the difference

between opposing-direction motion-signals from the sensory cortex is the

posited physiological substrate of the decision variable. This difference

signal is accumulated by LIP neurons representing each alternative (located

in their RF) until a threshold is crossed. It is the crossing of this threshold

that is the presumptive representation of a choice.

Our model predicts that relative value biases choice by adjusting how

quickly the decision variable crosses the threshold. Relative value can

accomplish this by influencing one or more of three possible model

parameters: the accumulator’s initial state, the rate of accumulation or the

threshold’s height. Our physiological results support a model in which

relative value imposes an additive offset to the accumulator’s initial state,

without adjusting the rate of accumulation (Section 3.3.2.7, Figs. 18a-b).

This result is compatible with the behavioral analysis presented in Chapter 2,

demonstrating that relative value additively affects the probability of

choosing T1 without affecting psychophysical sensitivity (Section 2.3.3 and

Figs. 5a-b).
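The distinction between an offset to the accumulator's initial state and a change in the rate of accumulation can be illustrated with a small simulation. The Python sketch below is a deliberately simplified, hypothetical model and is not the model fit in this work: it collapses the two accumulators into a single decision variable with symmetric bounds, uses unit-variance Gaussian noise per time step, and takes arbitrary values for the gain, bound and offset. A positive decision variable at the end of the trial is scored as a T1 choice.

```python
import random

def simulate_choice(coh, rng, offset=0.0, gain=0.01, bound=15.0, n_steps=200):
    """Simulate one trial; return True for a T1 choice.

    The decision variable starts at `offset`, drifts by gain * coh per step
    plus unit-variance noise, and stops early if it reaches +/- bound.
    """
    dv = offset
    for _ in range(n_steps):
        dv += gain * coh + rng.gauss(0.0, 1.0)
        if abs(dv) >= bound:
            break
    return dv > 0

def p_t1(coh, n_trials=2000, seed=1, **params):
    """Proportion of T1 choices at signed coherence `coh` (in %)."""
    rng = random.Random(seed)
    wins = sum(simulate_choice(coh, rng, **params) for _ in range(n_trials))
    return wins / n_trials

baseline = p_t1(0.0)                      # no motion evidence, no offset
with_offset = p_t1(0.0, offset=5.0)       # initial-state offset toward T1
double_gain_0 = p_t1(0.0, gain=0.02)      # rate change, no motion evidence
base_6 = p_t1(6.0)                        # weak motion toward T1
double_gain_6 = p_t1(6.0, gain=0.02)      # same motion, doubled rate

print("P(T1) at 0% coherence:", baseline, "with offset:", with_offset)
print("P(T1) at 0% coherence, doubled gain:", double_gain_0)
print("P(T1) at 6% coherence:", base_6, "doubled gain:", double_gain_6)
```

In this sketch an initial-state offset biases choices even when the stimulus carries no net motion, whereas a change in gain leaves choices at 0% coherence untouched and instead steepens the dependence on coherence, which is the signature one would look for as a change in sensitivity.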

3.4.2.1 Relative value imposes an additive offset to the accumulator’s

initial state

Recall from the preceding analysis that LIP’s peak representation of

relative value is at the start of the motion epoch (Figures 15a-b, ~500 ms). If

relative value influences the accumulator’s initial state, then this is when we

would expect to see its effect. We can estimate how much of an offset the

HL reward condition introduces, in terms of neural activity, by evaluating

βt1(T1val) + βt2(T2val) from Equation 7, with the coefficients from the start

of the motion epoch. Based on the results depicted in Figures 16a and 16b,

we estimate that at the start of the motion epoch relative value introduces an

offset of approximately 4.3 (monkey A) and 3.4 (monkey T) spikes per

second. These estimates are very reasonable given the average firing rates

depicted in Figures 12a-b.

Hanks and colleagues (28) artificially introduced an additive offset to

the accumulator by microstimulating LIP neurons during a reaction time

version of the motion discrimination task. They report that LIP stimulation

introduces a slight choice bias, equal to a bEVS of ~2.85% coherence, with

the effects on reaction time equal to ~4.65% coherence. In contrast,

stimulation of MT in the same monkey performing the same task results in a

much larger choice bias (12). The authors then conclude that stimulation of

MT increases the evidence supporting a decision, while stimulation of LIP

offsets the accumulator. Intuitively, they argue that while the local effect of

stimulation on MT is small, it is also constant, and because this stimulation

effect is temporally accumulated in LIP, the total effect on choice is

substantial. In contrast, while the effect of stimulating LIP is also small and

constant, it does not benefit from temporal accumulation and therefore has

only a small effect on choice (see 24 for review). To quantify this intuition,

they first assume that stimulation in LIP introduces an offset of ~5 spikes per

second (sps). (This assumption is based on the previously observed effects

of MT stimulation in the same monkey; 12). They then model the effect of a

5 sps additive offset to LIP and demonstrate that the result provides a good

fit to the observed behavior.

In summary, they equate an additive offset of ~5 sps with a bias of

~2.85% coherence. We observe that relative reward adds an offset of ~4.3

(monkey A) and ~3.4 (monkey T) sps, and the resulting biases are equal to

~15% (monkey A) and ~17% (monkey T) coherence. The limited effect of

microstimulation, compared to our explicit reward cue, is likely a result of

the stimulation’s limited capacity to affect the entirety of LIP’s decision-

related network.
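The size of this discrepancy can be made explicit by expressing each bias per spike per second of offset. The short Python sketch below does this with the approximate values quoted in this section; treating bias as a simple ratio of offset is an illustrative simplification, and the function name is hypothetical.

```python
def bias_per_sps(bias_pct_coherence, offset_sps):
    """Choice bias (% coherence) produced per spike/s of additive offset."""
    return bias_pct_coherence / offset_sps

# Approximate values quoted in the text.
microstim = bias_per_sps(2.85, 5.0)    # LIP microstimulation (28)
reward_a = bias_per_sps(15.0, 4.3)     # relative reward, monkey A
reward_t = bias_per_sps(17.0, 3.4)     # relative reward, monkey T

print(f"microstimulation: {microstim:.2f} % coherence per sps")
print(f"reward, monkey A: {reward_a:.2f} % coherence per sps "
      f"({reward_a / microstim:.1f}x microstimulation)")
print(f"reward, monkey T: {reward_t:.2f} % coherence per sps "
      f"({reward_t / microstim:.1f}x microstimulation)")
```

On these figures a given offset introduced by the reward cue is associated with a bias roughly six to nine times larger than the same offset introduced by microstimulation.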

In addition to an additive offset, it is also possible that relative value

drives the decision variable to cross the threshold earlier by increasing the

rate of accumulation for relatively high value targets. If this is so, then for

example, we would expect the coherence effects to be greater for a T1

choice in the HL conditions than in the LH conditions. We see no

significant effect of reward condition on the distributions of βcoh

frequencies within T1 or T2 choices for either monkey (Section 3.3.2.7, Figs.

18a-b). This indicates that the rate at which the decision variable is

accumulated is independent of relative value. In Chapter 2 we similarly

found that the monkeys’ psychophysical sensitivity to coherence is

independent of relative value (Section 2.3.3 and Figs. 5a-b).

The final possibility is that relative value affects the threshold’s height.

Increasing an option’s relative value should lower its threshold, causing the

decision variable to reach the threshold more quickly. However, estimating

the threshold height, or the time at which the decision variable crossed it, is

not possible within this experimental design. In these experiments, the

motion stimulus is presented for a fixed duration of 500 ms and the monkey

is forced to delay reporting his choice. Thus, we have no means of

determining when the decision variable crosses the threshold. If this task

were modified to emphasize reaction time and permit subjects to report

choices freely, we could take the reaction time as a surrogate for the boundary

crossing (54).

3.4.2.2 Coherence effects are consistent with the integrator model

Our coherence effects are both reasonably consistent with the integrator

model and similar to previous reports. The most comparable previous study

is Shadlen and Newsome (65), who reported their range of coherence (0%-

51.2%) increased LIP activity by 2.7 spikes per second for T1 choices and

4.2 spikes per second for T2 choices. Roitman and Shadlen (54, fixed-

duration experiments) report slightly larger modulations for the same

coherence range: 13.2 and 5.2 spikes per second for T1 and T2 choices,

respectively. Based on our model of LIP activity (Equation 7; βcoh), we

find that our range of coherence (0%-48%) modulates LIP activity by 2.0

spikes per second (monkey A) and 0.78 spikes per second (monkey T)

across both choices.

The integrator model predicts that these coherence effects should be

largely confined to the end of the motion epoch and should, ideally, be

absent in the delay epoch activity of T1 choices, but present in the delay

epoch activity of T2 choices (48). Indeed, our results (Section 3.3.2.6, Figs.

15a-b, 16c and 17c) demonstrate that the effects of coherence are largely

confined to the second half of the motion epoch. Our results are slightly less

clear regarding delay period activity.

Mazurek and Shadlen (48) argue that the difference between T1 and T2

delay epoch activity is a result of the accumulation process. They argue that

for T1 choices, the observed accumulator (i.e. the LIP cell under study, with

T1 in the RF) must have crossed the decision threshold at some point during

the motion epoch. Thus, its delay activity should be pegged at the

threshold’s height. For T2 choices, however, the observed accumulator does

not reach the threshold (thus, T1 is not chosen). The delay period activity

after a T2 choice should then be pegged at some value below the threshold,

with this value a function of coherence. In both monkeys, however, we find

a very weak effect of coherence during the delay period. Similarly, weak

effects of coherence were reported during the early delay epoch by Roitman

and Shadlen (54, fixed-duration experiments) and Shadlen and Newsome

(65). For monkey T, the effects of coherence during the delay period are

larger for T1 choices than for T2 choices (Figure 17c), while for monkey A they

are very similar for T1 and T2 choices (Figure 16c).

3.4.2.3 Effects of absolute value are not predicted by the integrator model

The integrator model does not predict the strong and persistent effects

of absolute value found throughout our trial. The integrator model

emphasizes relative value’s effect on integration (21, 24) because relative

value, not absolute value, influences behavior. The representation of

absolute value through the motion and delay epoch, however, does not

exclude the integrator model.

During the motion epoch, the increased activity for trials of greater

absolute value would also serve as an offset to the accumulators. In the HH

condition, this offset would be equally applied to both targets and would

drive both options’ integrators closer to their respective thresholds. This

leads to two predictions.

First, because all responses should cross the threshold sooner, reaction

times should be shorter for all choices on HH trials. We are unable to test

this prediction directly because this is not a reaction time task. As a

surrogate, in Chapter 2 we measured saccade latency (Section 2.3.5; Figs.

9a-b) and found no systematic effects of absolute value. Conversely, the

second prediction is that on LL trials responses at all coherences should be less

likely to reach the threshold. Consequently, in the LL reward condition, responses to

both targets should be less likely than in the HH reward condition and

the monkey should be more likely to saccade to some other region of space.

This prediction is supported by the analysis of no-choice trials presented in

Chapter 2. We demonstrated that across all coherences, both monkeys were

less likely to choose either low value target in the LL reward condition

(Section 2.3.4; Figs. 7-8).

However, the persistent representation of absolute value during the

delay epoch is more difficult to reconcile with an integrator model. Like

coherence, the effects of both absolute and relative value should be absent

during the delay epoch for T1 choices, but present for T2 choices. While

both monkeys show effects of absolute value during the delay period, for

monkey A, this effect is larger for T2 choices than for T1 choices (Section

3.3.2.6; Figs. 16a-b). This difference, however, is not visible for monkey T

(Section 3.3.2.6; Figs. 17a-b).

3.4.3 Does LIP integrate sensory and value information in a common

currency?

In the preceding analysis, we attempted to demonstrate that in the late

delay period LIP comes to integrate sensory and value information in a

common currency (Section 3.3.4; Figs. 20-22). As discussed above, our

behavioral analysis established a quantified equivalence between relative

reward value and motion coherence. Specifically, it revealed that a relative

increase in 1 unit of reward is, on average, behaviorally equivalent to an

increase of 14.7% (monkey A) and 16.3% (monkey T) motion coherence

(Section 2.3.1; Figs. 3a-b). We posited that if information converging on LIP

is integrated in a scale dependent on its common influence on behavior, we

should be able to uncover equivalence between relative value and motion

coherence on the neural level comparable to the one observed behaviorally.

This predicts that the modulation of LIP from relative value should be equal

to the modulation produced by 14.7% coherence.

It is important to note that, at the behavioral level, coherence has a

greater effect on the probability of a T1 choice than relative reward does.

However, the results of our full physiological model (Section 3.3.2.5;

Equation 7) reveal that absolute value, relative value and choice all have a

greater influence on LIP firing rate than does coherence (Figs. 15a-b).

Consequently, attempts to find equivalence between bEVS and nEVS using

the factors in Equation 7 failed, even when we modeled LIP activity in terms

of the log odds of a spike. Upon dropping choice from our model, however,

the magnitude of βcoh grew as it accounted for variance previously captured

by choice. Only after this manipulation did our model generate βcoh values

greater than βt1 and βt2 values, and become capable of generating nEVS

values similar to bEVS values. Given, however, how clearly LIP is

representing the monkey’s impending choice, this is not a justification for

dropping the choice factor from the model.

The intellectual justification for removing choice from the model rests

on an assumption that the delay epoch activity in LIP plays a causal role in

choice. In these experiments, we practically defined choice by which way

the monkey moved his eyes at the end of the delay period. Thus, before the

saccade, choice is a post-hoc factor and illogically included as a predictive

factor of LIP activity. Under this assumption, delay epoch activity

represents a continually evolving decision that is not truly a choice until the

time of the saccade.

If, however, choice is defined by when the decision variable crosses the

threshold, then the differential LIP activity during the late delay epoch

(following the threshold crossing) may be influenced by the choice itself. In

this case, it would be important to include choice as a predictor of LIP

activity. Under this assumption, a choice state has already been reached, so

delay epoch activity represents a consequence. Including choice as a

predictive factor is also logical if LIP reflects a decision process occurring in

another brain area, such as the frontal cortex or the superior colliculus, and

simply mirrors the choice developing elsewhere (33, 38). Our physiological

data, however, cannot reveal exactly when the threshold is crossed, or if LIP

simply reflects a decision process rather than implementing one. Still, it is

reasonable to assume that the threshold is crossed during the motion epoch.

If choice does indeed cause the differential T1 and T2 delay epoch

activity, then it must be included in the model and our estimate of nEVS

(Equation 10) would be fundamentally misleading. Rather than expressing

the influence of relative value in terms of coherence, it is expressing the

influence of relative value largely in terms of choice, and thus there is no

basis to compare nEVS and bEVS. Under this assumption, delay epoch

activity represents only the impending choice and its absolute value.

This causality conundrum can be momentarily side-stepped by focusing

on the representation of relative value at the start of the motion epoch and

the representation of coherence at the end of the motion epoch. Recall that

our core common-currency prediction is that the modulation resulting from

an increase in relative value should be equal to the modulation produced by

14.7% coherence. If we assume the behaviorally relevant representation of

relative value is at the start of the motion epoch we can, using Equation 7,

calculate that relative value produces an increase of 4.35 spikes per second

(monkey A). Using Equation 7 and controlling for all other factors, we can

further determine what coherence produces a similar increase in firing rate.

Taking the peak value of βcoh during the motion epoch (2.0, monkey A), we

can determine that a coherence of approximately 104% would, on average,

result in a modulation of LIP equivalent to this increase in relative value.
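The arithmetic behind this estimate can be reproduced as follows. The sketch assumes, as an interpretation rather than a statement of the original procedure, that the peak βcoh of 2.0 spikes per second refers to the modulation across the full 0%-48% coherence range (Section 3.4.2.2) and that firing rate scales linearly with coherence; the function name is hypothetical.

```python
def equivalent_coherence(offset_sps, coh_modulation_sps, coh_range_pct):
    """Coherence (%) whose firing-rate modulation matches `offset_sps`,
    assuming firing rate scales linearly with coherence."""
    slope = coh_modulation_sps / coh_range_pct   # spikes/s per 1% coherence
    return offset_sps / slope

# Monkey A: 4.35 sps from relative value; 2.0 sps across 0%-48% coherence.
coh_equiv = equivalent_coherence(4.35, 2.0, 48.0)
print(f"equivalent coherence: {coh_equiv:.1f}%")
```

Under these assumptions the neural equivalent exceeds 100% coherence, the maximum physically realizable stimulus strength, and is several times larger than the 14.7% coherence equivalence measured behaviorally.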

3.4.4 Representation of value and probability of choice in LIP

Recently, two similar studies of value signals in LIP came to two very

different conclusions as to if and how LIP represented value. A study by

Dorris and Glimcher (13) concludes that LIP represents a pure relative value

signal, while Sugrue and Newsome (70) conclude that LIP represents local

relative value in a manner indistinguishable from the local probability of

choice. Our results indicate that LIP carries a highly dynamic representation of

value in which an initial representation of absolute value is transformed to a

representation of relative value at the start of the motion epoch and then, as

choice related activity develops, returns to representing absolute value and

choice. Our results are difficult to fully reconcile with either of these

previous studies.

Dorris and Glimcher trained monkeys in a free-choice paradigm in

which monkeys chose between a “safe” target, consistently delivering a

small reward and an alternative, “risky” target, probabilistically delivering a

large reward. In response to changes in this probability, monkeys adjusted

their frequency of selecting the risky target to an optimal level (in terms of a

Nash equilibrium) in which each option’s “subjective desirability” was

equalized across a block of trials. Subjective desirability is a measure of an

option’s value multiplied by the probability that it will be realized.
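
The definition of subjective desirability given above can be illustrated with a minimal Python sketch. The function name and the reward magnitudes and probabilities are hypothetical, chosen only so that the two options come out equal; they are not values from Dorris and Glimcher's experiment.

```python
def subjective_desirability(reward_magnitude, reward_probability):
    """Value of an option weighted by the probability that it is realized."""
    return reward_magnitude * reward_probability


# Hypothetical numbers, for illustration only.
# "Safe" target: small reward, always delivered.
safe = subjective_desirability(reward_magnitude=1.0, reward_probability=1.0)
# "Risky" target: large reward, delivered on half of the trials.
risky = subjective_desirability(reward_magnitude=2.0, reward_probability=0.5)

# At the behavioral equilibrium described in the text, the two options
# are equally desirable.
equalized = abs(safe - risky) < 1e-9
```

In this toy case the risky option's larger magnitude is exactly offset by its lower probability, which is the sense in which desirability is "equalized" across options.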

Dorris and Glimcher first reproduce an earlier finding (54) that LIP

neurons encode an option’s relative value during instructed saccades, until

LIP begins to encode the impending saccade, at which point the

representation of relative value disappears. Then, to demonstrate that LIP is

representing subjective desirability, they place the risky target in the RF of

an LIP neuron and show that its activity was invariant, despite the monkey’s

fluctuating probability of choosing the option in the RF. They argue that

subjective desirability is behaviorally constant and that LIP activity is

constant, thus, the latter represents the former. Additionally, they double the

magnitude of the reward for both options and show that LIP does not

respond to this increase. Based on these two observations, they conclude

that LIP essentially represents only the relative value of the target in the RF

regardless of the probability of choosing it.

Sugrue and Newsome (70) trained monkeys on a free choice task in

which the two alternative targets are rewarded with probabilities that change

between blocks. They show that monkeys similarly adjust their probability

of choosing a given target to “match” the fraction of rewards recently

experienced from that option. Because the overall probability of reward is

constant, one option always has a greater relative value than the other.

They demonstrate, in contrast to Dorris and Glimcher, that LIP clearly

represents the monkey’s impending choice towards or away from the RF.

Further, within these representations of choice LIP is finely modulated by

the monkey’s locally calculated subjective estimate of the target’s relative

value. This graded representation of relative value is present in the delay

epoch activity of both T1 and T2 choice and persists through the time of the

saccade. Based on this graded representation, on the task’s logic, which

requires a spatial remapping of value on each trial, and on their behavioral

model in which local, relative value directly generates the probability of

choosing a target, they conclude that LIP largely represents the probability

of choice. Sugrue et al. (73) further argue that Dorris and Glimcher fail to

find LIP activity representing local value and hence the local probability of

choice largely because they only analyze T1 choice, thus emphasizing trials

on which this option had a high local value.
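
The matching behavior described above can be sketched as follows. This is a minimal illustration that assumes a simple sliding window over recent trials; the function name, the window length and the example trial sequence are hypothetical, and the estimator used in the original study may differ.

```python
def local_reward_fraction(choices, rewards, target, window=10):
    """Fraction of recently earned rewards that came from `target`.

    choices: chosen target on each trial, e.g. "T1" or "T2".
    rewards: 0/1 outcome on each trial.
    window:  number of most recent trials considered (hypothetical value).
    """
    recent = list(zip(choices, rewards))[-window:]
    total = sum(r for _, r in recent)
    if total == 0:
        return 0.5  # no recent rewards: treat the two options as equal
    from_target = sum(r for c, r in recent if c == target)
    return from_target / total


# Hypothetical trial history.
choices = ["T1", "T1", "T2", "T1", "T2", "T1"]
rewards = [1, 0, 1, 1, 0, 1]

# Under a matching rule, the probability of choosing T1 on the next
# trial tracks this local fraction.
fraction_t1 = local_reward_fraction(choices, rewards, "T1")
```

Because the estimate depends only on recent trials, it changes from trial to trial, which is what makes the resulting value signal "local".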

Consistent with Dorris and Glimcher, we find a strong relative value

signal that fades as the saccade is cued. While we do not explicitly instruct a

saccade, as Dorris and Glimcher do, our motion stimulus is an instructing

cue. In contrast to Dorris and Glimcher, however, we find a very strong and

consistent representation of absolute value. Their failure to find an absolute

value signal could result from LIP normalizing its representation of absolute

value within a block. In our experiments, trials of different absolute value

are randomly interleaved, potentially encouraging a more dynamic

representation of value.

In contrast to Sugrue et al., our results fail to support a role for LIP in

representing local probability of choice either across or within reward

conditions. First, recall from Chapter 2 that absolute value has no effect on

the monkey’s probability of choosing T1. Thus any representation of

absolute value in LIP undermines a representation of probability of choice

across reward conditions. Consider, for example, the monkey’s probability

of choosing T1 at 0% coherence, which is equal for both HH and LL trials.

Yet because of LIP’s strong representation of absolute value, its activity will

clearly differentiate these conditions. Second, the probability of choice is

largely a function of the motion coherence, which only briefly influences

LIP activity at the end of the motion epoch.

3.5 Summary

These experiments demonstrate that single LIP neurons simultaneously

represent relative value, absolute value and motion coherence. By modeling

the firing rates of single LIP neurons as a function of these factors and

successively applying this model across the duration of the experimental

trial, we demonstrate that this representation is highly dynamic.

LIP neurons initially respond with a rapid representation of absolute

value, which is then augmented by the value of the target outside the

response field and comes to represent the target’s relative value. Targets of

greater absolute and relative value are represented with greater firing rates.

Relative value is strongly represented at the start of the motion epoch. As

the motion epoch develops, the representations of both relative and absolute

value fade and LIP neurons become strongly modulated by the monkey’s

forthcoming choice.

This representation of choice quickly dominates LIP response and is

modulated by the specific coherence of the motion stimulus. This

modulation is brief and largely confined to the second half of the motion

epoch. As the motion epoch ends, the representation of relative value is

largely gone, but the representation of absolute value remains. Throughout

the delay epoch, LIP activity represents the absolute value of the target in

the response field and predominantly represents choice, irrespective of the

coherence or relative value supporting it.

These results are very consistent with the integrator model of decision

making presented by Mazurek and Shadlen (48). Relative value’s

prominence at the start of the motion epoch indicates it introduces an

additive offset to the integration of the forthcoming motion information.

Our physiological and behavioral results support a model in which relative

value adjusts the accumulator’s initial state, without adjusting the rate of

accumulation. The offset imposed by relative value is similar in magnitude

to the offset imposed by Hanks and Shadlen (28) with microstimulation.

However, we observe greater choice bias.
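
The distinction between shifting the accumulator's initial state and changing its rate of accumulation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the model fitted in this chapter: the bounded random walk, the function name and all parameter values (bound, noise, offset, number of trials) are hypothetical.

```python
import random


def simulate_choices(drift, start_offset, n_trials=2000, bound=1.0,
                     noise_sd=0.1, max_steps=10000, seed=0):
    """Fraction of trials on which a bounded accumulator ends at the T1 bound.

    drift:        mean evidence per step (stands in for signed coherence).
    start_offset: initial accumulator state (stands in for relative value).
    """
    rng = random.Random(seed)
    t1_choices = 0
    for _ in range(n_trials):
        x = start_offset
        for _ in range(max_steps):
            x += drift + rng.gauss(0.0, noise_sd)
            if abs(x) >= bound:
                break
        # Positive final state counts as a T1 choice.
        t1_choices += x > 0
    return t1_choices / n_trials


# Same drift (rate of accumulation) in both cases; only the starting
# point differs.
p_unbiased = simulate_choices(drift=0.0, start_offset=0.0)
p_biased = simulate_choices(drift=0.0, start_offset=0.3)
```

With zero drift, the stimulus provides no net evidence, so any excess of T1 choices in the second case arises from the starting offset alone.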

We attempted to demonstrate that in the late delay period, LIP

represents sensory and value information in a common currency. By

common currency, we mean that information converging on LIP is

integrated in a scale depending on its common influence on behavior. This

model allowed us to directly compare our distribution of bEVS values to a

distribution of nEVS values and to demonstrate that their means do not

significantly differ. Our model, however, assumes choice does not causally

affect delay period activity. The validity of this assumption is a subject

demanding further attention and discussion and until it is verified, we must

consider these results specious.

In total, these results support LIP’s role in decisions requiring the

integration of sensory and value information. With some exceptions, LIP

simultaneously represented sensory and value information in a manner

similar to how it represents these factors alone. As previously mentioned,

the DLPFC is also independently modulated by sensory and value

information. Additionally, when studied independently, reward value and

motion coherence similarly modulated DLPFC and LIP. This result

supports the proposal that LIP is part of a decision-related network spanning

several cortical areas including the DLPFC. This proposal predicts that

neurons in the DLPFC should respond to our task in a manner similar to how

LIP neurons respond. In Chapter 4, we will present results from a

preliminary study of DLPFC indicating that this is unlikely to be the case.

Chapter 4

4.1 Introduction

Chapter 3 demonstrated that neurons in area LIP, previously shown to

represent sensory or value information in independent sets of experiments, in

fact, integrate this information at both the single unit and population levels.

Regions of the PFC, particularly the dorsolateral PFC (DLPFC), are also

independently modulated by motion coherence (38) and value information (1,

40-41, 45, 53, 73). Additionally, when studied independently, these factors,

and others, similarly modulated DLPFC and LIP (8, 38, 45). In this chapter

we will present preliminary physiological data recorded from the DLPFC of

one monkey engaged in our motion discrimination task with multiple reward

contingencies. We will begin to determine whether LIP and DLPFC

continue to respond similarly to sensory and value information when they

are presented simultaneously and competitively in a single behavioral

paradigm.

The PFC is believed to play a general role in a wide range of behaviors

requiring dynamic cognitive control (for review see 17, 49) and working

memory (16, 25), particularly when multiple sources of information guide

action (17, 49). PFC neurons have also been shown to encode behaviorally

relevant task categories (15), task specific rules (72), as well as specific

combinations of value and action (73). The PFC has been extensively

studied in the context of competitive games, in which signals related to

choices, their outcomes and their conjunction have been documented (3, 43,

63, 68). PFC neurons are, therefore, likely active during our task, in which

two different, often competitive, factors must be temporally integrated to

produce optimal behavior. We can begin to understand how our sensory and

value factors might jointly affect PFC neurons by first considering how they

each influence its activity alone. Of particular relevance are two studies

demonstrating that neurons in the PFC are independently modulated by

reward value and motion coherence.

Leon and Shadlen (45) trained monkeys on a memory-guided,

instructed saccade task, in which both the saccade location and reward

magnitude (small, 1x; or large 2x) were cued. They report that, overall,

neurons in the DLPFC responded with greater firing rates to saccades

associated with larger rewards. Some of these value responses, however,

emerged only after the saccade was cued, while others emerged independent

of the cue’s timing or location. Using a similar memory-guided delayed

saccade task, Kim and Shadlen (38) recorded DLPFC responses to a simple

motion discrimination task. They report that, while some neurons only

predict the monkey’s upcoming choice during the delay epoch, most begin

predicting choice during the motion epoch, and are modulated by the

coherence of the visual stimulus. This coherence dependent modulation was

reported as qualitatively and quantitatively similar in magnitude and time

course to those seen in LIP. Additionally, DLPFC and LIP also have

qualitatively and quantitatively similar responses to a simple delayed saccade

task (8).

These and other decision related studies of the PFC (3, 38, 43-44, 55,

63, 68, 73) as well as the anatomical interconnectivity between PFC, LIP

(47) and other decision-related areas (33), have led to the proposition that the

DLPFC and LIP might constitute a single, distributed, decision-related

network (38, 62). If so, neurons isolated in DLPFC using our delayed

saccade task should be functionally related to those isolated in LIP by

similar criteria.

In contrast to this expectation, we observed remarkable differences

between the physiological responses of PFC and LIP neurons in this animal.

Even though our data set is small (26 neurons—see Methods) and obtained

from only one animal, the differences were sufficiently striking that they

seemed worth documenting in this thesis.

4.2 Methods

Monkey A, one of the two adult rhesus monkeys that participated in the

behavioral and electrophysiological experiments presented in Chapters 2 and

3, was used in the following physiological experiment. Prior to physiological

recordings the monkey underwent an additional surgical procedure to place a

recording chamber above the principal sulcus. All other methods were

identical to those described in Chapter 3.

4.2.1 Physiological Recordings

PFC was identified by a combination of stereotactic location, regional

physiological activity and anatomical magnetic resonance imaging. Figures

24a and 24b are two representative MRI images used to target electrode

penetrations and identify the recording sites. Figure 24a is from a series of

image planes normal to the bore of the recording cylinder. The cylinder’s

“footprint” is denoted by the large green circle, while the two smaller green

circles denote the location of two burr holes used in recording (see Methods,

Chapter 3). The purple and red lines respectively denote the principal sulcus

(PS) and arcuate sulcus (AS). Figure 24b is a coronal image, showing the

saline-filled recording cylinder and reference grid, centered over the

principal sulcus (purple). The approximate trajectory of an electrode passing

through the burr hole is depicted with a dotted cyan line. Single neurons

were isolated and their activity recorded with methods and materials

identical to those presented in Chapter 3.

4.3 Results

4.3.1 Cell selection and delayed saccade task

To select recording sites in the PFC we used the same delayed saccade task

with multiple reward contingencies described in Chapter 2. Recall that, in

this task, a single target can appear at one of six locations (0°, 60°, 120°,

180°, 240° and 300°) with one of two reward values (high and low).

Figure 24a-b. Anatomical magnetic resonance imaging of the PFC recording site. (a) From a series of image planes normal to the bore of the recording cylinder. The cylinder’s “footprint” is denoted by the large green circle, while the two smaller green circles denote the location of two burr holes used in recording. The purple and red lines respectively denote the principal sulcus (PS) and arcuate sulcus (AS). (b) A coronal image, showing the saline-filled recording cylinder and reference grid, centered over the principal sulcus (purple). The approximate trajectory of an electrode passing through the burr hole is depicted with a dotted cyan line.

Using this task we identified neurons having persistent, delay-period activity

related to either the target’s location or its value. Because we were often

able to isolate multiple neurons on single or multiple electrodes we collected

26 single units over 12 experimental sessions from the right hemisphere of

monkey A.

During every session we identified at least one neuron responsive to

target location during the delayed saccade task. We often collected

additional single units, whose responses on the delayed saccade task were, at

the time, less clear. An off-line, multi-way analysis of variance—with

factors for target location, target value and their interaction—on each

neuron’s mean delay epoch activity revealed that 20 neurons had significant

(p<0.05) main effects. Of all 26 units, 69% (18) were modulated by target location,

23% (6) by target value, and 15% (4) by both target location and target value.

Only 11% (3) showed a significant interaction (p<0.05) between target

location and value. The remaining six units, while not significantly

modulated during the delayed saccade task, were in fact modulated

significantly during at least one epoch in the direction discrimination task

(multi-way ANOVA with factors for reward contingency and target location,

run on the mean firing rate in each task epoch, p<0.05). Thus, all 26 units

were included in all subsequent analyses.
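
The counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are ours, and the assumption that percentages are taken over all 26 units and truncated to whole numbers is inferred from the reported values rather than stated in the original analysis.

```python
n_units = 26
n_location = 18     # significant main effect of target location
n_value = 6         # significant main effect of target value
n_both = 4          # significant main effects of both factors
n_interaction = 3   # significant location-by-value interaction

# Inclusion-exclusion: units with at least one significant main effect.
n_any_main_effect = n_location + n_value - n_both

# Units with no significant main effect in the delayed saccade task.
n_not_modulated = n_units - n_any_main_effect


def percent_of_units(count):
    """Whole-number percentage of all recorded units (truncated)."""
    return 100 * count // n_units


reported = [percent_of_units(k)
            for k in (n_location, n_value, n_both, n_interaction)]
```

The sum of 18 and 6, less the 4 units counted twice, gives the 20 modulated neurons, and the difference from 26 gives the six remaining units described in the text.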

In contrast to LIP neurons, which had highly localized response fields

within the contralateral visual hemifield, neurons in the PFC tended to be

less selective, responding similarly to targets positioned anywhere within the

contralateral hemifield. Figure 25a plots mean neural response (radius) to

the six target locations (angle) during the delay epoch of the delayed saccade

task, for an example PFC neuron. For comparison, Figure 25b similarly

plots an LIP neuron’s response. In Figure 25a the red points and lines are

responses when the target was high value, while the blue is when the target

was low value. Note that this PFC neuron responds more to targets at 120°,

180° and 240° (targets within the contralateral visual hemifield) than to

those at 60°, 0° and 300°. In contrast, the LIP neuron (only one reward

condition) responds differentially to only one target position, 180°.

4.3.2 Population Response

Because we often collected multiple single units in a single

experimental session, we were usually unable to fully optimize the target

location for each neuron. As noted above, however, these neurons

responded well to targets in the contralateral hemifield. Thus, in the context

of our direction discrimination task, “T1” in this chapter will always refer to

the target in the visual hemifield contralateral to the recording site.

Figure 25a-b. In contrast to LIP, neurons in the PFC tended to be less selective, responding similarly to targets positioned anywhere within the contralateral hemifield. (a) Mean neural response, in spikes/sec (radius), to the six target locations (angle) during the delay epoch of the delayed saccade task, for an example PFC neuron. The red points and lines are responses from when the target was high value, while the blue are from when the target was low value. (b) The same plot for an example LIP neuron.

Figure 26 plots the mean firing rate of all 26 DLPFC neurons as a

function of time, similarly to Figures 11, 12 and 13, for all completed trials

in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions.

Within these reward conditions results are plotted for trials in which the

monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target,

dashed lines). Note that while this population is clearly modulated by

reward contingencies (colored lines), this modulation is not systematic (there

is no logical ordering of the four reward conditions) and is thus very

different from the LIP data. Only two systematic trends are visible in these

data. First is the gradual buildup of activity, which peaks approximately

midway through the motion period and declines throughout the delay period.

Second is the gradual separation of responses preceding saccades to T1 and

T2 (decision-related activity). As the forthcoming examples will

demonstrate, our population of PFC neurons is, in fact, extremely

heterogeneous, and the average responses depicted in Figure 26 are notable

primarily for how poorly they represent the responses of individual PFC

neurons. This is notably distinct from our LIP data, for which the

population histograms were reasonably representative of most single

neurons.

4.3.3 Population heterogeneity

To indicate the heterogeneity in our DLPFC population we present data

from four individual neurons from this population, each exemplifying an

extremely specific response profile. For each neuron, and each task epoch,

we have performed a multi-way analysis of variance, with factors for the

reward conditions (HH, LL, HL and LH), the choice (T1 and T2) and signed

motion coherence. We then performed a posthoc, pairwise comparison test

to determine which factors significantly affected firing rate.

Figure 27 plots the mean response of one PFC neuron in a format

similar to Figure 26. In the reward and motion epochs, this neuron was

significantly (p=0; both epochs) modulated by the reward condition; firing

rates were statistically indistinguishable for LH (green) and HH (red)

conditions, but were greater for both of these conditions than for the LL

(blue) and HL (black) conditions. In the delay period this neuron was

significantly (p=0) but weakly modulated by choice, firing more for T1

choices (solid lines) than for T2 choices (dashed), but was not modulated by

reward condition (p=0.3287). If one wished to summarize the selectivity of

this neuron in words (perhaps an inadvisable endeavor), one might say that

it appears to respond best when a high value target is presented in the

ipsilateral visual field, mostly during the reward cue period.

Figure 26. Average DLPFC response (n=26). Mean firing rate of all 26 DLPFC neurons as a function of time, for all completed trials in the HH (red), LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). Note that while this population is clearly modulated by reward contingencies (colored lines), this modulation is not systematic (there is no logical ordering of the four reward conditions) and is thus very different from the LIP data, for which the population histograms were reasonably representative of most single neurons. (Monkey A. Recoverable panel labels: target epoch, reward epoch, late delay epoch.)

Figure 27. A single DLPFC neuron. Mean firing rate as a function of time, for all completed trials in the HH (red), LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). In the reward and motion epochs, this neuron was significantly modulated by the reward condition. In the delay period this neuron was weakly modulated by choice, firing more for T1 choices (solid lines) than for T2 choices (dashed), but was not modulated by reward condition. (Monkey A. Recoverable panel labels: target epoch, reward epoch, late delay epoch.)

Another example neuron, plotted in Figure 28, is also modulated by

reward condition in the reward cue and motion epochs, but it changes its

preferred reward conditions at the transition between these epochs. In the

reward cue epoch, this neuron fires significantly (p=0) more for LH (green),

equally for the HH (red) and LL (blue) conditions, and least for the HL

(black) condition. However, in the motion epoch this pattern switches, and

the neuron fires significantly (p=0) more (and equally) for the HH (red) and

HL (black) conditions as compared to the LL (blue) and LH (green)

conditions. In the delay period this neuron is significantly (p=0) modulated

by choice, firing more for T1 (solid) than T2 (dashed) choices, but is not

significantly modulated by reward conditions (p=0.2397).

Some neurons had responses that were highly specific to particular

combinations of reward condition, choice and epoch. Figure 29 depicts a

neuron responding almost exclusively during the motion period epoch to LH

(green, p=0) trials that result in a T1 choice (solid, p=0.0001). This is not a

subtle effect: the firing rate peaks at over 20 spikes/sec for the responsive

condition, but fails to exceed 5 spikes/sec for all others. During the motion

period, this neuron also responded weakly to the HH (red) and LL (blue)

conditions, but not at all to HL (black) conditions. Although, in the motion

epoch, this neuron was modulated by choice for the LH (green), HH (red)

and LL (blue) conditions, it was not significantly (p=0.2247) modulated by

choice in the delay period for any condition.

Figure 30 plots data from a similarly specific neuron. This neuron

was significantly (p=0) modulated by reward condition in the reward cue

epoch, preferring HL (black), HH (red), LL (blue) and LH (green)

conditions in that order, which is consistent with a representation of relative

value as observed in many LIP neurons. In the motion epoch, however, this

neuron’s response becomes nearly five times larger for one reward condition

(HL, black traces) than for all the others. The selectivity of this neuron was

not quite as impressive as that of the neuron in Figure 29, since it responded well to

both choices in the HL condition. Similar to the neuron in Figure 29, this

neuron was significantly (p=0) modulated by choice in the motion cue

period but not in the delay period.

Figure 28. A single DLPFC neuron. Mean firing rate as a function of time, for all completed trials in the HH (red), LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron is modulated by reward condition in the reward cue and motion epochs, but it changes its preferred reward conditions at the transition between these epochs. In the delay period this neuron is modulated by choice, firing more for T1 (solid) than T2 (dashed) choices, but is not modulated by reward conditions. (Monkey A. Recoverable panel labels: target epoch, reward epoch, late delay epoch.)

4.3.4 Discussion

While the preceding experiments and analysis are preliminary and in no

way conclusive, they suggest that DLPFC neurons represent sensory and

value information in an extremely heterogeneous manner. In their

heterogeneity generally, and the individual responses specifically, these

neurons appear to be fundamentally different than LIP neurons. LIP neurons,

selected using the same criterion, represent sensory and value information in

a systematic and consistent manner, commensurate with an accumulator

model of sensory integration. In contrast, DLPFC neurons can demonstrate

extreme specificity for combinations of reward condition, task epoch and

choice (e.g. Fig. 29 and Fig. 30) indicating that, while DLPFC likely plays

an important role in generating behavior, that role is fundamentally different

than LIP’s.

Figure 29. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch. Mean firing rate as a function of time, for all completed trials in the HH (red), LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron responds almost exclusively during the motion period epoch to LH trials that result in a T1 choice. The firing rate peaks at over 20 spikes/sec for the responsive condition, but fails to exceed 5 spikes/sec for all others. (Monkey A. Recoverable panel labels: target epoch, reward epoch, late delay epoch.)

Figure 30. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch. Mean firing rate as a function of time, for all completed trials in the HH (red), LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron was significantly modulated by reward condition in the reward cue epoch, preferring HL (black), HH (red), LL (blue) and LH (green) conditions in that order, which is consistent with a representation of relative value as observed in many LIP neurons. In the motion epoch, however, this neuron’s response becomes nearly five times larger for one reward condition (HL, black traces) than for all the others. (Monkey A. Recoverable panel labels: target epoch, reward epoch, late delay epoch.)

Our PFC data, though modest in number, are not compatible with a

model of decision making in which a decision variable evolves

simultaneously in DLPFC and LIP. Our results are also incompatible with a

model in which either LIP or DLPFC reflects the evolution of a decision in

the other area. If either of these scenarios were accurate, we would expect

greater similarity between these two areas during our task. Our data,

however, are compatible with previous studies demonstrating that DLPFC

neurons represent highly specific combinations of task relevant categories,

or rules, and choices (15, 72-73). The modulations we observe in DLPFC in

response to reward contingencies (HH, LL, HL or LH), appear more related

to which of the four reward conditions was present, than to either the

average reward or the value of the target in the contralateral hemifield.

Neurons that singularly represent a relative reward condition (HL or LH)

during the motion epoch, like those depicted in Figures 29 and 30, could

serve as the source of the additive bias we observed in LIP.

Additionally, while behavioral choice modulated the activity of some

DLPFC neurons, most were never systematically modulated by coherence

like LIP neurons. This was surprising given that we selected our DLPFC

population similarly to both our LIP populations and previous studies of

coherence in DLPFC by Kim and Shadlen (38). Two possibilities might

account for this discrepancy. First, while we used a delayed saccade task to

select our neurons, Kim and Shadlen used a memory guided delayed saccade

task. Given DLPFC’s role in working memory (16-17, 25) it is possible that

we selected a different population in the DLPFC.

Second, it might result from DLPFC neurons representing a trial’s

category or rule, rather than a decision process. In a simple motion

discrimination task the motion coherence is the only behaviorally relevant factor

and would therefore be the dimension along which trials would be

categorized. However, in our task, the reward condition is likely the

dimension along which trials are categorized, particularly given that the motion

coherence effect is identical across all reward conditions (Chapter 3).

Chapter 5

5.1 Summary and conclusions

This study’s central finding is that monkeys integrate sensory and value

information at the behavioral and neuronal level. At the behavioral level we

find that monkeys’ choices are systematically biased towards options of

greater relative value. Changes in absolute value have no significant effect

on performance in the motion discrimination task. We quantified the bias's

magnitude in terms of motion coherence, and demonstrated that it is nearly

optimal and that excursions from optimality result in a consistent over-bias,

despite which 98% of the maximum available rewards are still harvested.

We speculate that our monkeys exhibit over-biases because of positive

utility functions. Importantly, we show that this bias is independent of

psychophysical sensitivity, and is implemented behaviorally as an additive

factor.
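
The idea of expressing an additive value bias in units of motion coherence can be sketched with a logistic choice function. This is an illustration only: the logistic form, the function names and the coefficient values are assumptions made for the example, not the fitted model or estimates reported in Chapter 2.

```python
import math


def p_choose_t1(coherence, relative_value, slope=0.1, value_bias=1.0):
    """Logistic choice model with an additive value term.

    coherence:      signed motion coherence in percent (positive favors T1).
    relative_value: +1 if T1 has the larger reward, -1 if T2 does, 0 if equal.
    slope, value_bias: hypothetical coefficients, not fitted values.
    """
    z = slope * coherence + value_bias * relative_value
    return 1.0 / (1.0 + math.exp(-z))


def equivalent_coherence(slope=0.1, value_bias=1.0):
    """Coherence (percent) that shifts choices as much as the value term."""
    return value_bias / slope


# Motion favoring T2 by the equivalent coherence cancels a value bias
# toward T1, returning the choice probability to chance.
p_cancelled = p_choose_t1(-equivalent_coherence(), relative_value=1)
```

Because the value term is added to the coherence term rather than multiplying it, the slope (psychophysical sensitivity) is the same in every reward condition; the value term only shifts the curve along the coherence axis.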

In contrast to our behavioral results, we find that both relative and

absolute value modulate single LIP neurons, and that the representation of

sensory and value information is dynamic but systematic. Absolute value

significantly modulated firing rate primarily when the rewards are first cued

and, again, later during the delay period, when only absolute value and

choice are represented. Additionally, when options of greater value are

presented in an LIP neuron’s RF, firing rates are greater.

Relative value has its clearest representation immediately preceding

the onset of the motion stimulus. We argue this relative value signal is well

situated to bias decisions by adjusting the level where sensory evidence

begins accumulating towards a threshold. Importantly, we find that the

effect of motion coherence on LIP firing rate is independent of absolute and

relative value. This indicates that, consistent with behavior, relative value

introduces an additive offset to the accumulation of sensory evidence. These

and other results support the integrator model of decision making presented

by Mazurek and Shadlen (48).

Our results do not support models of LIP emphasizing the

representation of local probability of choice (70) or pure, relative subjective-

desirability (13). If LIP represented the local probability of choice then LIP

activity should be equal for all reward conditions on trials when the

monkeys are equally likely to choose T1. The presence, however, of an

absolute value signal results in unequal firing rates representing an equal

probability of choice (e.g. 0% coherence in the HH and LL conditions).

Additionally, if LIP represented local probability of choice, we would expect

motion coherence, which is correlated with probability of choice, to

significantly modulate delay period activity. We find, however, that delay

period activity is significantly modulated only by choice and absolute value. If

LIP purely represented the relative subjective desirability of an option then

absolute value should not be represented at all.

Our attempts to determine if LIP integrates sensory and value

information in a common currency produced uncertain results. While we

presented a model of LIP activity (Equation 9) permitting us to compare the

behavioral and neuronal equivalence between relative value and motion

coherence (bEVS and nEVS, respectively), it makes assumptions about the

relationship between choice and motion coherence that are potentially

invalid. It is clear, however, that on the behavioral level motion coherence

has a greater influence on the probability of choice than value does, while, at

the neuronal level, value has a greater influence on firing rate than motion

coherence. This observation is difficult to reconcile with the concept of a

common currency.

We also presented preliminary results comparing DLPFC with LIP.

Previous investigations of DLPFC showed that its representations of motion

coherence and value were qualitatively and quantitatively similar to those in

LIP. This led to the proposal that these two areas are parts of a single,

distributed, decision-related network (38, 62). To the contrary, our results

suggest that the DLPFC and LIP contribute to behavior in fundamentally

different ways. While LIP appears to implement a decision-related

accumulation process, capable of accommodating an additive relative value

signal, DLPFC activity appears better suited for signaling the current reward

condition. We observed DLPFC neurons that singularly represent a relative

value condition (Figs. 21 and 22) during the motion epoch, which could

serve as the source of the additive bias we observed in LIP.

5.2 Future directions

5.2.1 Common currency

Our results clearly indicate that single neurons in LIP simultaneously

represent sensory and value information. If LIP is directly responsible for

deciding where to move the eyes, it should represent all the factors

influencing this choice with a magnitude proportional to their influence. While

this study was not explicitly designed to determine if LIP integrates

disparate factors in a common currency, it nonetheless provided one of the first

opportunities to address this question. While our results do not strongly support the

idea that LIP combines sensory and value information in a common currency,

it is a question demanding further investigation.

Analyzing the integration of sensory and value information in the

context of a common currency is a principal focus of our ongoing

collaboration with Jay McClelland and Juan Gao, in the Department of

Psychology at Stanford University. The question of how LIP integrates

multiple factors is the most significant question for future research.

5.2.2 Reaction time discrimination

Another version of the random dots task allows subjects to report their

decision as soon as they are ready after the random dots appear (9, 28, 54).

This reaction time (RT) version contrasts with the “fixed-duration” task,

used in this study, requiring subjects to view the random dots for a fixed

duration and wait through a delay period before reporting their choice. An

important direction for future research is to incorporate RT into our current

paradigm. An RT version of this task would provide additional metrics that

would deepen our understanding of how rewards bias behavior, and how that

bias is implemented at the neuronal level.

An RT version of this task would provide an additional behavioral

measure of bias because, in addition to quantifying the monkey’s choice, we

could also quantify how long the choice took to generate. This RT information would

allow us to more accurately determine the effect of reward conditions on the

decision’s duration. In the preceding sections we predicted that, generally,

reaction times for choices to the higher value target should be significantly

shorter than those to low value targets because the decision variable should

be elevated for these high value conditions, and thus more readily cross the

threshold.

Additionally, an RT version of this task would deepen our understanding of

how the bias is implemented at a neuronal level. With an RT version we

could precisely determine the beginning and ending of accumulation. This

would allow us to better determine if, and how, relative and absolute value

affect the rate of accumulation. The analysis presented in Section 3.3.2.7

suggests that the rate of accumulation is independent of reward condition;

this analysis, however, was performed on a fixed and arbitrary temporal

epoch that may not accurately capture the true window of accumulation.

Finally, an RT version of this task would allow us to take the time of the

saccade as a surrogate for threshold crossing. Knowing the time of threshold

crossing would permit us to determine if absolute and relative value affect

the threshold’s height. As discussed in Section 3.4.2, adjusting the

threshold’s height is one of three ways (along with offsetting the start of

accumulation, and increasing or decreasing the rate of accumulation) a bias

can be implemented in the accumulator model. The results presented above

suggest the bias is implemented by offsetting the start of accumulation;

however, it is possible this process works in conjunction with an offset in the

threshold’s height.
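
The three ways of implementing a bias can be compared in a minimal, noise-free sketch of an accumulator. It is an illustration under simplifying assumptions, not the model fit in this study: the function `time_to_threshold`, the arbitrary units, and every parameter value are hypothetical, and a real decision variable would also be noisy.

```python
def time_to_threshold(start, rate, threshold):
    """Time for a noise-free accumulator to climb from start to threshold.

    All quantities are in arbitrary units; noise is deliberately omitted.
    """
    if rate <= 0:
        raise ValueError("rate must be positive for the threshold to be reached")
    return max(threshold - start, 0.0) / rate

# Hypothetical baseline parameters, chosen only for illustration.
baseline = time_to_threshold(start=0.0, rate=1.0, threshold=10.0)

# Three ways to favor one choice, each shortening the time to threshold:
offset_start = time_to_threshold(start=2.0, rate=1.0, threshold=10.0)    # raise the starting level
faster_rate = time_to_threshold(start=0.0, rate=1.25, threshold=10.0)    # steepen accumulation
lower_threshold = time_to_threshold(start=0.0, rate=1.0, threshold=8.0)  # lower the bound

print(baseline, offset_start, faster_rate, lower_threshold)
```

With these particular numbers all three manipulations shorten the crossing time by the same amount, so crossing time alone would not separate them in this sketch. Distinguishing them would require the additional measurements an RT task makes available, such as the level of activity at the start of accumulation, its slope, and its level at the time of the saccade.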

5.2.3 Mapping utility with additional reward magnitudes

In Chapter 2 we speculated that our monkeys exhibit greater-than-

optimal biases because of positive utility functions associated with a highly

motivated desire for fluids. To truly determine the shape of our monkeys’

actual utility functions with this paradigm we would require at least one, but

ideally several, additional reward ratios (e.g., 3:1). If the monkeys truly have

a positive utility function, the observed biases should continue to be greater

than the optimal bias, and the difference between the observed and optimal

biases should increase with greater reward ratios.

Mapping a full utility curve with this paradigm, while possible, is not

recommended. One experimental difficulty with this paradigm was the large

number of unique trial conditions (2 directions, 5-7 coherences, and 4

reward conditions generate 40-56 conditions), each of which needed to be

repeated 30-40 times to sufficiently define the PMF and characterize an

isolated neuron. This large number of trials was at the limits of both the

monkeys’ capacity to work, in terms of attention and satiation, and the

experimenter’s capacity to maintain the electrical isolation of a single

neuron. Additional reward conditions would only multiply these challenges.
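
The size of this trial budget follows directly from the numbers above, as the short calculation below shows. The helper `trial_budget` is a hypothetical name, and the fifth reward condition in the last line is an assumed example used only to show how the count grows.

```python
def trial_budget(n_directions, n_coherences, n_reward_conditions, repeats):
    """Return (unique conditions, total trials) for a fully crossed design."""
    conditions = n_directions * n_coherences * n_reward_conditions
    return conditions, conditions * repeats

# Lower and upper ends of the ranges quoted in the text.
low = trial_budget(2, 5, 4, 30)
high = trial_budget(2, 7, 4, 40)
print(low, high)

# Assumed example: one extra reward condition at the upper end of the ranges.
extra = trial_budget(2, 7, 5, 40)
print(extra)
```

Under these assumptions a single recording session already requires between 1200 and 2240 trials, and each additional reward condition adds another 2 directions times 5-7 coherences, each repeated 30-40 times.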

Other behavioral paradigms are better suited for mapping a full utility

curve. A single point on a utility curve is more simply and directly mapped

by finding the magnitude of a certain reward that is behaviorally equivalent

to an uncertain, or risky, reward. For example, on a single trial a subject is

asked to choose between a certain reward, say 2 drops of juice, and a risky

reward, say a 50% chance of getting either 0 or 5 drops. If the subject is

indifferent to these choices, and treats them equivalently, we can say the two

options (2*1=2 and 0*0.5 + 5*0.5=2.5) have equal utility. These two values

become the first point on our utility curve. The full shape of the utility curve

is mapped by determining this equivalence across a range of risky reward

magnitudes. For details and examples see Chapter 6 of Stephens and Krebs

(69).
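
The arithmetic behind a single point on the curve can be sketched as follows, taking a certain reward of 2 drops and a gamble paying 0 or 5 drops with equal probability. The function names and the representation of a gamble as (magnitude, probability) pairs are assumptions made for illustration; the indifference itself would have to be measured behaviorally.

```python
def expected_value(outcomes):
    """Expected reward of a gamble given (magnitude, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(magnitude * p for magnitude, p in outcomes)

def utility_point(certain_magnitude, gamble):
    """Pair a certain reward with the expected value of the gamble it matches.

    Assumes the subject has been shown to be indifferent between the two.
    """
    return certain_magnitude, expected_value(gamble)

# Illustrative values: 2 certain drops versus a 50% chance of 0 or 5 drops.
risky = [(0.0, 0.5), (5.0, 0.5)]
print(expected_value([(2.0, 1.0)]))  # value of the certain option
print(expected_value(risky))         # expected value of the risky option
print(utility_point(2.0, risky))     # one point on the curve
```

Repeating this procedure with gambles of different magnitudes, and finding the matching certain reward for each, would trace out further points.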

References

1. Amemori K, Sawaguchi T. 2006. Contrasting effects of reward

expectation on sensory and motor memories in primate prefrontal

neurons. Cereb Cortex. Jul;16(7):1002-15.

2. Barash S, Bracewell RM, Fogassi L, Gnadt JW, Andersen RA. 1991.

Saccade-related activity in the lateral intraparietal area. I. Temporal

properties; comparison with area 7a. J Neurophysiol. Sep;66(3):1095-

108.

3. Barraclough DJ, Conroy ML, Lee D. 2004. Prefrontal cortex and

decision making in a mixed-strategy game. Nat Neurosci.

Apr;7(4):404-10.

4. Bisley JW, Goldberg ME. 2003. Neuronal activity in the lateral

intraparietal area and spatial attention. Science. Jan 3;299(5603):81-6.

5. Britten KH, Shadlen MN, Newsome WT, Movshon JA. 1993.

Responses of neurons in macaque MT to stochastic motion signals.

Vis Neurosci. Nov-Dec;10(6):1157-69.

6. Celebrini S, Newsome WT. 1994. Neuronal and psychophysical

sensitivity to motion signals in extrastriate area MST of the macaque

monkey. J Neurosci. Jul;14(7):4109-24.

7. Celebrini S, Newsome WT. 1995. Microstimulation of extrastriate

area MST influences performance on a direction discrimination task. J

Neurophysiol. Feb;73(2):437-48.

8. Chafee MV, Goldman-Rakic PS. 1998. Matching patterns of activity

in primate prefrontal area 8a and parietal area 7ip neurons during a

spatial working memory task. J Neurophysiol. Jun;79(6):2919-40.

9. Churchland AK, Kiani R, Shadlen MN. 2008. Decision-making with

multiple alternatives. Nat Neurosci. Jun;11(6):693-702

10. Colby CL, Duhamel JR, Goldberg ME. 1996. Visual, presaccadic, and

cognitive activation of single neurons in monkey lateral intraparietal

area. J Neurophysiol. Nov;76(5):2841-52

11. Colby CL, Goldberg ME. 1999. Space and attention in parietal cortex.

Annu Rev Neurosci. 22:319-49.

12. Ditterich J, Mazurek ME, Shadlen MN. 2003. Microstimulation of visual

cortex affects the speed of perceptual decisions. Nat Neurosci.

Aug;6(8):891-8.

13. Dorris MC, Glimcher PW. 2004. Activity in posterior parietal cortex

is correlated with the relative subjective desirability of action. Neuron.

Oct 14;44(2):365-78.

14. Evarts EV. 1966. Pyramidal tract activity associated with a

conditioned hand movement in the monkey. J Neurophysiol.

Nov;29(6):1011-27.

15. Freedman DJ, Riesenhuber M, Poggio T, Miller EK. 2001.

Categorical representation of visual stimuli in the primate prefrontal

cortex. Science. Jan 12;291(5502):312-6.

16. Funahashi S, Bruce CJ, Goldman-Rakic PS. 1989. Mnemonic coding

of visual space in the monkey's dorsolateral prefrontal cortex. J

Neurophysiol. Feb;61(2):331-49.

17. Fuster JM. 2001. The prefrontal cortex--an update: time is of the

essence. Neuron. May;30(2):319-33.

18. Gallistel CR, Mark TA, King AP, Latham PE. 2001. The rat

approximates an ideal detector of changes in rates of reward:

implications for the law of effect. J Exp Psychol Anim Behav Process.

Oct;27(4):354-72

19. Glimcher PW. 2003. The neurobiology of visual-saccadic decision

making. Annu Rev Neurosci. 26:133-79.

20. Gnadt JW, Andersen RA. 1988. Memory related motor planning

activity in posterior parietal cortex of macaque. Exp Brain Res.

70(1):216-20.

21. Gold JI, Shadlen MN. 2001. Neural computations that underlie

decisions about sensory stimuli. Trends Cogn Sci. Jan 1;5(1):10-16

22. Gold JI, Shadlen MN. 2002. Banburismus and the brain: decoding the

relationship between sensory stimuli, decisions, and reward. Neuron.

Oct 10;36(2):299-308.

23. Gold JI, Shadlen MN. 2003. The influence of behavioral context on

the representation of a perceptual decision in developing oculomotor

commands. J Neurosci. Jan 15;23(2):632-51

24. Gold JI, Shadlen MN. 2007. The neural basis of decision making.

Annu Rev Neurosci. 30:535-74.

25. Goldman-Rakic PS. 1987. Circuitry of primate prefrontal cortex and

the regulation of behavior by representational memory. Handbook of

Physiology, vol 5(1).

26. Gottlieb JP, Kusunoki M, Goldberg ME. 1998. The representation of

visual salience in monkey parietal cortex. Nature. Jan

29;391(6666):481-4.

27. Green DM, Swets JA. 1966. Signal detection theory and

psychophysics. New York: John Wiley and Sons.

28. Hanks TD, Ditterich J, Shadlen MN. 2006. Microstimulation of

macaque area LIP affects decision-making in a motion discrimination

task. Nat Neurosci. May;9(5):682-9.

29. Hernández A, Zainos A, Romo R. 2000. Neuronal correlates of

sensory discrimination in the somatosensory cortex. Proc Natl Acad

Sci May 23;97(11):6191-6.

30. Hernández A, Zainos A, Romo R. 2002. Temporal evolution of a

decision-making process in medial premotor cortex. Neuron. Mar

14;33(6):959-72.

31. Herrnstein RJ. 1961. Relative and absolute strength of responses as

a function of frequency of reinforcement. J Exp Anal Behav. 4:267-72.

32. Horwitz GD, Newsome WT. 1999. Separate signals for target

selection and movement specification in the superior colliculus.

Science. May 14;284(5417):1158-61.

33. Horwitz GD, Newsome WT. 2001. Target selection for saccadic eye

movements: prelude activity in the superior colliculus during a

direction-discrimination task. J Neurophysiol. Nov;86(5):2543-58.

34. Janssen P, Shadlen MN. 2005. A representation of the hazard rate of

elapsed time in macaque area LIP. Nat Neurosci. Feb;8(2):234-41.

35. Judge SJ, Richmond BJ, Chu FC. 1980. Implantation of magnetic

search coils for measurement of eye position: an improved method.

Vision Res. 20(6):535-8.

36. Kable JW, Glimcher PW. 2007. The neural correlates of subjective

value during intertemporal choice. Nat Neurosci. Dec;10(12):1625-33.

37. Kiani R, Hanks TD, Shadlen MN. 2008. Bounded integration in

parietal cortex underlies decisions even when viewing duration is

dictated by the environment. J Neurosci. Mar 19;28(12):3017-29

38. Kim JN, Shadlen MN. 1999. Neural correlates of a decision in the

dorsolateral prefrontal cortex of the macaque. Nat Neurosci.

Feb;2(2):176-85.

39. Knutson B, Cooper JC. 2005. Functional magnetic resonance imaging

of reward prediction. Curr Opin Neurol. Aug;18(4):411-7.

40. Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O.

2002. Influence of reward expectation on visuospatial processing

in macaque lateral prefrontal cortex. J Neurophysiol. Mar;87(3):1488-

98.

41. Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W,

Sakagami M. 2006. Influences of rewarding and aversive outcomes on

activity in macaque lateral prefrontal cortex. Neuron.

Sep 21;51(6):861-70.

42. Lau B, Glimcher PW. 2008. Value representations in the primate

striatum during matching behavior. Neuron. May 8;58(3):451-63.

43. Lee D, Conroy ML, McGreevy BP, Barraclough DJ. 2004.

Reinforcement learning and decision making in monkeys during a

competitive game. Brain Res Cogn Brain Res. Dec;22(1):45-58

44. Lee D, McGreevy BP, Barraclough DJ. 2005. Learning and decision

making in monkeys during a rock-paper-scissors game. Brain Res

Cogn Brain Res. Oct;25(2):416-30.

45. Leon MI, Shadlen MN. 1999. Effect of expected reward magnitude on

the response of neurons in the dorsolateral prefrontal cortex of the

macaque. Neuron. Oct;24(2):415-25

46. Leon MI, Shadlen MN. 2003. Representation of time by neurons in

the posterior parietal cortex of the macaque. Neuron. Apr

24;38(2):317-27

47. Lewis JW, Van Essen DC. 2000. Corticocortical connections of visual,

sensorimotor, and multimodal processing areas in the parietal lobe of

the macaque monkey. J Comp Neurol. Dec 4;428(1):112-37.

48. Mazurek ME, Roitman JD, Ditterich J, Shadlen MN. 2003. A role for

neural integrators in perceptual decision making. Cereb Cortex.

Nov;13(11):1257-69

49. Miller EK, Cohen JD. 2001. An integrative theory of prefrontal cortex

function. Annu Rev Neurosci. 24:167-202

50. Montague PR, King-Casas B, Cohen JD. 2006. Imaging valuation

models in human choice. Annu Rev Neurosci. 29:417-48

51. Mountcastle VB, Steinmetz MA, Romo R. 1990. Frequency

discrimination in the sense of flutter: psychophysical measurements

correlated with postcentral events in behaving monkeys. J Neurosci.

Sep;10(9):3032-44

52. Platt ML, Glimcher PW. 1999. Neural correlates of decision variables

in parietal cortex. Nature. Jul 15;400(6741):233-8

53. Roesch MR, Olson CR. 2003. Impact of expected reward on neuronal

activity in prefrontal cortex, frontal and supplementary eye fields and

premotor cortex. J Neurophysiol. Sep;90(3):1766-89.

54. Roitman JD, Shadlen MN. 2002. Response of neurons in the lateral

intraparietal area during a combined visual discrimination reaction

time task. J Neurosci. Nov 1;22(21):9475-89.

55. Romo R, Brody CD, Hernández A, Lemus L. 1999. Neuronal

correlates of parametric working memory in the prefrontal cortex.

Nature. Jun 3;399(6735):470-3.

56. Romo R, Hernández A, Zainos A, Lemus L, Brody CD. 2002.

Neuronal correlates of decision-making in secondary somatosensory

cortex. Nat Neurosci. Nov;5(11):1217-25.

57. Romo R, Salinas E. 2003. Flutter discrimination: neural codes,

perception, memory and decision making. Nat Rev Neurosci.

Mar;4(3):203-18.

58. Salzman CD, Murasugi CM, Britten KH, Newsome WT. 1992.

Microstimulation in visual area MT: effects on direction

discrimination performance. J Neurosci. Jun;12(6):2331-55

59. Schall JD, Hanes DP. 1993. Neural basis of saccade target selection in

frontal eye field during visual search. Nature. Dec 2;366(6454):467-9

60. Schall JD, Bichot NP. 1998. Neural correlates of visual and motor

decision processes. Curr Opin Neurobiol. Apr;8(2):211-7

61. Schall JD. 2001. Neural basis of deciding, choosing and acting. Nat

Rev Neurosci. Jan;2(1):33-42

62. Selemon LD, Goldman-Rakic PS. 1988. Common cortical and

subcortical targets of the dorsolateral prefrontal and posterior parietal

cortices in the rhesus monkey: evidence for a distributed neural

network subserving spatially guided behavior. J Neurosci.

Nov;8(11):4049-68

63. Seo H, Barraclough DJ, Lee D. 2007. Dynamic signals related to

choices and outcomes in the dorsolateral prefrontal cortex. Cereb

Cortex. Sep;17

64. Shadlen MN, Britten KH, Newsome WT, Movshon JA. 1996. A

computational analysis of the relationship between neuronal and

behavioral responses to visual motion. J Neurosci. Feb 15;16(4):1486-

510

65. Shadlen MN, Newsome WT. 2001. Neural basis of a perceptual

decision in the parietal cortex (area LIP) of the rhesus monkey. J

Neurophysiol. Oct;86(4):1916-36

66. Shizgal P. 1997. Neural basis of utility estimation. Curr Opin

Neurobiol. Apr;7(2):198-208

67. Snyder LH, Grieve KL, Brotchie P, Andersen RA. 1998. Separate

body- and world-referenced representations of visual space in parietal

cortex. Nature. Aug 27;394(6696):887-91

68. Soltani A, Wang XJ. 2006. A biophysically based neural model of

matching law behavior: melioration by stochastic synapses. J

Neurosci. Apr 5;26(14):3731-44.

69. Stephens DW, Krebs JR. 1986. Foraging Theory. Princeton

Univ. Press, Princeton, NJ.

70. Sugrue LP, Corrado GS, Newsome WT. 2004. Matching behavior and

the representation of value in the parietal cortex. Science. Jun

18;304(5678):1782-7

71. Sugrue LP, Corrado GS, Newsome WT. 2005. Choosing the greater of

two goods: neural currencies for valuation and decision making. Nat

Rev Neurosci. May;6(5):363-75

72. Wallis JD, Anderson KC, Miller EK. 2001. Single neurons in

prefrontal cortex encode abstract rules. Nature. Jun 21;411(6840):953-

6

73. Wallis JD, Miller EK. 2003. Neuronal activity in primate dorsolateral

and orbital prefrontal cortex during performance of a reward

preference task. Eur J Neurosci. Oct;18(7):2069-81

74. Watanabe M, Hikosaka K, Sakagami M, Shirakawa S. 2007. Reward

expectancy-related prefrontal neuronal activities: are they neural

substrates of "affective" working memory? Cortex. Jan;43(1):53-64.

75. Xue G, Lu Z, Levin IP, Weller JA, Li X, Bechara A. 2008. Functional

dissociations of risk and reward processing in the medial prefrontal

cortex. Cereb Cortex. May;19(5):1019-27.
