INTEGRATION OF SENSORY AND REWARD INFORMATION
DURING PERCEPTUAL DECISION-MAKING IN LATERAL
INTRAPARIETAL CORTEX (LIP)
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF NEUROBIOLOGY
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Alan Edward Rorie
March 2011
http://creativecommons.org/licenses/by-nc-nd/3.0/us/
This dissertation is online at: http://purl.stanford.edu/pw673yq6957
© 2011 by Alan E Rorie. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
William Newsome, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Brian Knutson
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Tirin Moore
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Krishna Shenoy
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
The work presented in this dissertation primarily focuses on
decision-related activity in the lateral intraparietal area (LIP) and,
secondarily, the dorsolateral prefrontal cortex (DLPFC). In Chapter 1 we
review the previous independent investigations indicating that these areas
are separately modulated by sensory information, value information and
choice, in a manner appropriate for representing decisions. We argue that
when both sensory and value information must be simultaneously integrated
to make choices, it is unknown if, how and when these areas integrate these
factors. We present a behavioral paradigm in which animal subjects must
combine sensory and value information, on a trial-to-trial basis, to make
optimal choices. This paradigm is based on a well-known motion
discrimination task; however, in our task the magnitude of the reward
associated with each option varies from trial to trial. On some trials both
options are worth equally large or small rewards. On other trials one
option’s reward is greater than that of the other. In Chapter 2, we
demonstrate that in the unequal reward conditions subjects’ choices are
consistently biased towards the greater magnitude option. Additionally,
we will show that this bias is independent of the motion stimulus strength
and its magnitude is nearly optimal. In Chapter 3, we observe that single
neurons in cortical area LIP consistently, simultaneously and dynamically
represent both sensory and value information. We will argue that this
representation supports an integrator model of decision making, in which
sensory information is accumulated until the decision is resolved by a
threshold crossing. Our results support an interpretation of this model in
which value information adjusts the likelihood of a threshold crossing by
raising or lowering the accumulator's initial state. In Chapter 4, we
present a preliminary comparison between LIP and DLPFC activity, under
identical conditions, suggesting they play fundamentally different roles in
decision making. In Chapter 5, we discuss future lines of research.
Table of Contents
Chapter 1
1.1 Foreword
1.2 The neurobiology of decisions
1.3 The neurophysiological study of decisions
1.3.1 Decision-related signals in LIP
1.3.1.1 Evaluation of sensory evidence in LIP
1.3.1.2 Representation of value in LIP
1.4 Integration of sensory and value information in a common currency
1.5 Studying the integration of sensory and value information
Chapter 2
2.1 Introduction
2.2 Methods
2.2.1 A Motion Discrimination Task With Multiple Reward Contingencies
2.2.2 Subjects
2.2.3 Procedures
2.3 Results
2.3.1 Relative Reward Biases Choice
2.3.2 Estimating the optimal bias
2.3.3 Modeling caveats
2.3.4 “No Choice” Analysis
2.3.5 Saccade Latency
2.4 Discussion
2.4.1 Sensory and value information are additive
2.4.2 Monkeys are capable of near-optimal performance
2.5 Summary
Chapter 3
3.1 Introduction
3.2 Methods
3.2.1 Subjects
3.2.2 Physiological Recordings
3.2.3 Cell selection
3.3 Results
3.3.1 Activity during delayed saccades
3.3.2 The representation of choice, absolute value, relative value and
motion coherence in LIP
3.3.2.1 Representation of choice: qualitative description
3.3.2.2 Representation of absolute value: qualitative description
3.3.2.3 Representation of relative value: qualitative description
3.3.2.4 Representation of motion coherence: qualitative description
3.3.2.5 Quantifying LIP dynamics: absolute value, relative value,
motion coherence and choice
3.3.2.6 Quantifying LIP dynamics: absolute value, relative value and
motion coherence within choice
3.3.2.7 Quantifying coherence within reward condition
3.3.3 Do individual LIP neurons integrate sensory and value
information?
3.3.4 Common Currency
3.3.5 Population heterogeneity
3.4 Discussion
3.4.1 The dynamic representation of absolute value, relative value,
coherence and choice
3.4.2 Relation to the integrator/accumulator model of decision making
3.4.2.1 Relative value imposes an additive offset to the accumulator’s
initial state
3.4.2.2 Coherence effects are consistent with the integrator model
3.4.3 Does LIP integrate sensory and value information in a common
currency?
3.4.4 Representation of value and probability of choice in LIP
3.5 Summary
Chapter 4
4.1 Introduction
4.2 Methods
4.2.1 Physiological Recordings
4.3 Results
4.3.1 Cell selection and delayed saccade task
4.3.2 Population Response
4.3.3 Population heterogeneity
4.3.4 Discussion
Chapter 5
5.1 Summary and conclusions
5.2 Future directions
5.2.1 Common currency
5.2.2 Reaction time discrimination
5.2.3 Mapping utility with additional reward magnitudes
References
List of figures
Figure 1. A two-alternative, forced-choice, motion discrimination task
with multiple reward contingencies
Figure 2a-d. Relative reward biases choice
Figure 3a-b. Bias is consistent across all experiments and coherence
Figure 4a-d. Harvesting efficiency is a function of bias
Figure 4e-f. Monkeys’ bias is greater than the optimal bias, which is a
function of psychophysical sensitivity and specific coherence values
Figure 4g-i. Despite over-bias, monkeys harvest a majority of rewards
Figure 5a-b. PMF slopes are independent of reward conditions, and bias
is similar for both relative reward conditions
Figure 6a-b. Fraction of no-choice trials varies with task epoch
Figure 7a-b. Fraction of no-choice trials is greater for the LL reward
condition in the motion and delay epochs
Figure 8a-b. Fraction of no-choice trials is greater for the LL reward
condition in the motion and delay epochs for most coherences
Figure 9a-b. Effect of motion coherence and reward condition on saccade
latency
Figure 10a-b. Delayed saccade task used to identify LIP response fields
Figure 10c-d. Mean response to the delayed saccade task
Figure 10e. In the discrimination experiments one response target was
positioned within the RF of the neuron under study
Figure 11a-b. LIP represents the absolute value of the option in the RF
Figure 12a-b. LIP represents the relative value of the option in the RF
Figure 13a-b. LIP represents the relative value of the option in the RF
Figure 14a-b. Effect of motion coherence for monkey A and monkey T
Figure 15a-b. Quantifying the dynamics of absolute value, relative value,
motion coherence and choice
Figure 16a-b. Quantifying the dynamics of absolute value, relative value
within choice for monkey A
Figure 16c. Quantifying the dynamics of coherence within choice for
monkey A
Figure 17a-b. Quantifying the dynamics of absolute value, relative value
within choice for monkey T
Figure 17c. Quantifying the dynamics of coherence within choice for
monkey T
Figure 18a-b. The effect of motion coherence is independent of reward
condition
Figure 19a-f. Individual LIP neurons integrate sensory and value
information
Figure 20a-b. To calculate nEVS we modeled the log odds of a spike
occurring with logistic regression model without a factor for choice
Figure 21a-b. Venn diagrams depicting a larger percentage of neurons is
simultaneously and significantly modulated by absolute reward
Figure 22a-b. LIP integrates reward information and motion coherence in
a common currency
Figure 23a-b. Examples of single cells with responses nearly identical to
their population average
Figure 23c-d. A single cell demonstrating no choice-related activity in
the discrimination task, despite being well tuned in the delayed saccade
task
Figure 23e. A single cell demonstrating choice-related activity only
Figure 24a-b. Anatomical magnetic resonance imaging PFC recording
site
Figure 25a-b. In contrast to LIP, neurons in the PFC tended to be less
selective, responding similarly to targets positioned anywhere within the
contralateral hemifield
Figure 26. Average DLPFC response
Figure 27. A single DLPFC neuron
Figure 28. A single DLPFC neuron
Figure 29. A single DLPFC neuron responding specifically to particular
combinations of reward condition, choice and epoch
Figure 30. A single DLPFC neuron responding specifically to particular
combinations of reward condition, choice and epoch
Chapter 1
1.1 Foreword
This dissertation is about decisions. Specifically, it is about how
individual factors combine to generate choice. While I write this
dissertation, the people of the United States of America are deliberating a
momentous decision, one widely believed to be one of the most significant
in contemporary history: who will be the next President. At few other
points in history have so many people been asked to integrate such a vast
amount of varied information to make a single decision.
Ultimately, the decision is based upon information which predisposes
or biases voters to one or the other candidate. For example, one is from the
Democratic party, one is from the Republican party; one is young, one is
old; one is pale-skinned, one is dark-skinned; one has a military background,
one an academic background. As powerful and as useful as these biases can
be when making decisions, we know that to make an optimal decision, we
should not be unduly influenced by our biases. Instead, we should wisely
integrate them with the current information we have about the state of this
country. For example, a voter may inherently prefer an academically trained
candidate but in war time might choose the militarily trained one. Another
voter might be biased against the younger candidate but choose him as a
symbol of change.
Our capacity to incorporate multiple sources of information permits
dynamic decisions, in which prior experience is integrated with the current
situation. Without this capacity, our choices would calcify with history’s
deposition or slavishly meander with the moment. Understanding the
biological foundation and limitation of this capacity ultimately elucidates
how people resolve complex decisions into the choices, political and
otherwise, which define our behavior and shape our world.
1.2 The neurobiology of decisions
A core question in neuroscience is how brains generate behavior—that
is, how do they select physical responses appropriate for the current
environment. Obviously the brain must detect, represent and parse relevant
information originating in the environment and respond by activating the
relevant organs, be it the stomach or the eye muscles. Indeed, sensing and
acting comprise the two sides of an arc defining all behaviors. The historic,
central focus of behavioral neuroscience has been reconstructing this arc by
tracing the neurophysiological correlates of these sensory and motor systems
inward from the periphery.
This process delineated sets of cortical areas, traditionally viewed as
exclusively part of sensory or motor systems. The areas within the sensory
system were defined by their specific responses to specific features (e.g.
pressure or leftward motion) present in the environment. Those along the
motor arc were defined by their responses in anticipation of, or concurrently
with, actions, and encoded their specific features (e.g. velocity and duration).
Research on both arcs, however, converged on several “association” cortical
regions, notably the frontal and parietal areas, whose activity is neither
clearly sensory nor motor. For example, neurons in the lateral intraparietal
area (LIP) and the dorsolateral prefrontal cortex (DLPFC) respond to
stimuli but continue responding when the stimulus is gone; they indicate
what general action will be taken long before the action is initiated. These
areas also represent aspects of a behavior that are neither sensory nor
motor, such as the magnitude of a reward expected for an action.
The historic tendency to parse cortical areas as either sensory or motor
has hindered the clear articulation of this long-recognized link between
sensation and action. In the past ten years, however, research on the neural
basis of decisions has allowed remarkable progress towards articulating this
link by revealing that: 1) decisions provide a conceptual and computational
framework, useful for parameterizing how sensations are evaluated into
actions; 2) decision-based models of neural activity are potentially capable
of capturing the wide variety of information encoded in several “association”
areas; and 3) the demarcations between sensing, deciding and motor
planning (particularly deciding and motor planning) are blurry and
behavior-dependent (for review see 19, 24, 57, 61, 71).
Decisions are deliberative evaluations of information about options that
may be difficult to distinguish, be ambiguous, have variable payoffs, or
require prior knowledge to resolve (24, 61). Decision-related signals must,
therefore, represent the evaluation of information, in contrast to the
information itself. For example, if you decide a noisy motion stimulus
moved leftward, a decision signal must represent a leftward evaluation, even
if the motion truly went rightward. Decisions are resolved into choices,
which in turn guide action. Thus, decision-related signals must precede, or
concur with, choice-signals and represent all factors ultimately influencing
choice (e.g. sensory information, value information, prior probabilities, etc.)
(27).
1.3 The neurophysiological study of decisions
While the neurophysiological basis of decision-making has been
explicitly investigated in a range of contexts, previous studies focused
largely on either sensory or value-based decisions (see 24 for review).
Studies of sensory decisions typically require subjects to perform
comparisons or discriminations of sensory stimuli. These stimuli typically
span psychophysical thresholds, creating a range of ambiguities requiring
resolution through a decision. In general, these studies consistently find
that: 1) a majority of neurons in traditional sensory areas primarily encode
dimensions of the stimulus itself (5, 29); 2) activity in prefrontal and parietal
areas represents the graded, decision-related evaluation of the sensory
information (38, 55-56, 65); and 3) traditional motor areas (i.e. FEF and MPC)
can also exhibit decision-related activity, but only when a sensory stimulus is
directly linked to a specific motor response (23, 30).
For example, in a tactile discrimination task monkeys are trained to
report which of two vibrations, applied sequentially to the finger, is higher
in frequency (51; for review see 24 and 57). Single neurons in sensory area S1
monotonically increase their activity with increasing frequency, representing
each vibration’s frequency in turn (29). In contrast, some neurons in parietal
area S2, the lateral prefrontal cortex and the medial premotor cortex have
temporally dynamic responses representing the emerging difference between
the frequencies (55-56). This difference is a decision variable, and its
representation emerges during the second vibration, when the decision is
made. A similar pattern of responses is observed during a visual motion
discrimination task, discussed in detail below. Additionally, in visual search
tasks, when monkeys must decide which target in an array of distractors has
a specific conjunction of features, neurons in the frontal eye fields, a region
of the prefrontal cortex involved in moving the eyes, distinguish the target
from distractors and represent distractors in a graded fashion based on
their similarity to the target (59-60).
Value-based decisions are studied in the context of “free choice” tasks
permitting subjects to choose options volitionally. In these studies, an
option’s reward statistics (typically reward magnitude or probability) are
dynamically manipulated by the experimenter, and these manipulations are
hidden from the subjects. From the subject’s perspective, this generates
ambiguities in regard to the options. Subjects resolve these ambiguities
through a decision process of valuation. Reward statistics in free choice
tasks have been manipulated through foraging behavior (18, 31, 42, 70) and
competitive games (3, 13, 43-44, 63, 68). These electrophysiological
studies, and a host of human fMRI studies, have revealed signals in frontal
and parietal cortical areas capable of supporting valuation (36, 39, 50, 75).
Importantly, free choice tasks contrast with instructed choice tasks
(discussed below), which are not useful for studying value-based
decisions but are widely used to demonstrate the representation of value in a
range of frontal and parietal areas (1, 45, 52, 73-74).
An example of a foraging-based, free-choice task is the “matching”
paradigm, in which monkeys were permitted to freely choose between two
options with probabilistic reward baiting (42, 70). When this probability
was adjusted across blocks of trials, the monkeys adjusted their probability
of choosing a given option to “match” the fraction of rewards recently
experienced from that option. A computational analysis of the behavior
revealed that monkeys base their valuation on the temporal integration of
prior rewards. Electrophysiological recordings revealed that parietal area
LIP represented the resulting valuation (70).
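The temporal integration of prior rewards described above can be illustrated with a short sketch. The Python below is only one simple way to express the idea and is not the analysis used in the cited studies (42, 70): each option's recent reward income is tracked by a leaky integrator, and the probability of choosing an option is set equal to its share of the total income. The function names, the exponential decay and the time constant `tau` are assumptions introduced for illustration.

```python
import math

def update_income(income, rewarded_option, tau=10.0):
    """Decay every option's reward income by one trial, then credit the rewarded option.

    `tau` (in trials) is a hypothetical time constant; pass rewarded_option=None
    for a trial on which no reward was delivered.
    """
    decay = math.exp(-1.0 / tau)
    return {
        option: decay * value + (1.0 if option == rewarded_option else 0.0)
        for option, value in income.items()
    }

def choice_probability(income, option):
    """Matching rule: choose an option in proportion to its share of recent income."""
    total = sum(income.values())
    if total == 0.0:
        return 1.0 / len(income)  # no reward history yet, so no preference
    return income[option] / total

# Illustrative (invented) reward history: "A" is rewarded more often than "B".
income = {"A": 0.0, "B": 0.0}
for rewarded in ["A", "A", None, "B", "A"]:
    income = update_income(income, rewarded)

p_choose_a = choice_probability(income, "A")
```

With this invented history, `p_choose_a` exceeds 0.5 because option A yielded most of the recent rewards. A shorter `tau` would make the estimate track the most recent outcomes more closely.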
1.3.1 Decision-related signals in LIP
LIP, located within the lateral bank of the intraparietal sulcus, has been
widely studied in the context of both sensory and value-based decisions.
Within LIP is a population of neurons typically defined by their tendency to
become active when a particular region of the visual field, called the
response field (RF), is the focus of attention (10-11, 26) or the target of an
upcoming saccadic eye movement (2, 20, 67).
Early investigation of this “into the RF” activity was largely framed by
attempts to associate this activity with either an attentional, sensory signal,
or an intentional, motor signal (19). Subsequent experiments (52, 65),
however, suggested this activity is better defined by a decision process
capable of bridging the gap between sensation and action. These
experiments demonstrated that LIP independently represents both the
evaluation of the sensory evidence supporting a choice into the RF and the
value of the option in the RF.
1.3.1.1 Evaluation of sensory evidence in LIP
An extensive series of experiments on decision-related signals in LIP,
including those in this thesis, were conducted in the context of a two-
alternative, forced-choice motion discrimination task. On each trial of this
task monkeys observe a noisy, random-dot motion stimulus and report
which of two possible directions of motion was present by making a
saccadic eye movement to a target. On a trial-to-trial basis the difficulty of
this decision is manipulated by adjusting the proportion of dots moving
coherently in one of the two directions. When the coherence is greater, the
decision is easier and the monkeys make fewer errors.
Using this task investigators initially identified neurons in two visual
cortical areas, MT and MST, representing the motion information itself, in
a manner necessary and sufficient for perception (5-7, 58, 64). Neurons in
these regions responded to specific motion directions, reflecting the amount
of motion energy present in that direction, and ceased their activity when the
stimulus was removed.
Subsequently, Shadlen and Newsome (64-65) demonstrated that LIP
neurons, in contrast to MT, represent a decision about the motion stimulus.
During the motion discrimination task, LIP begins encoding the choice to saccade
into the RF. This response, however, is initially modulated by the motion
coherence. This modulation is not motor related because it is independent of
the specific parameters of the motor response. Additionally, the modulation
is not a sensory response because, unlike MT, it reflects the monkey’s
erroneous choices. The authors reasoned that this modulation represents the
monkey’s evaluation of the motion evidence supporting a choice to saccade
to the target in the RF.
Based on these and other subsequent results, Shadlen and colleagues
modeled LIP as an evidence accumulator or integrator (48, 65). In this
model LIP accumulates a decision variable, which can be thought of as the
weight of evidence supporting a choice, to a threshold. When this threshold
is crossed the decision resolves to a choice. In the motion discrimination
task the decision variable is the accumulated difference between opposing
motion sensors in MT. Additionally, Gold and Shadlen (21) suggest that
this decision variable is proportional to the logarithm of an option’s
likelihood ratio (discussed in greater detail below) and thus capable of
additively incorporating additional factors such as option value.
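The accumulation-to-threshold idea can be made concrete with a minimal simulation. The sketch below is a generic bounded random walk written under simplifying assumptions that are not taken from the cited models (48, 65): the drift stands in for the net motion evidence, and `start_offset` stands in for any additive factor, such as option value, that shifts the accumulator's initial state. All function names, parameter names and values are hypothetical.

```python
import math
import random

def simulate_trial(drift, start_offset=0.0, bound=1.0, noise_sd=1.0,
                   dt=0.01, max_steps=10_000, rng=None):
    """Accumulate noisy evidence until it crosses +bound or -bound.

    Returns (choice, n_steps), where choice is +1 for the upper bound,
    -1 for the lower bound, or 0 if neither bound was reached in max_steps.
    """
    rng = rng if rng is not None else random.Random()
    x = start_offset  # initial state of the accumulator
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise scaled for the time step.
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= bound:
            return 1, step
        if x <= -bound:
            return -1, step
    return 0, max_steps

def fraction_upper(drift, start_offset, n_trials=2000, seed=0):
    """Fraction of simulated trials that end at the upper bound."""
    rng = random.Random(seed)
    upper = 0
    for _ in range(n_trials):
        choice, _steps = simulate_trial(drift, start_offset, rng=rng)
        upper += (choice == 1)
    return upper / n_trials

# Zero net evidence, with and without a shifted starting point.
p_neutral = fraction_upper(drift=0.0, start_offset=0.0)
p_biased = fraction_upper(drift=0.0, start_offset=0.3)
```

With zero drift, a positive `start_offset` raises the fraction of trials that end at the upper bound, which is the qualitative signature of a choice bias produced by a shifted starting point rather than by a change in the evidence.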
This model is well developed computationally (22-24, 48) and predicts
LIP responses to several manipulations of the motion discrimination task
including: allowing the monkey to respond freely (54), additional
alternatives (9) and variable motion durations (37). Additionally, the model
is general enough to accommodate non-sensory decision variables, such as
elapsed time (34, 46).
Additional support for the model has come from intracortical
microstimulation during the motion discrimination task. Microstimulation
of MT (12, 58) has confirmed predicted changes in choice and reaction time
resulting from manipulations of the sensory representation of motion
direction. Similarly, microstimulation of LIP influences the decision process
by introducing an additive offset to the integrator resulting in a small bias in
choice and a larger effect on reaction time (28). Finally, Gold and Shadlen
(23) stimulated the frontal eye fields while monkeys evaluated the motion
evidence and demonstrated that decisions are represented in the evolution of
oculomotor commands, consistent with the integrator model.
1.3.1.2 Representation of value in LIP
The same population of LIP neurons implicated in the evaluation of
sensory evidence is also modulated by the value of the option within the
RF. These modulations are observed in the free-choice, matching paradigm
discussed above as well as in an instructed choice task. Platt and Glimcher
(52) instructed monkeys to saccade to targets associated with either large or
small magnitude rewards placed within a RF. They reported that while the
monkey awaits the saccade’s instruction, LIP represents an option’s greater
reward magnitude with a greater firing rate. In an additional manipulation,
they found that LIP encoded the probability that a saccade into the RF would
be instructed and thus rewarded.
1.4 Integration of sensory and value information in a common currency
As discussed above, single neurons in LIP represent sensory and value
signals supporting decisions about an option within the RF. If LIP activity
represents decisions, then it should encode all factors ultimately influencing
choice. It has been proposed that LIP integrates the multitude of factors
momentarily influencing shifts in gaze (choices) or visual attention in a
“common currency” (21, 70-71). A common currency implies that diverse
information is encoded in a “currency,” or scale, dependent on its “common”
influence on behavior. Evidence of a common currency for reward signals
has been demonstrated in rats capable of trading off combinations of natural
rewards with an artificial reward signal introduced through microstimulation
(for review see 66). This suggests the natural reward signals are scaled and
converge to a singular representation before they are traded off with the
artificial reward.
The idea of a common currency is also fundamental to the integrator
model of LIP. Gold and Shadlen (21) posit the logarithm of the likelihood
ratio (logLR) as a neural common currency for combining sensory and value
information, and suggest that a quantity proportional to logLR is the
decision variable represented by LIP. An option’s likelihood ratio (LR)
describes the likelihood that the current evidence would be observed if that
option were correct, relative to the likelihood that it would be observed if the
alternative were correct. Through multiplication the LR can be updated to
include other factors including additional evidence, prior probabilities and
value. Importantly, Gold and Shadlen point out that by taking the logarithm
of the LR (the logLR) these factors can be accumulated additively.
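The equivalence between multiplying likelihood ratios and adding their logarithms can be checked numerically. In the sketch below the individual ratios are invented numbers chosen only for illustration, and treating the prior odds as one more multiplicative factor is an assumption of the example, not a statement of the cited model (21).

```python
import math

def combined_log_lr(likelihood_ratios, prior_odds=1.0):
    """Add log likelihood ratios, plus the log prior odds, into a single quantity."""
    return math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)

# Hypothetical likelihood ratios from three successive samples of evidence.
ratios = [1.5, 0.8, 2.0]
prior_odds = 2.0  # hypothetical prior favoring the option two to one

product_form = prior_odds * math.prod(ratios)        # multiplicative update
additive_form = combined_log_lr(ratios, prior_odds)  # additive update in log units

# The two routes agree once the sum is mapped back out of log units.
assert math.isclose(math.exp(additive_form), product_form)
```

Because each factor enters as a separate term of the sum, a further factor can be included by adding its logarithm, without recomputing the terms already accumulated.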
1.5 Studying the integration of sensory and value information
The work presented in this dissertation primarily focuses on decision-
related activity in the lateral intraparietal area (LIP) and, secondarily, the
dorsolateral prefrontal cortex (DLPFC). As discussed above, independent
investigations indicate these areas are separately modulated by sensory
information, value information and choice appropriate to represent decisions.
When both sensory and value information must be simultaneously integrated
to make choices, however, it is unknown if, how and when these areas
integrate these factors. It is important to understand how these areas process
decisions requiring the dynamic combination of sensory and value
information because a majority of real-world decisions are based on both
factors.
To investigate this we have developed a behavioral paradigm in which
animal subjects must combine sensory and value information, on a trial-to-
trial basis, to make optimal choices. This paradigm is based on the well-
known motion discrimination task, discussed above, in which subjects must
view a noisy motion stimulus and report which of two opposed directions
they perceived. In our task the magnitude of the reward associated with
each option varies from trial to trial. On some trials both options are worth
equally large or small rewards. On other trials one option’s reward is greater
than that of the other.
In Chapter 2, we first demonstrate that in the unequal reward
conditions subjects’ choices are consistently biased towards the greater
magnitude option. Additionally, we will demonstrate this bias is independent
of the motion stimulus strength and its magnitude is nearly optimal. In
Chapter 3, we will demonstrate that single neurons in cortical area LIP
consistently, simultaneously and dynamically represent both sensory and
value information. We will argue that this representation supports an
integrator model of decision making, in which sensory information is
accumulated until the decision is resolved by a threshold crossing. Our
results support an interpretation of this model in which value information
adjusts the likelihood of a threshold crossing by raising or lowering the
accumulator’s initial state. In Chapter 4, we present a preliminary
comparison between LIP and DLPFC activity, under identical conditions,
suggesting they play fundamentally different roles in decision making. In
Chapter 5, we discuss future lines of research.
Chapter 2
2.1 Introduction
The goal of the following experiments is to study the behavior of rhesus
monkeys in a decision-making paradigm requiring the dynamic combination
of sensory and value information. To accomplish this, our behavioral
paradigm borrows elements from paradigms previously used to isolate either
the sensory or value components of a decision.
The sensory component of our task is based on a two-alternative,
forced-choice, direction discrimination task used to study sensory-based
decisions (28, 32, 64). On each trial monkeys observed a noisy, random-dot
motion stimulus and reported which of two possible directions of motion
was present by making a saccadic eye movement to one of two
corresponding targets. On a trial-to-trial basis the difficulty of this decision
is manipulated by adjusting the proportion of dots moving coherently in one
of the two directions. When the coherence is greater, the decision is easier
and the monkeys make fewer errors. As discussed above, this task has been
used to identify decision related neural signals corresponding to the weight
of evidence supporting a decision. This paradigm arms us with strong
predictions of how the monkeys’ choices are influenced by task parameters,
and an array of well-honed analysis tools to quantify this behavior.
To this sensory decision we have added a very simple value element by
changing the magnitude of the reward associated with correct choices of
each option. On each trial we overtly inform the monkey, with a visual cue,
what volume of juice reward he will receive for correctly choosing (as
defined by the motion coherence) each option. As mentioned in Chapter 1,
in the context of a free-choice, “matching” paradigm, monkeys allocate their
responses in proportion to a target’s subjective, relative value. That is, they
are biased towards choosing targets of greater relative value. However, this
and other studies of value have been limited in only addressing the influence
of relative value, as opposed to absolute reward value. We have designed
our behavioral paradigm to vary both relative and absolute reward value.
The ideas of absolute and relative value are simple and fundamental to
our behavioral paradigm. Absolute value refers to an option’s value
independent of other potential options, while relative value refers to an
option’s value in relation to alternatives. For example, consider two options
offering an equally small reward: each has a small absolute value, but
because neither option is more valuable than the other, neither has any
relative value. Similarly, two options offering an equally large reward each
have a larger absolute value (compared to the two equally small offers), but
again, no relative value. When, however, one option offers a large reward
and the other a small reward, along with their absolute values, each option
has a relative value that is larger or smaller than the alternative.
We know monkeys will be biased toward choosing options with greater
relative value and we know they will also be more likely to choose
options supported by greater coherence. In the following behavioral
experiment, we ask how monkeys integrate these factors on a trial-to-trial
basis. To encourage the monkeys to consider both of these factors on every
trial, we have incorporated these multiple reward contingencies into our
motion discrimination task so that on some trials they conflict.
For example, on some trials in which the options differ in relative value,
the motion coherence is towards the lower value target. In these conditions
the monkeys could ignore the motion stimulus and always choose the option
with greater relative value. Alternatively, the monkeys could ignore the
greater reward and choose the option supported by greater motion coherence.
As we will demonstrate below, however, these extreme-bias behaviors are
sub-optimal for reward harvesting. Optimal behavior requires a bias of
moderate size which is dependent on the magnitude of the relative reward,
the range of coherences presented, and the monkey’s capacity to
discriminate the motion’s direction.
Along with determining how monkeys integrate motion coherence and
relative reward value, and whether they do so optimally, we will also be able
to determine the extent to which absolute reward value influences
performance. While we do not anticipate a bias in response to changes in
absolute reward, it is possible that when presented with two options, each
with a small value, the monkeys are less sensitive to the motion coherence
and make more errors. Additionally, when a small reward is certain
monkeys might be less likely to engage in the task overall.
2.2 Methods
2.2.1 A Motion Discrimination Task With Multiple Reward
Contingencies
On each behavioral trial the monkeys observed a noisy random-dot
motion stimulus and reported which of two possible directions of motion was
present with a saccadic eye movement to the corresponding target. The
motion stimulus is composed of white dots, viewed through a circular
aperture, on a dark computer screen. On each trial a variable proportion of
the dots moved coherently in one of two opposite directions while the
remaining dots were flashed transiently at random locations and times (for a
detailed description see 5). The difficulty of discrimination was varied
parametrically, from trial-to-trial, by adjusting the percentage of the dots in
coherent motion: the task was easy if many of the dots moved coherently
(i.e. 50% or 100% coherence), but became progressively more difficult as
the coherence decreased.
Importantly, the coherence only describes the strength of the motion,
not its direction. In the data figures that follow, the direction of coherent
motion is indicated by “signing” the coherence. Thus +25% coherence and
–25% coherence are equally strong motion signals, but move in opposite
directions. Typically, the animals viewed a range of signed coherences
spanning psychophysical threshold. The animals were always rewarded for
indicating the correct direction of motion, except at 0% coherence where
they were rewarded randomly (50% probability) irrespective of their choice.
Figure 1 illustrates the sequence of events comprising a typical trial of
the motion discrimination task. From left to right, trials began with the onset
of a small dot that the monkey must visually fixate for 150 ms. Next, two
saccade targets appeared (hollow gray circles) for 250 ms. The two targets
were 10 degrees eccentric from the visual fixation point and 180 degrees
apart from each other. The targets were positioned in-line with the axis of
motion being discriminated. By convention, the target corresponding to
positive coherence is target 1 (T1) while the other is target 2 (T2). At the
end of the trial, the monkey reported his decision by making a saccadic eye
movement to one of these targets.
After 250 ms the targets changed color, indicating the magnitude of
reward available to the monkey for correctly choosing that
target. A blue target indicated a low magnitude (L) reward (1 unit, ~0.12 ml
of juice), while a red target indicated a high magnitude (H) reward (2 units).
As there are two reward magnitudes (H and L) to be assigned to each of two target
locations (T1 and T2), there were four reward conditions overall,
schematized by the vertical row of panels in Figure 1: 1) the LL condition in
which both targets were blue, 2) the HH condition in which both targets
were red, 3) the HL condition, in which T1 was red and T2 was blue, and 4)
the LH condition which was the mirror image of the HL condition.
The colored targets were visible for 250 ms before onset of the visual
motion stimulus which appeared for 500 ms, centered on the fixation point.
Figure 1. A two-alternative, forced-choice, motion discrimination task with multiple reward contingencies. The sequence of events comprising a typical trial of the motion discrimination task. From left to right, trials begin with the onset of a fixation point. Two saccade targets appear and then change color indicating the magnitude of reward available for correctly choosing that target. A blue target indicates a low magnitude (L) reward, while a red target indicates a high magnitude (H) reward. There are four reward combinations LL, HH, LH and HL, respectively depicted vertically. The visual motion stimulus appears centered on the fixation point. Following offset of the motion stimulus, subjects maintain fixation for a variable delay period after which the fixation point disappears, cueing the subjects to report their decisions with a saccade to the target corresponding to the perceived direction of motion. If the subjects choose the correct direction of motion, they receive the reward indicated by the color of the chosen target. (Panel labels, left to right: Fixate, Targets, Reward, Motion, Delay, Go!)
Following offset of the motion stimulus, the monkey was required to
maintain fixation for a variable delay period (300-550 ms) after which the
fixation point disappeared, cueing the monkey to report his decision with a
saccade to the target corresponding to the perceived direction of motion. If
the monkey chose the correct direction of motion, he received the reward
indicated by the color of the chosen target.
Fixation was enforced throughout the trial by requiring the monkey to
maintain its eye position within an electronic window (1.25 degrees radius)
centered on the fixation point. Inappropriate breaks of fixation were
punished by aborting the trial and enforcing a time-out period before onset
of the following trial. Psychophysical decisions were identified by detecting
the time of arrival of the monkeys’ eye in one of two electronic windows
(1.25 degrees radius) centered on the two choice targets (T1 and T2).
All trials were presented pseudo-randomly in block-randomized order.
For monkey A, we employed 12 signed coherences, 0% coherence and four
reward conditions, yielding 52 conditions overall. For monkey T we
eliminated two of the lowest motion coherences because this animal’s
psychophysical thresholds were somewhat higher than those of monkey A.
Thus monkey T was tested for 36 conditions overall. We attempted to
acquire 40 trials for each condition, enabling us to characterize a full
psychometric function for each of the four reward conditions. Because these
behavioral data were obtained simultaneously with electrophysiological
recordings, however, we did not always acquire the full 40 trials for each
condition (the experiment typically ended when single unit isolation was
lost). For the data reported in this paper, the number of repetitions obtained
for each experiment ranged from 19 to 40 with a mean of 36. The full data
set analyzed in this paper consists of 35 experiments from monkey A and 26
experiments from monkey T.
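The condition counts given above (52 for monkey A, 36 for monkey T) can be checked with a short sketch. The coherence magnitudes 1.5%, 3%, 12% and 48% appear in the text; the values 6% and 24% are assumed here purely to complete a set of six magnitudes, and all function and variable names are hypothetical.

```python
from itertools import product

# Coherence magnitudes (%). 1.5, 3, 12 and 48 appear in the text;
# 6 and 24 are ASSUMED here to complete a set of six magnitudes.
MAGNITUDES_A = [1.5, 3, 6, 12, 24, 48]
MAGNITUDES_T = MAGNITUDES_A[2:]  # monkey T: two lowest magnitudes dropped

# First letter is the T1 reward, second letter is the T2 reward.
REWARD_CONDITIONS = ["LL", "HH", "HL", "LH"]


def signed_coherences(magnitudes):
    """Return signed coherences (negative = toward T2), including 0%."""
    negatives = [-m for m in reversed(magnitudes)]
    return negatives + [0.0] + list(magnitudes)


def build_conditions(magnitudes):
    """Cross every signed coherence with every reward condition."""
    return list(product(signed_coherences(magnitudes), REWARD_CONDITIONS))


conditions_a = build_conditions(MAGNITUDES_A)
conditions_t = build_conditions(MAGNITUDES_T)
print(len(conditions_a), len(conditions_t))
```

Six magnitudes give 12 signed coherences plus 0%, crossed with four reward conditions; dropping the two lowest magnitudes removes four signed coherences.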
2.2.2 Subjects
Two adult male rhesus monkeys, A and T (12 and 14 kg), were trained on a
two-alternative, forced-choice, motion discrimination task with multiple
reward contingencies. Daily access to fluids was controlled during training
and experimental periods to promote behavioral motivation. Before training,
the monkeys were prepared surgically with a head-holding device (14) and a
scleral search coil for monitoring eye position (35). All surgical, behavioral,
and animal care procedures complied with National Institutes of Health
guidelines and were approved by the Stanford University Institutional
Animal Care and Use Committee.
2.2.3 Procedures
During both training and experimental sessions monkeys sat in a primate
chair at a viewing distance of 57 cm from a color monitor. Visual stimuli
were presented on the monitor under computer control. The monkeys’ heads
were positioned stably using the head-holding device, and eye position was
monitored throughout all experimental sessions by means of a magnetic
search coil apparatus (0.1° resolution; CNC Engineering, Seattle, WA).
Behavioral control and data acquisition were managed by a PC-compatible
computer running the QNX Software System’s (Ottawa, Canada) real-time
operating system.
The experimental paradigm was implemented in the NIH Rex
programming environment (Hays, Richmond, & Optican, 1982). Visual
stimuli were generated by a second PC-compatible computer and displayed
using the Cambridge Research Systems VSG (Kent, UK) graphics card and
accompanying software development tools. Liquid rewards were delivered
to the animals through a gravity-fed juice tube placed near the animal’s
mouth, activated by a computer-controlled solenoid valve. Subsequent data
analyses and computer simulations were performed on Apple Macintosh
(Cupertino, CA) computers in the Mathworks MATLAB (Natick, MA)
programming environment.
2.3 Results
2.3.1 Relative Reward Biases Choice.
Figures 2a-d depict psychometric functions (PMFs) describing each
monkey’s probability of choosing T1 (ordinate) as a function of motion
coherence (abscissa). As mentioned above, motion coherence is denoted
with a magnitude indicating the strength of the motion, and a sign indicating
its direction. Thus, +48% and -48% denote coherences of equal strength but
opposite direction. Positive coherence denotes motion towards T1 while
negative coherence denotes motion towards T2. A separate PMF is plotted
for each of the four reward conditions. The HH condition is plotted in red;
the LL in blue; the HL in black; and the LH in green. The circles depict the
observed proportion of T1 choices for each combination of coherence and
reward condition. The sigmoidal curves are fit quantitatively with logistic
regression. Figures 2a and 2b depict data from a representative experiment
for monkey A and monkey T respectively. Figures 2c and 2d depict the
average PMF across all behavioral sessions for monkeys A (n=35) and T
(n=25) respectively.
Our logistic regression model describes the log-odds-ratio of choosing
T1 as a function of the linear sum of several factors. In this model we have
included a factor for the coherence of the motion stimulus and the values of
each of the two targets, as described by equation 1:
Equation 1:
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_{coh}(COH) + \beta_{t1}(T1_{val}) + \beta_{t2}(T2_{val}) \]
Where p is the observed probability of choosing T1; βcoh, βt1 and βt2 are
the fit coefficients representing the effect of motion coherence and target
value on this probability. β0 represents any global bias the monkey has
towards choosing T1. COH is a factor for the coherence of the
motion stimulus, in fractional units of the maximum coherence and signed to
signify the direction as described above. Thus, COH has a range from -1 to
1, where -1 represents -48% coherence and +1 represents +48% coherence.
T1val and T2val are assigned either +1, if the target was H, or -1 if the target
was L. For example, on HL trials in which the motion coherence was -12%,
COH=-0.25, T1val=+1 and T2val=-1. Constraining these factors to be in the same
range (-1 to 1) allows us to directly compare the values of the fit coefficients.
Equation 1 can be rearranged to Equation 2, which is used to generate the
sigmoid functions seen in Figure 2.
Equation 2:
\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_{coh}(COH) + \beta_{t1}(T1_{val}) + \beta_{t2}(T2_{val}))}} \]
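The regressor coding and the logistic transform described for Equations 1 and 2 can be sketched as follows. The coefficient values are hypothetical and chosen only for illustration; they are not fits from the experiments, and the function names are introduced here.

```python
import math

MAX_COHERENCE = 48.0  # percent; COH is expressed as a fraction of this


def regressors(signed_coh_percent, reward_condition):
    """Map a trial onto (COH, T1val, T2val) as defined for Equation 1."""
    coh = signed_coh_percent / MAX_COHERENCE
    t1val = 1 if reward_condition[0] == "H" else -1  # first letter = T1
    t2val = 1 if reward_condition[1] == "H" else -1  # second letter = T2
    return coh, t1val, t2val


def p_choose_t1(coh, t1val, t2val, b0, b_coh, b_t1, b_t2):
    """Equation 2: logistic transform of the linear predictor."""
    log_odds = b0 + b_coh * coh + b_t1 * t1val + b_t2 * t2val
    return 1.0 / (1.0 + math.exp(-log_odds))


# HYPOTHETICAL coefficients, for illustration only.
BETAS = dict(b0=0.0, b_coh=5.0, b_t1=0.4, b_t2=-0.4)

# The worked example from the text: an HL trial at -12% coherence.
coh, t1val, t2val = regressors(-12.0, "HL")
print(coh, t1val, t2val)
print(p_choose_t1(coh, t1val, t2val, **BETAS))
```

With these illustrative coefficients, the HL trial yields a higher probability of a T1 choice than an LL trial at the same coherence, which is the vertical shift described for the unequal reward conditions.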
Figure 2. Relative reward biases choice. a-d Psychometric functions (PMF) describing each monkey’s probability of choosing T1 as a function of motion coherence. Motion coherence is denoted with a magnitude indicating the strength of the motion and a sign indicating its direction. Positive coherence denotes motion towards T1 while negative coherence denotes motion towards T2. Separate PMFs are plotted for each reward condition (HH, red; LL, blue; HL, black; LH, green). Circles depict the observed proportion, and sigmoidal curves are fit quantitatively with logistic regression. a-b Results from one representative experiment for monkey A and monkey T, respectively. c-d Average PMF across all behavioral sessions for monkeys A (n=35) and T (n=25), respectively. (Axes: % T1 choices, 0 to 100, versus % coherence, -48 to 48.)
Several features of the data presented in Figure 2 are notable. To begin,
consider the single behavioral session from monkey A plotted in Figure 2a.
First, the observed behavior for the HH and LL reward conditions (red and
blue circles, respectively) is nearly identical, indicating that the monkey’s
probability of choosing T1 is unaffected by changes in absolute reward.
Second, for each coherence the black and green circles, representing the
observed probabilities of a T1 choice for the HL (black) and LH (green)
conditions, are shifted vertically in relation to the HH and LL conditions.
The upward vertical shift in the HL condition (black) indicates that, across
all coherences, the monkey was more likely to choose T1, the higher value
target. The downward vertical shift in the LH (green) condition indicates the
opposite; the monkey was less likely to choose T1, the lower value target,
and more likely to choose T2, the higher value target. These data indicate
that the monkey’s choices are biased towards the target with higher relative
value. This bias results in corresponding leftward and rightward shifts of the
logistic model fit to the data for the HL (black) and LH (green) reward
conditions, respectively.
Figure 2c depicts the average (± s.e.m) behavior for monkey A, across
all behavioral experiments, in an identical style to Figure 2a. Note that this
average behavior is similarly fit with Equation 2 and shows similar results.
Across all behavioral experiments the monkey’s probability of choosing T1
is affected only by the motion coherence and changes in relative reward.
Figure 2d depicts the same results for monkey T. Thus, the bias resulting
from changes in relative reward is highly robust and reproducible both
within and across the two monkeys.
To quantify the magnitude of this bias, we measured the horizontal shift
between the HL and LL and between the LH and LL PMFs. This quantity,
the “behavioral equivalent visual stimulus” (bEVS), is in units of motion
coherence and corresponds to the amount of visual stimulus that would
produce an increase (or decrease) in T1 choices equal to that produced by
the increase (or decrease) in relative reward. Note that this approach to
quantifying the effect of relative reward is identical to that taken by Salzman
and colleagues to quantify the effects of MT microstimulation (58). bEVS is
defined by equation 3:
Equation 3:
\[ bEVS = \frac{\beta_{t1} - \beta_{t2}}{\beta_{coh}} \]
For example, in the PMF plotted in Figure 2a the bEVS for the HL
condition is 12.3% coherence. This means that increasing the motion
coherence towards T1 by 12.3% coherence and increasing the value of T1,
relative to T2, by one unit of reward, both exert equivalent effects on the
probability of a T1 choice. For the data shown in Figure 2b, bEVS=15.7%
coherence; for Figure 2c, bEVS=14.7% coherence; and for Figure 2d,
bEVS=16.3% coherence. Figures 3a and 3b depict population data as
frequency histograms of the bEVS measured in each behavioral experiment
in monkey A and monkey T, respectively. The solid red line in each figure
indicates the mean of the distribution (monkey A: ±15.4% coh; monkey T:
±17% coh), and the dotted lines indicate the s.e.m (monkey A: ±0.9393;
monkey T: ±1.3206). Both monkeys exhibited considerable day-to-day
variation in the size of the bias.
Note that in the logistic model this reward bias is expressed by the
addition of βt1 and βt2 (equations 1 and 2) and is independent of the
coherence term. To verify this we recomputed this logistic model with
two interaction terms to capture any coherence dependent effect of reward.
Monkey A had a significant interaction between coherence and T1val in 6
(17.65%) behavioral sessions and between coherence and T2val in 8
(23.53%) sessions. Monkey T had a significant interaction between
coherence and T1val in 3 (12%) behavioral sessions and between coherence
and T2val in 8 (32%) sessions. Figures 3c and 3d plot the mean βt1 (blue)
and βt2 (red) coefficients for each coherence, after incorporating any
significant effects of coherence as mediated by significant interaction terms.
The flat lines in Figures 3c and 3d demonstrate that for monkeys A and T,
βt1 and βt2 did not systematically vary, on average, as a function of
coherence. This suggests that, in mechanistic terms, the bias is equally
effective across all coherences. The appearance of a smaller bias effect at
larger coherences in the plots in Figure 2 is due to saturation of “percentage
T1 choices” (bounded between 0 and 100) at high coherences. If the same data
were plotted in the log-odds space generated by Equation 1, the fits would be a
series of straight, parallel lines with the additive bias causing a single offset
across all coherences.
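The parallel-lines property in log-odds space, and the bEVS of Equation 3, can be illustrated with a minimal sketch. The coefficients are hypothetical, not fitted values. The final conversion to % coherence multiplies by the 48% maximum; that step follows from the COH scaling in Equation 1 and is an assumption about how the reported bEVS values were obtained.

```python
MAX_COHERENCE = 48.0  # percent; COH regressor = signed coherence / 48

# HYPOTHETICAL coefficients, for illustration only.
B0, B_COH, B_T1, B_T2 = 0.1, 5.0, 0.4, -0.4

# (T1val, T2val) for each reward condition, coded as in Equation 1.
REWARD_VALUES = {"LL": (-1, -1), "HH": (1, 1), "HL": (1, -1), "LH": (-1, 1)}


def log_odds(coh, condition):
    """Equation 1: linear predictor for one coherence and reward condition."""
    t1val, t2val = REWARD_VALUES[condition]
    return B0 + B_COH * coh + B_T1 * t1val + B_T2 * t2val


def bevs(b_t1, b_t2, b_coh):
    """Equation 3, expressed in units of the COH regressor."""
    return (b_t1 - b_t2) / b_coh


cohs = [c / MAX_COHERENCE for c in (-48, -12, 0, 12, 48)]
# Vertical gap between the HL and LH lines at each coherence.
gaps = [log_odds(c, "HL") - log_odds(c, "LH") for c in cohs]
print(gaps)
# Assumed conversion from COH units to % coherence.
print(bevs(B_T1, B_T2, B_COH) * MAX_COHERENCE)
```

Because the model has no interaction terms, the gap between any two reward conditions is the same at every coherence, so the fitted lines differ only by a vertical offset.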
2.3.2 Estimating the optimal bias
Intuitively we can understand that excessive bias, say always choosing
the high value target, results in fewer rewards earned, as the monkey makes
choices that are at odds with clear sensory evidence on high coherence trials.
Similarly, an under-bias increases the chance of selecting the low value
target under great uncertainty on low coherence trials. Some intermediate
bias level will maximize the rewards harvested, and this
optimal bias will depend on the monkey’s capacity to discriminate the
direction of motion.
Figure 3. Bias is consistent across all experiments and coherences. a-b Frequency histograms of monkeys’ bias (% coherence) in each behavioral experiment; each distribution’s mean (±s.e.m.) is demarked in red (monkey A: 15.4±0.9393 % coh; monkey T: 17±1.3206 % coh). c-d Mean value of βt1 (blue) and βt2 (red) coefficients for each coherence after incorporating any significant effects of coherence as mediated by significant interaction terms. (Axes: a-b, count versus bias (% coherence); c-d, reward coefficient versus % coherence.)
A perfect motion discriminator would always know
which choice is correct and should show no bias at all, whereas a very poor
discriminator, facing great uncertainty, should exhibit a larger bias.
Similarly, the optimal bias should also depend on the overall difficulty
of the motion stimulus set. A set largely composed of difficult stimuli requires a
larger bias than a set of easy stimuli.
How close do our monkeys come to establishing an optimal bias? To
address this question quantitatively we calculated the percentage of rewards
(in drops of juice) that a subject could harvest in principle across a range of
behavioral biases and relative reward ratios, given each monkey’s average
sensitivity to the visual stimulus. For each behavioral bias, assuming no
spatial bias, the probability of choosing T1 is given by:
Equation 4:
\[ P = \frac{1}{1 + e^{-(\beta_{coh}(COH) + B)}} \]
Where βcoh is a fit coefficient from equation 1, which defines the slope of
the average PMF from the normative LL condition (Figs. 2c and d, blue
curves), and B is a specific choice bias in units of percentage coherence.
COH denotes the actual motion coherence values experienced by each
monkey as described in Methods. Equation 5 incorporates equation 4 to
define Harvesting Efficiency (HE), our quantity of interest.
Equation 5:
\[ HE(B, T1, T2) = \frac{\sum_{coh>0} P_{coh,B}\,T1 + \sum_{coh<0} (1 - P_{coh,B})\,T2 + 0.5\,(P_0\,T1 + (1 - P_0)\,T2)}{N_{coh>0}\,T1 + N_{coh<0}\,T2 + 0.5\,(T1 + T2)} \]
The numerator is the total number of rewards (in drops of juice)
obtained by the hypothetical subject in a hypothetical experiment; the
denominator is the total number of rewards that became available during the
experiment. Figures 4a and 4b show the model results for monkeys A and
T, respectively. Here we plot the harvesting efficiency (color-coded surface)
as a function of choice bias (bEVS in %coh) on the abscissa, and T1:T2 ratio
on the ordinate. The monkeys in our experiments only experienced two of
the T1:T2 ratios in the plots of Figure 4 (1:1 and 2:1), but examination of the
entire surface is useful for understanding how reward ratio, bias, and
harvesting efficiency interact.
The surfaces exhibit two important features. First, as the reward ratio
increases, the optimal bias (peak HE) grows positively away from 0%
coherence. Thus, to maximize harvesting efficiency, the monkey must bias
its choices toward T1, and the amplitude of the bias should increase as the
reward ratio increases. The second feature is the striking asymmetry in HE
between large positive and negative biases. A larger-than-optimal bias is
punished less severely, in terms of harvesting efficiency, than a smaller-
than-optimal bias. Strategically, therefore, an animal should err on the side
of an over-bias.
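The dependence of harvesting efficiency on bias and reward ratio can be explored with a small simulation in the spirit of Equations 4 and 5. This is a sketch under stated assumptions, not the analysis code used for Figure 4. The sensitivity value is hypothetical, the coherences 6% and 24% are assumed, each coherence is taken to occur equally often, and the bias enters the exponent additively as Equation 4 is written, so it is expressed here on the log-odds scale, not in % coherence.

```python
import numpy as np

# Coherences as fractions of the 48% maximum. 1.5, 3, 12 and 48 appear in
# the text; 6 and 24 are ASSUMED here to fill out the set.
POSITIVE_COH = np.array([1.5, 3, 6, 12, 24, 48]) / 48.0
B_COH = 5.0  # HYPOTHETICAL sensitivity (slope of the psychometric function)
BIAS_GRID = np.linspace(-3.0, 3.0, 601)  # candidate biases (log-odds units)


def p_t1(coh, bias):
    """Equation 4: probability of a T1 choice for a given bias."""
    return 1.0 / (1.0 + np.exp(-(B_COH * coh + bias)))


def harvesting_efficiency(bias, t1_reward, t2_reward):
    """Equation 5: expected reward harvested / reward available."""
    # Motion toward T1 (coh > 0): reward is earned on T1 choices.
    earned = np.sum(p_t1(POSITIVE_COH, bias)) * t1_reward
    # Motion toward T2 (coh < 0): reward is earned on T2 choices.
    earned += np.sum(1.0 - p_t1(-POSITIVE_COH, bias)) * t2_reward
    # 0% coherence: either choice is rewarded with probability 0.5.
    p0 = p_t1(0.0, bias)
    earned += 0.5 * (p0 * t1_reward + (1.0 - p0) * t2_reward)
    n = len(POSITIVE_COH)
    available = n * t1_reward + n * t2_reward + 0.5 * (t1_reward + t2_reward)
    return earned / available


def optimal_bias(t1_reward, t2_reward):
    """Grid search for the bias that maximizes harvesting efficiency."""
    he = np.array(
        [harvesting_efficiency(b, t1_reward, t2_reward) for b in BIAS_GRID]
    )
    best = int(np.argmax(he))
    return BIAS_GRID[best], he[best]


bias_equal, he_equal = optimal_bias(1, 1)      # T1:T2 = 1
bias_unequal, he_unequal = optimal_bias(2, 1)  # T1:T2 = 2
print(bias_equal, he_equal)
print(bias_unequal, he_unequal)
```

Under these assumptions the peak sits at zero bias when the rewards are equal and moves toward T1 when T1 is worth more, which is the first feature of the surfaces described above. A grid search is used only for transparency; any one-dimensional optimizer would serve.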
These features are perhaps better appreciated in Figures 4c and 4d,
which plot, for each monkey, two horizontal slices through the surface in
Figures 4a and 4b—the two reward ratios actually experienced by each
animal. The blue and green horizontal curves are slices through T1:T2=1
and T1:T2=2, respectively, as experienced in the HH/LL and HL/LH
conditions. HE is plotted on the ordinate, as a function of choice bias
(bEVS) on the abscissa for both T1:T2 values.
Figure 4a-d. Harvesting efficiency is a function of bias. a-b Color-coded surfaces depicting harvesting efficiency (HE) as a function of bias and reward ratio. Maximizing harvesting efficiency requires a bias to T1 whose amplitude increases with the reward ratio. c-d HE plotted as a function of choice bias. Curves are horizontal slices through the surface in a and b at the two reward ratios used (T1:T2=1, HH/LL, blue and T1:T2=2, HL/LH, green). Blue vertical lines demark peak HE for T1:T2=1 (monkey A: 77.16%; monkey T: 84.75%). Green vertical lines demark peak HE for T1:T2=2 (monkey A: 80.02%; monkey T: 86.32%). The shifted peak indicates it is optimal to bias choices towards T1 (monkey A: 9.2%; monkey T: 6.8% coherence). Black vertical lines depict observed average biases (monkey A: 14.7%; monkey T: 16.3%). (Axes: a-b, T1:T2 ratio versus bias (% coherence); c-d, harvesting efficiency versus bias (% coherence).)
When T1:T2=1 the peak harvesting efficiency (monkey A:77.16%;
monkey T:84.75%; blue vertical line), is achieved with no bias. When
T1:T2=2 (the HL condition, plotted in green) the peak of the HE curve
(green vertical line) is both elevated and shifted. The elevation means that,
if the bias is optimal, peak HE increases to 80.02% and 86.32% harvested
rewards for monkeys A and T, respectively. The shift of the peak indicates
that in this reward condition it is optimal to bias choices towards T1. From
these plots, which are derived from each animal’s average behavior (Fig. 2),
we can determine that the optimal bias for monkeys A and T is 9.2% and
6.8% coherence, respectively. However, as stated above, the observed
average biases (black vertical lines in Figures 4c and 4d) are larger: 14.7%
coherence for monkey A and 16.3% coherence for monkey T. In fact, both
monkeys exhibit a consistent over-bias across all the behavioral experiments.
This can be clearly seen in Figure 4e which plots the observed biases (%coh),
on the ordinate, and the calculated optimal bias (%coh) on the abscissa, for
monkey A (circles) and T (pluses) for all experiments.
Note that for a given reward ratio the optimal bias is a function of two
related factors. First, the slope of the PMF, defined by βcoh, which varies
across experimental sessions, and second, the specific set of coherences
experienced by each monkey. Figure 4f plots the optimal bias, on the
ordinate, as a function of βcoh, on the abscissa, for T1:T2=2. The curved
red line is for monkey A, the blue for monkey T. The straight vertical lines
demark the βcoh fit with Equation 1 for monkey A (red) and monkey T
(blue). Note that the red curve is above the blue, indicating that across all
βcoh monkey A’s optimal bias is greater than monkey T’s. This results
from the fact that monkey A experienced four stimulus conditions (-3%,
-1.5%, 1.5% and 3% coherence) that monkey T did not (see Methods).
Because these extra conditions are very low coherence, monkey A faced
more uncertainty than did monkey T, requiring a larger bias for optimal
performance.
Figure 4e-f. Monkeys’ bias is greater than the optimal bias, which is a function of psychophysical sensitivity and specific coherence values. e Observed bias plotted against calculated optimal bias for monkey A (circles) and T (pluses) for all experiments. f Optimal bias for T1:T2=2 as a function of βcoh, which varies across experimental sessions, and defines the PMF’s slope. Red and blue curves are for monkeys A and T, respectively. The straight vertical lines demark the βcoh fit with Equation 1 for monkey A (red) and monkey T (blue). Across all βcoh values, monkey A’s optimal bias is greater than monkey T’s because A experienced four additional coherences (-3%, -1.5%, 1.5% and 3% coherence) of greater uncertainty. (Axes: e, actual bias (% coh) versus optimal bias (% coh); f, optimal bias (% coh) versus βcoh.)
Although both monkeys exhibit reward biases larger than optimal, they
do not pay much of a penalty for the over-bias. Inspection of Figures 4c and
4d, for example, reveals that the over-bias results in HEs of 79.40% and
85.8% for monkeys A and T respectively, which represents an HE penalty of
only 0.62% and 1.74% relative to the optimal bias. Figure 4g, which plots
the observed HE (ordinate) as a function of the optimal HE (abscissa)
demonstrates this point for all the behavioral experiments from monkey A
(circles) and T (pluses). Figures 4h and 4i depict frequency histograms
showing a distribution of the percentage of the optimal HE achieved by each
monkey in each experiment. Although the monkeys have a consistent
overbias, they are still harvesting, on average, 98.6% (A) and 97% (T) of the
optimal HE.
2.3.3 Modeling caveats
It is important to note that the model described by equation 1, which we
have used to quantify the monkeys’ behavior for all the above analyses,
constrains the resulting sigmoidal fits in two relevant ways. First, because
there is no interaction term between βcoh and either βt1 or βt2, the slope of
the PMF, which describes how accurately the monkey discriminates the
direction of motion, is constrained to be equal for each of the four reward
conditions. That is, we are assuming that the monkeys are equally willing
and able to discriminate the direction of motion in all four reward conditions.
This assumption can be tested directly by modeling the behavior from each
of the four reward conditions separately using equation 6.
Equation 6:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_{coh}(COH)$$
Equation 6 is similar to equation 1 but lacks terms for the target values,
Figure 4g-i. Despite over-bias, monkeys harvest a majority of rewards. g Observed harvesting efficiency plotted against optimal harvesting efficiency for all behavioral experiments from monkey A (circles) and T (pluses). Monkeys do not pay much of a penalty for their over-bias. h-i Frequency histograms (count versus % optimal harvested rewards) showing the distribution of the percent of the optimal harvesting efficiency achieved by each monkey in each experiment. Although the monkeys have a consistent over-bias, they are still harvesting, on average, 98.6% (A) and 97% (T) of the optimal.
which are irrelevant since we are modeling within a reward condition.
Figures 5a and 5b depict histograms of the resulting βcoh values from
monkeys A and T, respectively. A one-way ANOVA revealed no significant
difference in psychophysical sensitivity (βcoh) across the four reward
conditions for monkey A (p=0.7980), but a weakly significant difference
was detected in monkey T (p=0.06). Clearly, monkey A is equally able and
willing to discriminate the direction of motion in all four reward conditions.
The difference detected in monkey T is a weak trend toward slightly lower
βcoh values (less psychophysical sensitivity) for the LL condition (Fig. 5b,
blue bars; mean βcoh=5.0) in comparison to all other conditions (mean
βcoh=5.8). We reran the optimality analysis to determine whether the slight
reduction in psychophysical sensitivity for the LL condition affected the
outcome. The effect was minimal—only a 1.2% coherence increase in the
estimate of the optimal bias. We conclude that the assumption of equal
psychophysical sensitivity across reward conditions is generally legitimate,
and that small departures from equal sensitivity had little effect on our
results.
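The per-condition fit of equation 6 can be sketched as follows. This is an illustrative implementation only, assuming a maximum-likelihood fit by Newton's method; the fitting routine actually used is not specified here. The function name `fit_logistic`, the coherence values, the trial counts and the "true" parameters are all hypothetical, and the example data are noise-free so that the fit can be checked against known values.

```python
import numpy as np

def fit_logistic(coh, n_t1, n_total, n_iter=25):
    """Fit ln(p/(1-p)) = b0 + b_coh*COH by Newton's method.

    coh: coherence for each stimulus condition
    n_t1: number of T1 choices per condition (may be fractional here)
    n_total: number of completed trials per condition
    Returns the array [b0, b_coh].
    """
    x = np.column_stack([np.ones(len(coh)), np.asarray(coh, dtype=float)])
    k = np.asarray(n_t1, dtype=float)
    n = np.asarray(n_total, dtype=float)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ beta)))
        grad = x.T @ (k - n * p)                        # gradient of the binomial log-likelihood
        hess = x.T @ (x * (n * p * (1.0 - p))[:, None])  # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical, noise-free data: one (b0, b_coh) pair per reward condition.
coh = np.array([-0.48, -0.24, -0.12, -0.06, 0.06, 0.12, 0.24, 0.48])
n_total = np.full(coh.size, 100.0)
true_params = {"HH": (0.0, 5.5), "LL": (0.0, 5.0), "HL": (1.0, 5.5)}

fits = {}
for cond, (b0, b_coh) in true_params.items():
    p_true = 1.0 / (1.0 + np.exp(-(b0 + b_coh * coh)))
    fits[cond] = fit_logistic(coh, n_total * p_true, n_total)
```

Fitting each reward condition separately in this way yields one βcoh estimate per condition and session, which is the quantity compared across conditions above.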
Second, because the target value factors (T1val and T2val) differ only
in their sign, the magnitude of the lateral shift resulting from the behavioral
bias towards the high value target is constrained to be equal for the HL and
LH reward conditions. We can test this assumption by modeling each of the
reward conditions independently using equation 4. Figures 5c and 5d show
the absolute value of the bias in the LH condition as a function of the bias in
the HL condition for each behavioral experiment. A paired t-test of these
data revealed no significant difference in the distributions of the bias term in the
two reward conditions (monkey A: p=0.44; monkey T: p=0.24). The
choice bias differed substantially from experiment to experiment, but was
Figure 5. PMF slopes are independent of reward conditions, and bias is similar for both relative reward conditions. a-b Frequency histograms of βcoh for each reward condition (LH, HL, LL, HH) modeled separately using equation 6, for monkey A (a) and monkey T (b). There is no significant difference in psychophysical sensitivity (βcoh) across reward conditions for monkey A (one-way ANOVA p=0.7980). A weak but significant trend toward slightly lower values (less sensitivity) for the LL condition was detected in monkey T (one-way ANOVA p=0.06). c-d Absolute value of LH bias (% coh) plotted against the HL bias (% coh) for each behavioral experiment. A paired t-test revealed no significant difference in the distributions (monkey A: p=0.44; monkey T: p=0.24). Biases differed between experiments but were reliable within experiments (monkey A: r=0.565, p<0.001; monkey T, filled points dropped: r=0.482, p<0.05).
fairly reliable within an experiment, as evidenced by the positive correlation
in Figure 5c. Even after dropping two outlier data points (filled points), the
bias terms were significantly correlated in both monkeys (monkey A:
r=0.565, p< 0.001; monkey T, after dropping two outliers: r=0.482 p<0.05).
2.3.4 “No Choice” Analysis
The preceding analyses were concerned exclusively with successfully
completed trials in which the monkey unambiguously chose T1 or T2. Recall
that successful completion of a trial required the monkey to: 1) maintain
fixation within an electronically defined “fixation window” until receipt of
the “go” signal, 2) initiate the operant saccade within 1 second of the
disappearance of the fixation point, and 3) execute a saccade that terminates
within the detection window surrounding the chosen target. On some trials,
however, these conditions are not met.
For example, the monkey’s eye position might leave the fixation
window before the fixation point disappears, or the monkey might not look
at one of the targets when the fixation point does disappear. These trials are
considered “no-choice” trials; such trials are aborted immediately and are
not included in our standard analyses of behavioral and electrophysiological
data. No-choice trials comprise roughly 9% of all trials (monkey A:
mean=9.17%; s.e.m±0.433; monkey T: mean=9.62%; s.e.m±0.99).
Although a no-choice trial can result from several different behaviors
including eye-blinks, eye-drift and errant or early explicit saccades, they all
reflect some sort of failure to engage the task in a sufficiently precise
manner. We analyzed these no-choice trials to determine whether they were
modulated by parameters of the behavioral paradigm, which might yield
additional insight into how the monkeys were influenced by motion
coherence and reward information.
Figures 6a and 6b plot the mean fraction of no-choice trials (mean in
white, ±s.e.m in black) across all experiments (ordinate) as a function of trial
time (aligned to target onset—abscissa) for monkeys A and T respectively.
Comparison of these two plots reveals that the monkeys show very similar
patterns of no-choices. During the reward cue epoch (250-500 ms), in which
the monkey first learns the reward condition for that trial, the fraction of no-
choices increases transiently. As the monkeys enter the motion epoch (500-
1000 ms) the fraction of no-choice trials is low but then increases as the
viewing period progresses. A similar trend is evident in the delay period:
the fraction of no-choices is low during the early delay epoch (1000-1300
ms) but rises during the late delay epoch (1300-1550 ms). Clearly, the
likelihood of generating a no-choice trial is modulated by task
epoch. We next analyze whether the fraction of no-choice trials is further
influenced by parameters such as the reward condition and motion
coherence.
We selected two task epochs for further analysis: the motion cue
period and the delay period. The bar graphs in Figures 7a-d depict for each
epoch the mean fraction (±s.e.m) of no-choice trials within that epoch for
each reward condition. For each epoch and monkey we performed a
one-way ANOVA and a post-hoc pairwise comparison test (corrected for multiple
comparisons using THD) to identify differences in no-choice frequency
among reward conditions.
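For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the sketch below. It is illustrative only and is not the script used for these analyses. The function name and the per-session fractions are hypothetical, and neither the p-value (which requires the F distribution) nor the post-hoc comparison is shown.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: sequence of 1-D samples, one per reward condition.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k = len(groups)
    n = all_values.size
    # Variability of group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical per-session no-choice fractions for HH, LL, HL and LH.
fractions = {
    "HH": [0.10, 0.12, 0.11],
    "LL": [0.20, 0.22, 0.24],
    "HL": [0.11, 0.10, 0.13],
    "LH": [0.12, 0.11, 0.10],
}
f_stat, df_b, df_w = one_way_anova_f(list(fractions.values()))
```

A large F relative to its degrees of freedom indicates that the between-condition differences exceed what the within-condition variability would produce.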
Figures 7a and 7b plot data from the motion stimulus period for
monkeys A and T, respectively. In this epoch both monkeys generated
significantly more no-choice trials in the LL reward condition than in any of
the reward conditions in which a high value target was present (monkey A:
Figure 6. Fraction of no-choice trials varies with task epoch. a-b The mean fraction of no-choice trials (white, ±sem in black) as a function of time from target onset (ms) for all experiments, for monkey A (a) and monkey T (b). Epoch labels along the time axis: target, reward, motion, early delay and late delay (ticks at 0, 250, 500, 1000, 1300 and 1550 ms). The likelihood of generating a no-choice trial is modulated by task epoch.
Figure 7. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs. a-d Bar graphs depicting the mean fraction (±sem) of no-choice trials within the motion epoch (a and c) and delay epoch (b and d) for each reward condition (HH, LL, HL, LH). There are significantly more no-choice trials in the LL condition than in any condition with a high-value target (one-way ANOVA and post-hoc comparison test, monkey A: p<0.001; monkey T: p<0.001). This trend was also present during the delay period for both monkey A (p<0.001) and monkey T (p<0.001).
p<0.001; monkey T: p<0.001). This trend was also present during the delay
period for both monkey A (p < 0.001) and monkey T (p < 0.001) as seen in
Figures 7c and 7d, respectively.
To investigate the influence of motion coherence on the generation of
no-choices, we further analyzed the no-choice trials from the motion
stimulus and delay epochs. Figures 8a-d plot the mean (±s.e.m) fraction of
no-choice trials within these task epochs, separately for each of the reward
conditions, on the ordinate, as a function of motion coherence, on the abscissa.
For this analysis we have combined data from both monkeys to gain
statistical power.
Figures 8a and 8b plot the results from the motion stimulus and delay
period, respectively, for the HH (red) and LL (blue) conditions. Note in
both epochs the blue line lies above the red line across almost all motion
coherences, indicating that the likelihood of a no-choice trial is greater on LL
trials. Thus, while absolute reward value does not affect either monkey’s
probability of choosing T1 (Fig. 2) it does affect both monkeys’ probability
of completing a trial successfully. Figures 8c and 8d plot results for the HL
(black) and LH (green) conditions for the stimulus and delay epochs,
respectively. Note that in both epochs the black line (HL) lies above the
green (LH) on the left half of the graph while the green (LH) line lies above
the black (HL) on the right half. This indicates that in the relative reward
conditions, both monkeys were more likely to generate a no-choice trial
when the motion coherence was towards the low value target.
2.3.5 Saccade Latency
Another aspect of the monkeys’ behavior that is potentially modulated
by task parameters is the set of basic saccade metrics. Here we consider the effect
Figure 8. Fraction of no-choice trials is greater for the LL reward condition in the motion and delay epochs for most coherences. a-d Mean (±sem) fraction of no-choice trials for the HH (red), LL (blue), HL (black) and LH (green) reward conditions within the motion (a and c) and delay (b and d) epochs as a function of motion coherence (-48% to 48%). For this analysis we have combined data from both monkeys to gain statistical power.
of reward condition and motion coherence on the mean saccade latency,
defined as the time between fixation offset and saccade initiation. Figures
9a and 9b plot the mean (±s.e.m) latency across all behavioral experiments
as a function of unsigned motion coherence for the HH (red), LL (blue), HL
(black) and LH (green) reward conditions. The open circles denote the
observed average latencies, the solid lines
are regression lines individually fit to the data, and the dashed lines are the
95% confidence intervals for the regression lines. For clarity, the latency
measurements in Figure 9 are combined for directions of equal coherence.
Consider first the results from Monkey A in Figure 9a. The most
striking feature is a highly typical (28, 65) dependence of mean latency on
coherence, with higher coherence resulting in shorter latencies. This result
was quantitatively confirmed by the linear regression model that produced
significantly negative slopes for all four reward conditions (p<0.0001 for all
conditions). The results for monkey T, plotted in Figure 9b, are less striking.
For monkey T, all regression coefficients were negative; however, they were
only significant in the HH (p=1.5×10^-5) and LH (p=0.01) conditions.
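The regression of latency on unsigned coherence can be sketched as below. This is a minimal illustration that assumes an ordinary least-squares line fit to mean latencies. The latency values are hypothetical (constructed to lie exactly on a line), and the significance tests and confidence intervals reported above are not reproduced.

```python
import numpy as np

def latency_slope(coherence, latency_ms):
    """Least-squares line: latency = intercept + slope * |coherence|.

    Signed coherences are folded to unsigned values, so both motion
    directions of equal strength contribute to the same point on the abscissa.
    """
    x = np.abs(np.asarray(coherence, dtype=float))
    y = np.asarray(latency_ms, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest-order coefficient first
    return slope, intercept

# Hypothetical mean latencies (ms) for one reward condition.
coh = [-48, -24, -12, 0, 12, 24, 48]
lat = [150, 156, 159, 162, 159, 156, 150]
slope, intercept = latency_slope(coh, lat)
```

A negative slope corresponds to the pattern described above, in which higher coherence produces shorter latencies.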
2.4 Discussion
2.4.1 Sensory and value information are additive
The most important results of these behavioral experiments are the
systematic lateral shifts and identical PMF slopes for relative reward
conditions as compared to absolute reward conditions. These data indicate
that relative value exerts a simple additive effect on current sensory evidence
in the formation of perceptual decisions, implying that we may see additive
effects at the neural level as well.
Gold and Shadlen (21) have posited a theoretical framework, based on
signal detection theory (SDT), in which sensory and value information can
be incorporated into a single decision variable through addition. In this
framework an option is chosen if its likelihood ratio (LR) is greater than
unity. An option’s LR describes the likelihood that the current evidence
would be observed if that option were correct, relative to the likelihood that
it would be observed if the alternative were correct. Through multiplication
the LR can be updated to include other factors including additional evidence,
prior probabilities and relative value. Importantly, Gold and Shadlen point
Figure 9. Effect of motion coherence and reward condition on saccade latency. a-b Mean (±sem) response latency (ms) across all behavioral experiments as a function of unsigned motion coherence (%) for the HH (red), LL (blue), HL (black) and LH (green) reward conditions, for monkey A (a) and monkey T (b). Circles mark observed average latencies, solid lines mark individually fit regressions and the dashed lines mark the 95% confidence intervals for the regression lines. For clarity, the latency measurements are combined for directions of equal coherence.
out that by taking the logarithm of the LR (logLR), these factors can be
accumulated additively. They further posit the logLR as a common neural
currency for combining sensory and value information, and suggest that a
quantity proportional to logLR is represented by LIP neurons. While we are
unable to address the issue of a common neural currency with these
behavioral data, we will consider the matter further in the next chapter when
we discuss our physiological experiments in area LIP.
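The step from multiplication to addition can be written out explicitly. The notation below is illustrative and is not taken from reference 21: $e$ denotes the current sensory evidence, $h_1$ and $h_2$ the two options, $P(h_i)$ their prior probabilities and $V_i$ their values. Treating relative value as a multiplicative ratio is an assumption made for the purpose of the illustration.

```latex
\begin{align*}
\mathrm{LR} &= \frac{P(e \mid h_1)}{P(e \mid h_2)} \\
\mathrm{DV} &= \mathrm{LR} \times \frac{P(h_1)}{P(h_2)} \times \frac{V_1}{V_2} \\
\log \mathrm{DV} &= \log \mathrm{LR} + \log \frac{P(h_1)}{P(h_2)} + \log \frac{V_1}{V_2}
\end{align*}
```

In this form, option $h_1$ is chosen when $\log \mathrm{DV} > 0$, and each factor contributes an additive offset. When $V_1 = V_2$ the value term is zero regardless of the absolute magnitude of the rewards.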
Additionally, SDT predicts that absolute value has no effect on the
likelihood of selecting an option. While our results confirm this, we assume
there are only two options, one of which must be selected. However, even
in a two-alternative, forced-choice task such as ours, there is always at least
one other course of action -- to choose neither option. This truly represents a
third option, whose value is not under behavioral control. Our analysis of
no-choice trials (Figs. 6, 7 and 8), in which the monkey failed to choose
either of the two targets, reveals that this third option is reflected reliably in
both monkeys’ behavior.
2.4.2 Monkeys are capable of near-optimal performance
As discussed in Results, the optimal amount of bias in response to
increases in relative reward depends on the relative value of the options, the
monkey’s perceptual sensitivity and the set of coherences employed in the
experiment. Our analysis demonstrates that both monkeys’ performance is
nearly optimal, harvesting on average 98% of the maximum available
rewards. Departures from optimality result from a consistent over-bias
(Figure 4e), for which there are several potential explanations.
Given the asymmetry of the HE surface (Figs. 4a-d), clearly the
consequences of excessive positive bias are less severe than the
consequences of a negative bias. One is therefore tempted to attribute the
observed over-bias to a conservative strategy on the monkey’s part, but this
explanation is unsatisfactory upon closer scrutiny. Negative bias is not a
realistic option a priori—it is extremely unlikely that a monkey would ever
exhibit a bias toward the target of lesser relative reward magnitude! Given
the improbability of a negative bias within our experimental design, and
given that neither monkey ever demonstrated negative biases, it is perhaps
more reasonable to look elsewhere for an explanation of the observed over-
bias.
A more likely explanation for the observed over-bias is that the
monkeys’ valuation of the relative reward is nonlinear. Although the
objective value of our relative reward is only one unit of juice, the subjective
value of that increase to the monkey may be greater than one unit.
Economists and behavioral ecologists have long been familiar with such
nonlinear transformations in the subjective value, or “utility”, of increases in
reward magnitude (19, 66, 69). Utilities that are larger than would be
expected from the objective magnitudes are a hallmark of positive utility
functions, which are frequently observed in animals that have not yet
achieved daily requirements of food or water (69). This is analogous to the
situation of our monkeys, who enter each experiment needing to work to
obtain their daily fluid allotment. We speculate, therefore, that our monkeys
exhibit over-biases because of positive utility functions associated with a
highly motivated desire for fluids. We are unable to define our monkeys’
actual utility functions as that would require at least one additional reward
magnitude (e.g., 3:1). Nevertheless, if we assume a positive utility function,
we can consult the plots in Figures 4a and 4b to determine quantitatively
how the monkey subjectively values the one unit increase in objective value
provided in our experiments. We accomplish this by identifying the reward
ratio for which each monkey’s observed bias would be optimal. The
analysis reveals that monkey A valued a one unit increase in reward as
though it were in fact a 1.99 unit increase, while monkey T valued the one
unit increase as though it were a 3.78 unit increase.
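The logic of identifying the reward ratio for which an observed bias would be optimal can be sketched under simplifying assumptions. The sketch is not the procedure applied to Figures 4a and 4b. It assumes a logistic PMF shifted laterally by the bias, coherences presented equally often in both directions, no 0% coherence trials, and a T2 reward of one unit. Under those assumptions the first-order condition for maximal expected reward has a closed form. The function names, coherence magnitudes and slope are hypothetical.

```python
import numpy as np

def logistic_slope(x):
    """Derivative of the logistic function s(x) = 1 / (1 + exp(-x))."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def implied_reward_ratio(bias, coherence_magnitudes, beta_coh):
    """Reward ratio T1:T2 at which `bias` is a stationary point of reward.

    With T2 reward = 1 and T1 reward = r, the expected reward summed over
    coherence magnitudes c > 0 is
        E(b) = sum_c [ r * s(beta*(c + b)) + s(beta*(c - b)) ].
    Setting dE/db = 0 (and using the symmetry of s') gives
        r = sum_c s'(beta*(b - c)) / sum_c s'(beta*(b + c)).
    """
    c = np.asarray(coherence_magnitudes, dtype=float)
    numerator = logistic_slope(beta_coh * (bias - c)).sum()
    denominator = logistic_slope(beta_coh * (bias + c)).sum()
    return numerator / denominator

# Hypothetical coherence magnitudes (fractions) and PMF slope.
mags = [0.06, 0.12, 0.24, 0.48]
ratio_at_zero = implied_reward_ratio(0.0, mags, beta_coh=5.0)
ratio_small = implied_reward_ratio(0.1, mags, beta_coh=5.0)
ratio_large = implied_reward_ratio(0.3, mags, beta_coh=5.0)
```

In this sketch a zero bias implies equal rewards (ratio 1), and larger observed biases imply larger ratios. If the low reward is taken as one unit, a ratio r corresponds to a subjective increase of r - 1 units.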
2.5 Summary
In this chapter we investigated the behavior of rhesus monkeys in a
decision-making paradigm requiring the dynamic combination of sensory
and value information. The sensory component of our task is based on a
two-alternative, forced-choice, direction discrimination task used to study
sensory-based decisions. To this sensory decision we have added a very
simple value element by changing the relative and absolute value of the
reward associated with correct choices of each option.
The most important results of these behavioral experiments are the
systematic lateral shifts and identical PMF slopes for relative reward
conditions (HL and LH) as compared to absolute reward conditions (HH and
LL). These data indicate that relative value exerts a simple additive effect
on current sensory evidence in the formation of perceptual decisions,
implying that we may see additive effects at the neural level as well. Our
analysis demonstrates that both monkeys’ performance is nearly optimal,
harvesting on average 98% of the maximum available rewards. Departures
from optimality likely result from a consistent over-bias because of positive
utility functions associated with a highly motivated desire for fluids.
In Chapter 3 we will investigate this behavior at the neural level with a
series of neurophysiological recordings within cortical area LIP. As
discussed in Chapter 1, in the context of decision-making, single LIP
neurons are modulated by both the strength of motion coherence and
reward value. Thus, it is an ideal place to begin an investigation into where
and how the two disparate sources of information in our task, motion
coherence and target value, are integrated at the neural level.
Chapter 3
3.1 Introduction
The behavioral data presented in Chapter 2 demonstrated that monkeys
engaged in a motion discrimination task with multiple reward contingencies
integrate motion coherence and reward value in a near-optimal fashion. Our
analysis shows that first, this integration occurs on a trial-to-trial basis and
second, that across all motion coherences there is an additive bias towards
targets of greater relative value. This additive bias can be quantified with
the bEVS metric, which expresses the lateral shift of the PMF in terms of
motion coherence.
To investigate this behavior at the neural level we performed a series
of neurophysiological recordings within cortical area LIP, located on the
lateral bank of the intraparietal sulcus. Within LIP, we further focused our
investigation by selecting for study a subset of neurons that carry signals
generally thought to be relevant for decisions to move the eyes. These
neurons are usually identified by their increased activity when there is either
a shift of attention to, or in anticipation of a saccade to, a specific region of
space, referred to as the neuron’s response field (RF). Following procedures
established by several laboratories, we selected these eye movement related
LIP neurons using a delayed saccade task in which a visual target is
presented within a neuron’s RF while the monkey awaits a cue to saccade to
the target for a reward. Specifically, we selected neurons demonstrating a
persistent increase in activity during the delay between the presentation of
the target and the time of the saccade.
Early studies of these LIP neurons discussed their activity in terms of
either attention to the RF (4, 10-11, 26) or a motor plan to move the eyes
into RF (2, 20, 67). However, subsequent investigations demonstrated that
this, “into RF” versus “out-of RF,” or “choice,” activity is in fact graded and
highly modulated by several cognitive factors including the weight of
evidence supporting the decision (65), the prior probability the saccade will
be instructed (52), the relative magnitude of reward associated with a
saccade (52) and the relative subjective value of a saccade (70). Most
relevant to our study are modulations correlated with the weight of evidence,
specifically motion coherence, and modulations correlated with relative
target value.
In the context of a simple two-alternative, forced-choice, motion
discrimination task, these LIP neurons are modulated by the weight of
evidence (the strength of the motion coherence) supporting decisions into
and out of the RF. In these experiments, as in ours, two opposing saccade
targets are presented in line with the axis of motion coherence, with one
target located within the RF of the neuron under study (28, 54, 65). If a
decision to choose the target in the RF is based on strong evidence (a highly
coherent motion stimulus) favoring that target, the delay period activity is
greater than if it is based on weaker evidence (low coherence). Conversely,
if a decision to choose the target outside the RF is based on strong evidence
favoring that target, delay period activity is less than if it is based on weaker
evidence.
LIP delay period activity is also finely modulated by the relative value
of the target in the RF with greater relative values generally producing
greater delay period activity. This has been shown both for conditions in
which the target’s value is explicitly signaled (52), as it is in our experiment,
and for conditions in which the monkey must generate an internal estimate
of the targets’ value based on previous experience (70). In these previous
studies, however, target value was only defined in relative terms: a target in
the RF was always of greater or lesser value than a target out of the RF.
These studies were thus unable to determine if LIP delay-period activity is
also modulated by differences in absolute target value.
Given the host of factors shown to modulate these neurons, it has been
proposed that LIP integrates information from multiple sources that
momentarily inform the behavioral relevance of a stimulus in the RF. Taken
as a whole, these LIP neurons would comprise a map of the visual field that
could be used for the allocation of attention or to direct saccades.
Furthermore, it has been posited that information converging on LIP is
integrated in a “common currency” (21-22, 70-71). By common currency we
mean that information is encoded in a “currency,” or scale, that depends on
its “common” influence on behavior. A common currency predicts that two
disparate factors (such as motion coherence and target value) that have an
equivalent influence on a behavior relevant to LIP (such as the probability of
saccade generation) would modulate LIP activity equivalently.
Thus, LIP is a logical place to begin an investigation into where and
how the two disparate sources of information in our task, motion coherence
and target value, are integrated. By placing one of our targets within the RF
of an LIP neuron, we will be able to: 1) reveal the extent to which LIP
represents absolute and relative reward, 2) determine if and how single
neurons are modulated by both motion coherence and target value, 3)
investigate the dynamics of integration as behaviorally relevant
information is presented sequentially and 4) determine whether this
information is integrated in a common currency.
3.2 Methods
3.2.1 Subjects
The same two adult male rhesus monkeys that participated in the behavioral
experiments presented in Chapter 2 were used in the following physiological
experiments. Before physiological recordings, each monkey underwent an
additional surgical procedure to place a recording chamber above the
intraparietal sulcus.
3.2.2 Physiological Recordings
Area LIP was identified by a combination of stereotactic location,
characteristic physiological activity and anatomical magnetic resonance
imaging. Single neurons were isolated and their activity recorded with
extracellular microelectrodes. Monkey T received a single craniotomy that
matched the dimensions of the recording cylinder. For monkey A, the
cylinder was placed on intact skull protected with a thin layer of dental
acrylic. For this animal, a 3 mm “burr-hole” was drilled, under surgical
conditions, one day before beginning recordings at a given location within
the recording cylinder.
For monkey A, neurophysiological recording was accomplished with
quartz/platinum-tungsten (Thomas Recording, Giessen, Germany) electrodes
that were positioned and manipulated daily with a 5-channel single electrode
system (“Mini Matrix,” Thomas Recording, Giessen, Germany). For
monkey T, we employed tungsten electrodes (FHC Inc., Bowdoin, Maine)
positioned with a Crist grid (Crist Instruments Co., Inc., Hagerstown,
Maryland) and manipulated with a Narishige single electrode drive
(Narishige Co., LTD, East Meadow, New York).
Real-time experimental control was implemented in the Rex software
environment for the QNX operating system (QNX software, Ontario, Canada)
running on a PC compatible computer. Visual stimuli were generated using
a VSG graphics card (Cambridge Graphics, UK) and presented on a CRT
display. After amplification, single unit spiking activity was identified and
collected along with digitized task events and eye position traces using the
Plexon (Plexon Inc., Dallas, Texas) data acquisition system operating in
conjunction with Rex. All data were subsequently analyzed offline with
custom scripts written in the MATLAB (The MathWorks, Inc., Natick,
Massachusetts) programming language, running on Apple Macintosh (Apple
Computer, Inc., Cupertino, California) computers.
3.2.3 Cell selection
As mentioned above, we limited our study to LIP neurons identified as
having persistent delay-period activity during a delayed saccade task. We
employed a variant of the delayed saccade task that has been used
extensively to identify these neurons. The temporal structure of this task is
illustrated in Figure 10a. From left to right, trials began with the onset of a
small fixation target. After the monkey acquired and fixated the target for
150 ms, a single saccade target appeared for a variable delay period (250-
800 ms). At the end of the delay period the fixation point disappeared,
cueing the monkey to saccade to the target. For monkey A the saccade
target was always blue, indicating a low magnitude (L) reward (1 unit, ~0.12
ml of juice); for monkey T the target had a 50% probability of being red,
indicating a high magnitude (H) reward (2 units, ~0.24 ml of juice).
Fixation was enforced throughout the trial by requiring the monkey to
maintain its eye position within an electronic window (1.25° radius)
centered on the fixation point. Inappropriate breaks of fixation were punished
by aborting the trial and enforcing a time-out period before the onset of the
following trial. Completed trials were identified by detecting the time of arrival
Figure 10a-b. Delayed saccade task used to identify LIP response fields. a The temporal structure of the delayed saccade task (fixate, targets, delay, go). Trials began with the onset of a small fixation point. After fixating for 150 ms, a single saccade target appeared for a variable delay period (250-800 ms) before the fixation point disappeared, cueing the saccade to the target. For monkey A the saccade target was always blue, indicating a low-magnitude (L) reward; for monkey T the target could also be red, with a 50% probability, indicating a high-magnitude (H) reward (2 units, ~0.24 ml of juice). b An example LIP neuron during the delayed saccade task. Each plot depicts a mean response (spikes/sec) as a function of time for one of the six saccade directions; activity is aligned to target onset in the left panels and to saccade time in the right panels.
of the monkey’s eye in an electronic window (1.25° radius) centered on the
target. The saccade target was typically presented at six locations in
pseudorandom order—all 10 degrees eccentric and separated by equal polar
angles (Fig. 10b). Eccentricities and angles were sometimes varied to locate
the sensitive region of a given neuron’s RF.
3.3 Results
3.3.1 Activity during delayed saccades
Figure 10b illustrates data from an example LIP neuron during the
delayed saccade task. Each plot depicts mean firing rate, as a function of
time, for one of the six saccade directions; neural activity is aligned to target
onset in the left panel of each plot and to the time of the saccade in the
right panel. Note that this neuron responds only when a target is presented at
180° and that activity is sustained throughout the delay period. Elevated
activity defines this spatial location as being within this neuron’s RF. We
recorded neural responses from 51 neurons with spatially selective, elevated
delay period activity from the right hemisphere of monkey A and from 31
such neurons from the left hemisphere of monkey T.
Figure 10c depicts the mean FR (± s.e.m.) of the 51 neurons from
monkey A as a function of time, when a target was placed within the RF
(in-RF, red traces) and when a target was placed 180° away from the RF
(out-RF, blue traces). As in Figure 10b, the left panel responses are aligned to
target onset, while in the right panel they are aligned to the time of the
saccade. Figure 10d depicts similar data from the 31 neurons from monkey
T. As mentioned above, for monkey T the target could be red or blue
with equal probability, indicating a high-magnitude (H) or low-magnitude
(L) reward, respectively. In this plot the red and magenta lines are in-RF
responses for the H and L targets, respectively, while the blue and cyan lines
are out-RF responses for the H and L targets, respectively.
Figure 10e. In the discrimination experiments one response target was positioned within the RF of the neuron under study. (e) T1 is the target within the RF of the neuron under study, as illustrated by the purple, dashed circle, while T2 is positioned 180° away, in the opposite hemifield. The axis of stimulus motion was defined by these two target positions so that motion discrimination choices corresponded to saccades into or out of the RF. We denote choices into the RF as T1 choices and those to the opposite target as T2 choices. [Panels: LL, HH, HL and LH reward conditions, each showing targets T1 and T2.]
While the selected population of LIP neurons from both monkeys is
clearly spatially selective, there are several notable differences between the
two animals. Neurons from monkey A exhibited higher average firing rates,
a faster and more pronounced transient response to the target onset and an
additional transient response at the time of the saccade. These differences
will also be evident in data acquired during the discrimination task. Note
that in Figure 10d, the red and magenta lines are superimposed, as are the
blue and cyan lines, indicating that these LIP neurons are not, on average,
modulated by target value in the context of a simple delayed saccade task.
3.3.2 The representation of choice, absolute value, relative value and
motion coherence in LIP
In the following four sections we address the representation of choice,
absolute value, relative value and motion coherence in our sample of LIP
neurons. In the first three sections, we qualitatively examine the dynamic
effects of each of these factors on LIP activity. In the fourth section, we
present quantitative analyses that capture these dynamic effects.
In all discrimination experiments we positioned one response target
(T1) within the RF of the neuron under study, as illustrated in Figure 10e
(purple dashed circle), while positioning the other target (T2) 180° away in
the opposite hemifield. The axis of stimulus motion was defined by these
two target positions so that motion discrimination choices corresponded to
saccades into or out of the RF. In the following sections, we denote choices
into the RF as T1 choices and those to the opposite target as T2 choices.
This design allows us to study responses of single LIP neurons to all
combinations of reward condition, motion coherence and behavioral
response.
3.3.2.1 Representation of choice: qualitative description
Figures 11a (monkey A) and 11b (monkey T) depict mean LIP firing
rate, averaged across all recorded cells, as a function of time for all
successfully completed trials in the HH (red) and LL (blue) reward
conditions. Data are plotted separately for trials in which the monkey chose
T1 (in-RF, solid lines) and T2 (out-RF, dashed lines). Both 11a and 11b
consist of two panels: a left panel with responses aligned to the time of
target onset and a right panel with responses aligned to the time of the
saccade. The black vertical lines in both figures denote relevant task epochs:
0-250 ms is the target epoch in which the blank targets are presented;
250-500 ms is the reward epoch in which the targets change color to
cue the reward condition; 500-1000 ms is the motion epoch in which the
random-dot motion stimulus is presented; 1000-1250 ms is the early
segment of the delay epoch; -350 to 0 ms (in the right panel) is the late delay
epoch immediately preceding the saccade.
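For reference, the epoch boundaries listed above can be written as a small lookup table. This is a minimal sketch; the dictionary names and the half-open interval convention (start inclusive, end exclusive) are assumptions made for illustration.

```python
# Epoch boundaries in ms, as listed in the text.
# The first four are measured from target onset; the late delay epoch is
# measured from the time of the saccade.
TARGET_ALIGNED_EPOCHS = {
    "target": (0, 250),
    "reward": (250, 500),
    "motion": (500, 1000),
    "early_delay": (1000, 1250),
}
SACCADE_ALIGNED_EPOCHS = {
    "late_delay": (-350, 0),
}

def epoch_at(t_ms, epochs=TARGET_ALIGNED_EPOCHS):
    """Return the name of the epoch containing t_ms, or None.

    Intervals are treated as half-open [start, end); this is an assumed
    convention, chosen so that each time point belongs to one epoch only.
    """
    for name, (start, end) in epochs.items():
        if start <= t_ms < end:
            return name
    return None
```

For example, `epoch_at(700)` falls in the motion epoch, and `epoch_at(-100, SACCADE_ALIGNED_EPOCHS)` falls in the late delay epoch.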
Note first that in both 11a and 11b, the solid and dashed lines are
initially identical (for each color), diverging after approximately 200 ms into
the motion period. Thus, shortly after the onset of the motion stimulus, LIP
neurons in both monkeys begin to signal choice: whether the monkey will
choose T1 or T2. This result is not surprising. We explicitly selected for
study neurons that responded differentially to oppositely directed eye
movements in the delayed saccade task. It is well known from previous work
that such LIP neurons typically exhibit “choice predictive” activity during a
variety of tasks. The data in Figure 11 simply confirm that our sample of
LIP neurons exhibits choice predictive activity in our task, in which decisions are
based on a combination of visual motion and reward information. The effect
of behavioral choice in our data is strong, consistent across neurons and
monkeys and present for all reward conditions as demonstrated below.
[Figure 11. Panel a: monkey A; panel b: monkey T. Axes: mean response (spikes/sec.) versus time from target onset (ms, left) and time from saccade (ms, right). Epochs marked: target, reward, motion, early delay, late delay.]
Figure 11a-b. LIP represents the absolute value of the option in the RF. (a-b) Mean LIP firing rate, for all cells, as a function of time, for the HH (red) and LL (blue) reward conditions. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the red and blue curves indicates LIP represents the absolute value of the option in the RF.
3.3.2.2 Representation of absolute value: qualitative description
As discussed in the context of behavioral data in Chapter 2, any
differences in performance or in neural activity between the HH and LL
conditions indicate an effect of absolute reward value. By comparing the red
and blue lines in Figure 11 we can see the extent to which LIP represents
absolute reward value. Consider first the data from monkey A in Figure 11a.
The solid red and blue traces (T1 choices) separate with very short latency
following presentation of the reward cues at 250 ms. Thus monkey A’s LIP
population rapidly encodes the absolute value of T1, producing elevated
firing rates when a high value target is presented within the RF. Following
their initial separation, the red and blue traces converge briefly near the
beginning of the motion epoch, but then separate again for the duration of
the trial. Qualitatively, then, except for a brief interval near the onset of the
motion stimulus, LIP neurons from monkey A encode a signal concerning
the absolute value of the reward available in the RF throughout the trial.
Note that a similar and more robust representation of absolute value is
present for T2 choices as well (dashed traces).
Figure 11b shows a similar pattern of activity for the LIP population
recorded from monkey T. Even though LIP activity in monkey T does not
respond as rapidly or robustly as in monkey A (consistent with the delayed
saccade data—Fig. 10c, d), all major features of the absolute value signal
observed in monkey A are replicated in monkey T: 1) the effect of absolute
value begins during the reward cue period, 2) greater absolute value is
represented by higher firing rates, 3) the effect is maintained until the end of
the trial and 4) the effect is present for T2 choice trials as well. A minor
difference is that the absolute reward signal does not “disappear” at any
point in the trial for monkey T.
3.3.2.3 Representation of relative value: qualitative description
As revealed by the behavioral data in Chapter 2, the relative reward
value of the two targets exerts a substantial impact on choice behavior. We
can examine the extent to which LIP represents relative value by comparing
LIP responses in the HH and HL reward conditions. In these conditions, the
value of T1 is constant (high value) while the value of T2 differs (high in
HH, low in HL). Thus, any LIP modulation between these two conditions
indicates a relative effect of T2 value on the response to the high value target
present in the RF. Figures 12a and 12b depict LIP responses for monkeys A
and T, respectively, to the HH (red traces) and HL (black traces) reward
conditions. The format of these figures is identical to Figures 11a and 11b,
and the red curves are the same as in Figure 11.
In Figure 12a, the black and red traces separate late in the reward cue
epoch (black arrow), with the average firing rate being higher for the HL
condition. This difference indicates that, on average, LIP
neurons respond more strongly to a target in the RF (T1) when it has a larger
value relative to that of the T2 target. This “relative value” signal is present
throughout the motion epoch but disappears early in the delay epoch, after
the choice has presumably been determined. The same dynamics are evident
both for T1 and T2 choices (solid and dashed lines, respectively).
[Figure 12. Panel a: monkey A; panel b: monkey T. Axes: mean response (spikes/sec.) versus time from target onset (ms, left) and time from saccade (ms, right). Epochs marked: target, reward, motion, early delay, late delay.]
Figure 12a-b. LIP represents the relative value of the option in the RF. (a-b) Mean LIP firing rate, for all cells, as a function of time, for the HH (red) and HL (black) reward conditions. HH curves are the same as in Figure 11a-b. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the red and black curves indicates LIP represents the relative value of the option in the RF.
A similar pattern of activity is present for the population data from
monkey T, illustrated in Figure 12b. As for monkey A, the relative reward
signal emerges late in the reward cue epoch (black arrow), with average
firing rate being higher for larger relative value. For monkey T, however,
the relative reward signal fades more rapidly than for monkey A.
Additionally, for T1 choices, the relative reward signal inverts during the
second half of the motion epoch and remains inverted throughout the delay
epoch. This inversion is not present for T2 choices, however.
By comparing the LL and LH reward conditions, we acquire a second
look at the effects of relative reward on LIP activity. As in the previous
comparison of HH and HL trials, the value of T1 is identical (low) for the
LL and LH conditions. The two conditions differ only in the value of T1
relative to the value of T2: the two values are equal in the LL condition,
whereas T1 has the lower value in the LH condition. Again, any modulation
of LIP activity between these two conditions constitutes a signal of relative
reward value.
Figures 13a and 13b, plotted in an identical manner to Figures 11 and
12, compare average LIP responses in the LL (blue traces) and LH (green
traces) conditions for monkeys A and T, respectively. Note that the blue
curves in these figures are the same as the blue curves in Figures 11a and
11b. The data for monkey A show an effect of relative reward similar to
that seen in Figure 12a. The green trace drops below the blue trace during
the reward cue epoch (black arrow), indicating again that average LIP firing
rates fall as the relative value of the target in the RF decreases. The green
and blue traces converge again during the motion period and remain together
throughout the delay period, indicating a diminished representation of
relative reward. As shown in Figure 13b, the effect of relative reward is
similar, although weaker, in monkey T (black arrow).
[Figure 13. Panel a: monkey A; panel b: monkey T. Axes: mean response (spikes/sec.) versus time from target onset (ms, left) and time from saccade (ms, right). Epochs marked: target, reward, motion, early delay, late delay.]
Figure 13a-b. LIP represents the relative value of the option in the RF. (a-b) Mean LIP firing rate, for all cells, as a function of time, for the LL (blue) and LH (green) reward conditions. LL curves are the same as in Figure 11a-b. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels, responses are aligned to the target onset, while in the right panels, responses are aligned to saccade time. Any difference between the blue and green curves indicates LIP represents the relative value of the option in the RF.
3.3.2.4 Representation of motion coherence: qualitative description
To assess qualitatively the effect of motion coherence on LIP activity,
we separately plotted the response to individual motion coherences for the
HH reward condition. Figures 14a-b depict the mean LIP firing rate as a
function of time for monkey A and monkey T, respectively. This plot
format differs somewhat from the previous three figures. First, time begins
at the onset of the motion stimulus (500 ms, left edge). Second, the three
colors now represent three different motion coherences—48%, 6% and 0%
for monkey A (Fig. 14a), and 48%, 12% and 0% for monkey T (Fig. 14b).
Finally, to avoid confounding motion coherence effects with behavioral
choice, we plot data from correct choices only, for nonzero coherences.
Thus the solid lines (T1 choices) derive from positive coherences (except at
0% coherence) while the dashed lines (T2 choices) derive from negative
coherences.
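The trial selection just described (all 0% coherence trials, plus correct trials only at nonzero coherence) can be expressed as a boolean mask. This is a sketch under stated assumptions: the function name is hypothetical, positive coherence is taken to mean motion toward T1, and choice is coded +1 for T1 and -1 for T2.

```python
import numpy as np

def coherence_plot_mask(signed_coh, choice):
    """Select trials for the coherence plots.

    signed_coh : signed motion coherence per trial (positive = toward T1,
                 an assumed sign convention)
    choice     : +1 for a T1 choice, -1 for a T2 choice
    Returns a boolean array that keeps every 0% coherence trial and, for
    nonzero coherences, only trials where the choice matches the motion.
    """
    signed_coh = np.asarray(signed_coh, dtype=float)
    choice = np.asarray(choice)
    zero = signed_coh == 0
    correct = np.sign(signed_coh) == choice
    return zero | correct

coh = np.array([48, 48, -6, -6, 0, 0])
ch = np.array([1, -1, -1, 1, 1, -1])
mask = coherence_plot_mask(coh, ch)  # keeps trials 0, 2, 4 and 5
```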
For clarity and brevity we are only presenting the effects of coherence
for the HH reward condition. While the results from the other reward
conditions are comparable, they are qualitatively less compelling. Indeed,
while the following trends are qualitatively weak, they are all confirmed by
our regression models (discussed below). At the start of the motion epoch
(500 ms) for both monkey A and monkey T, all the lines are collapsed
together. The initial response to motion onset is a commonly observed (28,
54, 65) “dip” in activity (gray arrow), which is sometimes interpreted as the
initialization of the motion integration process. Following this dip, the solid
traces rise above the dashed, consistent with the choice predictive activity of
LIP neurons documented in previous studies and in Figures 11-13. Within
these diverging responses the black, blue and red curves also separate.
[Figure 14. Panel a: monkey A, HH reward condition; panel b: monkey T, HH reward condition. Axes: mean response (spikes/sec.) versus time from target onset (ms, left) and time from saccade (ms, right). Epochs marked: motion, early delay, late delay.]
Figure 14a-b. Effect of motion coherence for monkey A and monkey T. (a) Mean LIP firing rate for the HH reward condition as a function of time from the start of the motion epoch, for three motion coherences: 0% (black), 6% (blue) and 48% (red). We plot data from correct choices only, for non-zero coherences. Grey arrow marks the response “dip”; black arrow marks the graded coherence trend and cyan arrow marks the absence of a coherence trend. (b) Similar for monkey T, but for 0% (black), 12% (blue) and 48% (red) coherence.
As discussed in the introduction, we expect the weight of evidence
supporting a decision (the coherence) to modulate the T1 and T2 responses,
with greater coherence producing greater modulation. Thus, we expect
the highest coherence (48%, red) to produce the greatest
activity when the monkey chose T1 (solid lines) and the least activity when
he chose T2 (dashed lines). Furthermore, we expect responses to 6% (12%,
for monkey T) and 0% to be progressively reduced for T1 choices and
increased for T2 choices. This trend is clearly visible (black arrow) for both
T1 (solid) and T2 (dashed) responses.
Finally, note the effects of coherence are predominantly visible during
the second half of the motion epoch. Once the motion epoch ends (1000 ms)
and the delay epochs begin, the consistent effects of coherence are greatly
diminished, and by the late delay epoch they appear to be entirely absent
(cyan arrows). This indicates that as the motion epoch ends, LIP is
modulated by the impending choice but not by the sensory evidence that
supported it.
3.3.2.5 Quantifying LIP dynamics: absolute value, relative value,
motion coherence and choice
As the preceding sections demonstrated, the responses of both LIP
populations are highly dynamic, representing different behaviorally relevant
factors to varying degrees at various times. The qualitative assessment
above indicates that on average LIP neurons multiplex the absolute value,
relative value and motion coherence signals. Additionally, LIP is strongly
modulated by the impending choice. To quantify these trends we
applied a multiple-variable linear regression model to LIP activity over a
sliding temporal window, in order to determine whether and how absolute value,
relative value, motion coherence and choice modulate LIP activity as a
function of time. The model is described in Equation 7.
Equation 7:
FR(t) = β0 + βcoh(COH) + βt1(T1val) + βt2(T2val) + βchoice(CHOICE)
where FR(t) is the mean firing rate over a given temporal epoch on a given trial;
β0 is a constant term; and βcoh, βt1, βt2 and βchoice are the fit coefficients
representing the effects of motion coherence, T1 value, T2 value and choice on
this firing rate. COH is an assigned factor for the coherence of the motion
stimulus on that trial, in fractional units of the maximum coherence and signed
to signify the direction as described above. Thus, COH has a range from -1 to 1,
where -1 represents -48% coherence and +1 represents +48% coherence. T1val and
T2val are assigned either +1, if the target was H, or -1, if the target was L.
For example, on HL trials in which the motion coherence was -12%, COH=-0.25,
T1val=+1 and T2val=-1. CHOICE is assigned a value of +1 for T1
choices and -1 for T2 choices. Constraining these factors to be in the same
range (-1 to 1) allows us to directly compare the values of the fit coefficients
and determine which have greater impact on FR. Note that Equation 7 is very
similar to Equation 1, which was used to model the probability of a T1
choice, with the addition of a choice factor.
For each LIP neuron we apply this model to the average firing rate over
a 50 ms window that is progressively slid, in 1 ms intervals, across the
duration of a trial. This generates a time vector of coefficients (βcoh, βt1,
βt2 and βchoice) for each neuron in the population, each describing the
corresponding factor’s influence on the mean firing rate of that neuron at that
time point.
Figures 15a and 15b plot the mean (± s.e.m.) coefficient, across neurons,
for βcoh (black), βt1 (red), βt2 (blue) and βchoice (green) as a function of
time for monkeys A and T, respectively. The format of Figure 15 is similar to
Figures 11, 12 and 13. When interpreting these results keep in mind, first,
that it is the sum of βt1 and βt2 that fully captures how given reward
conditions (HH, LL, HL and LH) modulate FR. Second, for each reward
condition, before addition, the coefficients are multiplied by the appropriate
factor values (T1val and T2val). For example, as pointed out above, in the
HL reward condition the factor for βt1 is +1 while the factor for βt2 is -1.
Thus, although βt2 might have a negative value, in this condition it actually
has a positive influence on FR. Third, βt1 models the effect of the target
within the RF and thus absolute reward, while βt2 models the effect of the
target outside the RF and thus relative reward.
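The factor coding and sliding-window fit of Equation 7 can be sketched as follows. This is an illustration rather than the analysis code used here: it assumes spikes are stored as counts in 1 ms bins, fits by ordinary least squares with NumPy, and uses hypothetical function names throughout.

```python
import numpy as np

def design_matrix(signed_coh_pct, t1_high, t2_high, chose_t1, max_coh_pct=48.0):
    """Build the Equation 7 regressors, each scaled to the range -1 to 1.

    Columns: constant, COH, T1val, T2val, CHOICE.
    """
    coh = np.asarray(signed_coh_pct, dtype=float) / max_coh_pct
    t1val = np.where(np.asarray(t1_high, dtype=bool), 1.0, -1.0)
    t2val = np.where(np.asarray(t2_high, dtype=bool), 1.0, -1.0)
    choice = np.where(np.asarray(chose_t1, dtype=bool), 1.0, -1.0)
    return np.column_stack([np.ones_like(coh), coh, t1val, t2val, choice])

def sliding_window_betas(spikes, X, window_ms=50, step_ms=1):
    """Fit Equation 7 in each position of a sliding window, for one neuron.

    spikes : (n_trials, n_ms) array of spike counts in 1 ms bins (assumed format)
    X      : (n_trials, 5) design matrix from design_matrix()
    Returns an (n_windows, 5) array of [b0, bcoh, bt1, bt2, bchoice].
    """
    n_trials, n_ms = spikes.shape
    betas = []
    for start in range(0, n_ms - window_ms + 1, step_ms):
        counts = spikes[:, start:start + window_ms].sum(axis=1)
        rate = counts * (1000.0 / window_ms)  # convert to spikes/sec
        coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
        betas.append(coef)
    return np.array(betas)

def reward_contribution(bt1, bt2, condition):
    """Summed reward term bt1*T1val + bt2*T2val for a condition such as 'HL'."""
    val = {"H": 1.0, "L": -1.0}
    return bt1 * val[condition[0]] + bt2 * val[condition[1]]

# The worked example from the text: an HL trial with -12% coherence, T1 chosen.
example_row = design_matrix([-12], [True], [False], [True])[0]
```

The helper `reward_contribution` makes the second interpretive point concrete: with βt1 = 3 and βt2 = -1, the HL condition contributes 3 - (-1) = 4 spikes/sec, so a negative βt2 raises the firing rate when T2 is the low-value target.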
Consider the red line in Figure 15a, which is the mean (±s.e.m) βt1
value for Monkey A. This coefficient rises rapidly during the reward epoch,
diminishes as the motion epoch begins but then quickly rebounds during the
motion epoch. As the motion epoch ends, the coefficient again diminishes
but stabilizes throughout the delay epochs. This confirms the representation
of absolute reward value we observed in the average FR (Figure 11a). The
red curve in Figure 15b, which plots mean (±s.e.m) βt1 for Monkey T,
follows a similar rise and fall, indicating that both LIP populations have
similar, temporal representations of absolute value.
The results for monkey T, while following the same trend as monkey A,
differ in two main respects: the coefficients are smaller and they are more
variable. The smaller value coefficients result from the lower overall firing
rate in monkey T’s population compared to monkey A’s (cf. Figs. 10-13).
The greater variance likely results from the smaller sample size (monkey A:
n=51; monkey T: n=31).
The blue lines in Figure 15 show the average βt2 value, capturing the
effect of relative value by modeling the influence of the target outside the
RF. Like absolute value, the influence of relative value begins at the onset
of the reward-cue epoch, but it grows more slowly than its counterpart and
peaks at the start of the motion epoch. As the motion epoch unfolds,
however, the effect of relative value diminishes. Note that while the effect
of relative reward persists throughout the delay period, it is much smaller than
the effect of absolute value (red). Also note that the average βt2 coefficient
changes its sign at the end of the motion epoch, implying that as the motion
epoch ends, LIP inverts its representation of relative value. This trend is
visible in the average firing rates of Figures 12a and 13a. The blue line in
Figure 15b plots similar βt2 results for monkey T. Note that monkey T’s
population inverts its representation of relative value midway through the
motion period, much earlier than monkey A.
The black lines in Figure 15 depict the average value of the βcoh
coefficient, capturing the effect of motion coherence on LIP FR. In both
Figures 15a and 15b the black lines begin to rise approximately 200 ms into
the motion epoch, reach their peak approximately 400 ms after motion onset,
after which they decline to zero. For monkey A the effect of motion
coherence then reemerges at a very low level during the delay period, while
for monkey T motion coherence has no effect on the firing rate during the
delay period.
The average value of βchoice, representing the effect of choice
outcome on LIP, is depicted by the green lines in Figures 15a and 15b. The
green lines in both figures follow a very similar and straightforward trend,
emerging from zero after about 200 ms into the motion period. As the
effects of other factors diminish, the effect of choice continues to grow
throughout the delay period, reaching its peak immediately preceding the
saccade. Note that for both monkeys, the peak effects of choice are nearly
equal to the peak effects of absolute value.
3.3.2.6 Quantifying LIP dynamics: absolute value, relative value and
motion coherence within choice
While the quantitative analysis presented above captures the obvious
and subtle effects of absolute value, relative value, motion coherence and
choice on LIP activity, it does not capture any differences that might exist in
how value and motion coherence are represented within a given choice.
Considering our qualitative assessment above, we know there are likely to
be significant differences between T1 and T2 choices. Additionally, the
preceding model (Equation 7) had two factors, choice and coherence, which
are highly correlated. Including correlated factors as co-regressors in
regression models can produce unreliable coefficient estimates. To address both
these issues, we dropped the choice factor from the model (Equation 7) and then
applied this abbreviated model separately to trials resulting in T1 choices
and to trials resulting in T2 choices.
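A minimal sketch of this within-choice analysis follows. It assumes one firing-rate value per trial for a given window, factors already coded as in Equation 7, and an ordinary least-squares fit; the function name and return format are hypothetical.

```python
import numpy as np

def fit_within_choice(rate, coh, t1val, t2val, choice):
    """Fit FR = b0 + bcoh*COH + bt1*T1val + bt2*T2val separately by choice.

    rate   : firing rate per trial in one time window (spikes/sec)
    coh    : signed coherence as a fraction of the maximum (-1 to 1)
    t1val, t2val : +1 for H, -1 for L
    choice : +1 for T1 choices, -1 for T2 choices
    Returns {"T1": coefficients, "T2": coefficients}, each ordered
    [b0, bcoh, bt1, bt2]. The choice factor itself is not a regressor.
    """
    rate, coh, t1val, t2val, choice = map(
        np.asarray, (rate, coh, t1val, t2val, choice))
    out = {}
    for label, code in (("T1", 1), ("T2", -1)):
        sel = choice == code  # trials ending in this choice
        X = np.column_stack(
            [np.ones(sel.sum()), coh[sel], t1val[sel], t2val[sel]])
        coef, *_ = np.linalg.lstsq(X, rate[sel].astype(float), rcond=None)
        out[label] = coef
    return out
```

Splitting by choice removes the choice regressor entirely, so its correlation with coherence can no longer distort the remaining coefficients. The degree of that correlation in a given data set could be checked beforehand with `np.corrcoef(coh, choice)`.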
The results of this analysis for monkey A and monkey T are presented
in Figures 16 and 17, respectively, in a format similar to Figure 15. Figure
16a depicts for monkey A, the average (± s.e.m) value of βt1 for T1 choice
(solid) and T2 choice (dashed). These lines are identical until the start of the
delay period, after which they differ significantly. The representation of
absolute value is larger for T2 (dashed) than for T1 choices, an effect clearly
visible in the average firing rates depicted in Figure 11a.
Figure 16b depicts for monkey A, the average (± s.e.m) results for βt2,
which represents relative value, for T1 choices (solid) and T2 choices
(dashed). The results for T1 and T2 choices differ significantly at
several points in the trial. They first diverge at the peak representation of
relative value, when the motion epoch begins. This indicates that the effect
of relative value is greater on T2 choices than on T1 choices. Close
examination of mean firing rates in Figures 12a and 13a, however, reveals no
difference in the representation of relative value for T1 and T2 choices. It is
possible the model is capturing the larger effect of the LH reward conditions
relative to the HL conditions. The LH condition is composed more of T2
than T1 choices. The representation of relative value also differs during the
delay period. For T1 choices, the representation of relative value converges
on zero before the end of the motion epoch and remains at zero throughout
the delay epoch. For T2 choices, however, the representation inverts at the
end of the motion period and remains significantly positive through the
delay period.
The effects of motion coherence also differ for T1 and T2 choices.
Figure 16c plots the average (± s.e.m) value of βcoh for T1 (solid) and T2
(dashed) choices. As expected, the effect of motion coherence is greater for
T2 choices during the second half of the motion epoch. Note also that for
T1 choices the average coefficient dips significantly below zero midway
through the motion epoch, while for T2 choices it dips below zero at the start
of the delay epoch. In fact, these trends are visible upon close inspection of
the average firing rates, depicted in Figures 14a-b. Despite these odd
gyrations, however, the effects of coherence are weakly present throughout
the delay period for both T1 and T2 choices.
Figures 17a-c plot the results of this analysis for monkey T. Figure 17a
depicts the average (± s.e.m) value of βt1 for T1 choice (solid) and T2
choice (dashed). Except for a brief period at the end of the motion epoch,
monkey T’s LIP population does not represent absolute value differently for
T1 and T2 choices. Figure 17b depicts the average (± s.e.m) value of βt2
for T1 choice (solid) and T2 choice (dashed). Like monkey A, the
representation of relative value inverts as the trial progresses. However,
unlike monkey A, the inversion for Monkey T occurs for T1 choices midway
through the motion period. Similarly, Monkey T shows a greater effect of
coherence for T1 choices than T2 choices as depicted in Figure 17c, which
depicts the average (± s.e.m) value of βcoh for T1 choice (solid) and T2
choice (dashed).
3.3.2.7 Quantifying coherence within reward condition
Recall that in Section 2.3.3, we demonstrated that the monkeys show no
significant difference in psychophysical sensitivity across the four reward
conditions. This indicated that relative value produces the observed
behavioral bias by contributing an additive offset to the neurophysiological
accumulation of current sensory evidence. As discussed in greater detail in
Section 3.4.2, it is possible, however, that relative or absolute value affects
the rate at which sensory evidence is accumulated. To determine if the
effect of motion coherence on LIP activity depends on reward condition (HH,
LL, HL or LH), we modeled the effect of coherence separately for each
reward condition while controlling for choice. The preceding analysis (Figs.
15a and 15b) indicates that the effect of motion coherence on LIP activity is
confined to the second half of the motion epoch (750 ms to 1000 ms from
target onset). Therefore we focused our analysis on this temporal window.
We applied a modified version of the model presented in Equation 7 to the
mean firing rate in the 250 ms time window at the end of the motion epoch.
In this model, summarized in Equation 8, we have removed the factors for
T1val and T2val.
Equation 8:
FR(t) = β0 + βcoh(COH) + βchoice(CHOICE)
Frequency histograms of the resulting βcoh values are plotted in Figure
18a (monkey A) and 18b (monkey T). Results are plotted for the HH (red),
LL (blue), HL (green) and LH (black) reward conditions, for T1 choices
(upper histogram) and T2 choices (lower histogram). For each monkey and
choice we performed a one-way ANOVA to identify differences in βcoh
frequency among reward conditions. For both monkey A and monkey T, we
detected no significant effect of reward condition on βcoh frequency for
either T1 or T2 choices (monkey A: T1, p=0.3865, T2, p=0.1353; monkey
T: T1, p=0.5883, T2, p=0.7675).
3.3.3 Do individual LIP neurons integrate sensory and value
information?
The results of this model confirm that, on average, both LIP
populations are similarly and dynamically representing absolute value,
relative value and motion coherence, and that most of the trends visible in
the average firing rate can be verified quantitatively with our linear
regression model. Each of these factors, however, might be encoded
exclusively by separate sub-populations within our selected LIP population.
To determine if single neurons are modulated by all three factors, we asked
what percentage of neurons within a given task epoch had significant
coefficients for the various combinations of these three factors.
Figures 19a-f plot a series of Venn diagrams depicting the possible
intersections of these three sets of coefficients. In these figures the red circle
represents the βt1 set, the blue circle represents the βt2 set, while the black
circle represents the set of βcoh coefficients. The overlapping areas of these
circles demarcate elements that these sets have in common. Within each of
these areas, we report the percentage of neurons belonging to this subset
(Equation 7, applied to average activity in each epoch, βt1val, βt2val and
βcoh significantly different from 0). Note that these percentages do not sum
to 100%, as some neurons are not significantly modulated by any of these
factors within a specific epoch.
[Figure 18. Panel a: monkey A; panel b: monkey T. Histograms of βcoh (x-axis) against count (y-axis) for the HH, LL, HL and LH reward conditions.]
Figure 18a-b. The effect of motion coherence is independent of reward condition.
[Figure 19. Columns: monkey A, monkey T. Circles labeled βt1, βt2 and βcoh. Epoch labels in the figure: motion epoch, early delay epoch, late delay epoch. Region percentages as extracted, one group per panel: (29.0, 32.2, 3.2, 3.2, 0, 3.2, 9.6); (51.6, 6.4, 0, 0, 6.4, 6.4, 0); (7.5, 43.3, 11.3, 1.8, 0, 9.4, 20.7); (16.9, 64.1, 3.7, 0, 1.8, 5.6, 1.8); (32.0, 20.7, 5.6, 3.7, 0, 15.0, 5.6); (35.4, 22.5, 9.6, 3.2, 3.2, 3.2, 3.2).]
Figure 19a-f. Individual LIP neurons integrate sensory and value information. (a-f) Venn diagrams depicting the possible intersections of three sets of coefficients ... areas we have reported the percentage of neurons belonging to this subset.
Figures 19a and 19b plot the results for monkeys A and T, respectively,
for the reward cue period. In this epoch we can see that 64.1% of neurons in
monkey A’s population represented both T1val and T2val while only 6.4%
of monkey T’s population represented both factors. While a small fraction
of monkey A’s population were modulated either only T1val or only T2val,
a large portion (51.6%) of monkeys T’s population was modulated by only
T1val. Figures 14c and 14d similarly plot the results for the motion stimulus
epoch. Note that in this epoch, while some neurons are encoding a single
factor, a large portion (Monkey A: 75.2%; Monkey T: 48.2%) are encoding
combinations of two or more factors. In the late delay epoch, Figures 19e
and 19f, many neurons (Monkey A: 45%; Monkey T: 32.1%) in both
populations continued to represent two or more factors. These results
indicate that most neurons are multiplexing absolute reward, relative reward
and motion coherence signals.
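The Venn tallies described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the analysis code used for Figure 19: the factor names `t1`, `t2` and `coh`, the input format (one dictionary of significance flags per neuron) and the five-neuron example population are all hypothetical.

```python
from collections import Counter

FACTORS = ("t1", "t2", "coh")  # stand-ins for the βt1, βt2 and βcoh sets

def venn_percentages(significant):
    """Percentage of neurons in each exclusive Venn region.

    `significant` holds one dict per neuron mapping each factor name to
    True when that coefficient differs significantly from 0 in the epoch.
    The empty tuple () collects neurons outside all three circles.
    """
    counts = Counter()
    for flags in significant:
        region = tuple(f for f in FACTORS if flags[f])
        counts[region] += 1
    n = len(significant)
    return {region: 100.0 * k / n for region, k in counts.items()}

def percent_multiplexing(percentages):
    """Summed percentage of neurons modulated by two or more factors."""
    return sum(p for region, p in percentages.items() if len(region) >= 2)

# Hypothetical population of five neurons (illustrative flags only).
example = [
    {"t1": True, "t2": True, "coh": True},
    {"t1": True, "t2": True, "coh": False},
    {"t1": True, "t2": False, "coh": False},
    {"t1": False, "t2": False, "coh": True},
    {"t1": False, "t2": False, "coh": False},  # outside all circles
]
pct = venn_percentages(example)
```

Because the unmodulated neurons fall in the `()` region, the percentages inside the circles sum to less than 100%, as noted in the text.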
3.3.4 Common Currency
The preceding behavioral analysis (Chapter 2) revealed that on relative
reward trials (HL and LH), the monkeys are biased towards choosing the
target with the greater relative value. Additionally, our bEVS analysis
quantified the magnitude of this bias in units of motion coherence, thereby
establishing a quantified equivalence between relative value and motion
coherence. Simply put, it revealed that a relative increase in 1 unit of reward
is, on average, behaviorally equivalent to 14.7% (monkey A) and 16.3%
(Monkey T) motion coherence. If information converging on LIP is
integrated into a scale dependent on its common influence on behavior, we
should be able to uncover an equivalence between relative reward value and
motion coherence on the neural level comparable to the one observed
behaviorally. This predicts that the modulation of LIP from relative rewards
should be equal to the modulation produced by 14.7% coherence. Thus, if
LIP encodes information in a common currency, the neural equivalent visual
stimulus (nEVS) should be equal to the behavioral equivalent visual
stimulus (bEVS).
Given that our physiological model (Equation 7) and our behavioral
model (Equation 1) are very similar, we would like to define nEVS as we
defined bEVS (Equation 3) and then compare the two. However, two
reasons prevent us from directly comparing these models’ coefficients. First,
Equation 1 is a logistic model of the log odds of a T1 choice, while Equation
7 is a linear model of mean FR. Second, Equation 7 has a factor for choice
that is not present in Equation 1. Thus, each model’s coefficients represent
fundamentally different quantities. We addressed the first issue by modeling
the log odds of an occurrence of a spike with logistic regression rather than
modeling the average firing rate with linear regression. We addressed the
second issue by dropping the choice factor from the model. The scientific
justification for dropping this term and its implications are addressed in
greater detail in the discussion (Section 3.4.3); they may significantly
affect the interpretation of the following results.
Despite this caveat, dropping choice from the model gives us two logistic
models, one for the behavior and one for the physiology. Because these
models have similar factors, we can similarly define and directly compare
nEVS and bEVS. This new model is defined in Equation 9:
Equation 9:
\[ \ln\left(\frac{s}{1-s}\right) = \beta_0 + \beta_{coh}(COH) + \beta_{t1}(T1val) + \beta_{t2}(T2val) \]
where s is the observed probability of a spike occurring, and βcoh, βt1 and βt2
are the fit coefficients representing the effects of motion coherence and target
value on this probability.
Applying this model over a 50 ms window progressively slid, in 1 ms
intervals, across the duration of a trial generates a time vector of coefficients
(βcoh, βt1, βt2) for each neuron in the population describing which factors
influence LIP activity at that time point. In Figures 20a and 20b we see this
model’s results for monkeys A and T, respectively, starting at the motion epoch,
in a similar fashion to Figure 15. In these figures the red curves represent the
average βt1 value, the blue the average βt2 value and black the average βcoh
value.
The consequences of removing choice from the physiological model
can be seen by comparing Figures 20a and 20b with Figures 15a and 15b,
respectively. While we cannot directly compare the magnitude of these two
sets of coefficients, we can compare both their magnitudes relative to each
other and their general time-course. Note that the βcoh (black) in this model
(Equation 9) continues to influence LIP activity through the delay epoch,
unlike the model (Equation 7) containing a choice factor. This indicates that
the coherence term now captures a portion of the variance previously
captured by choice. The βt2 term represents relative value (blue), which in
this model (Equation 9) continues to influence LIP through the delay epoch,
indicating it captures a portion of the variance previously captured by choice.
Removing the choice term has little effect on the βt1 term (red), which
influences LIP activity through the motion and delay epoch in both models.
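As a rough illustration of the sliding-window fit of Equation 9, the following Python sketch fits the logistic model in successive 50 ms windows stepped by 1 ms. It is a minimal sketch under stated assumptions, not the analysis code used here: each trial is a dictionary with hypothetical keys `spikes` (spike times in ms), `coh`, `t1val` and `t2val`; each 1 ms bin within a window is treated as one spike/no-spike observation; and the fit uses a plain Newton-Raphson iteration with no significance testing.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(X, k, n_bins, iters=25):
    """Newton-Raphson fit of ln(s / (1 - s)) = X . beta for binomial counts.

    X: one design row per trial, e.g. [1, COH, T1val, T2val];
    k: number of bins containing a spike on each trial;
    n_bins: number of bins per trial in the window (assumed observation unit).
    """
    dim = len(X[0])
    beta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        hess = [[0.0] * dim for _ in range(dim)]
        for row, ki in zip(X, k):
            eta = sum(b * x for b, x in zip(beta, row))
            s = 1.0 / (1.0 + math.exp(-eta))
            w = n_bins * s * (1.0 - s)
            for i in range(dim):
                grad[i] += row[i] * (ki - n_bins * s)
                for j in range(dim):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + d for b, d in zip(beta, step)]
    return beta

def sliding_betas(trials, t_start, t_stop, width=50, step=1):
    """Fit Equation 9 in each window; returns [(window_start, beta), ...].

    beta is ordered [β0, βcoh, βt1, βt2].
    """
    X = [[1.0, tr["coh"], tr["t1val"], tr["t2val"]] for tr in trials]
    out = []
    for t0 in range(t_start, t_stop - width + 1, step):
        # Count the distinct 1 ms bins in the window that contain a spike.
        k = [len({int(t) for t in tr["spikes"] if t0 <= t < t0 + width})
             for tr in trials]
        out.append((t0, fit_logistic(X, k, width)))
    return out
```

The sketch will fail with a division by zero if the design is degenerate (for example, if every trial shares the same coherence), and it makes no attempt to handle windows in which a neuron never fires; a real analysis would need to guard against both.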
[Figure 20a-b. To calculate nEVS we modeled the log odds of a spike occurring with a logistic regression model without a factor for choice. Panels: a, Monkey A; b, Monkey T. Axes: mean coefficient vs. time from target onset (ms) and time from saccade (ms).]
The factors in Equation 9 are defined in a manner identical to those in
Equation 1, allowing us to define nEVS with Equation 10:
Equation 10:
\[ nEVS = \frac{\beta_{t1} - \beta_{t2}}{\beta_{coh}} \]
The preceding analysis indicates that LIP neurons multiplex absolute
reward, relative reward and motion coherence (Figure 19). Although the
same is true when this analysis is repeated with Equation 8 rather than
Equation 7, the percentage of neurons simultaneously and significantly
modulated by absolute reward, relative reward and motion coherence in the
late delay period is larger (Monkey A: 56.6%, Monkey T: 35.48%).
These results are depicted in Venn diagrams plotted in Figures 21a and 21b.
In the following analysis we will focus on this subset of our LIP populations.
[Figure 21a-b. Venn diagrams depicting that a larger percentage of neurons are simultaneously and significantly modulated by absolute reward, relative reward and motion coherence in the late delay epoch. Venn diagrams are similar to those in Figure 19. Panels: a, Monkey A; b, Monkey T (late delay epoch).]
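Equation 10 is a simple ratio, and a minimal Python sketch makes the units explicit. The function name and the example coefficients below are hypothetical; the sketch assumes that COH entered Equation 9 in percent coherence, so that the ratio is expressed in percent coherence.

```python
def nevs(beta_t1, beta_t2, beta_coh):
    """Equation 10: relative-value effect expressed in units of coherence."""
    if beta_coh == 0:
        raise ValueError("beta_coh is zero; nEVS is undefined")
    return (beta_t1 - beta_t2) / beta_coh

# Hypothetical coefficients for one neuron (not taken from the data).
example_nevs = nevs(beta_t1=0.5, beta_t2=-0.25, beta_coh=0.03125)
```

Because βcoh sits in the denominator, neurons with a weak coherence effect yield very large or unstable nEVS values, which is one reason to restrict the analysis to neurons significantly modulated by all three factors.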
Figures 22a (Monkey A) and 22b (Monkey T) plot frequency
histograms of the nEVS values for this sub-population of LIP neurons. The
means (±s.e.m) of these distributions are denoted with solid green lines. The
mean (±s.e.m) bEVS for the behavioral data collected with these neurons is
denoted with red lines. For Monkey A, the mean nEVS was 22.35% (s.e.m,
±4.08) coherence, while the mean bEVS was 15.35% (s.e.m, ±1.05)
coherence. For Monkey T the mean nEVS was 26.61% (s.e.m, ±6.47)
coherence and the mean bEVS was 18.71% (s.e.m, ±1.84) coherence. A
paired t-test of each monkey’s data reveals no significant difference
(Monkey A: p=0.3732; Monkey T: p=0.2543) between the bEVS and
nEVS means. Assuming we are justified in removing choice from our
model (see discussion Section 3.4.3), these results indicate that this LIP sub-
population integrates reward information and motion coherence in a
common currency.
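The paired comparison of nEVS and bEVS can be sketched with the Python standard library alone. This is an illustrative implementation of a paired t-test, assuming that each neuron's nEVS is paired with the bEVS from the behavioral data collected with that neuron; the function names are hypothetical, and the p-value is obtained by numerical integration of the t density rather than from a statistics package.

```python
import math
from statistics import mean, stdev

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value for Student's t, by Simpson integration of the pdf."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

def paired_t_test(x, y):
    """Return (t, df, p) for the paired differences x[i] - y[i]."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)
```

A call such as `paired_t_test(nevs_values, bevs_values)` would take two equal-length lists, one entry per neuron. Note that a non-significant result, as reported here, indicates a failure to detect a difference rather than a demonstration of equality.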
3.3.5 Population heterogeneity
While the preceding analysis focused on LIP average activity, it is
important to note that within these populations, there is a small degree of
heterogeneity. Here we will present some single cell examples from both
LIP populations, representing some of the unique activity profiles we have
observed. The following data are presented in identical manner to Figures
11, 12 and 13. However, in these figures we are presenting the results from
all four reward conditions (HH in red, LL in blue, HL in black and LH in
green) and both choices (into RF, solid lines; out of RF, dashed lines)
simultaneously.
Figure 23a plots a single cell example from Monkey A, with a response
profile nearly identical to the population average. Note that the
representation of absolute value (HH, red vs. LL, blue) that emerges soon
after the presentation of the reward cue diminishes as the motion cue period
begins and reemerges later in the trial. Also, note that the representation of
relative value (HH, red vs. HL, black; and LL, blue vs. LH, green) emerges
at the end of the reward cue period, grows through the beginning of the
motion cue period and then diminishes as the trial progresses. Figure 23b
plots a similar cell from Monkey T’s population. Note, in this neuron the
representation of absolute reward (difference between HH, red line and LL,
blue line) persists relatively equally through the entire trial for T1 choices
while fluctuating slightly for T2 choices.
[Figure 22a-b. LIP integrates reward information and motion coherence in a common currency. a-b Frequency histograms of the nEVS values resulting from Equation 10. The distribution means are denoted with the solid green lines (±sem, dashed). The mean bEVS, for the behavioral data collected with these neurons, is denoted with the solid red line (±sem, dashed). For Monkey A the mean nEVS was 22.35% (sem ±4.08) coherence and the mean bEVS was 15.35% (sem ±1.05) coherence. For Monkey T the mean nEVS was 26.61% (sem ±6.47) coherence and the mean bEVS was 18.71% (sem ±1.84) coherence. A paired t-test of each monkey’s data reveals no significant difference (Monkey A: p=0.3732; Monkey T: p=0.2543) between the bEVS and nEVS means. Panels: a, Monkey A; b, Monkey T. Axes: nEVS (% coherence) vs. count.]
Figure 23c depicts a neuron from Monkey A’s population which is not
modulated by choice. This neuron, however, like all others in the population,
was selected based on its choice predictive activity during the delayed
saccade task (presented above). Figure 23d plots this neuron’s activity
during the delayed saccade task. Here we are plotting this neuron’s mean
response during the delay epoch (radius) as a function of target position
(angle). This neuron fired at ~40 spikes per second when a target was
presented at 180° (where T1 was placed during the discrimination task), but
fired at ~10 spikes per second when the target was presented at 0° (where T2
was placed during the discrimination task). Thus, while this neuron was
apparently modulated by choice in the delayed saccade task, it was not in the
context of the discrimination task. Note, however, this neuron briefly
represented absolute value. During the delay epoch, its response to all
reward conditions is ~35 spikes per second, the same level of activity
observed in the delayed saccade task. This indicates the response in the
delayed saccade task was driven by the target’s value. Figure 23e depicts a
neuron from Monkey A’s population with an opposite pattern of activity.
This neuron represents neither absolute nor relative reward during the
motion discrimination task, but is instead modulated by choice.
[Figure 23a-b. Examples of single cells with responses nearly identical to their population average. a-b Mean LIP firing rate as a function of time, for the HH (red) and HL (black) reward conditions. Data are plotted separately for T1 (solid) and T2 (dashed) choices. In the left panels responses are aligned to the target onset, while in the right panels responses are aligned to saccade time. Panels: a, Monkey A 110804_2a; b, Monkey T 111507_2a. Axes: mean response (spikes/sec.) vs. time from target onset (ms) and time from saccade (ms). Epoch labels: Target, Reward, Motion, Early delay, Late delay.]
[Figure 23c-d. A single cell demonstrating no choice-related activity in the discrimination task, despite being well tuned in the delayed saccade task. c Mean LIP firing rate as a function of time, similar to 23a-b. Note the solid and dashed lines are overlapping, indicating this neuron did not represent the impending choice during the discrimination task. It did, however, have strong delay period activity in the delayed saccade task. d Mean response (radius) of this same neuron during the delay epoch of the delayed saccade task as a function of target position (angle). Panel title: Monkey A 120803_2a. Axes (c): mean response (spikes/sec.) vs. time from target onset (ms) and time from saccade (ms).]
3.4 Discussion
The primary goal of these experiments was to determine if and when
single LIP neurons represent relative reward value, absolute reward value
and motion coherence. We further endeavored to determine if LIP integrates
these factors in a common currency. We have modeled the firing rates of
single LIP neurons as a function of these factors and successively applied
this model across the duration of the experimental trial. This analysis has
revealed that LIP neurons simultaneously represent these factors, that this
representation is highly dynamic and might occur in the context of a
common currency.
[Figure 23e. A single cell demonstrating choice-related activity only. This neuron represents neither absolute nor relative reward during the motion discrimination task, but is instead modulated by choice during the late delay epoch. Panel title: Monkey A 12003_1b. Axes: mean response (spikes/sec.) vs. time from target onset (ms) and time from saccade (ms). Epoch labels: Target, Reward, Motion, Early delay, Late delay.]
3.4.1 The dynamic representation of absolute value, relative value,
coherence and choice.
The preceding analysis demonstrates that LIP neurons initially respond
to our task with a rapid representation of the absolute value of the target in
the response field. Within 200 ms, this representation is then augmented by
the value of the target outside the response field and LIP comes to
additionally represent the target’s relative value. Targets of greater absolute
and relative value are represented in LIP with greater firing rates.
Importantly, the representation of relative value is clearest at the start of the
motion epoch and is therefore ideally positioned to affect the integration of the
forthcoming motion information (discussed below). As the motion epoch
develops, however, the representations of both relative and absolute value
fade. As these value signals fade, LIP neurons become strongly modulated
by the monkeys’ forthcoming choice.
This representation of choice quickly dominates LIP responses and
persists through the time of the saccade. Within this representation of choice,
neurons are modulated by the specific coherence of the motion stimulus.
This modulation is brief and largely confined to the second half of the
motion epoch. As the motion epoch ends, the representation of relative
value is largely gone, but the representation of absolute value remains. The
delay epoch’s LIP activity represents the absolute value of the target in the
response field and predominantly represents choice, irrespective of the
coherence or relative value supporting it.
3.4.2 Relation to the integrator/accumulator model of decision making
These results are very consistent with the integrator model of decision
making presented by Mazurek and Shadlen (48). In this model, LIP
accumulates a decision variable up to a threshold. This decision variable can
be thought of as the motion coherence, or the weight of evidence supporting
each alternative. As discussed in Chapter 2, Gold and Shadlen (21) also
suggest that this decision variable is proportional to an option’s logLR and is
thus capable of incorporating factors additively. In the context of a two-
alternative, forced-choice, motion discrimination task, the difference
between opposing-direction motion-signals from the sensory cortex is the
posited physiological substrate of the decision variable. This difference
signal is accumulated by LIP neurons representing each alternative (located
in their RF) until a threshold is crossed. It is the crossing of this threshold
that is the presumptive representation of a choice.
Our model predicts that relative value biases choice by adjusting how
quickly the decision variable crosses the threshold. Relative value can
accomplish this by influencing one or more of three possible model
parameters: the accumulator’s initial state, the rate of accumulation or the
threshold’s height. Our physiological results support a model in which
relative value imposes an additive offset to the accumulator’s initial state,
without adjusting the rate of accumulation (Section 3.3.2.7, Figs. 18a-b).
This result is compatible with the behavioral analysis presented in Chapter 2,
demonstrating that relative value additively affects the probability of
choosing T1 without affecting psychophysical sensitivity (Section 2.3.3 and
Figs. 5a-b).
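To make the distinction between an initial-state offset and a change in accumulation rate concrete, the following toy simulation may help. It is a deliberately simplified sketch and not the model of Mazurek and Shadlen (48): a single decision variable drifts between two symmetric bounds (rather than two racing accumulators), and every parameter value (bound, noise, drift, offset, trial count) is hypothetical.

```python
import random

def simulate_choice(rng, drift, start=0.0, bound=1.0, noise=0.1,
                    max_steps=10_000):
    """One trial of a bounded accumulator; returns "T1" or "T2".

    drift: mean evidence per step (stands in for signed motion coherence);
    start: initial state (stands in for an additive relative-value offset).
    """
    x = start
    for _ in range(max_steps):
        x += drift + rng.gauss(0.0, noise)
        if x >= bound:
            return "T1"
        if x <= -bound:
            return "T2"
    # No bound reached within max_steps: choose by the sign of the variable.
    return "T1" if x >= 0 else "T2"

def p_t1(drift, start, n_trials=2000, seed=0):
    """Fraction of simulated trials ending in a T1 choice."""
    rng = random.Random(seed)
    wins = sum(simulate_choice(rng, drift, start) == "T1"
               for _ in range(n_trials))
    return wins / n_trials

# With zero drift, shifting only the starting point biases choices to T1.
neutral = p_t1(drift=0.0, start=0.0)
offset = p_t1(drift=0.0, start=0.5)
```

In this toy version the starting offset shifts the probability of a T1 choice while the drift term is left untouched, which is the qualitative pattern argued for above. It does not address the alternative in which value changes the threshold height.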
3.4.2.1 Relative value imposes an additive offset to the accumulator’s
initial state
Recall from the preceding analysis that LIP’s peak representation of
relative value is at the start of the motion epoch (Figures 15a-b, ~500 ms). If
relative value influences the accumulator’s initial state, then this is when we
would expect to see its effect. We can estimate how much of an offset the
HL reward condition introduces, in terms of neural activity, by solving
βt1(T1val) + βt2(T2val) from Equation 7, with the coefficients from the start
of the motion epoch. Based on the results depicted in Figures 16a and 16b,
we estimate that at the start of the motion epoch relative value introduces an
offset of approximately 4.3 (monkey A) and 3.4 (monkey T) spikes per
second. These estimates are very reasonable given the average firing rates
depicted in Figures 12a-b.
Hanks and colleagues (28) artificially introduced an additive offset to
the accumulator by microstimulating LIP neurons during a reaction time
version of the motion discrimination task. They report that LIP stimulation
introduces a slight choice bias, equal to a bEVS of ~2.85% coherence, with
the effects on reaction time equal to ~4.65% coherence. In contrast,
stimulation of MT in the same monkey performing the same task results in a
much larger choice bias (12). The authors then conclude that stimulation of
MT increases the evidence supporting a decision, while stimulation of LIP
offsets the accumulator. Intuitively, they argue that while the local effect of
stimulation on MT is small, it is also constant, and because this stimulation
effect is temporally accumulated in LIP, the total effect on choice is
substantial. In contrast, while the effect of stimulating LIP is also small and
constant, it does not benefit from temporal accumulation and therefore has
only a small effect on choice (see 24 for review). To quantify this intuition,
they first assume that stimulation in LIP introduces an offset of ~5 spikes per
second (sps). (This assumption is based on the previously observed effects
of MT stimulation in the same monkey; 12). They then model the effect of a
5 sps additive offset to LIP and demonstrate that the result provides a good
fit to the observed behavior.
In summary, they equate an additive offset of ~5 sps with a bias of
~2.85% coherence. We observe that relative reward adds an offset of ~4.3
(monkey A) and ~3.4 (monkey T) sps, and resulting biases are equal to
~15% (monkey A) and ~17% (monkey T) coherence. The limited effect of
microstimulation, compared to our explicit reward cue, is likely a result of
the stimulation’s limited capacity to affect the entirety of LIP’s decision-
related network.
In addition to an additive offset, it is also possible that relative value
drives the decision variable to cross the threshold earlier by increasing the
rate of accumulation for relatively high value targets. If this is so, then for
example, we would expect the coherence effects to be greater for a T1
choice in the HL conditions than in the LH conditions. We see no
significant effect of reward condition on the distributions of βcoh
frequencies within T1 or T2 choices for either monkey (Section 3.3.2.7, Figs.
18a-b). This indicates that the rate at which the decision variable is
accumulated is independent of relative value. In Chapter 2 we similarly
found that the monkeys’ psychophysical sensitivity to coherence is
independent of relative value (Section 2.3.3 and Figs. 5a-b).
The final possibility is that relative value affects the threshold’s height.
Increasing an option’s relative value should lower its threshold, causing the
decision variable to reach the threshold more quickly. However, estimating
the threshold height, or the time at which the decision variable crossed it, is
not possible within this experimental design. In these experiments, the
motion stimulus is presented for a fixed duration of 500 ms and the monkey
is forced to delay reporting his choice. Thus, we have no means of
determining when the decision variable crosses the threshold. If this task
were modified to emphasize reaction time and permit subjects to report
choices freely, we could take the reaction time as a surrogate for the boundary
crossing (54).
3.4.2.2 Coherence effects are consistent with the integrator model
Our coherence effects are both reasonably consistent with the integrator
model and similar to previous reports. The most comparable previous study
is Shadlen and Newsome (65), who reported that their range of coherence (0%-
51.2%) increased LIP activity by 2.7 spikes per second for T1 choices and
4.2 spikes per second for T2 choices. Roitman and Shadlen (54, fixed-
duration experiments) report slightly larger modulations for the same
coherence range: 13.2 and 5.2 spikes per second for T1 and T2 choices,
respectively. Based on our model of LIP activity (Equation 7; βcoh), we
find that our range of coherence (0%-48%) modulates LIP activity by 2.0
spikes per second (monkey A) and 0.78 spikes per second (monkey T)
across both choices.
The integrator model predicts that these coherence effects should be
largely confined to the end of the motion epoch and should, ideally, be
absent in the delay epoch activity of T1 choices, but present in the delay
epoch activity of T2 choices (48). Indeed, our results (Section 3.3.2.6, Figs.
15a-b, 16c and 17c) demonstrate that the effects of coherence are largely
confined to the second half of the motion epoch. Our results are slightly less
clear regarding delay period activity.
Mazurek and Shadlen (48) argue that the difference between T1 and T2
delay epoch activity is a result of the accumulation process. They argue that
for T1 choices, the observed accumulator (i.e. the LIP cell under study, with
T1 in the RF) must have crossed the decision threshold at some point during
the motion epoch. Thus, its delay activity should be pegged at the
threshold’s height. For T2 choices, however, the observed accumulator does
not reach the threshold (thus, T1 is not chosen). The delay period activity
after a T2 choice should then be pegged at some value below the threshold,
with this value a function of coherence. In both monkeys, however, we find
a very weak effect of coherence during the delay period. Similarly, weak
effects of coherence were reported during the early delay epoch by Roitman
and Shadlen (54, fixed-duration experiments) and Shadlen and Newsome
(65). For monkey T, the effects of coherence during the delay period are
larger for T1 choices than for T2 choices (Figure 17c), while for monkey A they
are very similar for T1 and T2 choices (Figure 16c).
3.4.2.3 Effects of absolute value are not predicted by the integrator model
The integrator model does not predict the strong and persistent effects
of absolute value found throughout our trial. The integrator model
emphasizes relative value’s effect on integration (21, 24) because relative
value, not absolute value, influences behavior. The representation of
absolute value through the motion and delay epoch, however, does not
exclude the integrator model.
During the motion epoch, the increased activity for trials of greater
absolute value would also serve as an offset to the accumulators. In the HH
condition, this offset would be equally applied to both targets and would
drive both options’ integrators closer to their respective thresholds. This
leads to two predictions.
First, because all responses should cross the threshold sooner, reaction
times should be shorter for all choices on HH trials. We are unable to test
this prediction directly because this is not a reaction time task. As a
surrogate, in Chapter 2 we measured saccade latency (Section 2.3.5; Figs.
9a-b) and found no systematic effects of absolute value. Conversely, the
second prediction is that on LL trials responses at all coherences should be
less likely to reach the threshold. Consequently, in the LL reward condition,
responses to both targets should be less likely than in the HH reward condition
and the monkey should be more likely to saccade to some other region of space.
This prediction is supported by the analysis of no-choice trials presented in
Chapter 2. We demonstrated that across all coherences, both monkeys were
less likely to choose either low value target in the LL reward condition
(Section 2.3.4; Figs. 7-8).
However, the persistent representation of absolute value during the
delay epoch is more difficult to reconcile with an integrator model. Like
coherence, the effects of both absolute and relative value should be absent
during the delay epoch for T1 choices, but present for T2 choices. While
both monkeys show effects of absolute value during the delay period, for
monkey A, this effect is larger for T2 choices than for T1 choices (Section
3.3.2.6; Figs. 16a-b). This difference, however, is not visible for monkey T
(Section 3.3.2.6; Figs.17a-b).
3.4.3 Does LIP integrate sensory and value information in a common
currency?
In the preceding analysis, we attempted to demonstrate that in the late
delay period LIP comes to integrate sensory and value information in a
common currency (Section 3.3.4; Figs. 20-22). As discussed above, our
behavioral analysis established a quantified equivalence between relative
reward value and motion coherence. Specifically, it revealed that a relative
increase in 1 unit of reward is, on average, behaviorally equivalent to an
increase of 14.7% (monkey A) and 16.3% (monkey T) motion coherence
(Section 2.3.1; Figs.3a-b). We posited that if information converging on LIP
is integrated in a scale dependent on its common influence on behavior, we
should be able to uncover equivalence between relative value and motion
coherence on the neural level comparable to the one observed behaviorally.
This predicts that the modulation of LIP from relative value should be equal
to the modulation produced by 14.7% coherence.
It is important to note that, at the behavioral level, coherence has a
greater effect on the probability of a T1 choice than relative reward does.
However, the results of our full physiological model (Section 3.3.2.5;
Equation 7) reveal that absolute value, relative value and choice all have a
greater influence on LIP firing rate than does coherence (Figs. 15a-b).
Consequently, attempts to find equivalence between bEVS and nEVS using
the factors in Equation 7 failed, even when we modeled LIP activity in terms
of the log odds of a spike. Upon dropping choice from our model, however,
the magnitude of βcoh grew as it accounted for variance previously captured
by choice. Only after this manipulation did our model generate βcoh values
greater than βt1 and βt2 values and become capable of generating nEVS
values similar to bEVS values. Given, however, how clearly LIP is
representing the monkey’s impending choice, this is not a justification for
dropping the choice factor from the model.
The intellectual justification for removing choice from the model rests
on an assumption that the delay epoch activity in LIP plays a causal role in
choice. In these experiments, we practically defined choice by which way
the monkey moved his eyes at the end of the delay period. Thus, before the
saccade, choice is a post-hoc factor and illogically included as a predictive
factor of LIP activity. Under this assumption, delay epoch activity
represents a continually evolving decision that is not truly a choice until the
time of the saccade.
If, however, choice is defined by when the decision variable crosses the
threshold, then the differential LIP activity during the late delay epoch
(following the threshold crossing) may be influenced by the choice itself. In
this case, it would be important to include choice as a predictor of LIP
activity. Under this assumption, a choice state has already been reached, so
delay epoch activity represents a consequence. Including choice as a
predictive factor is also logical if LIP reflects a decision process occurring in
another brain area, such as the frontal cortex or the superior colliculus, and
simply mirrors the choice developing elsewhere (33, 38). Our physiological
data, however, cannot reveal exactly when the threshold is crossed, or if LIP
simply reflects a decision process rather than implementing one. Still, it is
reasonable to assume that the threshold is crossed during the motion epoch.
If choice does indeed cause the differential T1 and T2 delay epoch
activity, then it must be included in the model and our estimate of nEVS
(Equation 10) would be fundamentally misleading. Rather than expressing
the influence of relative value in terms of coherence, it would express the
influence of relative value largely in terms of choice, and thus there would be
no basis to compare nEVS and bEVS. Under this assumption, delay epoch
activity represents only the impending choice and its absolute value.
This causality conundrum can be momentarily side-stepped by focusing
on the representation of relative value at the start of the motion epoch and
the representation of coherence at the end of the motion epoch. Recall that
our core common-currency prediction is that the modulation resulting from
an increase in relative value should be equal to the modulation produced by
14.7% coherence. If we assume the behaviorally relevant representation of
relative value is at the start of the motion epoch we can, using Equation 7,
calculate that relative value produces an increase of 4.35 spikes per second
(monkey A). Using Equation 7 and controlling for all other factors, we can
further determine what coherence produces a similar increase in firing rate.
Taking the peak value βcoh during the motion epoch (2.0, monkey A), we
can determine that a coherence of approximately 104% would, on average,
result in a modulation of LIP equivalent to this increase in relative value.
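The arithmetic behind the ~104% figure can be reproduced as follows. This sketch rests on an assumption that should be stated plainly: the 2.0 spikes per second attributed to βcoh is taken to be the modulation across the full 0%-48% coherence range (as described in Section 3.4.2.2), and the effect of coherence on firing rate is taken to be linear. The function name is hypothetical.

```python
def equivalent_coherence(offset_sps, coh_modulation_sps, coh_range_pct):
    """Coherence (%) whose linear effect on firing rate matches an offset.

    offset_sps: firing-rate increase attributed to relative value;
    coh_modulation_sps: firing-rate change across the tested coherence range;
    coh_range_pct: width of that coherence range in percent (assumed 48).
    """
    sps_per_pct = coh_modulation_sps / coh_range_pct
    return offset_sps / sps_per_pct

# Monkey A: 4.35 sps offset; 2.0 sps across the 0%-48% coherence range.
monkey_a = equivalent_coherence(4.35, 2.0, 48.0)  # approximately 104.4
```

Under these assumptions the required coherence exceeds 100%, which is physically unattainable and far from the 14.7% predicted from behavior.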
3.4.4 Representation of value and probability of choice in LIP
Recently, two similar studies of value signals in LIP came to two very
different conclusions as to if and how LIP represented value. A study by
Dorris and Glimcher (13) concludes that LIP represents a pure relative value
signal, while Sugrue and Newsome (70) conclude that LIP represents local
relative value in a manner indistinguishable from the local probability of
choice. Our results indicate that LIP carries a highly dynamic representation of
value in which an initial representation of absolute value is transformed to a
representation of relative value at the start of the motion epoch and then, as
choice related activity develops, returns to representing absolute value and
choice. Our results are difficult to fully reconcile with either of these
previous studies.
Dorris and Glimcher trained monkeys in a free-choice paradigm in
which monkeys chose between a “safe” target, consistently delivering a
small reward and an alternative, “risky” target, probabilistically delivering a
large reward. In response to changes in this probability, monkeys adjusted
their frequency of selecting the risky target to an optimal level (in terms of a
Nash equilibrium) in which each option’s “subjective desirability” was
equalized across a block of trials. Subjective desirability is a measure of an
option’s value multiplied by the probability that it will be realized.
Dorris and Glimcher first reproduce an earlier finding (54) that LIP
neurons encode an option’s relative value during instructed saccades, until
LIP begins to encode the impending saccade, at which point the
representation of relative value disappears. Then, to demonstrate that LIP is
representing subjective desirability, they place the risky target in the RF of
an LIP neuron and show that its activity was invariant, despite the monkey’s
fluctuating probability of choosing the option in the RF. They argue that
subjective desirability is behaviorally constant and that LIP activity is
constant, thus, the latter represents the former. Additionally, they double the
magnitude of the reward for both options and show that LIP does not
respond to this increase. Based on these two observations, they conclude
that LIP essentially represents only the relative value of the target in the RF
regardless of the probability of choosing it.
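As a rough illustration of the logic above (not Dorris and Glimcher's own analysis), the following Python sketch computes subjective desirability as reward magnitude times probability, and shows that doubling both reward magnitudes leaves a fractional relative value unchanged. The fractional definition of relative value and all numbers are assumptions chosen for illustration.

```python
def subjective_desirability(magnitude, probability):
    """Reward magnitude weighted by the probability that it is delivered."""
    return magnitude * probability


def relative_value(in_rf, out_rf):
    """Fractional value of the RF option (an assumed definition)."""
    return in_rf / (in_rf + out_rf)


# Hypothetical numbers, not taken from the study.
safe = subjective_desirability(magnitude=1.0, probability=1.0)
risky = subjective_desirability(magnitude=2.0, probability=0.5)

# Doubling both reward magnitudes doubles each desirability ...
safe_x2 = subjective_desirability(magnitude=2.0, probability=1.0)
risky_x2 = subjective_desirability(magnitude=4.0, probability=0.5)

# ... but leaves the relative value of the risky (RF) option unchanged.
rel_before = relative_value(risky, safe)
rel_after = relative_value(risky_x2, safe_x2)
```

Under these assumptions, a neuron encoding only relative value would not change its response when both rewards are doubled, whereas a neuron encoding absolute value would.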
Sugrue and Newsome (70) trained monkeys on a free choice task in
which the two alternative targets are rewarded with probabilities that change
between blocks. They show that monkeys similarly adjust their probability
of choosing a given target to “match” the fraction of rewards recently
experienced from that option. Because the overall probability of reward is
constant, one option always has a greater relative value than the other.
They demonstrate, in contrast to Dorris and Glimcher, that LIP clearly
represents the monkey’s impending choice towards or away from the RF.
Further, within these representations of choice LIP is finely modulated by
the monkey’s locally calculated subjective estimate of the target’s relative
value. This graded representation of relative value is present in the delay
epoch activity of both T1 and T2 choices and persists through the time of the
saccade. Based on this graded representation, on the task’s logic, which
requires a spatial remapping of value on each trial, and on their behavioral
model in which local, relative value directly generates the probability of
choosing a target, they conclude that LIP largely represents the probability
of choice. Sugrue et al. (73) further argue that Dorris and Glimcher fail to
find LIP activity representing local value and hence the local probability of
choice largely because they only analyze T1 choice, thus emphasizing trials
on which this option had a high local value.
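The matching behavior described above can be sketched as a leaky integration of recent rewards. This is a minimal illustration, assuming an exponential filter with a hypothetical time constant `tau` (in trials) and invented reward histories; it is not the fitted behavioral model of Sugrue and Newsome.

```python
import math


def local_income(rewards, tau):
    """Exponentially weighted sum of past rewards (oldest trial first)."""
    decay = math.exp(-1.0 / tau)
    income = 0.0
    for reward in rewards:
        income = decay * income + reward
    return income


def p_choose_t1(rewards_t1, rewards_t2, tau=5.0):
    """Matching rule: choice probability equals the local fractional income."""
    income_t1 = local_income(rewards_t1, tau)
    income_t2 = local_income(rewards_t2, tau)
    total = income_t1 + income_t2
    return 0.5 if total == 0 else income_t1 / total


# Hypothetical reward histories (1 = rewarded, 0 = unrewarded).
history_t1 = [1, 0, 1, 1, 0, 1]
history_t2 = [0, 1, 0, 0, 0, 0]
p_t1 = p_choose_t1(history_t1, history_t2)
```

In a model of this form, local relative value and local probability of choice are the same quantity, which is why the two are hard to dissociate in a matching task.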
Consistent with Dorris and Glimcher, we find a strong relative value
signal that fades as the saccade is cued. While we do not explicitly instruct a
saccade, as Dorris and Glimcher do, our motion stimulus is an instructing
cue. In contrast to Dorris and Glimcher, however, we find a very strong and
consistent representation of absolute value. Their failure to find an absolute
value signal could result from LIP normalizing its representation of absolute
value within a block. In our experiments, trials of different absolute value
are randomly interleaved, potentially encouraging a more dynamic
representation of value.
In contrast to Sugrue et al., our results fail to support a role for LIP in
representing local probability of choice either across or within reward
conditions. First, recall from Chapter 2 that absolute value has no effect on
the monkey’s probability of choosing T1. Thus any representation of
absolute value in LIP undermines a representation of probability of choice
across reward conditions. Consider, for example, the monkey’s probability
of choosing T1 at 0% coherence, which is equal for both HH and LL trials.
Yet because of LIP’s strong representation of absolute value, its activity will
clearly differentiate these conditions. Second, the probability of choice is
largely a function of the motion coherence, which only briefly influences
LIP activity at the end of the motion epoch.
3.5 Summary
These experiments demonstrate that single LIP neurons simultaneously
represent relative value, absolute value and motion coherence. By modeling
the firing rates of single LIP neurons as a function of these factors and
successively applying this model across the duration of the experimental
trial, we demonstrate that this representation is highly dynamic.
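A minimal sketch of the windowed regression approach follows: firing rate in each time window is regressed on absolute value, relative value and signed coherence. The predictor coding, window names and coefficients below are hypothetical, and the data are synthetic and noise-free; the regression model actually used is the one defined earlier in this chapter and may differ.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(design, rates):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, rates)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Assumed coding: (absolute value, relative value of the RF target).
CONDITIONS = {"HH": (1.0, 0.0), "LL": (-1.0, 0.0),
              "HL": (0.0, 1.0), "LH": (0.0, -1.0)}
COHERENCES = [-0.5, -0.25, 0.0, 0.25, 0.5]  # signed, as a fraction

# One row per trial type: [intercept, absolute, relative, coherence].
design = [[1.0, absolute, relative, coh]
          for absolute, relative in CONDITIONS.values()
          for coh in COHERENCES]

# Hypothetical coefficients per time window, used to build synthetic rates.
TRUE_BETAS = {
    "reward epoch": [20.0, 4.0, 1.0, 0.0],
    "early motion": [25.0, 1.0, 5.0, 0.0],
    "late motion": [30.0, 1.0, 1.0, 6.0],
}

# Fit the same model independently in each window.
fitted = {}
for window, betas in TRUE_BETAS.items():
    rates = [sum(b * x for b, x in zip(betas, row)) for row in design]
    fitted[window] = least_squares(design, rates)
```

Tracking how each fitted coefficient changes from window to window is what reveals the time course of each factor's influence.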
LIP neurons initially respond with a rapid representation of absolute
value, which is then augmented by the value of the target outside the
response field and comes to represent the target’s relative value. Targets of
greater absolute and relative value are represented with greater firing rates.
Relative value is strongly represented at the start of the motion epoch. As
the motion epoch develops, the representation of both relative and absolute
value fades and LIP neurons become strongly modulated by the monkey’s
forthcoming choice.
This representation of choice quickly dominates LIP response and is
modulated by the specific coherence of the motion stimulus. This
modulation is brief and largely confined to the second half of the motion
epoch. As the motion epoch ends, the representation of relative value is
largely gone, but the representation of absolute value remains. Throughout,
the delay epoch’s LIP activity represents the absolute value of the target in
the response field and predominantly represents choice, irrespective of the
coherence or relative value supporting it.
These results are very consistent with the integrator model of decision
making presented by Mazurek and Shadlen (48). Relative value’s
prominence at the start of the motion epoch indicates it introduces an
additive offset to the integration of the forthcoming motion information.
Our physiological and behavioral results support a model in which relative
value adjusts the accumulator’s initial state, without adjusting the rate of
accumulation. The offset imposed by relative value is similar in magnitude
to the offset imposed by Hanks and Shadlen (28) with microstimulation.
However, we observe greater choice bias.
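To make the idea of shifting the accumulator's initial state concrete, the sketch below simulates a bounded random walk whose starting point can be offset while the rate of accumulation is left untouched. All parameters (drift gain, noise, bound, offset) are hypothetical, and the bounded accumulator is a simplification of the integrator model, not a fit to the data reported here.

```python
import random


def simulate_choice(coherence, start_offset, rng, drift_gain=0.05,
                    noise_sd=0.1, bound=1.0, max_steps=5000):
    """Return True if the accumulator reaches the upper (T1) bound first."""
    x = start_offset  # relative value shifts the starting point only
    for _ in range(max_steps):
        # The drift depends on coherence alone; it is unaffected by the offset.
        x += drift_gain * coherence + rng.gauss(0.0, noise_sd)
        if x >= bound:
            return True
        if x <= -bound:
            return False
    return x > 0  # fallback if neither bound was reached


def prob_t1(coherence, start_offset, n_trials=2000, seed=0):
    """Fraction of simulated trials ending in a T1 choice."""
    rng = random.Random(seed)
    wins = sum(simulate_choice(coherence, start_offset, rng)
               for _ in range(n_trials))
    return wins / n_trials


p_neutral = prob_t1(coherence=0.0, start_offset=0.0)   # e.g. HH or LL
p_biased = prob_t1(coherence=0.0, start_offset=0.3)    # e.g. HL
p_motion = prob_t1(coherence=0.5, start_offset=0.0)    # strong T1 motion
```

In this sketch the offset biases choices toward T1 even at 0% coherence, while sensitivity to coherence is set by `drift_gain` alone.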
We attempted to demonstrate that in the late delay period, LIP
represents sensory and value information in a common currency. By
common currency, we mean that information converging on LIP is
integrated in a scale depending on its common influence on behavior. This
model allowed us to directly compare our distribution of bEVS values to a
distribution of nEVS values and to demonstrate that their means do not
significantly differ. Our model, however, assumes choice does not causally
affect delay period activity. The validity of this assumption is a subject
demanding further attention and discussion and until it is verified, we must
consider these results specious.
In total, these results support LIP’s role in decisions requiring the
integration of sensory and value information. With some exceptions, LIP
simultaneously represented sensory and value information in a manner
similar to how it represents these factors alone. As previously mentioned,
the DLPFC is also independently modulated by sensory and value
information. Additionally, when studied independently, reward value and
motion coherence similarly modulated DLPFC and LIP. This result
supports the proposal that LIP is part of a decision-related network spanning
several cortical areas including the DLPFC. This proposal predicts that
neurons in the DLPFC should respond to our task in a manner similar to how
LIP neurons respond. In Chapter 4, we will present results from a
preliminary study of DLPFC indicating that this is unlikely to be the case.
Chapter 4
4.1 Introduction
Chapter 3 demonstrated that neurons in area LIP, previously shown to
represent sensory or value information in independent sets of experiments, in
fact, integrate this information at both the single unit and population levels.
Regions of the PFC, particularly the dorsolateral PFC (DLPFC), are also
independently modulated by motion coherence (38) and value information (1,
40-41, 45, 53, 73). Additionally, when studied independently, these factors,
and others, similarly modulated DLPFC and LIP (8, 38, 45). In this chapter
we will present preliminary physiological data recorded from the DLPFC of
one monkey engaged in our motion discrimination task with multiple reward
contingencies. We will begin to determine whether LIP and DLPFC
continue to respond similarly to sensory and value information when they
are presented simultaneously and competitively in a single behavioral
paradigm.
The PFC is believed to play a general role in a wide range of behaviors
requiring dynamic cognitive control (for review see 17, 49) and working
memory (16, 25), particularly when multiple sources of information guide
action (17, 49). PFC neurons have also been shown to encode behaviorally
relevant task categories (15), task specific rules (72), as well as specific
combinations of value and action (73). The PFC has been extensively
studied in the context of competitive games, in which signals related to
choices, their outcomes and their conjunction have been documented (3, 43,
63, 68). PFC neurons are, therefore, likely active during our task, in which
two different, often competitive, factors must be temporally integrated to
produce optimal behavior. We can begin to understand how our sensory and
value factors might jointly affect PFC neurons by first considering how they
each influence its activity alone. Of particular relevance are two studies
demonstrating that neurons in the PFC are independently modulated by
reward value and motion coherence.
Leon and Shadlen (45) trained monkeys on a memory-guided,
instructed saccade task, in which both the saccade location and reward
magnitude (small, 1x; or large 2x) were cued. They report that, overall,
neurons in the DLPFC responded with greater firing rates to saccades
associated with larger rewards. Some of these value responses, however,
emerged only after the saccade was cued, while others emerged independent
of the cue’s timing or location. Using a similar memory-guided delayed
saccade task, Kim and Shadlen (38) recorded DLPFC responses to a simple
motion discrimination task. They report that, while some neurons only
predict the monkey’s upcoming choice during the delay epoch, most begin
predicting choice during the motion epoch, and are modulated by the
coherence of the visual stimulus. This coherence dependent modulation was
reported as qualitatively and quantitatively similar in magnitude and time
course to those seen in LIP. Additionally, DLPFC and LIP also have
qualitatively and quantitatively similar responses to a simple delayed saccade
task (8).
These and other decision related studies of the PFC (3, 38, 43-44, 55,
63, 68, 73) as well as the anatomical interconnectivity between PFC, LIP
(47) and other decision-related areas (33), have led to the proposition that the
DLPFC and LIP might constitute a single, distributed, decision-related
network (38, 62). If so, neurons isolated in DLPFC using our delayed
saccade task should be functionally related to those isolated in LIP by
similar criteria.
In contrast to this expectation, we observed remarkable differences
between the physiological responses of PFC and LIP neurons in this animal.
Even though our data set is small (26 neurons—see Methods) and obtained
from only one animal, the differences were sufficiently striking that they
seemed worth documenting in this thesis.
4.2 Methods
Monkey A, one of the two adult rhesus monkeys that participated in the
behavioral and electrophysiological experiments presented in Chapters 2 and
3, was used in the following physiological experiment. Prior to physiological
recordings the monkey underwent an additional surgical procedure to place a
recording chamber above the principal sulcus. All other methods were
identical to those described in Chapter 3.
4.2.1 Physiological Recordings
PFC was identified by a combination of stereotactic location, regional
physiological activity and anatomical magnetic resonance imaging. Figures
24a and 24b are two representative MRI images used to target electrode
penetrations and identify the recording sites. Figure 24a is from a series of
image planes normal to the bore of the recording cylinder. The cylinder’s
“footprint” is denoted by the large green circle, while the two smaller green
circles denote the location of two burr holes used in recording (see Methods,
Chapter 3). The purple and red lines respectively denote the principal sulcus
(PS) and arcuate sulcus (AS). Figure 24b is a coronal image, showing the
saline-filled recording cylinder and reference grid, centered over the
principal sulcus (purple). The approximate trajectory of an electrode passing
through the burr hole is depicted with a dotted cyan line. Single neurons
were isolated and their activity recorded with methods and materials
identical to those presented in Chapter 3.
4.3 Results
4.3.1 Cell selection and delayed saccade task
To select recording sites in the PFC we used the same delayed saccade task
with multiple reward contingencies described in Chapter 2. Recall that, in
this task, a single target can appear at one of six locations (0°, 60°, 120°,
180°, 240° and 300°) with one of two reward values (high and low).
Figure 24a-b. Anatomical magnetic resonance imaging of the PFC recording site. a From a series of image planes normal to the bore of the recording cylinder. The cylinder’s “footprint” is denoted by the large green circle, while the two smaller green circles denote the location of two burr holes used in recording. The purple and red lines respectively denote the principal sulcus (PS) and arcuate sulcus (AS). b A coronal image, showing the saline-filled recording cylinder and reference grid, centered over the principal sulcus (purple). The approximate trajectory of an electrode passing through the burr hole is depicted with a dotted cyan line.
Using this task we identified neurons having persistent, delay-period activity
related to either the target’s location or its value. Because we were often
able to isolate multiple neurons on single or multiple electrodes we collected
26 single units over 12 experimental sessions from the right hemisphere of
monkey A.
During every session we identified at least one neuron responsive to
target location during the delayed saccade task. We often collected
additional single units, whose responses on the delayed saccade task were, at
the time, less clear. An off-line, multi-way analysis of variance—with
factors for target location, target value and their interaction—on each
neuron’s mean delay epoch activity revealed that 20 neurons had
significant (p<0.05) effects. Expressed as a fraction of all 26 units, 69% (18)
were modulated by target location, 23% (6) by target value and 15% (4) by
both target location and target value. Only 11% (3) showed a significant
interaction (p<0.05) between target location and value. The remaining six
units, while not significantly modulated during the delayed saccade task,
were in fact modulated significantly during at least one epoch in the
direction discrimination task
(multi-way ANOVA with factors for reward contingency and target location,
run on the mean firing rate in each task epoch, p<0.05). Thus, all 26 units
were included in all subsequent analyses.
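The unit counts above can be cross-checked with a little arithmetic. The sketch below assumes that the 20 units with significant effects are those with a main effect of location or value, which is what inclusion-exclusion on the reported counts gives; percentages are taken over all 26 units and truncated to whole numbers.

```python
N_UNITS = 26

# Counts reported for the delayed saccade task.
location = 18      # main effect of target location
value = 6          # main effect of target value
both = 4           # main effects of both location and value
interaction = 3    # location x value interaction

# Inclusion-exclusion: units with at least one main effect.
any_main_effect = location + value - both
not_significant = N_UNITS - any_main_effect


def percent_of_all(count):
    """Percentage of all recorded units, truncated to a whole number."""
    return int(100 * count / N_UNITS)


percentages = {name: percent_of_all(count) for name, count in
               [("location", location), ("value", value),
                ("both", both), ("interaction", interaction)]}
```

The six units left over are the ones that were significant only in the direction discrimination task.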
In contrast to LIP neurons, which had highly localized response fields
within the contralateral visual hemifield, neurons in the PFC tended to be
less selective, responding similarly to targets positioned anywhere within the
contralateral hemifield. Figure 25a plots mean neural response (radius) to
the six target locations (angle) during the delay epoch of the delayed saccade
task, for an example PFC neuron. For comparison, Figure 25b similarly
plots an LIP neuron’s response. In Figure 25a the red points and lines are
responses when the target was high value, while the blue is when the target
was low value. Note that this PFC neuron responds more to targets at 120°,
180° and 240° (targets within the contralateral visual hemifield) than to
those at 60°, 0° and 300°. In contrast, the LIP neuron (only one reward
condition) responds differentially to only one target position, 180°.
Figure 25a-b. In contrast to LIP, neurons in the PFC tended to be less selective, responding similarly to targets positioned anywhere within the contralateral hemifield. a Plots the mean neural response, in spikes/sec (radius), to the six target locations (angle) during the delay epoch of the delayed saccade task, for an example PFC neuron. The red points and lines are responses from when the target was high value, while the blue is from when the target was low value. b Similarly plots an LIP neuron’s response. [Polar axes: angle 0° to 330° in 30° steps; radius in spikes/sec.]
4.3.2 Population Response
Because we often collected multiple single units in a single
experimental session, we were usually unable to fully optimize the target
location for each neuron. As noted above, however, these neurons
responded well to targets in the contralateral hemifield. Thus, in the context
of our direction discrimination task, “T1” in this chapter will always refer to
the target in the visual hemifield contralateral to the recording site.
Figure 26 plots the mean firing rate of all 26 DLPFC neurons as a
function of time, similarly to Figures 11, 12 and 13, for all completed trials
in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions.
Within these reward conditions results are plotted for trials in which the
monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target,
dashed lines). Note that while this population is clearly modulated by
reward contingencies (colored lines), this modulation is not systematic (there
is no logical ordering of the four reward conditions) and is thus very
different from the LIP data. Only two systematic trends are visible in these
data. First is the gradual buildup of activity, which peaks approximately
midway through the motion period and declines throughout the delay period.
Second is the gradual separation of responses preceding saccades to T1 and
T2 (decision-related activity). As the forthcoming examples will
demonstrate, our population of PFC neurons is, in fact, extremely
heterogeneous, and the average responses depicted in Figure 26 are notable
primarily for how poorly they represent the responses of individual PFC
neurons. This is notably distinct from our LIP data, for which the
population histograms were reasonably representative of most single
neurons.
4.3.3 Population heterogeneity
To indicate the heterogeneity in our DLPFC population we present data
from four individual neurons from this population, each exemplifying an
extremely specific response profile. For each neuron, and each task epoch,
we have performed a multi-way analysis of variance, with factors for the
reward conditions (HH, LL, HL and LH), the choice (T1 and T2) and signed
motion coherence. We then performed a posthoc, pairwise comparison test
to determine which factors significantly affected firing rate.
Figure 26. Average DLPFC response (n=26). Plots the mean firing rate of all 26 DLPFC neurons as a function of time, for all completed trials in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). Note that while this population is clearly modulated by reward contingencies (colored lines), this modulation is not systematic (there is no logical ordering of the four reward conditions) and is thus very different from the LIP data, for which the population histograms were reasonably representative of most single neurons. [Axes: time from target onset (ms) and time from saccade (ms) versus mean response (spikes/sec). Epochs: target, reward, motion, early delay, late delay. Monkey A.]
Figure 27 plots the mean response of one PFC neuron in a format
similar to Figure 26. In the reward and motion epochs, this neuron was
significantly (p=0; both epochs) modulated by the reward condition; firing
rates were statistically indistinguishable for LH (green) and HH (red)
conditions, but were greater for both of these conditions than for the LL
(blue) and HL (black) conditions. In the delay period this neuron was
significantly (p=0) but weakly modulated by choice, firing more for T1
choices (solid lines) than for T2 choices (dashed), but was not modulated by
reward condition (p=0.3287). If one wished to summarize the selectivity of
this neuron in words (perhaps an inadvisable endeavor), one might say that
it appears to respond best when a high value target is presented in the
ipsilateral visual field, mostly during the reward cue period.
Figure 27. A single DLPFC neuron. Plots the mean firing rate as a function of time, for all completed trials in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). In the reward and motion epochs, this neuron was significantly modulated by the reward condition. In the delay period this neuron was weakly modulated by choice, firing more for T1 choices (solid lines) than for T2 choices (dashed), but was not modulated by reward condition. [Axes: time from target onset (ms) and time from saccade (ms) versus mean response (spikes/sec). Epochs: target, reward, motion, early delay, late delay. Monkey A.]
Another example neuron, plotted in Figure 28, is also modulated by
reward condition in the reward cue and motion epochs, but it changes its
preferred reward conditions at the transition between these epochs. In the
reward cue epoch, this neuron fires significantly (p=0) more for LH (green),
equally for the HH (red) and LL (blue) conditions, and least for the HL
(black) condition. However, in the motion epoch this pattern switches, and
the neuron fires significantly (p=0) more (and equally) for the HH (red) and
HL (black) conditions as compared to the LL (blue) and LH (green)
conditions. In the delay period this neuron is significantly (p=0) modulated
by choice, firing more for T1 (solid) than T2 (dashed) choices, but is not
significantly modulated by reward conditions (p=0.2397).
Some neurons had responses that were highly specific to particular
combinations of reward condition, choice and epoch. Figure 29 depicts a
neuron responding almost exclusively during the motion epoch to LH
(green, p=0) trials that result in a T1 choice (solid, p=0.0001). This is not a
subtle effect: the firing rate peaks at over 20 spikes/sec for the responsive
condition, but fails to exceed 5 spikes/sec for all others. During the motion
period, this neuron also responded weakly to the HH (red) and LL (blue)
conditions, but not at all to HL (black) conditions. Although, in the motion
epoch, this neuron was modulated by choice for the LH (green), HH (red)
and LL (blue) conditions, it was not significantly (p=0.2247) modulated by
choice in the delay period for any condition.
Figure 30 plots data from a similarly specific neuron. This neuron
was significantly (p=0) modulated by reward condition in the reward cue
epoch, preferring HL (black), HH (red), LL (blue) and LH (green)
conditions in that order, which is consistent with a representation of relative
value as observed in many LIP neurons. In the motion epoch, however, this
neuron’s response becomes nearly five times larger for one reward condition
(HL, black traces) than for all the others. The selectivity of this neuron was
not quite as impressive as the neuron in Figure 29 since it responded well to
both choices in the HL condition. Similar to the neuron in Figure 29, this
neuron was significantly (p=0) modulated by choice in the motion cue
period but not in the delay period.
Figure 28. A single DLPFC neuron. Plots the mean firing rate as a function of time, for all completed trials in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron is modulated by reward condition in the reward cue and motion epochs, but it changes its preferred reward conditions at the transition between these epochs. In the delay period this neuron is modulated by choice, firing more for T1 (solid) than T2 (dashed) choices, but is not modulated by reward conditions. [Axes: time from target onset (ms) and time from saccade (ms) versus mean response (spikes/sec). Epochs: target, reward, motion, early delay, late delay. Monkey A.]
4.3.4 Discussion
While the preceding experiments and analysis are preliminary and in no
way conclusive, they suggest that DLPFC neurons represent sensory and
value information in an extremely heterogeneous manner. In their
heterogeneity generally, and the individual responses specifically, these
neurons appear to be fundamentally different than LIP neurons. LIP neurons,
selected using the same criterion, represent sensory and value information in
a systematic and consistent manner, commensurate with an accumulator
model of sensory integration. In contrast, DLPFC neurons can demonstrate
extreme specificity for combinations of reward condition, task epoch and
choice (e.g. Fig. 29 and Fig. 30) indicating that, while DLPFC likely plays
an important role in generating behavior, that role is fundamentally different
than LIP’s.
Figure 29. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch. Plots the mean firing rate as a function of time, for all completed trials in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron responds almost exclusively during the motion epoch to LH trials that result in a T1 choice. The firing rate peaks at over 20 spikes/sec for the responsive condition, but fails to exceed 5 spikes/sec for all others. [Axes: time from target onset (ms) and time from saccade (ms) versus mean response (spikes/sec). Epochs: target, reward, motion, early delay, late delay. Monkey A.]
Figure 30. A single DLPFC neuron responding specifically to particular combinations of reward condition, choice and epoch. Plots the mean firing rate as a function of time, for all completed trials in the HH (red) and LL (blue), HL (black) and LH (green) reward conditions. Within these reward conditions results are plotted for trials in which the monkey chose T1 (contralateral target, solid lines) and T2 (ipsilateral target, dashed lines). This neuron was significantly modulated by reward condition in the reward cue epoch, preferring HL (black), HH (red), LL (blue) and LH (green) conditions in that order, which is consistent with a representation of relative value as observed in many LIP neurons. In the motion epoch, however, this neuron’s response becomes nearly five times larger for one reward condition (HL, black traces) than for all the others. [Axes: time from target onset (ms) and time from saccade (ms) versus mean response (spikes/sec). Epochs: target, reward, motion, early delay, late delay. Monkey A.]
Our PFC data, though modest in number, are not compatible with a
model of decision making in which a decision variable evolves
simultaneously in DLPFC and LIP. Our results are also incompatible with a
model in which either LIP or DLPFC reflects the evolution of a decision in
the other area. If either of these scenarios were accurate, we would expect
greater similarity between these two areas during our task. Our data,
however, are compatible with previous studies demonstrating that DLPFC
neurons represent highly specific combinations of task relevant categories,
or rules, and choices (15, 72-73). The modulations we observe in DLPFC in
response to reward contingencies (HH, LL, HL or LH), appear more related
to which of the four reward conditions was present, than to either the
average reward or the value of the target in the contralateral hemifield.
Neurons that singularly represent a relative reward condition (HL or LH)
during the motion epoch, like those depicted in Figures 29 and 30, could
serve as the source of the additive bias we observed in LIP.
Additionally, while behavioral choice modulated the activity of some
DLPFC neurons, most were never systematically modulated by coherence
like LIP neurons. This was surprising given that we selected our DLPFC
population similarly to both our LIP populations and previous studies of
coherence in DLPFC by Kim and Shadlen (38). Two possibilities might
account for this discrepancy. First, while we used a delayed saccade task to
select our neurons, Kim and Shadlen used a memory guided delayed saccade
task. Given DLPFC’s role in working memory (16-17, 25) it is possible that
we selected a different population in the DLPFC.
Second, it might result from DLPFC neurons representing a trial’s
category or rule, rather than a decision process. In a simple motion
discrimination task, the motion coherence is the only behaviorally relevant
factor and would therefore be the dimension along which trials would be
categorized. However, in our task, the reward condition is likely the
dimension along which trials are categorized, particularly given that the
effect of motion coherence is identical across all reward conditions (Chapter 3).
Chapter 5
5.1 Summary and conclusions
This study’s central finding is that monkeys integrate sensory and value
information at the behavioral and neuronal level. At the behavioral level we
find that monkeys’ choices are systematically biased towards options of
greater relative value. Changes in absolute value have no significant effect
on performance in the motion discrimination task. We quantified the bias's
magnitude in terms of the motion coherence, demonstrated it is nearly
optimal, and that excursions from optimality result in a consistent over-bias,
despite which, 98% of the maximum available rewards are still harvested.
We speculate that our monkeys exhibit over-biases because of positive
utility functions. Importantly, we show that this bias is independent of
psychophysical sensitivity, and is implemented behaviorally as an additive
factor.
In contrast to our behavioral results, we find that both relative and
absolute value modulate single LIP neurons, and that the representation of
sensory and value information is dynamic but systematic. Absolute value
significantly modulated firing rate primarily when the rewards are first cued
and, again, later during the delay period, when only absolute value and
choice are represented. Additionally, when options of greater value are
presented in an LIP neuron’s RF it results in greater firing rates.
Relative value has its clearest representation immediately preceding
the onset of the motion stimulus. We argue this relative value signal is well
situated to bias decisions by adjusting the level where sensory evidence
begins accumulating towards a threshold. Importantly, we find that the
effect of motion coherence on LIP firing rate is independent of absolute and
relative value. This indicates that, consistent with behavior, relative value
introduces an additive offset to the accumulation of sensory evidence. These
and other results support the integrator model of decision making presented
by Mazurek and Shadlen (48).
Our results do not support models of LIP emphasizing the
representation of local probability of choice (70) or pure, relative subjective-
desirability (13). If LIP represented the local probability of choice then LIP
activity should be equal for all reward conditions on trials when the
monkeys are equally likely to choose T1. The presence, however, of an
absolute value signal results in unequal firing rates representing an equal
probability of choice (e.g. 0% coherence in the HH and LL conditions).
Additionally, if LIP represented local probability of choice, we would expect
motion coherence, which is correlated with probability of choice, to
significantly modulate delay period activity. We find, however, that delay
period activity is significantly modulated only by choice and absolute value. If
LIP purely represented the relative subjective desirability of an option then
absolute value should not be represented at all.
Our attempts to determine if LIP integrates sensory and value
information in a common currency produced uncertain results. While we
presented a model of LIP activity (Equation 9) permitting us to compare the
behavioral and neuronal equivalence between relative value and motion
coherence (bEVS and nEVS, respectively), it makes assumptions about the
relationship between choice and motion coherence that are potentially
invalid. It is clear, however, that at the behavioral level motion coherence
has a greater influence on the probability of choice than value does, while, at
the neuronal level, value has a greater influence on firing rate than motion
coherence does. This observation is difficult to reconcile with the concept of a
common currency.
We also presented preliminary results comparing DLPFC with LIP.
Previous investigations of DLPFC showed that its representations of motion
coherence and value were qualitatively and quantitatively similar to those in
LIP. This led to the proposition that these two areas are parts of a single,
distributed, decision-related network (38, 62). To the contrary, our results
suggest that the DLPFC and LIP contribute to behavior in fundamentally
different ways. While LIP appears to implement a decision-related
accumulation process, capable of accommodating an additive relative value
signal, DLPFC activity appears better suited for signaling the current reward
condition. We observed DLPFC neurons that singularly represent a relative
value condition (Figs. 21 and 22) during the motion epoch, which could
serve as the source of the additive bias we observed in LIP.
5.2 Future directions
5.2.1 Common currency
Our results clearly indicate that single neurons in LIP simultaneously
represent sensory and value information. If LIP is directly responsible for
deciding where to move the eyes, it should represent all the factors
influencing this choice with a magnitude proportional to their influence. While
this study was not explicitly designed to determine whether LIP integrates
disparate factors in a common currency, it nonetheless provided one of the first
opportunities to address the question. Although our results do not strongly
support the idea that LIP combines sensory and value information in a common
currency, the question demands further investigation.
Analyzing the integration of sensory and value information in the
context of common currency is a principal focus of our ongoing
collaboration with Jay McClelland and Juan Gao, in the Department of
Psychology at Stanford University. The question of how LIP integrates
multiple factors is the most significant question for future research.
5.2.2 Reaction time discrimination
Another version of the random dots task allows subjects to report their
decision as soon as they are ready after the random dots appear (9, 28, 54).
This reaction time (RT) version contrasts with the “fixed-duration” task
used in this study, which requires subjects to view the random dots for a fixed
duration and wait through a delay period before reporting their choice. An
important direction for future research is to incorporate RT into our current
paradigm. An RT version of this task would provide additional metrics that
would deepen our understanding of how rewards bias behavior, and how that
bias is implemented at the neuronal level.
An RT version of this task would provide an additional behavioral
measure of bias because, in addition to quantifying the monkey’s choice, we
could also quantify how long that choice took to generate. This RT information
would allow us to more accurately determine the effect of reward conditions on the
decision’s duration. In the preceding sections we predicted that, generally,
reaction times for choices to the higher value target should be significantly
shorter than those to low value targets because the decision variable should
be elevated for these high value conditions, and thus more readily cross the
threshold.
Additionally, an RT version of this task would deepen our understanding of
how the bias is implemented at a neuronal level. With an RT version we
could precisely determine the beginning and ending of accumulation. This
would allow us to better determine if, and how, relative and absolute value
affect the rate of accumulation. The analysis presented in Section 3.3.2.7
suggests that the rate of accumulation is independent of reward condition;
this analysis, however, was performed on a fixed and arbitrary temporal
epoch that may not accurately capture the true window of accumulation.
Finally, an RT version of this task would allow us to take the time of the
saccade as a surrogate for threshold crossing. Knowing the time of threshold
crossing would permit us to determine if absolute and relative value affect
the threshold’s height. As discussed in Section 3.4.2, adjusting the
threshold’s height is one of three ways (along with offsetting the start of
accumulation, and increasing or decreasing the rate of accumulation) a bias
can be implemented in the accumulator model. The results presented above
suggest the bias is implemented by offsetting the start of accumulation;
however, it is possible this process works in conjunction with an offset in the
threshold’s height.
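The three ways of implementing a bias can be compared in a toy simulation. The sketch below is a minimal two-bound random walk, not the integrator model of Mazurek and Shadlen (48) and not an analysis from this study. All names and parameter values (bound height of 10, starting offset of 4, drift of 0.1 per step, unit noise) are arbitrary assumptions chosen for illustration, and time is measured in simulation steps rather than milliseconds.

```python
import random


def simulate_trial(rng, drift=0.0, start=0.0, bound_t1=10.0, bound_t2=10.0,
                   noise_sd=1.0, max_steps=100_000):
    """Accumulate noisy evidence until a bound is crossed.

    Returns (chose_t1, n_steps). The T1 bound sits at +bound_t1 and the
    T2 bound at -bound_t2; `start` offsets the starting level toward T1.
    """
    x = start
    for step in range(1, max_steps + 1):
        x += drift + rng.gauss(0.0, noise_sd)
        if x >= bound_t1:
            return True, step
        if x <= -bound_t2:
            return False, step
    # Fallback if no bound is reached: report the sign of the evidence.
    return x >= 0.0, max_steps


def summarize(n_trials=2000, seed=0, **model):
    """Return (proportion of T1 choices, mean decision time on T1 choices)."""
    rng = random.Random(seed)
    t1_times = []
    for _ in range(n_trials):
        chose_t1, n_steps = simulate_trial(rng, **model)
        if chose_t1:
            t1_times.append(n_steps)
    p_t1 = len(t1_times) / n_trials
    mean_t1_time = sum(t1_times) / len(t1_times) if t1_times else float("nan")
    return p_t1, mean_t1_time


if __name__ == "__main__":
    conditions = {
        "neutral": {},
        "start offset": {"start": 4.0},        # shift the starting level
        "lower T1 bound": {"bound_t1": 6.0},   # change the threshold height
        "rate change": {"drift": 0.1},         # change the accumulation rate
    }
    for name, model in conditions.items():
        p_t1, mean_time = summarize(**model)
        print(f"{name:15s} P(T1)={p_t1:.2f} mean T1 time={mean_time:.1f} steps")
```

In this toy setting each manipulation favors T1 choices and shortens T1 decision times relative to the neutral condition, which suggests why choice proportions alone may not tell the mechanisms apart and why the time of threshold crossing would be informative.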
5.2.3 Mapping utility with additional reward magnitudes
In Chapter 2 we speculated that our monkeys exhibit greater-than-
optimal biases because of positive utility functions associated with a highly
motivated desire for fluids. To truly determine the shape of our monkeys’
actual utility functions with this paradigm we would require at least one, but
ideally several, additional reward ratios (e.g. 3:1). If the monkeys truly have
a positive utility function the observed biases should continue to be greater
than the optimal bias, and the difference between the observed and optimal
biases should increase with greater reward ratios.
Mapping a full utility curve with this paradigm, while possible, is not
suggested. One experimental difficulty with this paradigm was the large
number of unique trial conditions (2 directions, 5-7 coherences, and 4
reward conditions generate 40-56 conditions), each of which needed to be
repeated 30-40 times to sufficiently define the PMF and characterize an
isolated neuron. This large number of trials was at the limits of both the
monkeys’ capacity to work, in terms of attention and satiation, and the
experimenter’s capacity to maintain the electrical isolation of a single
neuron. Additional reward conditions would only multiply these challenges.
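The trial counts implied by these figures can be tallied directly. The snippet below only multiplies the numbers quoted above; the function name is hypothetical, and the fifth reward condition in the last line is an illustrative assumption.

```python
def trial_budget(n_directions, n_coherences, n_reward_conditions, n_repeats):
    """Total trials needed to repeat every unique condition n_repeats times."""
    return n_directions * n_coherences * n_reward_conditions * n_repeats


# Smallest and largest designs described in the text.
low = trial_budget(2, 5, 4, 30)    # 40 conditions x 30 repeats = 1200 trials
high = trial_budget(2, 7, 4, 40)   # 56 conditions x 40 repeats = 2240 trials

# Hypothetical: one extra reward condition in the smallest design.
low_plus_one = trial_budget(2, 5, 5, 30)  # 50 conditions x 30 repeats = 1500 trials
```

Each added reward condition multiplies the whole design rather than adding a fixed number of trials, which is why further reward ratios are costly in this paradigm.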
Other behavioral paradigms are better suited for mapping a full utility
curve. A single point on a utility curve is more simply and directly mapped
by finding the magnitude of a certain reward that is behaviorally equivalent
to an uncertain, or risky, reward. For example, on a single trial a subject is
asked to choose between a certain reward, say 2 drops of juice, and a risky
reward, say a 50% chance of getting either 0 or 5 drops. If the subject is
indifferent between these choices, and treats them equivalently, we can say the
two options (with expected values 2*1=2 and 0*0.5 + 5*0.5=2.5) have equal
utility. These two values become the first point on our utility curve. The full
shape of the utility curve is mapped by determining this equivalence across a
range of risky reward magnitudes. For details and examples see Chapter 6 of
Stephens and Krebs
(69).
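The bookkeeping behind one such point can be sketched as follows, using the expected values quoted above (2 and 2.5 drops). This is a minimal illustration, not a procedure from this study. The function names are hypothetical, and anchoring the utility scale at u(0 drops) = 0 and u(5 drops) = 1 is a conventional normalization assumed here.

```python
def expected_value(gamble):
    """Expected magnitude of a gamble given as (magnitude, probability) pairs."""
    return sum(magnitude * prob for magnitude, prob in gamble)


def utility_point(certain_amount, gamble, anchor_utilities):
    """Utility curve point implied by indifference between the two options.

    If the subject is indifferent, the utility of the certain amount equals
    the expected utility of the gamble, computed from the anchored utilities
    of the gamble's outcomes.
    """
    expected_utility = sum(prob * anchor_utilities[magnitude]
                           for magnitude, prob in gamble)
    return certain_amount, expected_utility


gamble = [(0, 0.5), (5, 0.5)]      # 50% chance of 0 or 5 drops
anchors = {0: 0.0, 5: 1.0}         # assumed normalization of the utility scale
certain = 2.0                      # certain reward judged equivalent, in drops

point = utility_point(certain, gamble, anchors)
risk_premium = expected_value(gamble) - certain
```

Here the indifference point places 2 drops at a utility of 0.5 on the assumed scale, and the certain amount falls 0.5 drops below the gamble's expected value. Repeating the procedure with other gambles would add further points to the curve.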
References
1. Amemori K, Sawaguchi T. 2006. Contrasting effects of reward
expectation on sensory and motor memories in primate prefrontal
neurons. Cereb Cortex. Jul;16(7):1002-15.
2. Barash S, Bracewell RM, Fogassi L, Gnadt JW, Andersen RA. 1991.
Saccade-related activity in the lateral intraparietal area. I. Temporal
properties; comparison with area 7a. J Neurophysiol. Sep;66(3):1095-
108.
3. Barraclough DJ, Conroy ML, Lee D. 2004. Prefrontal cortex and
decision making in a mixed-strategy game. Nat Neurosci.
Apr;7(4):404-10.
4. Bisley JW, Goldberg ME. 2003. Neuronal activity in the lateral
intraparietal area and spatial attention. Science. Jan 3;299(5603):81-6.
5. Britten KH, Shadlen MN, Newsome WT, Movshon JA. 1993.
Responses of neurons in macaque MT to stochastic motion signals.
Vis Neurosci. Nov-Dec;10(6):1157-69.
6. Celebrini S, Newsome WT. 1994. Neuronal and psychophysical
sensitivity to motion signals in extrastriate area MST of the macaque
monkey. J Neurosci. Jul;14(7):4109-24.
7. Celebrini S, Newsome WT. 1995. Microstimulation of extrastriate
area MST influences performance on a direction discrimination task. J
Neurophysiol. Feb;73(2):437-48.
8. Chafee MV, Goldman-Rakic PS. 1998. Matching patterns of activity
in primate prefrontal area 8a and parietal area 7ip neurons during a
spatial working memory task. J Neurophysiol. Jun;79(6):2919-40.
9. Churchland AK, Kiani R, Shadlen MN. 2008. Decision-making with
multiple alternatives. Nat Neurosci. Jun;11(6):693-702
10. Colby CL, Duhamel JR, Goldberg ME. 1996. Visual, presaccadic, and
cognitive activation of single neurons in monkey lateral intraparietal
area. J Neurophysiol. Nov;76(5):2841-52
11. Colby CL, Goldberg ME. 1999. Space and attention in parietal cortex.
Annu Rev Neurosci. 22:319-49.
12. Ditterich J, Mazurek ME, Shadlen MN. 2003. Microstimulation of
visual cortex affects the speed of perceptual decisions. Nat Neurosci.
Aug;6(8):891-8
13. Dorris MC, Glimcher PW. 2004. Activity in posterior parietal cortex
is correlated with the relative subjective desirability of action. Neuron.
Oct 14;44(2):365-78.
14. Evarts EV. 1966. Pyramidal tract activity associated with a
conditioned hand movement in the monkey. J Neurophysiol.
Nov;29(6):1011-27.
15. Freedman DJ, Riesenhuber M, Poggio T, Miller EK. 2001.
Categorical representation of visual stimuli in the primate prefrontal
cortex. Science. Jan 12;291(5502):312-6
16. Funahashi S, Bruce CJ, Goldman-Rakic PS. 1989. Mnemonic coding
of visual space in the monkey's dorsolateral prefrontal cortex. J
Neurophysiol. Feb;61(2):331-49.
17. Fuster JM. 2001. The prefrontal cortex--an update: time is of the
essence. Neuron. May;30(2):319-33.
18. Gallistel CR, Mark TA, King AP, Latham PE. 2001. The rat
approximates an ideal detector of changes in rates of reward:
implications for the law of effect. J Exp Psychol Anim Behav Process.
Oct;27(4):354-72
19. Glimcher PW. 2003. The neurobiology of visual-saccadic decision
making. Annu Rev Neurosci. 26:133-79.
20. Gnadt JW, Andersen RA. 1988. Memory related motor planning
activity in posterior parietal cortex of macaque. Exp Brain Res.
70(1):216-20.
21. Gold JI, Shadlen MN. 2001. Neural computations that underlie
decisions about sensory stimuli. Trends Cogn Sci. Jan 1;5(1):10-16
22. Gold JI, Shadlen MN. 2002. Banburismus and the brain: decoding the
relationship between sensory stimuli, decisions, and reward. Neuron.
Oct 10;36(2):299-308.
23. Gold JI, Shadlen MN. 2003. The influence of behavioral context on
the representation of a perceptual decision in developing oculomotor
commands. J Neurosci. Jan 15;23(2):632-51
24. Gold JI, Shadlen MN. 2007. The neural basis of decision making.
Annu Rev Neurosci. 30:535-74.
25. Goldman-Rakic PS. 1987. Circuitry of primate prefrontal cortex and
the regulation of behavior by representational memory. Handbook of
Physiology vol 5(1)
26. Gottlieb JP, Kusunoki M, Goldberg ME. 1998. The representation of
visual salience in monkey parietal cortex. Nature. Jan
29;391(6666):481-4.
27. Green DM, Swets JA. 1966. Signal detection theory and
psychophysics. New York: John Wiley and Sons.
28. Hanks TD, Ditterich J, Shadlen MN. 2006. Microstimulation of
macaque area LIP affects decision-making in a motion discrimination
task. Nat Neurosci. May;9(5):682-9.
29. Hernández A, Zainos A, Romo R. 2000. Neuronal correlates of
sensory discrimination in the somatosensory cortex. Proc Natl Acad
Sci. May 23;97(11):6191-6.
30. Hernández A, Zainos A, Romo R. 2002. Temporal evolution of a
decision-making process in medial premotor cortex. Neuron. Mar
14;33(6):959-72.
31. Herrnstein RJ. 1961. Relative and absolute strength of responses as
a function of frequency of reinforcement. J Exp Anal Behav. 4:267-
272.
32. Horwitz GD, Newsome WT. 1999. Separate signals for target
selection and movement specification in the superior colliculus.
Science. May 14;284(5417):1158-61.
33. Horwitz GD, Newsome WT. 2001. Target selection for saccadic eye
movements: prelude activity in the superior colliculus during a
direction-discrimination task. J Neurophysiol. Nov;86(5):2543-58.
34. Janssen P, Shadlen MN. 2005. A representation of the hazard rate of
elapsed time in macaque area LIP. Nat Neurosci. Feb;8(2):234-41.
35. Judge SJ, Richmond BJ, Chu FC. 1980. Implantation of magnetic
search coils for measurement of eye position: an improved method.
Vision Res. 20(6):535-8.
36. Kable JW, Glimcher PW. 2007. The neural correlates of subjective
value during intertemporal choice. Nat Neurosci. Dec;10(12):1625-33.
37. Kiani R, Hanks TD, Shadlen MN. 2008. Bounded integration in
parietal cortex underlies decisions even when viewing duration is
dictated by the environment. J Neurosci. Mar 19;28(12):3017-29
38. Kim JN, Shadlen MN. 1999. Neural correlates of a decision in the
dorsolateral prefrontal cortex of the macaque. Nat Neurosci.
Feb;2(2):176-85.
39. Knutson B, Cooper JC. 2005. Functional magnetic resonance imaging
of reward prediction. Curr Opin Neurol. Aug;18(4):411-7.
40. Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O.
2002. Influence of reward expectation on visuospatial processing
in macaque lateral prefrontal cortex. J Neurophysiol. Mar;87(3):1488-
98.
41. Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W,
Sakagami M. 2006. Influences of rewarding and aversive outcomes on
activity in macaque lateral prefrontal cortex. Neuron.
Sep 21;51(6):861-70
42. Lau B, Glimcher PW. 2008. Value representations in the primate
striatum during matching behavior. Neuron. May 8;58(3):451-63.
43. Lee D, Conroy ML, McGreevy BP, Barraclough DJ. 2004.
Reinforcement learning and decision making in monkeys during a
competitive game. Brain Res Cogn Brain Res. Dec;22(1):45-58
44. Lee D, McGreevy BP, Barraclough DJ. 2005. Learning and decision
making in monkeys during a rock-paper-scissors game. Brain Res
Cogn Brain Res. Oct;25(2):416-30.
45. Leon MI, Shadlen MN. 1999. Effect of expected reward magnitude on
the response of neurons in the dorsolateral prefrontal cortex of the
macaque. Neuron. Oct;24(2):415-25
46. Leon MI, Shadlen MN. 2003. Representation of time by neurons in
the posterior parietal cortex of the macaque. Neuron. Apr
24;38(2):317-27
47. Lewis JW, Van Essen DC. 2000. Corticocortical connections of visual,
sensorimotor, and multimodal processing areas in the parietal lobe of
the macaque monkey. J Comp Neurol. Dec 4;428(1):112-37.
48. Mazurek ME, Roitman JD, Ditterich J, Shadlen MN. 2003. A role for
neural integrators in perceptual decision making. Cereb Cortex.
Nov;13(11):1257-69
49. Miller EK, Cohen JD. 2001. An integrative theory of prefrontal cortex
function. Annu Rev Neurosci. 24:167-202
50. Montague PR, King-Casas B, Cohen JD. 2006. Imaging valuation
models in human choice. Annu Rev Neurosci. 29:417-48
51. Mountcastle VB, Steinmetz MA, Romo R. 1990. Frequency
discrimination in the sense of flutter: psychophysical measurements
correlated with postcentral events in behaving monkeys. J Neurosci.
Sep;10(9):3032-44
52. Platt ML, Glimcher PW. 1999. Neural correlates of decision variables
in parietal cortex. Nature. Jul 15;400(6741):233-8
53. Roesch MR, Olson CR. 2003. Impact of expected reward on neuronal
activity in prefrontal cortex, frontal and supplementary eye fields and
premotor cortex. J Neurophysiol. Sep;90(3):1766-89.
54. Roitman JD, Shadlen MN. 2002. Response of neurons in the lateral
intraparietal area during a combined visual discrimination reaction
time task. J Neurosci. Nov 1;22(21):9475-89
55. Romo R, Brody CD, Hernández A, Lemus L. 1999. Neuronal
correlates of parametric working memory in the prefrontal cortex.
Nature. Jun 3;399(6735):470-3.
56. Romo R, Hernández A, Zainos A, Lemus L, Brody CD. 2002.
Neuronal correlates of decision-making in secondary somatosensory
cortex. Nat Neurosci. Nov;5(11):1217-25.
57. Romo R, Salinas E. 2003. Flutter discrimination: neural codes,
perception, memory and decision making. Nat Rev Neurosci.
Mar;4(3):203-18.
58. Salzman CD, Murasugi CM, Britten KH, Newsome WT. 1992.
Microstimulation in visual area MT: effects on direction
discrimination performance. J Neurosci. Jun;12(6):2331-55
59. Schall JD, Hanes DP. 1993. Neural basis of saccade target selection in
frontal eye field during visual search. Nature. Dec 2;366(6454):467-9
60. Schall JD, Bichot NP. 1998. Neural correlates of visual and motor
decision processes. Curr Opin Neurobiol. Apr;8(2):211-7
61. Schall JD. 2001. Neural basis of deciding, choosing and acting. Nat
Rev Neurosci. Jan;2(1):33-42
62. Selemon LD, Goldman-Rakic PS. 1988. Common cortical and
subcortical targets of the dorsolateral prefrontal and posterior parietal
cortices in the rhesus monkey: evidence for a distributed neural
network subserving spatially guided behavior. J Neurosci.
Nov;8(11):4049-68
63. Seo H, Barraclough DJ, Lee D. 2007. Dynamic signals related to
choices and outcomes in the dorsolateral prefrontal cortex. Cereb
Cortex. Sep;17
64. Shadlen MN, Britten KH, Newsome WT, Movshon JA. 1996. A
computational analysis of the relationship between neuronal and
behavioral responses to visual motion. J Neurosci. Feb 15;16(4):1486-
510
65. Shadlen MN, Newsome WT. 2001. Neural basis of a perceptual
decision in the parietal cortex (area LIP) of the rhesus monkey. J
Neurophysiol. Oct;86(4):1916-36
66. Shizgal P. 1997. Neural basis of utility estimation. Curr Opin
Neurobiol. Apr;7(2):198-208
67. Snyder LH, Grieve KL, Brotchie P, Andersen RA. 1998. Separate
body- and world-referenced representations of visual space in parietal
cortex. Nature. Aug 27;394(6696):887-91
68. Soltani A, Wang XJ. 2006. A biophysically based neural model of
matching law behavior: melioration by stochastic synapses. J
Neurosci. Apr 5;26(14):3731-44.
69. Stephens DW, Krebs JR. 1986. Foraging Theory. Princeton
Univ. Press, Princeton, NJ
70. Sugrue LP, Corrado GS, Newsome WT. 2004. Matching behavior and
the representation of value in the parietal cortex. Science. Jun
18;304(5678):1782-7
71. Sugrue LP, Corrado GS, Newsome WT. 2005. Choosing the greater of
two goods: neural currencies for valuation and decision making. Nat
Rev Neurosci. May;6(5):363-75
72. Wallis JD, Anderson KC, Miller EK. 2001. Single neurons in
prefrontal cortex encode abstract rules. Nature. Jun 21;411(6840):953-
6
73. Wallis JD, Miller EK. 2003. Neuronal activity in primate dorsolateral
and orbital prefrontal cortex during performance of a reward
preference task. Eur J Neurosci. Oct;18(7):2069-81
74. Watanabe M, Hikosaka K, Sakagami M, Shirakawa S. 2007. Reward
expectancy-related prefrontal neuronal activities: are they neural
substrates of "affective" working memory? Cortex. Jan;43(1):53-64.
75. Xue G, Lu Z, Levin IP, Weller JA, Li X, Bechara A. 2008. Functional
dissociations of risk and reward processing in the medial prefrontal
cortex. Cereb Cortex. May;19(5):1019-27.