Eye Movements, Attention, and Working Memory in Natural Environments
Mary Hayhoe, University of Rochester
Selecting information from visual scenes
What controls the selection process?
Humans must select a limited subset of the available information in the environment.
Fundamental Constraints: Attention is limited. Visual Working Memory is limited.
Only a limited amount of information can be retained.
What controls these processes?
How do attentional and memory limitations play out in natural behavior?
Need to understand Usage, i.e. mechanisms that control allocation of gaze, attention, and memory, not just Capacity
The Question
- Natural behavior : sequences of operations over several sec - selection and timing under observer’s control.
- Trial structure of standard paradigms : repeated instances of a single operation. Experimenter controls timing and nature of selection.
Developments in Eye Tracking
Head fixed (restricted): contact lenses with magnetic coils, Dual Purkinje Image tracker
Head free: head-mounted IR video-based systems, scene camera
Difficulty: optical power of eye + observer movement
Investigation of natural tasks with head-mounted eye-trackers
Scene camera on head provides video record of scene + eye position
Need active interaction with environment - not just passive viewing of images.
Task structure allows interpretation of role of fixations.
Advantages of Natural Behavior
Foot placement
Obstacle avoidance
Heading
Viewing pictures of scenes is different from acting within scenes.
Eye Movements During Natural Behavior
(Hayhoe et al, 2003)
Other Tasks
Driving (Land & Lee, 1994)
Table Tennis (Land & Furneaux, 1997)
Piano (Land & Furneaux, 1997)
Toy models (Pelz et al, 2000)
Cricket (Land & McLeod, 2000)
Walking (Patla & Vickers, 1997; Turano et al, 2003)
Saliency vs Tasks
Image properties, e.g. contrast and chromatic saliency, can account for some proportion of the observed fixations when viewing images of scenes (Itti & Koch, 2001; Parkhurst & Niebur, 2003; Mannan et al, 1997).
However, only modest role for image saliency in interactive tasks.
Insights from natural behavior
1. Fixations tightly linked to task: “just-in-time” strategy.
Fixations tightly linked to actions.
Johansson et al, 2001
Timing of fixations linked to current action
Hand path, fixations
Gaze arrives at critical point just before needed and departs when goal achieved.
[Figure: Model, Workspace, and Resource areas, with eye and hand traces (Ballard et al 1995)]
“Just-in-time” strategy
Insights from natural behavior
1. Fixations tightly linked to task: “just-in-time” strategy.
2. Fixation patterns reflect learning at several levels: what objects are relevant/where information is located/order of sub-tasks/properties of world.
Cognitive Goal: Make PBJ sandwich
Micro-task: Get jelly
Fixation: Fixate jelly jar
Acquire Info
Learn task sequence
Learn where to fixate
Gaze distribution is very different for different tasks.
Subjects have learnt that traffic signs are mostly at intersections
Total Fixation Duration (%)
Region   Follow   Follow+Stop
ROAD     7.02     5.89
CAR      77.1     42
SIDE     0.779    6.44
INT      14.9     45.4
BACK     0.191    0.243
120 m 30 m
Learning Where to Look
Time fixating intersection: Obey traffic rules 45%, Follow 15%
(Shinoda et al, 2001)
Learning optimal location: Fixate tangent point while driving around a curve
Fixation density
Gaze angle relative to body gives steering angle: gaze angle = “control variable”
(Land & Lee, 1994)
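One way to see why the tangent-point gaze angle can serve as a control variable is the idealized geometry of a circular bend. The sketch below is not taken from the slides. It assumes the driver follows a circular path of radius r and that the inner lane edge is a concentric circle at lateral distance d, in which case cos(theta) = (r - d) / r. The function name and the numbers are hypothetical.

```python
import math


def curvature_from_tangent_angle(theta_rad, lateral_distance_m):
    """Path curvature (1/m) implied by the gaze angle to the tangent point.

    Assumes idealized concentric circles: cos(theta) = (r - d) / r,
    which rearranges to 1/r = (1 - cos(theta)) / d.
    """
    if lateral_distance_m <= 0:
        raise ValueError("lateral distance must be positive")
    return (1.0 - math.cos(theta_rad)) / lateral_distance_m


# Hypothetical example: 100 m path radius, driver 2 m from the inner edge.
radius_m, d_m = 100.0, 2.0
theta = math.acos((radius_m - d_m) / radius_m)  # angle the driver would see
curvature = curvature_from_tangent_angle(theta, d_m)
print(f"gaze angle = {math.degrees(theta):.1f} deg, curvature = {curvature:.4f} 1/m")
```

Under these assumptions a single angle, read directly from gaze direction, is enough to recover the curvature the steering must match.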
Need to learn optimal location for control of pouring
(Land, Mennie, Rusted 1999)
regulate flow
monitor level
Neural Substrate for Learning where to Look
[Figure: Saccadic eye movement circuitry. Labels: target selection, saccade decision, saccade command, inhibits SC, signals to muscles]
LIP: lateral intra-parietal
Neural Substrate for Learning where to Look
Schultz, 2000: dopaminergic neurons in basal ganglia signal expected reward.
Hikosaka et al, 2000: Caudate cell responses in basal ganglia reflect both upcoming saccade and expected reward. Regulates fixation & timing of saccades.
Cortical saccade-related areas sensitive to reward:
LIP - Platt & Glimcher, 1999; Sugrue et al, 2004
Supplementary eye fields - Stuphorn et al, 2000
Note: Targeted hand movements show similar rapid learning and influence of reward (Trommershauser et al, 2003)
Learning Properties of World
Eye movements in cricket
[Figure labels: bowler, batsman, bounce point]
Land & McLeod, 2000
Eye movements in cricket:
Batsmen anticipate bounce point
Better batsmen arrive earlier
Land & McLeod, 2000
pursuit
Learning Properties of World
saccade
Anticipation implies internal model of ball’s expected path
bounce
The need for internal models
Less evidence for internal models of environment, e.g. evidence for minimal memory representations.
However, need to plan movements and predict state of environment to counter visual delays. For example, memory of spatial structure of scene is necessary for coordinated movements (e.g. Chun & Nakayama, 2000; Hayhoe et al, 2003)
Internal models of body’s dynamics mitigate problem of sensory feedback delays. (e.g. Wolpert et al, 1998)
Eye movements when catching
Catching: Gaze Patterns
[Figure: gaze path between thrower and catcher, showing a saccade followed by smooth pursuit]
Catching: Gaze Anticipation
[Figure: thrower and catcher, with timing labels 61 ms and -53 ms; scale 20 deg]
Timing of departure and arrival linked to critical events
Scatter Plot of Fixations near Bounce
2D elevation
Subjects fixate above the bounce point
bounce point
Relatively tight lateral clustering implies subjects target the likely location of the bounce.
Poor tracking when ball is unexpectedly bouncy
Better tracking 2 trials later - subjects have revised model
Pursuit accuracy following bounce
[Figure: percentage of time gaze on target (0% to 100%) vs. trial number (1 to 6), for tennis ball and bouncy ball; 5 subjects]
Pursuit improves rapidly with repeated trials
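A measure like "percentage of time gaze on target" can be computed from paired gaze and target samples. The following is a minimal sketch, not the analysis used in the talk. The 2 deg tolerance, the function name, and the sample values are all assumptions chosen for illustration.

```python
import math


def fraction_on_target(gaze, target, tolerance_deg=2.0):
    """Fraction of samples where gaze lies within tolerance_deg of the target.

    gaze and target are equal-length lists of (x, y) positions in degrees.
    The tolerance is an illustrative assumption, not a value from the talk.
    """
    if len(gaze) != len(target) or not gaze:
        raise ValueError("need equal-length, non-empty sample lists")
    hits = sum(
        1
        for (gx, gy), (tx, ty) in zip(gaze, target)
        if math.hypot(gx - tx, gy - ty) <= tolerance_deg
    )
    return hits / len(gaze)


# Made-up samples: the third gaze sample has lost the ball.
gaze = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0), (2.0, 2.0)]
target = [(0.0, 0.0), (1.0, 1.0), (1.5, 1.5), (2.5, 2.0)]
on_target = fraction_on_target(gaze, target)
print(f"{on_target:.0%} of samples on target")
```

Computing this per trial would give one point per trial number for each ball type.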
Earlier Arrival at Bounce Point Over Trials
[Figure: latency (ms, -180 to 0) vs. trial number (1 to 6), for bouncy ball and tennis ball]
Subjects continue to adjust saccade timing
Predicting the location for contact with racquet
Predictive Saccade
Anticipation: 183 +/- 35 ms
Anticipatory saccade to predicted location 183 msec before ball.
Fixation after saccade
Duration: 250 +/- 21 ms
[Figure labels: ball, racquet]
Predictive Saccade ctd
Ball arrives at fixation point
Error = 2.6 deg
Anticipatory saccades, head movements, and pursuit movements reveal that acquisition of visual information is planned for a predicted state of the world.
Internal Models Allow Predictive Vision
Predictions may be based on some kind of internal model of events.
Subjects rapidly adjust this model when errors occur.
Rapid adjustment of performance suggests that prediction is a ubiquitous feature of visually guided behavior.
Cognitive Goal: Make PBJ sandwich
Micro-task: Get jelly
Fixation: Fixate jelly jar
Acquire Info: Object? Feature?
How selective?
What is stored?
Fixations alone don’t specify what information is selected.
Insights from natural behavior
1. Fixations tightly linked to task: “just-in-time” strategy.
2. Fixation patterns reflect learning at several levels: what objects are relevant/where information is located/order of sub-tasks/properties of world.
3. Duration of fixations reflects time required to complete the current visual operation: implies specialized computations
Pelz et al, 2000
Different fixation durations for different tasks
[Figure: Model, Workspace, and Resource areas, with eye and hand traces (Ballard et al 1995)]
Different fixation durations depending on context
75 msec longer
Experimental Question:
How tightly does task constrain attentional selection and working memory usage?
Can use virtual environments to achieve experimental control while looking at natural behavior.
Task: select and pick up an object and then put it down.
Phantom Force Feedback System
Haptic feedback for 2-fingered grasping.
Eye tracker mounted inside the virtual reality helmet
Virtual Research V8 Head Mounted Display
Head position: Polhemus Fastrak
ASL 501 Eye Tracker
Infra-red, video based
ASL limbus tracker
Eye tracking in Virtual Environment
Saccade detection for image changes during saccades
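The slides say only that saccades were detected so the image could be changed during them, not how detection was done. A common generic approach is a velocity threshold, sketched below. The 30 deg/s threshold, the 60 Hz sample rate, the function name, and the trace are illustrative assumptions, not details of the system described here.

```python
def detect_saccades(positions_deg, sample_rate_hz, velocity_threshold=30.0):
    """Return (start, end) sample-index pairs where gaze speed exceeds a threshold.

    positions_deg: 1-D gaze positions in degrees, evenly sampled.
    velocity_threshold: deg/s; an illustrative value, not from the talk.
    """
    saccades = []
    start = None
    for i in range(1, len(positions_deg)):
        speed = abs(positions_deg[i] - positions_deg[i - 1]) * sample_rate_hz
        if speed > velocity_threshold:
            if start is None:
                start = i - 1  # saccade begins at the previous sample
        elif start is not None:
            saccades.append((start, i - 1))
            start = None
    if start is not None:  # trace ended mid-saccade
        saccades.append((start, len(positions_deg) - 1))
    return saccades


# Made-up trace: fixation near 0 deg, a jump to 10 deg, then fixation again.
trace = [0.0, 0.0, 0.1, 2.0, 6.0, 9.8, 10.0, 10.0, 10.0]
events = detect_saccades(trace, sample_rate_hz=60.0)
print(events)
```

In a gaze-contingent display, a scene change would be triggered while the current sample falls inside such an interval, so that the change itself is not seen as a transient.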
PICK-UP CUE: “Pick up any red brick.”
FEATURES: Height, Width, Color, Texture (each RELEVANT or IRRELEVANT)
FINGERTIPS
PUT-DOWN CUE: “Place the red brick on the right.”
Example of a One-Feature Trial
Second task: TWO relevant features
Color relevant for pickup
Width relevant for put-down
Example of a Two-Feature Trial
Fixations
[Figure: probability of fixation (0 to 0.8) vs. normalized trial length (0 to 100). Labels: Bricks in Array, Pick-Up Cue, Put-Down Cue, Brick in Hand, Conveyor Belts (Left Belt, Right Belt); Brick 1 to Brick 5]
Two Possible Strategies
Low Memory Load (information in scene): Acquire PU Cue, Acquire PU Feature, Guide Hand, FORGET PU Feature, Acquire PD Cue, ACQUIRE PD Feature, Guide Hand to Belt
“High” Memory Load: Acquire PU Cue, Acquire Brick Features, Guide Hand, RETAIN Brick Features, Acquire PD Cue, RECALL PD Feature, Guide Hand to Belt
Fixation Patterns
Fixation Pattern May Reveal Memory Strategy
Sorting based on information in scene: Acquire PU Cue, Acquire PU Feature, Guide Hand, FORGET PU Feature, Acquire PD Cue, ACQUIRE PD Feature, Guide Hand to Belt
Sorting based on working memory: Acquire PU Cue, Acquire Brick Features, Guide Hand, RETAIN Brick Features, Acquire PD Cue, RECALL PD Feature, Guide Hand to Belt
Time of Acquisition Depends on Memory Demands
[Figure: probability of fixation sequence (0 to 1) for One-Feature and Two-Feature tasks, in Predictable and Unpredictable conditions. Legend: sorting based on working memory (high memory load), sorting based on information in scene (low memory load)]
Fixation sequence during a trial depends on what information subjects need later in the trial.
When subjects are uncertain of what they need, they re-fixate the brick to acquire the second feature later in the trial, presumably to reduce memory load.
Suggests subjects often acquire only partial information about brick features during pick-up.
Fixation Patterns Reveal Subtle Control by Task
1. Fixation Sequence: Delay acquisition when unpredictable and greater memory load.
2. Change Detection
Experimental Logic
Change both task relevant and irrelevant features of the object the subject is holding
Greater sensitivity to relevant changes suggests task-specific representations.
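The comparison this logic calls for is a detection rate per change type. The sketch below shows one way to tabulate it, assuming each trial with a change is recorded as a (change type, detected) pair. The record format and the data are invented for illustration and are not results from the study.

```python
from collections import defaultdict


def detection_rates(trials):
    """Map each change type to the fraction of its changes that were detected.

    trials: iterable of (change_type, detected) pairs; a hypothetical format.
    """
    counts = defaultdict(lambda: [0, 0])  # change type -> [detected, total]
    for change_type, detected in trials:
        counts[change_type][1] += 1
        if detected:
            counts[change_type][0] += 1
    return {kind: hit / total for kind, (hit, total) in counts.items()}


# Invented data, for illustration only.
trials = [
    ("relevant", True), ("relevant", True), ("relevant", False), ("relevant", True),
    ("irrelevant", False), ("irrelevant", True), ("irrelevant", False), ("irrelevant", False),
]
rates = detection_rates(trials)
print(rates)
```

A higher rate for the relevant category than for the irrelevant one is the pattern that would point to task-specific representations.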
Fixations
[Figure: probability of fixation (0 to 0.8) vs. normalized trial length (0 to 100). Labels: Bricks in Array, Pick-Up Cue, Put-Down Cue, Brick in Hand, Conveyor Belts (Left Belt, Right Belt); Brick 1 to Brick 5]
How Specific is Visually Acquired Information?
?memory
Up to 8 changes per 80 trials
Feature Changes
-- When the brick is being carried towards the area for sorting
-- During a saccade
Changed feature may be relevant or irrelevant to task
TRASH CAN: Dispose of any brick with a changed feature.
Subject successfully detects color change
Subject fails to detect color change
[Figure: rate of change detection (0 to 1), panels A to D. Categories: Pick-Up Relevant, Put-Down Relevant, Irrelevant. Conditions: Predictable, Unpredictable]
Relevant Changes Noticed Most Often
Less Effect of Relevance when Unpredictable
Commonly accepted view: Attention binds features into object representations. Attended objects are remembered in the form of Object Files. Approximately 4 object files are held in working memory across gaze positions. (Treisman, 1988; Irwin & Andrews, 1996; Luck & Vogel, 1997; Rensink, 2000; Wheeler & Treisman 2003)
Present results suggest:
Attention selects features, not objects (in this task).
Features selected, and time of selection, are controlled by the current microtask.
Object representations are not necessarily maintained in memory as bound entities or “object files”.
1. Fixation Sequence: Delay acquisition when unpredictable and greater memory load.
2. Change Detection: Visual acquisition and storage of information is highly selective.
3. Sorting Performance
[Figure: rate of change detection (0.00 to 0.80) for One-Feature and Two-Feature tasks, in Predictable and Unpredictable conditions. Categories: PU Relevant, PD Relevant, PU & PD Relevant, Irrelevant; some panels marked with *]
Relevant Changes Noticed Most Often
Less Effect of Relevance when Relevance is Unpredictable
What happens when a change is missed?
Why are changes not noticed?
Perhaps changes are not noticed due to a failure to re-fixate/encode the new stimulus after the change.
Insensitivity to changes has been interpreted as evidence for poor memory of the pre-change stimulus. (O’Regan, 1992; Rensink, 1997; Simons, 2000)
Two Possible Sorting Decisions
Sort by Old: Working memory of old feature used for sorting decision. Failure to update new feature.
Sort by New: New feature used for sorting decision. Failure to maintain old feature.
Subject sorts by NEW feature.
Subject sorts by OLD feature
[Figure: fraction of missed trials (0.00 to 1.00) sorted by old vs. by new feature, for One-Feature and Two-Feature tasks, in Predictable and Unpredictable conditions]
Sort by Old When Predictable
Sort by New When Unpredictable
Sort by Old
Working memory of old feature used for sorting decision
Fixation Duration: 997 ms, 747 ms
Sort by old feature despite long fixation on brick
Significance of Sorting by Old
Use of memory rather than current sensory data, despite long fixation on brick, may result from attentional demands of brick placement. (“Inattentional blindness”: Mack & Rock, 1998)
Observers may not re-sample the image to update information because features are typically stable.
Note: use of memory in contrast to “just-in-time” strategy.
Hypothesis: Another factor in attentional selection and working memory use is subjects’ knowledge of properties of world.
Implication: we need internal models of the scene after all! Failure to detect change doesn’t imply absence of representation, merely the wrong one.
Significance of Sorting by Old (ctd)
Delay acquisition of task relevant information until necessary. (“Just-in-time” representations: Ballard et al., 1995)
Strategy to delay acquisition depended on predictability and memory load. Presumably such trade-offs intrinsic to natural behavior.
Sorting by new in unpredictable case presumably a consequence of greater probability of re-fixation after put-down cue.
Sort by New When Unpredictable
1. Fixation Sequence: Delay acquisition when unpredictable and greater memory load.
2. Change Detection: Visual acquisition and storage of information is highly selective.
3. Sorting Performance: Information typically retained in memory and not updated.
4. Duration of Fixations and Hand Movements
Change in fixation durations and hand movement duration following a detected change are consistent with re-allocation of attention to the change-detection task.
Consistent with recent evidence showing sensitivity of oculomotor responses to reward structure of the task. (Platt & Glimcher, 1999; Stuphorn & Schall, 2000; Hikosaka, 2000)
Sensitivity to reward provides a potential mechanism for mediating flexible task-specific modulation of attention. (cf Ballard & Sprague, 2004)
Regularity of natural behavior makes controlled investigation possible.
The variety of behavioral measures, together with task context, allows stronger inferences.
Comments
Understanding fixation patterns in natural behavior will require an understanding of the way tasks are learnt and represented in the brain.
Neural data on role of reward a critical substrate for explaining task-directed eye movement patterns.
Reinforcement learning models of complex behavior necessary to explain fixation patterns.
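The slides call for reinforcement learning models but do not give one. As a rough illustration of the idea, the sketch below uses a simple delta rule in which the estimated value of looking at a location moves toward the reward obtained there. The locations, reward values, learning rate, and function name are all hypothetical.

```python
def update_value(value, reward, learning_rate=0.5):
    """Move a value estimate toward the received reward (delta rule)."""
    return value + learning_rate * (reward - value)


# Hypothetical setup: fixating the intersection yields task-relevant
# information (reward 1.0); fixating the roadside yields none (reward 0.0).
values = {"intersection": 0.0, "roadside": 0.0}
rewards = {"intersection": 1.0, "roadside": 0.0}

for _ in range(3):  # three rounds of experience at each location
    for location in values:
        values[location] = update_value(values[location], rewards[location])

best = max(values, key=values.get)  # location a greedy policy would fixate
print(values, best)
```

A learner of this kind ends up preferring locations that have paid off for the task, which is the sort of account the slides suggest for learned fixation patterns.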
Cognitive Goal: Perform Brick Sorting Trial
Micro-task: Pick-up Brick
Fixation: Fixate Brick
Acquire Info: Depends on Task Relevance
ATTENTION
There is always a “task”.
Thank You
Do subjects modify the distribution of attention in effort to detect changes?
[Figure: two panels, Brick Fixation Duration and Hand Movement Duration. Y axes: difference in fixation duration on brick (sec) and difference in movement duration (sec), -0.6 to 0.6. X axis: trials before and after a noticed change, -5 to 5]
Longer brick fixations and hand movement during trials following a noticed change.
Subjects may have reprioritized the change detection task.
(Predictable condition shown only)
One-Feature Two-Feature
Summary
- In natural tasks, fixations are restricted to task-specific locations and tightly linked in time to immediate task demands (often using a just-in-time strategy).
- Fixation patterns reflect learning at several levels: task sequence, location of necessary information, optimal location for control of actions, models of dynamic properties of world.
- Selection of information within a fixation may be highly specific, and is determined by momentary task.
- Once selected, information is often retained. Suggests scene representations are built up over multiple fixations, and updated depending on dynamic properties of world.
Subject detects change to a new color
Good detection even when other blocks outside field
Small number of items in working memory, some of which are retained in long term memory.
Scene “gist”
Semantic knowledge.
Spatial structure to guide movement.
Visual Memory across saccades
Unnecessary to posit separate mechanism like attention to understand what computations the brain is doing. Should focus on what’s needed for the task.
Saliency Models
Image properties, e.g. contrast, edges, and chromatic saliency, can account for some fixations when viewing images of scenes (Itti & Koch, 2001; Parkhurst & Niebur, 2003; Mannan et al, 1997).
Problems with Saliency Models
Important information may not be salient eg Stop signs in a cluttered environment.
Salient information may not be important - eg retinal image transients from eye/body movements.
Can’t account for many observed fixations, especially in natural behavior.