
Human Vision:

What We Know That Ain’t So

(and what to do about it)

Ronald A. Rensink

Departments of Computer Science and Psychology

University of British Columbia

Vancouver, Canada


It isn't what we don't know that gives us trouble, it's what we know that ain't so…

—Will Rogers


To design effective interfaces, we need to know how human perception operates

Belief: If we pay enough attention, we see everything around us

What do we know about how we see?


Count the number of times the ball is passed between the people in the white shirts…


Test (Simons & Chabris, 1999):


46% of observers did not see the person in the gorilla suit

Similar effects have been found in many situations in everyday life (“looked but failed to see”)


Overview

Visual attention lets us see everything around us

1. Preattentive Systems (= before attention operates)
• act quickly, but don't do much

2. Attentional Systems (= conscious perception)
• build up a complete description of the scene

3. Nonattentional Systems (= unconscious perception)
• really don't do much


1. Preattentive Systems (= before attention operates)
• act quickly, and are quite intelligent

2. Attentional Systems (= conscious perception)
• dynamic, "just in time" representation of scenes

3. Nonattentional Systems (= unconscious perception)
• might play very important roles in how we see

We can take advantage of these to create new, more effective kinds of visual displays…

Overview


1. Preattentive Systems

Initial stage of visual processing

Light rays from the environment are converted into "pixels" (receptor responses)


[Figure: color receptors feeding three color channels: blue, green, red]


[Figure: an image decomposed into its R, G, and B color channels]


Second stage of visual processing


[Figure: orientation receptors feeding orientation channels: 0° (vertical), with similar channels for 30°, 60°, and 90° (horizontal)]


[Figure: an image decomposed into orientation channels at 0°, 45°, 90°, and 135°]


Preattentive Visual Properties

Automatic — no attention needed
- no conscious effort needed

Computed rapidly
- about 1/5 of a second

Computed in parallel over entire visual field
- at each point, a limited extent in space (c. 2°)

Measurements (templates) such as
• color
• spatial frequency / width
• orientation


Why should we care?

Provides the scientific basis for the design of rapidly-perceived displays.

“An understanding of what is processed preattentively is probably the most important contribution that vision science can make to data visualization”

(Ware, 2000)
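To make this concrete, here is a minimal Python/matplotlib sketch of a display that exploits color pop-out: one uniquely colored mark among uniform distractors. The item count, colors, and sizes are illustrative choices, not values from the talk.

```python
# Minimal sketch: using color "pop-out" to highlight one item in a display.
# Item count, colors, and sizes are illustrative choices only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_items = 60
positions = rng.uniform(0, 1, size=(n_items, 2))   # random item locations

colors = ["gray"] * n_items
colors[0] = "blue"                                  # the one uniquely colored target

plt.scatter(positions[:, 0], positions[:, 1], c=colors, s=80)
plt.axis("off")
plt.title("A unique color is found quickly, regardless of set size")
plt.show()
```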


Find a blue dot in a display

Unique color is easy to spot when few other items around


Also easy to spot when many other items around


For color, a unique value is always easy to notice

Reaction time does not depend on the number of items ("pop-out")

[Graph: reaction time (ms, 300-600) vs. number of items (set size); flat search function]


Similarly with orientation…

Unique orientation is easy to spot when few items around


Also easy to spot when many other items around


For orientation, a unique value is always easy to notice

A unique visual primitive will "pop out" of a display
- draws attention to itself

[Graph: reaction time (ms) vs. number of items (set size); again a flat search function]


Find the L-shaped figure

A unique L-shaped item is not always easy to spot: an item does not necessarily pop out just because it is unique


Finding it gets more difficult as more items are added to the display


For L-shapes, reaction time depends on number of items

Reaction time depends linearly on number of items

[Graph: reaction time (ms) vs. number of items (set size); reaction time increases linearly, with a search slope of 30-100 ms/item]


Interpretation (Treisman & Gormican, 1988):

Rapid search:
- visual primitives are calculated in the absence of attention
- a unique primitive draws attention

Slow search:
- a combination of preattentive features requires a spotlight of attention, which "welds" visual primitives together at a rate of c. 30 ms/item
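The flat vs. linear reaction-time functions above are usually summarized by a search slope (ms/item). A small sketch with invented numbers shows how such a slope is estimated:

```python
# Sketch: estimating a search slope (ms/item) from reaction-time data.
# The data points below are invented for illustration only.
import numpy as np

set_sizes = np.array([4, 8, 16, 32])

# Hypothetical mean reaction times (ms) for two conditions.
rt_popout = np.array([420, 423, 419, 425])      # flat: preattentive feature search
rt_serial = np.array([450, 570, 810, 1290])     # linear: attention-demanding search

for label, rt in [("pop-out", rt_popout), ("serial", rt_serial)]:
    slope, intercept = np.polyfit(set_sizes, rt, 1)
    print(f"{label:8s} slope = {slope:5.1f} ms/item, intercept = {intercept:.0f} ms")

# Slopes near 0 ms/item indicate "pop-out";
# slopes of roughly 30-100 ms/item indicate item-by-item attentional search.
```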


Texture Segmentation

Segmentation via differences in color:

Distinct segments each have different primitives


Segmentation via differences in orientation


No segmentation if primitives don’t differ

No differences in features.

(Only in their arrangement)
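The orientation-based segmentation above is easy to reproduce; in this minimal sketch (grid size, jitter, and angles are arbitrary choices), a 45° patch segregates effortlessly from a field of vertical segments:

```python
# Sketch: texture segmentation from a difference in a preattentive feature
# (orientation).  Grid size, jitter, and angles are arbitrary choices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, ax = plt.subplots()

for i in range(12):
    for j in range(12):
        # Elements in a central patch are tilted 45 deg; the rest are vertical.
        angle = np.deg2rad(45) if (4 <= i < 8 and 4 <= j < 8) else 0.0
        dx, dy = 0.3 * np.sin(angle), 0.3 * np.cos(angle)
        x, y = j + rng.normal(0, 0.05), i + rng.normal(0, 0.05)
        ax.plot([x - dx, x + dx], [y - dy, y + dy], color="black")

ax.set_aspect("equal")
ax.axis("off")
plt.show()
```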


What is Processed Preattentively?

Belief:

To achieve its great speed, preattentive vision sacrifices the complexity of the properties it determines.

Conventional view:

Simple properties (e.g., color, 2D orientation) are preattentive

Complex properties (e.g., 3D orientation) aren't
- these require attention (to arrange the simple primitives)


But…


Scene-based properties

1. Convexity (Kleffner & Ramachandran, 1992)

Image-based (2D) properties are the same in both target and distractor
- they differ only in how the pixels are arranged

If these are the only properties computed preattentively, search should be slow

[Figure: target and distractor items]



Search is fast -> unique preattentive feature

Feature: surface convexity (scene-based property) recovered rapidly at early levels


2. 3D Orientation (Enns & Rensink, 1990, 1991)

Image-based (2D) properties are the same in both target and distractor

If these are the only properties computed at early levels, search should be slow

[Figure: target and distractor items (drawings of differently oriented cubes)]



[Graphs: reaction time (ms) vs. set size; separate curves for Targ and Distr]

Rapid search for cubes: slope = 6 ms/item
Slow search for hexagons: slope = 35 ms/item

Three-dimensional orientation can be recovered rapidly


3. Identification of Shadows (Rensink & Cavanagh, 1993)

Target and distractor have shadows with different orientations…

If 2D (or 3D) orientation is accessible, search should always be rapid, no matter what

[Figure: target and distractor items]


[Graphs: reaction time (ms) vs. set size; separate curves for Targ and Distr]

For right-side-up items, search is slow: slope = 27 ms/item
For upside-down items, search is rapid: slope = 9 ms/item

Shadows affect visual search
- 2D orientation information is available only for the upside-down items


[Table: search slopes (ms/item) for right-side-up and upside-down displays, broken down by target and distractor conditions]


4. Completion of Occlusions (Rensink & Enns, 1998)

Target and distractor have different pieces.

If these are the relevant properties at early levels, search should always be rapid, no matter how they are arranged.

[Figure: target and distractor items]


Looking for: Targ

Easy to detect — curvature in the "bite"


Looking for: Targ (now shown as partly occluded)

Slower to detect — the "bite" has been "compensated for"
- occluded structure has been rapidly completed


Search is rapid - based on the distinctive parts (slope = 8 ms/item)

Search is slow - cannot access the visible pieces (slope = 66 ms/item)

Search is based on the completed fragments only; completion "pre-empts" the unique primitives

[Graphs: reaction time (ms) vs. set size; separate curves for Targ and Distr]


Summary

Complex properties (e.g., 3D orientation) can be determined rapidly, without attention
- these are calculated via simple rules that work most of the time (e.g., lighting from above)

Attention acts on proto-objects (Rensink & Enns, 1998):
- "quick and dirty" estimates of scene properties (e.g., surface slant, true surface color)


Implications for Display Design

1. Displays need not be limited to simple 2D properties (e.g., color and orientation) to convey information. More complex properties based on scene structure (e.g., 3D orientation, shadows) can also be used.
- will only work under the appropriate conditions
- likely to be as effective as the "old" dimensions

2. "Primitive" properties won't always work as expected.
- will be unavailable if pre-empted by proto-objects


2. Attentional Systems

What is the role of visual attention?

Conventional view:

Attention "welds" visual primitives into more complex structures (objects).

The accumulation of these structures is the basis for visual perception (scenes).

Is this really true?


Change Blindness

[Movies: flicker-style change-blindness demonstrations with natural scenes]


Change blindness: the failure to see changes in an image, even when these are large and repeatedly made

Can be induced by:

• image flicker

• eye movements (saccades)

• eyeblinks

• "splats" (brief distractors away from the change location)

• movie cuts

• real-world interruptions

• gradual change

Proposal: all of these have the same cause


Proposal: Attention is needed to perceive change in an object.

Under normal circumstances, a change creates a motion transient, which draws attention.

When a change is made at the same time as another event, the extra transients interfere with the drawing of attention, causing the change to become "invisible".


Coherence theory:

1. Attention latches onto proto-objects
- forms an object complex that is coherent in space and time
- supports the perception of change

2. The object complex exists as long as the item is attended

3. Once attention is released, the object "dissolves" back into proto-objects
- no buildup of coherent information after attention is withdrawn
- some memory of the scene remains, but not of the object complex


[Diagram: focused attention binding proto-objects into an object complex, then releasing them]


How Much Can We Attend?

Approach: visual search for change

Use images that change back and forth in time, like the scene examples, but with much simpler content
- can control the number of items, the type of change, etc.


Displays alternate (display 1, blank, display 2, blank, …) until the observer responds
- on half the trials, one of the items changes (the target)
- the observer must report whether a change is present or absent

Focused attention holds onto an item, allowing change in that item to be seen

(Rensink, 2000)
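A schematic of this alternating-display procedure as a small sketch; the timings and the simulated "observer" below are placeholders, not the values used in the actual study:

```python
# Schematic of a flicker-style change-search trial: display 1, blank, display 2,
# blank, ... alternate until the observer responds.  Timings and the simulated
# observer are placeholders, not values from the experiment.
import itertools

DISPLAY_MS = 240     # hypothetical on-time for each display
BLANK_MS = 120       # hypothetical blank; it masks the motion transient of the change

def flicker_trial(display_1, display_2, respond):
    """Alternate the two displays (with blanks) until respond() returns True.
    Returns the elapsed time in ms at the moment of response."""
    elapsed = 0
    for frame in itertools.cycle([display_1, None, display_2, None]):
        elapsed += BLANK_MS if frame is None else DISPLAY_MS
        # A real experiment would render `frame` here (e.g., with PsychoPy).
        if respond():
            return elapsed

# Simulated observer that "notices" the change on the 10th frame.
frame_counter = itertools.count(1)
print(flicker_trial("display A", "display A'", respond=lambda: next(frame_counter) >= 10))
```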


Results

Attention has a capacity of 3-4 items

- similar to other estimates of attentional capacity

- demonstrates that visual detail is not built up
- otherwise, the capacity estimate would be unlimited


But...

If we can’t attend to much (3-4 items), we have a coherent representation of only a few objects at any given time (= objects that are attended)

If only a few objects are represented at a time, why do we feel we see all objects at once?


Observation:

• Although objects appear to be present simultaneously, they do not all need to be represented simultaneously.

• All that is needed is that the properties of the objects can be accessed when requested.

Proposal: Virtual Representation (Rensink, 2000a)


[Diagram: the World Wide Web analogy]

World Wide Web: millions of web sites (web.mit.edu/bcs, cvs.rochester.edu, yorku.ca, vision.arc.nasa.gov, mpik-tueb.mpg.de, hyperion.com, psy.jhu.edu, …), each with lots of data

Real station: 1-2 sites at a time
⇒ if the data are already present, use them
⇒ else locate the appropriate machine, and load in the data

Web surfer "sees": 1) cvs.rochester.edu 2) vision.arc.nasa.gov 3) yorku.ca 4) …

Virtual station: millions of sites


To build a coherent representation of an object:

Focus eyes and attention on the appropriate location whenever that object is needed

The representation "dissolves" once it is no longer needed

Can this work for the visual system? Yes!!
- information can always be obtained from the world
- use the world itself as an external memory (Stroud, 1955; Brooks, 1991)
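The idea can be captured in a tiny sketch of an on-demand ("virtual") representation: only a few coherent objects are held at once, and everything else is re-acquired from the world when needed. The class name, the capacity value, and the example objects are illustrative only.

```python
# Sketch of a "virtual representation": only a few coherent objects are held
# at any moment; everything else is re-acquired from the world on demand.
# Class name, capacity value, and example objects are illustrative only.
from collections import OrderedDict

class VirtualRepresentation:
    def __init__(self, world, capacity=3):          # attentional capacity of ~3-4 items
        self.world = world                          # the world itself: the external memory
        self.capacity = capacity
        self.coherent = OrderedDict()               # currently attended object complexes

    def perceive(self, name):
        """Return a coherent description of `name`, building it only when needed."""
        if name in self.coherent:                   # already attended: just use it
            self.coherent.move_to_end(name)
            return self.coherent[name]
        if len(self.coherent) >= self.capacity:     # release attention from the oldest item;
            self.coherent.popitem(last=False)       # it "dissolves" back into proto-objects
        self.coherent[name] = self.world[name]      # attend: make the proto-object coherent
        return self.coherent[name]

world = {"speaker": "R.R.", "left screen": "slides", "podium": "wood", "ceiling": "tiles"}
vr = VirtualRepresentation(world)
for obj in ["speaker", "left screen", "speaker", "podium", "ceiling", "left screen"]:
    vr.perceive(obj)
print(list(vr.coherent))    # only the most recently attended objects are still coherent
```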


[Diagram: the web analogy above, placed side by side with its visual counterpart]

World: millions of objects (speaker (R.R.), left screen, right screen, podium, stage, ceiling, noisy person, …), each with lots of data

Real representation: 1-2 objects at a time
⇒ if the object is already attended, use it
⇒ else locate the appropriate proto-object, and make it coherent

Visual system "sees": 1) speaker (R.R.) 2) left screen 3) right screen 4) …

Virtual representation: millions of objects


Note:

Using the world as an external memory means that perception is not carried out in isolation

- perceiver and environment form a partnership

- perception is inherently dynamic and interactive


How might a virtual representation be implemented?

We need something compatible with what is known about human vision…


Proposal: Triadic Architecture (Rensink, 2000a)

Nonattentional extraction of aspects of the scene (I):

1. Gist: the abstract meaning of the scene (farm, harbor, etc.)
- obtained within 150 ms (Biederman, 1981)
- obtained without attention (Oliva & Schyns, 1997)

Possibly derived via statistics of low-level structures (e.g., Swain & Ballard, 1991)
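One way such low-level statistics could look is a coarse color histogram over the whole image, in the spirit of Swain & Ballard's color indexing. The bin count and the random stand-in image below are purely illustrative.

```python
# Sketch: a crude "gist" signature from low-level statistics: a coarse color
# histogram over the whole image (in the spirit of Swain & Ballard's color
# indexing).  Bin count and the random stand-in image are illustrative only.
import numpy as np

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(120, 160, 3))        # stand-in for an RGB image

def gist_signature(img, bins=4):
    """Joint RGB histogram with `bins` levels per channel, normalized to sum to 1."""
    quantized = (img // (256 // bins)).reshape(-1, 3)    # coarse-quantize each channel
    codes = quantized[:, 0] * bins * bins + quantized[:, 1] * bins + quantized[:, 2]
    hist = np.bincount(codes, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

signature = gist_signature(image)
print(signature.shape)          # (64,): a compact scene signature, no attention needed
```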


Nonattentional extraction of aspects of scene (II):

2. Layout: arrangement of items in the scene.

- nonvolatile: held without attention (Simons, 1996)
- can be learned without attention (Chun & Jiang, 1998)


[Diagram: the triadic architecture: an early, nonattentional stage (elements / proto-objects), a nonattentional setting system (gist + layout), and an attentional object system, all connected to higher levels; high-level control is shown directing attention]

Control of attention—and thus, what is seen—can be guided by high-level knowledge (long-term memory) and mental set (task)
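A skeletal, runnable sketch of how the three systems and this control might fit together; the data structures, the gist rule, and the relevance ranking are placeholders rather than Rensink's formal model.

```python
# Skeletal, runnable sketch of the triadic architecture.  All data structures,
# helper rules, and numbers are placeholders, not Rensink's formal model.

def early_vision(image_regions):
    """Preattentive stage: quick-and-dirty proto-objects, computed in parallel."""
    return [{"where": where, "what": what, "salience": sal}
            for where, what, sal in image_regions]

def setting_system(protos):
    """Nonattentional stage: scene gist plus a coarse layout of item locations."""
    gist = "indoor lecture" if any(p["what"] == "podium" for p in protos) else "unknown"
    layout = [p["where"] for p in protos]
    return gist, layout

def attentional_system(protos, task_relevant, k=4):
    """Attend to at most k proto-objects, guided by task relevance and salience,
    holding each as a coherent object only while it is attended."""
    ranked = sorted(protos, key=lambda p: (p["what"] in task_relevant, p["salience"]),
                    reverse=True)
    return ranked[:k]

regions = [((0, 0), "podium", 0.2), ((1, 0), "speaker", 0.9),
           ((0, 1), "screen", 0.7), ((2, 2), "noisy person", 0.8),
           ((3, 1), "ceiling", 0.1)]
protos = early_vision(regions)
gist, layout = setting_system(protos)
attended = attentional_system(protos, task_relevant={"speaker", "screen"})
print(gist, [p["what"] for p in attended])
```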


[Diagram: the same triadic architecture, now with low-level control (salience) shown directing attention]

Control of attention—and thus, what is seen—is also guided by the contents of low-level visual systems. The key factor is salience.
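Low-level salience is often operationalized as local center-surround contrast. Here is a simplified difference-of-Gaussians sketch on an intensity image; the filter sizes and test image are arbitrary, and full models (e.g., Itti & Koch) combine many feature channels.

```python
# Sketch: a very simplified salience map using center-surround (difference of
# Gaussians) contrast on image intensity.  Filter sizes and the test image are
# arbitrary; real salience models combine many feature channels.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)
image = rng.random((120, 160))                  # stand-in for an intensity image
image[50:60, 70:85] += 1.0                      # a bright, high-contrast patch

center = gaussian_filter(image, sigma=1)
surround = gaussian_filter(image, sigma=8)
salience = np.abs(center - surround)            # local contrast as candidate salience

y, x = np.unravel_index(np.argmax(salience), salience.shape)
print(f"most salient location: ({y}, {x})")     # attention would be drawn here first
```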


Summary

Our impression that many coherent objects are represented simultaneously is an illusion.
– scenes are represented via a virtual representation
– a dynamic, "just in time" system

Attention is not necessarily a perceptual "gateway"
– it is only the stream specialized for coherent objects
– other (nonattentional) streams help guide it


Implications for Display Design

1. Pickup of Information

Only a limited amount of information can be attended (= available to conscious perception) at any time
- 3-4 items
- only a small amount of information from each item

Avoid attentional distraction from other parts of the display
- it could induce blindness to the change that matters

Only one dynamic information source can be monitored at a time
- two simultaneous changes cannot be separated


2. Perception of Visual Displays

What is attended (= what is seen) depends on the viewer and the task
- different people can literally see the same world differently

Can use the flicker paradigm to measure which parts and properties of objects are attended first (= most easily seen to change)


3. Interaction with Visual Displays

Activities such as navigating through information space can be highly natural, if designed correctly

Mechanisms already exist to couple our perceptual systems to the environment
- "natural-born cyborgs" (Clark, 2003)

⇒ Can use these mechanisms as the basis of effective human-machine interactions


4. Coercive Graphics

A display could coerce the user into attending to a given location
- effectively "takes over" the virtual representation
- can make the viewer see (or not see) given items

This is how magicians control what an audience "sees"

Creation of "magical" displays?
- transitions at other locations made "invisible"?
- perception of events that could not occur in the real world
- "unreal" (dis)appearances
- "unreal" changes to objects, regions


3. Nonattentional Systems

[Diagram: incoming light feeds (1) early vision (proto-objects), which feeds both (2) the attentional stream and (3) nonattentional streams (gist, layout?), all connecting to higher levels]

The triadic architecture implies an important role for nonattentional streams in vision.

These streams are not concerned with explicit (= conscious) perception. Rather, they involve implicit (= unconscious) perception.


3.1 Some early results…

Bridgeman et al. (1975) — eye movements
• target moves while the observer moves their eyes to it
• the eye makes a corrective movement, even though observers have no conscious perception of the change

Goodale et al. (1986) — manual pointing
• target moves while the observer saccades to it
• the hand corrects its trajectory while reaching to the target, even though observers have no conscious perception of the change


Two Visual Systems (Milner & Goodale, 1995)

"What" system
- requires attention
- relatively slow (c. 300 ms)
- conscious "picture" of the world
- basis for rational decisions

"How" system
- may not require attention
- quite fast
- visuomotor control
- emotions…

Distinct visual systems
- supported by two separate neural pathways

The "how" system is effectively an "inner zombie"


3.2 Nonconscious Detection of Change

Fernandez-Duque & Thornton (2000)
- observers view a 2-display sequence; each display is a simple array of rectangles
- observers are tested on two items: the item that changed, and the item diagonally across from it
- if the observer did not notice the change, they are asked to guess which item changed


Results

Observers could guess better than chance (55-65%), even though the change was not consciously noticed (!)
– attention not involved


3.3 Mindsight

Reports by some observers that they "knew" a change was occurring before they saw (= visually experienced) it
- not a "picture" that could be seen
- but a "gut feeling" of a change happening


Experiment (Rensink, 2004)

Observers view a flicker sequence (natural images)
• asked to hit a button (t1) when the change was felt
• then hit a button (t2) when the change was seen


Results

1/2 of observers had no feeling of change without a visual experience of it

1/3 of observers could feel a change before seeing it, at least on some trials
• average duration = 3.7 seconds
• not a result of guessing
• good performance on change-absent trials
• accuracy = 82%; d' ≥ 2.1
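The d' value here is the standard signal-detection sensitivity index. A minimal computation from hit and false-alarm rates; the rates below are invented, chosen only to land near the reported d' ≥ 2.1.

```python
# Sketch: computing d' (signal-detection sensitivity) from hit and false-alarm
# rates.  The example rates are invented, chosen only to land near the d' >= 2.1
# reported on the slide.
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(round(d_prime(hit_rate=0.85, false_alarm_rate=0.10), 2))   # ~2.32
```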


For these observers, this effect was…

• positive — not due to a problem with attention
- "seeing" responses were equally fast for both groups

• informative — not a result of guessing
- can distinguish change-present from change-absent; d' ≥ 2.1

• sophisticated — not due to simple motion signals
- inserting a bright yellow flash in every other blank had no effect on most measures of sensing
- it did, however, disrupt seeing


Implications for Display Design

1. Visuomotor Actions

Displays might influence aspects of the user's experience other than the conscious "image" of their contents
- could guide user actions (e.g., control of the mouse)
- might induce the user to automatically "do the right thing" (without consciously noticing this)
- displays designed for the "zombie system": easy to use for a given motor activity


2. Displays for "Sixth Sense" Experience

- a feeling that something is occurring, without an accompanying visual experience
- could be used as a form of "soft warning" (?)


Summary

1. Preattentive Systems (= before attention operates)
• act quickly, but don't do much → act quickly, and can be quite intelligent

2. Attentional Systems (= conscious perception)
• build up a complete description of the scene, central (everything perceived must engage it) → a dynamic, "just in time" representation of scenes, just one of several concurrent systems

3. Nonattentional Systems (= unconscious perception)
• don't really do much → may play several important roles in how we see, and in how we interact with the world