
Attention in Computer Vision

Mica Arie-Nachimson and Michal Kiwkowitz

May 22, 2005
Advanced Topics in Computer Vision

Weizmann Institute of Science

Problem definition – Search Order

Object recognition

• Vision applications apply "expensive" algorithms (e.g. recognition) to image patches
• Mostly naïve selection of patches
• Selection of patches determines the number of calls to the "expensive" algorithm

Problem Definition - Search Order

Object recognition

• More sophisticated selection of patches would imply fewer calls to the "expensive" algorithm

• Attention is used to focus efficiently on incoming data (better use of limited processing capacity)

Problem Definition - Search Order

Object recognition

(illustration: image patches visited in order 1-6)

Outline
• What is Attention
• Attention in Object Recognition
  • Saliency Model
    • Feature Integration Theory
    • Saliency Algorithm
    • Saliency & Object Recognition
    • Comparison
  • Inner Scene Similarity Model
    • Biological motivation
    • Difficulty of Search Tasks
    • Algorithms
      • FLNN
      • VSLE


Attention

• Attention implies allocating resources, perceptual or cognitive, to some things at the expense of others.

What is Attention

• You are sitting in class listening to a lecture.

• Two people behind you are talking. – Can you hear the lecture?

• One of them mentions the name of a friend of yours. – How did you know?

Attention in Other Applications

• Face Detection (feature selection)

• Video Analysis (temporal block selection)

• Robot Navigation (select locations)

• …

Attention is Directed by:

Bottom-up:
• From small to large units of meaning
• Rapid
• Task-independent

Attention is Directed by:

Top-down:
• Use higher levels (context, expectation) to process incoming information (a guess)
• Slower
• Task-dependent

http://www.rybak-et-al.net/nisms.html


When is information selected (filtered)?
– Early selection (Broadbent, 1958)
– Cocktail party phenomenon (Moray, 1959)
– Late selection (Treisman, 1960) - attenuation

• All information is sent to the perceptual systems for processing
• Some is selected for complete processing
• Some is more likely to be selected

Attention

WHICH?

Parallel Search: Is there a green O?


A. Treisman, G. Gelade, 1980

Conjunction Search

Is there a green N ?


A. Treisman, G. Gelade, 1980

Results

A. Treisman, G. Gelade, 1980

Conjunction Search


A. Treisman, G. Gelade, 1980

Color map Orientation map

A. Treisman, G. Gelade, 1980


Conjunction Search


A. Treisman, G. Gelade, 1980

Primitives (pre-attentive features): Intensity, Orientation, Color, Curvature, Line End, Movement

(example displays for each primitive omitted)

Feature Integration Theory

Attention - two stages:

Pre-attention:
• Parallel processing
• Low-level features
• Fast
• Parallel search

Attention:
• Serial processing
• Localized focus
• Slower
• Conjunctive search

How is the focus found & shifted?

A. Treisman, G. Gelade, 1980


Shifts in Attention

“Shifts in selective visual attention: towards the underlying neural circuitry”,

Christof Koch, and Shimon Ullman, 1985

C. Koch, and S. Ullman, 1985

Feature Maps:
• Orientation
• Color
• Curvature
• Line end
• Movement

The feature maps are combined into a central representation (the saliency map), from which attention selects.

“A model of saliency-based visual attention for rapid scene analysis”

Laurent Itti, Christof Koch, and Ernst Niebur, 1998

L. Itti, C. Koch, and E. Niebur, 1998

• Salient - stands out

• Example – telephone & road sign have high saliency

from C. Koch L. Itti, C. Koch, and E. Niebur, 1998

Intensity

L. Itti, C. Koch, and E. Niebur, 1998

Cells in the retina (figure omitted)

Intensity
Create spatial scales 0-8 using Gaussian pyramids

L. Itti, C. Koch, and E. Niebur, 1998

Intensity: Center-surround difference operator
- Sensitive to local spatial discontinuities
- Principal computation in the retina and primary visual cortex
- Subtract the coarse scale from the fine scale (fine +, coarse -)

L. Itti, C. Koch, and E. Niebur, 1998

Toy Example 1: an isolated bright point

Fine level:             Coarse level (Gaussian pyramid, interpolated back up):
  0    0    0             0  0  0
  0  255    0             0  0  0
  0    0    0             0  0  0

Point-by-point subtraction (fine - coarse):
  0    0    0
  0  255    0
  0    0    0

The isolated bright point survives the center-surround difference: it is salient.

Toy Example 2: a uniform region

Fine level:             Coarse level:
255  255  255           255  255  255
255  255  255           255  255  255
255  255  255           255  255  255

Point-by-point subtraction (fine - coarse):
  0    0    0
  0    0    0
  0    0    0

A uniform region gives zero response: it is not salient.

Intensity

Center-surround maps: I(c, s) = | I(c) ⊖ I(s) |, with center scales c ∈ {2, 3, 4} and surround scales s = c + δ, δ ∈ {3, 4} (⊖ denotes point-by-point subtraction after interpolating the coarse map to the fine scale).

Compute 6 intensity maps, e.g.:
I(2, 5) = | I(2) ⊖ I(5) |
I(2, 6) = | I(2) ⊖ I(6) |
I(3, 6) = | I(3) ⊖ I(6) |

Different center-surround ratios give multiscale feature extraction.

L. Itti, C. Koch, and E. Niebur, 1998
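A minimal Python sketch of this intensity channel, assuming OpenCV and NumPy; the pyramid depth and interpolation choices are illustrative, not the paper's exact implementation:

```python
# Sketch of the intensity channel: a Gaussian pyramid (scales 0-8) and
# across-scale center-surround differences I(c, s) = |I(c) - interp(I(s))|
# for c in {2, 3, 4} and s = c + delta, delta in {3, 4}.
import cv2
import numpy as np

def gaussian_pyramid(gray, levels=9):
    pyr = [gray.astype(np.float32)]
    for _ in range(1, levels):
        pyr.append(cv2.pyrDown(pyr[-1]))          # each level half the size
    return pyr

def center_surround(pyr, c, s):
    fine = pyr[c]
    coarse = cv2.resize(pyr[s], (fine.shape[1], fine.shape[0]))  # interpolate coarse up
    return np.abs(fine - coarse)                  # subtract coarse from fine

def intensity_maps(gray):
    pyr = gaussian_pyramid(gray)
    return [center_surround(pyr, c, c + d) for c in (2, 3, 4) for d in (3, 4)]  # 6 maps
```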

Color

Same c and s as with intensity → 12 color maps

Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange

L. Itti, C. Koch, and E. Niebur, 1998

Orientation

Same c and s as with intensity, for orientations θ ∈ {0°, 45°, 90°, 135°} → 24 orientation maps:
O(c, s, θ) = | O(c, θ) ⊖ O(s, θ) |

From the Visual System presentation by S. Ullman

L. Itti, C. Koch, and E. Niebur, 1998
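A possible sketch of the oriented filter bank behind these maps; the use of cv2.getGaborKernel and its parameters are assumptions standing in for the paper's oriented pyramids:

```python
# Each pyramid level is filtered at four orientations before the same
# center-surround differences as above are taken
# (4 orientations x 6 scale pairs = 24 maps).
import cv2
import numpy as np

def oriented_responses(level, thetas=(0, 45, 90, 135)):
    """Return {theta: Gabor response} for one pyramid level."""
    responses = {}
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0,
                                    theta=np.deg2rad(theta),
                                    lambd=5.0, gamma=1.0)
        responses[theta] = cv2.filter2D(level.astype(np.float32), -1, kernel)
    return responses
```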

from C. Koch L. Itti, C. Koch, and E. Niebur, 1998


Normalization Operator

L. Itti, C. Koch, and E. Niebur, 1998

Saliency Map

S = 1/3 · [ N(I) + N(C) + N(O) ]
(I, C, O are the conspicuity maps obtained by combining the intensity, color, and orientation maps; N is the normalization operator)

L. Itti, C. Koch, and E. Niebur, 1998
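Roughly, the normalization operator N(·) rescales each map and promotes maps with a few strong peaks over maps with many comparable peaks. A hedged sketch of N(·) and of the final combination, assuming NumPy, SciPy and OpenCV; the local-maximum detection and the common map size are simplifying assumptions:

```python
# Sketch of N(.) and the combination S = 1/3 [ N(I) + N(C) + N(O) ].
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(m, M=1.0):
    m = (m - m.min()) / (m.max() - m.min() + 1e-9) * M        # scale to [0, M]
    peaks = m[(m == maximum_filter(m, size=15)) & (m < m.max())]
    m_bar = peaks.mean() if peaks.size else 0.0                # average of other local maxima
    return m * (M - m_bar) ** 2                                # favor maps with few strong peaks

def conspicuity(maps, shape=(60, 80)):
    # across-scale addition: bring every map to a common scale, normalize, sum
    return sum(normalize_map(cv2.resize(m, shape[::-1])) for m in maps)

def saliency(intensity_maps, color_maps, orientation_maps):
    return (normalize_map(conspicuity(intensity_maps))
            + normalize_map(conspicuity(color_maps))
            + normalize_map(conspicuity(orientation_maps))) / 3.0
```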

Algorithm, up to now:
1. Extract feature maps
2. Compute center-surround differences (42 maps):
   • Intensity – I (6)
   • Color – C (12)
   • Orientation – O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by summing and normalizing the maps

Laurent Itti, Christof Koch, and Ernst Niebur, 1998

Selection of the Focus of Attention (FOA):
• The saliency map feeds a layer of leaky integrate-and-fire neurons
• A Winner-Take-All network selects the most salient location
• "Inhibition of return" suppresses the attended location so that the focus can shift

L. Itti, C. Koch, and E. Niebur, 1998

FOA – Focus Of Attention
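A minimal sketch of this select-and-inhibit loop, using an argmax in place of the winner-take-all network and a fixed inhibition radius (both simplifying assumptions):

```python
# Repeatedly pick the saliency maximum (stand-in for winner-take-all) and
# suppress a disk around it (inhibition of return) so the FOA moves on.
import numpy as np

def attention_shifts(saliency_map, n_shifts=5, inhibit_radius=10):
    s = saliency_map.copy()
    ys, xs = np.mgrid[0:s.shape[0], 0:s.shape[1]]
    fixations = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(s), s.shape)   # winner takes all
        fixations.append((y, x))
        s[(ys - y) ** 2 + (xs - x) ** 2 <= inhibit_radius ** 2] = 0  # inhibition of return
    return fixations
```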

Results

• FOA shifts: 30-70 ms
• Inhibition: 500-900 ms

Inhibition of return ends

L. Itti, C. Koch, and E. Niebur, 1998

Results

Comparison with the Spatial Frequency Content (SFC) measure, Reinagel & Zador, 1997

(figure columns: Image, SFC, Saliency, Output)

L. Itti, C. Koch, and E. Niebur, 1998

Results

(panels a-d; figure columns: Image, SFC, Saliency, Output)

L. Itti, C. Koch, and E. Niebur, 1998; Spatial Frequency Content, Reinagel & Zador, 1997


Attention & Object Recognition

• "Is bottom-up attention useful for object recognition?" – U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Computer recognition: segmented images, labeled objects
Human recognition: cluttered scenes, non-labeled objects

(pipeline: the saliency model provides Attention, which feeds Object Recognition)

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Grow a region in the strongest feature map around the attended location, and pass it to object recognition (Lowe).
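A hedged sketch of that region-growing step; the relative threshold and the BFS flood fill are assumptions, not the authors' exact procedure. The resulting bounding box would then be cropped and handed to the keypoint-based recognizer:

```python
# Flood-fill the winning conspicuity map above a threshold, starting at the
# attended peak, and return a bounding box for the recognizer.
import numpy as np
from collections import deque

def attended_region(conspicuity, seed, rel_threshold=0.5):
    """Return (y0, x0, y1, x1) of the region grown around `seed` = (y, x)."""
    thr = rel_threshold * conspicuity[seed]
    visited = np.zeros(conspicuity.shape, dtype=bool)
    queue, visited[seed] = deque([seed]), True
    ys, xs = [seed[0]], [seed[1]]
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < conspicuity.shape[0] and 0 <= nx < conspicuity.shape[1]
                    and not visited[ny, nx] and conspicuity[ny, nx] >= thr):
                visited[ny, nx] = True
                queue.append((ny, nx))
                ys.append(ny); xs.append(nx)
    return min(ys), min(xs), max(ys), max(xs)
```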

Attention & Object Recognition

Learning inventories – “grocery cart problem”

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Real-world scenes
• 1 image for training (15 fixations)
• 2-5 images for testing (20 fixations)

"Grocery Cart" Problem

(figure: objects attended during training are matched, by the object recognizer, against fixations in the testing images)

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Downsides:

• Bias of human photography

• Small image set

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Solution
• Robot as acquisition tool

Robot - Landmark Learning

Objective – how many objects are found and classified correctly?

Navigation – simple obstacle avoiding algorithm using infrared sensors

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Object recognition: fewer than 3 matching keypoints → no recognition

Landmark Learning with Attention

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Landmark Learning with Random Selection

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Landmark Learning - Results

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Saliency-Based Object Recognition
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• Segmentation
  – Cluttered scenes
  – Unlabeled objects
  – Multiple objects in a single image
• Static priority map

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004


Comparison

“Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek , June 2003

Other attention operators for low-level features

R. Sim, S. Polifroni, G. Dudek , June 2003

Comparison

R. Sim, S. Polifroni, G. Dudek , June 2003

Operators compared: edge density, radial symmetry, smallest eigenvalue, Caltech saliency

Comparison

• Landmark learning

• Training – learn landmarks knowing camera pose

• Testing - determine pose of camera according to landmarks (pose estimation)

R. Sim, S. Polifroni, G. Dudek , June 2003

Comparison - Results

• All operators perform better than random
• Radial symmetry gives the worst results
• The Caltech (saliency) operator performs similarly to the edge-density and eigenvalue operators
• BUT:
  – More complex to implement
  – More computing time
• So it is a less preferred candidate in practice

R. Sim, S. Polifroni, G. Dudek , June 2003


The Problem

Object recognition

(illustration: image patches visited in order 1-6)


Biological Motivation

• An alternative approach: continuous search difficulty

• Based on similarity:
  – between targets and non-targets in the scene
  – between non-targets and non-targets in the scene

• Similar structural units do not need separate treatment

• Structural units similar to a possible target get high priority

Duncan & Humphreys [89]

Biological Motivation

(figure: search difficulty as a function of target to non-target similarity and non-target to non-target similarity; difficulty is highest when targets are similar to the non-targets and the non-targets are dissimilar to each other)

Duncan & Humphreys [89]

Biological Motivation
• Explains the pop-out vs. serial search phenomenon

(figures: example target and non-target displays; pop-out occurs when the target is dissimilar to homogeneous non-targets, serial search when targets resemble non-targets and the non-targets are heterogeneous)

Duncan & Humphreys [89]

Using Inner-scene Similarities

• Every candidate is characterized by a vector of n attributes
• n-dimensional metric space
  – A candidate is a point in the space
  – Some distance function d is associated with the space

Avraham & Lindenbaum [04], Avraham & Lindenbaum [05]

Using Inner-scene Similarities - Example
• One feature only: object area
• d: regular Euclidean distance

(feature space illustration)


Difficulty of Search

• The difficulty measure is the number of queries until the first target is found

• Two main factors
  – Distance between targets and non-targets
  – Distance between non-targets and non-targets

(feature space illustration)

Difficulty of Search - Cover

(feature space illustration: candidates covered by circles)

c: the number of circles in the cover
c will be our measure of the search difficulty
We need some constraint on the circles' size!

Difficulty of Search

dt: max-min target distance

dt-cover: a cover of the candidates by circles of diameter dt

Minimum dt-cover: c = the number of circles in the minimal dt-cover

(illustration: candidates in feature space covered by circles of diameter dt; in this example c = 7)

Difficulty of Search - examples
• Insects example: c = 3
• Easy search: c = 2
• Hard search: c = number of candidates

Define the Difficulty using c

• Lower bound: Every search algorithm needs c calls to the oracle before finding the first target in the worst case

• Upper bound: There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks

Difficulty of Search

Lower bound: every search algorithm needs c calls to the oracle before finding the first target, in the worst case.

(illustration: c well-separated clusters of candidates, each of diameter dt; in the worst case the single target lies in the last cluster queried)

Upper bound: there is an algorithm, FLNN (Farthest Labeled Nearest Neighbor), that needs at most c calls to the oracle to find the first target, for all search tasks.


FLNN – Farthest Labeled Nearest Neighbor

As the name suggests, each query goes to the candidate whose nearest already-labeled candidate is farthest away.

(illustration: candidates queried in order 1-5)

c is a tight bound!
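A minimal sketch of FLNN as the name suggests it works; the choice of the first query and the brute-force distance computation are illustrative assumptions:

```python
# Repeatedly query the candidate whose distance to its nearest already-labeled
# candidate is largest, until the oracle (the expensive recognizer) says "target".
import numpy as np

def flnn(points, oracle):
    """points: (n, d) feature vectors; oracle(i) -> True if candidate i is a target."""
    n = len(points)
    labeled, order = [], []
    current = 0                                   # arbitrary first query
    while True:
        order.append(current)
        labeled.append(current)
        if oracle(current):
            return current, order                 # first target found
        unlabeled = [i for i in range(n) if i not in labeled]
        if not unlabeled:
            return None, order
        # distance from each unlabeled candidate to its nearest labeled one
        d = np.linalg.norm(points[np.array(unlabeled)][:, None, :]
                           - points[np.array(labeled)][None, :, :], axis=2)
        current = unlabeled[int(np.argmax(d.min(axis=1)))]
```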

How do we compute c?
– Need to know dt
– Compute the minimal dt-cover
– Count the number of circles = c

Counting the circles is easy… but to know the exact dt we would need to know all the targets and non-targets, which is exactly what we are looking for, and computing the minimal dt-cover is NP-complete!

Upper & Lower Bounds on c

• Upper bounds:
  – The number of candidates
  – If dt is known to be larger than some d0: can approximate the cover size
• Lower bounds:
  – FLNN worst case
  – If dt is known to be larger than some d0: can approximate the cover size

Difficulty of Search
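One way to approximate such a cover size is a greedy cover, sketched below; the greedy strategy and the guess d0 for dt are assumptions, and this only brackets c, since the minimal cover itself is NP-complete to find:

```python
# Greedy d0-cover: repeatedly pick an uncovered candidate and cover everything
# within distance d0 of it; return how many circles were used.
import numpy as np

def greedy_cover_size(points, d0):
    """points: (n, d) feature vectors; returns the number of circles used."""
    remaining = list(range(len(points)))
    circles = 0
    while remaining:
        center = points[remaining[0]]
        dist = np.linalg.norm(points[remaining] - center, axis=1)
        remaining = [i for i, dd in zip(remaining, dist) if dd > d0]
        circles += 1
    return circles
```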


Improving FLNN

• What's wrong with FLNN?
  – Relates only to the nearest known neighbor
  – Finds only the first target efficiently
  – Cannot be easily extended to include top-down information

Efficient Algorithms

VSLE – Visual Search using Linear Estimation
• Each candidate has a probability of being a target
• Query the candidate with the highest probability
• Update the other candidates' probabilities according to the known results
  – Every known target / non-target affects the other candidates, more strongly the closer it is
• If we know the results for candidates 1,…,m, the remaining labels are estimated linearly (see "Linearly Estimating l(x_k)" at the end)
• Dynamic priority map (a minimal sketch follows below)

Efficient Algorithms
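A hedged sketch of a VSLE-style search loop: priorities start uniform, the highest-priority candidate is queried, and the rest are re-estimated linearly from the known labels with weights derived from pairwise distances. The Gaussian similarity kernel, the uniform prior, and the regularization term are illustrative assumptions, not the paper's exact choice of R and r:

```python
import numpy as np

def similarity(a, b, sigma=1.0):
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

def vsle_search(points, oracle, prior=0.5, sigma=1.0):
    n = len(points)
    priority = np.full(n, prior)
    labeled, labels, order = [], [], []
    for _ in range(n):
        # query the unlabeled candidate with the highest current priority
        i = int(np.argmax(np.where(np.isin(np.arange(n), labeled), -np.inf, priority)))
        order.append(i)
        y = float(oracle(i))                      # 1 = target, 0 = non-target
        labeled.append(i); labels.append(y)
        if y == 1.0:
            return i, order
        # linear estimate of each unqueried candidate's label from the known labels
        R = np.array([[similarity(points[a], points[b], sigma) for b in labeled]
                      for a in labeled]) + 1e-6 * np.eye(len(labeled))
        l = np.array(labels) - prior
        for k in range(n):
            if k in labeled:
                continue
            r = np.array([similarity(points[k], points[j], sigma) for j in labeled])
            priority[k] = prior + r @ np.linalg.solve(R, l)
    return None, order
```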

(figures: the dynamic priority map over successive queries; after each queried non-target, the priorities of similar candidates drop)

Combining Top-Down Information

• Simply specify the initial probabilities to match previously known data

• Add known target objects to the space. This will alter the probabilities accordingly and speed up search

Efficient Algorithms

Experiment 1: COIL-100

Efficient Algorithms

Columbia Object Image Library [96]

Experiment 1: COIL-100

• Features:
  – 1st-, 2nd-, and 3rd-order Gaussian derivatives → 9 basis filters
  – 5 scales → 9 × 5 = 45 features
• Euclidean distance

Efficient Algorithms

Rao & Ballard [95]
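A hedged sketch of such a 45-dimensional feature vector; the derivative orders via scipy.ndimage.gaussian_filter and the scale values are assumptions in the spirit of, not a reproduction of, Rao & Ballard [95]:

```python
# Responses of 1st-, 2nd- and 3rd-order Gaussian-derivative filters
# (9 basis filters) at 5 scales, sampled at the candidate's location.
import numpy as np
from scipy.ndimage import gaussian_filter

def jet_features(gray, y, x, sigmas=(1, 2, 4, 8, 16)):
    """Return a 45-dim feature vector for the pixel (y, x)."""
    gray = gray.astype(np.float32)
    # derivative orders (dy, dx): 2 first-, 3 second-, 4 third-order = 9 filters
    orders = [(0, 1), (1, 0),
              (0, 2), (1, 1), (2, 0),
              (0, 3), (1, 2), (2, 1), (3, 0)]
    feats = []
    for s in sigmas:
        for order in orders:
            response = gaussian_filter(gray, sigma=s, order=order)
            feats.append(response[y, x])
    return np.array(feats)  # 9 x 5 = 45 features
```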

Experiment 1: COIL-100

(plots: results vs. number of queries, for 10 cars and for 10 cups)

Experiment 2: hand segmented

• Every large segment is a candidate
• 24 candidates
• 4 targets

Berkeley hand segmented DB

Martin, Fowlkes, Tal & Malik [01]

Experiment 2: hand segmented

• Features: color histograms, separated into 8 bins each → 64 features

• Euclidean distance

Efficient Algorithms

Experiment 3: automatic color segmentation

• Automatically color-segmented image, used for face detection

Efficient Algorithms

Experiment 3: color segmentation

• 146 candidates

• 4 features: segment size, mean value of red, green and blue

• Euclidean distance

Efficient Algorithms

(plot: results vs. number of queries)

Combining top-down information

• Add known targets to the space

Efficient Algorithms

(plots: results vs. number of queries, without and with additional known targets)

Summary: Saliency Model vs. Similarity Model

Saliency model:
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• Segmentation
• Static priority map

Similarity model:
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• No segmentation
• Dynamic priority map
• Measures the search difficulty

Summary

• What is attention

• Aid object recognition tasks by choosing the area of interest

• Two approaches: the saliency model and the similarity model
  – Biological motivation
  – Algorithms

Thank You!

Linearly Estimating l(x_k)

A linear estimate of l(x_k) from the m known labels:
    l̂(x_k) = Σ_{i=1..m} α_i · l(x_i)
which, of course, is chosen to minimize the estimation error.

Solving a set of linear equations gives the estimate:
    R · α = r,   so   l̂(x_k) = r^T · R^{-1} · l
where l is the vector of known labels, and R (an m×m matrix) and r (an m-vector) are computed for i, j = 1,…,m.

R and r depend only on the distances, so they are computed once, in advance.
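A small sketch of that precomputation point: the full pairwise similarity matrix is built once from the distances, and at query time R and r are just sub-blocks of it. The Gaussian similarity kernel and the regularization term are assumptions; the paper's own mapping from distances to R and r is not reproduced here:

```python
import numpy as np

def precompute_similarity(points, sigma=1.0):
    # depends only on the pairwise distances, so it is computed once in advance
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def linear_estimate(S, labeled, labels, k):
    """Estimate the label of candidate k from the already-labeled candidates."""
    R = S[np.ix_(labeled, labeled)] + 1e-6 * np.eye(len(labeled))
    r = S[labeled, k]
    return r @ np.linalg.solve(R, np.asarray(labels, dtype=float))
```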
