
Attention in Computer Vision

Mica Arie-Nachimson and Michal Kiwkowitz

May 22, 2005
Advanced Topics in Computer Vision

Weizmann Institute of Science

Problem definition – Search Order

Object recognition

• Vision applications apply "expensive" algorithms (e.g. recognition) to image patches
• Mostly naïve selection of patches
• Selection of patches determines the number of calls to the "expensive" algorithm

Problem Definition - Search Order

Object recognition

• More sophisticated selection of patches would imply fewer calls to the "expensive" algorithm

• Attention is used to focus efficiently on incoming data (better use of limited processing capacity)

Problem Definition - Search Order

Object recognition

(illustration: image patches visited in order 1-6)

Outline
• What is Attention
• Attention in Object Recognition
  • Saliency Model
    • Feature Integration Theory
    • Saliency Algorithm
    • Saliency & Object Recognition
    • Comparison
  • Inner Scene Similarity Model
    • Biological motivation
    • Difficulty of Search Tasks
    • Algorithms
      • FLNN
      • VSLE


Attention

• Attention implies allocating resources, perceptual or cognitive, to some things at the expense of others.

What is Attention

• You are sitting in class listening to a lecture.

• Two people behind you are talking. – Can you hear the lecture?

• One of them mentions the name of a friend of yours. – How did you know?

Attention in Other Applications

• Face Detection (feature selection)

• Video Analysis (temporal block selection)

• Robot Navigation (select locations)

• …

Attention is Directed by:

Bottom-up:
• From small to large units of meaning
• Rapid
• Task-independent

Attention is Directed by:

Top-down:
• Use higher levels (context, expectation) to process incoming information (a guess)
• Slower
• Task-dependent

http://www.rybak-et-al.net/nisms.html


When is information selected (filtered)?
– Early selection (Broadbent, 1958)
– Cocktail party phenomenon (Moray, 1959)
– Late selection (Treisman, 1960) - attenuation

• All information is sent to the perceptual systems for processing
• Some is selected for complete processing
• Some is more likely to be selected

Attention

WHICH?

Parallel Search: Is there a green O?


A. Treisman, G. Gelade, 1980

Conjunction Search

Is there a green N ?


A. Treisman, G. Gelade, 1980

Results

A. Treisman, G. Gelade, 1980

Conjunction Search


A. Treisman, G. Gelade, 1980

Color map Orientation map

A. Treisman, G. Gelade, 1980


Conjunction Search


A. Treisman, G. Gelade, 1980

Primitives (pre-attentive features): Intensity, Orientation, Color, Curvature, Line End, Movement

(example displays for each primitive omitted)

Feature Integration Theory

Attention - two stages:

Pre-attention:
• Parallel processing
• Low-level features
• Fast
• Parallel search

Attention:
• Serial processing
• Localized focus
• Slower
• Conjunctive search

How is the focus found & shifted?

A. Treisman, G. Gelade, 1980


Shifts in Attention

“Shifts in selective visual attention: towards the underlying neural circuitry”,

Christof Koch, and Shimon Ullman, 1985

C. Koch, and S. Ullman, 1985

Feature Maps:
• Orientation
• Color
• Curvature
• Line end
• Movement

The feature maps are combined into a central representation (the saliency map), from which attention selects.

“A model of saliency-based visual attention for rapid scene analysis”

Laurent Itti, Christof Koch, and Ernst Niebur, 1998

L. Itti, C. Koch, and E. Niebur, 1998

• Salient - stands out

• Example – telephone & road sign have high saliency

from C. Koch L. Itti, C. Koch, and E. Niebur, 1998

Intensity

L. Itti, C. Koch, and E. Niebur, 1998

Cells in the retina (figure omitted)

Intensity
Create spatial scales 0-8 using Gaussian pyramids

L. Itti, C. Koch, and E. Niebur, 1998

Intensity: Center-surround difference operator
- Sensitive to local spatial discontinuities
- Principal computation in the retina and primary visual cortex
- Subtract the coarse scale from the fine scale (fine +, coarse -)

L. Itti, C. Koch, and E. Niebur, 1998

Toy Example 1: an isolated bright point

Fine level:             Coarse level (Gaussian pyramid, interpolated back up):
  0    0    0             0  0  0
  0  255    0             0  0  0
  0    0    0             0  0  0

Point-by-point subtraction (fine - coarse):
  0    0    0
  0  255    0
  0    0    0

The isolated bright point survives the center-surround difference: it is salient.

Toy Example 2: a uniform region

Fine level:             Coarse level:
255  255  255           255  255  255
255  255  255           255  255  255
255  255  255           255  255  255

Point-by-point subtraction (fine - coarse):
  0    0    0
  0    0    0
  0    0    0

A uniform region gives zero response: it is not salient.

Intensity

Center-surround maps: I(c, s) = | I(c) ⊖ I(s) |, with center scales c ∈ {2, 3, 4} and surround scales s = c + δ, δ ∈ {3, 4} (⊖ denotes point-by-point subtraction after interpolating the coarse map to the fine scale).

Compute 6 intensity maps, e.g.:
I(2, 5) = | I(2) ⊖ I(5) |
I(2, 6) = | I(2) ⊖ I(6) |
I(3, 6) = | I(3) ⊖ I(6) |

Different center-surround ratios give multiscale feature extraction.

L. Itti, C. Koch, and E. Niebur, 1998
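A minimal Python sketch of this intensity channel, assuming OpenCV and NumPy; the pyramid depth and interpolation choices are illustrative, not the paper's exact implementation:

```python
# Sketch of the intensity channel: a Gaussian pyramid (scales 0-8) and
# across-scale center-surround differences I(c, s) = |I(c) - interp(I(s))|
# for c in {2, 3, 4} and s = c + delta, delta in {3, 4}.
import cv2
import numpy as np

def gaussian_pyramid(gray, levels=9):
    pyr = [gray.astype(np.float32)]
    for _ in range(1, levels):
        pyr.append(cv2.pyrDown(pyr[-1]))          # each level half the size
    return pyr

def center_surround(pyr, c, s):
    fine = pyr[c]
    coarse = cv2.resize(pyr[s], (fine.shape[1], fine.shape[0]))  # interpolate coarse up
    return np.abs(fine - coarse)                  # subtract coarse from fine

def intensity_maps(gray):
    pyr = gaussian_pyramid(gray)
    return [center_surround(pyr, c, c + d) for c in (2, 3, 4) for d in (3, 4)]  # 6 maps
```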

Color

Same c and s as with intensity → 12 color maps

Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange

L. Itti, C. Koch, and E. Niebur, 1998

Orientation

Same c and s as with intensity, for orientations θ ∈ {0°, 45°, 90°, 135°} → 24 orientation maps:
O(c, s, θ) = | O(c, θ) ⊖ O(s, θ) |

From the Visual System presentation by S. Ullman

L. Itti, C. Koch, and E. Niebur, 1998
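A possible sketch of the oriented filter bank behind these maps; the use of cv2.getGaborKernel and its parameters are assumptions standing in for the paper's oriented pyramids:

```python
# Each pyramid level is filtered at four orientations before the same
# center-surround differences as above are taken
# (4 orientations x 6 scale pairs = 24 maps).
import cv2
import numpy as np

def oriented_responses(level, thetas=(0, 45, 90, 135)):
    """Return {theta: Gabor response} for one pyramid level."""
    responses = {}
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0,
                                    theta=np.deg2rad(theta),
                                    lambd=5.0, gamma=1.0)
        responses[theta] = cv2.filter2D(level.astype(np.float32), -1, kernel)
    return responses
```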

from C. Koch L. Itti, C. Koch, and E. Niebur, 1998


Normalization Operator

L. Itti, C. Koch, and E. Niebur, 1998

Saliency Map

S = 1/3 · [ N(I) + N(C) + N(O) ]
(I, C, O are the conspicuity maps obtained by combining the intensity, color, and orientation maps; N is the normalization operator)

L. Itti, C. Koch, and E. Niebur, 1998
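Roughly, the normalization operator N(·) rescales each map and promotes maps with a few strong peaks over maps with many comparable peaks. A hedged sketch of N(·) and of the final combination, assuming NumPy, SciPy and OpenCV; the local-maximum detection and the common map size are simplifying assumptions:

```python
# Sketch of N(.) and the combination S = 1/3 [ N(I) + N(C) + N(O) ].
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(m, M=1.0):
    m = (m - m.min()) / (m.max() - m.min() + 1e-9) * M        # scale to [0, M]
    peaks = m[(m == maximum_filter(m, size=15)) & (m < m.max())]
    m_bar = peaks.mean() if peaks.size else 0.0                # average of other local maxima
    return m * (M - m_bar) ** 2                                # favor maps with few strong peaks

def conspicuity(maps, shape=(60, 80)):
    # across-scale addition: bring every map to a common scale, normalize, sum
    return sum(normalize_map(cv2.resize(m, shape[::-1])) for m in maps)

def saliency(intensity_maps, color_maps, orientation_maps):
    return (normalize_map(conspicuity(intensity_maps))
            + normalize_map(conspicuity(color_maps))
            + normalize_map(conspicuity(orientation_maps))) / 3.0
```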

Algorithm, up to now:
1. Extract feature maps
2. Compute center-surround differences (42 maps):
   • Intensity – I (6)
   • Color – C (12)
   • Orientation – O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by summing and normalizing the maps

Laurent Itti, Christof Koch, and Ernst Niebur, 1998

Selection of the Focus of Attention (FOA):
• The saliency map feeds a layer of leaky integrate-and-fire neurons
• A Winner-Take-All network selects the most salient location
• "Inhibition of return" suppresses the attended location so that the focus can shift

L. Itti, C. Koch, and E. Niebur, 1998

FOA – Focus Of Attention
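A minimal sketch of this select-and-inhibit loop, using an argmax in place of the winner-take-all network and a fixed inhibition radius (both simplifying assumptions):

```python
# Repeatedly pick the saliency maximum (stand-in for winner-take-all) and
# suppress a disk around it (inhibition of return) so the FOA moves on.
import numpy as np

def attention_shifts(saliency_map, n_shifts=5, inhibit_radius=10):
    s = saliency_map.copy()
    ys, xs = np.mgrid[0:s.shape[0], 0:s.shape[1]]
    fixations = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(s), s.shape)   # winner takes all
        fixations.append((y, x))
        s[(ys - y) ** 2 + (xs - x) ** 2 <= inhibit_radius ** 2] = 0  # inhibition of return
    return fixations
```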

Results

• FOA shifts: 30-70 ms
• Inhibition: 500-900 ms

Inhibition of return ends

L. Itti, C. Koch, and E. Niebur, 1998

Results

Comparison with the Spatial Frequency Content (SFC) measure, Reinagel & Zador, 1997

(figure columns: Image, SFC, Saliency, Output)

L. Itti, C. Koch, and E. Niebur, 1998

Results

(panels a-d; figure columns: Image, SFC, Saliency, Output)

L. Itti, C. Koch, and E. Niebur, 1998; Spatial Frequency Content, Reinagel & Zador, 1997


Attention & Object Recognition

• "Is bottom-up attention useful for object recognition?" – U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Computer recognition: segmented images, labeled objects
Human recognition: cluttered scenes, non-labeled objects

(pipeline: the saliency model provides Attention, which feeds Object Recognition)

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Grow a region in the strongest feature map around the attended location, and pass it to object recognition (Lowe).
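A hedged sketch of that region-growing step; the relative threshold and the BFS flood fill are assumptions, not the authors' exact procedure. The resulting bounding box would then be cropped and handed to the keypoint-based recognizer:

```python
# Flood-fill the winning conspicuity map above a threshold, starting at the
# attended peak, and return a bounding box for the recognizer.
import numpy as np
from collections import deque

def attended_region(conspicuity, seed, rel_threshold=0.5):
    """Return (y0, x0, y1, x1) of the region grown around `seed` = (y, x)."""
    thr = rel_threshold * conspicuity[seed]
    visited = np.zeros(conspicuity.shape, dtype=bool)
    queue, visited[seed] = deque([seed]), True
    ys, xs = [seed[0]], [seed[1]]
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < conspicuity.shape[0] and 0 <= nx < conspicuity.shape[1]
                    and not visited[ny, nx] and conspicuity[ny, nx] >= thr):
                visited[ny, nx] = True
                queue.append((ny, nx))
                ys.append(ny); xs.append(nx)
    return min(ys), min(xs), max(ys), max(xs)
```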

Attention & Object Recognition

Learning inventories – “grocery cart problem”

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Real-world scenes
• 1 image for training (15 fixations)
• 2-5 images for testing (20 fixations)

"Grocery Cart" Problem

(figure: objects attended during training are matched, by the object recognizer, against fixations in the testing images)

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Downsides:

• Bias of human photography

• Small image set

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Solution
• Robot as acquisition tool

Robot - Landmark Learning

Objective – how many objects are found and classified correctly?

Navigation – simple obstacle avoiding algorithm using infrared sensors

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Object recognition: fewer than 3 matching keypoints → no recognition

Landmark Learning with Attention

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Landmark Learning with Random Selection

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Landmark Learning - Results

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004

Saliency-Based Object Recognition
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• Segmentation
  – Cluttered scenes
  – Unlabeled objects
  – Multiple objects in a single image
• Static priority map

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004


Comparison

“Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek , June 2003

Other attention operators for low-level features

R. Sim, S. Polifroni, G. Dudek , June 2003

Comparison

R. Sim, S. Polifroni, G. Dudek , June 2003

Operators compared: edge density, radial symmetry, smallest eigenvalue, Caltech saliency

Comparison

• Landmark learning

• Training – learn landmarks knowing camera pose

• Testing - determine pose of camera according to landmarks (pose estimation)

R. Sim, S. Polifroni, G. Dudek , June 2003

Comparison - Results

• All operators perform better than random
• Radial symmetry gives the worst results
• The Caltech (saliency) operator performs similarly to the edge-density and eigenvalue operators
• BUT:
  – More complex to implement
  – More computing time
• So it is a less preferred candidate in practice

R. Sim, S. Polifroni, G. Dudek , June 2003


The Problem

Object recognition

(illustration: image patches visited in order 1-6)


Biological Motivation

• An alternative approach: continuous search difficulty

• Based on similarity:
  – between targets and non-targets in the scene
  – between non-targets and non-targets in the scene

• Similar structural units do not need separate treatment

• Structural units similar to a possible target get high priority

Duncan & Humphreys [89]

Biological Motivation

(figure: search difficulty as a function of target to non-target similarity and non-target to non-target similarity; difficulty is highest when targets are similar to the non-targets and the non-targets are dissimilar to each other)

Duncan & Humphreys [89]

Biological Motivation
• Explains the pop-out vs. serial search phenomenon

(figures: example target and non-target displays; pop-out occurs when the target is dissimilar to homogeneous non-targets, serial search when targets resemble non-targets and the non-targets are heterogeneous)

Duncan & Humphreys [89]

Using Inner-scene Similarities

• Every candidate is characterized by a vector of n attributes
• n-dimensional metric space
  – A candidate is a point in the space
  – Some distance function d is associated with the space

Avraham & Lindenbaum [04], Avraham & Lindenbaum [05]

Using Inner-scene Similarities - Example
• One feature only: object area
• d: regular Euclidean distance

(feature space illustration)


Difficulty of Search

• The difficulty measure is the number of queries until the first target is found

• Two main factors
  – Distance between targets and non-targets
  – Distance between non-targets and non-targets

(feature space illustration)

Difficulty of Search - Cover

(feature space illustration: candidates covered by circles)

c: the number of circles in the cover
c will be our measure of the search difficulty
We need some constraint on the circles' size!

Difficulty of Search

dt: max-min target distance

dt-cover: a cover of the candidates by circles of diameter dt

Minimum dt-cover: c = the number of circles in the minimal dt-cover

(illustration: candidates in feature space covered by circles of diameter dt; in this example c = 7)

Difficulty of Search - examples
• Insects example: c = 3
• Easy search: c = 2
• Hard search: c = number of candidates

Define the Difficulty using c

• Lower bound: Every search algorithm needs c calls to the oracle before finding the first target in the worst case

• Upper bound: There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks

Difficulty of Search

Lower bound: every search algorithm needs c calls to the oracle before finding the first target, in the worst case.

(illustration: c well-separated clusters of candidates, each of diameter dt; in the worst case the single target lies in the last cluster queried)

Upper bound: there is an algorithm, FLNN (Farthest Labeled Nearest Neighbor), that needs at most c calls to the oracle to find the first target, for all search tasks.


FLNN – Farthest Labeled Nearest Neighbor

As the name suggests, each query goes to the candidate whose nearest already-labeled candidate is farthest away.

(illustration: candidates queried in order 1-5)

c is a tight bound!
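A minimal sketch of FLNN as the name suggests it works; the choice of the first query and the brute-force distance computation are illustrative assumptions:

```python
# Repeatedly query the candidate whose distance to its nearest already-labeled
# candidate is largest, until the oracle (the expensive recognizer) says "target".
import numpy as np

def flnn(points, oracle):
    """points: (n, d) feature vectors; oracle(i) -> True if candidate i is a target."""
    n = len(points)
    labeled, order = [], []
    current = 0                                   # arbitrary first query
    while True:
        order.append(current)
        labeled.append(current)
        if oracle(current):
            return current, order                 # first target found
        unlabeled = [i for i in range(n) if i not in labeled]
        if not unlabeled:
            return None, order
        # distance from each unlabeled candidate to its nearest labeled one
        d = np.linalg.norm(points[np.array(unlabeled)][:, None, :]
                           - points[np.array(labeled)][None, :, :], axis=2)
        current = unlabeled[int(np.argmax(d.min(axis=1)))]
```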

How do we compute c?
– Need to know dt
– Compute the minimal dt-cover
– Count the number of circles = c

Counting the circles is easy… but to know the exact dt we would need to know all the targets and non-targets, which is exactly what we are looking for, and computing the minimal dt-cover is NP-complete!

Upper & Lower Bounds on c

• Upper bounds:
  – The number of candidates
  – If dt is known to be larger than some d0: can approximate the cover size
• Lower bounds:
  – FLNN worst case
  – If dt is known to be larger than some d0: can approximate the cover size

Difficulty of Search
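One way to approximate such a cover size is a greedy cover, sketched below; the greedy strategy and the guess d0 for dt are assumptions, and this only brackets c, since the minimal cover itself is NP-complete to find:

```python
# Greedy d0-cover: repeatedly pick an uncovered candidate and cover everything
# within distance d0 of it; return how many circles were used.
import numpy as np

def greedy_cover_size(points, d0):
    """points: (n, d) feature vectors; returns the number of circles used."""
    remaining = list(range(len(points)))
    circles = 0
    while remaining:
        center = points[remaining[0]]
        dist = np.linalg.norm(points[remaining] - center, axis=1)
        remaining = [i for i, dd in zip(remaining, dist) if dd > d0]
        circles += 1
    return circles
```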


Improving FLNN

• What's wrong with FLNN?
  – Relates only to the nearest known neighbor
  – Finds only the first target efficiently
  – Cannot be easily extended to include top-down information

Efficient Algorithms

VSLE – Visual Search using Linear Estimation
• Each candidate has a probability of being a target
• Query the candidate with the highest probability
• Update the other candidates' probabilities according to the known results
  – Every known target / non-target affects the other candidates, more strongly the closer it is
• If we know the results for candidates 1,…,m, the remaining labels are estimated linearly (see "Linearly Estimating l(x_k)" at the end)
• Dynamic priority map (a minimal sketch follows below)

Efficient Algorithms
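A hedged sketch of a VSLE-style search loop: priorities start uniform, the highest-priority candidate is queried, and the rest are re-estimated linearly from the known labels with weights derived from pairwise distances. The Gaussian similarity kernel, the uniform prior, and the regularization term are illustrative assumptions, not the paper's exact choice of R and r:

```python
import numpy as np

def similarity(a, b, sigma=1.0):
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

def vsle_search(points, oracle, prior=0.5, sigma=1.0):
    n = len(points)
    priority = np.full(n, prior)
    labeled, labels, order = [], [], []
    for _ in range(n):
        # query the unlabeled candidate with the highest current priority
        i = int(np.argmax(np.where(np.isin(np.arange(n), labeled), -np.inf, priority)))
        order.append(i)
        y = float(oracle(i))                      # 1 = target, 0 = non-target
        labeled.append(i); labels.append(y)
        if y == 1.0:
            return i, order
        # linear estimate of each unqueried candidate's label from the known labels
        R = np.array([[similarity(points[a], points[b], sigma) for b in labeled]
                      for a in labeled]) + 1e-6 * np.eye(len(labeled))
        l = np.array(labels) - prior
        for k in range(n):
            if k in labeled:
                continue
            r = np.array([similarity(points[k], points[j], sigma) for j in labeled])
            priority[k] = prior + r @ np.linalg.solve(R, l)
    return None, order
```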

(figures: the dynamic priority map over successive queries; after each queried non-target, the priorities of similar candidates drop)

Combining Top-Down Information

• Simply specify the initial probabilities to match previously known data

• Add known target objects to the space. This will alter the probabilities accordingly and speed up search

Efficient Algorithms

Experiment 1: COIL-100

Efficient Algorithms

Columbia Object Image Library [96]

Experiment 1: COIL-100

• Features:
  – 1st-, 2nd-, and 3rd-order Gaussian derivatives → 9 basis filters
  – 5 scales → 9 × 5 = 45 features
• Euclidean distance

Efficient Algorithms

Rao & Ballard [95]
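A hedged sketch of such a 45-dimensional feature vector; the derivative orders via scipy.ndimage.gaussian_filter and the scale values are assumptions in the spirit of, not a reproduction of, Rao & Ballard [95]:

```python
# Responses of 1st-, 2nd- and 3rd-order Gaussian-derivative filters
# (9 basis filters) at 5 scales, sampled at the candidate's location.
import numpy as np
from scipy.ndimage import gaussian_filter

def jet_features(gray, y, x, sigmas=(1, 2, 4, 8, 16)):
    """Return a 45-dim feature vector for the pixel (y, x)."""
    gray = gray.astype(np.float32)
    # derivative orders (dy, dx): 2 first-, 3 second-, 4 third-order = 9 filters
    orders = [(0, 1), (1, 0),
              (0, 2), (1, 1), (2, 0),
              (0, 3), (1, 2), (2, 1), (3, 0)]
    feats = []
    for s in sigmas:
        for order in orders:
            response = gaussian_filter(gray, sigma=s, order=order)
            feats.append(response[y, x])
    return np.array(feats)  # 9 x 5 = 45 features
```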

Experiment 1: COIL-100

(plots: results vs. number of queries, for 10 cars and for 10 cups)

Experiment 2: hand segmented

• Every large segment is a candidate
• 24 candidates
• 4 targets

Berkeley hand segmented DB

Martin, Fowlkes, Tal & Malik [01]

Experiment 2: hand segmented

• Features: color histograms, separated into 8 bins each → 64 features

• Euclidean distance

Efficient Algorithms

Experiment 3: automatic color segmentation

• Automatically color-segmented image, used for face detection

Efficient Algorithms

Experiment 3: color segmentation

• 146 candidates

• 4 features: segment size, mean value of red, green and blue

• Euclidean distance

Efficient Algorithms

(plot: results vs. number of queries)

Combining top-down information

• Add known targets to the space

Efficient Algorithms

(plots: results vs. number of queries, without and with additional known targets)

Summary: Saliency Model vs. Similarity Model

Saliency model:
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• Segmentation
• Static priority map

Similarity model:
• Biologically motivated
• Uses bottom-up information, allows combining top-down information
• No segmentation
• Dynamic priority map
• Measures the search difficulty

Summary

• What is attention

• Aid object recognition tasks by choosing the area of interest

• Two approaches: the saliency model and the similarity model
  – Biological motivation
  – Algorithms

Thank You!

Linearly Estimating l(x_k)

A linear estimate of l(x_k) from the m known labels:
    l̂(x_k) = Σ_{i=1..m} α_i · l(x_i)
which, of course, is chosen to minimize the estimation error.

Solving a set of linear equations gives the estimate:
    R · α = r,   so   l̂(x_k) = r^T · R^{-1} · l
where l is the vector of known labels, and R (an m×m matrix) and r (an m-vector) are computed for i, j = 1,…,m.

R and r depend only on the distances, so they are computed once, in advance.
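A small sketch of that precomputation point: the full pairwise similarity matrix is built once from the distances, and at query time R and r are just sub-blocks of it. The Gaussian similarity kernel and the regularization term are assumptions; the paper's own mapping from distances to R and r is not reproduced here:

```python
import numpy as np

def precompute_similarity(points, sigma=1.0):
    # depends only on the pairwise distances, so it is computed once in advance
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def linear_estimate(S, labeled, labels, k):
    """Estimate the label of candidate k from the already-labeled candidates."""
    R = S[np.ix_(labeled, labeled)] + 1e-6 * np.eye(len(labeled))
    r = S[labeled, k]
    return r @ np.linalg.solve(R, np.asarray(labels, dtype=float))
```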
