SUN: A Model of Visual Salience Using Natural Statistics


Page 1: SUN:  A Model of Visual Salience Using Natural Statistics

1

SUN: A Model of Visual Salience Using Natural Statistics

Gary Cottrell
Lingyun Zhang
Matthew Tong
Tim Marks
Honghao Shan
Nick Butko
Javier Movellan
Chris Kanan

Page 2: SUN:  A Model of Visual Salience Using Natural Statistics

2

SUN: A Model of Visual Salience Using Natural Statistics
…and its use in object and face recognition

Gary Cottrell
Lingyun Zhang
Matthew Tong
Tim Marks
Honghao Shan
Nick Butko
Javier Movellan
Chris Kanan

Page 3: SUN:  A Model of Visual Salience Using Natural Statistics

3

Collaborators

Matthew H. Tong
Lingyun Zhang
Tim Marks
Honghao Shan

Page 4: SUN:  A Model of Visual Salience Using Natural Statistics

4

Collaborators

Nicholas J. Butko
Javier R. Movellan

Page 5: SUN:  A Model of Visual Salience Using Natural Statistics

5

Collaborators

Chris Kanan

Page 6: SUN:  A Model of Visual Salience Using Natural Statistics

6

Visual Salience

Visual salience is some notion of what is interesting in the world - it captures our attention.

Visual salience is important because it drives a decision we make a couple of hundred thousand times a day - where to look.

Page 7: SUN:  A Model of Visual Salience Using Natural Statistics

7

Visual Salience

Visual salience is some notion of what is interesting in the world - it captures our attention.

But that's kind of vague… The role of Cognitive Science is to make that explicit, by creating a working model of visual salience.

A good way to do that these days is to use probability theory - because, as everyone knows, the brain is Bayesian! ;-)

Page 8: SUN:  A Model of Visual Salience Using Natural Statistics

8

Data We Want to Explain

Visual search:
- Search asymmetry: a search for one object among a set of distractors is faster than vice versa.
- Parallel vs. serial search (and the continuum in between): an item "pops out" of the display no matter how many distractors, vs. reaction time increasing with the number of distractors (not emphasized in this talk…).

Eye movements when viewing images and videos.

Page 9: SUN:  A Model of Visual Salience Using Natural Statistics

9

Audience participation!

Look for the unique item.

Clap when you find it.

Page 10: SUN:  A Model of Visual Salience Using Natural Statistics

10

Page 11: SUN:  A Model of Visual Salience Using Natural Statistics

11

Page 12: SUN:  A Model of Visual Salience Using Natural Statistics

12

Page 13: SUN:  A Model of Visual Salience Using Natural Statistics

13

Page 14: SUN:  A Model of Visual Salience Using Natural Statistics

14

Page 15: SUN:  A Model of Visual Salience Using Natural Statistics

15

Page 16: SUN:  A Model of Visual Salience Using Natural Statistics

16


Page 17: SUN:  A Model of Visual Salience Using Natural Statistics

17


Page 18: SUN:  A Model of Visual Salience Using Natural Statistics

18

What just happened?

This phenomenon is called the visual search asymmetry:
- Tilted bars are more easily found among vertical bars than vice versa.
- Backwards "s"s are more easily found among normal "s"s than vice versa.
- Upside-down elephants are more easily found among right-side-up ones than vice versa.

Page 19: SUN:  A Model of Visual Salience Using Natural Statistics

19

Why is there an asymmetry?

There are not too many computational explanations:
- "Prototypes do not pop out"
- "Novelty attracts attention"

Our model of visual salience will naturally account for this.

Page 20: SUN:  A Model of Visual Salience Using Natural Statistics

20

Saliency Maps

Koch and Ullman, 1985: the brain calculates an explicit saliency map of the visual world.

Their definition of saliency relied on center-surround principles: points in the visual scene are salient if they differ from their neighbors.

In more recent years, there have been a multitude of definitions of saliency.

Page 21: SUN:  A Model of Visual Salience Using Natural Statistics

21

Saliency Maps

There are a number of candidates for the salience map: there is at least one in LIP, the lateral intraparietal area, a region of the parietal lobe, and also in the frontal eye fields and the superior colliculus… but there may be representations of salience much earlier in the visual pathway - some even suggest in V1.

But we won't be talking about the brain today…

Page 22: SUN:  A Model of Visual Salience Using Natural Statistics

22

Probabilistic Saliency

Our basic assumption: the main goal of the visual system is to find potential targets that are important for survival, such as prey and predators.

The visual system should direct attention to locations in the visual field with a high probability of the target class or classes.

We will lump all of the potential targets together in one random variable, T.

For ease of exposition, we will leave out our location random variable, L.

Page 23: SUN:  A Model of Visual Salience Using Natural Statistics

23

Probabilistic Saliency

Notation:
- x denotes a point in the visual field
- Tx: binary variable signifying whether point x belongs to a target class
- Fx: the visual features at point x

The task is to find the point x that maximizes the probability of a target given the features at point x: p(Tx = 1 | Fx = fx).

This quantity is the saliency of a point x. Note: this is what most classifiers compute!

Page 24: SUN:  A Model of Visual Salience Using Natural Statistics

24

Probabilistic Saliency

Taking the log and applying Bayes' Rule results in:

log p(Tx = 1 | Fx = fx) = log p(Fx = fx | Tx = 1) + log p(Tx = 1) − log p(Fx = fx)

Page 25: SUN:  A Model of Visual Salience Using Natural Statistics

25

Probabilistic Saliency

log p(Fx | Tx = 1):
- Probabilistic description of the features of the target
- Provides a form of top-down (endogenous, intrinsic) saliency
- Some similarity to Iconic Search (Rao et al., 1995) and Guided Search (Wolfe, 1989)

Page 26: SUN:  A Model of Visual Salience Using Natural Statistics

26

Probabilistic Saliency

log p(Tx = 1):
- Constant over locations for fixed target classes, so we can drop it.
- Note: this is a stripped-down version of our model, useful for presentations to undergraduates! ;-) - we usually include a location variable as well that encodes the prior probability of targets being in particular locations.

Page 27: SUN:  A Model of Visual Salience Using Natural Statistics

27

Probabilistic Saliency

−log p(Fx):
- This is called the self-information of this variable
- It says that rare feature values attract attention
- Independent of task
- Provides notion of bottom-up (exogenous, extrinsic) saliency

Page 28: SUN:  A Model of Visual Salience Using Natural Statistics

28

Probabilistic Saliency

Now we have two terms:
- Top-down saliency
- Bottom-up saliency

Taken together, this is the pointwise mutual information between the features and the target.
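Written out (a reconstruction from the terms on the surrounding slides, not an equation shown in the deck): once the constant prior log p(Tx = 1) is dropped, the bottom-up and top-down terms sum to the pointwise mutual information between the features at x and the presence of a target:

```latex
\underbrace{-\log p(F_x = f_x)}_{\text{bottom-up}}
\;+\;
\underbrace{\log p(F_x = f_x \mid T_x = 1)}_{\text{top-down}}
\;=\;
\log \frac{p(F_x = f_x,\, T_x = 1)}{p(F_x = f_x)\,p(T_x = 1)}
```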

Page 29: SUN:  A Model of Visual Salience Using Natural Statistics

29

Math in Action: Saliency Using "Natural Statistics"

For most of what I will be telling you about next, we use only the −log p(F) term, or bottom-up salience.

Remember, this means rare feature values attract attention.

This is a computational instantiation of the idea that "novelty attracts attention".

Page 30: SUN:  A Model of Visual Salience Using Natural Statistics

30

Math in Action: Saliency Using "Natural Statistics"

Remember, this means rare feature values attract attention. This means two things:
- We need some features (that have values!). What should we use?
- We need to know when the values are unusual: so we need experience.

Page 31: SUN:  A Model of Visual Salience Using Natural Statistics

31

Math in Action: Saliency Using "Natural Statistics"

Experience, in this case, means collecting statistics of how the features respond to natural images.

We will use two kinds of features:
- Difference of Gaussians (DoG)
- Independent Components Analysis (ICA) derived features

Page 32: SUN:  A Model of Visual Salience Using Natural Statistics

32

Feature Space 1: Differences of Gaussians

These respond to differences in brightness between the center and the surround. We apply them to three different color channels separately (intensity, Red-Green, and Blue-Yellow) at four scales: 12 features total.
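As a sketch, a 12-map feature bank of this kind can be assembled from center-minus-surround Gaussian blurs on three opponent channels at four scales. The scale values, surround ratio, and exact opponent-channel formulas below are illustrative assumptions; the talk does not specify them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_responses(rgb, sigmas=(1, 2, 4, 8), surround_ratio=1.6):
    """Center-surround (DoG) responses on intensity, red-green, and
    blue-yellow channels at four scales: 3 x 4 = 12 feature maps."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    channels = [
        (r + g + b) / 3.0,      # intensity
        r - g,                  # red-green opponent channel
        b - (r + g) / 2.0,      # blue-yellow opponent channel
    ]
    maps = []
    for chan in channels:
        for sigma in sigmas:
            center = gaussian_filter(chan, sigma)
            surround = gaussian_filter(chan, sigma * surround_ratio)
            maps.append(center - surround)  # difference of Gaussians
    return np.stack(maps)

feats = dog_responses(np.random.rand(32, 32, 3))
print(feats.shape)  # (12, 32, 32)
```

Note that a DoG bank like this responds zero to uniform regions, which is why constant backgrounds carry no bottom-up salience.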

Page 33: SUN:  A Model of Visual Salience Using Natural Statistics

33

Feature Space 1: Differences of Gaussians

Now, we run these over Lingyun's vacation photos, and record how frequently they respond.

Page 34: SUN:  A Model of Visual Salience Using Natural Statistics

34

Feature Space 2: Independent Components

Page 35: SUN:  A Model of Visual Salience Using Natural Statistics

35

Learning the Distribution

We fit a generalized Gaussian distribution to the histogram of each feature:

p(F_i; σ_i, θ_i) = θ_i / (2 σ_i Γ(1/θ_i)) · exp( −|F_i / σ_i|^θ_i )

where F_i is the ith filter response, θ_i is the shape parameter, and σ_i is the scale parameter.
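This distribution is available in SciPy as gennorm, whose density has exactly this form (its beta parameter plays the role of θ_i and its scale plays σ_i), so the fit can be sketched on synthetic data like this:

```python
import numpy as np
from scipy.stats import gennorm

# Draw synthetic "filter responses" from a known generalized Gaussian,
# then recover the shape and scale parameters by maximum likelihood.
responses = gennorm.rvs(beta=0.8, scale=2.0, size=20000,
                        random_state=np.random.default_rng(0))

theta, loc, sigma = gennorm.fit(responses, floc=0)  # location fixed at 0
print(round(theta, 2), round(sigma, 2))  # close to 0.8 and 2.0
```

In practice one would run this fit once per feature over responses collected from a large set of natural images.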

Page 36: SUN:  A Model of Visual Salience Using Natural Statistics

36

The Learned Distribution (DoGs)

- This is P(F) for four different features.
- Note these features are sparse - i.e., their most frequent response is near 0.
- When there is a big response (positive or negative), it is interesting!

Page 37: SUN:  A Model of Visual Salience Using Natural Statistics

37

The Learned Distribution (ICA)

For example, here's a feature:

Here's a frequency count of how often it matches a patch of image:

Most of the time, it doesn't match at all - a response of "0". BOREDOM!

Very infrequently, it matches very well - a response of "200". NOVELTY!

Page 38: SUN:  A Model of Visual Salience Using Natural Statistics

38

Bottom-up Saliency

We have to estimate the joint probability from the features. If all filter responses are independent:

−log p(F) = −Σ_i log p(F_i)

They're not independent, but we proceed as if they are. (ICA features are "pretty independent".)

Note: no weighting of features is necessary!
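A minimal sketch of this sum of self-information terms, assuming each feature's marginal has already been fit as a generalized Gaussian (the random maps and unit parameters below are placeholders for real fitted statistics):

```python
import numpy as np
from scipy.stats import gennorm

def bottom_up_salience(feature_maps, shapes, scales):
    """-log p(F) = -sum_i log p(F_i), assuming independent features."""
    salience = np.zeros(feature_maps.shape[1:])
    for fmap, theta, sigma in zip(feature_maps, shapes, scales):
        salience -= gennorm.logpdf(fmap, beta=theta, scale=sigma)
    return salience  # high where feature values are rare

feats = np.random.randn(12, 16, 16)            # stand-in for 12 feature maps
s = bottom_up_salience(feats, [1.0] * 12, [1.0] * 12)
print(s.shape)  # (16, 16)
```

Because the terms simply add, no per-feature weighting is needed: each feature contributes exactly its self-information.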

Page 39: SUN:  A Model of Visual Salience Using Natural Statistics

39

Qualitative Results: BU Saliency

(Columns: original image, human fixations, DoG salience, ICA salience)

Page 40: SUN:  A Model of Visual Salience Using Natural Statistics

40

Qualitative Results: BU Saliency

(Columns: original image, human fixations, DoG salience, ICA salience)

Page 41: SUN:  A Model of Visual Salience Using Natural Statistics

41

Qualitative Results: BU Saliency

Page 42: SUN:  A Model of Visual Salience Using Natural Statistics

42

Quantitative Results: BU Saliency

These are quantitative measures of how well the salience map predicts human fixations in static images.

We are best in the KL distance measure, and second best in the ROC measure.

Our main competition is Bruce & Tsotsos, who have essentially the same idea we have, except they compute novelty in the current image.

Model                     KL (SE)           ROC (SE)
Itti et al. (1998)        0.1130 (0.0011)   0.6146 (0.0008)
Bruce & Tsotsos (2006)    0.2029 (0.0017)   0.6727 (0.0008)
Gao & Vasconcelos (2007)  0.1535 (0.0016)   0.6395 (0.0007)
SUN (DoG)                 0.1723 (0.0012)   0.6570 (0.0007)
SUN (ICA)                 0.2097 (0.0016)   0.6682 (0.0008)

Page 43: SUN:  A Model of Visual Salience Using Natural Statistics

43

Related Work

Torralba et al. (2003) derives a similar probabilistic account of saliency, but:
- Uses the current image's statistics
- Emphasizes effects of global features and scene gist

Bruce and Tsotsos (2006) also use self-information as bottom-up saliency:
- Uses the current image's statistics

Page 44: SUN:  A Model of Visual Salience Using Natural Statistics

44

Related Work

The use of the current image's statistics means:
- These models follow a very different principle: they find rare feature values in the current image instead of feature values that are unusual in general - novelty.
- As we'll see, novelty helps explain several search asymmetries.
- Models using the current image's statistics are unlikely to be neurally computable in the necessary timeframe, as the system must collect statistics from the entire image to calculate local saliency at each point.

Page 45: SUN:  A Model of Visual Salience Using Natural Statistics

45

Search Asymmetry

Our definition of bottom-up saliency leads to a clean explanation of several search asymmetries (Zhang, Tong, and Cottrell, 2007). All else being equal, targets with uncommon feature values are easier to find.

Examples:
- Treisman and Gormican, 1988 - a tilted bar is more easily found among vertical bars than vice versa.
- Levin, 2000 - for Caucasian subjects, finding an African-American face among Caucasian faces is faster due to its relative rarity in their experience (basketball fans who have to identify the players do not show this effect).

Page 46: SUN:  A Model of Visual Salience Using Natural Statistics

46

Search Asymmetry Results

Page 47: SUN:  A Model of Visual Salience Using Natural Statistics

47

Search Asymmetry Results

Page 48: SUN:  A Model of Visual Salience Using Natural Statistics

48

Top-down Salience in Visual Search

Suppose we actually have a target in mind - e.g., find pictures, or mugs, or people in scenes.

As I mentioned previously, the original (stripped-down) salience model can be implemented as a classifier applied to each point in the image.

When we include location, we get (after a large number of completely unwarranted assumptions):

log saliencex = −logp(F = fx)

Self-information:Bottom-up saliency

1 24 4 34 4+ logp(F = fx |Tx =1)

Log likelihood:Top-down knowledge

of appearance

1 24 4 4 34 4 4+ logp(Tx =1|L =l)

Location prior:Top-down knowledge

of target's location

1 24 44 34 4 4

Page 49: SUN:  A Model of Visual Salience Using Natural Statistics

49

Qualitative Results (mug search)

Where we disagree the most with Torralba et al. (2006)

(Rows: GIST, SUN)

Page 50: SUN:  A Model of Visual Salience Using Natural Statistics

50

Qualitative Results (picture search)

Where we disagree the most with Torralba et al. (2006)

(Rows: GIST, SUN)

Page 51: SUN:  A Model of Visual Salience Using Natural Statistics

51

Qualitative Results (people search)

Where we agree the most with Torralba et al. (2006)

(Rows: GIST, SUN)

Page 52: SUN:  A Model of Visual Salience Using Natural Statistics

52

Qualitative Results (painting search)

This is an example where SUN and humans make the same mistake due to the similar appearance of TVs and pictures (the black square in the upper left is a TV!).

(Columns: image, humans, SUN)

Page 53: SUN:  A Model of Visual Salience Using Natural Statistics

53

Quantitative Results

Area Under the ROC Curve (AUC) gives basically identical results.

Page 54: SUN:  A Model of Visual Salience Using Natural Statistics

54

Saliency of Dynamic Scenes

Created spatiotemporal filters:
- Temporal filters: Difference of Exponentials (DoE)
- Highly active if there is change
- If features stay constant, the response goes to zero
- Resembles responses of some neurons (cells in the LGN)
- Easy to compute
- Convolve with spatial filters to create spatiotemporal filters
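One way to realize a DoE temporal filter is as the difference of two leaky running averages with different time constants: a big transient when the input changes, decaying to zero when it stays constant. The decay constants here are illustrative, not the values used in the talk.

```python
import numpy as np

def doe_response(signal, fast=0.7, slow=0.9):
    """Difference of exponentials: strong onset response to change,
    zero response once the input is constant."""
    f = s = signal[0]
    out = []
    for x in signal:
        f = fast * f + (1 - fast) * x   # fast exponential average
        s = slow * s + (1 - slow) * x   # slow exponential average
        out.append(f - s)
    return np.array(out)

step = np.array([0.0] * 10 + [1.0] * 100)  # sudden change, then constant
r = doe_response(step)
print(r.max() > 0.1, abs(r[-1]) < 1e-3)  # True True
```

This recursive form is what makes the filter cheap: each frame costs two multiply-adds per pixel per channel, with no temporal buffer of past frames.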

Page 55: SUN:  A Model of Visual Salience Using Natural Statistics

55

Saliency of Dynamic Scenes

Bayesian Saliency (Itti and Baldi, 2006): saliency is Bayesian "surprise" (different from self-information).
- Maintain a distribution over a set of models attempting to explain the data, P(M).
- As new data comes in, calculate the saliency of a point as the degree to which it makes you alter your models. Total surprise: S(D, M) = KL(P(M|D); P(M)).
- Better predictor than standard spatial salience.
- Much more complicated (~500,000 different distributions being modeled) than SUN dynamic saliency (days to run vs. hours or real-time).

Page 56: SUN:  A Model of Visual Salience Using Natural Statistics

56

Saliency of Dynamic Scenes

In the process of evaluating and comparing, we discovered how much the center bias of human fixations was affecting results.

Most human fixations are towards the center of the screen (Reinagel, 1999).

(Figure: accumulated human fixations from three experiments)

Page 57: SUN:  A Model of Visual Salience Using Natural Statistics

57

Saliency of Dynamic Scenes

Results varied widely depending on how edges were handled: how is the invalid portion of the convolution handled?

(Figure: accumulated saliency of three models)

Page 58: SUN:  A Model of Visual Salience Using Natural Statistics

58

Saliency of Dynamic Scenes

(Figure: initial results)

Page 59: SUN:  A Model of Visual Salience Using Natural Statistics

59

Measures of Dynamic Saliency

Typically, the algorithm is compared to the human fixations within a frame - i.e., how salient is the human-fixated point according to the model versus all other points in the frame.

This measure is subject to the center bias - if the borders are down-weighted, the score goes up.

Page 60: SUN:  A Model of Visual Salience Using Natural Statistics

60

Measures of Dynamic Saliency

An alternative is to compare the salience of the human-fixated point to the same point across frames.
- This underestimates performance, since often locations are genuinely more salient at all time points (e.g., an anchor's face during a news broadcast).
- It gives any static measure (e.g., a centered Gaussian) a baseline score of 0.
- This is equivalent to sampling from the distribution of human fixations, rather than uniformly.

On this set of measures, we perform comparably with Itti and Baldi (2006).

Page 61: SUN:  A Model of Visual Salience Using Natural Statistics

61

Saliency of Dynamic Scenes

Results using non-center-biased metrics on the human fixation data on videos from Itti (2005) - 4 subjects/movie, 50 movies, ~25 minutes of video.

Page 62: SUN:  A Model of Visual Salience Using Natural Statistics

62

Movies…

Page 63: SUN:  A Model of Visual Salience Using Natural Statistics

63


Page 64: SUN:  A Model of Visual Salience Using Natural Statistics

64


Page 65: SUN:  A Model of Visual Salience Using Natural Statistics

65


Page 66: SUN:  A Model of Visual Salience Using Natural Statistics

66

Demo…

Page 67: SUN:  A Model of Visual Salience Using Natural Statistics

67

Summary of this part of the talk

It is a good idea to start from first principles. Often the simplest model is best.

Our model of salience rocks:
- It does bottom-up
- It does top-down
- It does video (fast!)
- It naturally accounts for search asymmetries

Page 68: SUN:  A Model of Visual Salience Using Natural Statistics

70

Christopher Kanan

Garrison Cottrell

Page 69: SUN:  A Model of Visual Salience Using Natural Statistics

71

Motivation

Now we have a model of salience - but what can it be used for?

Here, we show that we can use it to recognize objects.

Christopher Kanan

Page 70: SUN:  A Model of Visual Salience Using Natural Statistics

72

One reason why this might be a good idea…

Our attention is automatically drawn to interesting regions in images.

Our salience algorithm is automatically drawn to interesting regions in images.

These are useful locations for discriminating one object (face, butterfly) from another.

Page 71: SUN:  A Model of Visual Salience Using Natural Statistics

73

Training Phase (learning object appearances):

Use the salience map to decide where to look. (We use the ICA salience map.)

Memorize these samples of the image, with labels (Bob, Carol, Ted, or Alice). (We store the ICA feature values.)

Christopher Kanan

Main Idea

Page 72: SUN:  A Model of Visual Salience Using Natural Statistics

74

Testing Phase (recognizing objects we have learned):

Now, given a new face, use the salience map to decide where to look.

Compare new image samples to stored ones - the closest ones in memory get to vote for their label.

Christopher Kanan

Main Idea
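The two phases above can be sketched as a simple exemplar memory. This is a minimal illustration, not the actual NIMBLE implementation: `sample_fn` stands in for the salience-driven fixation sampler that returns ICA feature vectors, and the nearest-neighbor vote here is the simplest version of the matching rule.

```python
import numpy as np

def train(images, labels, memory, sample_fn, n_fixations=10):
    """Training phase: store labeled feature samples taken at salient locations."""
    for image, label in zip(images, labels):
        for features in sample_fn(image, n_fixations):
            memory.append((features, label))

def classify(image, memory, sample_fn, n_fixations=10, k=1):
    """Testing phase: the closest stored samples vote for their label."""
    votes = {}
    for features in sample_fn(image, n_fixations):
        # Distance from this image fragment to every stored sample.
        dists = [np.linalg.norm(features - f) for f, _ in memory]
        for i in np.argsort(dists)[:k]:
            label = memory[i][1]
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

With one fixation per image and one stored sample per person, a fragment near Alice's stored features votes for Alice.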

Page 73: SUN:  A Model of Visual Salience Using Natural Statistics

75

[Figure: stored memories of Bob, stored memories of Alice, and new fragments from the test image]

Result: 7 votes for Alice, only 3 for Bob. It's Alice!

Page 74: SUN:  A Model of Visual Salience Using Natural Statistics

76

Voting

The voting process is actually based on Bayesian updating (and the Naïve Bayes assumption).

The size of the vote depends on the distance from the stored sample, using kernel density estimation.

Hence NIMBLE: NIM with Bayesian Likelihood Estimation.
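The voting rule can be written out as follows. Under the Naïve Bayes assumption, fixations are conditionally independent given the class, so the log posterior accumulates additively across fixations, and each class likelihood is a kernel density estimate over that class's stored samples. The function names and the Gaussian-kernel bandwidth are illustrative choices, not taken from the slides.

```python
import numpy as np

def log_likelihood(features, stored, bandwidth=1.0):
    """KDE estimate of log p(features | class) from that class's stored samples."""
    sq_dists = np.array([np.sum((features - s) ** 2) for s in stored])
    # Log of a mean of Gaussian kernels (unnormalized is fine for an argmax).
    return np.log(np.mean(np.exp(-sq_dists / (2 * bandwidth ** 2))) + 1e-300)

def classify(fixations, memory_by_class):
    """Bayesian updating: accumulate log evidence over fixations, pick the best class."""
    log_post = {c: 0.0 for c in memory_by_class}  # uniform prior over classes
    for features in fixations:
        for c, stored in memory_by_class.items():
            log_post[c] += log_likelihood(features, stored)
    return max(log_post, key=log_post.get)
```

A sample close to a stored exemplar contributes a large (near-zero) log likelihood; distant samples contribute vanishing evidence, which is the "size of the vote depends on distance" behavior described above.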

Page 75: SUN:  A Model of Visual Salience Using Natural Statistics

77

The ICA features do double duty:

They are combined to make the salience map - which is used to decide where to look.

They are stored to represent the object at that location.


Overview of the system
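A sketch of the double duty: in the SUN framework, bottom-up salience at a point is -log p(F) for the local feature vector F, and with independent ICA features this decomposes into a sum over features. The Laplacian marginal assumed here is a common model of ICA filter responses, used purely for illustration; the array shapes and function names are hypothetical.

```python
import numpy as np

def salience_map(responses):
    """responses: array (n_features, H, W) of ICA filter outputs at each pixel.
    Returns a salience map proportional to -log p(F), assuming independent
    features with Laplacian marginals (dropping additive constants)."""
    n, h, w = responses.shape
    sal = np.zeros((h, w))
    for i in range(n):
        r = responses[i]
        b = np.mean(np.abs(r)) + 1e-12   # Laplacian scale estimate for feature i
        sal += np.abs(r) / b             # -log p(r) under Laplace(0, b), up to a constant
    return sal

def sample_fixation(sal, rng):
    """Pick a fixation location with probability proportional to salience.
    The ICA feature vector at this location is what gets stored/compared."""
    p = sal.ravel() / sal.sum()
    idx = rng.choice(p.size, p=p)
    return np.unravel_index(idx, sal.shape)
```

Rare (high-magnitude) feature responses get low probability and hence high salience, so fixations concentrate on the statistically unusual regions.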

Page 76: SUN:  A Model of Visual Salience Using Natural Statistics

78

Compare this to standard computer vision systems: one pass over the image, and global features.

Image → Global Features → Global Classifier → Decision

NIMBLE vs. Computer Vision

Page 77: SUN:  A Model of Visual Salience Using Natural Statistics

79

Page 78: SUN:  A Model of Visual Salience Using Natural Statistics

80

Belief After 1 Fixation | Belief After 10 Fixations

Page 79: SUN:  A Model of Visual Salience Using Natural Statistics

81

Human vision works in multiple environments - our basic features (neurons!) don't change from one problem to the next.

We tune our parameters so that the system works well on Bird and Butterfly datasets - and then apply the system unchanged to faces, flowers, and objects.

This is very different from standard computer vision systems, which are tuned to a particular dataset.

Christopher Kanan

Robust Vision

Page 80: SUN:  A Model of Visual Salience Using Natural Statistics

82

Caltech 101: 101 Different Categories

AR dataset: 120 Different People with different lighting, expression, and accessories

Page 81: SUN:  A Model of Visual Salience Using Natural Statistics

83

Flowers: 102 Different Flower Species

Christopher Kanan

Page 82: SUN:  A Model of Visual Salience Using Natural Statistics

84

~7 fixations required to achieve at least 90% of maximum performance.

Christopher Kanan

Page 83: SUN:  A Model of Visual Salience Using Natural Statistics

85

So, we created a simple cognitive model that uses simulated fixations to recognize things. But it isn't that complicated.

How does it compare to approaches in computer vision?

Page 84: SUN:  A Model of Visual Salience Using Natural Statistics

86

Caveats:

As of mid-2010.

Only comparing to single-feature-type approaches (no "Multiple Kernel Learning" (MKL) approaches).

Still superior to MKL with very few training examples per category.

Page 85: SUN:  A Model of Visual Salience Using Natural Statistics

87

[Plot: performance vs. number of training examples (1, 5, 15, 30)]

Page 86: SUN:  A Model of Visual Salience Using Natural Statistics

88

[Plot: performance vs. number of training examples (1, 2, 3, 6, 8)]

Page 87: SUN:  A Model of Visual Salience Using Natural Statistics

89


Page 88: SUN:  A Model of Visual Salience Using Natural Statistics

90

More neurally and behaviorally relevant gaze control and fixation integration:

People don't randomly sample images.

A foveated retina.

Comparison with human eye movement data during recognition/classification of faces, objects, etc.

Page 89: SUN:  A Model of Visual Salience Using Natural Statistics

91

A fixation-based approach can work well for image classification.

Fixation-based models can achieve, and even exceed, some of the best models in computer vision.

…Especially when you don't have a lot of training images.

Christopher Kanan

Page 90: SUN:  A Model of Visual Salience Using Natural Statistics

92

Software and Paper Available at www.chriskanan.com

[email protected]

This work was supported by the NSF (grant #SBE-0542013) to the Temporal Dynamics of Learning Center.

Page 91: SUN:  A Model of Visual Salience Using Natural Statistics

93

Thanks!