
Page 1: Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign

Inferring Semantic Concepts from Community- Contributed Images and Noisy Tags

Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua †

† National University of Singapore  ‡ University of Illinois at Urbana-Champaign

Page 2:

Outline

Motivation

Sparse-Graph based Semi-supervised Learning

Handling of Noisy Tags

Inferring Concepts in Semantic Concept Space

Experiments

Summary and Future Work

Page 3:

Web Images and Metadata

Page 4:

Our task

No manual annotation is required.

Page 5:

Methods That Can Be Used

Model-based: SVM, GMM, …

Infer labels directly: k-NN, graph-based semi-supervised methods

Page 6:

Normal Graph-based Methods

A common disadvantage: they have certain parameters that require manual tuning, and performance is sensitive to this tuning.

The graphs are constructed based on visual distance, so there are many links between samples with unrelated concepts, and the label information is propagated incorrectly.

Locally linear reconstruction still needs to select neighbors based on visual distance.

Page 7:

Key Ideas of Our Approach

Sparse Graph based Learning

Noisy Tag Handling

Inferring Concepts in the Concept Space

Page 8:

Why Sparse Graph?

The human visual system seeks a sparse representation of an incoming image, using a few visual words from a feature vocabulary (neural science).

Advantages:
Reduces concept-unrelated links, avoiding the propagation of incorrect information;
Practical for large-scale applications, since the sparse representation reduces the storage requirement and is feasible for large-scale numerical computation.

Page 9:

Normal Graph vs. Sparse Graph

Normal Graph Construction.

Sparse Graph Construction.

Page 10:

Sparse Graph Construction

The ℓ1-norm based linear reconstruction error minimization can naturally lead to a sparse representation for the images *.

* J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.

The sparse reconstruction can be obtained by solving the following convex optimization problem:

min_w ||w||₁ , s.t. x = Dw

where:
w ∈ R^n : the vector of reconstruction coefficients;
x ∈ R^d : the feature vector of the image to be reconstructed;
D ∈ R^{d×n} (d < n) : the matrix formed by the feature vectors of the other images in the dataset.
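This basis-pursuit problem can be solved as a linear program by splitting w into nonnegative positive and negative parts. A minimal sketch (illustrative dimensions and random data, not the paper's setup), using SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_reconstruction(x, D):
    """min ||w||_1  s.t.  x = D w, as an LP over w = u - v with u, v >= 0."""
    d, n = D.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v) = ||w||_1
    A_eq = np.hstack([D, -D])               # equality constraint: D u - D v = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 30))           # overcomplete dictionary (d < n)
w_true = np.zeros(30)
w_true[[3, 17]] = [1.5, -2.0]               # a sparse generating code
x = D @ w_true
w = sparse_reconstruction(x, D)             # sparse w satisfying D w = x
```

The LP solution is a vertex of the feasible set, so it has at most d nonzero coefficients, which is where the sparsity comes from.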

Page 11:

Sparse Graph Construction (cont.)

To handle noise on certain elements of x, reformulate x = Dw + ξ, where ξ ∈ R^d is the noise term. Then:

min_ŵ ||ŵ||₁ , s.t. x = Bŵ

where B = [D, I] ∈ R^{d×(n+d)} and ŵ = [w^T, ξ^T]^T.

Set the edge weights of the sparse graph from each sample's reconstruction coefficient vector w^i:

W_ij = w^i(j), if j < i;
W_ij = w^i(j−1), if j > i;
W_ij = 0, if j = i.

(The index shift accounts for sample i being excluded from its own dictionary.)
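A sketch of the per-sample graph construction (an illustrative ℓ1 solver via an LP, small random data rather than real image features):

```python
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(x, B):
    """min ||w_hat||_1 s.t. x = B w_hat, as an LP over positive/negative parts."""
    d, m = B.shape
    c = np.ones(2 * m)
    res = linprog(c, A_eq=np.hstack([B, -B]), b_eq=x, bounds=[(0, None)] * (2 * m))
    return res.x[:m] - res.x[m:]

def build_sparse_graph(X):
    """X: d x N feature matrix. Returns the N x N weight matrix W, where row i
    holds sample i's reconstruction coefficients over the other samples
    (the noise part of the solution, over the identity block, is discarded)."""
    d, N = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        B = np.hstack([X[:, others], np.eye(d)])   # B = [D, I] handles noise
        w_hat = l1_reconstruct(X[:, i], B)
        W[i, others] = w_hat[:N - 1]               # index shift skips j = i
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))
W = build_sparse_graph(X)                          # diagonal is zero by construction
```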

Page 12:

Semi-supervised Inference

Result:

min_f Σ_i || f_i − Σ_j w_ij f_j ||² , s.t. f_i = y_i if x_i ∈ L

Let C = (I − W)^T (I − W) and M = C + C^T (a symmetric matrix), partitioned into labeled/unlabeled blocks:

M = [ M_ll  M_lu ;
      M_ul  M_uu ]

Then the labels of the unlabeled samples are:

f_u* = −M_uu^{−1} M_ul y_l

Page 13:

Semi-supervised Inference (cont.)

The problem with f_u* = −M_uu^{−1} M_ul y_l :

M_uu is typically very large for image annotation, so it is often computationally prohibitive to calculate its inverse directly.

An iterative solution with non-negative constraints may not be reasonable, since some samples may have negative contributions to other samples.

Solution: reformulate as the linear system M_uu f_u = −M_ul y_l .

The generalized minimum residual method (GMRES) can then be used to iteratively solve this large-scale sparse system of linear equations effectively and efficiently.
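A sketch of that solve with SciPy's GMRES, on a synthetic diagonally dominant stand-in for M_uu (the real matrices would come from the sparse graph):

```python
import numpy as np
from scipy.sparse import random as sparse_random, identity
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(2)
n_u, n_l = 200, 20
# A well-conditioned sparse stand-in for M_uu (symmetric, diagonally dominant).
A = sparse_random(n_u, n_u, density=0.05, random_state=42)
M_uu = A + A.T + 10 * identity(n_u)
M_ul = sparse_random(n_u, n_l, density=0.1, random_state=7)
y_l = rng.integers(0, 2, size=n_l).astype(float)   # binary labels of labeled set

rhs = -M_ul @ y_l
f_u, info = gmres(M_uu, rhs)                       # info == 0 means convergence
```

GMRES only needs matrix-vector products with M_uu, so the sparse system is solved without ever forming M_uu^{−1}.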

Page 14:

Different Types of Tags

√: correct; ?: ambiguous; m: missing

Page 15:

Handling of Noisy Tags

We cannot assume that the training tags are fixed during the inference process.

The noisy training tags should be refined during the label inference.

Solution: add two regularization terms to the inference framework to handle the noise:

min_{f, f̂_l} ||f − Wf||² + λ₁ ||f_l − f̂_l||² + λ₂ ||f̂_l − y||₁

Page 16:

Handling of Noisy Tags (cont.)

Solution (alternating steps):

Set the original label vector as the initial estimate of the ideal label vector, i.e., set f̂_l = y, and solve

min_f ||f − Wf||² + λ₁ ||f_l − f̂_l||²

to obtain a refined f_l.

Fix f_l and solve

min_{f̂_l} ||f_l − f̂_l||² + (λ₂/λ₁) ||f̂_l − y||₁

Use the obtained f̂_l to replace y in the graph-based method above; then solve the sparse system of linear equations to infer the labels of the unlabeled samples.
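The second step, assuming the ℓ1 penalty on the deviation from the original tags, has a closed-form soft-thresholding solution. A minimal sketch (illustrative values for f_l, y, and the weight mu = λ₂/λ₁):

```python
import numpy as np

def refine_labels(f_l, y, mu):
    """Solve min_z ||f_l - z||^2 + mu * ||z - y||_1 in closed form.
    Substituting t = z - y gives min_t ||(f_l - y) - t||^2 + mu * ||t||_1,
    whose minimizer is soft-thresholding of f_l - y at mu / 2."""
    r = f_l - y
    t = np.sign(r) * np.maximum(np.abs(r) - mu / 2, 0.0)
    return y + t

y = np.array([1.0, 0.0, 1.0, 0.0])        # noisy initial tags
f_l = np.array([0.9, 0.1, 0.2, 0.05])     # propagated label scores
f_hat = refine_labels(f_l, y, mu=0.4)     # only the large deviation (tag 3) moves
```

The ℓ1 penalty keeps most tags untouched and only revises those the propagation strongly contradicts, which matches the sparse-noise view of tag errors.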

Page 17:

Why Concept Space?

It is well-known that inferring concepts based on low-level visual features cannot work very well due to the semantic gap.

To bridge this semantic gap, construct a concept space and then infer the semantic concepts in this space. The semantic relations among different concepts are inherently embedded in this space, which helps the concept inference.

Page 18:

Requirements for the Concept Space

Low-semantic-gap: Concepts in the constructed space should have small semantic gaps;

Informative: These concepts can cover the semantic space spanned by all useful concepts (tags), that is, the concept space should be informative;

Compact: The set including all the concepts forming the space should be compact (i.e., the dimension of the concept space is small).

Page 19:

Concept Space Construction

Basic terms: Ω : the set of all concepts; Θ : the constructed concept set.

Three measures:
Semantic modelability: SM(Θ)
Coverage of the semantic concept space: CE(Θ, Ω)
Compactness: CP(Θ) = 1/#(Θ)

Objective:

max_Θ λ · SM(Θ) · CE(Θ, Ω) + (1 − λ) · CP(Θ)

Page 20:

Solution for Concept Space Construction

Simplification: fix the size of the concept space.

max_Θ λ · SM(Θ) + (1 − λ) · CE(Θ, Ω) , s.t. #(Θ) = m

Then we can transform this maximization to a standard quadratic programming problem.

See the paper for more details.

Page 21:

Inferring Concepts in Concept Space

Image mapping: x_i → D(i) = [D_{c1}(i), D_{c2}(i), …, D_{cm}(i)]

Query concept mapping: c_x → Q(c_x) = [p(c1 | c_x), p(c2 | c_x), …, p(cm | c_x)]

Rank the given images by cosine similarity:

sim(Q(c_x), D(i)) = Q(c_x)^T D(i) / ( ||Q(c_x)|| · ||D(i)|| )
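The ranking step is a plain cosine-similarity sort in the concept space; a minimal sketch with an illustrative 3-concept space and made-up mappings:

```python
import numpy as np

def rank_images(Q_cx, D):
    """Q_cx: (m,) concept-space query vector; D: (N, m), rows are image mappings.
    Returns image indices sorted by cosine similarity to the query (descending)."""
    sims = (D @ Q_cx) / (np.linalg.norm(D, axis=1) * np.linalg.norm(Q_cx) + 1e-12)
    return np.argsort(-sims), sims

Q = np.array([0.8, 0.1, 0.1])                 # p(c_k | c_x), illustrative
D = np.array([[0.9, 0.05, 0.05],              # image strongly on concept 1
              [0.1, 0.8, 0.1],                # image strongly on concept 2
              [0.4, 0.3, 0.3]])               # mixed image
order, sims = rank_images(Q, D)               # image 0 ranks first for this query
```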

Page 22:

The Whole Framework

Page 23:

Experiments

Dataset: NUS-WIDE-Lite (55,615 images)

Low-level features: Color Histogram (CH) and Edge Direction Histogram (EDH), combined by direct concatenation.

Evaluation: 81 concepts; AP and MAP.
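For reference, non-interpolated AP over a ranked list can be sketched as follows (one common definition, normalized by the number of relevant items retrieved; the paper's exact evaluation protocol may differ):

```python
def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 relevance flags in ranked order.
    AP = mean of precision@k over the positions k where a relevant item occurs."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

ap = average_precision([1, 0, 1, 1, 0])   # precisions at hits: 1/1, 2/3, 3/4
```

MAP is then simply the mean of this AP over the 81 evaluated concepts.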

Page 24:

Experiments

Ex1: Comparisons among Different Learning Methods

Page 25:

Experiments

Ex1: Comparisons among Different Learning Methods

Page 26:

Experiments

Ex2: Concept Inference with and without Concept Space

Page 27:

Experiments

Ex3: Inference with Tags vs. Inference with Ground-truth

We can achieve an MAP of 0.1598 by inference from tags in the concept space, which is comparable to the MAP obtained by inference from the ground-truth training labels.

Page 28:

Summary

Explored the problem of inferring semantic concepts from community-contributed images and their associated noisy tags.

Three key points:
Sparse-graph-based label propagation
Noisy tag handling
Inference in a low-semantic-gap concept space

Page 29:

Future Work

Training set construction from web resources

Page 30:

Thanks! Questions?