
Page 1: Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign

Inferring Semantic Concepts from Community- Contributed Images and Noisy Tags

Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua †

† National University of Singapore  ‡ University of Illinois at Urbana-Champaign

Page 2:

Outline

Motivation

Sparse-Graph based Semi-supervised Learning

Handling of Noisy Tags

Inferring Concepts in Semantic Concept Space

Experiments

Summary and Future Work

Page 3:

Web Images and Metadata

Page 4:

Our task

No manual annotation is required.

Page 5:

Methods That Can Be Used

Model-based: SVM, GMM, …

Infer labels directly: k-NN, graph-based semi-supervised methods

Page 6:

Normal Graph-based Methods

A common disadvantage: they have certain parameters that require manual tuning, and performance is sensitive to this tuning.

The graphs are constructed based on visual distance, so there are many links between samples with unrelated concepts, and the label information is propagated incorrectly.

Locally linear reconstruction still needs to select neighbors based on visual distance.

Page 7:

Key Ideas of Our Approach

Sparse Graph based Learning

Noisy Tag Handling

Inferring Concepts in the Concept Space

Page 8:

Why Sparse Graph?

The human visual system seeks a sparse representation of an incoming image, using a few visual words from a feature vocabulary (neural science).

Advantages:
Reduces concept-unrelated links, avoiding the propagation of incorrect information;
Practical for large-scale applications, since the sparse representation reduces the storage requirement and is feasible for large-scale numerical computation.

Page 9:

Normal Graph vs. Sparse Graph

Normal Graph Construction.

Sparse Graph Construction.

Page 10:

Sparse Graph Construction

The ℓ1-norm based linear reconstruction error minimization can naturally lead to a sparse representation for the images *.

* J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.

The sparse reconstruction can be obtained by solving the following convex optimization problem:

min_w ||w||₁ , s.t. x = Dw

where:
w ∈ R^n : the vector of reconstruction coefficients;
x ∈ R^d : the feature vector of the image to be reconstructed;
D ∈ R^{d×n} (d < n) : the matrix formed by the feature vectors of the other images in the dataset.
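This basis-pursuit problem can be solved as a linear program by splitting w into nonnegative positive and negative parts. A minimal sketch (illustrative dimensions and random data, not the paper's setup), using SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_reconstruction(x, D):
    """min ||w||_1  s.t.  x = D w, as an LP over w = u - v with u, v >= 0."""
    d, n = D.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v) = ||w||_1
    A_eq = np.hstack([D, -D])               # equality constraint: D u - D v = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 30))           # overcomplete dictionary (d < n)
w_true = np.zeros(30)
w_true[[3, 17]] = [1.5, -2.0]               # a sparse generating code
x = D @ w_true
w = sparse_reconstruction(x, D)             # sparse w satisfying D w = x
```

The LP solution is a vertex of the feasible set, so it has at most d nonzero coefficients, which is where the sparsity comes from.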

Page 11:

Sparse Graph Construction (cont.)

To handle noise on certain elements of x, reformulate x = Dw + ξ, where ξ ∈ R^d is the noise term. Then:

min_ŵ ||ŵ||₁ , s.t. x = Bŵ

where B = [D, I] ∈ R^{d×(n+d)} and ŵ = [w^T, ξ^T]^T.

Set the edge weights of the sparse graph from each sample's reconstruction coefficient vector w^i:

W_ij = w^i(j), if j < i;
W_ij = w^i(j−1), if j > i;
W_ij = 0, if j = i.

(The index shift accounts for sample i being excluded from its own dictionary.)
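A sketch of the per-sample graph construction (an illustrative ℓ1 solver via an LP, small random data rather than real image features):

```python
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(x, B):
    """min ||w_hat||_1 s.t. x = B w_hat, as an LP over positive/negative parts."""
    d, m = B.shape
    c = np.ones(2 * m)
    res = linprog(c, A_eq=np.hstack([B, -B]), b_eq=x, bounds=[(0, None)] * (2 * m))
    return res.x[:m] - res.x[m:]

def build_sparse_graph(X):
    """X: d x N feature matrix. Returns the N x N weight matrix W, where row i
    holds sample i's reconstruction coefficients over the other samples
    (the noise part of the solution, over the identity block, is discarded)."""
    d, N = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        B = np.hstack([X[:, others], np.eye(d)])   # B = [D, I] handles noise
        w_hat = l1_reconstruct(X[:, i], B)
        W[i, others] = w_hat[:N - 1]               # index shift skips j = i
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))
W = build_sparse_graph(X)                          # diagonal is zero by construction
```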

Page 12:

Semi-supervised Inference

Result:

min_f Σ_i || f_i − Σ_j w_ij f_j ||² , s.t. f_i = y_i if x_i ∈ L

Let C = (I − W)^T (I − W) and M = C + C^T (a symmetric matrix), partitioned into labeled/unlabeled blocks:

M = [ M_ll  M_lu ;
      M_ul  M_uu ]

Then the labels of the unlabeled samples are:

f_u* = −M_uu^{−1} M_ul y_l

Page 13:

Semi-supervised Inference (cont.)

The problem with f_u* = −M_uu^{−1} M_ul y_l :

M_uu is typically very large for image annotation, so it is often computationally prohibitive to calculate its inverse directly.

An iterative solution with non-negative constraints may not be reasonable, since some samples may have negative contributions to other samples.

Solution: reformulate as the linear system M_uu f_u = −M_ul y_l .

The generalized minimum residual method (GMRES) can then be used to iteratively solve this large-scale sparse system of linear equations effectively and efficiently.
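A sketch of that solve with SciPy's GMRES, on a synthetic diagonally dominant stand-in for M_uu (the real matrices would come from the sparse graph):

```python
import numpy as np
from scipy.sparse import random as sparse_random, identity
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(2)
n_u, n_l = 200, 20
# A well-conditioned sparse stand-in for M_uu (symmetric, diagonally dominant).
A = sparse_random(n_u, n_u, density=0.05, random_state=42)
M_uu = A + A.T + 10 * identity(n_u)
M_ul = sparse_random(n_u, n_l, density=0.1, random_state=7)
y_l = rng.integers(0, 2, size=n_l).astype(float)   # binary labels of labeled set

rhs = -M_ul @ y_l
f_u, info = gmres(M_uu, rhs)                       # info == 0 means convergence
```

GMRES only needs matrix-vector products with M_uu, so the sparse system is solved without ever forming M_uu^{−1}.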

Page 14:

Different Types of Tags

√: correct; ?: ambiguous; m: missing

Page 15:

Handling of Noisy Tags

We cannot assume that the training tags are fixed during the inference process.

The noisy training tags should be refined during the label inference.

Solution: add two regularization terms to the inference framework to handle the noise:

min_{f, f̂_l} ||f − Wf||² + λ₁ ||f_l − f̂_l||² + λ₂ ||f̂_l − y||₁

Page 16:

Handling of Noisy Tags (cont.)

Solution (alternating steps):

Set the original label vector as the initial estimate of the ideal label vector, i.e., set f̂_l = y, and solve

min_f ||f − Wf||² + λ₁ ||f_l − f̂_l||²

to obtain a refined f_l.

Fix f_l and solve

min_{f̂_l} ||f_l − f̂_l||² + (λ₂/λ₁) ||f̂_l − y||₁

Use the obtained f̂_l to replace y in the graph-based method above; then solve the sparse system of linear equations to infer the labels of the unlabeled samples.
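The second step, assuming the ℓ1 penalty on the deviation from the original tags, has a closed-form soft-thresholding solution. A minimal sketch (illustrative values for f_l, y, and the weight mu = λ₂/λ₁):

```python
import numpy as np

def refine_labels(f_l, y, mu):
    """Solve min_z ||f_l - z||^2 + mu * ||z - y||_1 in closed form.
    Substituting t = z - y gives min_t ||(f_l - y) - t||^2 + mu * ||t||_1,
    whose minimizer is soft-thresholding of f_l - y at mu / 2."""
    r = f_l - y
    t = np.sign(r) * np.maximum(np.abs(r) - mu / 2, 0.0)
    return y + t

y = np.array([1.0, 0.0, 1.0, 0.0])        # noisy initial tags
f_l = np.array([0.9, 0.1, 0.2, 0.05])     # propagated label scores
f_hat = refine_labels(f_l, y, mu=0.4)     # only the large deviation (tag 3) moves
```

The ℓ1 penalty keeps most tags untouched and only revises those the propagation strongly contradicts, which matches the sparse-noise view of tag errors.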

Page 17:

Why Concept Space?

It is well-known that inferring concepts based on low-level visual features cannot work very well due to the semantic gap.

To bridge this semantic gap, construct a concept space and then infer the semantic concepts in this space. The semantic relations among different concepts are inherently embedded in this space, which helps the concept inference.

Page 18:

Requirements for the Concept Space

Low-semantic-gap: Concepts in the constructed space should have small semantic gaps;

Informative: These concepts can cover the semantic space spanned by all useful concepts (tags), that is, the concept space should be informative;

Compact: The set including all the concepts forming the space should be compact (i.e., the dimension of the concept space is small).

Page 19:

Concept Space Construction

Basic terms: Ω : the set of all concepts; Θ : the constructed concept set.

Three measures:
Semantic modelability: SM(Θ)
Coverage of the semantic concept space: CE(Θ, Ω)
Compactness: CP(Θ) = 1/#(Θ)

Objective:

max_Θ λ · SM(Θ) · CE(Θ, Ω) + (1 − λ) · CP(Θ)

Page 20:

Solution for Concept Space Construction

Simplification: fix the size of the concept space.

max_Θ λ · SM(Θ) + (1 − λ) · CE(Θ, Ω) , s.t. #(Θ) = m

Then we can transform this maximization to a standard quadratic programming problem.

See the paper for more details.

Page 21:

Inferring Concepts in Concept Space

Image mapping: x_i → D(i) = [D_{c1}(i), D_{c2}(i), …, D_{cm}(i)]

Query concept mapping: c_x → Q(c_x) = [p(c1 | c_x), p(c2 | c_x), …, p(cm | c_x)]

Rank the given images by cosine similarity:

sim(Q(c_x), D(i)) = Q(c_x)^T D(i) / ( ||Q(c_x)|| · ||D(i)|| )
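The ranking step is a plain cosine-similarity sort in the concept space; a minimal sketch with an illustrative 3-concept space and made-up mappings:

```python
import numpy as np

def rank_images(Q_cx, D):
    """Q_cx: (m,) concept-space query vector; D: (N, m), rows are image mappings.
    Returns image indices sorted by cosine similarity to the query (descending)."""
    sims = (D @ Q_cx) / (np.linalg.norm(D, axis=1) * np.linalg.norm(Q_cx) + 1e-12)
    return np.argsort(-sims), sims

Q = np.array([0.8, 0.1, 0.1])                 # p(c_k | c_x), illustrative
D = np.array([[0.9, 0.05, 0.05],              # image strongly on concept 1
              [0.1, 0.8, 0.1],                # image strongly on concept 2
              [0.4, 0.3, 0.3]])               # mixed image
order, sims = rank_images(Q, D)               # image 0 ranks first for this query
```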

Page 22:

The Whole Framework

Page 23:

Experiments

Dataset: NUS-WIDE-Lite (55,615 images)

Low-level features: Color Histogram (CH) and Edge Direction Histogram (EDH), combined by direct concatenation.

Evaluation: 81 concepts; AP and MAP.
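For reference, non-interpolated AP over a ranked list can be sketched as follows (one common definition, normalized by the number of relevant items retrieved; the paper's exact evaluation protocol may differ):

```python
def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 relevance flags in ranked order.
    AP = mean of precision@k over the positions k where a relevant item occurs."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

ap = average_precision([1, 0, 1, 1, 0])   # precisions at hits: 1/1, 2/3, 3/4
```

MAP is then simply the mean of this AP over the 81 evaluated concepts.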

Page 24:

Experiments

Ex1: Comparisons among Different Learning Methods

Page 25:

Experiments

Ex1: Comparisons among Different Learning Methods

Page 26:

Experiments

Ex2: Concept Inference with and without Concept Space

Page 27:

Experiments

Ex3: Inference with Tags vs. Inference with Ground-truth

We can achieve an MAP of 0.1598 by inference from tags in the concept space, which is comparable to the MAP obtained by inference from the ground-truth training labels.

Page 28:

Summary

Explored the problem of inferring semantic concepts from community-contributed images and their associated noisy tags.

Three key points:
Sparse-graph-based label propagation
Noisy tag handling
Inference in a low-semantic-gap concept space

Page 29:

Future Work

Training set construction from web resources

Page 30:

Thanks! Questions?