Yan Cui 2013.1.16
A novel supervised feature extraction and classification framework for land cover recognition of the off-land scenario
Yan Cui
2013.1.16
1. The related work
2. The integration algorithm framework
3. Experiments
The related work
Locally linear embedding
Sparse representation-based classifier
K-SVD dictionary learning
Locally linear embedding
LLE is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs.
Specifically, we expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold; the local reconstruction error of these patches is measured over the $k$ nearest neighbors by

$$e_i(w) = \Big\| x_i - \sum_{j=1}^{k} w_{ij}\, x_j \Big\|_2^2 \qquad (1)$$
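A minimal numpy sketch of solving Eq. (1) for one point, under the standard LLE sum-to-one constraint on the weights (function and variable names here are ours, not the slides'):

```python
import numpy as np

def lle_weights(X, i, neighbor_idx, reg=1e-3):
    """Solve Eq. (1) for point i: minimize ||x_i - sum_j w_j x_j||^2
    over its neighbors, subject to sum_j w_j = 1."""
    Z = X[neighbor_idx] - X[i]              # neighbors shifted to the origin, shape (k, d)
    G = Z @ Z.T                             # local Gram matrix, shape (k, k)
    k = len(neighbor_idx)
    G += reg * np.trace(G) * np.eye(k) / k  # regularize in case G is singular
    w = np.linalg.solve(G, np.ones(k))      # Lagrange solution up to scale
    return w / w.sum()                      # enforce the sum-to-one constraint

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
w = lle_weights(X, 0, [1, 2, 3])
print(w.sum())                              # ≈ 1.0 by construction
```

The regularization term follows the common practice of scaling by the trace of the local Gram matrix so the solve stays stable when neighbors are nearly coplanar.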
The same cost, with the weights $w_{ij}$ held fixed, is then minimized over the low-dimensional coordinates $y_i$:

$$e_i(w) = \Big\| y_i - \sum_{j=1}^{k} w_{ij}\, y_j \Big\|_2^2 \qquad (2)$$
Sparse representation-based classifier
The sparse representation-based classifier can be considered a generalization of nearest neighbor (NN) and nearest subspace (NS); it adaptively chooses the minimal number of training samples needed to represent each test sample.
$$A = [A_1, A_2, \ldots, A_c] = [x_{11}, x_{12}, \ldots, x_{1n_1}, \ldots, x_{i1}, \ldots, x_{in_i}, \ldots, x_{c1}, \ldots, x_{cn_c}] \in \mathbb{R}^{m \times n}$$

$$y = A\alpha \in \mathbb{R}^m \qquad (3)$$
$$\hat{\alpha}_0 = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad A\alpha = y \qquad (4)$$
$$\hat{\alpha}_1 = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad A\alpha = y \qquad (5)$$
The reconstruction of the test sample from the coefficients of class $i$ alone is $\hat{y}_i = A\,\delta_i(\hat{\alpha})$.
$$\min_i r_i(y) = \|y - \hat{y}_i\|_2^2 = \|y - A\,\delta_i(\hat{\alpha})\|_2^2 \qquad (6)$$
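A sketch of the SRC decision rule in Eqs. (4)–(6), using greedy orthogonal matching pursuit as a stand-in for the exact sparse solver (all function and variable names here are illustrative, not from the slides):

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Greedy approximation of Eq. (4): add one atom at a time."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        scores = np.abs(A.T @ residual)
        scores[support] = -1.0                       # never reselect an atom
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    alpha = np.zeros(A.shape[1])
    alpha[support] = coef
    return alpha

def src_classify(A, labels, y, n_nonzero=4):
    """Eq. (6): zero out all coefficients except class i's (delta_i)
    and assign y to the class with the smallest reconstruction residual."""
    alpha = omp(A, y, n_nonzero)
    residuals = {c: np.linalg.norm(y - A @ np.where(labels == c, alpha, 0.0))
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 12))
A /= np.linalg.norm(A, axis=0)           # unit-norm columns, as SRC assumes
labels = np.repeat([0, 1, 2], 4)         # 3 classes, 4 training samples each
y = A[:, 5]                              # a class-1 training column as the test sample
print(src_classify(A, labels, y))        # → 1 (the class of column 5)
```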
K-SVD dictionary learning

The original training samples contain redundancy as well as noise and trivial information that can hurt recognition. If the training set is large, computing the sparse representation is time-consuming, so an optimal dictionary is needed for sparse representation and classification.
The K-SVD algorithm
$$\min_{D,\,\gamma_i} \sum_i \|x_i - D\gamma_i\|_2^2 \quad \text{s.t.} \quad \|\gamma_i\|_0 \le T_0 \quad (i = 1, 2, \ldots, n)$$
The dictionary update stage:
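A minimal sketch of this stage, assuming the standard K-SVD rank-1 update (our naming, not the slides'): each atom $d_k$ and its coefficient row are refit by an SVD of the residual restricted to the samples that actually use atom $k$.

```python
import numpy as np

def ksvd_update_atom(X, D, Gamma, k):
    """One K-SVD dictionary-update step for atom k: best rank-1 fit
    of the residual over the samples whose code uses atom k."""
    users = np.nonzero(Gamma[k])[0]      # samples that use atom k
    if users.size == 0:
        return D, Gamma                  # unused atom: nothing to update
    # residual with atom k's contribution removed, restricted to those samples
    E = X[:, users] - D @ Gamma[:, users] + np.outer(D[:, k], Gamma[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                    # new unit-norm atom
    Gamma[k, users] = s[0] * Vt[0]       # matching coefficient row
    return D, Gamma

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 30))
D = rng.normal(size=(8, 5)); D /= np.linalg.norm(D, axis=0)
Gamma = rng.normal(size=(5, 30)) * (rng.random((5, 30)) < 0.4)
before = np.linalg.norm(X - D @ Gamma)
D, Gamma = ksvd_update_atom(X, D, Gamma, 0)
print(np.linalg.norm(X - D @ Gamma) <= before)   # True: the update never increases the error
```

The error is guaranteed not to increase because the previous (atom, coefficient-row) pair is itself a feasible rank-1 candidate, and the SVD picks the best one.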
The integration algorithm for supervised learning

Let $B = [B_1, B_2, \ldots, B_c] \in \mathbb{R}^{m \times n}$ be the training data matrix, where $B_i = [x_{i1}, x_{i2}, \ldots, x_{in_i}] \in \mathbb{R}^{m \times n_i}$ $(i = 1, 2, \ldots, c)$ is the training sample matrix of the $i$-th class. A test sample $y \in \mathbb{R}^m$ can be well approximated by a linear combination of the training data, i.e.

$$y = \sum_{i=1}^{n} \alpha_i x_i$$
Let $\delta_i(\alpha)$ be the representation coefficient vector with respect to the $i$-th class. To make SRC achieve good performance on all training samples, we want the within-class residual minimized and the between-class residual maximized simultaneously. We therefore redefine the following optimization problem:

$$\min_{\alpha}\; \|y - B\,\delta_i(\alpha)\|_2^2 - \sum_{j \ne i} \|y - B\,\delta_j(\alpha)\|_2^2 + \lambda \|\alpha\|_1 \qquad (15)$$
$$\min_{\alpha}\; \|y - D\,\delta_i(\alpha)\|_2^2 - \sum_{j \ne i} \|y - D\,\delta_j(\alpha)\|_2^2 + \lambda \|\alpha\|_1 \qquad (16)$$

where $\delta_k(\alpha)$ $(k \in \{i, j\})$ selects the coefficients of class $k$ and $D$ is the learned dictionary.
Let $\delta_i(\alpha)$ be the representation coefficient vector with respect to the $i$-th class, so the optimization problem in Eq. (16) turns into

$$\min_{\alpha}\; \|y - D\,\delta_i(\alpha)\|_2^2 - \|y - D\,\bar{\delta}_i(\alpha)\|_2^2 + \lambda \|\alpha\|_1 \qquad (17)$$

where $\bar{\delta}_i(\alpha)$ collects the coefficients of all classes other than $i$.
In order to obtain the sparse representation coefficients, we learn an embedding map $W = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^{m \times d}$ to reduce the dimensionality of the data while preserving the sparse reconstruction. The optimization problem in Eq. (17) then becomes

$$\min_{W,\,\alpha}\; \|W^T y - W^T D\,\delta_i(\alpha)\|_2^2 - \|W^T y - W^T D\,\bar{\delta}_i(\alpha)\|_2^2 + \lambda \|\alpha\|_1$$
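As a concrete illustration of the projected objective above, the within-class-minus-between-class residual can be evaluated in the embedded space as follows (function and variable names are ours, and the per-class split over all classes $j \ne i$ follows Eq. (15); this is a sketch, not the paper's implementation):

```python
import numpy as np

def projected_objective(W, D, y, alpha, labels, i, lam=0.1):
    """Within-class residual minus the sum of between-class residuals,
    measured after projecting by W^T, plus an l1 penalty on alpha."""
    Wy, WD = W.T @ y, W.T @ D

    def residual(c):
        # delta_c(alpha): keep only the coefficients belonging to class c
        return np.linalg.norm(Wy - WD @ np.where(labels == c, alpha, 0.0)) ** 2

    within = residual(i)
    between = sum(residual(j) for j in np.unique(labels) if j != i)
    return within - between + lam * np.abs(alpha).sum()

rng = np.random.default_rng(3)
W = rng.normal(size=(12, 4))             # embedding map, R^{12 x 4}
D = rng.normal(size=(12, 9))             # dictionary: 3 atoms per class
labels = np.repeat([0, 1, 2], 3)         # class label of each atom
alpha = rng.normal(size=9)
y = rng.normal(size=12)
print(projected_objective(W, D, y, alpha, labels, 0))
```

Minimizing this quantity over $\alpha$ (and $W$) pushes the class-$i$ reconstruction close to the projected test sample while pushing the other classes' reconstructions away, which is exactly the discriminative intent stated above.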
For a given test set $U = \{y_1, y_2, \ldots, y_l\}$, we can adaptively learn the embedding map, the optimal dictionary, and the sparse reconstruction coefficients through the following optimization problem:

$$\min_{W,\,D,\,\hat{\Lambda}}\; \|W^T U - W^T D\,\hat{\Lambda}_i\|_F^2 - \|W^T U - W^T D\,\bar{\Lambda}_i\|_F^2 + \lambda \|\hat{\Lambda}\|_1$$
The feature extraction and classification algorithm
Experiments for unsupervised learning
The effect of dictionary selection
Compare with pure feature extraction
Database descriptions
UCI databases: the Gas Sensor Array Drift Data Set and the Synthetic Control Chart Time Series Data Set.
The effect of dictionary selection
Compare with pure feature extraction
Experiments
The effect of dictionary selection
Compare with pure classification
Compare with pure feature extraction
Database descriptions
The effect of dictionary selection
Compare with pure classification
Compare with pure feature extraction
Thanks!
Questions & suggestions?