cvpr tutorial final

MACHINE VISION GROUP

TUTORIAL

CVPR 2011

June 24, 2011

Image and Video Description with Local Binary Pattern

Variants

Prof. Matti Pietikäinen, Prof. Janne Heikkilä{mkp,jth}@ee.oulu.fi

Machine Vision Group

University of Oulu, Finland

http://www.cse.oulu.fi/MVG/


Texture is an important characteristic of images and videos


Contents

1. Introduction to local binary patterns in spatial and

spatiotemporal domains (30 minutes)

2. Some recent variants of LBP (20 minutes)

3. Local phase quantization (LPQ) operator (50 minutes)

4. Example applications (45 minutes)

5. Summary and some future directions (15 minutes)


Part 1: Introduction to local binary patterns in spatial and

spatiotemporal domains

Matti Pietikäinen


Property

Pattern ContrastTransformation

LBP in spatial domain

2-D surface texture is a two dimensional phenomenon characterized by:

spatial structure (pattern)

Thus,

1) contrast is of no interest in gray scale invariant analysis

2) often we need a gray scale and rotation invariant pattern measure

Gray scale no effect

Rotation no effectaffects

affects

?affectsZoom in/out


Local Binary Pattern and Contrast operators

Ojala T, Pietikäinen M & Harwood D (1996) A comparative study of texture measures

with classification based on feature distributions. Pattern Recognition 29:51-59.

6 5 2

7 6 1

9 8 7

1

1

1 11

0

00 1 2 4

8

163264

128

example thresholded weights

LBP = 1 + 16 +32 + 64 + 128 = 241

Pattern = 11110001

C = (6+7+8+9+7)/5 - (5+2+1)/3 = 4.7

An example of computing LBP and C in a 3x3 neighborhood:

Important properties:

LBP is invariant to any

monotonic gray level change

computational simplicity


- arbitrary circular neighborhoods

- uniform patterns

- multiple scales

- rotation invariance

- gray scale variance as contrast measure

Ojala T, Pietikäinen M & Mäenpää T (2002) Multiresolution gray-scale and rotation

invariant texture classification with Local Binary Patterns. IEEE Transactions on Pattern

Analysis and Machine Intelligence 24(7):971-987.

Multiscale LBP


70

51

7062

83

65

78

47

80

-19

0-8

13

-5

8

-23

10

0

10

1

0

1

0

1

1. Sample 2. Difference 3. Threshold

1*1 + 1*2 + 1*4 + 1*8 + 0*16 + 0*32 + 0*64 + 0*128 = 15

4. Multiply by powers of two and sum


An example of LBP image and histogram


gc and gp -1):

T = t(gc, g0, , gP-1)

Circular neighborhood(g1,g3,g5,g7 interpolated)

gc

R g0

g2g1

g4

g3

g6

g5 g7

gc

g0

g2 g1

g4

g3

g6g5 g7

R

Texture at gc is modeled using a local neighborhood of radius R,

which is sampled at P (8 in the example) points:

Square neighborhood

Foundations for LBP: Description of local image texture


Without losing information, we can subtract gc from gp:

T = t(gc, g0- gc P-1- gc)

Assuming gc is independent of gp-gc, we can factorize above:

T ~ t(gc) t(g0-gc P-1-gc)

t(gc) describes the overall luminance of the image, which is unrelated to

local image texture, hence we ignore it:

T ~ t(g0-gc P-1-gc)

Above expression is invariant wrt. gray scale shifts

Description of local image texture (cont.)


average t(gc,g0- gc) average absolute difference

between t(gc,g0- gc) and t(gc) t(g0-gc)

Pooled (G=16) from 32 Brodatz textures used in

[Ojala, Valkealahti, Oja & Pietikäinen: Pattern Recognition 2001]

Exact independence of t(gc) and t(g0-gc P-1-gc ) is not warranted in

practice:

Description of local image texture (cont.)


Invariance wrt. any monotonic transformation of the gray scale is achieved

by considering the signs of the differences:

T ~ t(s(g0-gc), , s(gP-1-gc))

where

s(x) = {1, x 0

0, x < 0

Above is transformed into a unique P-bit pattern code by assigning

binomial coefficient 2p to each sign s(gp-gc):

P-1

LBPP,R = s (gp-gc) 2p

p=0

LBP: Local Binary Pattern


U=2

U=0

U=4 U=6 U=8


-

patterns of LBP

1 = black

0 = white


Rotation of Local Binary Patterns

edge (15)

(30) (60) (120) (240) (225) (195) (135)(15)

rotation

Spatial rotation of the binary pattern changes the LBP code:


Rotation invariant local binary patterns

Formally, rotation invariance can be achieved by defining:

LBPP,Rri = min{ROR(LBPP,R -1}

(15) (30) (60) (120) (240) (225) (195) (135)

mapping

(15)


Uniform

patterns

Bit patterns with 0

or 2 transitions

0 1 or 1 0

when the pattern is

considered circular

All non-uniform

patterns assigned to

a single bin

58 uniform patterns

in case of 8

sampling points


1 P-1

VARP,R= - (gp - m)2

P p=0

where

1 P-1

m = - gpP p=0

VARP,R

invariant wrt. gray scale shifts (but not to any monotonic transformation like LBP)

invariant wrt. rotation along the circular neighborhood

Operators for characterizing texture contrast

Local gray level variance can be used as a contrast measure:

Usually using complementary contrast leads to a better

performance than using LBP alone, but this is ignored in many

comparative studies!


Quantization of continuous feature space

bin 0 bin 1 bin 2 bin 3

cut values

equal area

total distribution

Texture statistics are described with discrete histogramsMapping needed for continuous-valued contrast features

Nonuniform quantizationEvery bin have the same amount of total data

Highest resolution of the quantization is used where the number of entries

is largest


Estimation of empirical feature distributions

0 1 2 3 4 5 6 7 ... B-1

VARP,RLBPP,R

riu2 / VARP,R

LBPP,Rriu2

Joint histogram of

two operators

Input image (region) is scanned with the chosen operator(s), pixel by pixel,

and operator outputs are accumulated into a discrete histogram

LBPP,Rriu2

0 1 2 3 4 5 6 7 ... P+

1


Example problem: Unsupervised texture segmentation

LBP/C was used as texture operator

Segmentation algorithm consists of three phases:1. hierarchical splitting

2. agglomerative merging

3. pixelwise classification

Ojala T & Pietikäinen M (1999) Unsupervised texture segmentation using

feature distributions. Pattern Recognition 32:477-486.

hierarchical

splitting

agglomerative

merging

pixelwise

classification


Segmentation examples

Natural scene #2: 192x192 pixels

Natural scene #1: 384x384 pixels


Multiscale analysis

Information provided by N operators can be combined simply by summing

up operatorwise similarity scores into an aggregate similarity score:

N

LN = Ln e.g. LBP8,1riu2 + LBP8,3

riu2 + LBP8,5riu2

n=1

Effectively, the above assumes that distributions of individual operators are

independent


Image regions can be e.g. re-scaled prior to feature extraction

Multiscale analysis using images at multiple scales


Nonparametric classification principle

In Nearest Neighbor classification, sample S is assigned to the class

of model M that maximizes

B-1

L(S,M) = Sb ln Mbb=0

Instead of log-likelihood statistic, chi square distance or histogram

intersection is often used for comparing feature distributions.

The histograms should be normalized e.g. to unit length before classification,

if the sizes of the image windows to be analyzed can vary.

The bins of the LBP feature distribution can also be used directly as

features e.g. for SVM classifiers.


Rotation revisited

Rotation of an image by degrees

Translates each local neighborhood to a new location

Rotates each neighborhood by degrees

LBP histogram Fourier features

Ahonen T, Matas J, He C & Pietikäinen M (2009) Rotation invariant image description

with local binary pattern histogram fourier features. In: Image Analysis, SCIA 2009

Proceedings, Lecture Notes in Computer Science 5575, 61-70.

If = 45°, local binary patterns

00000001 00000010,

00000010 00000100, ...,

11110000 11100001, ...,

Similarly if = k*45°,

each pattern is circularly

rotated by k steps


Rotation revisited (2)

In the uniform LBP histogram, rotation of input image by k*45° causes a

cyclic shift by k along each row:


Rotation invariant features

LBP histogram features that are

invariant to cyclic shifts along the rows are

invariant to k*45° rotations of the input image

Sum (original rotation invariant LBP)

Cyclic autocorrelation

Rapid transform

Fourier magnitude


LBP Histogram Fourier Features

.

.

.

LBP-HF feature vector

.

.

.

Fourier

magnitudes

Fourier

magnitudes

1

0

/2)),((),(P

r

Puri

PI ernUhunH

Fourier

magnitudes

),(),(),( unHunHunH


Example

Input imageUniform

LBP histogram

Original rot.invariant LBP (red)

LBP-Histogram fourier (blue)


Description of interest regions with center-symmetric LBPs

Heikkilä M, Pietikäinen M & Schmid C (2009) Description of interest regions

with local binary patterns. Pattern Recognition 42(3):425-436.

n5

nc

n3 n1

n7

n0n4

n2

n6

Neighborhood

LBP =

s(n0 nc)20

+

s(n1 nc)21

+

s(n2 nc)2 2 +

s(n3 nc)2 3 +

s(n4 nc)24

+

s(n5 nc)25

+

s(n6 nc)26

+

s(n7 nc)2 7

Binary Pattern

CS-LBP =

s(n0 n4)20

+

s(n1 n5)21

+

s(n2 n6)22 +

s(n3 n7)23


Description of interest regions

InputRegion

x

y

CS-LBPFeatures

x

y

Featu

re

Region Descriptor

xy



Setup for image matching experiments

CS-LBP perfomed better than SIFT in image maching and categorization

experiments, especially for images with Illumination variations



LBP unifies statistical and structural approaches


Dynamic textures (R Nelson & R Polana: IUW, 1992; M Szummer & R

Picard: ICIP, 1995; G Doretto et al., IJCV, 2003)


Dynamic texture recognition

Determine the emotional

state ofthe face

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern

Analysis and Machine Intelligence 29(6):915-928. (parts of this were earlier

presented at ECCV 2006 Workshop on Dynamical Vision and ICPR 2006)


Dynamic textures

An extension of texture to the temporal domain

Encompass the class of video sequences that

exhibit some stationary properties in time

Dynamic textures offer a new approach to motion

analysis

- general constraints of motion analysis (i.e. scene is

Lambertian, rigid and static) can be relaxed [Vidal et al.,

CVPR 2005]


Volume Local Binary Patterns (VLBP)

Sampling in volume

Thresholding

Multiply

Pattern


LBP from Three Orthogonal Planes (LBP-TOP)

0 2 4 6 8 10 12 14 160

5

10x 10

4

P: Number of Neighboring Points

Length

of

Featu

re V

ecto

r

Concatenated LBPVLBP


32

10

-1-2

-3-1

0

1

3

2

1

0

-1

-2

-3

XT

Y

-3 -2 -1 0 1 2 3-3

-2

-1

0

1

2

3

X

Y

-3 -2 -1 0 1 2 3-1

0

1

X

T

-3 -2 -1 0 1 2 3-1

0

1

Y

T


LBP-TOP


DynTex database

Our methods outperformed the state-of-the-art in experimentswith DynTex and MIT dynamic texture databases



Results of LBP from three planes

5 10 15 20 25 300

0.2

0.4

0 100 200 300 400 500 600 700 8000

0.05

0.1

0.15

0.2

LBP XY XZ YZ Con weighted

8,8,8,1,1,1 riu2 88.57 84.57 86.29 93.14 93.43[2,1,1]

8,8,8,1,1,1 u2 92.86 88.86 89.43 94.57 96.29[4,1,1]

8,8,8,1,1,1 Basic 95.14 90.86 90 95.43 97.14[5,1,2]

8,8,8,3,3,3 Basic 90 91.17 94.86 95.71 96.57[1,1,4]

8,8,8,3,3,1 Basic 89.71 91.14 92.57 94.57 95.71[2,1,8]


Part 2: Some recent variants of LBP

Matti Pietikäinen


Gabor filtering for extracting more macroscopic information [Zhang et

al., ICCV 2005]

Preprocessing for illumination normalization [Tan & Triggs, AMFG 2007]

Edge dection- used to enhance the gradient information

Preprocessing prior to LBP feature extraction


Square or circular neighborhood is normally used- circular neighborhood important for rotation-invariant operators

Anisisotropic neighborhood (e.g. elliptic)

- improved results in face recognition [Liao & Chung, ACCV 2007,

and in medical image analysis [Nanni et al., Artif. Intell. Med. 2010]

Encoding similarities between patches of pixels [Wolf et al., ECCV 2008]

- they characterize well topological structural information of face appearance

Neighborhood topology


Using mean or median of the neihborhood for thresholding

Using a non-zero threshold [Heikkilä et al., IEEE PAMI 2006]

Local tenary patterns - encoding by 3 values [Tan & Triggs, AMGF 2007]

Extended quinary patterns encoding by 4 values [Nanni et al., Artif.

Intell. Med. 2010]

Soft LBP [Ahonen & Pietikäinen, Finsig 2007]

Scale invariant local ternary pattern [Liao et al., CVPR 2010]- for background subtraction applications

Thresholding and encoding


Robust LBP [Heikkilä et al., PAMI 2006]

In robust LBP, the term s(gp gc) is replaced with s(gp gc + a)

Allows bigger changes in pixel values without affecting thresholding results

- improved results in background subtraction [Heikkilä et al., PAMI 2006]

- was also used in CS-LBP interest region descriptor [Heikkilä et al., PR 2009]


Binary code: 11000000

0

0 0

0

10 1

0

0

0 0

1

-10 -1

1

0

0 0

1

00 0

1

Binary code: 00001100

Ternary code:1100(-1)(-1)00

Local ternary patterns (LTP) [Tan & Triggs, AMGF 2007]


Enlongated quinary patterns (EPQ) [Nanni et al., Artif. Intell. Med. 2010]

- Different ellipse orientations can be

used to optimize the performance


Feature selection e.g. with AdaBoost to reduce the number of bins [Zhang

et al., LNCS 2005]

Subspace methods projecting LBP features into a lower-dimensional

space [Shan et al., ICPR 2006], [Chan et al., ICB 2007]

Learning the most dominant and discriminative patterns [Liao et al., IEEE

TIP 2009], [Guo et al., ACCV 2010]

Feature selection and learning


Input imageUniform

LBP histogram

A learning-based LBP using Fisher separation criterion

[Guo et al., ACCV 2010]


LBP was designed as a complementary measure of local contrast,

using joint LBP/C or LBP/VAR histograms

LBPV puts the local contrast into 1-dimensional histogram [Guo et al.,

Pattern Recogn. 2010]

Completed LBP (CLBP) considers complementary sign and magnitude

vectors [Guo et al., IEEE TIP 2010]

Weber law descriptor (WLD) includes excitation and orientation

components [Chen et al., IEEE PAMI 2010]

Use of complementary contrast/magnitude information


Completed LBP [Guo et al., IEEE TIP 2010]

a) 3 x 3 sample block

b) The local differences

c) The sign component

d) The magnitude component


WLD: Weber law descriptor [Chen et al., PAMI 2010]

Composed of

excitation

and orientation

components


Local Phase Quantization (LPQ)

Janne Heikkilä

Tutorial: Image and Video Description with

Local Binary Pattern Variants


Motivation

Local descriptors are widely used in image analysis

Interest point descriptors (e.g. SIFT, SURF)

Texture descriptors (e.g. LBP, Gabor texture features)

Descriptors should be robust to various degradations including

geometric distortions, illumination changes and blur.

Blur-sensitivity of the descriptors has not been much studied in

the literature.

Sharp images are often assumed.

We propose using Fourier phase information for blur insensitive

texture analysis.


Previous work on phase based methods

Phase correlation [Kuglin & Hines, ICCS 1975] has been

successfully used for image registration.

Fourier features based on

the phase rather than the amplitude do not seem to be useful for

Oppenheimer & Lim [Proc. IEEE 1981] showed the importance

of the phase information in signal reconstruction.

Daugman [PAMI 1993] used phase-quandrant coding of multi-

scale 2-D Gabor wavelet coefficients for iris recognition.

Fischer & Bigün [SCIA 1995] presented a method for texture

bondary tracking based on Gabor phase.

Zhou et al. [ICIP 2001] introduced a texture feature based on a

histogram of local Fourier coefficients (magnitude and phase).

Zhang et al. [TIP 2007] proposed histogram of Gabor phase

patterns (HGPP) for face recognition.


Image blur

Blur is typically caused by defocus or motion.

Blur is usually harmful for image analysis.

Two approaches:

restoration (deblurring)

using blur-insensitive features


Performance of some texture analysis

methods under blurOutex database with

artificially generated circular

blur.

Examples: blur with radius

r=0,1,2.

Methods:

LBP: [Ojala et al., PAMI

2002]

Gabor: [Manjunath & Ma,

PAMI 1996]

Results

Significant drop-off in the

accuracy!

0 0.5 1 1.5 20

20

40

60

80

100

Circular blur radius [pixels]

Cla

ssifi

catio

n ac

cura

cy

LBP8,3

Gabor


Blur model

Image blur can be modelled by

where f is the image and h is the PSF.

Examples of discrete PSFs: (a) out of focus blur, (b) linear

motion blur, (c) Gaussian blur, and (d) arbitrary blur.

In many cases the PSF is centrally symmetric.


Blur invariance of phase spectrum

In frequency domain the blur model corresponds to

or

For centrally symmetric blur

We notice that


Blur invariance of phase spectrum

image

Original

Blurred

(circular

blur d=4)

Phase

remains

unchanged

amplitude spectrum phase spectrum


What frequencies should be used?

Frequency responses of 3 centrally symmetric PSFs:

For low frequencies typically H(u) 0

It is safe to select low frequency phase angles as blur invariants.

Low frequency components carry most of the information.


From global to local

For local descriptors it is necessary to compute the phase

locally.

Local frequency characteristics can be extracted using

frequency selective filters.

Higher spatial resolution implies lower frequency resolution and

vice versa.

Local phase cannot be measured accurately!

Blur does not have to be spatially invariant.


Computing the local phase

A set of frequency selective filters is needed. Selection criteria:

Low spatial frequency (guarantees that H(u) 0)

Narrow band (less distortion)

The phase is derived from the analytic signal

fA(x) = f(x) i fH(x)

where fH(x) is the Hilbert transform of f(x)

Filters are complex valued (real and imaginary parts).

1-D case: quadrature filters (Gabor, log-Gabor, derivative of

Gaussian, Cauchy, etc.)

2-D case: not well-defined, many alternatives:

Monogenic signal (isotropic)

Directional 2-D quadrature filters

Short-term Fourier transform (STFT)


Examples of complex valued filters

Four 15 by15 filters with lowest non-zero frequency components:

Gaussian derivative

Row 1: real part,

Row 2: imaginary part,

Row 3: frequency response

Notice: Filters are strongly

truncated!

STFT (uniform window) STFT (Gaussian window)


Short-Term Fourier Transform (STFT)

Discrete version of the STFT is defined by

where k is the position and u is the 2-D frequency.

Various alternatives exist for the window function w(x), for

example (m is the window size):

(Gabor filter)

STFT is separable

It can be implemented with 1-D convolutions.


Quantization of the phase angle

Phase angles could be used directly but it would result in long feature vectors.

Quantizing the phase to four angles [0, /2, , 3 /2] causes only moderate distortion.

original phase quantized


Local Phase Quantization (LPQ)

The local frequency coefficients F(k, u) are computed for all

pixels at some frequencies .

We use the following frequencies (L = 4):

where a = 1/m.

The coefficients are quantized using:

that results in a two-bit code.

An 8-bit codeword is obtained from 4 coefficients.

Codewords are histogrammed and used as a feature vector

(LPQ descriptor).

Re0 1

2 3

Im


Basic LPQ descriptor

Illustration of the algorithm:


Experimental results (1)

Classification with artificially blurred textures (Outex TC 00001):

LPQ

(uniform)

LBP

Gabor

filter bank

m = 7


Can we do better?

The filter responses are correlating. Example: responses of filters

5 and 7:

Scalar quantization is efficient only if the samples are

independent.

Performance can be improved using decorrelation (PCA).


Image model

Neighboring pixels are strongly correlated in natural images.

Let denote the pixel positions

in an m by m image patch

Pixels in fk are considered as realizations of a random process.

Correlation coefficient between adjacent pixels is denoted by .

Covariance between two positions li and lj is assumed to be

exponentially related to their Euclidean distance so that

. Notice: blur is now assumed to be isotropic!

The covariance matrix of fk can be expressed as

Covariance matric could be also learned from training data.


STFT revisited

Using vector notation we can rewrite the STFT as

where

Let us consider all the frequency samples .

We separate the real and imaginary of the filters:

where

and


STFT revisited (2)

The frequency samples and the pixel values have linear

dependence:

where

and


Decorrelated LPQ

The covariance matrix of the frequency samples becomes

If L = 4, D is an 8 by 8 matrix.

We can employ whitening transform

where V is an orthonormal matrix derived by using SVD:

The vector Gk is quantized

8-bit integers obtained are used in the same way as in the basic

LPQ.Reference: Ojansivu V & Heikkilä J (2008) Blur insensitive texture classification using local

phase quantization. Image and Signal Processing, ICISP 2008 Proceedings, Lecture Notes

in Computer Science 5099:236-243.


Experimental results (2)

Classification with artificially blurred textures (Outex TC 00001):

LPQ

(uniform)

LBP

Gabor

filter bank

LPQ

(decorrelated)

m = 7,

= 0.9

BIF


Comparison with other texture descriptors

* truncated filters** inherently not rotation invariant*** with the CUReT dataset

LPQ BIF

[Grosier &

Griffin, CVPR

2008]

VZ-joint

[Varma &

Zisserman,

CVPR 2003]

VZMR8

[Varma &

Zisserman,

IJCV 2005]

Filter bank yes* yes no yes

Number of filters 8 24 - 38

Multiscale no yes no yes

Rotation invariant no** yes no** yes

Codebook fixed fixed learned learned

Histogram length 256 1296 610*** 610***


Results with different filters

Three types of filters (m = 7) were tested

STFT using uniform weighting with and without decorrelation

STFT using Gaussian weighting (Gabor filter) w & w/o decorrelation.

Quadrature filter (derivative of Gaussian) w & w/o decorrelation.

Outex


Results with different filters (2)

Sharp images (Outex02, m = 3):

Varying filter sizes:

Brodatz data (m = 7):


Application: face recognition

A modified version of [Ahonen et al., PAMI 2006].

CMU PIE database

Face Recognition Grand Challenge (1.0.4)

Reference: Ahonen T, Rahtu E, Ojansivu V & Heikkilä J (2008) Recognition of blurred faces

using local phase quantization. Proc. 19th International Conference on Pattern Recognition

(ICPR 2008), Tampa, FL, 4p.


Rotation invariant LPQ

We define blur insensitive local characteristic orientation by

where i =2 i/M and M is the

number of samples (we use M = 36).

Characteristic orientation is used to

normalize the orientation locally for

each location k.

LPQ is computed for the normalized

patches.

Results with Outex10 (rotation

and blur):

Reference: Ojansivu V, Rahtu E & Heikkilä J (2008) Rotation invariant blur insensitive texture

analysis using local phase quantization. Proc. 19th International Conference on Pattern

Recognition (ICPR), Tampa, FL, 4 p.

RI-LPQ

LBP-HF

LBPRIU2


Spatio-temporal texture analysis

A local patch in a video is considered as a 3-D volume.

3-D STFT can be computed and the phase can be extracted.

Example:

Original frames Magnitude part Phase part


Spatio-temporal volume LPQ

For a 3-D descrete domain STFT there are 13 independent low

frequency coefficient (excluding the DC-coefficient):

After 2-bit quantization this would lead to

26-bit presentation, and the histogram

would be 226 6.7e+7 which is too much.

PCA is applied to compress the

representation.

The procedure is almost the same

as with the 2-D case.

Data vector is reduced from 26 to 10,

which leads to a 1024-bin histogram.Reference: Päivärinta VJ, Rahtu E & Heikkilä J (2011)

Volume local phase quantization for blur-insensitive dynamic texture classification.

SCIA 2011, LNCS 6688, pp. 360 369.


Spatio-temporal model

The covariance between the pixels values in a 3-D volume is

assumed to follow the exponential model:

where

and

s and t are the correlation coefficients of spatially and

temporally adjacent pixels, respectively.

Only 10 most significant eigenvectors are selected for the PCA

from


Volume LPQ (VLPQ)


LPQ-TOP

First proposed by Jiang et al. [FG 2011].

Similar to LBP-TOP

LBP has been replaced with LPQ.

LPQ is computed in three planes

XY, XT, and YT.

Histograms are concatenated.


Experiments

Experiments are made using the DynTex++ database from

Ghanem B. & Ahuja N. (2010) Maximum margin distance learning

for dynamic exture recognition. In: K. Daniilidis, P. Maragos & N.

Paragios (eds.) Computer Vision - ECCV 2010, Lecture Notes in

Computer Science, vol. 6312, Springer Berlin / Heidelberg, pp.

223 236.

3600 gray scale dynamic textures of size 50×50×50. The

textures are divided into 36 classes, each holding 100 videos.

Examples:


Experimental results

Correlation coefficients s = 0.1, t = 0.1, uniform weighting,

window size 5 x 5 x 5.

Spatially

and

temporally

varying blur:

Notice: the maximum margin distance learning technique (Ghanem

B. & Ahuja N. ECCV-2010 ) achieved only 63.7 % accuracy with

sharp images.


Implementation and performance

Matlab implementations of LPQ, RI-LPQ, VLPQ and LPQ-TOP

can be downloaded from

http://www.cse.oulu.fi/Downloads/LPQMatlab

Execution times for a DynTex++ video sequence (50 x 50 x 50

pixels):

Platform: MATLAB R2010a on a 2.4 GHz, 96 GB Sunray server


Papers where LPQ has been usedChan C, Kittler J, Poh N, Ahonen T & Pietikäinen M (2009) (Multiscale) local phase

quantization histogram discriminant analysis with score normalisation for robust face

recognition, In Proc. IEEE Workshop on Video-Oriented Object and Event Classification,

633 640.

Nishiyama M, Hadid A, Takeshima H, Shotton J, Kozakaya T & Yamaguchi O (2011) Facial

deblur inference using subspace analysis for recognition of blurred faces. IEEE

Transactions on Pattern Analysis and Machine Intelligence 33(4): 838-845.

Brahnam S, Nanni L, Shi J-Y & Lumini A (2010) Local phase quantization texture descriptor

for protein classification., Proc. International Conference on Bioinformatics and

Computational Biology (Biocomp2010), Las Vegas, Nevada, USA, 7 p.

Nanni L, Lumini A & Brahnam S (2010) Local binary patterns variants as texture descriptors

for medical image analysis, Artificial Intelligence in Medicine 49(2):117-125.

Fiche C, Ladret P & Vu NS (2010) Blurred Face Recognition Algorithm Guided by a No-

Reference Blur Metric. Image Processing: Machine Vision Applications III, 9 p.

Jiang B, Valstar MF & Pantic M (2011) Action unit detection using sparse appearance

descriptors in space-time video volumes. Proc. 9th IEEE Conference on Automatic Face

and Gesture Recognition (FG 2011), Santa Barbara, CA, 314-321.

Yang S & Bhanu B (2011) Facial Expression Recognition Using Emotion Avatar Image.

Proc. Workshop on Facial Expression Recognition and Analysis Challenge FERA2011,

Santa Barbara, CA, 866-871.

Dhall A., Asthana A., Goecke R., and Gedeon T. (2011) Emotion Recognition Using PHOG

and LPQ features. Proc. Workshop on Facial Expression Recognition and Analysis

Challenge FERA2011, Santa Barbara (CA), 878-883.


Conclusions

Many previous works exist where phase information has been

utilized in image analysis.

Our contribution is the framework for constructing blur

insensitive texture descriptors that are both robust and

computationally efficient.

LPQ and its variants can be used for characterizing blurred still

images and videos.

Good performance is also achieved with non-blurred data.

STFT with uniform weighting seems to be a good choice for

computing the phase.

Filters can be truncated without loss of important information.

Decorrelation improves the accuracy when blur is isotropic.

LPQ has been already used by many researchers in fields of

medical image analysis and facial image analysis.


Part 4: Example applications

Matti Pietikäinen


Face analysis using local binary patterns

Face recognition is one of the major challenges in computer vision

Ahonen et al. proposed (ECCV 2004, PAMI 2006) a face descriptor based

on LBPs

This method has already been adopted by many leading scientists and

groups

Computationally very simple, excellent results in face recognition and

authentication, face detection, facial expression recognition, gender

classification


Face description with LBP

Ahonen T, Hadid A & Pietikäinen M (2006) Face description with local binary

patterns: application to face recognition. IEEE Transactions on Pattern Analysis

and Machine Intelligence 28(12):2037-2041. (an early version published at

ECCV 2004)

A facial description for face recognition:


Weighting the regionsBlock size Metrics Weighting

18 * 21

130 * 150

Feature vector length 2891


Local Gabor Binary Pattern Histogram Sequence [Zhang et al., ICCV

2005]

Illumination normalization by preprocessing] prior to LBP/LPQ feature extraction

[Tan & Triggs, AMFG 2007]

Improving the robustness of LBP-based face recognition


Illumination invariance using LBP with NIR imaging

S.Z. Li et al. [IEEE PAMI, 2007]


Case: Trusted biometrics under spoofing attacks (TABULA

RASA) 2010-2014 (http://www.tabularasa-euproject.org/)

The project will address some of the issues of direct

(spoofing) attacks to trusted biometric systems. This is an issue

that needs to be addressed urgently because it has recently

been shown that conventional biometric techniques, such as

fingerprints and face, are vulnerable to direct (spoof) attacks.

Coordinated by IDIAP, Switzerland

We will focus on face and gait recognition


Example of 2D face spoofing attack

LBP is very powerful, discriminating printing artifacts and differences in light

reflection

- outperformed results of Tan et al. [ECCV 2010], and LPQ and Gabor features


Automatic landscape mode detection

The aim was to develop and implement an algorithm that automatically

classifies images to landscape and non-landscape categories

The analysis is solely based on the visual content of images.

The main criterion is to find an accurate but still computationally light

solution capable of real-time operation.

Huttunen S, Rahtu E, Heikkilä J, Kunttu I & Gren J (2011) Real-time detection of landscape

scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.


Landscape vs. non-landscape

Definition of landscape and non-landscape images is not

straightforward

If there are no distinct and easily separable objects present in a

natural scene, the image is classified as landscape

The non-landscape branch consists of indoor scenes and other

images containing man-made objects at relatively close distance


Data set

The images used for training and testing were downloaded from the

PASCAL Visual Object Classes (VOC2007) database and the Flickr

site

All the images were manually labeled and resized to QVGA

(320x240).

Training: 1115 landscape images and

2617 non-landscape images

Testing: 912 and 2140, respectively


The approach

Simple global image representation

based on local binary pattern (LBP)

histograms is used

Two variants:

Basic LBP

LBP In+Out

SVM classifier

Histogram

computation

SVM classifier

training

Feature

extraction


Classification results


Classification examples

Landscape as

landscape

(TP)

Non-landscape

as

landscape

(FP)

Non-landscape as

non-landscape

(TN)

Landscape

as

non-landscape

(FN)


Summary of the results


Real-time implementation

The current real-time implementation coded in C relies on the basic

LBPb

Performance analysis

Windows PC with Visual Studio 2010 Profiler

The total execution time for one frame was about 3 ms

Nokia N900 with FCam

About 30 ms


Demo videos

Reference: Huttunen S, Rahtu E, Kunttu I, Gren J & Heikkilä J (2011) Real-time detection of

landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS,

6688:338-347.


Modeling the background and detecting moving objects

Heikkilä M & Pietikäinen M (2006) A texture-based method for modeling the

background and detecting moving objects. IEEE Transactions on Pattern

Analysis and Machine Intelligence 28(4):657-662. (an early version published

at BMVC 2004)


Roughly speaking, the background subtraction can be seen as a two-stage

process as illustrated below.

Background modeling

The goal is to construct and maintain a statistical representation of the

scene that the camera sees.

Foreground DetectionThe comparison of the input frame with the current background model.

The areas of the input frame that do not fit to the background model are

considered as foreground.


We use an LBP histogram computed over a circular region around the

pixel as the feature vector.

The history of each pixel over time is modeled as a group of K weighted

LBP histograms: {x1,x2 xK}.

The background model is updated with the information of each new video

frame, which makes the algorithm adaptive.

The update procedure is identical for each pixel.

x1

x2

xK


Examples of detection results


Detection results for images of Toyama et al. (ICCV 1999)


Demo for detection of moving objects


LBP in multi-object tracking

Takala V & Pietikäinen M (2007) Multi-object tracking using color, texture, and motion.

Proc. Seventh IEEE International Workshop on Visual Surveillance (VS 2007),

Minneapolis, USA, 7 p.


Facial expression recognition from videos

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Determine the emotional state of the face

Regardless of the identity of the face


Facial Expression Recognition

Mug Shot

[Feng, 2005][Shan, 2005]

[Bartlett, 2003][Littlewort,2004]

Dynamic Information

Action Units Prototypic Emotional

Expressions

[Tian, 2001][Lien, 1998]

[Bartlett,1999][Donato,1999]

[Cohn,1999]

Psychological studies [Bassili 1979], have demonstrated that humans do a better job in

recognizing expressions from dynamic images as opposed to the mug shot.

[Cohen,2003]

[Yeasin, 2004]

[Aleksic,2005]


(a) Non-overlapping blocks(9 x 8) (b) Overlapping blocks (4 x 3, overlap size = 10)

(a) Block volumes (b) LBP features (c) Concatenated features for one block volume

from three orthogonal planes with the appearance and motion


Database

Cohn-Kanade database :

97 subjects

374 sequences

Age from 18 to 30 years

Sixty-five percent were female, 15 percent were African-American,

and three percent were Asian or Latino.


Happiness Anger Disgust

Sadness Fear Surprise


Comparison with different approaches

People

Num

Sequence

Num

Class

Num

Dynamic Measure Recognition

Rate (%)

[Shan,2005] 96 320 7(6) N 10 fold 88.4(92.1)

[Bartlett, 2003] 90 313 7 N 10 fold 86.9

[Littlewort,

2004]

90 313 7 N leave-one-

subject-

out

93.8

[Tian, 2004] 97 375 6 N ------- 93.8

[Yeasin, 2004] 97 ------ 6 Y five fold 90.9

[Cohen, 2003] 90 284 6 Y ------- 93.66

Ours 97 374 6 Y two fold 95.19

Ours 97 374 6 Y 10 fold 96.26


Demo for facial expression recognition

Low resolution

No eye detection

Translation, in-plane and out-of-plane rotation, scale

Illumination change

Robust with respect to errors in

face alignment


Principal appearance and motion

from boosted spatiotemporal descriptors

Multiresolution features=>Learning for pairs=>Slice selection

1) Use of different number of neighboring points when computing the features in

XY, XT and YT slices

2) Use of different radii which can catch the occurrences in different space and

time scales

Zhao G & Pietikäinen M (2009) Boosted multi-resolution spatiotemporal descriptors forfacial expression recognition. Pattern Recognition Letters 30(12):1117-1127.


3) Use of blocks of different sizes to have global and local statistical

features

The first two resolutions focus on the

pixel level in feature computation, providing different local

spatiotemporal information

the third one focuses on the

block or volume level, giving more global information in space and

time dimensions.


Selected 15 most discriminative slices


Example images in different illuminations

Taini M, Zhao G, Li SZ & Pietikäinen M (2008) Facial expression recognition

from near-infrared video sequences. Proc. International Conference on Pattern Recognition (ICPR), 4 p.

Visible light (VL) : 0.38-0.75 m

Near Infrared (NIR) : 0.7 m-1.1 m


On-line facial expression recognition from NIR videos

NIR web camera allows expression recognition in near darkness.

Image resolution 320 × 240 pixels.

15 frames used for recognition.

Distance between the camera and subject around one meter.

Start sequences Middle sequences End sequences


Component-based approaches [Huang et al.,2010-2011]

Boosted spatiotemporal LBP-TOP features are extracted from areas

centered at fiducial points (detected by ASM) or larger areas

- more robust to changes of pose, occlusions

- can be used for analyzing action units [Jiang et al, FG 2011]


Visual speech recognition

Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment.

A human listener can use visual cues, such as lip and tongue movements, to enhance the level of speech understanding.

The process of using visual modality is often referred to as lipreading which is to make sense of what someone is saying by watching the movement of his lips.

McGurk effect [McGurk and MacDonald 1976] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.

Zhao G, Barnard M & Pietikäinen M (2009). Lipreading with local spatiotemporal descriptors. IEEE Transactions on Multimedia 11(7):1254-1265.


System overview

Our system consists of three stages.

First stage: face and eye detectors, and the localization of mouth.

Second stage: extracts the visual features.

Last stage: recognize the input utterance.


Local spatiotemporal descriptors for visual information

(a) Volume of utterance sequence

(b) Image in XY plane (147x81)

(c) Image in XT plane (147x38) in y =40

(d) Image in TY plane (38x81) in x = 70

Overlapping blocks (1 x 3, overlap size = 10).

LBP-YT images

Mouth region images

LBP-XY images

LBP-XT images


Features in each block volume.

Mouth movement representation.


Experiments

Database:

Our own visual speech database: OuluVS Database

Totally, 817 sequences from 20 speakers were used in the experiments.

C1 Excuse me C6 See you

C2 Good bye C7 I am sorry

C3 Hello C8 Thank you

C4 How are you C9 Have a good time

C5 Nice to meet you C10 You are welcome


Experimental results - OuluVS database

Mouth regions from the dataset.

Speaker-independent:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C100

20

40

60

80

100

Phrases index

Reco

gniti

on re

sults

(%)

1x5x3 block volumes1x5x3 block volumes (features just from XY plane)1x5x1 block volumes


Selected 15 slices for phrases See you Thank you

These phrases were most difficult to

recognize because they are quite similar in

The selected slices are mainly in the first and

second part of the phrase.

different throughout the whole utterance, and the

selected features also come from the whole

pronunciation.

Selecting 15 most discriminative

features


Demo for visual speech recognition


LBP-TOP with video normalization [Zhou et al., CVPR 2011]

With normalization nearly 20%

improvement in speaker independent

recognition is obtained


Activity recognition

Kellokumpu V, Zhao G & Pietikäinen M (2009) Recognition of human actions

using texture. Machine Vision and Applications (available online).


Texture based description of movements

We want to represent human movement with local

properties

> Texture

But texture in an image can be anything? (clothing, scene

background)

> Need preprocessing for movement representation

> We use temporal templates to capture the dynamics

We propose to extract texture features from temporal templates

to obtain a short term motion description of human movement.

Kellokumpu V, Zhao G & Pietikäinen M (2008) Texture based description of

movements for activity analysis. Proc. International Conference on Computer Vision Theory and Applications (VISAPP), 1:206-213.


Overview of the approach

Silhouette representation

LBP feature extraction

HMM modeling

MHI MEI

Silhouette representation

LBP feature extraction

HMM modeling

MHI MEI


Features

w

w

w

w

1

2

3

4

w

w

w

w

1

2

3

4


Hidden Markov Models (HMM)

Model is defined with:

Set of observation histograms H

Transition matrix A

State priors

Observation probability is

taken as intersection of the

observation and model

histograms:

),min()|( iobsitobs hhqshP

a23

a 11 a 22 a 33

a 12


Experiments

Experiments on two databases:

Database 1:

15 activities performed by 5 persons

Database 2 - Weizmann database:

10 Activities performed by 9 persons

Walkig, running, jumping, skipping etc.


Experiments HMM classification

Database 1 15 activities by 5 people

LBP

Weizmann database 10 activities by 9 people

LBP Ref. Act. Seq. Res.

Our method 10 90 97,8%

Wang and Suter 2007 10 90 97,8%

Boiman and Irani 2006 9 81 97,5%

Niebles et al 2007 9 83 72,8%

Ali et al. 2007 9 81 92,6%

Scovanner et al. 2007 10 92 82,6%

MHI 99%

MEI 90%

MHI + MEI 100%

8,2

4,1


Activity recognition using dynamic textures

Instead of using a method like MHI to incorporate

time into the description, the dynamic texture features

capture the dynamics straight from image data.

When image data is used, accurate segmentation of

the silhouette is not needed

Instead a bounding box of a person is sufficient!!

Kellokumpu V, Zhao G & Pietikäinen M (2008) Human activity recognition using

a dynamic texture based method. Proc. British Machine Vision Conference (BMVC ), 10 p.


Dynamic textures for action recognition

Illustration of xyt-volume of a person walking

yt

xt


Dynamic textures for action recognition

Formation of the feature histogram for an xyt volume

of short duration

HMM is used for sequential modeling

Feature histogram of a bounding volume


Action classification results Weizmann dataset

Classification accuracy 95,6% using image data

1.00

1.00

1.00

.78.22

1.00

1.00

1.00

.11.11.78

1.00

1.00

1.00

1.00

1.00

.78.22

1.00

1.00

1.00

.11.11.78

1.00

1.00Bend

Jack

Jump

Pjump

Run

Side

Skip

Walk

Wave1

Wave2

Bend

Jack

Jum

p

Pju

mp

Run

Sid

e

Skip

Walk

Wave1

Wave2


Action classification results - KTH

.980.020

.855.145

.032,108.860

.977.020.003

.01.987.003

.033.967

.980.020

.855.145

.032,108.860

.977.020.003

.01.987.003

.033.967

Box Clap Wave Jog Run Walk

Clap

Wave

Jog

Run

Walk

Box

Classification accuracy 93,8% using image data


Dynamic textures for gait recognition

Feature histogram of the whole volume

xt xyyt

),min( ji hhSimilarity

Kellokumpu V, Zhao G & Pietikäinen M (2009) Dynamic texture based gait

recognition. Proc. International Conference on Biometrics (ICB ), 1000-1009.


Experiments - CMU gait database

CMU database

25 subjects

4 different conditions

(ball, slow, fast, incline)

B F S B F

S


Experiments - Gait recognition results


Dynamic texture synthesis

Guo Y, Zhao G, Chen J, Pietikäinen M & Xu Z (2009) Dynamic texture synthesis

using a spatial temporal descriptor. Proc. IEEE International Conference on Image Processing (ICIP), 2277-2280.

Dynamic texture synthesis is to provide a continuous and infinitely

varying stream of images by doing operations on dynamic textures.


Introduction

Basic approaches to synthesize dynamic textures:

- parametric approaches

physics-based

method and image-based method

- nonparametric approaches: they copy images chosen from original sequences and depends less on texture properties than parametric approaches

Dynamic texture synthesis has extensive applications in:

- video games

- movie stunt

- virtual reality


Synthesis of dynamic textures using a new representation

- The basic idea is to create transitions from frame i to frame j anytime the successor of i is similar to j, that is, whenever Di+1, j is small.

A. Schödl, R. Szeliski, D. Salesin, and I. Essa

Proc. ACM SIGGRAPH, pp. 489-498, 2000.


When transitions of video texture

are identified, video frames are

played by video loops

Match subsequences by filtering the difference matrix Dij

with a diagonal kernel with weights

[w m,...,wm

Distance measure can be updated by

summing future anticipated costs

Calculate the concatenated local binary pattern

histograms from three orthogonal planes for each

frame of the input video

Compute the similarity measure Dij between frame

pair Ii and I j by applying Chi-square to the

histogram of representation

- The algorithm of the dynamic texture synthesis:

1. Frame representation;

2. Similarity measure;

3. Distance mapping;

4. Preserving dynamics;

5. Avoid dead ends;

6. Synthesis

To create transitions from frame i to j when i is similar

to j , all these distances are mapped to probabilities

through an exponential function Pij. The next frame to

display after i is selected according to the distribution

of Pij.


Synthesis of dynamic textures using a new representation

An example:

Considering that there are three transitions: i n j n ( n = 1 , 2 , 3 ) , loops

from the source frame i to the destination frame j would create new image

paths, named as loops. A created cycle is shown as:


Experiments

We have tested a set of dynamic textures, including natural scenes and

human motions.

(http://www.texturesynthesis.com/links.htm and DynTex database, which

provides dynamic texture samples for learning and synthesizing.)

The experimental results demonstrate our method is able to describe the DT

frames from not only space but also time domain, thus can reduce

discontinuities in synthesis.

Demo 1oDemo 1 i Demo 2 i Demo 2 o


Experiments

Dynamic texture synthesis of natural scenes concerns temporal

changes in pixel intensities, while human motion synthesis

concerns temporal changes of body parts.

The synthesized sequence by our method maintains smooth

dynamic behaviors. The better performance demonstrates its

ability to synthesize complex human motions.


Detection and tracking of objects

- Object detection [Zhang et al., IVC 2006]

- Human detection [Mu et al., CVPR 2008; Wang et al., ICCV 2009]

- On-line boosting [Grabner & Bishof, CVPR 2006]

Biometrics

- Fingerprint matching [Nanni & Lumini, PR 2008]

- Finger vein recognition [Lee et al., IJIST 2009]

- Touch-less palmprint recognition [Ong et al., IVC 2008]

- Gait recognition [Kellokumpu et al., 2009]

- Eye localization [Kroon et al., CVIU 2009]

- Face recognition in the wild [Wolf et al., ECCV 2008]

- Face verification in web image and video search [Wang et al., CVPR 2009]

Examples of using LBP in different applications


Visual inspection

- Paper characterization [Turtinen et al.., IJAMT 2003]

- Separating black walnut meat from shell [Jin et al., JFE 2008 2009]

- Fabric defect detection [Tajeripur et al., EURASIP JASP 2008]

Biomedical applications

- Cell phenotype classification [Nanni & Lumini, Artif. Intell. Med. 2008]

- Diagnosis of renal cell carcinoma [Fuchs et al., MICCAI 2008]

- Ulcer detection in capsule endoscope images [Li & Meng, IVC 2009]

- Mass false positive reduct. in mammographic images [Llado et al., CMIG 2009]

- Lung texture analysis in CT images [Sorensen et al., IEEE TMI 2010]

- Retrieval of brain MR images [Unay et al., IEEE TITM 2010]

- Quantitative analysis of facial paralysis [He et al., IEEE TBE 2009]


Video analysis, photo management, and interactive TV

- Concept detection [Wu et al., ICIP 2008; Li et al., CVPR 2008]

- Overlay text detection and extraction from videos [Kim & KIM, IEEE TIP 2009]

- Crowd estimation [Ma et al., ISIITA 2008]

- EasyAlbum interactive photo annotation system [Cui et al.,CM CHI 2007]

- Cognitive face analysis for interactive TV [Ho An & Chung, IEEE TCE 2009]


Part 5: Summary and some future directions

Matti Pietikäinen


Summary

Modern texture operators form a generic tool for computer vision

LBP and its variants are very effective for various tasks in computer

vision

The choice between LBP and LPQ depends on the problem at hand

The advantages of the LBP and its variants include

- computationally very simple

- can be easily tailored to different types of problems

- robust to illumination variations

- robust to localization errors

LPQ is also insensitive to image blurring

For a bibliography of LBP-related research, see

http://www.cse.oulu.fi/MVG/LBP_Bibliography


Some future directions

New LBP/LPQ variants are still emerging

Often a single descriptor is not effective enough

Multi-scale processing

Use of complementary descriptors

- CLBP, Gabor&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP

Combining local with more global information (e.g. LBP & Gabor)

Combining texture and color

Combining sparse and dense descriptors

Machine learning for finding the most effective descriptors for

a given problem

Dynamic textures offer a new approach to motion analysis

- general constraints of motion analysis (i.e. scene is Lambertian, rigid

and static) can be relaxed


A book on LBP will be published by fall 2011


Thanks!

cvpr tutorial final

Documents