
6. FEATURE EXTRACTION

There is no universal or exact definition of what constitutes a feature for sign recognition (George Caridakis et al., 2008); it often depends on the problem or the type of language. A feature is defined as an "interesting" part of an image and serves as a starting primitive for subsequent algorithms. The overall algorithm will often only be as good as its feature detector. Consequently, the desirable property of a feature detector is repeatability: whether or not the same feature will be detected in two or more different images of the same scene. The most important types of features to consider when identifying signs are spatial, temporal and textural. The feature extraction stage is designed to process real images (Ryszard S. Choras, 2007). The algorithms used in these systems are commonly divided into three tasks: extraction, selection and classification. Selection is the most critical task, because the particular features made available for discrimination directly influence the efficacy of the classification stage; for a valid classification there must be a rational connection between the features and the classes. The end result of feature extraction is a set of features, commonly called a feature vector, which constitutes a representation of the image.

6.1 Analysis of features and extraction methods

A feature is defined as a function of one or more measurements, each of which specifies some quantifiable property of an object, computed so that it quantifies some significant characteristic of the object. The features currently employed fall into two broad classes:

• General features: application-independent features such as color, texture, and shape. According to the abstraction level, they can be further divided into:
- Pixel-level features: features calculated at each pixel, e.g. color and location.
- Local features: features calculated over the results of subdividing the image by segmentation or edge detection (Thawar Arif et al., 2009).
- Global features: features calculated over the entire image or a regular sub-area of an image.
• Domain-specific features: application-dependent features such as human faces, fingerprints, and conceptual features.

All features can be coarsely classified into low-level and high-level features. Low-level features can be extracted directly from the original images, whereas high-level feature extraction depends on low-level features.

The choice of features from the extracted vector should be guided by the following concerns:
• the features should carry enough information about the image and should not require any domain-specific knowledge for their extraction;
• they should be easy to compute, so that the approach remains feasible for large image collections and rapid retrieval;
• they should relate well to human perceptual characteristics, since users finally determine the suitability of the retrieved images.

6.1.1 Color Features

The color feature is one of the most widely used visual features in image classification. Images characterized by color features have many advantages (Ryszard S. Choras, 2007), namely:

• Robustness: the color histogram is invariant to rotation of the image about the view axis, and changes only gradually under other rotations or scaling.
• Effectiveness: there is a high percentage of relevance between the query image and the extracted matching images.
• Implementation simplicity: constructing the color histogram is a direct process: scan the image, assign color values according to the resolution of the histogram, and build the histogram using the color components as indices.
• Computational simplicity: the histogram computation has O(xy) complexity for an image of size x × y. The complexity of a single image match is linear, O(n), where n is the number of different colors, i.e. the resolution of the histogram.
• Low storage requirements: with color quantisation, the color histogram is significantly smaller than the image itself.
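As a concrete illustration of the construction described above, the following is a minimal sketch in Python/NumPy, assuming an 8-bit RGB image and uniform quantisation to `bins` levels per channel (the function name and signature are illustrative, not from the thesis):

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Build a quantised RGB color histogram.

    image: H x W x 3 array of uint8 values.
    Returns a flat vector of length bins**3, normalised to sum to 1.
    """
    # Map each 0-255 channel value to one of `bins` quantisation levels.
    quantised = (image.astype(np.int64) * bins) // 256          # H x W x 3
    # Use the quantised (r, g, b) triple as a single histogram index.
    idx = (quantised[..., 0] * bins + quantised[..., 1]) * bins + quantised[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()
```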

6.1.2 Texture Features

Texture is an important property of an image and a powerful regional descriptor that helps in the retrieval process (Jong Kook Kim and Hyun Wook Park, 1999). Texture on its own cannot find similar images, but it can be used to separate textured images from non-textured ones and then be combined with another visual attribute, such as color, to make retrieval more effective (Abdul Kadir et al., 2011).

Textural features include:
o Statistical measures
  • Entropy
  • Homogeneity
  • Contrast
o Wavelets
o Fractals

6.1.3 Shape Features

Shape is an important visual feature and one of the primitive features for image content description. Shape content description cannot be defined exactly, because measuring the similarity between shapes is difficult (Morteza Zahedi et al., 2006). Therefore, two steps are essential in shape-based image retrieval: feature extraction and similarity measurement between the extracted features. Shape descriptors can be divided into two main categories: region-based descriptors, which use the whole area of an object for shape description, and contour-based descriptors, which use local features such as boundary segments.

6.2 Proposed Combinational features for Classification

A variety of features can be used for sign language recognition. For recognition of an ISL word, hand shapes are extracted in the same way as gesture motions; figure 6.1 shows the ISL alphabet representation images. In the proposed approach, hand morph images are obtained by capture through a monochrome video camera. The features used primarily describe the shape of the segmented signer's hands, in order to represent the handshapes used by the signer, which are the main source of information for interpreting a specific sign.

Figure 6.1: ISL a,b,c,d,e,f,g,i,k,l,m,n,o,p,q,r,s,t,u,v,w,x,z letters

A novel approach is introduced by applying a combination of features for sign classification (Mahmoud, M. Zaki. and Samir, I. Shaheen. 2011). The extracted features used in the classification module describe the hand and the various stages of the gesture. Concerning hand shape, the extreme learning machine algorithm is used to recognise each sign instance as one of the models created by training a unique model for every corresponding class. The study and the experiments indicate that although combinational features can provide distinctive information in most cases, only an appropriate combination results in robust and confident user-independent sign language recognition. The great variability in gestures and signs, in time, size and position, together with interpersonal differences, makes the recognition task difficult. Once features are extracted by the image processing sequence, classification can be done by a discriminative classifier. In developing a sign language recognition system, it is important to model both the motion (temporal characteristics) and the shape (spatial characteristics) of the hand (Nobuhiko Tanibata et al., 2002). Here, more emphasis is placed on the spatial characteristics of the hand. Figure 6.2 shows the schematic representation and figure 6.3 shows the feature extraction concepts.


Figure 6.2: Schematic view of gesture recognition process

Figure 6.3: Hand feature extraction

6.2.1 Need for Combinational features for sign recognition

Under different scene conditions, the performance of different feature detectors varies significantly. The nature of the background, the existence of other objects (occlusion), and the illumination must be considered to determine which features can be detected efficiently and reliably. Usually the hand shape and the movements are of major concern when inferring the word or sentence. The feature vector is a single-column matrix of N elements, and its computation costs time and memory. In general, as the number of output classes increases and the class boundaries become more nonlinear, more feature elements are necessary for the object recognition technique.

Figure 6.4: Taxonomy of feature extraction


Figure 6.4 shows the flow diagram of the feature extraction module. In order to extract the required features, two different types of information are analysed in the image: (i) the intensity values of the pixels and (ii) their spatial interdependency. Although all feature extraction methods use the information in the intensity values, only a few use the spatial dependency between them. Once features are detected, a local image patch around each feature can be extracted; this extraction may involve a considerable amount of image processing. The result is known as a feature descriptor or feature vector. The features extracted and used here are mean intensity, area, perimeter, diameter, centroid, energy, homogeneity, entropy and dissimilarity.

6.2.2 Structural features based on region properties

The tasks of segmenting an image and estimating features of image regions are highly interdependent. The region-properties measure computes a set of properties for each connected component (object) in a binary image, where the binary image is a logical array that can have any number of dimensions.

The feature vector is extracted from each image by calculating geometrical properties of the hand morph. The elements of the feature vector are the lengths of vectors that originate from the centre of mass and end at the fingertips. This gives the geometrical features of the handshapes (http://en.wikipedia.org/wiki/Shape_factor_(image_analysis_and_microscopy)). They are:
i. Area of the object
ii. Centroid
iii. Perimeter
iv. Diameter
Quantifying these properties makes it possible to differentiate the hand shapes; the statistics computed for this purpose are described below, and a small extraction sketch follows this list.
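As a minimal sketch (not the thesis implementation), the four listed properties can be obtained from a binary hand mask with scikit-image's regionprops; treating the largest connected component as the hand is an assumption:

```python
import numpy as np
from skimage.measure import label, regionprops

def hand_region_features(mask: np.ndarray) -> dict:
    """Area, centroid, perimeter and equivalent diameter of the
    largest connected component in a binary hand mask."""
    regions = regionprops(label(mask.astype(np.uint8)))
    hand = max(regions, key=lambda r: r.area)   # assume the hand is the largest blob
    return {
        "area": hand.area,                      # pixel count, cf. eqn (6.2)
        "centroid": hand.centroid,              # (row, col) centre of mass
        "perimeter": hand.perimeter,            # boundary length, cf. eqn (6.4)
        "diameter": 2.0 * np.sqrt(hand.area / np.pi),  # diameter of the equal-area circle
    }
```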

6.2.2.1 Area

Area is a relatively noise-immune measure of object size, because every pixel in the object contributes to the measurement. It gives the area of the region selected using the Region of Interest. In general, the area of a region is defined as eqn (6.1):

$A(S) = \iint I(x,y)\,dy\,dx$   (6.1)

where I(x,y) = 1 if the pixel is within the shape, (x,y) ∈ S, and 0 otherwise. In practice, integrals are approximated by summations, as given in eqn (6.2):

$A(S) = \sum_{x}\sum_{y} I(x,y)\,A$   (6.2)

where A is the area of one pixel, so the measured area changes with scale. However, small errors in the computed area do appear when a rotation transformation is applied, owing to the discretization of the image.
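For a binary mask with unit pixel area, eqn (6.2) reduces to a pixel count; a one-line sketch:

```python
import numpy as np

def area(mask: np.ndarray, pixel_area: float = 1.0) -> float:
    # Eqn (6.2): the sum of indicator values I(x, y), scaled by the area of one pixel.
    return float(mask.astype(bool).sum()) * pixel_area
```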

6.2.2.2 Perimeter

Perimeter is one of the structural properties of a region defined by a list of coordinates, and is the sum of the distances from each coordinate to the next. The perimeter measurement can become distorted by the fractal nature of certain boundaries. If x(t) and y(t) denote the parametric coordinates of a curve enclosing a region S, then the perimeter of the region is defined as eqn (6.3):

$P(S) = \int_{t} \sqrt{\dot{x}(t)^{2} + \dot{y}(t)^{2}}\;dt$   (6.3)

This equation corresponds to the sum of all the infinitesimal arcs that define the curve. In the discrete case, x(t) and y(t) are defined by a set of pixels in the image, and the perimeter is given by eqn (6.4):

$P(S) = \sum_{i} \sqrt{(x_{i} - x_{i-1})^{2} + (y_{i} - y_{i-1})^{2}}$   (6.4)

where x_i and y_i are the coordinates of the i-th pixel on the curve. Since pixels are organized on a square grid, each term in the summation takes only one of two values: 1 for horizontally or vertically adjacent pixels and √2 for diagonally adjacent ones.
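A short sketch of eqn (6.4), assuming the boundary is supplied as an ordered, closed list of (x, y) pixel coordinates:

```python
import numpy as np

def perimeter(boundary: np.ndarray) -> float:
    """boundary: N x 2 array of ordered (x, y) pixel coordinates on a closed curve."""
    # Distance from each point to the next, wrapping from the last point to the first.
    diffs = np.diff(np.vstack([boundary, boundary[:1]]), axis=0)
    return float(np.sqrt((diffs ** 2).sum(axis=1)).sum())  # each step is 1 or sqrt(2)
```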

6.2.2.3 Centroid

The geometrical centre of a body is known as its centroid. The centre of mass of an object takes the gray levels within the image into account; more generally, the centroid is the point given by the mean of the coordinates. If the boundary is irregular, the mean is found using calculus. The centroid of a non-self-intersecting closed region defined by n vertices (x0,y0), (x1,y1), ..., (xn−1,yn−1) is the point (Cx, Cy) given in eqn (6.5):

$C_{x} = \frac{1}{6A}\sum_{i=0}^{n-1}(x_{i} + x_{i+1})(x_{i}y_{i+1} - x_{i+1}y_{i})$
$C_{y} = \frac{1}{6A}\sum_{i=0}^{n-1}(y_{i} + y_{i+1})(x_{i}y_{i+1} - x_{i+1}y_{i})$   (6.5)

where A is the region's signed area given in eqn (6.6):

$A = \frac{1}{2}\sum_{i=0}^{n-1}(x_{i}y_{i+1} - x_{i+1}y_{i})$   (6.6)

In these formulae, the vertices are assumed to be numbered in order of their occurrence along the polygon's perimeter, and the vertex (xn,yn) is taken to be the same as (x0,y0).
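A direct transcription of eqns (6.5) and (6.6), assuming the vertices are ordered along the polygon boundary:

```python
import numpy as np

def polygon_centroid(verts: np.ndarray) -> tuple:
    """verts: n x 2 array of (x, y) vertices of a non-self-intersecting polygon."""
    x, y = verts[:, 0], verts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)     # vertex i+1, wrapping back to vertex 0
    cross = x * yn - xn * y                     # x_i * y_{i+1} - x_{i+1} * y_i
    a = cross.sum() / 2.0                       # signed area, eqn (6.6)
    cx = ((x + xn) * cross).sum() / (6.0 * a)   # eqn (6.5)
    cy = ((y + yn) * cross).sum() / (6.0 * a)
    return cx, cy
```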

6.2.2.4 Diameter

The distance around a selected region is called the circumference, and the distance across a circle through its centre is called the diameter. The constant π is the ratio of the circumference of a circle to its diameter; thus, to obtain its value, the circumference is divided by the diameter. This relationship is expressed in eqn (6.7):

$\pi = \frac{C}{d}$   (6.7)

where C is the circumference and d is the diameter.
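Rearranged, eqn (6.7) gives d = C/π, so a region's diameter can be estimated from its measured boundary length if the region is treated as roughly circular (an assumption made here purely for illustration):

```python
import math

def diameter(circumference: float) -> float:
    # Eqn (6.7) rearranged: d = C / pi, treating the region boundary as a circle.
    return circumference / math.pi
```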

6.2.3 Statistical features using intensity observation

In order to optimise skin recognition, an appropriate colour space is needed. The objective of using a colour space, or colour model, is to standardise the specification of colours in some standard format. A colour space is a specification of a coordinate system in which each colour is represented by a single point within a subspace of that system. When designing a skin-based segmentation system, an important factor is choosing the correct colour space so as to minimize false positive skin detections. Commonly used colour spaces include the Red, Green, Blue (RGB) model; the Cyan, Magenta, Yellow, Key (CMYK) model; and the Hue, Saturation, Intensity (HSI) model. RGB, used in video cameras, is probably the most common hardware-oriented model in use today. Each colour in the RGB model appears in its primary spectral components of red, green and blue. Using 24-bit colour, each RGB component ranges from 0, the lowest intensity that can be displayed on a monitor, to 255, the highest. All three components combine to produce a single colour. The intensity component is given by eqn (6.8):

$i = \frac{1}{3}(R + G + B)$   (6.8)
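A minimal sketch of eqn (6.8) applied to a whole image, assuming an 8-bit RGB array:

```python
import numpy as np

def intensity(image: np.ndarray) -> np.ndarray:
    """image: H x W x 3 uint8 RGB array. Returns the HSI intensity channel."""
    # Eqn (6.8): i = (R + G + B) / 3, computed in float to avoid uint8 overflow.
    return image.astype(np.float64).mean(axis=2)
```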

6.2.3.1 Mean Intensity

The mean intensity in the selected region of interest (ROI) is given in eqn (6.9):

$\bar{I} = \frac{1}{N} \iint_{(x,y)\in ROI} I(x,y)\,dx\,dy$   (6.9)

The intensity-based features are extracted from the gray-level or colour histogram of the image. This type of feature does not provide any information about the spatial distribution of the pixels. The intensity histogram within the hand region is employed to define features. For background-corrected images, the gray value of each pixel is converted to its corresponding optical density (OD) as given in eqn (6.10):

$OD = \log_{10}\left(\frac{\text{gray value of the background}}{\text{gray value of the pixel}}\right)$   (6.10)

The sum and the mean of the pixel optical densities are then computed to define the intensity-based features. In addition to the features extracted in each RGB colour channel, an intensity-based feature is also extracted from the difference between the red and blue components.
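A sketch of eqns (6.9) and (6.10) on a discrete image; the boolean ROI mask and the single background gray value are assumptions:

```python
import numpy as np

def mean_intensity(gray: np.ndarray, roi: np.ndarray) -> float:
    # Eqn (6.9): average of the gray values inside the ROI mask.
    return float(gray[roi].mean())

def optical_density(gray: np.ndarray, background: float) -> np.ndarray:
    # Eqn (6.10): OD = log10(background / pixel); clip to avoid division by zero.
    return np.log10(background / np.clip(gray.astype(np.float64), 1.0, None))
```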

6.2.4 Textural features based on GLCM

Selected computable textural features based on gray-tone spatial dependencies illustrate their application in sign language recognition. The gray level co-occurrence matrix (GLCM) is one of the best-known texture analysis methods; it estimates image properties related to second-order statistics. Each entry (i, j) of the GLCM corresponds to the number of occurrences of the pair of gray levels i and j at a distance d apart in the original image. The GLCM is built by counting how often a pixel with intensity (gray-level) value i occurs in a specific spatial relationship to a pixel with value j; each element (i, j) of the resulting GLCM is simply the number of times this occurred in the input image. The number of gray levels determines the size of the GLCM. The gray-level co-occurrence matrix can also reveal certain properties of the spatial distribution of the gray levels in the texture image, as represented in figure 6.5.

Figure 6.5: Spatial arrangements of pixels

A co-occurrence matrix is defined with respect to a relative separation vector: each pair of pixels separated by that vector contributes its pair of gray levels as matrix indices, and the corresponding matrix element is incremented. The texture is then characterised by factors derived from this matrix. In general, co-occurrence can be specified as a matrix of relative frequencies P(i, j; d, θ) with which two neighbouring texture elements, one with property i and the other with property j, occur in the image separated by distance d at orientation θ. In gray level co-occurrence, as a special case, the texture elements are pixels and the properties are gray levels. A sample gray level co-occurrence matrix is given in figure 6.6.

Figure 6.6: (a) 4x4 image with gray levels 0-3. (b) General form of co-occurrence matrices P(i,j,d,θ) for gray levels 0-3, where #(i,j) stands for the number of times gray levels i and j have been neighbours.
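A sketch of building normalised GLCMs with scikit-image (version ≥ 0.19, where the function is spelled graycomatrix); the quantisation to 8 gray tones is an example choice, and the four angles match the directions used in the next subsection:

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm(gray: np.ndarray, levels: int = 8) -> np.ndarray:
    """Normalised GLCMs of an 8-bit image for d=1 and theta in {0, 45, 90, 135} degrees."""
    # Quantise 0-255 gray values down to `levels` gray tones to keep the matrix small.
    q = (gray.astype(np.int64) * levels) // 256
    return graycomatrix(q.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
```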


6.2.4.1 Homogeneity

Homogeneity measures the closeness of the distribution of elements in the gray level co-occurrence matrix. This distribution parameter is also high compared with the evaluated methods, which helps achieve better segmentation. Homogeneous texture regions are characterised quantitatively for similarity matching using local spatial statistics of the texture, obtained by scale- and orientation-selective Gabor filtering. The image is thus partitioned into a set of homogeneous texture regions, and the texture features associated with the regions are indexed in the image data. GLCM homogeneity is calculated for four directions (θ = 0°, 45°, 90° and 135°), creating a feature vector of size 4 for the image. Homogeneity is computed as given in eqn (6.11):

$H = \sum_{i,j} \frac{p(i,j)}{1 + |i - j|}$   (6.11)
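Eqn (6.11) written out over a normalised 2-D GLCM P, as a small sketch:

```python
import numpy as np

def homogeneity(P: np.ndarray) -> float:
    # Eqn (6.11): sum of p(i, j) / (1 + |i - j|) over all gray-level pairs.
    i, j = np.indices(P.shape)
    return float((P / (1.0 + np.abs(i - j))).sum())
```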

6.2.4.2 Dissimilarity

The measure of dissimilarity allows principled comparisons between segmentations created by different algorithms, as well as between segmentations of different images. Texture features are compared on the basis of the dissimilarity between two feature vectors. Each feature vector represents a relative frequency distribution, and the dissimilarity is measured by the relative entropy, or Kullback-Leibler (K-L) divergence. Let the two distributions be given as eqn (6.12):

$f_{g} = \{f_{g,t} : t = 1, \ldots, T\}$ and $f_{q} = \{f_{q,t} : t = 1, \ldots, T\}$   (6.12)

Then the divergence D(g, q) between them is represented as eqn (6.13):

$D(g, q) = \sum_{t=1}^{T} f_{g,t} \log \frac{f_{g,t}}{f_{q,t}}$   (6.13)

This dissimilarity measure is asymmetric and is not a distance, because the triangle inequality is not satisfied.
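Eqn (6.13) as a sketch, assuming both inputs are non-negative frequency vectors of equal length T; a small epsilon guards against empty histogram bins:

```python
import numpy as np

def kl_divergence(fg: np.ndarray, fq: np.ndarray, eps: float = 1e-12) -> float:
    # Eqn (6.13): D(g, q) = sum_t f_g[t] * log(f_g[t] / f_q[t]).
    fg = fg / fg.sum()
    fq = fq / fq.sum()
    return float(np.sum(fg * np.log((fg + eps) / (fq + eps))))
```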

6.2.4.3 Energy

Energy returns the sum of squared elements in the GLCM, and its mathematical representation is given in eqn (6.14):

$E = \sum_{i,j} p(i,j)^{2}$   (6.14)
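Eqn (6.14) over a normalised GLCM P, as a sketch. Note that scikit-image's graycoprops calls this sum of squares 'ASM' and reserves 'energy' for its square root; the definition used in this chapter (sum of squared elements) is implemented here:

```python
import numpy as np

def energy(P: np.ndarray) -> float:
    # Eqn (6.14): sum of the squared GLCM entries (angular second moment).
    return float((P ** 2).sum())
```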

6.2.4.4 Entropy

Entropy, a measure of disorder within a region, is a natural characteristic to incorporate into a segmentation evaluation method. Its mathematical representation is given in eqn (6.15):

$F_{Ent}^{Har} = -\sum_{g=1}^{G} \sum_{g'=1}^{G} p(g, g') \log\{p(g, g')\}$   (6.15)

where G is the number of gray levels.
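Eqn (6.15) as a sketch over a normalised GLCM P, with an epsilon to guard log(0):

```python
import numpy as np

def glcm_entropy(P: np.ndarray, eps: float = 1e-12) -> float:
    # Eqn (6.15): -sum over all (g, g') of p * log(p).
    return float(-(P * np.log(P + eps)).sum())
```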

6.2.5 Combinational Feature Vectors

Although it is possible to extract a large set of features, only a small subset of them is used in classification, owing to the curse of dimensionality (Prasad Gabbur, 2003): as the dimensionality increases, the amount of required training data increases exponentially. Moreover, there may be strong correlation between different features, so there is an incentive to combine the features into a single feature vector (Ulrich von Agris et al., 2008). The features are grouped into the following three categories based on the information they provide; the key challenge in this step is to find the most suitable representation(s) and to select a subset of the features extracted from them.

• The structural features provide information about the size and shape of the hands.
• The textural features provide information about the variation in the intensity of a surface, and quantify properties such as smoothness, coarseness, and regularity.
• The intensity-based features provide information on the intensity (gray-level or color) histogram of the pixels located in the hand shapes.

The feature vector consists of the combination of structural, textural and statistical data; a sketch of the assembly follows. The vision data consist of the hand shape characteristics, which are experimented with in various classifiers and recognizers.
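A sketch assembling the nine features of table 6.2 into a single vector, reusing the helper functions sketched in the preceding subsections; the single-image GLCM dissimilarity form used for entry 9 is an assumption, since the chapter defines dissimilarity between two feature vectors rather than within one image:

```python
import numpy as np

def glcm_dissimilarity(P: np.ndarray) -> float:
    # Assumed single-image form: sum of p(i, j) * |i - j| over the GLCM.
    i, j = np.indices(P.shape)
    return float((P * np.abs(i - j)).sum())

def feature_vector(gray: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Combine the structural, intensity-based and GLCM features of table 6.2."""
    region = hand_region_features(mask)            # structural features (section 6.2.2)
    P = glcm(gray).mean(axis=(2, 3))               # average the GLCMs over the 4 directions
    return np.array([
        mean_intensity(gray, mask.astype(bool)),   # 1. mean intensity, eqn (6.9)
        region["area"],                            # 2. area
        region["perimeter"],                       # 3. perimeter
        region["diameter"],                        # 4. diameter
        *region["centroid"],                       # 5. centroid (row and column)
        energy(P),                                 # 6. energy, eqn (6.14)
        homogeneity(P),                            # 7. homogeneity, eqn (6.11)
        glcm_entropy(P),                           # 8. entropy, eqn (6.15)
        glcm_dissimilarity(P),                     # 9. dissimilarity (assumed GLCM form)
    ])
```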


Three image-based feature extraction approaches are considered and applied with different classification techniques; no single method is the most suitable for all classification tasks. The methods used are (1) region properties, (2) image statistics, and (3) the GLCM. Because there is no well-grounded theory that would allow a fully automated system to be built, a decision support system that accumulates separate facts, trends and dependencies between the data characteristics and the output parameters of the classification schemes is proposed and evaluated.

6.3 Sample dataset for Indian Sign Language

When it comes to the automatic recognition of sign language by computers, the problem becomes extremely tough to handle (Oi Mean Foong et al., 2008). Since there are no easy cues for perfectly differentiating individual hand signs, gesture spotting/identification remains a challenging task in real-time gesture recognition; table 6.1 gives an overview of the different types of gestures (Sushmita Mitra and Tinku Acharya, 2007).

Table 6.1: Different types of gestures

Types              | Meaning                                                                                       | Examples
Symbolic gesture   | Gestures that have a single meaning within each culture                                       | Sign language, command gestures
Deictic gesture    | Gestures that direct the listener's attention to specific events or objects in the environment | Pointing gestures
Iconic gesture     | Gestures that represent meaningful objects or actions                                         | Predefined gestures
Pantomimic gesture | Gestures that depict objects or actions, with or without accompanying speech                  | Mimic gestures


The problem of occlusion between the hand and the face is one of the biggest challenges facing researchers; moreover, interference between hand and face often occurs in continuous sign language conversation.

Figure 6.7: Sample Indian Sign Language images

Several body parts are involved in making a meaningful sign language gesture. For example, in Indian Sign Language (ISL), face parts such as the chin, cheeks, lips, eyes, head and nose are used in representing a gesture; figure 6.7 shows some sample images from the ISL dictionary. The features derived from the ISL gesture images are given in table 6.2.

Table 6.2: Features extracted from the ISL datasets

S.No | Feature extracted
1    | Mean intensity
2    | Area
3    | Perimeter
4    | Diameter
5    | Centroid
6    | Energy
7    | Homogeneity
8    | Entropy
9    | Dissimilarity


6.4 Summary

The main objective of this study is to emphasize the need for adequate textural features in a sign language recognition system, since the classical approach of single-type features suffers from drawbacks in time and space complexity. Combinational features have a great impact on the performance of image recognition systems: by combining the three feature types, a considerable enhancement of system performance can be accomplished. Statistical and structural features of an image are always considered important attributes in a recognition system, and textural features can be combined with them to further enhance ISL recognition. Experiments are carried out with these methods on various ISL images, and the results show objectively that the combination of features gives higher accuracy. Selecting good features is a crucial step in any object recognition system. Though it is possible to use the complete image as a feature, the system's complexity might increase prohibitively in that case; thus, a descriptive feature vector of small size is used in this approach. To conclude, the idea put forward is an enhanced method that is an important qualifier for the gesture classification system; the various classification approaches are explained in the next chapter, Classification.