Speech Feature Analysis Using Step-Weighted

Linear Discriminant Analysis

Jiang Hai, Er Meng Joo

School of Electrical and Electronic Engineering, Nanyang Technological University

S1-B4b-06, Nanyang Avenue, Nanyang Technological University, Singapore 639798

Telephone: (65) 67905472

EDICS: 2.SPEE

Abstract: In the speech feature extraction procedure, a relatively simple strategy for increasing the discriminative power of the feature vectors is to append their delta coefficients. However, the dimension of the feature vector then increases considerably, so effectively reducing the dimension of the feature space is key to computational performance. In this paper, a step-weighted linear discriminant dimensionality reduction technique is proposed. Dimensionality reduction using linear discriminant analysis (LDA) is commonly based on optimizing certain separability criteria in the output space. The resulting optimization problem using LDA is linear, but these separability criteria are not directly related to the classification accuracy in the output space. As a result, even the best weighting function in the input space can produce poor classification of the data in the output space. With the step-weighted linear discriminant dimensionality reduction technique, we can adjust the weighting function of the between-class scatter matrix based on the output space each time one dimension is removed. We describe this method and present an application to a speaker-independent isolated digit recognition task.

Keywords: Dimensionality reduction, Linear Discriminant Analysis, Speech recognition

1. Introduction

Dimensionality reduction is the process of mapping high-dimensional patterns to a lower-dimensional subspace and is typically used as a preprocessing step in classification applications. The optimality criterion of choice for classification purposes is the Bayes error, which is the minimum achievable classification error given the underlying distribution. However, estimating the Bayes error is a time-consuming and unreliable task. Because of the difficulty of estimating the Bayes error directly, linear projections based on the scatter matrices are quite popular for dimensionality reduction in classification. The optimality criterion of Fisher's linear discriminant is [1]

$$J(W) = \operatorname{tr}\bigl[(W^{T} S_w W)^{-1}(W^{T} S_b W)\bigr].$$

It is common to apply linear discriminant analysis (LDA) in statistical pattern classification tasks to reduce computation and decrease the dimension. The LDA transformation attempts to reduce the dimension while keeping most of the discriminative information in the feature space.
Table 1  Notation Conventions Used in This Paper

  $i, j, k$              Counters over classes and patterns
  $N$                    The total number of data samples
  $C$                    The total number of classes
  $d_{ij}$               The Euclidean distance between the means of class $i$ and class $j$ in the input space
  $w(d_{ij})$            The weight function
  $n_i$                  The number of training vectors in class $i$
  $x_k, y_k$             A training pattern and its label, for $k = 1, \ldots, N$ and $y_k \in \{1, \ldots, C\}$
  $S_w$                  The within-class scatter matrix
  $S_b$                  The between-class scatter matrix of the class means
  $\mu_i$                The mean of class $i$
  $\mu$                  The global sample mean
  $W$                    The transformation matrix, formed by the eigenvectors $w_1, \ldots, w_m$
  $y = W^{T} x$          The mapping
  $n, m$                 The dimensions of the input space and the output space

Recently, LDA and improved variants of LDA have been applied to several problems, such as face recognition [2][3] and speech recognition [4]. In speech recognition tasks, the feature space dimension can be increased by extending the feature vector with data from a range of neighboring frames. Doing this noticeably increases the discriminative power of the feature space, but the computation becomes impractical at the same time. Efficiently compressing the dimension of the feature space is therefore very useful in speech signal processing.
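As an illustrative sketch (not part of the original experiments), the delta-appending step mentioned above can be implemented as follows in Python/NumPy; the regression half-width theta = 2 is an assumed value, and all code sketches in this paper use this language.

    import numpy as np

    def append_deltas(cepstra, theta=2):
        """Append delta coefficients to a (frames x coeffs) cepstral matrix.

        Standard regression formula:
            d_t = sum_{k=1..theta} k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2)
        Edge frames are padded by repetition. The half-width theta = 2
        is an assumption, not a value given in the paper.
        """
        T = cepstra.shape[0]
        denom = 2.0 * sum(k * k for k in range(1, theta + 1))
        padded = np.pad(cepstra, ((theta, theta), (0, 0)), mode="edge")
        deltas = np.zeros_like(cepstra, dtype=float)
        for k in range(1, theta + 1):
            deltas += k * (padded[theta + k : theta + k + T]
                           - padded[theta - k : theta - k + T])
        deltas /= denom
        # e.g. 16 MFCCs per frame -> 32-dimensional vectors, as in Section 4
        return np.hstack([cepstra, deltas])

With 16 MFCCs per frame, this doubles the dimension to 32, which is exactly the situation that the dimensionality reduction in this paper addresses.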

Because an optimality criterion based on scatter matrices is in general not directly related to classification accuracy, a weighted scatter matrix is often constructed in which smaller distances are weighted more heavily than larger distances [5]. However, this scatter matrix is calculated in the input space, and it can be far from the true scatter matrix of the output space when the dimensionality is reduced by more than one dimension. Rohit Lotlikar and Ravi Kothari proposed the fractional-step dimensionality reduction method to overcome this problem [6]. However, they considered only the between-class scatter matrix and did not recalculate the within-class scatter matrix, which loses useful discriminant information during the projection procedure. In this paper, we introduce the concept of step-weighted dimensionality reduction, wherein the dimensionality is reduced from $n$ to $m$ one dimension at a step. In addition to describing the step-weighted LDA algorithm, we present an application to the speaker-independent isolated digit word recognition problem.

2. The Conventional LDA

The LDA problem is formulated as follows [7]. Let $x \in \mathbb{R}^n$ be a feature vector. We seek to find a transformation $y = W^{T} x$, with $W \in \mathbb{R}^{n \times m}$, such that minimum loss of discrimination occurs in the transformed space. In practice, $m$ is much smaller than $n$. A common form of the optimality criterion to be maximized is the function $J(W)$ given below. In classical LDA, the corresponding input-space within-class and between-class scatter matrices are defined by

$$S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{k:\, y_k = i}(x_k - \mu_i)(x_k - \mu_i)^{T} \qquad (1)$$

$$S_b = \frac{1}{N}\sum_{i=1}^{C} n_i\,(\mu_i - \mu)(\mu_i - \mu)^{T} \qquad (2)$$

$$\mu_i = \frac{1}{n_i}\sum_{k:\, y_k = i} x_k \qquad (3)$$

$$\mu = \frac{1}{N}\sum_{k=1}^{N} x_k \qquad (4)$$

The aim of LDA is to maximize, in some sense, the ratio of the between-class and within-class scatter matrices after transformation. This enables us to choose a transform that keeps the most discriminative information while reducing the dimension. Precisely, we want to maximize the objective function

$$J(W) = \operatorname{tr}\bigl[(W^{T} S_w W)^{-1}(W^{T} S_b W)\bigr]$$

The columns of the optimum $W$ are the generalized eigenvectors corresponding to the $m$ largest-magnitude eigenvalues of the equation

$$S_b\, w_i = \lambda_i\, S_w\, w_i \qquad (5)$$
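As a minimal sketch of equations (1)-(5) (our illustration; the function names are ours, not the paper's), the scatter matrices and the generalized eigenproblem can be computed with NumPy and SciPy as follows.

    import numpy as np
    from scipy.linalg import eigh

    def scatter_matrices(X, y):
        """Input-space scatter matrices of equations (1)-(4).
        X: (N, n) data matrix; y: (N,) integer class labels."""
        N, n = X.shape
        mu = X.mean(axis=0)                      # global mean, eq. (4)
        Sw = np.zeros((n, n))
        Sb = np.zeros((n, n))
        for c in np.unique(y):
            Xc = X[y == c]
            mu_c = Xc.mean(axis=0)               # class mean, eq. (3)
            Sw += (Xc - mu_c).T @ (Xc - mu_c)    # eq. (1), unnormalized
            diff = (mu_c - mu)[:, None]
            Sb += len(Xc) * (diff @ diff.T)      # eq. (2), unnormalized
        return Sw / N, Sb / N

    def lda_transform(Sb, Sw, m):
        """Solve Sb w = lambda Sw w (equation (5)) and return the n x m
        matrix W of eigenvectors with the m largest-magnitude eigenvalues.
        Sw must be positive definite."""
        eigvals, eigvecs = eigh(Sb, Sw)          # generalized problem, ascending order
        order = np.argsort(np.abs(eigvals))[::-1]
        return eigvecs[:, order[:m]]

Projecting the data is then simply Y = X @ W.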

3. Step-Weighted Linear Discriminant Analysis (SW-LDA)

Because the definition of the between-class scatter matrix is not directly related to classification accuracy, a weighted scatter matrix is often constructed in which smaller distances are weighted more heavily than larger distances [5]:

$$S_b = \frac{1}{N^2}\sum_{i=1}^{C-1}\sum_{j=i+1}^{C} n_i\, n_j\, w(d_{ij})\,(\mu_i - \mu_j)(\mu_i - \mu_j)^{T} \qquad (6)$$

where $d_{ij}$ is the Euclidean distance between the means of class $i$ and class $j$, and $w(\cdot)$ is a monotonically decreasing weighting function.
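The weighting function is not specified at this point in the text; a common choice, consistent with the "range of powers" used in Section 4, is the power law w(d) = d**(-p). The following sketch of equation (6) assumes that form and the 1/N^2 normalization above:

    def weighted_between_scatter(means, counts, N, p):
        """Weighted pairwise between-class scatter, equation (6), assuming
        the power-law weight w(d) = d**(-p). `means` is a (C, n) array of
        class means and `counts` the (C,) array of class sizes. Assumes
        distinct class means (d > 0)."""
        C, n = means.shape
        Sb = np.zeros((n, n))
        for i in range(C - 1):
            for j in range(i + 1, C):
                diff = (means[i] - means[j])[:, None]
                d = np.linalg.norm(diff)
                Sb += counts[i] * counts[j] * d ** (-p) * (diff @ diff.T)
        return Sb / (N * N)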

In conventional LDA, if we wish to reduce the dimensionality from $n$ to $m$ ($m$ much smaller than $n$), we would compute $S_w^{-1} S_b$ and its eigenvectors $w_1, \ldots, w_n$, and we would obtain the $m$-dimensional representation spanned by $w_1, \ldots, w_m$. When there are sufficiently many classes, it is quite possible that the line joining the means of some pair of classes has nearly the same orientation as one of the discarded eigenvectors $w_{m+1}, \ldots, w_n$. Because the eigenvectors are mutually orthogonal (with respect to $S_w$), such a pair of classes would heavily overlap in the $m$-dimensional space: though the two classes are well separated in the original space, they were not sufficiently weighted in computing $S_b$, because their separation only becomes critical after some dimensions have been projected away.

We gradually compress the data by one dimension per step. At each dimensionality reduction step, we recompute the between-class and within-class scatter matrices based on the changed interclass and intraclass distances, rebuild the weighting function, and then compute the eigenvectors. Thereby, class centers that come closer together are weighted more and more heavily. With $x_k^{(0)} = x_k$ and $W_t$ denoting the transformation at step $t$, the projected data and the corresponding output-space within-class and between-class scatter matrices are defined by

$$x_k^{(t)} = W_t^{T}\, x_k^{(t-1)} \qquad (7)$$

$$S_w^{(t)} = \frac{1}{N}\sum_{i=1}^{C}\sum_{k:\, y_k = i}\bigl(x_k^{(t)} - \mu_i^{(t)}\bigr)\bigl(x_k^{(t)} - \mu_i^{(t)}\bigr)^{T} \qquad (8)$$

$$S_b^{(t)} = \frac{1}{N^2}\sum_{i=1}^{C-1}\sum_{j=i+1}^{C} n_i\, n_j\, w\bigl(d_{ij}^{(t)}\bigr)\bigl(\mu_i^{(t)} - \mu_j^{(t)}\bigr)\bigl(\mu_i^{(t)} - \mu_j^{(t)}\bigr)^{T} \qquad (9)$$

$$\mu_i^{(t)} = \frac{1}{n_i}\sum_{k:\, y_k = i} x_k^{(t)} \qquad (10)$$

$$d_{ij}^{(t)} = \bigl\Vert \mu_i^{(t)} - \mu_j^{(t)} \bigr\Vert \qquad (11)$$

The entire procedure for reducing the dimensionality from $n$ to $m$ is as follows.

Step 1: Calculate $S_w$ and $S_b$ according to equations (1) and (2);

Step 2: Compute the transformation matrix $W_1$ and reduce the feature space dimensionality from $n$ to $n-1$;

Step 3: Calculate $S_w^{(t)}$ and $S_b^{(t)}$ according to equations (8) and (9);

Step 4: Compute the transformation matrix $W_t$ and reduce the feature space dimensionality by one;

Step 5: Repeat Step 3 and Step 4 until the feature space reaches dimension $m$;

Step 6: Compute the overall transformation matrix $W = W_1 W_2 \cdots W_{n-m}$;

Step 7: After the training procedure, use the transformation matrix $W$ to project the observed feature vectors from $\mathbb{R}^n$ to $\mathbb{R}^m$.
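Putting Steps 1-7 together, here is a minimal sketch of the step-weighted loop, reusing the helper functions from the earlier sketches. For simplicity it applies the weighted scatter from the very first step, whereas Step 1 above uses the unweighted equations (1) and (2); that difference is flagged in the comments.

    def sw_lda(X, y, m, p):
        """Step-weighted LDA, Steps 1-7: reduce (N, n) data X to m
        dimensions one dimension at a time, recomputing the scatter
        matrices and the weighting after every projection.
        Note: this sketch weights S_b at every step, including the first;
        Step 1 of the paper uses the unweighted equations (1) and (2)."""
        N, n = X.shape
        classes = np.unique(y)
        counts = np.array([np.sum(y == c) for c in classes])
        Z = X.astype(float)
        W_total = np.eye(n)
        for dim in range(n - 1, m - 1, -1):      # n -> n-1 -> ... -> m
            Sw, _ = scatter_matrices(Z, y)       # eq. (8)
            means = np.vstack([Z[y == c].mean(axis=0) for c in classes])
            Sb = weighted_between_scatter(means, counts, N, p)   # eq. (9)
            W = lda_transform(Sb, Sw, dim)       # drop one dimension
            Z = Z @ W                            # eq. (7)
            W_total = W_total @ W                # Step 6: accumulate W
        return W_total, Z                        # W_total maps R^n -> R^m

At recognition time, Step 7 amounts to projecting each observed feature vector through W_total.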

4. Application to Speech Database

Our speech recognition experiments were based on an HMM-based speech recognizer for a speaker-independent isolated English digit task. The TI46 corpus of isolated words, designed and collected at Texas Instruments (TI), is used in the proposed system. The TI46 corpus contains 16 speakers: 8 males and 8 females. There are 15 utterances of each English digit (0-9) from each speaker: 10 designated as training tokens and 5 as testing tokens in the proposed system. The front-end features of this system were 16 Mel-frequency cepstral coefficients plus their deltas, so the original dimensionality of the speech feature space is 32. The recognition system was trained on 10 clean training tokens per person, giving a training corpus of 1600 speech utterances altogether. To test the robustness of the speech recognition, we add white noise to the clean testing utterances at different Signal-to-Noise Ratios (SNR).
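As a hedged illustration of the noise-corruption step (the paper gives no recipe, so the Gaussian noise model and the function below are our assumptions), white noise at a target SNR can be added like this:

    def add_white_noise(signal, snr_db, rng=None):
        """Corrupt a clean waveform with white Gaussian noise scaled so
        that 10 * log10(P_signal / P_noise) equals the target SNR in dB."""
        if rng is None:
            rng = np.random.default_rng()
        p_signal = np.mean(signal.astype(float) ** 2)
        p_noise = p_signal / (10.0 ** (snr_db / 10.0))
        noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
        return signal + noise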

To apply LDA, the common weighted LDA (W-LDA) and the step-weighted LDA (SW-LDA) to our speech recognition system, we need labeled training data; the labels are assigned to the training data according to the digit spoken.

In all our simulations, we chose compressed feature space dimensionalities of $m = 24$ and $m = 16$. We present the results obtained with the LDA, W-LDA and proposed SW-LDA algorithms. The W-LDA and SW-LDA simulations were run with the weighting-function power taken from a fixed set of values, and for each choice of power the training accuracy was noted.

Figure 1: Recognition rate for dimensionality reduction from 32 to 24.
Figure 1 shows the speech recognition results when the dimension of the feature vectors is reduced from 32 to 24. The recognition accuracies show that SW-LDA generally performs better than the W-LDA method, and SW-LDA is also better than the common LDA for several of the weighting functions tested; the best-performing weighting function for SW-LDA can be read from the figure.

Figure 2: Recognition rate for dimensionality reduction from 32 to 16.
Figure 2 shows the testing accuracies obtained with the common LDA, the conventional W-LDA and SW-LDA for different weighting functions. The recognition rates show that SW-LDA is better than W-LDA across the whole range of powers, and better than the conventional LDA over part of that range; again, the best-performing weighting function for SW-LDA can be read from the figure.

5. Conclusion

We proposed a method of dimensionality reduction based on step-weighted LDA. Using SW-LDA, one can obtain better dimensionality reduction performance than with the common LDA technique. When the dimensionality is reduced further, the SW-LDA method performs relatively better than the conventional weighted LDA method. With SW-LDA, the speech recognition accuracy obtained with MFCC feature extraction increases noticeably over that of the common LDA and the weighted LDA.

References

[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, New York: Academic Press, 1990.

[2] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, July 1997, pp. 711-720.

[3] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," Journal of the Optical Society of America A, Vol. 14, No. 8, 1997, pp. 1724-1733.

[4] A. Martin, D. Charlet and L. Mauuary, "Robust speech/non-speech detection using LDA applied to MFCC," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Vol. 1, May 2001.

[5] Y. Li, Y. Gao and H. Erdogan, "Weighted pairwise scatter to improve linear discriminant analysis," in Proc. ICSLP, 2000, pp. 608-611.

[6] R. Lotlikar and R. Kothari, "Fractional-step dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 6, June 2000.

[7] J. Duchene and S. Leclercq, "An optimal transformation for discriminant and principal component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 6, November 1988.
