Role of Haar and Daubechies Wavelet in Bangla Vowel Processing



Journal of Telecommunications, ISSN 2042-8839, Volume 13, Issue 1, March 2012 http://www.journaloftelecommunications.co.uk


© 2012 JOT


S. Haque, A. U. Khan

Abstract: The Wavelet Transform (WT) is applied to samples of the seven Bangla vowels /i/, /e/, /æ/, /a/, /ɔ/, /o/, /u/ for analysis and synthesis. The performance of WT in synthesizing the selected vowels is measured by the Normalized Root Mean Square Error (NRMSE) between the original and synthesized signals and by the Retained Energy (RE) in the reconstructed waveform. Our study shows that WT with the Haar and Daubechies wavelets at decomposition level 5 reproduces the Bangla vowel signals with a very small NRMSE and a large RE. The Haar wavelet produced an NRMSE on the order of $10^{-15}$ and 92% RE; the Daubechies wavelet produced an NRMSE on the order of $10^{-11}$ and 98% RE. Although the Daubechies wavelet produced a larger NRMSE than Haar, the RE in its first few coefficients is much larger. We therefore conclude that the Daubechies wavelet performs better than the Haar wavelet for Bangla vowel reconstruction.

Index Terms: Wavelet Transform, Bangla vowels, Haar, Daubechies


1 INTRODUCTION

Speech analysis systems generally rely on time-frequency representations such as the Short-Time Fourier Transform (STFT) or Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing speech: they assume the signal is stationary within a given time frame and may therefore lack the ability to analyze localized events accurately. Furthermore, the LPC approach assumes a particular linear (all-pole) model of speech production which, strictly speaking, does not hold. The main disadvantage of a Fourier expansion, however, is that it has only frequency resolution and no time resolution [1]. This means that although all the frequencies present in a signal can be determined, the times at which they occur are not known.

Analysis is the process of breaking a complex signal into smaller parts to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle, though analysis as a formal concept is a relatively recent development [2].

In general, synthesis refers to a combination of two or more entities that together form something new [3]. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Fig. 1: Analysis of a wave using Fourier Transform and Wavelet transform

WT has been used in different languages for a variety of speech processing tasks, e.g. speech analysis, pitch detection, recognition, speech synthesis, and speech segmentation [4], [5], [6]. However, few works [7] have been reported on Bangla phoneme synthesis using WT.

As a first phase of our study on Bangla speech processing, we selected Bangla vowels in isolated utterances for analysis and synthesis. The objective is to obtain better accuracy in speech processing using WT.

2 WT AND SPEECH SIGNAL PROCESSING

Fourier analysis consists of breaking up a signal into sine waves of various frequencies. Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original or mother wavelet.


S. Haque is with the Electronics and Telecommunication Engineering Department, Daffodil International University, Bangladesh.

A. U. Khan is with the BanglaLion Communications Company, Bangladesh.


A wavelet is a waveform of effectively limited duration that has an average value of zero. The result of the WT is a set of wavelet coefficients that are a function of scale and position. The main purpose of WT is to decompose arbitrary signals into localized contributions that can be labeled by a 'scale parameter'. Compare wavelets with sine waves, which are the basis of Fourier analysis: sinusoids do not have limited duration; they extend from minus to plus infinity. And where sinusoids are smooth and predictable, wavelets tend to be irregular and asymmetric, as shown in Fig. 1.

From the pictures of wavelets and sine waves, it can be observed that signals with sharp changes might be better analyzed with an irregular wavelet than with a smooth sinusoid, just as some foods are better handled with a fork than a spoon. Furthermore, because wavelet analysis affords a different view of data than traditional techniques, it can often compress or de-noise a signal without appreciable degradation [8].

Wavelet Decomposition and Reconstruction

The discrete WT can be used to analyze, or decompose, signals. This process is called decomposition or analysis. The other half of the story is how those components can be assembled back into the original signal without loss of information. This process is called reconstruction, or synthesis [8]. The mathematical manipulation that effects synthesis is called the inverse discrete wavelet transform (IDWT).

In mathematics, the Haar wavelet is a sequence of rescaled "square-shaped" functions which together form a wavelet family or basis [8]. Wavelet analysis is similar to Fourier analysis in that it allows a target function over an interval to be represented in terms of an orthonormal function basis. The Haar sequence is now recognized as the first known wavelet basis and is extensively used as a teaching example in the theory of wavelets. Wavelet analysis involves filtering and downsampling, whereas wavelet reconstruction consists of upsampling and filtering. Upsampling is the process of lengthening a signal component by inserting zeros between samples.
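The filtering and downsampling steps described above can be sketched for a single Haar level as follows. This is a minimal illustration in plain Python, not the implementation used in this work; the function names `haar_analysis` and `haar_synthesis` are hypothetical, and the sketch assumes an even-length signal and the orthonormal Haar scaling of 1/sqrt(2).

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(signal):
    """One Haar level: low/high-pass filter, then keep every second output."""
    if len(signal) % 2 != 0:
        raise ValueError("this sketch assumes an even-length signal")
    # Scaled pairwise sums are the low-pass (approximation) branch,
    # scaled pairwise differences are the high-pass (detail) branch.
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse step: for Haar, upsampling and filtering reduce to interleaving."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
cA1, cD1 = haar_analysis(samples)
rebuilt = haar_synthesis(cA1, cD1)
```

Each coefficient vector is half the length of the input, and the synthesis step recovers the input up to floating-point rounding.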

Reconstruction Filters

The filtering part of the reconstruction process also bears some discussion, because it is the choice of filters that is crucial in achieving perfect reconstruction of the original signal.

Fig. 2 shows the process of decomposing and reconstructing a signal using WT. The different types of wavelets include Haar, Daubechies, Biorthogonal, Coiflets, Morlet, and Symmlet.

The downsampling of the signal components performed during the decomposition phase introduces a distortion called aliasing. It turns out that by carefully choosing filters for the decomposition and reconstruction phases that are closely related (but not identical), we can "cancel out" the effects of aliasing. The low- and high-pass decomposition filters (L and H), together with their associated reconstruction filters (L' and H'), form a system of what are called quadrature mirror filters.

Fig. 2: Decomposition and Reconstruction procedure using Wavelet transform

Fig. 3: Haar and Daubechies Wavelet
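The relationship between the four filters can be illustrated with a short sketch. It assumes one common convention for orthogonal wavelets, in which the high-pass filter is the "alternating flip" of the low-pass filter and the reconstruction filters are the time-reversed decomposition filters. As an example it uses the well-known 4-tap Daubechies low-pass filter, which is not necessarily the "Daubechies 4" filter used in this work, since naming conventions differ between toolboxes. All function names are hypothetical.

```python
import math

def daubechies_4tap_lowpass():
    """4-tap Daubechies scaling filter (illustrative choice, see lead-in)."""
    r3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + r3) / norm, (3 + r3) / norm, (3 - r3) / norm, (1 - r3) / norm]

def alternating_flip(lowpass):
    """High-pass filter g[k] = (-1)^k * h[N-1-k] derived from low-pass h."""
    n = len(lowpass)
    return [(-1) ** k * lowpass[n - 1 - k] for k in range(n)]

def time_reverse(taps):
    """For orthogonal wavelets the reconstruction filter is the reversed filter."""
    return list(reversed(taps))

L = daubechies_4tap_lowpass()      # decomposition low-pass
H = alternating_flip(L)            # decomposition high-pass
L_rec = time_reverse(L)            # reconstruction low-pass (L')
H_rec = time_reverse(H)            # reconstruction high-pass (H')

# Properties that make aliasing cancellation and perfect reconstruction possible
unit_energy = sum(h * h for h in L)                  # should be 1
shift_orthogonality = L[0] * L[2] + L[1] * L[3]      # should be 0
cross_orthogonality = sum(a * b for a, b in zip(L, H))  # should be 0
```

The three checked quantities are the orthonormality conditions for the filter pair; the Haar filters [1/sqrt(2), 1/sqrt(2)] satisfy the same relations with only two taps.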


We have seen that it is possible to reconstruct our original signal from the coefficients of the approximations and details. It is also possible to reconstruct the approximations and details themselves from their coefficient vectors. As an example, let's consider how we would reconstruct the first-level approximation A1 from the coefficient vector cA1. We pass the coefficient vector cA1 through the same process we used to reconstruct the original signal. However, instead of combining it with the level-one detail cD1, we feed in a vector of zeros in place of the detail coefficients vector.

The process yields a reconstructed approximation A1, which has the same length as the original signal S and which is a real approximation of it. Similarly, we can reconstruct the first-level detail D1 using the analogous process. The reconstructed details and approximations are true constituents of the original signal. In fact, when we combine them we find that A1 + D1 = S.

The coefficient vectors cA1 and cD1, because they were produced by downsampling and are only half the length of the original signal, cannot directly be combined to reproduce the signal. It is necessary to reconstruct the approximations and details before combining them. Extending this technique to the components of a multilevel analysis, we find that similar relationships hold for all the reconstructed signal constituents. That is, there are several ways to reassemble the original signal.
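The zero-substitution idea can be checked numerically. The sketch below uses a one-level Haar transform in plain Python as a stand-in (function names are hypothetical, and an even-length signal is assumed): it rebuilds A1 from cA1 with zeros in place of the details, rebuilds D1 from cD1 with zeros in place of the approximation, and lets the caller confirm that A1 + D1 reproduces S.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(signal):
    """One-level Haar decomposition of an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """One-level Haar reconstruction."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out

def reconstruct_components(signal):
    """Return (A1, D1), each with the same length as the input signal."""
    cA1, cD1 = haar_analysis(signal)
    zeros = [0.0] * len(cA1)
    A1 = haar_synthesis(cA1, zeros)   # details replaced by zeros
    D1 = haar_synthesis(zeros, cD1)   # approximation replaced by zeros
    return A1, D1

S = [2.0, 4.0, 7.0, 1.0, -3.0, 0.5]
A1, D1 = reconstruct_components(S)
recombined = [a + d for a, d in zip(A1, D1)]
```

For Haar, A1 holds the pairwise means of S and D1 holds the deviations from those means, which makes the identity easy to verify by hand.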

The waveforms of the Haar and Daubechies wavelets are shown in Fig. 3. In our work we used only the Haar and Daubechies wavelets for decomposition and reconstruction of the selected Bangla phonemes.

3 SPEECH MATERIAL

In the speech analysis scenario, we first need to collect the speech signal. The isolated Bangla vowels /i/, /e/, /æ/, /a/, /ɔ/, /o/, /u/ were each uttered three times by a native Bangla male speaker in a quiet room. We recorded the vowel signals in stereo at a sampling rate of 48 kHz and then downsampled them to 10 kHz mono. The speech samples were then normalized and ready to be used for our work.
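Two of the preparation steps can be sketched in a few lines. The text does not state how the channels were combined or which normalization was applied, so the sketch below assumes channel averaging and peak normalization to the range [-1, 1]; both choices and both function names are illustrative. Resampling from 48 kHz to 10 kHz is omitted because it requires a proper anti-aliasing filter.

```python
def stereo_to_mono(frames):
    """Average left and right samples (assumed mixing rule)."""
    return [(left + right) / 2.0 for left, right in frames]

def peak_normalize(samples):
    """Scale so the largest absolute sample becomes 1 (assumed normalization)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    return [s / peak for s in samples]

stereo = [(0.2, 0.4), (-1.0, -0.5), (0.1, 0.1)]
mono = stereo_to_mono(stereo)
normalized = peak_normalize(mono)
```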

4 WORKING PROCEDURE

Each Bangla vowel signal was decomposed using the Haar and Daubechies 4 wavelets at decomposition level 5, and the approximation and detail coefficients were calculated and stored for use in reconstruction as described in Section 2. Using the calculated wavelet coefficients, we reconstructed the vowels with the same wavelets. We repeated these steps for each of the 7 Bangla vowels. The working procedure is shown in Fig. 4. After reconstructing the selected Bangla phonemes, the performance of the selected wavelets was measured using Eq. 1 and Eq. 2.

Fig. 4: Block diagram of working procedure of analysis and synthesis of our selected vowel phonemes.

Fig. 5: Waveform of Original, Reconstructed, Approximations, Details at 5 different scales and RE for vowel /i/ using Daubechies 4 wavelet
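A level-5 decomposition followed by reconstruction can be sketched as below for the Haar case. This is an illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the input is a synthetic two-tone signal standing in for a recorded vowel, and the signal length is required to be a multiple of 2^5 so that no boundary padding is needed.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One Haar analysis level: approximation and detail coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """One Haar synthesis level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out

def decompose(signal, level=5):
    """Return (cA_level, [cD1, cD2, ..., cD_level])."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("this sketch needs a length divisible by 2**level")
    approx, details = list(signal), []
    for _ in range(level):
        approx, detail = haar_step(approx)  # split the current approximation again
        details.append(detail)
    return approx, details

def reconstruct(approx, details):
    """Invert decompose() by walking back from the coarsest level."""
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

# Synthetic stand-in for a vowel segment sampled at 10 kHz (assumption)
fs = 10000
vowel_like = [math.sin(2 * math.pi * 300 * n / fs)
              + 0.5 * math.sin(2 * math.pi * 2300 * n / fs) for n in range(320)]
cA5, details = decompose(vowel_like, level=5)
rebuilt = reconstruct(cA5, details)
max_error = max(abs(a - b) for a, b in zip(vowel_like, rebuilt))
```

Because only the approximation is split at each level, the coefficient vectors shrink by half per level, and the total number of stored coefficients equals the signal length.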


5 PERFORMANCE MEASUREMENT OF THE SYNTHESIZED BANGLA VOWEL SIGNAL

We measured the performance of the synthesized Bangla vowel signal using two parameters, NRMSE and RE as given by Eq. 1 and Eq. 2.

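1. Normalized Root Mean Square Error (NRMSE) is calculated using Eq. 1.

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n}\left(x(n) - r(n)\right)^{2}}{\sum_{n}\left(x(n) - \bar{x}\right)^{2}}} \qquad (1)$$

In Eq. 1, $x(n)$ is the speech signal, $r(n)$ is the reconstructed signal, and $\bar{x}$ is the mean of the speech signal.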
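A minimal sketch of the NRMSE computation follows. It assumes the usual form of the measure, the square root of the summed squared error between x(n) and r(n) divided by the summed squared deviation of x(n) from its mean; the function name `nrmse` is illustrative.

```python
import math

def nrmse(x, r):
    """Normalized RMS error between original x and reconstruction r."""
    if len(x) != len(r) or not x:
        raise ValueError("x and r must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    error_energy = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    deviation_energy = sum((xi - mean_x) ** 2 for xi in x)
    if deviation_energy == 0:
        raise ValueError("NRMSE is undefined for a constant signal")
    return math.sqrt(error_energy / deviation_energy)

original = [1.0, 2.0, 3.0, 4.0]
perfect = nrmse(original, original)        # identical signals
mean_only = nrmse(original, [2.5] * 4)     # reconstruction equal to the mean
```

With this normalization, a reconstruction that carries no more information than the signal mean scores 1, so values far below 1 indicate a close match.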

2. Retained Energy (RE) in the first N/2 wavelet coefficients is given by Eq. 2.

$$\mathrm{RE} = 100 \times \frac{\lVert r(n) \rVert^{2}}{\lVert x(n) \rVert^{2}} \qquad (2)$$

In Eq. 2, $\lVert x(n) \rVert$ is the norm of the original signal and $\lVert r(n) \rVert$ is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets the retained energy is equal to the L2-norm recovery performance.

Fig. 6: NRMSE and RE obtained by using Haar and Daubechies wavelet with WT for reconstructing the seven Bangla vowels
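The retained energy can be sketched directly on the coefficient vector. The example assumes an orthogonal wavelet, so that energy in the coefficient domain equals energy in the signal domain, and assumes the coefficients are ordered from coarsest approximation to finest detail; `retained_energy` and its `keep` parameter are illustrative names.

```python
def retained_energy(coeffs, keep):
    """Percentage of total energy held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    if total == 0:
        raise ValueError("retained energy is undefined for an all-zero vector")
    kept = sum(c * c for c in coeffs[:keep])
    return 100.0 * kept / total

# Toy coefficient vector ordered from coarse to fine (made-up values)
coeffs = [3.0, 4.0, 5.0, 0.0]
re_first_half = retained_energy(coeffs, keep=len(coeffs) // 2)
re_all = retained_energy(coeffs, keep=len(coeffs))
```

Keeping every coefficient always gives 100%; the measure is informative when only a leading subset, such as the first N/2 coefficients, is kept.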

5 RESULT AND DISCUSSION

In this section, we discuss the performance of the wavelets in reconstructing, or synthesizing, the signal. We calculate the NRMSE and RE between the original and the reconstructed vowel at decomposition level 5.

Fig. 5 shows a sample speech signal /i/ and approximations of the signal at five different scales. These approximations are reconstructed from the coarse low-frequency coefficients in the WT vector. The figure shows that the original speech data is still well represented by the level 5 approximation.

The NRMSE of the reconstructed vowel waveform is calculated for all seven Bangla vowels and is found to be on the order of $10^{-11}$ or less. The RE of the first few coefficients of the WT is found to be more than 92%. The reconstructed vowel waveform obtained by WT is therefore nearly identical to the original waveform, and we may say that WT preserves the important speech information with few parameters.

The NRMSE and RE were calculated for all the vowels and are plotted in Fig. 6.

6 CONCLUSION

This work deals with Bangla vowel decomposition and reconstruction, which is the basis of Bangla speech processing. We presented WT techniques and details of how to use them for Bangla vowel phoneme analysis and synthesis. Analyzing a signal with the Haar and Daubechies wavelets at decomposition level 5 and then reconstructing it constitutes a scheme of analysis and synthesis.

Among the tested wavelets, the Haar wavelet produced an NRMSE on the order of $10^{-15}$ and 92% RE, while the Daubechies wavelet produced an NRMSE on the order of $10^{-11}$ and 98% RE. Although the Daubechies wavelet produced a larger NRMSE than Haar, the RE in its first few coefficients is much larger. Therefore, the performance of the Daubechies wavelet is better than that of the Haar wavelet for Bangla vowel reconstruction.


REFERENCES

[1] R. Polikar, "The Wavelet Tutorial", 136 Rowan Hall, Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08028, June 1996.

[2] http://plato.stanford.edu/entries/analysis/

[3] http://en.wikipedia.org/wiki/Synthesis

[4] I. Agbinya, "Discrete Wavelet Transform Techniques in Speech Processing", IEEE Tencon Digital Signal Processing Applications Proceedings, IEEE, New York, NY, 1996, pp. 514-519.

[5] S. Kadambe and G. F. Boudreaux-Bartels, "Applications of the Wavelet Transform for Speech Detection", IEEE Trans. on Information Theory, Vol. 38, No. 2, 1992, pp. 917-954.

[6] O. Farooq, S. Datta, "Phoneme recognition using wavelet based features", Journal of Information Sciences: Informatics and Computer Science, Vol. 150, Issue 1-2, March 2003.

[7] S. Haque, S. A. Hossain, M. A. Sobhan, "Using Wavelet Transform for Bangla Phoneme Synthesis", Proceedings of International Conference on Computer Processing of Bangla, 19th February, 2011, Dhaka, Bangladesh.

[8] http://www.mathworks.com/help/toolbox/

S. Haque received her B.Sc. and M.Sc. degrees in Applied Physics and Electronics from Rajshahi University, Bangladesh. She joined the Bangladesh Atomic Energy Commission as a scientific officer in 1999. From 1999, she was affiliated with the Department of Computer Science and Technology, Islamic University, Kushtia, Bangladesh. She is pursuing her Ph.D. in the Graduate School of Engineering and Science at the University of the Ryukyus, Okinawa, Japan. Since 2008 she has been with the Daffodil International University, Shukrabad, Dhanmondi, Dhaka, Bangladesh. Her current research interests are speech, image and bio-medical signal processing.

A. U. Khan received his B.Sc. Engineering degree in Electronics and Telecommunication Engineering from Daffodil International University, Bangladesh, in 2011. He then joined Telnet Communication Company as an Engineer in 2011. He has been working as an Engineer in the core network of BanglaLion Communication since February 2012.