Suppression of Noise in Speech using Adaptive Gain Equalizer



Journal of Information and Communication Technologies, ISSN 2047-3168, Volume 3, Issue 4, April 2013 www.jict.co.uk


JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, VOLUME 3, ISSUE 4, APRIL 2013


Abstract— The quality of speech during communication is generally affected by surrounding noise and interference. Speech enhancement, one of the most widely used branches of signal processing, aims to improve the quality of the speech signal and to reduce the noise. One method for reducing noise in speech signals is the AGE (Adaptive Gain Equalizer). This report presents a real-time implementation of an AGE noise suppressor using a uniform FFT (Fast Fourier Transform) modulated filter bank for speech communication systems. Our results show that the method offers low complexity, low delay and high flexibility, which makes it suitable for a wide range of implementations.

Index Terms—Adaptive Gain Equalizer, FFT Filter Bank, Noise Suppression.

1. INTRODUCTION

The objective of the AGE is to divide the input signal into a number of frequency sub-bands that are individually and adaptively boosted according to a short-term signal-to-noise ratio (SNR) estimate in each sub-band at every time instant. In other words, the method focuses on enhancing the speech signal rather than suppressing the noise. A high sub-band SNR estimate indicates that the sub-band signal content is less corrupted by noise, so the sub-band should be boosted. A low sub-band SNR estimate indicates that the surrounding noise is dominant in the sub-band at hand, so no boosting of the sub-band should be performed.

To achieve this speech boosting effect, a short-term average for speech tracking and a long-term average for tracking the background noise floor level are calculated simultaneously. The quotient of these two quantities gives a gain function that weights the sub-band signal directly according to the sub-band SNR estimate at that particular time instant. If only noise is present in the signal, the noise floor level estimate and the short-term average will be approximately the same. Hence, the quotient of the two measures will be unity and the sub-band signal will not be altered. If speech is present, the short-term average will increase while the noise floor level estimate remains approximately unchanged. Hence, the quotient will become larger than unity, amplifying the signal in the sub-band at hand.

A general filter bank is a group of parallel low-pass, band-pass or high-pass filters. It converts the usual time-domain representation of the signal into a time-frequency representation, which is commonly used in modern speech processing methods. Here, we use a uniform FFT modulated filter bank, which consists of band-pass filters with very little mutual overlap in frequency, as shown in Fig. 1. The term “uniform” refers to the fact that the modulation process distributes the filters uniformly along the frequency axis.

Fig. 1. A bank of eight band-pass filters $h_k[n]$, with Fourier transforms $H_k[f] = \mathcal{F}\{h_k[n]\}$, comprises a filter bank.

2. PROBLEM STATEMENT AND MAIN CONTRIBUTION

In a typical situation, a speech signal is distorted by noise, i.e. the noise is acoustically added to the speech. The goal is to suppress the noise using a speech enhancement method, resulting in an output signal with a higher SNR.

Our main contribution is to design the adaptive gain equalizer noise suppressor for speech enhancement in MATLAB, then implement the method on a TMS320C6713 processor using Code Composer Studio, and validate the results.

Anil Chokkarapu, Sarath C Uppalapati and Abhiram Chinthakuntla are with the School of Engineering, Blekinge Tekniska Högskola, Karlskrona, Sweden.

3. PROBLEM SOLUTION

3.1 Uniform FFT Modulated Filter bank:

This filter bank consists of K band-pass filters $H_k(z)$, for $k = 0, 1, \ldots, K-1$, with impulse response functions $h_k[n]$, each of length N taps. The filters are created by modulating (frequency-shifting) a low-pass prototype filter (which is equivalent to the first band-pass filter, at DC frequency, i.e. $h_0[n]$), according to

$$h_k[n] = W_K^{-kn}\, h_0[n] \quad \text{for } n = 0, 1, \ldots, N-1 \qquad (1)$$

Therefore, the Z-transform of each modulated band-pass filter is given by

$$H_k(z) = H_0(W_K^{k} z) \qquad (2)$$
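The modulation in Eqs. (1) and (2) can be illustrated with a minimal sketch. It assumes the usual definition $W_K = e^{-j2\pi/K}$, which the text does not state, and uses a hypothetical Hann-shaped prototype rather than the prototype designed for the paper. Under that assumption, filter k is the prototype shifted to the normalized frequency k/K, so its response there equals the prototype's DC gain.

```python
import cmath
import math

def modulate_prototype(h0, K):
    """Return K impulse responses h_k[n] = W_K^{-kn} * h0[n] (Eq. 1)."""
    # Assumption: W_K = exp(-j*2*pi/K), hence W_K^{-kn} = exp(+j*2*pi*k*n/K).
    return [
        [cmath.exp(2j * math.pi * k * n / K) * h0[n] for n in range(len(h0))]
        for k in range(K)
    ]

def freq_response(h, f):
    """Evaluate H(e^{j*2*pi*f}) at normalized frequency f (cycles/sample)."""
    return sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(h))

K = 8    # number of sub-bands (illustrative value)
N = 32   # prototype length in taps (illustrative value)
# Hypothetical prototype: a Hann-shaped low-pass filter scaled to unit DC gain.
h0 = [(0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N)) / (N / 2) for n in range(N)]

bank = modulate_prototype(h0, K)

# Per Eq. (2), filter k evaluated at f = k/K equals the prototype at DC.
dc_gain = abs(freq_response(h0, 0.0))
peak_gains = [abs(freq_response(bank[k], k / K)) for k in range(K)]
```

Comparing `peak_gains` with `dc_gain` is a quick sanity check that the frequency shift has the intended sign.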

We assume that the input signal, after filtering by each modulated band-pass filter, is decimated by a factor D, where D = K/O and O denotes the oversampling ratio. Each filter with its decimator is realized in polyphase form. This is achieved by dividing the prototype filter $H_0(z)$ into O groups, each containing D polyphase components. An IFFT is then applied to each of the O groups of polyphase components individually, and since the resulting sub-band indices are not in order, they are rearranged in increasing order. Because the input speech signal is real-valued, its sub-band representation is conjugate-symmetric: only the first half of the sub-bands needs to be computed, and the other half can be generated by taking the complex conjugate of the first half. This implementation is known as the analysis filter bank and is shown in Fig. 2.
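The decimation factor and the conjugate-symmetry shortcut can be sketched as follows. This is a direct (non-polyphase) illustration only: the values of K and O, the moving-average prototype and the test input are hypothetical, and the modulation uses the assumed convention $W_K = e^{-j2\pi/K}$.

```python
import cmath
import math

def subband_sample(x, h0, k, K, n):
    """Output at time n of the filter h_k[m] = exp(j*2*pi*k*m/K) * h0[m] applied to x."""
    acc = 0j
    for m, tap in enumerate(h0):
        if 0 <= n - m < len(x):
            acc += x[n - m] * tap * cmath.exp(2j * math.pi * k * m / K)
    return acc

K, O = 8, 2           # hypothetical: 8 sub-bands, oversampling ratio 2
D = K // O            # decimation factor D = K / O
h0 = [1.0 / 8] * 8    # hypothetical real-valued prototype (moving average)
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]  # real test input

# Decimation: keep only every D-th output sample of each sub-band.
times = range(0, len(x), D)

# Compute sub-bands 0 .. K/2 directly.
lower = {k: [subband_sample(x, h0, k, K, n) for n in times] for k in range(K // 2 + 1)}

# Sub-bands K/2+1 .. K-1 follow by complex conjugation, because x and h0 are real.
upper = {K - k: [v.conjugate() for v in lower[k]] for k in range(1, K // 2)}
```

With these values only 5 of the 8 sub-bands are computed explicitly, and each sub-band runs at one quarter of the input rate.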

Fig.2. Analysis filter bank

The design of the synthesis filter bank is similar to that of the analysis filter bank, with two differences: the polyphase components of the synthesis filter bank are obtained by flipping and conjugating the analysis polyphase components, and an FFT is used in place of the IFFT of the analysis filter bank. The model of the synthesis filter bank is shown in Fig. 3.

3.2 Adaptive gain equalizer:

Fig. 3. Synthesis filter bank

Suppose we have an acoustic noise signal denoted w[n] and a speech signal denoted s[n]. The noise-corrupted speech signal x[n] can be written as $x[n] = s[n] + w[n]$. Filtering the input signal with the analysis filter bank divides it into K sub-bands, each denoted $x_k[n]$, where k is the sub-band index. We get $x_k[n] = x[n] * h_k[n]$, where $*$ denotes the convolution operator. The input signal can be described as

$$x[n] = \sum_{k=0}^{K-1} x_k[n] = \sum_{k=0}^{K-1} \left( s_k[n] + w_k[n] \right) \qquad (4)$$

where $s_k[n]$ is the speech part of sub-band k and $w_k[n]$ is the noise part of sub-band k. The output y[n] is formed by

$$y[n] = \sum_{k=0}^{K-1} G_k[n]\, x_k[n] \qquad (5)$$

where $G_k[n]$ is a gain function (the AGE weighting function) that applies a gain to each sub-band and amplifies the signal when speech is active. Fig. 4 shows a simple block diagram of the AGE.

Fig.4. Block diagram of Adaptive gain equalizer

Two quantities are used to calculate the gain function: a long-term (slow) average $A_{s,k}[n]$ and a short-term (fast) average $A_{f,k}[n]$. The short-term average for sub-band k, $A_{f,k}[n]$, is calculated as

$$A_{f,k}[n] = (1 - \alpha_k)\, A_{f,k}[n-1] + \alpha_k\, |x_k[n]| \qquad (6)$$

where $\alpha_k$ is a small positive constant given by

$$\alpha_k = \frac{1}{T_{s,k}\, F_s} \qquad (7)$$

where $F_s$ is the sampling frequency in Hz and $T_{s,k}$ is a time constant in seconds.
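The recursion in Eq. (6) with the constant from Eq. (7) can be sketched as below. The 25 ms time constant, the constant-magnitude test signal and the zero initial value are assumptions made for illustration, and the average is assumed to be updated once per sample at rate $F_s$.

```python
def alpha_from_time_constant(T_s, F_s):
    """Eq. (7): alpha = 1 / (T_s * F_s)."""
    return 1.0 / (T_s * F_s)

def short_term_average(x_k, alpha, initial=0.0):
    """Eq. (6): A_f[n] = (1 - alpha) * A_f[n-1] + alpha * |x_k[n]|."""
    a_f = initial
    out = []
    for sample in x_k:
        a_f = (1.0 - alpha) * a_f + alpha * abs(sample)
        out.append(a_f)
    return out

# Illustrative values: a 25 ms time constant at an 8 kHz sampling rate.
alpha = alpha_from_time_constant(T_s=0.025, F_s=8000.0)

# Hypothetical sub-band signal with constant magnitude 1.
x_k = [1.0] * 2000
a_f = short_term_average(x_k, alpha)
```

For this input the average rises exponentially toward 1 and reaches roughly 63% of its final value after one time constant (200 samples).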

In the same way, the slow average is computed as

$$A_{s,k}[n] = \begin{cases} (1 + \beta_k)\, A_{s,k}[n-1] & \text{if } A_{s,k}[n-1] \le A_{f,k}[n] \\ A_{f,k}[n] & \text{if } A_{s,k}[n-1] > A_{f,k}[n] \end{cases} \qquad (8)$$

where $\beta_k$ is a small positive constant. The AGE gain function is computed as:

$$G_k[n] = \min\left(1, \left(\frac{A_{f,k}[n]}{A_{s,k}[n]\, L_k}\right)^{p_k}\right) \qquad (9)$$

where $L_k$ is a positive constant and $p_k$ determines the gain rise applied individually to each of the sub-band signals.
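Putting the fast average, the noise floor estimate and the gain function together for a single sub-band gives the sketch below. It follows Eqs. (6), (8) and (9) as written here. The initialization of both averages to the first sample magnitude, the small `eps` guard in the denominator, and all parameter values are assumptions for illustration, not the settings used on the DSP.

```python
def age_gains(x_k, alpha, beta, L, p, eps=1e-12):
    """Per-sample AGE gain for one sub-band, following Eqs. (6), (8) and (9)."""
    if not x_k:
        return []
    # Assumption: start both averages at the first sample magnitude.
    a_f = abs(x_k[0])   # short-term (fast) average
    a_s = abs(x_k[0])   # noise floor (slow) average
    gains = []
    for sample in x_k:
        a_f = (1.0 - alpha) * a_f + alpha * abs(sample)   # Eq. (6)
        if a_s <= a_f:
            a_s = (1.0 + beta) * a_s                       # Eq. (8): slow rise
        else:
            a_s = a_f                                      # Eq. (8): immediate drop
        # Assumption: eps keeps the ratio finite when the floor estimate is zero.
        ratio = a_f / max(a_s * L, eps)
        gains.append(min(1.0, ratio ** p))                 # Eq. (9)
    return gains

# Hypothetical test signal: a low-level noise-only stretch, then a louder burst.
noise_only = [0.1] * 500
speech_burst = [1.0] * 1000
gains = age_gains(noise_only + speech_burst, alpha=0.005, beta=1e-4, L=4.0, p=1.0)
```

In the noise-only stretch the two averages coincide, so the gain settles near $1/L_k^{p_k}$ (0.25 here). During the burst the fast average outruns the noise floor estimate and the gain rises until it is capped at 1.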

4. EVALUATION OF RESULTS

A general mathematical description of the method was given in Section 3, where a number of sub-band dependent parameters were introduced. In this section, where a practical evaluation is performed, some of these parameters are set to the same value for all sub-bands.

For evaluation purposes, we recorded a signal with a sampling frequency of 8 kHz. The method was verified in MATLAB before being implemented in real time. We then saved the filter coefficients of the analysis and synthesis filter banks in a format suitable for use on the DSK C6713 processor, and included these coefficients when implementing the method on the DSK C6713 using Code Composer Studio. As a first check, a 2 Vp-p sine wave from a function generator was applied to the input and the sine wave output of the DSP was observed.

Fig. 5. Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The sub-band signals $x_k[n]$ are scaled by sub-band scaling constants $g_k$ to yield sub-band output signals $y_k[n] = g_k\, x_k[n]$.

To verify the implementation of the analysis and synthesis filter banks, we find the power spectral densities (PSDs) of the input noise sequence x[n] and the output y[n] and determine the transfer magnitude function. To do this, a noise sequence x[n] is generated using the “randn” command in MATLAB and stored as a wav-file. We feed the generated noise sequence x[n] to a DSP project that simply copies the ADC input to the DAC output, i.e. without any filter bank processing, as in Fig. 6. We then calculate the PSD of the DAC output and store it as $P_{xx}(f)$, as in Fig. 7. These signals are recorded at a sampling frequency of 44 kHz. Next, we enable the filter bank processing, calculate the PSD of the DAC output and store it as $P_{yy}(f)$, as in Fig. 8 and Fig. 9. Using the PSDs of the output and input signals, we can calculate the transfer magnitude function

$$|H(f)|^2 = \frac{P_{yy}(f)}{P_{xx}(f)}$$

We then compare the estimated transfer magnitude function to the ideal one. This verifies the filter bank implementation.
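A transfer magnitude estimate of this kind can be sketched offline as follows, taking the magnitude as the square root of the output PSD divided by the input PSD (the standard relation for a linear time-invariant system). The sketch substitutes a simplified, unwindowed averaged periodogram for MATLAB's Welch estimate, Python's `random.gauss` for `randn`, and a hypothetical flat gain of 0.5 for the DSP system.

```python
import cmath
import math
import random

def averaged_periodogram(x, nfft):
    """Average |DFT|^2 over non-overlapping length-nfft segments (no windowing)."""
    n_seg = len(x) // nfft
    psd = [0.0] * (nfft // 2 + 1)
    for s in range(n_seg):
        seg = x[s * nfft:(s + 1) * nfft]
        for b in range(nfft // 2 + 1):
            X = sum(v * cmath.exp(-2j * math.pi * b * n / nfft) for n, v in enumerate(seg))
            psd[b] += abs(X) ** 2 / (nfft * n_seg)
    return psd

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1024)]  # stand-in for the randn noise sequence
g = 0.5                                             # hypothetical flat system gain
y = [g * v for v in x]                              # stand-in for the recorded DAC output

P_xx = averaged_periodogram(x, 32)
P_yy = averaged_periodogram(y, 32)

# Transfer magnitude per frequency bin: sqrt(P_yy / P_xx).
H_mag = [math.sqrt(p_out / p_in) for p_in, p_out in zip(P_xx, P_yy)]
```

Because the stand-in system is a pure gain, every entry of `H_mag` comes out at 0.5. With a real filter bank the same ratio traces the magnitude response across frequency.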

Fig. 6. A generated noise sequence is passed through the DSP system without any processing of the signal, and the PSD of the output signal is estimated.

Fig. 7. PSD of the output signal when noise is passed through the DSP system without any processing of the signal.

Fig. 8. PSD of the output signal after filter bank processing (without AGE).

Next, we input a speech signal x[n] containing some music and noise to the DSP and observe the output signal after filter bank processing (without the AGE), checking that we hear the same speech signal without any added disturbances or interference. This verifies the DSP implementation. We then follow the same procedure with the AGE included and observe the speech signal. Based on the SNR estimate, the speech signal is amplified and the noise is attenuated, which results in an enhanced speech signal. The parameters $\alpha_k$ and $\beta_k$ play an important role in determining the output speech signal.

[Plot: Welch Power Spectral Density Estimate. x-axis: Normalized Frequency (×π rad/sample), 0 to 1. y-axis: Power/frequency (dB/rad/sample), -120 to -30.]

[Plot: Welch Power Spectral Density Estimate. x-axis: Normalized Frequency (×π rad/sample), 0 to 1. y-axis: Power/frequency (dB/rad/sample), -100 to 20.]


Fig. 9. PSD of the output signal $P_{yy}(f)$ after using the AGE noise suppressor.

Fig. 10. Input noisy speech signal.

Fig. 11. Enhanced speech signal using the AGE noise suppression method, obtained from the DSP.

A small time constant $T_{s,k}$ (i.e. a large $\alpha_k$) results in unnatural-sounding speech with remaining artifacts. A very large time constant (i.e. a small $\alpha_k$) results in a short-term average that reacts too slowly to variations in the incoming signal amplitude. Hence, speech attacks will be cropped, and the amplification of speech with a small amount of noise will be limited. The positive constant $\beta_k$ controls how fast the noise floor level estimate adapts to changes in the noise environment. A large value of $\beta_k$ results in a noise floor level estimate that is approximately equal to the short-term average, while a very small value results in slow convergence and poor noise level tracking in non-stationary environments.

Since the gain function $G_k[n]$ is the ratio of the short-term average to the noise floor level estimate, care must be taken to avoid singularities that cause numerical overflow in the DSP. In this implementation, any sample outside the DSP range was therefore clipped to the minimum possible sample value.

This method focuses on speech enhancement rather than noise suppression. We are not trying to improve the SNR by removing noise; instead, we are trying to improve the SNR by amplifying the speech. The time constants that control the short-term average should be kept in the range of the pseudo-stationary time of speech, i.e. about 20-30 ms. The parameter $\beta_k$ that controls the noise floor level estimate can be varied around $10^{-6}$ depending on the desired effect. The upper bound of the gain function affects the resulting speech distortion and should be kept within 5-20 dB. A larger amplification of the signal may result in piercing-sounding output speech. Before the AGE was implemented in real time, it was implemented in MATLAB, and the results obtained from the off-line MATLAB simulation and the online real-time DSP implementation of the algorithm are almost identical.
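As a worked example of these settings, assume the 8 kHz sampling rate used in the evaluation and a time constant of 25 ms (a value chosen here for illustration from the 20-30 ms range). The smoothing constant follows from Eq. (7), and the 5-20 dB gain bound converts to linear factors if it is read as an amplitude ratio (an assumption, since the text does not say whether amplitude or power is meant):

```latex
\alpha_k = \frac{1}{T_{s,k}\, F_s} = \frac{1}{0.025 \times 8000} = 0.005,
\qquad
10^{5/20} \approx 1.78,
\qquad
10^{20/20} = 10 .
```

Under that reading, the recommended bound corresponds to a linear gain limit between roughly 1.8 and 10.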

5. CONCLUSION

A preprocessing noise suppression algorithm using the AGE was developed, implemented and tested with filter bank techniques on a DSP processor kit using the Code Composer Studio tool. The AGE algorithm is a straightforward, robust and flexible method for speech enhancement.


[Plot: Welch Power Spectral Density Estimate. x-axis: Normalized Frequency (×π rad/sample), 0 to 1. y-axis: Power/frequency (dB/rad/sample), -120 to 0.]

[Plot: time-domain waveform. x-axis: 0 to 5 ×10^5. y-axis: -0.8 to 0.8.]

[Plot: time-domain waveform. x-axis: 0 to 6 ×10^5. y-axis: -0.8 to 0.8.]


Anil Chokkarapu was born in Huzarabad, India in 1987. He completed his Dual Master’s (M.Sc & M.Tech) programme in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Karlskrona, Sweden during 2010-2012. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU) Hyderabad, India in 2009. His research interests are audio/speech processing, hearing aids and biomedical signal processing.

Sarath Chandra Uppalapati was born in 1987 in Hyderabad, India. He completed his Master of Science (M.Sc) in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Sweden and his Master of Technology (M.Tech) in Signal Processing at JNTU, India. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU), India. He worked as a Trainee Engineer at Metox Resistor Industries from July 2008 until August 2009. His current research interests are speech processing, image processing and neural networks.

Abhiram Chinthakuntla was born in Warangal, India in 1988. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU) Hyderabad. He completed his Master’s programme in Electrical Engineering, with emphasis on Signal Processing, at Blekinge Institute of Technology, Karlskrona, Sweden. His areas of interest are digital signal processing, image/audio processing, adaptive systems and neural networks.