Suppression of Noise in Speech using Adaptive Gain Equalizer
Journal of Information and Communication Technologies, ISSN 2047-3168, Volume 3, Issue 4, April 2013, www.jict.co.uk
Abstract— The quality of speech during communication is generally affected by surrounding noise and interference. Speech enhancement, one of the most widely used branches of signal processing, aims to improve the quality of the speech signal and reduce the noise. One method for reducing noise in speech signals is the AGE (Adaptive Gain Equalizer). This paper presents a real-time implementation of an AGE noise suppressor using a uniform FFT (Fast Fourier Transform) modulated filter bank for speech communication systems. Our results show that this method offers low complexity, low delay and high flexibility, which makes it suitable for a wide range of implementations.
Index Terms—Adaptive Gain Equalizer, FFT Filter Bank, Noise Suppression.
1. INTRODUCTION
The objective of the AGE is to divide the input signal into a number of frequency sub-bands that are individually and adaptively boosted according to a short-term signal-to-noise ratio (SNR) estimate in each sub-band at every time instant. In other words, the method focuses on enhancing the speech signal rather than suppressing the noise. A high sub-band SNR estimate indicates that the sub-band signal content is only slightly corrupted by noise, so the sub-band should be boosted. A low sub-band SNR estimate indicates that the surrounding noise is dominant in the sub-band at hand, so no boosting of that sub-band should be performed.
To achieve this speech boosting effect, a short-term average for speech tracking and a long-term average for tracking the background noise floor level are calculated simultaneously. The ratio of these two quantities gives a gain function that weights each sub-band signal directly according to its sub-band SNR estimate at that particular time instant. If only noise is present in the signal, the noise floor level estimate and the short-term average will be approximately the same. Hence, their ratio will be close to unity and the sub-band signal will not be altered. If speech is present, the short-term average will increase while the noise floor level estimate remains approximately unchanged. Hence, the ratio will become larger than unity, amplifying the signal in the sub-band at hand.
A general filter bank is a group of parallel low pass, band pass or high pass filters. It converts the time-domain representation of a signal into a time-frequency representation, which is commonly used in modern speech processing methods. Here, we use a uniform FFT modulated filter bank consisting of band pass filters with very little mutual overlap in frequency, as shown in Fig. 1. The term "uniform" refers to the fact that the modulation process distributes the filters uniformly along the frequency axis.
Fig. 1. A bank of eight band pass filters $h_k[n]$, with Fourier transforms $H_k(f) = \mathcal{F}\{h_k[n]\}$, comprising a filter bank.
2. PROBLEM STATEMENT AND MAIN CONTRIBUTION
In a typical situation, a speech signal is distorted by noise, i.e., the noise is acoustically added to the speech. The goal is to suppress the noise using a speech enhancement method, resulting in an output signal with a higher SNR.
Our main contribution is to design the adaptive gain equalizer noise suppressor for speech enhancement in MATLAB, then implement the method using Code Composer Studio (CCS) on the TMS320C6713 processor and validate the results.
3. PROBLEM SOLUTION
3.1 Uniform FFT Modulated Filter bank:
This filter bank consists of $K$ band pass filters $H_k(z)$, $k = 0, 1, \ldots, K-1$, with impulse responses $h_k[n]$, each of length $N$ taps. These filters are created by modulating (frequency-shifting) a low pass prototype filter (which is equivalent to the first band pass filter at DC frequency, i.e. $h_0[n]$), according to

$$h_k[n] = W_K^{-kn}\, h_0[n], \qquad n = 0, 1, \ldots, N-1 \qquad (1)$$

where $W_K = e^{-j2\pi/K}$. Therefore, the Z-transform of each modulated band pass filter is given by

$$H_k(z) = H_0\!\left(z\, W_K^{k}\right) \qquad (2)$$

Anil Chokkarapu, Sarath C Uppalapati and Abhiram Chinthakuntla are with the School of Engineering, Blekinge Tekniska Högskola, Karlskrona, Sweden.
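The modulation in equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not say how its prototype was designed, so the Hann-windowed sinc prototype, the helper names `prototype_lowpass`, `modulate` and `freq_response`, and the choice $K = 8$, $N = 64$ are all hypothetical. The demo checks what equation (2) predicts: band $k$ is the prototype shifted to $2\pi k/K$, so each band has the prototype's DC gain at its own centre frequency.

```python
import cmath
import math


def prototype_lowpass(num_taps, num_bands):
    """Hann-windowed sinc low pass prototype h_0[n], cutoff pi/K rad/sample.

    Assumption: the prototype design method is not given in the paper.
    """
    center = (num_taps - 1) / 2
    cutoff = 1.0 / (2 * num_bands)  # cutoff in cycles/sample
    taps = []
    for n in range(num_taps):
        t = n - center
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def modulate(h0, num_bands):
    """Eq. (1): h_k[n] = W_K^{-kn} h_0[n], i.e. h_0[n] * exp(+j*2*pi*k*n/K)."""
    return [
        [cmath.exp(2j * math.pi * k * n / num_bands) * h for n, h in enumerate(h0)]
        for k in range(num_bands)
    ]


def freq_response(h, omega):
    """Evaluate H(e^{j*omega}) = sum_n h[n] * exp(-j*omega*n)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


if __name__ == "__main__":
    K, N = 8, 64
    h0 = prototype_lowpass(N, K)
    bank = modulate(h0, K)
    for k, hk in enumerate(bank):
        # Every band should print the same magnitude: the prototype's DC gain.
        print(k, abs(freq_response(hk, 2 * math.pi * k / K)))
```

A direct evaluation like this is only useful as a reference; the real-time implementation described here uses the polyphase/IFFT structure instead of computing each $h_k[n]$ separately.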
We assume that the outputs of the modulated band pass filters are decimated by a factor $D = K/O$, where $O$ denotes the over-sampling ratio. Each filter and its decimator are implemented with a polyphase structure. This is achieved by dividing the prototype filter $H_0(z)$ into $O$ groups, each containing $D$ polyphase components. An IFFT is then applied to each of the $O$ groups of polyphase components individually, and since the resulting sub-band indices are not in order, they are rearranged in increasing order. Since the input speech signal is real-valued, its spectrum is conjugate-symmetric, so only the sub-bands covering the first half of the frequency range need to be computed; the other half is obtained by taking the complex conjugates of the first half. This implementation is known as the analysis filter bank and is shown in Fig. 2.
Fig.2. Analysis filter bank
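The conjugate symmetry used by the analysis filter bank can be checked against a direct-form (non-polyphase) reference, sketched below under the assumption of a real-valued prototype $h_0[n]$ and real input. This is deliberately the slow textbook structure $x_k[m] = \sum_n h_k[n]\, x[mD - n]$, not the polyphase/IFFT structure described above; the helper names `analysis_direct` and `mirror_upper_half` are hypothetical. Because $h_{K-k}[n] = \overline{h_k[n]}$ for a real prototype, sub-band $K-k$ is the complex conjugate of sub-band $k$.

```python
import cmath
import math
import random


def analysis_direct(x, h0, num_bands, decimation):
    """Direct-form analysis: x_k[m] = sum_n h_k[n] * x[m*D - n] (zero initial state)."""
    num_frames = len(x) // decimation
    bands = []
    for k in range(num_bands):
        # Eq. (1): modulate the real prototype to band k.
        hk = [cmath.exp(2j * math.pi * k * n / num_bands) * h for n, h in enumerate(h0)]
        out = []
        for m in range(num_frames):
            t = m * decimation  # keep every D-th filter output (decimation)
            # Only taps that reach back to valid input samples contribute.
            out.append(sum(c * x[t - n] for n, c in enumerate(hk) if t - n >= 0))
        bands.append(out)
    return bands


def mirror_upper_half(half_bands, num_bands):
    """Rebuild bands K/2+1 .. K-1 from bands 0 .. K/2 by conjugation (real input only)."""
    full = list(half_bands)
    for k in range(num_bands // 2 + 1, num_bands):
        full.append([v.conjugate() for v in half_bands[num_bands - k]])
    return full


if __name__ == "__main__":
    random.seed(0)
    K, D = 8, 4  # D = K/O with over-sampling ratio O = 2
    h0 = [random.uniform(-1, 1) for _ in range(16)]  # any real prototype works here
    x = [random.gauss(0, 1) for _ in range(64)]
    bands = analysis_direct(x, h0, K, D)
    rebuilt = mirror_upper_half(bands[: K // 2 + 1], K)
    err = max(abs(a - b) for k in range(K) for a, b in zip(rebuilt[k], bands[k]))
    print("max mirror error:", err)
```

Computing only bands $0$ to $K/2$ and mirroring the rest roughly halves the sub-band work, which is the saving the text describes.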
The synthesis filter bank is designed in the same way as the analysis filter bank, except that its polyphase components are obtained by time-reversing (flipping) and complex-conjugating the analysis polyphase components, and an FFT is used instead of the IFFT of the analysis filter bank. The synthesis filter bank is shown in Fig. 3.
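The reason for flipping and conjugating can be seen band by band: if $f_k[n] = \overline{h_k[N-1-n]}$, then $F_k(e^{j\omega}) = e^{-j\omega(N-1)}\, \overline{H_k(e^{j\omega})}$, so each analysis/synthesis pair has zero phase apart from a pure delay of $N-1$ samples. The sketch below checks this numerically on full-length filters rather than on polyphase components; the function names and the short test prototype are assumptions for illustration.

```python
import cmath
import math


def synthesis_filter(hk):
    """Flip (time-reverse) and conjugate: f_k[n] = conj(h_k[N-1-n])."""
    return [complex(c).conjugate() for c in reversed(hk)]


def freq_response(h, omega):
    """Evaluate H(e^{j*omega}) = sum_n h[n] * exp(-j*omega*n)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


def cascade_without_delay(hk, omega):
    """H_k * F_k with the (N-1)-sample delay removed; should equal |H_k|^2."""
    fk = synthesis_filter(hk)
    delay = len(hk) - 1
    return freq_response(hk, omega) * freq_response(fk, omega) * cmath.exp(1j * omega * delay)


if __name__ == "__main__":
    K, k = 8, 3
    h0 = [0.1, 0.3, 0.5, 0.3, 0.1, -0.05]  # arbitrary real prototype for the check
    hk = [cmath.exp(2j * math.pi * k * n / K) * h for n, h in enumerate(h0)]
    for omega in (0.0, 0.5, 2 * math.pi * k / K):
        # The imaginary part should be ~0: the cascade is zero-phase after the delay.
        print(omega, cascade_without_delay(hk, omega))
```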
Fig. 3. Synthesis filter bank
3.2 Adaptive gain equalizer:
Suppose we have an acoustic noise $w[n]$ and a speech signal $s[n]$. The noise-corrupted speech signal can be written as $x[n] = s[n] + w[n]$. Filtering the input signal with the analysis filter bank divides it into $K$ sub-band signals $x_k[n] = x[n] * h_k[n]$, where $k$ is the sub-band index and $*$ denotes convolution. The input signal can then be described as

$$x[n] = \sum_{k=0}^{K-1} x_k[n] = \sum_{k=0}^{K-1} \left( s_k[n] + w_k[n] \right) \qquad (4)$$

where $s_k[n]$ is the speech part and $w_k[n]$ is the noise part of sub-band $k$. The output $y[n]$ is formed by

$$y[n] = \sum_{k=0}^{K-1} G_k[n]\, x_k[n] \qquad (5)$$
where $G_k[n]$ is a gain function (the AGE weighting function) that applies a gain to each sub-band and amplifies the signal when speech is active. Fig. 4 shows a simple block diagram of the AGE.
Fig. 4. Block diagram of the adaptive gain equalizer
Two terms are used to calculate the gain function: a long-term (slow) average $A_{s,k}[n]$ and a short-term (fast) average $A_{f,k}[n]$. The short-term average for sub-band $k$ is calculated as

$$A_{f,k}[n] = (1 - \alpha_k)\, A_{f,k}[n-1] + \alpha_k\, |x_k[n]| \qquad (6)$$

where $\alpha_k$ is a small positive constant given by

$$\alpha_k = \frac{1}{T_{s,k}\, F_s} \qquad (7)$$

where $F_s$ is the sampling frequency in Hz and $T_{s,k}$ is a time constant in seconds.
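Equations (6) and (7) describe a first-order recursive (exponential) average. As a worked example, with $F_s = 8$ kHz and a time constant of 25 ms, equation (7) gives $\alpha_k = 1/(0.025 \cdot 8000) = 0.005$, so after a step change the average covers about 63 % of the step within $1/\alpha_k = 200$ samples. A minimal Python sketch, with hypothetical function names:

```python
def alpha_from_time_constant(time_constant_s, fs_hz):
    """Eq. (7): alpha_k = 1 / (T_{s,k} * F_s)."""
    return 1.0 / (time_constant_s * fs_hz)


def fast_average(samples, alpha, initial=0.0):
    """Eq. (6): A_f[n] = (1 - alpha) * A_f[n-1] + alpha * |x_k[n]|.

    abs() also handles complex sub-band samples from the FFT filter bank.
    """
    a_f = initial
    trace = []
    for x in samples:
        a_f = (1.0 - alpha) * a_f + alpha * abs(x)
        trace.append(a_f)
    return trace


if __name__ == "__main__":
    alpha = alpha_from_time_constant(0.025, 8000)  # 25 ms at 8 kHz
    step = fast_average([1.0] * 200, alpha)
    print(alpha, step[-1])  # about 0.005 and about 0.633
```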
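In the same way, the slow average (the noise floor level estimate) is computed as

$$A_{s,k}[n] = \begin{cases} (1 + \beta_k)\, A_{s,k}[n-1], & \text{if } A_{s,k}[n-1] \le A_{f,k}[n] \\ A_{f,k}[n], & \text{if } A_{s,k}[n-1] > A_{f,k}[n] \end{cases} \qquad (8)$$

where $\beta_k$ is a small positive constant. The AGE gain function is computed as

$$G_k[n] = \min\!\left(1,\ \left(\frac{A_{f,k}[n]}{A_{s,k}[n]\, L_k}\right)^{p_k}\right) \qquad (9)$$

where $L_k$ is a positive constant and $p_k$ determines the gain rise applied individually to each of the sub-band signals. With this form, the gain saturates at unity once the ratio $A_{f,k}[n]/A_{s,k}[n]$ reaches $L_k$, while during noise-only periods (ratio close to unity) the sub-band is weighted by approximately $L_k^{-p_k}$.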
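A per-sample sketch of one AGE sub-band, combining equations (6), (8) and (9), is shown below, together with the weighted sum of equation (5); in the filter-bank implementation the weighted sub-bands are passed to the synthesis filter bank rather than summed directly. The class name `SubbandAGE`, the initial averages and the small `floor` that keeps the denominator of (9) away from zero are assumptions for illustration, not details taken from the paper (the DSP implementation described later handles overflow by clipping instead).

```python
class SubbandAGE:
    """State of one AGE sub-band, updated once per sub-band sample.

    Implements eqs. (6), (8) and (9). The initial averages and the `floor`
    guard on the denominator of (9) are illustrative assumptions.
    """

    def __init__(self, alpha, beta, limit, power, initial=1e-2, floor=1e-9):
        self.alpha = alpha  # alpha_k in eq. (6)
        self.beta = beta    # beta_k in eq. (8)
        self.limit = limit  # L_k in eq. (9)
        self.power = power  # p_k in eq. (9)
        self.floor = floor  # hypothetical guard against division by ~0
        self.a_fast = initial
        self.a_slow = initial

    def step(self, x):
        """Return (G_k[n], G_k[n] * x_k[n]) for one sub-band sample x_k[n]."""
        # Eq. (6): short-term (fast) average of the magnitude.
        self.a_fast = (1.0 - self.alpha) * self.a_fast + self.alpha * abs(x)
        # Eq. (8): noise floor creeps up by (1 + beta) or drops straight to A_f.
        if self.a_slow <= self.a_fast:
            self.a_slow *= 1.0 + self.beta
        else:
            self.a_slow = self.a_fast
        # Eq. (9): normalised gain, saturating at 1.
        ratio = self.a_fast / (max(self.a_slow, self.floor) * self.limit)
        gain = min(1.0, ratio ** self.power)
        return gain, gain * x


def age_output(states, subband_samples):
    """Eq. (5) at one time instant: y[n] = sum_k G_k[n] * x_k[n]."""
    return sum(state.step(x)[1] for state, x in zip(states, subband_samples))


if __name__ == "__main__":
    age = SubbandAGE(alpha=0.01, beta=1e-4, limit=4.0, power=1.0)
    noise = [0.01 * (-1) ** n for n in range(5000)]  # constant-level "noise"
    burst = [(-1.0) ** n for n in range(1000)]       # louder "speech" burst
    g_noise = [age.step(x)[0] for x in noise][-1]
    g_burst = [age.step(x)[0] for x in burst][-1]
    print(g_noise, g_burst)  # about 0.25, then 1.0
```

With $\alpha_k = 0.01$, $\beta_k = 10^{-4}$, $L_k = 4$ and $p_k = 1$, a constant-level noise input settles to a gain near $1/L_k = 0.25$, and a louder burst drives the gain to its saturation value of 1.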
4. EVALUATION OF RESULTS
A general mathematical description of the method was given in Section 3, where a number of sub-band-dependent parameters were introduced. In this section, where a practical evaluation is performed, some of these parameters are set to the same value for all sub-bands.
For evaluation purposes, we recorded a signal with a sampling frequency of 8 kHz. The method was verified in MATLAB before being implemented in real time. We then saved the filter coefficients of the analysis and synthesis filter banks in a format suitable for use on the DSK C6713 and included these coefficients when implementing the method on the DSK C6713 using Code Composer Studio. A 2 Vp-p sine wave from a function generator was then applied to the input and the sine wave at the DSP output was observed.
Fig. 5. Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The sub-band signals $x_k[n]$ are scaled by sub-band scaling constants $g_k$ to yield the sub-band output signals $y_k[n] = g_k\, x_k[n]$.
To verify the implementation of the analysis and synthesis filter banks, we estimate the power spectral densities (PSDs) of the input noise sequence $x[n]$ and the output $y[n]$ and determine the transfer magnitude function. To do this, a noise sequence $x[n]$ is generated using the "randn" command in MATLAB and stored as a WAV file. The generated noise sequence is first passed through a DSP project that simply copies the ADC input to the DAC output, i.e., without any filter bank processing, as in Fig. 6. We then calculate the PSD of the DAC output and store it as $P_{xx}(f)$, as in Fig. 7. These signals are recorded at a sampling frequency of 44 kHz. Next, the filter bank processing is enabled, and the PSD of the DAC output is calculated and stored as $P_{yy}(f)$, as in Fig. 8 and Fig. 9. Since $P_{yy}(f) = |H(f)|^2 P_{xx}(f)$ for a linear system, the transfer magnitude function is obtained from the output and input PSDs as

$$|H(f)| = \sqrt{\frac{P_{yy}(f)}{P_{xx}(f)}}$$

The estimated transfer function is then compared to the ideal transfer magnitude function, which verifies the filter bank implementation.
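The PSD-ratio check can be sketched as follows. The figure titles ("Welch Power Spectral Density Estimate") suggest that MATLAB's Welch estimator was used; the Python below is only a minimal stand-in for illustration, using a Hann window, 50 % overlap and a direct DFT so that it needs only the standard library. The names `welch_psd` and `transfer_magnitude` are hypothetical. Absolute PSD scaling is omitted because it cancels in the ratio.

```python
import cmath
import math
import random


def welch_psd(x, seg_len):
    """Averaged Hann-windowed periodogram with 50 % overlap (unscaled).

    Returns seg_len // 2 + 1 one-sided bins. Uses an O(N^2) DFT for brevity.
    """
    if len(x) < seg_len:
        raise ValueError("signal shorter than one segment")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / seg_len) for n in range(seg_len)]
    hop = seg_len // 2
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    count = 0
    for start in range(0, len(x) - seg_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(seg_len)]
        for b in range(n_bins):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * b * n / seg_len) for n in range(seg_len))
            psd[b] += abs(acc) ** 2
        count += 1
    return [p / count for p in psd]


def transfer_magnitude(pxx, pyy):
    """|H(f)| = sqrt(P_yy(f) / P_xx(f)), bin by bin."""
    return [math.sqrt(py / px) for px, py in zip(pxx, pyy)]


if __name__ == "__main__":
    random.seed(1)
    x = [random.gauss(0, 1) for _ in range(512)]  # stand-in for the "randn" noise
    y = [2.0 * v for v in x]                      # a system with |H(f)| = 2 everywhere
    h = transfer_magnitude(welch_psd(x, 64), welch_psd(y, 64))
    print(min(h), max(h))
```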
Fig. 6. A generated noise sequence is passed through the DSP system without any processing of the signal, and the PSD of the output signal is estimated.
Fig. 7. PSD of the output signal when noise is passed through the DSP system without any processing of the signal
Fig. 8. PSD of the output signal after filter bank processing (without AGE)
Next, we input a speech signal $x[n]$ containing some music and noise to the DSP, observe the output signal after the filter bank processing (without AGE), and check whether the same speech signal is heard without any disturbance or noise interference. This verifies the DSP implementation. We then follow the same procedure with the AGE included and observe the speech signal. Based on the sub-band SNR estimates, the speech is amplified and the noise is attenuated, resulting in an enhanced speech signal; the constants $\alpha_k$ and $\beta_k$ play an important role in determining the output speech signal.
[Plots for Figs. 7 and 8: Welch Power Spectral Density Estimate; x-axis: Normalized Frequency (×π rad/sample), y-axis: Power/frequency (dB/rad/sample)]
Fig. 9. PSD of the output signal $P_{yy}(f)$ after using the AGE noise suppressor
Fig.10. Input noisy speech signal
Fig.11. Enhanced speech signal using AGE noise suppression method obtained from DSP
A large $\alpha_k$ (a short time constant $T_{s,k}$ in equation (7)) results in unnatural-sounding speech with remaining artifacts. A very small $\alpha_k$ (a long time constant) results in a short-term average that reacts too slowly to variations in the incoming signal amplitude; hence, speech onsets will be cropped and the amplification of speech containing a small amount of noise will be limited. The positive constant $\beta_k$ controls how fast the noise floor level estimate adapts to changes in the noise environment. A large value of $\beta_k$ results in a noise floor level estimate that closely follows the short-term average, while a very small value results in slow convergence and poor noise level tracking in non-stationary environments.
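How quickly equation (8) lets the noise floor estimate rise follows directly from its multiplicative update: while $A_{s,k} \le A_{f,k}$, each sample multiplies it by $(1 + \beta_k)$, so rising by a factor $R$ takes $n = \ln R / \ln(1 + \beta_k)$ samples, whereas a drop in the noise level is followed immediately. For example, with $\beta_k = 10^{-6}$ and $F_s = 8$ kHz, tracking a 6 dB (factor 2) increase in noise level takes roughly 693,000 samples, or about 87 seconds. A small sketch with hypothetical function names:

```python
import math


def noise_floor_rise_samples(beta, ratio):
    """Samples for eq. (8) to raise A_s by `ratio` while A_s <= A_f.

    (1 + beta)^n = ratio  =>  n = ln(ratio) / ln(1 + beta).
    log1p keeps ln(1 + beta) accurate for very small beta.
    """
    return math.log(ratio) / math.log1p(beta)


def rise_time_seconds(beta, ratio, fs_hz):
    """Same quantity expressed in seconds at sampling rate fs_hz."""
    return noise_floor_rise_samples(beta, ratio) / fs_hz


if __name__ == "__main__":
    for beta in (1e-4, 1e-5, 1e-6):
        print(beta, rise_time_seconds(beta, 2.0, 8000))
```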
As the gain function $G_k[n]$ is based on the ratio of the short-term average to the noise floor level estimate, care must be taken to avoid singularities (division by values close to zero) that cause numerical overflow on the DSP. Therefore, in this implementation, any sample outside the DSP range was clipped to the minimum possible sample value.
This method focuses on speech enhancement rather than noise suppression: we are not trying to improve the SNR by removing noise, but by amplifying the speech. The time constants that control the short-term average should be kept within the pseudo-stationary time of speech, i.e. about 20-30 ms. The parameter $\beta_k$, which controls the noise floor level estimate, can be varied around $10^{-6}$ depending on the desired effect. The upper bound of the gain function affects the resulting speech distortion and should be kept within 5-20 dB; a larger amplification of the signal may result in piercing-sounding output speech. Before implementing the AGE in real time, it was implemented in MATLAB, and the results obtained from the off-line MATLAB simulation and from the real-time DSP implementation are very similar.
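If the 5-20 dB bound is read as the spread between the saturated gain (unity) and the noise-only gain $L_k^{-p_k}$ of equation (9), which is our assumption rather than a statement from the paper, the corresponding $L_k$ follows from $20\, p_k \log_{10} L_k$. For $p_k = 1$, a 20 dB bound gives $L_k = 10$ and a 5 dB bound gives $L_k \approx 1.78$. A sketch with hypothetical function names:

```python
import math


def limit_for_max_boost_db(boost_db, power=1.0):
    """L_k such that 20 * p_k * log10(L_k) equals boost_db (amplitude dB)."""
    return 10.0 ** (boost_db / (20.0 * power))


def max_boost_db(limit, power=1.0):
    """Inverse mapping: 20 * p_k * log10(L_k)."""
    return 20.0 * power * math.log10(limit)


if __name__ == "__main__":
    for db in (5.0, 10.0, 20.0):
        print(db, limit_for_max_boost_db(db))
```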
5. CONCLUSION
A preprocessing noise suppression algorithm using the AGE was developed, implemented and tested with filter bank techniques on a DSP kit using the Code Composer Studio tool. The AGE algorithm is a straightforward, robust and flexible method for speech enhancement.
[Plots for Figs. 9-11: Welch Power Spectral Density Estimate (Normalized Frequency (×π rad/sample) vs. Power/frequency (dB/rad/sample)); two time-domain waveforms of the input and enhanced speech signals (horizontal axis scaled by 10^5)]
Anil Chokkarapu was born in Huzarabad, India in 1987. He completed his Dual Master's (M.Sc & M.Tech) programme in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Karlskrona, Sweden during 2010-2012. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU) Hyderabad, India in 2009. His research interests are Audio/Speech Processing, hearing aids and Biomedical Signal Processing.
Sarath Chandra Uppalapati was born in 1987 in Hyderabad, India. He received his Master of Science (M.Sc) in Electrical Engineering with emphasis on Signal Processing from Blekinge Institute of Technology, Sweden and his Master of Technology (M.Tech) in Signal Processing from JNTU, India. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU), India. He worked as a Trainee Engineer at Metox Resistor Industries from July 2008 until August 2009. His current research interests are Speech Processing, Image Processing and Neural Networks.
Abhiram Chinthakuntla was born in Warangal, India in 1988. He completed his Bachelor degree in Electronics and Communication Engineering at Jawaharlal Nehru Technological University (JNTU) Hyderabad. He completed a Master's programme in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Karlskrona, Sweden. His areas of interest are Digital Signal Processing, Image/Audio Processing, Adaptive Systems and Neural Networks.