a decision-directed adaptive gain equalizer for assistive hearing instruments

1886 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 63, NO. 8, AUGUST 2014

A Decision-Directed Adaptive Gain Equalizer forAssistive Hearing Instruments

Kit Yan Chan, Member, IEEE, Siow Yong Low, Sven Nordholm, Senior Member, IEEE, and Ka Fai Cedric Yiu

Abstract— Assistive hearing instruments have a significantimpact on speech enhancement when the signal-to-noise ratiois low. These instruments are usually developed using theconventional adaptive gain equalizer (AGE), which has lowcomputational complexity and low distortion in real-time speechenhancement. The conventional AGEs are intended to boostthe speech segments of speech signals but they are incapableof suppressing noise segments. The overall speech quality ofthe assistive hearing instruments may be reduced, as the noisesegments still cannot be filtered out. In this paper, a decision-directed AGE is proposed for assistive hearing instruments.It aims to overcome the limitation of the conventional AGE, whichis capable only of boosting speech segments in noisy speech butincapable of suppressing noise segments. The proposed approachsimultaneously boosts the speech segments and suppresses noisesegments in noisy speech. Experimental results with differenttypes of real-world noise indicate that the proposed methodachieves better speech quality than does the conventional AGE.The resulting method provides an improved functionality forassistive hearing instruments.

Index Terms— Adaptive gain equalizer (AGE), assistive hearinginstruments, heuristic algorithms, particle swarm optimization(PSO), perceptual evaluation of speech quality (PESQ), singlechannel filter, speech booster, speech enhancement.

I. INTRODUCTION

ASSISTIVE hearing instruments are in high practicaldemand in scenarios where there is loud environmental

noise such as those found in factories or workshops, whichaffects the speech intelligibility and effectiveness of telecom-munication media [1]. Noisy speech often results in lowerintelligibility and listener fatigue. Hence, it is necessary touse assistive hearing instruments to reduce additive noise fromnoisy speech while ensuring that speech components remainas undistorted as possible [2].

The adaptive gain equalizer (AGE) [3] can efficiently andeconomically be used in assistive hearing instruments, sinceit is a single channel approach and thus it offers low com-plexity, low processing delay and low distortion in real-time

Manuscript received August 7, 2013; revised November 28, 2013; acceptedDecember 1, 2013. Date of publication February 10, 2014; date of currentversion July 8, 2014. The work of K. F. C. Yiu was supported by the RGCunder Grant PolyU 5301/12E. The Associate Editor coordinating the reviewprocess was Dr. Dario Petri.

K. Y. Chan and S. Nordholm are with the Department of Electrical andComputer Engineering, Curtin University, Perth WA 6102, Australia (e-mail:[email protected]).

S. Y. Low is with the School of Electronics and Computer Science,University of Southampton, Malaysia campus, Johor, Malaysia.

K. F. C. Yiu is with the Department of Applied Mathematics, The HongKong Polytechnic University, Hong Kong.

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIM.2014.2302242

speech enhancement [4]–[6]. It overcomes the multichannelmicrophone approaches [7], which are more expensive as theyrequire more microphones.

The conventional AGE can be implemented in assistivehearing instruments by modifying the magnitude spectrumof a speech signal [8], [9]. It adjusts the weighting on themagnitude spectrum, to impose a high gain if speech ispresented. Alternatively, a unity gain is applied if no speech oronly ambient noise is presented. By doing so, the conventionalAGE acts like a speech booster. It amplifies the magnitudespectrum when speech is active, and remains idle when speechis inactive. However, one of the main limitations of the con-ventional AGE is that during the idling period, the backgroundnoise is not suppressed and this results in a reduction ofnoise suppression capability. Although the speech signal isboosted by the assistive hearing instrument implemented withthe conventional AGE, the background noise can still be heard.

In this paper, a novel AGE, namely a decision-direct AGE,is proposed by merging the mechanisms of the conventionalAGE [3] and the decision-directed approach [10]. Unlike theconventional AGE, the proposed technique not only boosts themagnitude spectrums containing the speech components, butit also suppresses ambient noise. Hence, background noise canbe suppressed when the direct-decision AGE is implementedin the assistive hearing instrument. Also, a tradeoff weight isintroduced in the direct-decision AGE to control the amount ofacceptable speech distortion and noise suppression. It allowsfor engineering judgement to obtain the desired tradeoff withrespect to a perceived quality of the speech signal, whiledeveloping the assistive hearing instrument.

To determine the optimal tradeoff weight, we could asknumerous people to listen to the enhanced speech signals,rank the audio quality with a number between one and five,and then average these for the final result. However, this is atime consuming and expensive procedure. Here, the tradeoffweight is optimized with respect to a commonly used objectivemeasure for speech quality namely the perceptual evaluationof speech quality (PESQ) [11], which has been standardizedby the International Telecommunication Union. The PESQscores yield a high correlation with hearing intelligibilityscores [12]. It is commonly used by speech algorithm to reflectan increase in speech intelligibility. Because the assistivehearing instrument is developed based on the optimal direct-decision AGE with respect to the PESQ, the audibility andannoyance of complex distortions can be optimized.

However, the relationship between the PESQ score andthe enhanced speech signal is not linear as an improve-ment in signal-to-noise ratio (SNR) does not translate to an

0018-9456 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

CHAN et al.: DECISION-DIRECTED AGE FOR ASSISTIVE HEARING INSTRUMENTS 1887

Fig. 1. Structure of the AGE with weighting Gm(k).

improvement in speech intelligibility [13]. The problem ondetermining an optimal tradeoff weight with respect to thePESQ is multioptimum. Conventional gradient-based opti-mization methods might not be able to determine this tradeoffweight properly, as there is no guarantee that the global opti-mum will be obtained. Therefore, to solve this optimizationproblem, a hybrid optimization algorithm (HOA) is proposedthat integrates the mechanisms of both local and global searchmethods. It first uses a global optimizer, namely the particleswarm optimization algorithm (PSO) [14], [15], to generatean initial solution with good PESQ scores, since the PSO iseffective in solving many difficult optimization problems [16]particularly relating to noise suppression [17]–[20]. Then, ituses a gradient-based optimization method [21] to locate theoptima with respect to the PESQ.

The effectiveness of the proposed direct-decision AGE isevaluated by performing speech enhancement under four noisyenvironments namely white, factory, babble, and volvo noises.Experimental results indicate that the proposed direct-decisionAGE obtains better PESQ compared with the conventionalAGE. Furthermore, analytical results show that the improvedperformance also contributes to better noise suppression,and negligible expense on both target signal distortion andmusical tones, which are produced by the proposed direct-decision AGE. Hence, higher hearing intelligibility scores canbe obtained when the direct-decision AGE is used in thedevelopment of assistive hearing instruments.

II. ASSISTIVE HEARING INSTRUMENTS WITH AGE

The conventional AGE can be used to boost or suppress thesignal with the specified frequency range by controlling thegain in a particular frequency band [8], [9]. It acts as a speechbooster when speech is present, and it remains idle whenspeech does not exist. It first decomposes the input signal intoM number of sub-bands, which is shown in Fig. 1. After that,each sub-band signal is individually adapted by a particulargain function, based on the estimated SNR in each sub-bandat every time instant. The weighting of each sub-band isadaptively increased by the magnitude of its correspondinggain function when the speech is dominant. Then, all thesub-band signals are added to reconstruct a full-band signal.Hence, this speech enhancement approached is named as AGEas aforementioned.

Since the conventional AGE is implemented in the timedomain, both the sub-band decomposition and reconstructionprocesses can be performed in a straightforward manner.It can be implemented either on digital or analog circuitswith small computational complexity in terms of few millioninstructions per second [4]. Also, no voice activity detection isrequired for the conventional AGE since speech enhancementis based on a continuous estimate of the SNR in each sub-band. It eliminates the step of voice activity detection whichis difficult to tune under low SNR. Therefore, it can beimplemented effectively and economically in real-time speechenhancement for assistive hearing instruments [5].

We consider s(k) to be the original speech where theindex k represents the sampled time index for time instant,t = kTs , in which Ts is the sampling period. Also, let v(k)be the environmental noise such that the noisy speech, x(k) isgiven by

x(k) = s(k)+ v(k). (1)

The mth sub-band representation of the noisy speech shownin Fig. 1 is given by

xm(k) = x(k) ∗ hm(k) = sm(k)+ vm(k) (2)

where ∗ denotes the convolution operator, hm(k) is the band-pass filter in the mth band, and sm(k) and vm(k) are themth sub-band representation of the original speech and noise,respectively.

The enhanced speech processed by the conventional AGEnamely GSB

m (k) is given as

y(k) =M−1∑

m=0

GSBm (k)xm(k) (3)

where GSBm (k) denotes the gain function of the mth sub-band

and M is the number of sub-bands. GSBm (k) acts as the speech

booster [3]. It boosts the signal power during the active speechperiod and remains idle during the speech silence period.

GSBm (k) is represented by the ratio of the estimate of the

noisy signal level, Am(k), to the estimate of the noise floor,Nm (k), and is illustrated as

GSBm (k) =

[Am(k)

Nm (k)

] 1p

(4)

where Am(k) and Nm (k) are given as (5) and (6), respectively,when the noise is slowly varying

Am(k) = (1− α)Am(k − 1)+ α|xm(k)|p (5)

Nm (k) ={

(1+ β)Nm (k − 1), Am(k) > Nm (k − 1)

Am(k), Am(k) ≤ Nm (k − 1).(6)

| · | denotes the absolute value; |xm(k)| is the magnitudespectrum of xm(k); and p is the gain Rise, which representsthe subtraction type. In the conventional spectral subtractiontechnique, GSB

m (k) is used for magnitude subtraction, whenp = 1. When p = 2, GSB

m (k) is used for power subtraction; αis a smoothing constant, which is used to control the sensitivityof GSB

m (k) with respect to the envelope of the sub-band signal;


and β is used to control how fast Nm (k) can be adapted tothe noise changes.

The current estimate of the noise floor in (6), Nm(k), isincrementally increased when the current estimate of the noisysignal level, Am(k), is greater than the previous estimate of thenoise floor, Nm (k−1). Conversely, when Am(k) is smaller thanNm (k−1), Nm (k) is updated based on Am(k). Therefore, thisupdate acts as an attack and decay change, and the estimateof the noise floor is imposed as Nm (k) > 0 to ensure thestability of GSB

m (k). Also, the following limiter is imposed toavoid excessive amplification of GSB

m (k):

GSBm (k) =

{GSB

m (k), GSBm (k) ≤ C

C, GSBm (k) > C

(7)

where C is the upper limit.Based on (4), if Am(k) ≈ Nm (k), the estimate of the noisy

signal level, Am(k), during the speech silent period, is approx-imately the same as the estimate of the noise floor. Hence,GSB

m (k) ≈ 1, and GSBm (k) approaches the function of a by-pass

filter. This is the basic limitation of the conventional AGE inthat the noise is fully by-passed and still exists, when there isno speech signal generated by the user. Therefore, backgroundnoise can still be heard, when the assistive hearing instrumentis implemented using the conventional AGE. To filter the noiseduring the speech silent period, an improved AGE namelydecision-directed AGE is proposed in the following section.

III. DECISION-DIRECTED AGE

A. Mechanism of Decision-Directed AGE

In this paper, a decision-directed AGE, namely G D Dm (k),

is proposed to operate in a twofold manner by incorporatingthe mechanisms of the noise suppression filter [22] and theconventional AGE [3]. When the speech is active, it amplifiesthe speech component and boosts the speech signal as similaroperation as the conventional AGE. When the speech issilent, it imposes a lower gain on the noise envelope andsuppresses noise contribution in an operation similar to thenoise suppression filter. Hence, the noise can still be processedduring the speech silent period, when the assistive hearinginstrument is implemented with the conventional AGE.

Prior to enhancing noisy speech, the decision-directed AGEfirst estimates the a priori SNR for the mth sub-band, Rprio

m (k),based on the following decision-directed approach [10]:

Rpriom (k) = (1− γ ) max{Rpost

m (k − 1), 0}+γ|GN S

m (k − 1)xm(k − 1)|pNm (k)p

(8)

where the a posteriori SNR, Rpostm (k − 1) is defined by

Rpostm (k − 1) = |xm(k − 1)|p

Nm(k − 1)p − 1 (9)

the parameter, γ , is a weighting constant and the noiseenvelope; Nm (k) is readily defined in (6); and Rpost

m (k − 1)in (9) is a local estimate of the SNR computed from thecurrent data frame and Rprio

m (k) in (8) represents the SNR ofthe estimated unknown spectrum from the previous frame.

By coupling the estimations of Rpriom (k) in (8) and

Rpostm (k − 1) in (9), the noise suppression filter, GN S

m (k), canbe produced, where Rprio

m (k) can be referred to the informationon the unknown envelope signals. The resulting Rprio

m (k) canbe used to minimize any annoying musical phenomenon thatis prevalent in conventional methods to an acceptable level[23]–[27]. Here, the noise suppression filter, GN S

m (k), isrepresented by a commonly used Wiener filter [22], which isgiven as

GN Sm (k) = Rprio

m (k)

Rpriom (k)+ 1

. (10)

Based on (10), during the speech silent period (i.e.,sm(k) ≈ 0), the a priori SNR becomes Rprio

m (k) ≈ 0, andthe gain function reduces consequently to GN S

m (k) ≈ 0. Thisin turn results in an artificial sounding output whereas duringthe speech silent period, there is a complete silence. Hence,this may result in an unnatural sounding background.

To avoid this unnatural sound, the decision-directed AGE,G D D

m (k), is proposed by injecting the comfort noise onto theoverall output. It is intended to produce two benefits namely:1) it avoids unnatural complete silence and 2) it masks thepresence of any potential noise artifacts. This comfort noise isinjected based on the tradeoff between GSB

m (k) and GN Sm (k),

which is given as

G D Dm (k) = λGN S

m (k)+ (1− λ)GSBm (k) (11)

where the tradeoff weight, λ, controls the amount of comfortnoise that is injected in the update. Based on (11), the amountof comfort noise is injected from the GSB

m (k) to G D Dm (k)

according to the value of λ, since GSBm (k) is a speech booster,

which passes the noise to G D Dm (k). When λ = 0, G D D

m (k) in(11) reverts to GSB

m (k). Hence, it provides little alteration ofthe noise during speech silent periods. Better speech qualitycan be produced, when the assistive hearing instrument isimplemented with the conventional AGE.

B. Optimization of Decision-Directed AGE

As an illustration, the mth enhanced speech, ym(k), obtainedby G D D

m (k) is given by

ym(k) = G D Dm (k)xm(k) = G D D

m (k)[sm(k)+ vm(k)]. (12)

By substituting (11) with (12), we obtain

ym(k) = [λGN Sm (k)+ (1− λ)GSB

m (k)][sm(k)+ vm(k)]= (1− λ)GSB

m (k)vm(k)+ (1− λ)GSBm (k)sm(k)

+ λGN Sm (k)vm(k)+ λGN S

m (k)sm(k) (13)

which can be simplified as

ym(k) = fAGE(xm(k), k, λ) (14)

where xm(k) is the noisy speech corrupted by the noise vm(k),and fAGE represents the mapping of the decision-directed AGEfrom xm(k) to ym(k) with the tradeoff parameter λ.

When the speech is not active (i.e., sm(k) ≈ 0), GN Sm (k) ≈ 0

according to (10) and the output of the decision-directed AGEis simply the noise floor, which is given by

ym(k) ≈ (1− λ)GSBm (k)vm(k). (15)


Based on (4), GSBm (k) ≈ 1 during the speech inactive

periods. Hence, the output is a scaled version of the actualbackground noise. After injecting the comfort noise as shownin (15), the overall output avoids the unnatural total silenceperiods during the speech inactive periods. Therefore, thetradeoff weight λ provides a variable adjustment with respectto noise suppression, speech comfort, and distortion levels.When λ approaches unity, a higher speech distortion levelis being generated for more noise suppression. When λapproaches zero, the enhanced speech has less distortionbut less noise is filtered from the noisy speech. There-fore, the speech quality in terms of speech reduction andnoise distortion for G D D

m (k) can be traded-off using anappropriate λ.

However, perceptional speech quality may not be improved,although we optimize either the speech reduction and noisedistortion given by the G D D

m (k), since the perceptional speechquality is not closely correlated to these two fundamen-tal audio criteria. For example, perceptually masked cod-ing noise, at a typical SNR of 13 dB, can be completelyinaudible, whereas random noise at the same value of SNRwould be extremely disturbing [28]. Hence, optimizationwith respect to either speech reduction or noise distor-tion might not generate the optimal speech quality underlow SNR.

To obtain a more reliable speech quality for the enhancedspeech, we use the popular objective quality measure namelyPESQ to determine the tradeoff parameter λ, as the PESQcan yield a modestly high correlation with intelligibilityscores [12]. Also, it can be used to correctly distinguishbetween the audible and inaudible distortions and this hasproven to be the best way of accurately predicting the audi-bility and annoyance level of complex distortions [29], [30].Hence, speech quality measure with respect to more audiocriteria can be indicated for speech enhancement mech-anisms [13], [31]. Better speech quality can be pro-duced, when the assistive hearing instrument is implementedusing the directed-decision AGE optimized with respectto PESQ.

The mean opinion score (namely rMOS) for the PESQcan be given by the formulation, which creates an intrusivetest between the enhanced speech, ym(k), and the originalspeech, sm(k)

rMOS = FPESQ[sm(k), ym(k)]. (16)

To determine rMOS given by FPESQ, the detailed computa-tion is given in [28]. First, sm(k) and ym(k) are aligned tothe same constant power level, which is corresponded to thelistening level used in subjective tests. The aligned signals,sm(k) and ym(k), undergo an auditory transformation to mimicthe key properties of human hearing. Hence, the speechcomponents, which are inaudible to listeners, can be filtered.Then, the two disturbance parameters, namely the absolute(symmetric) disturbance and the additive (asymmetric)disturbance are calculated using nonlinear averages over spe-cific areas of the error surface, where the absolute (sym-metric) disturbance is a measure of absolute audible errorand the additive (asymmetric) disturbance is a measure of

audible errors that are much louder than the original speechsignal.

By substituting (14) into (16), the optimal tradeoff weight,λopt, with respect to the PESQ can be determined by solvingthe following optimization problem:

J = minλ∈[0..1] FPESQ[sm(k), fAGE(xm(k), k, λ)]. (17)

Solving the optimization problem (17) could be difficultsince it consists of two nonlinear functions, FPESQ and fAGE,although the dimension of optimization problem (17) is nothigh. In the following section, a HOA that integrates themechanisms of both local and global searches is introducedto solve this optimization problem.

IV. HYBRID OPTIMIZATION ALGORITHM

The gradient-based method could be used to find the opti-mum, λopt, of the optimization problem (17) by systemati-cally moving the solution space [21]. However, there is noguarantee that the global optimum can be obtained due to thenonlinearity of (16). Therefore, a global optimizer, namelyPSO [14], [15], is used to seek the global optimum of (17),since the effectiveness of the PSO has been demonstratedin solving many difficult optimization problems [16], partic-ularly in the development of noise suppression approaches[17]–[20], [32]–[34]. Also, the PSO operation is mostly basedon two formulations on determining particle velocities andparticle locations. It is much simpler than the evolutionaryalgorithms, which involves a few evolutionary operations suchas crossover, mutation, reproduction, chromosome selection,and gradient determination (when the differential evolutionaryalgorithm is used). Hence, computational time can be saved,as the simpler PSO operation is used. The PSO is selected tobe used in this paper.

However, the PSO takes a longer convergence time tolocate the optimum compared with the gradient-based method.Therefore, the HOA comprising the mechanisms of thePSO algorithm and the gradient-based method is proposedto solve (16). In the HOA, the solution obtained by thePSO is used as the initial solution of the gradient-basedmethod. Based on the initial solution, the optimum can belocated more effectively by the gradient-based method thanby solely using the PSO. The pseudocode of the HOAgiven in Algorithm 1 is used to determine an optimum,λopt, with respect to the PESQ formulated in (16), whenthe noisy speech, xm(k), and the original speech, sm(k), aregiven.

In the HOA, λi (t) denotes the i th particle at the tth iteration,where i = 1, 2, …, Ns . Each particle λi (t) represents thetradeoff weight of the directed-decision AEG. At the firstiteration (i.e., t = 1), Ns particles are generated randomlywithin the range from 0 to 1, and then each particle is evaluatedbased on the objective function (16) with respect to xm(k) andsm(k). In (19), the current particle position, λi (t), at the tthiteration is governed by the current particle velocity, vi (t), andthe previous particle position, λi (t−1), at the (t−1)th iteration,where the velocity of the i th particle at the t th iteration is


Algorithm 1 Pseudocode (HAO)Input xm(k), sm(k)⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

t ← 0

Generate λi (t) randomly with i = 1, 2, . . . , Ns .

Evaluate λi (t) based on (16).

while (t is less than the predefined iteration)

do

⎧⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎩

Set t ← t + 1

Update all velocities v i (t) based on (18).

Generate all new particles λi (t) based on (19).

Evaluate λi (t) based on (16) with respect to

ym(k) and sm(k).

Set λ0 ← g// g is the best particle among all particlesUse simplex search method to locate the optimalsolution λopt using λ0 as the ini tial solution.

return λopt

given by

v i (t) ={

vmax, if vmax < v i (t)′

v i (t), if vmax > v i (t)′

with

v i (t)′ = k · {ω(t) · v i (t − 1)′ + ϕ1 · r1 · (pi − λi (t − 1))

ϕ2 · r2 · (g − λi (t − 1))} (18)

the i th particle position at the tth iteration is

λi (t) = λi (t − 1)+ v i (t). (19)

pi is the position of the best particle at the i th iteration, gis position of the best particle among all the iterations, andr1 and r2 are the random numbers in the range [0, 1]. ω (t) isthe inertia weight factor. ϕ1 and ϕ2 are acceleration constants.k is the constriction factor derived from the stability analysis of(18) for assuring the convergence [35]. By setting appropriateω (t), k, ϕ1 and ϕ2 based on [36], the particle can be convergedeffectively toward pi and g.

After performing several iterations, the particles start mov-ing within a small region on the search domain and thesearching progress slows down. To speed up the process oflocating the optimum, the swarm movement is terminatedwhen it cannot locate a solution with good PESQ. Then,the searching process is conducted using the gradient-basedmethod, which is effective in locating the local optimum. Thefollowing best particle is used as the initial solution, λ0, forthe gradient-based method:

λ0 = maxi∈[1..Ns ]

(λi (T )). (20)

After running the gradient-based method for some iterations,the optimal solution, λopt, can be obtained. The gradient-basedmethod terminates when the objective function has a certaindegree of decline such that

|FPESQ{sm (k) , fAGE(xm (k) , λopt)}−FPESQ{sm (k) , fAGE(xm (k) , λ0)}| ≤ εg . (21)

V. PERFORMANCE EVALUATIONS

A. Experimental Settings

The performance of the proposed decision-directed AGEwas evaluated with five speech sequences voiced by tenspeakers including seven males and three females. The fivespeech sequences consisted of the five Christmas carol titles:Jingle Bells, Santa Claus is Coming to Town, Sleigh Ride,Let It Snow, and Winter Wonderland. Hence, 50 recordedspeech sequences were included. These recorded speechsequences were assumed to be noise free, and were conta-minated artificially with four noisy environments from fournoise data files, namely white, factory, babble, and volvonoises (or car noise), which were all collected from theNOISEX-92 database. They were contaminated by the fournoise sources with different settings of SNRs ranging from−16 to 16 dB.

For the decision-directed AGE, the following parameterswere used: M = 16 [given in (3)]; α = 0.004 [given in (5)],β = 10−4 [given in (6)], γ = 0.9 [given in (8)]; and a Kaiserwindow with a bandwidth of 1/16 was used for the designof a bank of finite impulse response filter. These parametershave been empirically determined to give the best performancepossible for a wide range of input SNRs. For a fair comparisonpurpose, all the experiments have been conducted using theseparameters.

For the HOA, the following parameters were used: Ns = 50;ϕ1 = ϕ2 = 2.05. The effectiveness of those parameters havebeen evaluated by solving a set of parametrical problemswith different landscapes including multioptimum, nonconvex,discontinuous, and undifferentiate functions [36]. Also, theseparameters have been used on PSO for solving many real-world problems [36], [37] including optimal power flow,nonlinear electronic packaging, and neural network design.Also, satisfactory results have been obtained on solving thoseproblems.

Based on those parameters, the tradeoff weight in thedecision-directed AGE can be optimized with respect tothe PESQ using the HOA. As the operational range forthe decision-directed AGE is predefined to be between −16and 16 dB, the optimal tradeoff weights were determined withrespect to the SNRs of −16, −12, −8, −4, 0, 4, 8, 12, and16 dB, respectively.

The following two speech performance indices, namelysegmental SNR measure (SNRseg) [28] and normalized signaldistortion, were used to quantify the noise suppression, and thedistortion to the speech source, respectively.

The SNRseg measure, S, is given as

S = 10

MM−1∑

m=0

10 log10‖ym‖2

‖ym − ym‖2 (22)

where ym represents a clean speech frame (in time domain),ym is the enhanced speech frame, and M is the number offrames of the signal. To discard the nonspeech frames, eachframe had threshold of a −10 dB lower bound and 35 dBupper bound. Here, the frame length of both ym and ym ischosen to be 20 ms.


Fig. 2. Results for the PESQ. (a) White noise. (b) Factory noise. (c) Babble noise. (d) Car noise.

The normalized signal distortion, D is given in decibels(dB) as

D = 10 log10

⎡

⎣ 1

MM−1∑

m=0

|C Pout,s(ωm)− Pin,s(ωm)|⎤

⎦ (23)

where ωm = 2πm/M, is the discretized and normalized fre-quency, and M is the number of FFT points. The normalizingconstant, C, is given as

C =∑M−1

m=0 Pin,s(ωm)∑M−1

m=0 Pout,s(ωm)(24)

where Pin,s(ωm) is the spectral power estimate of the inputsignal and Pout,s(ωm) is the spectral power estimate of theoutput signal.

B. Experimental Results

The speech quality of the directed-decision AEG wasindicated by the PESQ measure. Fig. 2(a) shows the PESQobtained by the original noisy speech signal corruptedwith the white noise with the SNRs of −16, −12, −8,−4, 0, 4, 8, 12, and 16, respectively. It also shows thePESQ obtained by the three enhanced signals, which wereprocessed by the three approaches, conventional AGE [3],Wiener filter [22], and the proposed decision-directed AGE.It indicates that the PESQ obtained by the three enhancedsignals is generally better than the PESQ of the origi-nal noisy speech signal. In general, the proposed decision-directed AGE can obtain better PESQ than those obtainedby both the Wiener filter and the conventional AGE. Also,it can be observed that the proposed decision-directed AGE

can obtain an improvement of 0.2 when the SNR is16 dB. When the SNR is −16 dB, an improvement of 0.5 canbe obtained by the proposed decision-directed AGE. For thefactory noise, Fig. 2(b) shows that more than 0.8 improvementin term of PESQ can be obtained by the proposed decision-directed AGE compared with the original noisy speech signal,when the SNR is 16 dB. When the SNR is −16 dB, animprovement of 0.7 can be obtained.

When SNR = −16 dB, the proposed decision-directed AGEobtains an improvement of 0.15 compared with the conven-tional AGE and Wiener filter. When the SNR is increasing,the proposed AGE can generally obtain better PESQ comparedwith the other two enhancement methods. At SNR = 0 dB, theproposed AGE can obtain an improvement of 0.25 comparedthe conventional AGE. Although the difference between theproposed AGE and the Wiener filter reduces when SNRincreases, better PESQ can still be obtained by the proposedAGE compared with the conventional AGE. At SNR = 16,the proposed AGE can obtain an improvement of 0.2 com-pared with the conventional AGE. In general, the proposeddecision-directed AGE can obtain better PESQ than the othertwo enhancement approaches.

For both the babble and car noises, similar results can befound in Fig. 2(c) and (d), respectively, which show that theproposed decision-directed AGE can produce improvementcompared with both conventional AGE and Wiener filter. Theproposed decision-directed AGE can generally obtain the bestamong the three enhancement approaches. Therefore, theseresults clearly indicate that better PESQ can be obtained by theproposed decision-directed AGE. This can be explained by thefact that the tradeoff weight of the proposed decision-directedAGE is optimized with respect to the PESQ in particular, while


Fig. 3. Measurements for the segSNR. (a) White noise. (b) Factory noise. (c) Babble noise. (d) Car noise.

TABLE I

OPTIMIZATION RESULT FOR λ

the conventional AGE is intended to boost just the speechsignals and the Wiener filter is intended to suppress only thenoise levels.

Table I shows the optimal tradeoff weights with respect tothe tested noise and the considered SNRs. It indicates thatthere is no linear relationship between the tradeoff weight andthe SNR. The PESQ of the directed-decision AEG cannot beimproved by linearly adjusting the tradeoff weights, when theSNR reduces. Hence, the optimal tradeoff weights are neces-sary to be determined for different noisy conditions. Based onthese optimal tradeoff weights, an expert system [38], [39] canbe developed to perform a map between noisy conditions totradeoff weight. When the expert system is incorporated withthe proposed decision-directed AGE, the tradeoff weight canbe adjusted automatically and better speech enhancement canbe achieved under various SNR conditions.

C. Analytical Results

The speech signals processed by the decision-directed AGEwere measured by the two speech quality criteria namely noise

Fig. 4. Normalized gain function for the conventional AGE, i.e., λ = 0 andthe proposed decision-directed AGE for λ = 0.6, λ = 0.9, and λ = 0.99.

suppression level (segSNR) and normalized signal distortion.Fig. 3(a)–(d) show the noise suppression levels (segSNR) forthe three approaches for the four types of noise with differentSNRs. These results clearly shows that the proposed decision-directed AGE generally achieves a higher noise suppressioncompared with the conventional AGE across different SNRs.These figures indicate that the suppression capability of theconventional AGE is generally at the lower level of theproposed AGE.

The noise suppression improvement can be explained byFig. 4. It shows the evolution of the normalized gain functionof the proposed decision-directed gain function at the fifth


Fig. 5. Measurements for the normalized signal distortion. (a) White noise. (b) Factory noise. (c) Babble noise. (d) Car noise.

Fig. 6. Output power plots for (a) corrupted observation. (b) Output from the conventional AGE. (c) Output from the proposed directed-decision AGE forthe four types of noise with SNR = 0 dB.

sub-band (arbitrarily chosen) for the signal corrupted with carnoise (SNR = 0 dB) and varying values of λ. It shows clearlythat the normalized gain function has a lower gain duringspeech silent periods. Hence, the gain function not only booststhe speech components but also imposes a lower gain on thenoise components. Thus, more noise is being attenuated andconsequently better noise suppression is obtained.

Fig. 4 also shows that the level of noise attenuation iscontrolled by the tradeoff parameter λ. For the case of λ =0.99, the normalized gain function is close to zero during thedetected nonspeech periods, whereas for the case of λ = 0.9and λ = 0.6, the noise floor is raised. When λ = 0, the gainfunction corresponds to the conventional AGE. The injectionof noise to raise the noise floor can be mildly viewed as

imposing a spectral floor [40]. However, this is contrary tothe spectral floor method where the noise introduced is theestimated noise spectrum. Since the noise injected is theactual background noise with a reduced volume [see (15)],fewer unnatural sounding artifacts can intrude. The tradeoffparameter, λ, provides a user dependent tradeoff relationshipfor the amount of comfort noise injected against noise suppres-sion. Therefore, better speech quality can be obtained by theproposed decision-directed AGE when the tradeoff parameter,λ, is optimized with respect to the PESQ.

Fig. 5(a)–(d) show the normalized signal distortion for thethree speech enhancement approaches. The results verify thatthe distortion levels for the proposed decision-directed AGE isin close proximity with the conventional AGE. These results


suggest that the proposed decision-directed AGE achieves abetter suppression with negligible expense of target signalintegrity, and also verify that the parameter λ provides avariable tradeoff between distortion and suppression. This isbecause while more noise suppression can be achieved as λis increased, the distortion level on the other hand increases.Therefore, the proposed decision-directed AGE can obtain thetradeoff between signal distortion and noise reduction but theconventional AGE can only boost the speech signal. It explainswhy the proposed decision-directed AGE can obtain betterPESQ than the conventional AGE.

Fig. 6 shows the output power plots for: (a) the originalnoisy speech signals, (b) the enhanced signals of the AGE,and (c) the enhanced signals of the proposed decision-directedAGE for all the noise types considered. The plots show thata lower noise floor is obtained with the proposed decision-directed AGE. They show that the noise floor is just beinglowered and not altered. By doing so, the proposed decision-directed AGE avoids the deleterious musical noise, which pro-duces an unnatural sounding output. Hence, better PESQ canbe obtained by the proposed decision-directed AGE comparedwith the conventional AGE.

VI. CONCLUSION

This paper presented an effective decision-directed AGEintended to be implemented in assistive hearing instruments.It overcomes the limitation of the conventional AGE, which isable to boost speech segments only in noisy speech but is notable suppress noise segments. Unlike the conventional AGE,the proposed decision-directed AGE can simultaneously boostspeech segments and suppress noise segments. Also, it injectscomfort noise to avoid the deleterious musical phenomenonand unnatural total silence. A tradeoff weight, which canbalance noise suppression and the signal distortion levelsthrough the injection of comfort noise, is incorporated inthe proposed decision-directed AGE to further enhance thespeech quality. Hence, the functionality of assistive hearinginstruments can be improved.

The effectiveness of the proposed direct-decision AGEwas evaluated based on a commonly used speech qualitymeasure, namely PESQ, under various SNR conditions withfour noisy environments: white, factory, babble, and car noises.Experimental results showed that the proposed direct-decisionAGE obtained better PESQ compared to the conventionalAGE. Also, analytical results suggested that the improvedperformance was a result of better noise suppression, andnegligible effect of both target signal distortion and musicaltones, which were generated by the proposed direct-decisionAGE. Therefore, better speech quality can be produced by theassistive hearing instrument, which is implemented with theproposed direct-decision AGE.

As environmental noise can affect the accuracy of theinstruments and measurement of speech recognition functions,the proposed method can be further extended by reformulatingthe decision-directed AGE to optimize speech recognitionaccuracy. The resulting method is expected to improve theinstruments and measurement of speech recognition functions[41], [42].

REFERENCES

[1] M. S. M. G. Vlaming, B. Kollmeier, W. A. Dreschler, R. Martin,J. Wouters, B. Grover, et al., “Hearcom hearing in the communicationsociety,” Acta Acustica United Acustica, vol. 97, no. 2, pp. 172–192,2011.

[2] A. H. K. Parsi and M. Bouchard, “Instantaneous binaural target PSDestimiation for hearing aid noise reduction in complex acoustic envi-ronments,” IEEE Trans. Instrum. Meas., vol. 60, no. 4, pp. 1141–1154,Apr. 2011.

[3] N. Westerlund, M. Dahl, and I. Claesson, “Speech enhancement for per-sonal communication using an adaptive gain equalizer,” Signal Process.,vol. 85, no. 6, pp. 1089–1101, Jun. 2005.

[4] B. Sällberg, N. Grbic, and I. Claesson, “Implementation aspects ofthe adaptive gain equalizerph,” Dept. Signal Process, Blekinge Inst.Technol., Boca Raton, FL, USA, Tech. Rep. 2006:04, 2006.

[5] N. Westerlund, M. Dahl, and I. Claesson, “Real-time implementation ofan adaptive gain equalizer for speech enhancement purposes,” WSEASTrans. Comput., vol. 3, no. 1, pp. 25–32, 2004.

[6] N. Westerlund, M. Dahl, and I. Claesson, “Speech enhancement using anadaptive gain equalizer with frequency dependent parameter settings,”in Proc. IEEE 60th Veh. Technol. Conf., Sep. 2004, pp. 3718–3722.

[7] J. P. Dmochowski and R. A. Goubran, “Decoupled beamformingand noise cancellation,” IEEE Trans. Instrum. Meas., vol. 56, no. 1,pp. 80–88, Feb. 2007.

[8] U. Sauvagerd, “A ten-channel equalizer for digital audio-applications,”IEEE Trans. Circuits Syst., vol. 36, no. 2, pp. 276–280, Feb. 1989.

[9] S. M. Kuo, W. S. Gan, and F. L. Shau, “Design and synthesis of theaudio equalizers,” in Proc. 7th Int. Conf. Signal Process., vol. 1. 2004,pp. 579–582.

[10] Y. Ephraim and D. Malah, “Speech enhancement using a minimummean-square error short-time spectral amplitude estimator,” IEEE Trans.Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109–1121,Dec. 1984.

[11] I.-T. Recommendation, Perceptual Evaluation of Speech Quality(PESQ): An Objective Method for End-to-End Speech Quality Assess-ment of Narrow-Band Telephone Networks and Speech Codecs, Inter-national Telecommunication Union, Perceptual evaluation of speechquality (PESQ): An objective method for end-to-end speech qualityassessment of narrow-band telephone networks and speech codecs,Geneva, Switzerland, 2001.

[12] J. Ma, Y. Hu, and P. Loizou, “Objective measures for predictingspeech intelligibility in noisy conditions based on new band-importancefunctions,” J. Acoust. Soc. Amer., vol. 125, no. 5, pp. 3387–3405, 2009.

[13] P. Paglierani and D. Petri, “Uncertainty evaluation of objective speechquality measurement in voip systems,” IEEE Trans. Instrum. Meas.,vol. 58, no. 1, pp. 46–51, Jan. 2009.

[14] Y. Shi, “Empirical study of particle swarm optimization,” in Proc. Congr.Evol. Comput., Washington, DC, USA, 1999, pp. 1945–1950.

[15] M. Clerc and J. Kennedy, “The particle swarm—Explosion, stability,and convergence in a multidimensional complex space,” IEEE Trans.Evol. Comput., vol. 6, no. 1, pp. 58–73, Feb. 2002.

[16] K. Parsopoulos and M. Vrahatis, “On the computation of all globalminimizers through particle swarm optimization,” IEEE Trans. Evol.Comput., vol. 8, no. 3, pp. 211–224, Jun. 2004.

[17] N. V. George and G. Panda, “A particle-swarm-optimization-baseddecentralized nonlinear active noise control system,” IEEE Trans.Instrum. Meas., vol. 61, no. 12, pp. 3378–3386, Dec. 2012.

[18] S. T. Pan, “Evolutionary computation on programmable robust IIR filterpole-placement design,” IEEE Trans. Instrum. Meas., vol. 60, no. 4,pp. 1469–1479, Apr. 2011.

[19] J. B. V. Reddy, P. K. Dash, R. Samantaray, and A. K. Moharana,“Fast tracking of power quality disturbance signals using an opti-mized unscented filter,” IEEE Trans. Instrum. Meas., vol. 58, no. 12,pp. 3943–3952, Dec. 2009.

[20] N. K. Rout, D. P. Das, and G. Panda, “Particle swarm optimization basedactive noise control algorithm without secondary path identification,”IEEE Trans. Instrum. Meas., vol. 61, no. 2, pp. 554–563, Feb. 2012.

[21] R. Fletcher, Practical Methods of Optimization. Boca Raton, FL, USA:Wiley, 2000.

[22] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood,NJ, USA: Prentice-Hall, 1985.

[23] O. Cappé, “Elimination of the musical noise phenomenon with theEphraim and Malah noise supressor,” IEEE Trans. Signal AudioProcess., vol. 2, no. 2, pp. 345–349, Apr. 1994.

[24] S. Kanehara, H. Saruwatari, R. Miyazaki, K. Shikano, and K. Kondo,“Theoretical analysis of musical noise generation in noise reductionmethods with decision-directed a priori SNR estimator,” in Proc.IEEE Int. Workshop Acoust. Signal Enhancement, Aachen, Germany,Sep. 2012, pp. 1–4.


[25] R. Miyazaki, H. Saruwatari, T. Inoue, K. Shikano, and K. Kondo,“Musical-noise-free speech enhancement: Theory and evaluation,” inProc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2012,pp. 4565–4568.

[26] C. Plapous, C. Marro, and P. Scalart, “Improved signal-to-noise ratioestimation for speech enhancement,” IEEE Trans. Audio, Speech, Lang.Process., vol. 14, no. 6, pp. 2098–2108, Nov. 2006.

[27] P. Scalart and J. V. Filho, “Speech enhancement based on a priori signalto noise estimation,” in Proc. IEEE Int. Conf. Acoust., Speech SignalProcess., vol. 2. May 1996, pp. 629–632.

[28] P. Loizou, Speech Enhancement Theory and Practice. Boca Raton, FL,USA: CRC Press, 2007.

[29] Y. Hu and P. Loizou, “Evaluation of objective quality measures forspeech enhancement,” IEEE Trans. Audio, Speech, Lang. Process.,vol. 16, no. 1, pp. 229–238, Jan. 2008.

[30] L. Ding, A. Radwan, M. S. Hennawey, and R. A. Goubran, “Measure-ment of the effects of temporal clipping on speech quality,” IEEE Trans.Instrum. Meas., vol. 55, no. 4, pp. 1197–1203, Aug. 2006.

[31] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual evalu-ation of speech quality (PESQ)—A new method for speech qualityassessment of telephone networks and codecs,” in Proc. IEEE Int. Conf.Acoust., Speech, Signal Process., vol. 2. Athens, Greece, May 2001,pp. 749–752.

[32] K. Y. Chan, S. Nordholm, K. Yiu, and R. Togneri, “Speech enhancementstrategy for speech recognition microcontroller under noisy environ-ments,” Neurocomputing, vol. 118, no. 22, pp. 279–288, 2013.

[33] K. Y. Chan, S. Nordholm, and K. Yiu, “Multichannel filters for speechrecognition using a particle swarm optimization,” in Proc. 12th Int. Conf.Control Autom. Robot. Vis., Dec. 2012, pp. 937–942.

[34] K. Y. Chan, K. Yiu, T. Dillon, S. Nordholm, and S. Ling, “Enhancementof speech recognitions for control automation using an intelligentparticle swarm optimization,” IEEE Trans. Ind. Inf., vol. 8, no. 4,pp. 869–879, Nov. 2012.

[35] R. C. Eberhart and Y. Shi, “Comparing inertia weights and constric-tionfactors in particle swarm optimization,” in Proc. IEEE Congr. Evol.Comput., vol. 1. La Jolla, CA, USA, Jul. 2000, pp. 84–88.

[36] S. H. Ling, H. H. C. Iu, K. Y. Chan, H. K. Lam, C. W. Yeung, andF. H. F. Leung, “Hybrid particle swarm optimization with waveletmutation and its industrial applications,” IEEE Trans. Syst., Man Cybern.B, Cybern., vol. 38, no. 3, pp. 743–763, Jun. 2008.

[37] N. Mo, Z. Zou, K. Chan, and T. Pong, “Transient stability constrainedoptimal power flow using particle swarm optimisation,” IET Proc.Generat., Transmiss. Distrib., vol. 1, no. 3, pp. 476–483, May 2007.

[38] I. Tashev and M. Slaney, “Data driven suppression rule for speechenhancement,” in Proc. Inf. Theory Appl. Workshop, 2013, pp. 1–6.

[39] S. Srinivasan, J. Samuelsson, and B. Kleijn, “Codebook driven short-term predictor parameter estimation for speech enhancement,” IEEETrans. Audio, Speech Lang. Process., vol. 14, no. 1, pp. 163–176,Jan. 2006.

[40] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speechcorrupted by acoustic noise,” in Proc. IEEE Int. Conf. Acoust., Speech,Signal Process., Cambridge, MA, USA, 1979, pp. 208–211.

[41] S. Pan and X. Li, “An FPGA-based embedded robust speech recognitionsystem designed by combining empirical mode decomposition anda genetic algorithm,” IEEE Trans. Instrum. Meas., vol. 61, no. 9,pp. 2560–2572, Sep. 2012.

[42] Y. Zhan, H. Leung, K. Kwak, and H. Yoon, “Automated speakerrecognition for home service robots using genetic algorithm andDempster–Shafer fusion technique,” IEEE Trans. Instrum. Meas.,vol. 58, no. 9, pp. 3058–3068, Sep. 2009.

Kit Yan Chan (M’11) received the Ph.D. degreein computing from London South Bank University,London, U.K., in 2006.

He has held a variety of post-doctoral positionsfrom The Hong Kong Polytechnic University, HongKong, and Curtin University, Perth, Australia. He iscurrently a Senior Lecturer with the Department ofElectrical and Computer Engineering, Curtin Uni-versity. He has published 44 journal papers, 36 con-ference papers, two book chapters, and two researchmonographs. His current research interests include

computational intelligence and its applications to speech enhancement, newproduct design, manufacturing process design, and traffic flow forecasting.

Siow Yong Low received the B.E. (Hons.) and Ph.D.degrees from Curtin University, Perth, Australia, in2001 and 2005, respectively.

He has held a variety of post-doctoral and researchengineer positions, before moving to Curtin Uni-versity, as a Senior Lecturer in 2010. Since 2013,he has been an Academic Member of the Schoolof Electronics and Computer Science, University ofSouthampton, Malaysia Campus, Johor, Malaysia.His current research interests include acoustics sig-nal processing.

Sven Nordholm (M’91–SM’04) received theM.Sc.E.E. (Civilingenjör), Licentiate of engineering,and Ph.D. degrees in signal processing from LundUniversity, Lund, Sweden, in 1992, 1989, and 1983,respectively.

He was one of the founding members of theDepartment of Signal Processing, Blekinge Insti-tute of Technology, Karlskrona, Sweden, in 1990.He held positions as a Lecturer, Senior Lecturer,Associate Professor, and Professor. Since 1999, hehas been with the Curtin University of Technology,

Perth, Australia. From 1999 to 2002, he was the Director of ATRI and Profes-sor with the Curtin University of Technology. From 2002 to 2009, he was theDirector of the Signal Processing Laboratory, Western Australian Telecommu-nication Research Institute, a joint institute between The University of WesternAustralia and Curtin University. He is a Professor of signal processing andthe Head of the Department with the Department of Electrical and ComputerEngineering, Curtin University, Perth. He is a Chief Scientist and co-founderof a start-up company Sensear providing voice communication in extremenoise conditions. His current research interests include speech enhancement,adaptive and optimum microphone arrays, acoustic echo cancellation, adaptivesignal processing, sub-band adaptive filtering, and filter design.

Dr. Nordholm is an Associate Editor for the EURASIP Journal on Advancesin Signal Processing. He is a member of the IEEE TC AASP.

Ka Fai Cedric Yiu received the M.Sc. degreefrom the University of Dundee, Dundee, U.K., andUniversity of London, London, U.K., and the D.Phil.degree from the University of Oxford, Oxford, U.K.

He was involved in industry on different projectsin University of Oxford and University College ofLondon, London. He was with the University ofHong Kong, Hong Kong. He is currently with HongKong Polytechnic University, Hong Kong. He haspublished over 60 journal publications and givenover 30 conference presentations. He was an invited

speaker for several international conferences. He holds a U.S. patent in signalprocessing. His current research interests include optimization and optimalcontrol, signal processing, sensor array processing, FPGA, and algorithmdesigns.

Dr. Yiu serves on the program committee of the conference ASAP and hasserved on the organizing committee of a number of conferences, includingFPT02, LSCM06, and FERM08. He has organized a number of special ses-sions in conferences, including ICOTA7, CISP08, ISCE2009, and EURO2009.

a decision-directed adaptive gain equalizer for assistive hearing instruments

Documents