On application of adaptive decorrelation filtering to assistive listening
Yunxin Zhao a)
Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, Missouri 65211

Kuan-Chieh Yen
Beckman Institute and Department of ECE, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801

Sig Soli, Shawn Gao, and Andy Vermiglio
Human Communication Sciences and Devices Department, House Ear Institute, Los Angeles, California 90057
(Received 24 May 2000; revised 20 July 2001; accepted 12 November 2001)

This paper describes an application of the multichannel signal processing technique of adaptive decorrelation filtering to the design of an assistive listening system. A simulated "dinner table" scenario was studied. The speech signal of a desired talker was corrupted by three simultaneous speech jammers and by a speech-shaped diffusive noise. The technique of adaptive decorrelation filtering processing was used to extract the desired speech from the interference speech and noise. The effectiveness of the assistive listening system was evaluated by observing improvements in A-weighted signal-to-noise ratio (SNR) and in sentence intelligibility, where the latter was evaluated in a listening test with eight normal hearing subjects and three subjects with hearing impairments. Significant improvements in SNR and sentence intelligibility were achieved with the use of the assistive listening system. For subjects with normal hearing, the speech reception threshold was improved by 3 to 5 dBA, and for subjects with hearing impairments, the threshold was improved by 4 to 8 dBA. © 2002 Acoustical Society of America. [DOI: 10.1121/1.1433815]

PACS numbers: 43.72.Ew, 43.72.Kb, 43.72.Dv [DOS]
I. INTRODUCTION
Conventional hearing aids have many limitations. In particular, hearing aid users experience difficulties in noisy acoustic environments with multiple sound sources and reverberation (Smedley and Schow, 1992). In conventional hearing aids, sound amplification is performed to compensate for the reduction of dynamic range and frequency response in hearing-impaired ears without discrimination between desired speech and interference sounds. Since hearing impairments are commonly accompanied by a reduced binaural directional hearing that enables selective reception of desired signal in a sound field, hearing aid users are more affected by the amplified noises. In the past, speech enhancement techniques were evaluated for attenuating noise and assisting listening. However, enhancement processing improves only quality but not intelligibility of speech (Deller et al., 1993).
Since sound sources are in general spatially separated, multimicrophone based speech processing offers the promise of separating desired speech from interference sounds and thereby improving intelligibility of desired speech. Research efforts in this area have been focused on fixed or adaptive microphone-array beamforming to enhance speech in the desired direction and suppress jammer signals in undesired directions. Hearing aids utilizing fixed beamforming (Kates, 1993; Soede et al., 1993a, 1993b; Stadler et al., 1993) can
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 111 (2), February 2002 0001-4966/2002/111(2)/1077/9/$19.00 © 2002 Acoustical Society of America
realize useful directional gains with relatively simple and robust processing, and hearing aids employing adaptive processing can achieve very good interference cancellation under certain favorable conditions, in particular low reverberation (Peterson, 1989; Greenberg and Zurek, 1992; Hoffman et al., 1994). Recent research efforts further investigate the incorporation of sound localization into hearing aids, which will not only assist speech comprehension but also facilitate a subjective sense of auditory space (Desloge et al., 1997; Welker et al., 1997).
In the current work, a new approach is taken in the design of an assistive listening system. This effort is motivated by recent developments of co-channel speech separation techniques in the field of speech and signal processing. Co-channel speech separation extracts source speech signals from their convolutive mixtures by reducing cross interferences among the speech signals, and therefore, the techniques offer the potential of improving speech comprehension in acoustic environments with multiple sound sources. Similarly, automatic speech recognition systems suffer from performance degradation in the presence of competing speech (Cole et al., 1995), and co-channel speech separation is therefore also important to real-world applications of spoken language technology.
Early research efforts on co-channel speech separation focused on separating competing speech from their additive mixture (Comon et al., 1991; Jutten and Herault, 1991; Sorouchyari, 1991; Tong et al., 1993). While these algorithms
are fast, simple, and capable of separating a large number of speech sources, they cannot deal with the convolutive effect of acoustic paths on mixtures of speech sources, referred to as convolutive mixtures. The convolutive effect depends on the durations of acoustic path impulse responses relative to the stationary periods of speech sound units, where a short response results in spectral distortion and a long response results in reverberation noise. Recently, research focus has been directed to the separation of convolutive mixtures. The methods can be categorized by using second-order statistics (Weinstein et al., 1993; Van Gerven and Van Compernolle, 1995) or higher-order statistics (Yellin and Weinstein, 1994; Shamsunder and Giannakis, 1997). The methods of the first category are easier to implement but may not guarantee uniqueness of solution (Weinstein et al., 1993; Yellin and Weinstein, 1994, 1996; Shamsunder and Giannakis, 1997). The methods in the second category can provide unique solutions but are more complicated to implement. In addition, empirical estimates of second-order statistics are usually more reliable than higher-order statistics, making the first-type algorithms preferable in certain applications.
In a previous work (Yen and Zhao, 1996, 1997, 1998, 1999a), Yen and Zhao developed a co-channel speech separation system based on the adaptive decorrelation filtering (ADF) algorithm proposed by Weinstein, Feder, and Oppenheim (Weinstein et al., 1993). The system was effective in separating two speech source signals from their convolutive mixtures. Using the system as a processing front end, the accuracy of automatic speech recognition (Zhao, 1993, 1996) on co-channel speech was significantly improved. An informal subjective listening test on the processed speech also showed increased intelligibility. The ADF algorithm was subsequently generalized to the separation of co-channel speech signals from more than two sources (Yen and Zhao, 1999b) and was shown in a simulation to be effective in separating three speech sources. In addition, the generalized algorithm allows extraction of one or more source signals from convolutive mixtures of the full set of source signals, and it reduces computational complexity in such cases (Yen and Zhao, 1999b).
In the current work, the generalized ADF, simply referred to as ADF, is evaluated as a technique for assistive listening. A "dinner table" scenario is simulated in the study: a listener would like to hear a particular talker, but the intelligibility of the desired speech is reduced by jammer speech from other talkers at the table and by background diffusive noise. In each interference condition, the desired speech signals before and after ADF processing were evaluated. The objective measure was A-weighted signal-to-noise ratio (SNR), and the subjective measure was sentence intelligibility obtained from a formal listening test on eight normal-hearing subjects and three hearing-impaired subjects. Significant improvements in SNR and sentence intelligibility were observed, indicating the potential of ADF in the design of assistive listening systems.
This paper is organized into four sections. In Sec. II, a brief overview is made for ADF-based co-channel speech separation. Details of experimental conditions are described in Sec. III, and the test results are presented in Sec. IV.
The implications of the findings of the current work to the design of assistive listening systems are discussed in Sec. V.
II. OVERVIEW OF ADAPTIVE DECORRELATION FILTERING
A. Mathematical model of co-channel environment
In a co-channel environment, several speech sources may be active simultaneously, and therefore each microphone acquires a mixture of these speech signals. Assume that there are M speech sources, and the source speech signals are zero-mean and uncorrelated to each other. Also assume that M microphones are used to acquire the speech signals, with the microphone i targeting the speech source i, i = 1, 2, ..., M. Denote the speech signal generated by the source j as x_j(t) and the signal acquired by the microphone i as y_i(t), and denote the transfer function of the acoustic path from the speech source j to the microphone i by H_ij(f). The co-channel speech environment can then be modeled in the frequency domain as

Y(f) = H(f) X(f),  (1)

where the signal vectors are defined as Y(f) = [Y_i(f)]^T (1 ≤ i ≤ M) and X(f) = [X_i(f)]^T (1 ≤ i ≤ M), with T denoting vector transpose, and the transfer function matrix is defined as H(f) = [H_ij(f)] (1 ≤ i ≤ M, 1 ≤ j ≤ M). Generally, the acoustic paths are unknown and time varying. As indicated by Eq. (1), the acquired signals y_i(t) are convolutive mixtures of the source signals x_i(t).
Each acquired signal y_i(t) can be decomposed into a sum of two components as

y_i(t) = y_{i,T}(t) + y_{i,I}(t),  (2)

where y_{i,T}(t) = H_ii{x_i(t)} represents the target speech component, and y_{i,I}(t) represents the interfering component and is defined as y_{i,I}(t) = Σ_{j=1, j≠i}^{M} H_ij{x_j(t)}. The objective of co-channel speech separation is to attenuate the interfering component y_{i,I}(t) in each acquired signal y_i(t), and hence extract the target component y_{i,T}(t) from the convolutive mixture.
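The mixing model of Eqs. (1) and (2) can be illustrated with a minimal numerical sketch. The FIR acoustic paths below are made up for illustration (the paper measures real booth responses), and white noise stands in for the source speech:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, T = 2, 8, 1000   # sources/mics, path length, samples (toy sizes)

# Made-up FIR acoustic paths standing in for H_ij; the direct paths
# (i == j) are kept stronger than the cross-coupled paths (i != j).
h = rng.normal(size=(M, M, L)) * 0.05
for i in range(M):
    h[i, i, 0] = 1.0          # dominant direct arrival

x = rng.normal(size=(M, T))   # zero-mean, mutually uncorrelated "sources"

def component(i, j):
    """H_ij{x_j}: source j as heard at microphone i."""
    return np.convolve(h[i, j], x[j])[:T]

# Eq. (1) in the time domain: each microphone is a convolutive mixture.
y = np.array([sum(component(i, j) for j in range(M)) for i in range(M)])

# Eq. (2): target component plus interfering component.
y_T = np.array([component(i, i) for i in range(M)])
y_I = np.array([sum(component(i, j) for j in range(M) if j != i)
                for i in range(M)])
assert np.allclose(y, y_T + y_I)
```

The decomposition is exact by construction; separation then amounts to suppressing y_I without access to the individual components.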
B. Adaptive decorrelation filtering
Given that the speech sources are zero-mean and mutually uncorrelated, output signals of a perfect co-channel speech separation system should also be mutually uncorrelated. Define f_ij^(t), i, j = 1, ..., M, i ≠ j, to be length-N FIR filters that are estimated at time t for separation of the source speech signals (Yen and Zhao, 1999b). Then, the ADF algorithm processes the inputs y_j(t), j = 1, ..., M, of Eq. (1) and generates output signals v_i(t), i = 1, ..., M, according to the equation

v_i(t) = y_i(t) − Σ_{j=1, j≠i}^{M} y_j^T(t) f_ij^(t),  (3)

where the y_j(t)'s are length-N vectors defined as y_j(t) = [y_j(t − τ)]^T (0 ≤ τ ≤ N − 1). Taking decorrelation as the separation criterion, i.e., E{v_i(t) v_j(t − τ)} = 0, i ≠ j, ∀τ, the FIR filters can be adaptively estimated as
f_ij^(t+1) = f_ij^(t) + μ(t) v_j(t) v_i(t),  i, j = 1, ..., M, i ≠ j,  (4)

where μ(t) is an adaptation gain. For system stability and efficiency, μ(t) is chosen as

μ(t) = 2γ / [(M − 1) N Σ_{j=1}^{M} σ_{y_j}²(t)],  (5)

where 0 < γ < 1 is an empirical constant, and σ_{y_j}²(t) is the variance of y_j(t) estimated from the latest input samples. When the filters converge, the output signal v_i(t) becomes the extracted source signal x_i(t), subject to a certain linear transformation, i = 1, ..., M.
Based on Robbins–Monro's stochastic approximation method, a theoretical analysis has been made on the applicability condition of ADF. The analysis shows that in order to effectively reduce cross interference among speech sources in a given acoustic environment, the multimicrophone configuration needs to satisfy the condition |H_ij(f) H_ji(f)| < |H_ii(f) H_jj(f)|, i ≠ j, ∀f (Yen and Zhao, 1999b), i.e., the cross-coupled acoustic paths need to attenuate source signals more than the direct acoustic paths do. In the above assumed co-channel model, when each microphone is placed closer to its target source than to the interference sources, this condition is satisfied in general.
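The on-line recursion of Eqs. (3)-(5) can be sketched for a toy two-channel case. Everything below is an illustrative stand-in: the cross-coupling is a single weaker, one-sample-delayed tap (so the applicability condition above holds), the sources are white noise, and the sizes (N = 4 taps) are chosen only so the loop runs quickly; it is not the paper's 600-tap implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, T = 2, 4, 20000     # channels, filter length, samples (toy sizes)
gamma = 0.02              # empirical constant, 0 < gamma < 1

# Made-up convolutive mixtures: each mic hears its own source plus a
# weaker, one-sample-delayed copy of the other source.
x = rng.normal(size=(M, T))
y = x.copy()
y[0, 1:] += 0.3 * x[1, :-1]
y[1, 1:] += 0.3 * x[0, :-1]

f = np.zeros((M, M, N))   # separation filters f_ij (i != j), zero-initialized
v = np.zeros((M, T))      # output signals
var = np.ones(M)          # running variance estimates of y_j

def latest(sig, t):
    """Length-N vector [sig(t), sig(t-1), ..., sig(t-N+1)], zero-padded."""
    seg = sig[max(0, t - N + 1):t + 1][::-1]
    return np.pad(seg, (0, N - len(seg)))

for t in range(T):
    yb = [latest(y[j], t) for j in range(M)]
    for i in range(M):    # Eq. (3): subtract filtered cross-channel inputs
        v[i, t] = y[i, t] - sum(yb[j] @ f[i, j] for j in range(M) if j != i)
    var = 0.999 * var + 0.001 * y[:, t] ** 2
    mu = 2 * gamma / ((M - 1) * N * var.sum())        # Eq. (5)
    vb = [latest(v[j], t) for j in range(M)]
    for i in range(M):    # Eq. (4): decorrelation-driven filter update
        for j in range(M):
            if i != j:
                f[i, j] += mu * vb[j] * v[i, t]

# After adaptation, the residual interference in channel 0 should sit well
# below its unprocessed level (up to the linear transformation noted above).
tail = slice(3 * T // 4, None)
err_before = np.mean((y[0, tail] - x[0, tail]) ** 2)
err_after = np.mean((v[0, tail] - x[0, tail]) ** 2)
assert err_after < 0.5 * err_before
```

In this toy case the filters converge toward the cross-coupling tap, leaving only a small second-order self-distortion in each output, consistent with the "subject to a certain linear transformation" caveat.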
C. Applications
The proposed application of ADF to assistive listening in the "dinner table" scenario is but one of many possibilities. Some others include teleconference, robust speech recognition, stage sound processing, etc. For teleconference, a co-channel speech separation system can be used to separate multiple talkers' speech at one site of the meeting and these separated speech signals can be sent to remote sites for selective listening. For robust speech recognition inside a vehicle, for example, microphones can be distributively placed in accordance with the locations of speech and interference sources, such as driver, radio, engine, etc., and the extracted driver's speech would allow more accurate automatic recognition. For stage sound pick up, the recorded signals consisting of actors' voices and special sound effects can be separated, edited, and remixed to generate enhanced sound environments.
III. EXPERIMENTAL CONDITIONS
In simulating the "dinner table" scenario, the cross-coupled channel filters were measured in a sound booth, and the co-channel speech signals were computed according to Eq. (1) by using standard speech and noise materials (the simulation conditions were verified to be consistent with the real sound field recording). The estimation of separation filters and the filtering of acquired signals were implemented according to Eqs. (4) and (3), respectively.
A. Measurement of acoustic paths
The configuration of the sound booth for measuring the cross-coupled acoustic paths is shown in Fig. 1. As shown in the figure, five "people" sat at a round table with an equal spacing. The microphones were installed 9 in. below the respective loudspeakers that represented the mouth positions of the dinner partners. The acoustic paths between each pair of speaker-microphone locations were measured, yielding a total of 25 FIR filters. Based on the measured impulse responses, the direct-to-reverberant energy ratios at the listener and talker locations were computed over 20 TIMIT sentences. The ratios at the listener and talker locations were measured as −0.57 dB and 3.01 dB, respectively.
B. Generation of speech, jammers and diffusive noise
Among the five locations shown in Fig. 1, location 1 was chosen for the listener and the cross-table location 4 was chosen for the talker. The locations 2, 3, and 5 were used for simultaneous jammers. For the designated talker, the speech materials of HINT sentences (Nilsson et al., 1994) were played through a loudspeaker, and for the designated jammers, the TIMIT speech sentences (Lamel et al., 1986) were played. The HINT speech consists of short sentences spoken by a male talker, designed for measuring hearing thresholds of either normal or impaired hearing. The TIMIT speech was randomly taken from a database of sentences collected from over 600 male and female talkers. In forming each jammer, the silence periods before and after each TIMIT sentence were stripped off, and the extracted sentences were then concatenated. The two simulated interference conditions are described below.
Condition 1: Speech jammers

In this condition, the interference consisted of three simultaneous jammers. The signal-to-noise ratio (SNR) was measured at the listener's location as the ratio of the A-weighted energy of the desired speech to the sum of the A-weighted energies of the three jammers, where the short-time spectra of the desired speech and jammers were A-weighted by a bandpass filter (Pierce, 1994) to emphasize perceptually important frequency bands. It is noted that A-weighting is a standard method of characterizing noise environment, and as a contrast, articulation-index (AI) based weighting (French and Steinberg, 1947) is commonly used
FIG. 1. Sound booth where the acoustic paths were measured for simulation of "dinner table" scenario.
for characterizing overall signal power as a result of array processing. AI-weighted SNR would be very similar to A-weighted SNR in the current study since the spectra of signal and noise were similar. The A-weighted SNR (simply referred to as SNR here) was designated the unit of dBA. Four levels of SNRs, −12, −15, −18, and −21 dBA, were produced at the listener's location, with the energies of desired speech and interference adjusted by the following procedure. A speech-shaped noise was used for sound level calibration, where the spectrum of the speech-shaped noise matched the long-term spectrum of the HINT test speech materials. At the talker's location, calibration speech-shaped noise was first played and its level was adjusted to yield a fixed sound level of 65 dBA at the listener's location. The average root-mean-squared value of each HINT speech sentence was adjusted to be the same as the calibration noise, i.e., when the sentences were played at the talker's location, the average received sound level was also 65 dBA at the listener's location. The sound levels of the three jammers were constrained to be identical at the listener's location. Again, the sound level of the listener's location was first calibrated by the speech-shaped noise, and the levels of jammer speech were then adjusted in a sentence-by-sentence fashion to match the root-mean-squared value of the noise. At the listener-location SNRs (LL-SNR) of −12, −15, −18, and −21 dBA, the sound level of each of the three jammers at the listener location was 72.2, 75.2, 78.2, and 81.2 dBA, respectively.
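The per-jammer levels quoted above follow from the calibration arithmetic: with the talker fixed at 65 dBA and K equal-level interferers, each interferer must sit at 65 − SNR − 10 log10 K dBA at the listener's location. A small sketch (the function name is ours):

```python
import math

def jammer_level_dBA(target_snr_dBA, n_interferers, talker_dBA=65.0):
    """Level each of n equal interferers must have at the listener's
    location so that their summed energy yields the target SNR."""
    return talker_dBA - target_snr_dBA - 10 * math.log10(n_interferers)

# Condition 1: three equal jammers.
levels_1 = [round(jammer_level_dBA(s, 3), 1) for s in (-12, -15, -18, -21)]
# Condition 2: three jammers plus diffusive noise, four equal sources.
levels_2 = [round(jammer_level_dBA(s, 4), 1) for s in (-12, -15, -18, -21)]
print(levels_1)   # [72.2, 75.2, 78.2, 81.2], the values quoted above
print(levels_2)   # [71.0, 74.0, 77.0, 80.0], the Condition 2 values
```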
Condition 2: Speech jammers and diffusive noise
In this condition, the interference consisted of three simultaneous jammers as well as a speech-shaped diffusive noise. In generating the diffusive noise, the speech-shaped stationary noise as described above was played through four loudspeakers that were placed close to the ceiling and at the four corners of the sound booth. The SNR was measured as the ratio of the A-weighted energy of the desired speech to the sum of A-weighted energies of the jammers and the diffusive noise. The simulation procedure was similar to that described in Condition 1. At the listener's location, the talker's sound level was again fixed as 65 dBA, and the dBA levels of the three jammers and the diffusive noise were identical. At the LL-SNRs of −12, −15, −18, and −21 dBA, the sound level of each interference source at the listener location was 71.0, 74.0, 77.0, and 80.0 dBA, respectively.
C. Validation on the simulation conditions
In order to ensure the simulation conditions to be consistent with sound field recordings, a pilot test was carried out at the House Ear Institute before the formal listening test. In the pilot test, four subjects with normal hearing were provided with two sets of HINT speech sentences: one generated by the above simulation conditions and another recorded in the sound field, where the sound field was constructed according to the above described procedure. The correct word percentages measured from the two sets of speech over the four subjects were compared. The comparison indicated insignificant differences between the two sets of results and hence validated the simulation. Based on the
consideration that simulation allows a more precise control of experimental parameters and is convenient to implement, the formal listening tests were conducted under the simulation conditions.
D. ADF implementation
The ADF algorithm was implemented to run in an on-line mode, i.e., the output signal samples at each time t were estimated by using the filter estimates obtained from the input signals up to t, and therefore processing was accomplished in one pass. As a contrast, in a multiple-pass processing, filter estimation would be made iteratively over a block of data and the filter estimates obtained in the last iteration would be used to perform source separation for the same data block. Although an iterative implementation in general leads to higher SNR gains, one-pass processing was chosen based on the consideration of computation load and delay, as well as the potential need for tracking time variation of acoustic paths within a block.
In a practical co-channel environment, jammer signals may not be always on. The issue of performing ADF estimation in the absence of jammers was investigated previously in a two-source separation problem (Yen and Zhao, 1999a). The finding is that if the cross-coupled acoustic paths are strong (|H_ij(f) H_ji(f)| ≈ |H_ii(f) H_jj(f)|, i ≠ j) and if the jammers are inactive for an extended period of time, the adaptive filter estimation may go wrong. As a consequence, significant distortion or cancellation effect may be observed in the estimated target speech signals. A coherence-function based active-source detection algorithm was developed for this case (Yen and Zhao, 1999a), where a coherence function between each pair of system output signals was used to detect the active regions of each speech source, and such regions formed the basis for on-off switching of filter estimation. Although the method may be extended to the M-source case along a similar line of idea, the algorithm has not been fully developed and is left for a future work. On the other hand, the interference conditions as considered in the "dinner table" scenario were moderate. When jammers were absent, the distortion or cancellation effect on target speech as introduced by the estimation algorithm was insignificant. Therefore, in processing the speech signals, the estimation algorithm was applied from beginning to end instead of being switched on or off from time to time.
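The coherence idea can be illustrated with a minimal sketch. This is our own illustration, not the algorithm of Yen and Zhao (1999a): frame size, windowing, and any threshold would be design choices; the point is only that outputs driven by distinct active sources show low pairwise coherence, while outputs sharing one dominant source show high coherence:

```python
import numpy as np

def coherence(a, b, win=512):
    """Mean magnitude-squared coherence between two signals, with the
    cross and auto spectra averaged over non-overlapping frames."""
    n = min(len(a), len(b)) // win * win
    w = np.hanning(win)
    A = np.fft.rfft(a[:n].reshape(-1, win) * w, axis=1)
    B = np.fft.rfft(b[:n].reshape(-1, win) * w, axis=1)
    num = np.abs((A * B.conj()).mean(axis=0)) ** 2
    den = (np.abs(A) ** 2).mean(axis=0) * (np.abs(B) ** 2).mean(axis=0)
    return float((num / np.maximum(den, 1e-12)).mean())

rng = np.random.default_rng(2)
s = rng.normal(size=32768)
t = rng.normal(size=32768)
# Two distinct active sources -> low coherence; one shared dominant
# source across both outputs -> coherence near 1.
assert coherence(s, t) < 0.3 and coherence(s, s) > 0.99
```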
IV. EXPERIMENTAL RESULTS
The assistive listening system was evaluated through the gains of A-weighted signal-to-noise ratio and the percentage of correctly recognized words by human subjects. It was assumed that a listener could choose a talker by selecting the assistive system's output that targets the talker. Since the SNR was highest at the selected talker's location, the SNR and word correct percentage were measured on speech acquired and processed at the talker's location, i.e., on y_4(t) for conditions before ADF processing and on v_4(t) for conditions after ADF processing. As such, the measured improvements represented the net effect of ADF processing. On the other hand, the SNRs at the listener location (LL-SNR)
represented the difficulty level as experienced by the listener without the assistive listening system, and therefore the LL-SNRs were used for identifying the testing conditions.
A. Signal-to-noise ratio
For each specified LL-SNR and under each interference condition, the SNRs at the talker's location (TL-SNR) were calculated before and after ADF processing. Each SNR value was obtained from one distinct list of 10 HINT sentences, with each list consisting of approximately 50 short words. The results for the three-jammer interference condition are shown in Fig. 2, and the results for the jammer and diffusive noise interference condition are shown in Fig. 3. In Fig. 3, the results are also summarized by signal-to-jammer ratio (SJR) before and after ADF, where for each LL-SNR condition, the four bars ordered from left to right represent SNR before processing, SNR after processing, SJR before processing, and SJR after processing, respectively. Figure 2 shows that in the jammer-alone interference conditions, ADF
FIG. 2. The SNR at the talker's location before and after ADF processing in the jammers only condition.
FIG. 3. The SNR and SJR at the talker's location before and after ADF processing in the jammers and diffusive noise condition.
improved the TL-SNR by 8.39 to 9.18 dBA for the initial LL-SNR conditions of −12 through −21 dBA. Figure 3 shows that in the presence of both jammers and diffusive noise, ADF improved the TL-SNR by 3.58 to 4.11 dBA for the initial LL-SNRs of −12 through −21 dBA, and it improved the SJR in the range of 7.03 to 8.61 dBA, which was close to the SNR gain in the jammer-alone case. These results indicate that the ADF technique was effective in attenuating jammers but ineffective in attenuating diffusive noise, which is not surprising since diffusive noise is spatially uncorrelated and decorrelation processing would not produce any impact.
B. Intelligibility
The subject listening test was conducted at the House Ear Institute. A total of 11 subjects were recruited in the study, eight with normal hearing and three with hearing impairments. Normal hearing is defined by having audiometric thresholds < 30 dB HL across the wide band of speech spectrum, with threshold measurements made at the frequencies (in Hz) of 250, 500, 1000, 2000, 3000, 4000, 6000, and 8000. Every hearing-impaired subject has, to varying degrees, a severe high-frequency hearing loss. Of the three hearing-impaired subjects, one wore hearing aids, and the other two should be wearing hearing aids but have never tried them. In the subject listening test, none of the subjects wore hearing aids.
The speech data before and after ADF processing were recorded on CD. There were a total of 14 testing cases, resulting from the combination of two interference conditions, four LL-SNRs, and before and after ADF processing, where in the jammer and diffusive noise condition, the two cases of LL-SNR = −21 dBA (before and after ADF processing) were excluded due to extremely low scores of word correct percentages. In order to avoid effects of learning and memorization, the HINT lists were made distinct among the 14 testing conditions. Subjects were asked to listen to a CD under headphones and transcribe a HINT list for each testing condition. Prior to the transcription, subjects were instructed to listen to track number 1 and select a comfortable listening level with the CD volume control. They were then told to leave the volume control at the setting for the remainder of the experiment. It is noted that the test materials were not adjusted for frequency-dependent hearing loss. In each testing condition, the percentages of word correct were averaged separately for the normal-hearing group and the hearing-impaired group.
For the normal-hearing subjects, the listening test results are summarized in Figs. 4 and 5. In jammer-alone interference and at the LL-SNRs of −12, −15, −18, and −21 dBA, the absolute gains of word correct percentage were 7.3%, 20.2%, 46.6%, and 52.3%, respectively. In both jammers and diffusive noise and at the LL-SNRs of −12, −15, and −18 dBA, the absolute gains of word correct percentage were 11.0%, 43.3%, and 29.6%, respectively. For the hearing-impaired subjects, the listening test results are summarized in Figs. 6 and 7. In jammer-alone interference and at the LL-SNRs of −12, −15, −18, and −21 dBA, the absolute gains were 32.7%, 59.0%, 55.2%, and 37.8%, respectively. In both
jammers and diffusive noise, and at the LL-SNRs of −12, −15, and −18 dBA, the absolute gains of word correct were approximately 38.0%, 29.1%, and 5.4%, respectively. Based on the results of Figs. 4-7, it is estimated that in jammer-alone interference, ADF processing improved the speech reception threshold by 5 dBA for the normal-hearing subjects and by 8 dBA for the hearing-impaired subjects, and in the jammer and diffusive noise interference, ADF improved the speech reception threshold by 3 dBA for the normal-hearing subjects and by 4 dBA for the hearing-impaired subjects.
The means and standard deviations of word correct percentage in each testing condition are provided in Table I. For the normal-hearing group, ADF not only improved the mean values but it also reduced the standard deviations. For the hearing-impaired group, the standard deviations remained large due to the small number of hearing-impaired subjects.
C. Analysis of intelligibility improvement

A statistical significance test was made on the listening test results. The difference between the percentage word correct values measured after and before ADF processing is assumed to be a Gaussian random variable c with an unknown variance. The null hypothesis H0 postulated an insignificant effect of ADF processing, i.e., E[c] = 0, and the alternative hypothesis H1 asserted a positive difference, i.e., E[c] > 0. Denote the sample mean and sample variance of the listening evaluation data as c̄ and s², and denote the number of subjects by n. The test statistic was q = c̄/(s/√n), which had a t distribution with n − 1 degrees of freedom (Papoulis, 1991).

FIG. 4. The word correct percentage before and after ADF processing in the jammers only condition for the normal-hearing group.

FIG. 5. The word correct percentage before and after ADF processing in the jammers and diffusive noise condition for the normal-hearing group.
Without diffusive noise and across the four LL-SNRs, H0 can be rejected with the type-I errors of α < 0.005 (t_α = 3.335) and α < 0.01 (t_α = 4.541) for the two groups of normal hearing and hearing impaired, respectively. In diffusive noise and for the first group, α < 0.005 is held at the LL-SNRs of −15 and −18 dBA, and the level of type-I error was increased to α < 0.025 (t_α < 2.306) at −12 dBA, due to a larger variation of scores before ADF processing. In diffusive noise and for the second group, α < 0.01 is held at the LL-SNRs of −12 and −15 dBA, and the level of significance
FIG. 6. The word correct percentage before and after ADF processing in the jammers only condition for the hearing-impaired group.
FIG. 7. The word correct percentage before and after ADF processing in the jammers and diffusive noise condition for the hearing-impaired group.
TABLE I. Means and standard deviations of word correct percentage before and after ADF processing for the two interfering conditions.

                       Normal-hearing group                    Hearing-impaired group
             Before processing   After processing    Before processing   After processing
LL-SNR       mean      s.d.      mean      s.d.      mean      s.d.      mean      s.d.

Condition 1: Jammers only
−12 dBA      92.45%    4.62%     99.76%    0.67%     48.43%    15.25%    81.13%    14.74%
−15 dBA      75.49%    8.95%     95.68%    3.50%     13.07%    4.08%     72.12%    18.21%
−18 dBA      39.90%    11.43%    86.54%    8.66%     5.77%     1.92%     58.97%    13.64%
−21 dBA      19.44%    8.85%     71.70%    8.32%     1.23%     1.07%     38.99%    3.93%

Condition 2: Jammers and diffusive noise
−12 dBA      81.14%    13.22%    92.16%    5.03%     34.50%    7.09%     72.55%    8.99%
−15 dBA      42.73%    10.51%    86.03%    6.66%     20.61%    10.01%    49.67%    13.06%
−18 dBA      7.35%     8.24%     36.99%    17.06%    0.00%     0.00%     5.44%     6.23%
was increased to α < 0.1 (t_α = 1.638) at the LL-SNR of −18 dBA, due to a larger variation of scores after ADF processing. The increased α level in the hearing-impaired group was mainly attributed to the small number of subjects (n = 3), as it was difficult recruiting hearing-impaired subjects at the time the listening test was carried out. In summary, the hypothesis test supports the notion that the ADF processing produced statistically significant improvement to speech intelligibility as perceived by both normal hearing and hearing-impaired subjects under the studied conditions.
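The test statistic is simple to compute directly. The numbers below are hypothetical stand-ins, since the paper reports group means and standard deviations per condition rather than the per-subject difference statistics:

```python
import math

def t_statistic(mean_diff, sd_diff, n):
    """q = c-bar / (s / sqrt(n)): one-sample t statistic on the
    per-subject word-correct differences (after minus before ADF)."""
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical example: n = 8 normal-hearing subjects, mean difference
# of 20 percentage points with standard deviation 8.
q = t_statistic(20.0, 8.0, 8)
# H0 is rejected at level alpha when q exceeds the one-sided critical
# value t_alpha with n - 1 = 7 degrees of freedom from a t table.
```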
D. System realization
The above experimental results show that the ADF algorithm can effectively reduce cross-interference among the speech sources. In constructing an assistive listening system for the "dinner table" scenario, microphones can be arranged on the table so that each microphone targets a dinner partner. Each listener can tune to a desired talker by selecting the corresponding system output. The ADF algorithm can be implemented on a digital signal processing board. The ADF algorithm as described in the current work is based on direct-form FIR filters, which requires floating-point DSP hardware to provide the needed numerical accuracy. The simulations described above, which worked with a 16 kHz sampling rate and employed 600-tap FIR filters, required about 270 MMPS (10^6 multiplications per second) to separate all four speech signals, or 154 MMPS to extract only one desired speech signal. As a result, it would require several high-end floating-point DSPs such as the TMS320C44 (with 60 MFLOPS capability) working in parallel to implement such a system. Alternatively, a similar system working with a 10 kHz sampling rate and extracting only one desired speech signal could be implemented using a single TMS320C44. Recently, a lattice-ladder structured ADF algorithm has been formulated and developed (Yen and Zhao, 2000). In this formulation, numerical stability can be attained while using fixed-point DSP hardware, which would allow faster computation and lower power consumption.
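The multiply-rate figures can be sanity-checked with simple arithmetic. The sketch below assumes a cost of about two multiplications per filter tap per sample (one for the convolution and roughly one for the coefficient update); this per-tap factor is an assumption, so the result lands near, not exactly at, the reported 270 MMPS, the gap being adaptation overhead not counted here.

```python
def mmps(fs_hz, taps, n_filters, mults_per_tap_sample=2.0):
    """Rough multiply rate, in units of 10**6 multiplications per
    second, for direct-form adaptive FIR filtering."""
    return fs_hz * taps * n_filters * mults_per_tap_sample / 1e6

# separating all four sources requires 4 * 3 = 12 cross-coupled filters
full_system = mmps(16_000, 600, 12)  # ~230 MMPS, same order as 270 MMPS
```

The same arithmetic shows why dropping to a 10 kHz rate and a single extracted source brings the load within reach of one 60 MFLOPS device.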
The convergence behavior of the ADF algorithm depends on the number of speech sources in a co-channel speech system, the desired number of speech sources to be separated, and the condition of the acoustic paths. In the above-described experiments, the separation filters were initialized as zeros. In this cold-start adaptive mode, 5 and 10 seconds of adaptive estimation led to approximately 50% and 75% of the converged SJR gain, respectively, and 30 seconds of adaptive estimation basically led to the converged SJR gain. In a tracking mode, the system is expected to be able to quickly adapt to time variations of the acoustic paths due to changes of sound source positions resulting from head turns, body movements, etc.
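The cold-start figures (50% of the converged SJR gain after 5 s, 75% after 10 s) happen to be consistent with a first-order exponential convergence model; this model is illustrative only, not one stated in the paper.

```python
import math

tau = 5.0 / math.log(2)  # time constant implied by the 50% point at 5 s

def sjr_fraction(t_s):
    """Fraction of the converged SJR gain reached after t_s seconds,
    under the assumed model gain(t) = 1 - exp(-t / tau)."""
    return 1.0 - math.exp(-t_s / tau)

# sjr_fraction(10) gives 0.75, matching the reported 75% point, and
# sjr_fraction(30) exceeds 0.98, consistent with near-convergence at 30 s
```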
Besides the method of placing one microphone close to one target sound source, which may be overly constrictive in certain applications, there are alternative ways of designing microphone configurations to satisfy the theoretical condition described in Sec. II B. Two alternative methods that are planned for a future study are described below.
For example, in the "dinner table" scenario, if directional microphones are used instead of omnidirectional ones, then the microphones may be placed at the center of the table, with each microphone's receiving pattern directed toward its target talker. Note that in such a case, the transfer functions Hij(f) represent the combined effect of the room acoustic paths and the spatial-temporal response patterns of the microphones.
A more flexible implementation is to combine adaptive array beamforming with co-channel speech separation. In the combined approach, beamforming is used for spatially selective sound capturing, and multiple beams can be formed to capture simultaneous speech signals in different directions. It is known that small-sized beamformers suffer from limited beam resolution. With further processing of the acquired speech signals by ADF, the jammer speech signals that leak into the beams can be attenuated. As such, beamforming and co-channel speech separation would potentially complement each other and relax the design constraints of each. It is noted that the requirement on system computation power may be increased due to the preprocessing stage of beamforming.
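The spatially selective capture stage can be illustrated with a minimal delay-and-sum beamformer for a linear array; the geometry, far-field assumption, and integer-sample delays below are simplifying assumptions for the sketch, not the design evaluated in this work.

```python
import math

def delay_and_sum(mic_signals, mic_x_m, steer_deg, fs_hz, c=343.0):
    """Steer one beam toward steer_deg by delaying each microphone so
    that a far-field plane wave from that direction adds coherently."""
    # per-microphone arrival delay of the steered plane wave, in samples
    arrival = [round(x * math.sin(math.radians(steer_deg)) / c * fs_hz)
               for x in mic_x_m]
    shift = [max(arrival) - a for a in arrival]  # compensating delays
    n = len(mic_signals[0])
    out = [0.0] * n
    for sig, d in zip(mic_signals, shift):
        for t in range(n - d):
            out[t + d] += sig[t] / len(mic_signals)
    return out

# two mics 5 cm apart, beam steered broadside (0 degrees)
beam = delay_and_sum([[1.0, 0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0, 0.0]], [0.0, 0.05], 0.0, 16_000)
```

In the combined system, one such beam would be formed per talker, and the beam outputs, rather than the raw microphone signals, would feed the ADF stage.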
V. DISCUSSION
Overall, ADF processing has led to significant improvement in both the objective measure of signal-to-noise ratio and the subjective measure of sentence intelligibility. The following observations are made from the experimental results.
ADF is effective in the separation of co-channel speech even in the presence of diffusive noise. However, ADF is not able to attenuate diffusive noise. An integration of speech enhancement with ADF may provide a better solution in the interference condition with both jammers and diffusive noise, the requirement being that the enhancement step does not significantly distort the speech signals.
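As one concrete possibility for such an enhancement stage, classical magnitude spectral subtraction could be applied to the ADF output; the sketch below is a generic textbook version with an assumed frame size and spectral floor, not a method evaluated in this work.

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame=256, floor=0.05):
    """Subtract an average noise magnitude spectrum, estimated from a
    noise-only stretch, from each overlapped frame of the noisy signal."""
    win = np.hanning(frame)
    hop = frame // 2
    # average noise magnitude spectrum over noise-only frames
    nmag = np.mean([np.abs(np.fft.rfft(noise_only[i:i + frame] * win))
                    for i in range(0, len(noise_only) - frame, hop)], axis=0)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(noisy[i:i + frame] * win)
        # spectral floor keeps a fraction of the original magnitude,
        # limiting the musical-noise artifacts of hard subtraction
        mag = np.maximum(np.abs(spec) - nmag, floor * np.abs(spec))
        out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out
```

The spectral floor is the lever that trades noise attenuation against the speech distortion that, as noted above, the enhancement step must keep small.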
As indicated by the current work, interference consisting of both jammers and diffusive noise was more detrimental to speech comprehension than interference with jammers alone. Informal evaluation also revealed that, at each given SNR, three jammers were more detrimental to speech comprehension than one or two jammers, and a higher proportion of diffusive noise was more detrimental than a lower proportion. This may be attributed to the fact that the energy envelopes of speech jammers are modulated rather than constant, allowing subjects to perceive words correctly in the low-energy intervals of the jammers.
For subjects with normal hearing, the LL-SNR level of −12 dBA in the interference condition of three jammers appears to be the benefit threshold for the ADF; above −12 dBA, the subjects were able to comprehend the HINT sentence speech that was acquired by the talker-location microphone nearly perfectly without ADF processing. With both jammers and diffusive noise, the benefit threshold for ADF appears to be shifted by 3 dBA, i.e., the ADF processing should provide improvement in word correct percentage at −9 dBA as well. For subjects with hearing impairments, the benefit threshold of ADF could be shifted to −6 dBA, or even −3 dBA, in both types of interference conditions. This shift of threshold also implies the degree of hearing impairment of the subjects who participated in the studies. In a practical dinner table scenario, LL-SNRs of −6 to −3 dBA may be more realistic than −9 dBA or below, and the assistive listening system is expected to be useful mainly for users with hearing impairments. In other scenarios where more severe LL-SNR conditions are likely, or where the target talker has a very soft voice, the system is expected to be useful for users with normal hearing as well.
Based on this "dinner table" study, a conclusion is drawn that ADF is a promising new technique for constructing assistive listening devices that will benefit both normal-hearing and hearing-impaired populations. On the other hand, it is expected that in certain acoustic conditions and applications, processing techniques such as microphone array beamforming and speech waveform enhancement may be integrated with ADF to achieve flexibility and performance that cannot be achieved by ADF alone. The application conditions of the various types of processing techniques and their potential integration will be investigated in a future work.
ACKNOWLEDGMENT
This work was supported in part by NSF under grant NSF EIA 9911095 and by a grant from the Whitaker Foundation.
Cole, R., et al. (1995). "The challenge of spoken language systems: Research directions for the nineties," IEEE Trans. Speech Audio Process. 3, 1–21.
Comon, P., Herault, J., and Jutten, C. (1991). "Blind separation of sources, Part II: Problem statement," Signal Process. 24, 11–20.
Deller, J. R., Proakis, J. G., and Hansen, J. H. (1993). Discrete-Time Processing of Speech Signals (Prentice Hall, New York).
Desloge, J. G., Rabinowitz, W. M., and Zurek, P. M. (1997). "Microphone-array hearing aids with binaural output—Part I: Fixed-processing systems," IEEE Trans. Speech Audio Process. 5, 529–542.
French, N. R., and Steinberg, J. C. (1947). "Factors governing the intelligibility of speech sounds," J. Acoust. Soc. Am. 19, 90–119.
Greenberg, J. E., and Zurek, P. M. (1992). "Evaluation of an adaptive beamforming method for hearing aids," J. Acoust. Soc. Am. 91, 1662–1676.
Hoffman, M. W., Trine, T. D., Buckley, K. M., and Van Tasell, D. J. (1994). "Robust adaptive microphone array processing for speech enhancement," J. Acoust. Soc. Am. 96, 759–770.
Jutten, C., and Herault, J. (1991). "Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture," Signal Process. 24, 1–10.
Kates, J. M. (1993). "Superdirective arrays for hearing aids," J. Acoust. Soc. Am. 94, 1930–1933.
Lamel, L. F., Kassel, R. H., and Seneff, S. (1986). "Speech database development: Design and analysis of the acoustic-phonetic corpus," in Proceedings of the DARPA Speech Recognition Workshop.
Nilsson, M., Soli, S. D., and Sullivan, J. A. (1994). "Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise," J. Acoust. Soc. Am. 95, 1085–1099.
Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes, 3rd ed. (McGraw-Hill, New York).
Peterson, P. M. (1989). Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Pierce, A. D. (1994). Acoustics: An Introduction to Its Physical Principles and Applications.
Shamsunder, S., and Giannakis, G. B. (1997). "Multichannel blind signal separation and reconstruction," IEEE Trans. Speech Audio Process. 5, 515–528.
Smedley, T. C., and Schow, R. L. (1992). "Frustrations with hearing aid use: Candid reports from the elderly," Hear. Res. 43, 21–27.
Soede, W., Berkhout, A. J., and Bilsen, F. A. (1993a). "Development of a directional hearing instrument based on array technology," J. Acoust. Soc. Am. 94, 785–798.
Soede, W., Bilsen, F. A., and Berkhout, A. J. (1993b). "Assessment of a directional microphone array for hearing-impaired listeners," J. Acoust. Soc. Am. 94, 790–808.
Sorouchyari, E. (1991). "Blind separation of sources, Part III: Stability analysis," Signal Process. 24, 21–29.
Stadler, R. W., and Rabinowitz, W. M. (1993). "On the potential of fixed arrays for hearing aids," J. Acoust. Soc. Am. 94, 1332–1342.
Tong, L., Inouye, Y., and Liu, R. (1993). "Waveform-preserving blind estimation of multiple independent sources," IEEE Trans. Signal Process. 41, 2461–2470.
Van Gerven, S., and Van Compernolle, D. (1995). "Signal separation by symmetric adaptive decorrelation: Stability, convergence, and uniqueness," IEEE Trans. Signal Process. 43, 1602–1612.
Weinstein, E., and Oppenheim, A. V. (1993). "Multi-channel signal separation by decorrelation," IEEE Trans. Speech Audio Process. 1, 405–413.
Welker, D. P., Greenberg, J. E., Desloge, J. G., and Zurek, P. M. (1997). "Microphone-array hearing aids with binaural output—Part II: A two-microphone adaptive system," IEEE Trans. Speech Audio Process. 5, 543–551.
Yellin, D., and Weinstein, E. (1994). "Criteria for multichannel signal separation," IEEE Trans. Signal Process. 42, 2158–2168.
Yellin, D., and Weinstein, E. (1996). "Multichannel signal separation: Methods and analysis," IEEE Trans. Signal Process. 44, 106–118.
Yen, K., and Zhao, Y. (1996). "Robust automatic speech recognition using a multi-channel signal separation front-end," Proc. ICSLP 3, 1337–1340.
Yen, K., and Zhao, Y. (1997). "Co-channel speech separation for robust automatic speech recognition: Stability and efficiency," Proc. ICASSP 2, 859–862.
Yen, K., and Zhao, Y. (1998). "Improvements on co-channel speech separation using ADF: Low complexity, fast convergence, and generalization," Proc. ICASSP 2, 1025–1028.
Yen, K., and Zhao, Y. (1999a). "Adaptive co-channel speech separation and recognition," IEEE Trans. Speech Audio Process. 7, 138–151.
Yen, K., and Zhao, Y. (1999b). "Adaptive decorrelation filtering for separation of co-channel speech signals from M > 2 sources," Proc. ICASSP 2, 801–804.
Yen, K., and Zhao, Y. (2000). "Lattice-ladder structured adaptive decorrelation filtering for co-channel speech separation," Proc. ICASSP 1, 388–391.
Zhao, Y. (1993). "A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units," IEEE Trans. Speech Audio Process. 1, 345–361.
Zhao, Y. (1996). "Self-learning speaker and channel adaptation based on spectral variation source decomposition," Speech Commun. 18, 65–77.