Integration of monaural and binaural evidence of vowel formants

Michael A. Akeroyd a) and A. Quentin Summerfield
MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, United Kingdom

(Received 28 May 1999; revised 16 November 1999; accepted 4 March 2000)

The intelligibility of speech is sustained at lower signal-to-noise ratios when the speech has a different interaural configuration from the noise. This paper argues that the advantage arises in part because listeners combine evidence of the spectrum of speech in the across-frequency profile of interaural decorrelation with evidence in the across-frequency profile of intensity. To support the argument, three experiments examined the ability of listeners to integrate and segregate evidence of vowel formants in these two profiles. In experiment 1, listeners achieved accurate identification of the members of a small set of vowels whose first formant was defined by a peak in one profile and whose second formant was defined by a peak in the other profile. This result demonstrates that integration is possible. Experiment 2 demonstrated that integration is not mandatory, insofar as listeners could report the identity of a vowel defined entirely in one profile despite the presence of a competing vowel in the other profile. The presence of the competing vowel reduced accuracy of identification, however, showing that segregation was incomplete. Experiment 3 demonstrated that segregation of the binaural vowel, in particular, can be increased by the introduction of an onset asynchrony between the competing vowels. The results of experiments 2 and 3 show that the intrinsic cues for segregation of the profiles are relatively weak. Overall, the results are compatible with the argument that listeners can integrate evidence of spectral peaks from the two profiles. © 2000 Acoustical Society of America. [S0001-4966(00)02206-2]

PACS numbers: 43.66.Pn, 43.66.Ba, 43.71.Es, 43.71.An [DWG]

INTRODUCTION

The intelligibility of speech is sustained at lower signal-to-noise ratios (SNRs) when the speech has a different interaural configuration from the noise than when both have the same configuration (e.g., Licklider, 1948; Levitt and Rabiner, 1967). The advantage is usually referred to as the binaural intelligibility level difference (BILD). It ranges in size up to about 8 dB, depending on the speech stimuli and the configurations that are compared (e.g., Blauert, 1983). The BILD arises because the difference in interaural configuration allows listeners to extract binaural cues to the spectrotemporal structure of the speech signal, and these cues supplement cues obtained from monaural analysis. At high SNRs the auditory representation of the across-frequency profile of intensity (the monaural excitation pattern) contains evidence of the spectral structure of the speech (e.g., Glasberg and Moore, 1990). In this paper it is argued that, as the SNR is reduced, the representation of the across-frequency profile of interaural decorrelation (conveniently termed the "binaural excitation pattern") contains this evidence (Culling and Summerfield, 1995). Thus, the intelligibility of speech would be sustained across a range of SNRs if listeners could combine evidence from the two excitation patterns. The experiments reported below explored the extent to which such combination is possible.

The potential benefits of combining evidence in this way are illustrated in Fig. 1. The stimulus was a voiced vowel synthesized with a fundamental frequency of 100 Hz (Klatt, 1980), with the first formant (F1) at 508 Hz and the second formant (F2) at 1240 Hz.¹ It is similar to the British-English monophthongal vowel in the word "heard." It was defined as a two-channel stimulus in which the waveform in the right channel started 400 μs before the waveform in the left channel. It was analyzed in quiet (top row) or after the addition of diotic white noise (lower three rows). The SNR was progressively reduced to -20 dB by decreasing the level of the signal. Minus twenty dB is about the lowest SNR at which the members of small sets of synthetic vowels are identifiably distinct from each other when presented with an interaural time difference (ITD) of 400 μs in diotic noise (Culling et al., 1994). The left column of Fig. 1 contains monaural excitation patterns calculated from the right channel of the stimulus; the right column contains binaural excitation patterns calculated from both channels. The procedures used to calculate the excitation patterns are described below in Sec. A.

In the monaural excitation patterns, peaks corresponding to the first two formants of the vowel are visible at an SNR of 0 dB. As the SNR is reduced the peaks become less distinct, and are not evident at -20 dB. In comparison, the peaks become more evident in the binaural excitation patterns as the SNR is reduced; they are particularly clear at -20 dB. Consequently, the formant frequencies would be best determined from the monaural excitation pattern at high SNRs and from the binaural excitation pattern at lower SNRs. At intermediate SNRs, the vowel would be identified most accurately if the formant information from the two excitation patterns was combined.

a) Present address: Surgical Research Center, Department of Surgery (Otolaryngology) and Center for Neurological Sciences, University of Connecticut Health Center, Farmington, CT 06030. Electronic mail: [email protected]

3394 J. Acoust. Soc. Am. 107 (6), June 2000 0001-4966/2000/107(6)/3394/13/$17.00 © 2000 Acoustical Society of America


FIG. 1. Monaural and binaural excitation patterns for a synthetic exemplar of a voiced vowel with an ITD of +400 μs. The top row contains excitation patterns for the vowel in quiet. The remaining rows contain excitation patterns calculated after the addition of a diotic white-noise masker at signal-to-noise ratios of 0, -10, and -20 dB. The labels "F1" and "F2" mark the frequencies of the first and second formants.


The advantages of integrating evidence from the two excitation patterns should not be restricted to the identification of steady-state sounds. Integration would also be beneficial in the analysis of connected speech. Connected speech is a time-varying signal in which the amplitudes of the spectral components may vary from moment to moment over a range of at least 30 dB. At some points in time and frequency, therefore, the instantaneous SNR favors the monaural excitation pattern, but at other points it favors the binaural excitation pattern. In this case, as with steady-state sounds, intelligibility would be maximized by combining evidence from the monaural and binaural excitation patterns.

The present experiments involved steady-state sounds. Experiment 1 tested the hypothesis that listeners can combine evidence of the spectral structure of vowels from the monaural and binaural excitation patterns. In this experiment, the first formant of a vowel was defined in one excitation pattern, while the second formant was defined in the other excitation pattern. Experiments 2 and 3 investigated the extent to which such integration is optional or mandatory. In these experiments, listeners were required to attend selectively to two formants defined in one excitation pattern in the presence of competing formants defined in the other excitation pattern. The results from the three experiments show that listeners can combine evidence of formants from the two excitation patterns, but find it more difficult to attend selectively to formants in one pattern in the presence of competing formants in the other pattern. Overall, the results are compatible with the idea that listeners preferentially combine evidence of the spectral structure of speech from the two excitation patterns, in accordance with the account of the BILD proposed above.

A. Monaural and binaural excitation patterns

The monaural excitation patterns illustrated in Fig. 1 were calculated using a standard model of monaural auditory analysis (Patterson et al., 1995). The stimulus was initially filtered by a bandpass filter representing the transfer function of the middle ear. It was then separated into a set of overlapping frequency channels using a gammatone auditory filterbank. The outputs of the frequency channels were half-wave rectified. Their intensity was measured and plotted as a function of the center frequency of the channels in dB.

We define the binaural excitation pattern as the across-frequency profile of interaural decorrelation.² The binaural excitation patterns illustrated in Fig. 1 were calculated using an extension to the monaural model. The left and right waveforms of the stimulus were initially filtered by the bandpass filter representing the transfer function of the middle ear. They were separated into two sets of overlapping frequency channels using matched gammatone auditory filterbanks. The outputs of each channel were half-wave rectified. Next, the interaural correlation was calculated by first measuring the maximum cross-product of the left and right outputs $l(t,f)$ and $r(t,f)$ of a frequency channel $f$, as a function of an internal time delay $\Delta t$ ranging from -5000 to +5000 μs applied to one output, and then dividing by the average power of those outputs³

$$
\text{interaural correlation}(f) = \max_{\Delta t}\left( \sum_{t=0}^{t=250\ \mathrm{ms}} l(t,f)\, r(t+\Delta t,f) \right) \Bigg/ \sqrt{ \sum_{t=0}^{t=250\ \mathrm{ms}} l(t,f)^{2} \sum_{t=0}^{t=250\ \mathrm{ms}} r(t,f)^{2} }. \tag{1}
$$

The interaural decorrelation in each channel was calculated by subtracting the interaural correlation in that channel from one

$$
\text{interaural decorrelation}(f) = 1 - \text{interaural correlation}(f). \tag{2}
$$

TABLE I. Formant frequencies used in the experiments. Within the constraint that frequency was quantized to multiples of 2 Hz the bandwidth of each formant was 1 ERB. The labels "F1" and "F2" indicate that a particular formant was the first formant (F1) or second formant (F2) of each vowel.

Center          Lower        Upper        Vowel
frequency (Hz)  cutoff (Hz)  cutoff (Hz)  /u/   /i/   /ɑ/   /ɛ/
 250             224          276         F1    F1
 650             604          698                     F1    F1
 950             888         1016         F2          F2
1850            1742         1964               F2          F2

Interaural decorrelation was then plotted as a function of the center frequency of the channel to generate the binaural excitation pattern. A logarithmic y axis was chosen to emphasize the correspondence between the binaural and monaural excitation patterns and to reflect, approximately, the fact that the just-noticeable difference in interaural decorrelation increases as interaural decorrelation increases (e.g., Pollack and Trittipoe, 1959).
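The calculation in Eqs. (1) and (2) can be sketched for a single pair of channel outputs as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the middle-ear filter, gammatone filterbank, and half-wave rectification are omitted, the function name `interaural_decorrelation` and its parameters are hypothetical, the internal delay is quantized to whole samples, and samples shifted outside the analysis window are simply dropped.

```python
import numpy as np


def interaural_decorrelation(left, right, fs, max_delay_us=5000.0):
    """Eqs. (1)-(2) for one channel: 1 - max normalized cross-product.

    left, right  : outputs of one frequency channel at the two ears
    fs           : sampling rate in Hz
    max_delay_us : largest internal delay (assumed symmetric, in microseconds)
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = left.size
    max_lag = int(round(max_delay_us * 1e-6 * fs))
    # Denominator of Eq. (1): square root of the product of the two powers.
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-product of l(t) and r(t + lag) over the overlapping samples.
        if lag >= 0:
            cross = np.sum(left[: n - lag] * right[lag:])
        else:
            cross = np.sum(left[-lag:] * right[: n + lag])
        best = max(best, cross)
    return 1.0 - best / norm


fs = 20000                                    # Hz, illustrative value
rng = np.random.default_rng(0)
noise = rng.standard_normal(int(0.250 * fs))  # 250-ms analysis window
itd_samples = int(round(400e-6 * fs))         # 400-microsecond ITD
delayed = np.roll(noise, itd_samples)         # same noise, shifted in time
independent = rng.standard_normal(noise.size)

d_identical = interaural_decorrelation(noise, noise, fs)
d_delayed = interaural_decorrelation(noise, delayed, fs)
d_independent = interaural_decorrelation(noise, independent, fs)
print(d_identical, d_delayed, d_independent)
```

In this sketch a diotic input and a purely delayed input both give a decorrelation close to zero, because some internal delay aligns the two ears, whereas independent noises at the two ears give a value close to one.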

Interaural decorrelation measures the degree to which the inputs at the left and right ears differ, where "differ" means that the inputs cannot be equated using any combination of internal time delay or intensity change.⁴ Interaural decorrelation has been proposed as one of the detection cues underlying the binaural-masking level difference (e.g., Durlach et al., 1986). The focus of the present paper is on the use of suprathreshold levels of interaural decorrelation to recover speech from noise, as in Fig. 1. The across-frequency profile of interaural decorrelation reveals the spectral structure of the vowel because the value of interaural decorrelation in any frequency channel depends upon the relative level of the vowel and the noise in that frequency channel. Recall that the excitation patterns in Fig. 1 were generated by analyzing a vowel with an ITD of +400 μs masked by a diotic noise. In those frequency channels where the levels of vowel and noise are approximately equal, the internal time delay which best compensates for the vowel (-400 μs) cannot compensate for the noise. Equally, the internal time delay which best compensates for the noise (0 μs) cannot compensate for the vowel. Therefore, the level of interaural decorrelation in these channels is relatively high. In other channels that are dominated by either the vowel or the noise, an internal delay of either -400 or 0 μs equalizes the filter outputs, so the level of interaural decorrelation is lower.

Listeners have high sensitivity to interaural decorrelation. The just-noticeable difference in interaural decorrelation from a value of zero is in the range from 0.02 to 0.05 for steady-state sounds (e.g., Gabriel and Colburn, 1981; Akeroyd and Summerfield, 1999). By the 0.05 criterion, both F1 and F2 of the vowel in Fig. 1 could be detected in the binaural excitation pattern at -10 dB, and F1 could be detected at -20 dB.

Figure 1 demonstrates that the binaural excitation pattern has didactic value as a means of summarizing binaural information that may support speech identification in noise. Beyond that, however, its status is less well-established than that of the monaural excitation pattern, in at least two respects. First, it has no proven physiological underpinning in the form of a population of units whose activity corresponds to the profile of interaural decorrelation across frequency. Second, the capacity of listeners to distinguish different degrees of interaural decorrelation may not be as great as is suggested by the depth of modulation of excitation across frequency in Fig. 1. Nonetheless, the explanatory power of the pattern is shown by the demonstration that the arrangement of peaks in the pattern can account for the perceived frequencies of an important subset of the dichotic pitches (Culling et al., 1998a, b). In those experiments, listeners performed as if they could access the binaural excitation pattern and interpret its structure in a similar fashion to a monaural excitation pattern. Thus, it is plausible that listeners might also extract evidence of formants from the binaural excitation pattern.

B. Two-formant vowels

Although evidence of up to five formants can be found in natural vowels, adequate approximations to some vowels can be created by synthesizing a single formant, and the majority of vowels is adequately defined by synthesizing two formants (e.g., Delattre et al., 1952). A synthetic sound consisting of just two narrow bands of noise centered on those frequencies also creates a vowel-like percept. The percept is weak, but the identity of the vowel can be reported reliably. The present experiments used such a set of synthetic two-formant vowel-like stimuli. They were similar to stimuli devised by Culling and Summerfield (1995). F1 was set to either 250 or 650 Hz and F2 was set to either 950 or 1850 Hz (Table I). The four possible combinations give the British-English monophthongal vowels /u/ (found in "who'd"), /i/ ("heed"), /ɑ/ ("hard"), and /ɛ/ ("haired"). The experiments capitalize on the fact that both formants must be detected and their frequencies determined for each vowel in this set to be identified uniquely.

The formants were constructed by increasing either the level or the interaural decorrelation of narrow bands within a diotic white noise. Each band had a width of 1 equivalent rectangular bandwidth ("ERB"; Glasberg and Moore, 1990). "Monaural formants" were created by increasing the level of a band at either 250, 650, 950, or 1850 Hz. This manipulation created peaks in the monaural excitation pattern at each ear. For this reason, the peaks are referred to as monaural formants although the stimulus was diotic and identical peaks were present at each ear. "Binaural formants" were created by increasing the interaural decorrelation of a band at either 250, 650, 950, or 1850 Hz. This manipulation creates peaks in the binaural excitation pattern.

In summary, formants are described as either monaural or binaural because a monaural formant can be identified from the information present at either ear, whereas a binaural formant can be identified only by a comparison of information from both ears. Likewise, "monaural vowels" had two monaural formants and "binaural vowels" had two binaural formants. Stimuli with two monaural formants or two binaural formants are collectively termed "same-mode" vowels; stimuli with one monaural and one binaural formant are termed "mixed-mode" vowels.
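The 1-ERB bandwidths can be made concrete with a short calculation. The sketch below is one possible reading rather than the documented procedure: it assumes the ERB-number scale of Glasberg and Moore (1990), places the band edges half an ERB below and above the center frequency on that scale, and rounds each edge to the nearest multiple of 2 Hz. The function names are hypothetical.

```python
import math


def erb_number(f_hz):
    """ERB-number (Cams) for a frequency in Hz (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number: frequency in Hz for a given ERB-number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def one_erb_band(center_hz, step_hz=2):
    """Band edges 0.5 ERB either side of the center, quantized to step_hz.

    Assumption: symmetric placement on the ERB-number scale and rounding
    to the nearest multiple of step_hz; the text does not spell this out.
    """
    e = erb_number(center_hz)
    lower = erb_number_to_hz(e - 0.5)
    upper = erb_number_to_hz(e + 0.5)
    return (int(step_hz * round(lower / step_hz)),
            int(step_hz * round(upper / step_hz)))


bands = {fc: one_erb_band(fc) for fc in (250, 650, 950, 1850)}
for fc, (lo, hi) in bands.items():
    print(f"{fc:5d} Hz: {lo}-{hi} Hz")
```

Under these assumptions the computed edges agree with the lower and upper cutoffs listed in Table I, which suggests, but does not prove, that the bands were defined in a similar way.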

I. EXPERIMENT 1: INTEGRATION OF INFORMATION ACROSS EXCITATION PATTERNS

Experiment 1 investigated whether listeners can combine evidence of formants from the monaural and binaural excitation patterns. The experiment measured the accuracy of identification of synthetic vowels (Fig. 2, left column). Four classes of vowel were used. The same-mode vowels had either two monaural formants (termed "MM" using a terminology of first formant, second formant) or two binaural formants ("BB"). The mixed-mode vowels had either a monaural first formant combined with a binaural second formant ("MB") or a binaural first formant combined with a monaural second formant ("BM"). If listeners can combine formant information across excitation patterns, they would achieve the same accuracy in identifying mixed-mode vowels as same-mode vowels.

If instead listeners cannot combine evidence in this way, their responses to the mixed-mode vowels must be determined by attention to each excitation pattern independently. Thus, listeners would report the vowel heard when the monaural formant was presented in isolation or the vowel heard when the binaural formant was presented in isolation. This "single-formant" hypothesis was tested by including four control conditions in which the stimuli contained just one formant. The conditions are termed "M-," "-M," "B-," and "-B" (Fig. 2, right column). Listeners were required to respond /u/, /i/, /ɑ/, or /ɛ/ according to which vowel they heard (or the closest vowel they heard) when each single-formant stimulus was presented. Four versions of the single-formant hypothesis were evaluated as quantitative models for predicting the accuracy of identification in the two-formant conditions from accuracy observed in the single-formant conditions.

A. Method

1. Stimuli

The stimuli were created by modifying a digital, flat-spectrum, diotic noise constructed by summing together a sine and cosine frequency response at each frequency from 2 to 4000 Hz (inclusive), so giving the left $L(t)$ and right $R(t)$ waveforms

$$
L(t) = \sum_{f=2\ \mathrm{Hz}}^{f=4000\ \mathrm{Hz}} \left( A_{L,f} \sin(2\pi f t) + B_{L,f} \cos(2\pi f t) \right), \tag{3}
$$

$$
R(t) = \sum_{f=2\ \mathrm{Hz}}^{f=4000\ \mathrm{Hz}} \left( A_{R,f} \sin(2\pi f t) + B_{R,f} \cos(2\pi f t) \right), \tag{4}
$$

where $A_{L,f}$, $A_{R,f}$, $B_{L,f}$, and $B_{R,f}$ were random variables chosen from a Gaussian distribution. For all frequencies $f$ other than those defining the binaural formants, $A_{L,f}$ was equal to $A_{R,f}$ and $B_{L,f}$ was equal to $B_{R,f}$. The monaural formants were created by setting the intensity of a 1-ERB-wide band to be 6 dB higher than the flat-spectrum noise. The binaural formants were created by setting, within a 1-ERB-wide band, the random variables $A_{L,f}$ and $B_{L,f}$ to be statistically independent of $A_{R,f}$ and $B_{R,f}$. This method creates an interaurally decorrelated band with a nominal interaural correlation of zero. The formant frequencies are specified in Table I.
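The construction in Eqs. (3) and (4) can be sketched as below for an "MB" version of /u/ (monaural F1, binaural F2). This is a hedged sketch, not the authors' synthesis code: the names `band_mask`, `make_coefficients`, and `synthesize` are hypothetical, the band edges are taken from Table I and treated as inclusive (an assumption), the 2-Hz component spacing follows the quantization stated in Table I, and no level calibration or onset/offset ramps are applied.

```python
import numpy as np

FS = 20000                       # sampling rate in Hz
DURATION = 0.5                   # stimulus duration in seconds
FREQS = np.arange(2, 4001, 2)    # component frequencies, 2 to 4000 Hz


def band_mask(lower_hz, upper_hz):
    """Boolean mask of components inside a band (edges assumed inclusive)."""
    return (FREQS >= lower_hz) & (FREQS <= upper_hz)


def make_coefficients(rng, monaural_bands=(), binaural_bands=(), gain_db=6.0):
    """Gaussian sine/cosine weights for the left and right ears."""
    a_left = rng.standard_normal(FREQS.size)
    b_left = rng.standard_normal(FREQS.size)
    # Diotic by default: the right ear copies the left ear.
    a_right, b_right = a_left.copy(), b_left.copy()
    for lo, hi in binaural_bands:
        # Binaural formant: independent weights at the right ear in the band.
        m = band_mask(lo, hi)
        a_right[m] = rng.standard_normal(m.sum())
        b_right[m] = rng.standard_normal(m.sum())
    gain = 10.0 ** (gain_db / 20.0)   # 6 dB expressed as an amplitude ratio
    for lo, hi in monaural_bands:
        # Monaural formant: raise the band level identically at both ears.
        m = band_mask(lo, hi)
        for coeffs in (a_left, b_left, a_right, b_right):
            coeffs[m] *= gain
    return (a_left, b_left), (a_right, b_right)


def synthesize(a, b):
    """Sum the weighted sine and cosine components, as in Eqs. (3)-(4)."""
    t = np.arange(int(FS * DURATION)) / FS
    wave = np.zeros_like(t)
    for a_f, b_f, f in zip(a, b, FREQS):
        phase = 2.0 * np.pi * f * t
        wave += a_f * np.sin(phase) + b_f * np.cos(phase)
    return wave


rng = np.random.default_rng(1)
coef_left, coef_right = make_coefficients(
    rng, monaural_bands=[(224, 276)], binaural_bands=[(888, 1016)])
left = synthesize(*coef_left)
right = synthesize(*coef_right)
```

Outside the binaural band the two ears receive identical components, so the stimulus is diotic there; inside the band the weights are drawn independently, which gives a nominal interaural correlation of zero.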

FIG. 2. Schematic illustrations of the stimuli for experiment 1. The left column contains two-formant stimuli. The right column contains one-formant stimuli. Monaural formants ("M") were created by increasing the level of 1-ERB-wide bands in a diotic noise by 6 dB. Binaural formants ("B") were created by setting the interaural correlation of 1-ERB-wide bands to zero. Binaural formants are illustrated as hatched bars. Note that binaural formants do not entail an increase in spectral level. The formant frequencies are specified in Table I.


Twenty independent tokens of each stimulus were synthesized at a sampling rate of 20 kHz with 16-bit amplitude quantization. They were converted to analog by a Loughborough Sound Images digital-to-analog converter (AM/D16DS). Each channel was passed separately through a custom-built attenuator, a custom-built headphone amplifier, and a further attenuator (Marconi Instruments type TF2612). They were presented to listeners over both channels of a Sennheiser HD-414 headset. Listeners sat in a double-walled sound booth. The presentation of stimuli and the collection of responses were controlled by a Dell 486 personal computer.

The noise spectrum level of the diotic noise was 35 dB (re: 20 μPa). All the stimuli had a duration of 500 ms, including 20-ms onset and offset raised-cosine ramps applied after the summation of components.

2. Procedure

A single-interval, four-alternative procedure was used to measure the accuracy of vowel identification. On each trial one stimulus was presented and listeners were required to identify it using one of four labeled buttons. No feedback about the accuracy of responses was given. Stimuli from the eight conditions (MM, BB, MB, BM, M-, -M, B-, and -B) were randomized together. In each block each combination of vowel and condition occurred three times. There were four sessions of four blocks each, giving a total for the two-formant conditions of 192 trials per condition per listener (= 4 vowels × 3 trials per block × 16 blocks) and a total for the one-formant conditions of 96 trials per condition per listener (= 2 formants × 3 trials per block × 16 blocks). At the beginning of each session listeners completed a practice block of 24 trials in which feedback as to the identity of each stimulus was given. These blocks consisted of three examples of each MM vowel⁵ and three examples of the same vowels but with the amplitudes of components outside the 1-ERB-wide formant bands set to zero (i.e., without the flat-spectrum diotic noise surrounding the formant peaks). These blocks served to remind listeners of the identity of each vowel in the stimulus set.

3. Listeners

Four listeners participated. Their hearing levels were less than or equal to 20 dB HL at octave frequencies between 500 Hz and 4 kHz inclusive. Their ages were 27 years (Listener A), 48 (B), 21 (C), and 24 (D). Listeners A and B were the authors. The other two listeners were paid for their participation. Before data collection began, all listeners received extensive practice in identifying monaural and binaural vowels similar to those used in the experiment.

4. Statistical analyses

A p<0.05 confidence interval based on the binomial distribution tested whether accuracy of identification was above chance for each combination of listener and condition. A single-factor repeated-measures analysis of variance (ANOVA) and a post hoc Tukey HSD test tested whether accuracy in any two-formant condition differed from accuracy in any other two-formant condition. A two-tailed planned comparison tested whether overall accuracy in the two mixed-mode conditions differed from overall accuracy in the two same-mode conditions.

A two-factor (formant frequency × presentation mode) repeated-measures ANOVA tested whether accuracy in any one-formant condition differed from accuracy in any other one-formant condition. Note that, because two formants are required to define each vowel, no single response can be correct in the one-formant conditions. Instead, a response to either of the vowels partially defined by the formant was marked as correct; for example, when the formant was at 250 Hz both /u/ or /i/ were scored as correct responses (Table I).
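The chance-level test can be illustrated with a small calculation. The sketch below is an assumption-laden example rather than the authors' analysis: it takes the trial counts from the Procedure (192 and 96 trials per condition per listener), assumes chance is 1 in 4 for the two-formant conditions and 2 in 4 for the one-formant conditions (because two of the four responses were scored as correct), and builds an equal-tailed 95% interval from the exact binomial distribution. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k correct responses in n independent trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def chance_interval(n, p_chance, alpha=0.05):
    """Equal-tailed interval of correct-response counts expected by chance.

    Returns (lower, upper) such that each tail outside the interval has
    probability of at most alpha / 2 under guessing.
    """
    tail = alpha / 2.0
    cumulative = 0.0
    lower = 0
    for k in range(n + 1):              # accumulate from the low end
        cumulative += binomial_pmf(k, n, p_chance)
        if cumulative > tail:
            lower = k
            break
    cumulative = 0.0
    upper = n
    for k in range(n, -1, -1):          # accumulate from the high end
        cumulative += binomial_pmf(k, n, p_chance)
        if cumulative > tail:
            upper = k
            break
    return lower, upper


n_two_formant = 4 * 3 * 16   # vowels x trials per block x blocks
n_one_formant = 2 * 3 * 16   # formants x trials per block x blocks

lo2, hi2 = chance_interval(n_two_formant, 0.25)
lo1, hi1 = chance_interval(n_one_formant, 0.50)
print(f"two-formant chance: {100 * lo2 / n_two_formant:.1f}% "
      f"to {100 * hi2 / n_two_formant:.1f}%")
print(f"one-formant chance: {100 * lo1 / n_one_formant:.1f}% "
      f"to {100 * hi1 / n_one_formant:.1f}%")
```

A listener whose percent correct in a condition lies above the upper bound would be classed as performing above chance under this construction.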

B. Results

1. Two-formant conditions

The results for the two-formant conditions are shown in Fig. 3. The open symbols plot the mean accuracy of identification in each condition, averaged across the four vowels. The solid symbols plot the predictions from one version of the single-formant hypothesis (see the Appendix). The horizontal dashed lines plot the 95%-confidence interval for chance performance. The error bars plot the 95%-confidence intervals of the observed levels of accuracy of identification.

In all conditions, accuracy of identification was above chance. The mean identification scores were 85% (MM), 79% (BB), 70% (MB), and 77% (BM). Accuracy of identification differed between the four conditions [F(3,9)=4.8, p=0.03]. The only significant comparison revealed by the post hoc test was that the MB vowels were identified less accurately than the MM vowels (Tukey HSD=12.3% at p=0.05; difference in means=14.9%). The planned comparison showed that the mixed-mode vowels were identified less accurately than the same-mode vowels [t(12)=3.0, p=0.01]; the difference in accuracy was 8.3%.

2. One-formant conditions

The results for the one-formant conditions are shownTable II. The majority of responses was to either of the t

FIG. 3. Results of experiment 1. Mean percent-correct identificationseach two-formant condition, averaged across the four vowels, are showopen symbols. Predictions from method 1~described in the Appendix! of thelevels of performance in each condition from patterns of responses inone-formant conditions are shown as filled symbols. The horizontal daslines plot the 95%-confidence interval for chance performance. Errorplot the 95%-confidence intervals estimated using the binomial distributEach symbol is based on 192 observations~experimental data! or 96 obser-vations~predictions!.


TABLE II. Results of experiment 1. Proportion of responses to each vowel in the single-formant conditions, averaged across the four listeners. The responses marked in bold correspond to the two vowels of which each single formant is a component.

Formant frequency (Hz)   Condition   Mode       Responses (percent)
                                                /u/   /i/   /ɑ/   /ɜ/
250                      M-          Monaural   66    23    5     6
650                      M-          Monaural   5     2     35    58
950                      -M          Monaural   27    1     71    1
1850                     -M          Monaural   1     61    1     37
250                      B-          Binaural   55    38    4     3
650                      B-          Binaural   4     4     36    55
950                      -B          Binaural   19    1     76    5
1850                     -B          Binaural   11    45    7     36


‘‘correct’’ vowels in which the formant could occur (cells marked in bold), although responses were not distributed evenly between the two. Only occasionally was a response made to either of the other vowels. Correct performance did not differ between the monaural and binaural conditions [F(1,3)=1.7, p=0.3]. There was no main effect of formant frequency [F(3,9)=0.96, p=0.5], although there was an interaction between frequency and mode [F(3,9)=7.3, p=0.009], in that performance was worst in the 1850-Hz binaural-formant condition.

3. Relationship between one-formant and two-formant results

One account of the identification of the two-formant vowels is that listeners combined evidence of F1 and F2. An alternative, single-formant hypothesis is that listeners based their identification responses on either F1 or F2, but not both. Superficially, this second hypothesis is plausible. Consider first that there was a tendency for listeners to identify each single formant predominantly as one of the four vowels, rather than dividing their responses equally between the two vowels to which the formant could belong (Table II). This outcome was expected. In general, a single formant, placed near the frequency of a concentration of energy in a natural vowel, may be heard as that vowel (e.g., Delattre et al., 1952). Thus, a single formant at a low frequency may be heard as /u/, at a low/mid frequency as /ɜ/, at a high/mid frequency as /ɑ/, and at a high frequency as /i/. This pattern can be seen in Table II.

The pattern suggests that listeners might have identified two-formant vowels by attending to a particular (‘‘dominant’’) single formant in each pair. For example, the 250-Hz monaural formant was identified as /u/ on 66% of presentations as a single-formant stimulus, while the pairing of that formant with the 950-Hz monaural formant yielded the same percentage of /u/ responses (66%). The primary requirement for this explanation to be viable is that there should be a systematic hierarchy of dominance among the formants. The hierarchy can be constructed by assessing each two-formant stimulus as follows, starting with the stimulus composed of a 250-Hz F1 and a 950-Hz F2. This stimulus was identified predominantly as /u/, while, in isolation, the 250-Hz formant was identified predominantly as /u/ and the 950-Hz formant was identified predominantly as /ɑ/. Thus, the 250-Hz formant dominated the 950-Hz formant. The same logic applied to the other two-formant stimuli implies that the 1850-Hz formant dominated the 250-Hz formant, and that the 650-Hz formant dominated the 1850-Hz formant. To complete the hierarchy, it is necessary that the 650-Hz formant should dominate the 950-Hz formant, so that the combination of these two formants would be heard as /ɜ/. In fact, the combination was heard strongly as /ɑ/, so that the 950-Hz formant dominated the 650-Hz formant. Thus, the pattern of dominance is circular, rather than forming a systematic hierarchy.
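The circularity of these dominance relations can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it encodes the four pairwise relations stated above as a directed graph and asks whether any consistent ordering exists. The name `find_hierarchy` and the use of a topological sort are illustrative assumptions.

```python
from graphlib import TopologicalSorter, CycleError

# Pairwise dominance relations as stated in the text (formant frequencies in Hz):
# each key dominated every formant listed in its value.
DOMINATES = {
    250: [950],    # 250 Hz dominated 950 Hz
    1850: [250],   # 1850 Hz dominated 250 Hz
    650: [1850],   # 650 Hz dominated 1850 Hz
    950: [650],    # 950 Hz dominated 650 Hz
}

def find_hierarchy(dominates):
    """Return formants ordered from most to least dominant, or None if circular."""
    # TopologicalSorter expects node -> predecessors, so invert the relation:
    # a dominated formant has its dominators as predecessors.
    predecessors = {}
    for winner, losers in dominates.items():
        predecessors.setdefault(winner, set())
        for loser in losers:
            predecessors.setdefault(loser, set()).add(winner)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

print(find_hierarchy(DOMINATES))  # None: the relations form a cycle
```

Had the 650-Hz formant dominated the 950-Hz formant instead, the same function would return a single ordering, which is the systematic hierarchy the single-formant account requires.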

This outcome is strong evidence against the idea that listeners identified the two-formant vowels by attending selectively to one of the constituent formants because, in the absence of a systematic hierarchy of dominance, listeners would not know which formant in a two-formant stimulus should receive their attention. The conclusion is consolidated in the Appendix, which evaluates four quantitative methods for predicting the two-formant results from the one-formant results. One of these methods assumes, implausibly, that listeners knew which single formant should receive their attention to maximize their accuracy. Both this method and the other three, which did not include the assumption, predict a lower level of accuracy in identifying two-formant vowels than the listeners achieved. Taken together, the arguments in this section and the analyses in the Appendix imply that listeners identified the two-formant vowels from the evidence of both formants.

C. Discussion

There are four results of interest. First, listeners achieved accuracy that was significantly above chance when identifying two-formant vowels composed of one monaural formant and one binaural formant. Second, accuracy of identification of these vowels was nevertheless poorer than the accuracy achieved with vowels composed of two monaural formants or two binaural formants. The penalty from the requirement to combine information across excitation patterns was small, however; it amounted to a reduction in accuracy of less than 10%. Third, accuracy of identification in the mixed-mode conditions did not depend on which formant was monaural and which formant was binaural. This result was not expected, given that the ease with which interaural decorrelation can be detected declines with increasing frequency (e.g., Gabriel et al., 1992; Akeroyd and Summerfield, 1999). The expected result was shown in other ways, however. In the single-formant conditions, the binaural formant at 1850 Hz was identified less consistently than the binaural formants at the three lower frequencies. In addition, compared with performance when both formants were monaural, performance was significantly lower when the higher formant was binaural but not when the lower formant was binaural. The fourth result was that in all conditions accuracy was higher than predicted by any of the versions of the single-formant hypothesis that are evaluated in the Appendix. We conclude that listeners based their identification responses to the two-formant vowels on evidence of both formants together, rather than by attending to one mode on some trials and the other mode on other trials.

Each binaural formant was created by interaurally decorrelating a narrow region of an originally diotic noise. This manipulation creates a dichotic pitch (Akeroyd and Summerfield, 2000) perceptually similar to the Huggins pitch (Cramer and Huggins, 1958). Culling et al. (1998a) showed that pairs of formants created using a different dichotic pitch, the binaural edge pitch (Klein and Hartmann, 1981), can also be identified as vowels. They used a similar design to the BB condition of the present experiment, but with F1 set to either 225 or 625 Hz and F2 set to either 975 or 1925 Hz. They observed a mean accuracy of identification of 59%. Although above chance, this value is smaller than was observed here in the BB condition. The difference probably arises because the binaural edge pitch can be difficult to perceive at the highest frequency (1925 Hz) at which Culling et al. placed a formant. In contrast, the results from the one-formant conditions of the present experiment show that the highest formant (1850 Hz) was detected reliably in binaural conditions (Table II), though with lower accuracy than the formants at the other three frequencies.
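The idea of decorrelating a narrow region of a diotic noise can be illustrated in the frequency domain. The sketch below is only a toy model under stated assumptions, not the authors' synthesis procedure: the number of bins, the band edges, and the names `make_spectra` and `band_correlation` are all hypothetical. It builds left- and right-ear spectra that are identical except in one band, where the right ear receives independent noise, and then measures the normalized interaural correlation inside and outside that band.

```python
import math
import random

def make_spectra(n_bins, band, rng):
    """Build left/right spectra that are identical except inside `band`.

    `band` is a (low, high) pair of bin indices, high exclusive. Inside the
    band the right ear gets independent noise, so the interaural correlation
    there is nominally zero; everywhere else the noise is diotic.
    """
    def noise_bin():
        # Complex Gaussian spectral component (hypothetical noise model).
        return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    left = [noise_bin() for _ in range(n_bins)]
    right = list(left)
    for k in range(band[0], band[1]):
        right[k] = noise_bin()
    return left, right

def band_correlation(left, right, low, high):
    """Normalized zero-lag interaural correlation over bins low..high-1."""
    cross = sum((left[k] * right[k].conjugate()).real for k in range(low, high))
    power_l = sum(abs(left[k]) ** 2 for k in range(low, high))
    power_r = sum(abs(right[k]) ** 2 for k in range(low, high))
    return cross / math.sqrt(power_l * power_r)

rng = random.Random(1)
left, right = make_spectra(n_bins=400, band=(100, 200), rng=rng)
print(round(band_correlation(left, right, 0, 100), 3))    # diotic region
print(round(band_correlation(left, right, 100, 200), 3))  # decorrelated region
```

In this toy model the diotic region has a correlation of one and the decorrelated band has a correlation that scatters around zero, which is the kind of peak in the across-frequency profile of decorrelation that defines a binaural formant.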

In summary, the results of experiment 1 indicate that listeners can combine formant information from the monaural excitation pattern with formant information from the binaural excitation pattern. The authors’ experience supports this conclusion. In the mixed-mode conditions we perceived a single noisy vowel, although with some stimuli we were also aware of the tonal quality of the binaural formant.

II. EXPERIMENT 2: SELECTIVE ATTENTION TO ONE EXCITATION PATTERN

Experiment 2 sought to determine whether the integration of monaural and binaural information observed in experiment 1 is mandatory or optional. The design of experiment 1 deliberately encouraged integration, insofar as listeners were instructed to make a single response on each trial, and integration resulted in percepts that permitted more consistent patterns of response than did attention to either mode alone. Accordingly, in experiment 2 a strong disincentive for integration was introduced. Pairs of two-formant vowels were presented simultaneously. One vowel was defined monaurally (in the sense described in Sec. B of the Introduction) and the other was defined binaurally. Binaural and monaural vowels can be distinguished because they have different timbres and lateral positions. A binaural vowel presented in isolation has a tonal timbre and is lateralized to one or the other side of the head. A monaural vowel presented in isolation has a noisy timbre and is lateralized to the center of the head. Listeners could therefore be instructed to identify one or the other vowel. The design of the stimuli ensured that any combination of formants across excitation patterns would generally result in an incorrect identification.

The spectral contrast of each monaural formant was varied between 0 and 12 dB. If listeners can selectively attend to the binaural excitation pattern, then the accuracy of identification of the binaural vowel will be above chance and will be independent of the spectral contrast of the monaural vowel. Furthermore, if listeners can selectively attend to the monaural excitation pattern, then the accuracy of identification of the monaural vowel will not be reduced by the presence of the binaural vowel.

A. Method

There were two sets of stimuli. In one set, each stimulus was composed of a binaural vowel paired with a different monaural vowel (‘‘binaural-present’’ stimuli; Fig. 4). Only the 12 nonidentical pairs that can be formed from four vowels were included. The second set of stimuli consisted of monaural vowels alone (‘‘binaural-absent’’ stimuli). Listeners performed two tasks in separate sessions. In one task, they heard only the binaural-present stimuli and were required to identify the binaural vowel. In the other task, they heard a randomized mixture of binaural-present and binaural-absent stimuli and were required to identify the monaural vowel. Listeners undertook the sessions with the monaural task before those with the binaural task.

The stimuli were constructed, calibrated, and presented to listeners using the same techniques and apparatus as in experiment 1. A single value of interaural correlation, nominally zero, defined the binaural formants. Five different values of spectral contrast (0, 3, 6, 9, and 12 dB) defined the monaural formants (when the spectral contrast was 0 dB, the stimulus had a flat spectrum and no monaural vowel was specified). The two monaural formants in a stimulus always had the same spectral contrast. Twenty independent tokens of each stimulus were created. Pilot testing showed that monaural vowels composed of formants defined by 3 dB of spectral contrast were less clear than the binaural vowels, while monaural vowels composed of formants defined by 12 dB of spectral contrast were clearer than the binaural vowels. Thus, the experiment would explore the ability of listeners to segregate binaural formants with one degree of clarity when in competition with monaural formants covering a range of clarity.

A single-interval, four-alternative forced-choice procdure similar to experiment 1 was used. Across multiple ssions, listeners undertook a series of blocks containing a tof 240 trials for each spectral contrast for each of the binral task with the binaural-present stimuli, the monautask with the binaural-present stimuli, and the monaural twith the binaural-absent stimuli~512 vowel pairs320 blocks31 trial per block in each case!. Each session was precedeby a 24-trial practice block. For the binaural tasks this blocontained single binaural vowels~BB!. For the monaural

3400nd A. Q. Summerfield: Monaural and binaural vowel formants

/content/terms. Download to IP: 155.33.16.124 On: Thu, 27 Nov 2014 17:25:39

Page 8: Integration of monaural and binaural evidence of vowel formants

osa

iaincesn-u

cuthpec

gethin

dB

of

efthethe

atasti-a-

ghthe

--dB

endacy

eon-

ofa

ralely

thewelig-ral

i fo

-om

ad s

rrectnelthe

herval

Redistr

tasks this block contained single monaural vowels~MM !.Listeners were thus reminded of the timbre and lateral ption of the to-be-identified vowels. The same four listenersbefore participated.

FIG. 4. Schematic illustrations of a subset of the binaural-present stimuli for experiment 2 (cf. Fig. 2). All combinations for a monaural /u/ (upper three panels) and for a binaural /u/ (lower three panels) are shown. Some combinations require formants to be presented at three frequencies; other combinations require formants at four frequencies. Note that in the three-formant combinations, a binaural formant coincides with a monaural formant and so is associated with an increase in spectral level.

A p<0.05 confidence interval based on the binomial distribution tested whether accuracy of identification by individual listeners in individual conditions was above chance. A single-factor (spectral contrast) repeated-measures ANOVA tested whether accuracy of identification in the binaural task depended on the spectral contrast of the monaural vowels. A two-factor (binaural presence/absence × spectral contrast) repeated-measures ANOVA tested whether accuracy of identification in the monaural task depended on the presence or absence of the binaural vowel. In the 0-dB spectral contrast condition, no monaural vowel was defined. Accordingly, the majority of the ANOVAs considered the range of spectral contrast from 3 to 12 dB. The exception was the analysis of the number of correct binaural identifications in the binaural task. The binaural vowel did exist in the 0-dB condition, and so the ANOVA considered the full range of spectral contrasts from 0 to 12 dB.
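The binomial criterion for above-chance performance can be sketched as follows. This is a minimal illustration rather than the authors' exact procedure: it assumes that chance in a four-alternative task is 0.25 and that the interval is the central 95% range of the binomial distribution. The function name `chance_interval` and the trial count in the usage line are assumptions chosen for the example.

```python
from math import comb

def chance_interval(n_trials, p_chance=0.25, alpha=0.05):
    """Central (1 - alpha) range of correct counts expected under pure guessing.

    Returns (low, high) as counts of correct responses; in this sketch a
    count above `high` would be treated as above chance.
    """
    pmf = [comb(n_trials, k) * p_chance**k * (1 - p_chance)**(n_trials - k)
           for k in range(n_trials + 1)]
    cumulative = 0.0
    low = high = None
    for k, prob in enumerate(pmf):
        cumulative += prob
        if low is None and cumulative >= alpha / 2:
            low = k       # smallest count reaching the lower tail boundary
        if high is None and cumulative >= 1 - alpha / 2:
            high = k      # smallest count reaching the upper tail boundary
    return low, high

low, high = chance_interval(240)
print(f"chance range for 240 trials: {low}-{high} correct "
      f"({100 * low / 240:.1f}%-{100 * high / 240:.1f}%)")
```

The interval narrows in percentage terms as the number of trials grows, which is why the criterion for a single listener in a single condition is wider than the interval plotted for the pooled data.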

B. Results

1. Binaural task

The results for the binaural task are shown in the left panel of Fig. 5. The mean accuracy of identification of the binaural vowel deteriorated as the spectral contrast of the monaural vowel increased; accuracy decreased from 64% at 0-dB spectral contrast to 45% at 12-dB spectral contrast [F(4,12)=9.4, p=0.001]. Nonetheless, accuracy of identification was above chance in all but two of the 20 combinations of listener and spectral contrast.

2. Monaural task

The results for the monaural task are shown in the right panel of Fig. 5. The mean accuracy of identification of the monaural vowel in the binaural-absent stimuli (filled symbols) was higher than in the binaural-present stimuli (open symbols); the mean difference was 17% [F(1,3)=67, p=0.004]. With the binaural-present stimuli accuracy improved as the spectral contrast increased: from 31% at 3-dB spectral contrast to 83% at 12-dB spectral contrast [F(3,9)=52, p<0.0001]. Accuracy of identification was above chance in all but two of the 16 combinations of listener and spectral contrast. With the binaural-absent stimuli accuracy also improved: from 50% at 3 dB to 95% at 12 dB [F(3,9)=159, p<0.0001]. Accuracy of identification was above chance in all 16 combinations of listener and spectral contrast.

C. Discussion

In the majority of conditions listeners achieved levels of performance that were above chance when identifying a monaural vowel in the presence of a competing binaural vowel and vice versa. Evidence that listeners can selectively attend to each excitation pattern individually comes from the conditions where the spectral contrast of the monaural vowel was 6 dB. In this condition, all four listeners performed significantly above chance both when identifying the binaural vowel and when identifying the monaural vowel.6 The same pattern was also shown by listeners A, B, and D at spectral contrasts of 9 and 12 dB. The second relevant result is that accuracy in identifying a binaural vowel declined as the spectral contrast of the competing monaural vowel was increased. Also, the accuracy of identifying the monaural vowel was reduced when a competing binaural vowel was present. Therefore, selective attention is not perfect, in that listeners cannot completely discount the other excitation pattern.

FIG. 5. Results of experiment 2. The left panel shows mean percent-correct identifications of the binaural vowel in the binaural task. The right panel shows mean percent-correct identifications of the monaural vowel in the monaural task, for the binaural-present stimuli (open symbols) and for the binaural-absent stimuli (solid symbols). The results are averaged across the 12 vowel pairs. The horizontal dashed lines plot the 95%-confidence interval for chance performance. Each symbol is based on 240 observations.

The stimuli in the binaural condition with 0-dB spectral contrast were the same as those in the BB condition of experiment 1. Performance was, however, poorer in experiment 2 (64%) than in experiment 1 (79%). The difference may reflect a decline in the listeners’ motivation to perform well in a difficult condition. However, differences in the requirements of the two experiments may also have contributed. In experiment 1 only a single vowel was presented on each trial and listeners were required to report whatever vowel they heard without regard for the modality from which the evidence of the vowel originated. In the binaural conditions of experiment 2, listeners were required to attend to the binaural excitation pattern and to actively disregard evidence from the monaural excitation pattern. On trials where the spectral contrast of the monaural vowel was 6, 9, or 12 dB, the stimuli provided a strong percept of a monaural vowel. To perform accurately, therefore, it was necessary to ignore the strong percept and report the weaker competing percept of the binaural vowel. These trials were randomized together with trials where the spectral contrast of the monaural vowels was 0 or 3 dB. Adopting the strategy of ignoring the strong percept on these trials would, however, have resulted in listeners rejecting the binaural vowel. Instead, on some trials, they might have mistaken the phonetic impression created by the noise as evidence of the binaural vowel, thereby lowering their accuracy. In summary, differences between the experiments in the context in which stimuli were presented could have contributed to the relatively poor performance of listeners in the binaural task when the spectral contrast of the monaural vowel was 0 or 3 dB in experiment 2.

III. EXPERIMENT 3: EFFECT OF ONSET ASYNCHRONY ON SELECTIVE ATTENTION

The foregoing argument implies that a manipulation which draws attention to the binaural formants should improve the ability of listeners to hear out the binaural vowel in a stimulus that also contains a monaural vowel. Two co-occurring stimuli are more distinct perceptually, and the mutual contamination of their perceptual properties is reduced, if they start at different times than if they start at the same time (e.g., Darwin and Carlyon, 1995). Accordingly, in experiment 3 an onset asynchrony was introduced between the monaural and binaural formants.

A. Method

The stimuli were constructed, calibrated, and presented to listeners using the same techniques and apparatus as in experiments 1 and 2. An onset asynchrony was introduced into stimuli similar to those used in experiment 2. In these conditions one vowel started 480 ms before the other vowel (Fig. 6). The two vowels ended together after a further 480 ms. In the binaural ‘‘onset-asynchrony’’ conditions the binaural vowel started second. This order was reversed in the monaural onset-asynchrony conditions. In the ‘‘simultaneous’’ conditions both vowels were 500 ms in duration and began and ended together, as in experiment 2.

Each binaural onset-asynchrony stimulus was created by digitally editing two of the stimuli used in experiments 1 and 2. The 20-ms offset ramp of an MM vowel from experiment 1 and the 20-ms onset ramp of a monaural–binaural vowel pair from experiment 2 were digitally removed and then the two new waveforms were abutted. The click resulting from spectral splatter at the join of the two waveforms was removed by applying a steep 4-kHz low-pass digital filter, thus limiting the bandwidth of the new stimulus to the original bandwidth of its constituents. The identity of the MM vowel was the same as the monaural vowel in the monaural–binaural vowel pair. Each new stimulus was 960 ms in duration, including 20-ms raised-cosine ramps at its offset and onset. A similar procedure was used to create each monaural onset-asynchronous stimulus, except that a BB vowel was used instead of an MM vowel. Its identity was the same as the binaural vowel in the monaural–binaural vowel pair. Finally, the simultaneous stimuli were constructed in the same way as the monaural–binaural vowel-pair stimuli used in experiment 2. Ten independent tokens of each stimulus were created.
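The editing steps above imply the stated 960-ms duration, as the following sketch confirms. It is only a bookkeeping check under stated assumptions: the sampling rate `FS` is hypothetical (none is given in this section), the waveforms are silent placeholders, and the 4-kHz low-pass filtering step is omitted. The function names are illustrative.

```python
FS = 20000       # sampling rate in Hz (hypothetical; not stated in this section)
RAMP_MS = 20     # duration of each onset/offset ramp, from the text
VOWEL_MS = 500   # duration of the experiment 1 and 2 stimuli, from the text

def ms_to_samples(ms, fs=FS):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000))

def splice(leading, trailing, ramp_ms=RAMP_MS, fs=FS):
    """Drop the offset ramp of `leading` and the onset ramp of `trailing`, then abut."""
    n_ramp = ms_to_samples(ramp_ms, fs)
    return leading[:len(leading) - n_ramp] + trailing[n_ramp:]

# Placeholder waveforms: only their lengths matter for this check.
lead_vowel = [0.0] * ms_to_samples(VOWEL_MS)   # e.g., the MM vowel
vowel_pair = [0.0] * ms_to_samples(VOWEL_MS)   # e.g., the monaural-binaural pair
stimulus = splice(lead_vowel, vowel_pair)

print(1000 * len(stimulus) / FS)  # 960.0 (ms)
```

Each constituent contributes 480 ms once its ramp is removed, which matches both the 480-ms onset asynchrony and the further 480 ms during which the two vowels overlap.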

FIG. 6. Schematic spectrograms of a subset of vowel pairs used in experiment 3. All the combinations for a binaural /u/ are illustrated. A monaural formant is represented as a light-gray bar. A binaural formant is represented as a hatched bar. In the binaural task (left column) listeners were required to identify the binaural vowel. In the monaural task (right column) they were required to identify the monaural vowel.

In contrast to experiment 2, all 16 pairings of the vowels were generated, including the identical pairings as well as the nonidentical pairings. This arrangement ensured that the identity of the vowel that started first in an onset-asynchrony condition could not restrict the possible choices for the vowel that started second. As a result, meaningful comparisons can be made between levels of performance in the onset-asynchrony and simultaneous conditions.

A single-interval, four-alternative forced-choice procdure similar to that of experiment 2 was used. Each comnation of task~binaural and monaural! and synchrony~onset-asynchrony and simultaneous! was undertaken in a separasession. In each block, each combination of vowel pairspectral contrast occurred once in a random order. Thwere four sessions of five blocks each, giving a total of 3trials per spectral contrast per combination of task and schrony per listener (516 vowel pairs320 blocks31 trialper block!. Each session was preceded by a 24-trial pracblock. For the binaural tasks this block contained single baural vowels~BB!. For the monaural tasks it contained singmonaural vowels~MM !. The same four listeners as befoparticipated.

A p<0.05 confidence interval based on the binomial distribution tested whether accuracy of identification by individual listeners in individual conditions was above chance. A two-factor (onset-asynchrony/simultaneous × spectral contrast) repeated-measures ANOVA and a nonparametric sign test based on the binomial distribution were performed on the data of each task separately to establish whether performance depended upon onset asynchrony or the spectral contrast of the monaural vowels.

B. Results

1. Binaural task

FIG. 7. Results of experiment 3. The left panel shows mean percent-correct identifications of the binaural vowel in the binaural task, in the onset-asynchrony conditions (filled symbols) and the simultaneous conditions (open symbols). The right panel shows mean percent-correct identifications of the monaural vowel in the monaural task, in the onset-asynchrony conditions (solid symbols) and the simultaneous conditions (open symbols). The results are averaged across the 16 vowel pairs. The horizontal dashed lines plot the 95%-confidence interval for chance performance. Each symbol is based on 320 observations.

The results for the binaural task are shown in the left panel of Fig. 7. The results are averaged across the 16 vowel pairs. Listener C performed at chance in the simultaneous conditions. The other three listeners performed above chance in all conditions. Although the difference in accuracy between the onset-asynchrony conditions (solid symbols) and simultaneous conditions (open symbols) was 27%, it was not statistically significant in the ANOVA [F(1,3)=6.7, p=0.08]. The lack of significance was due to the large individual differences; Listener A showed a particularly small effect. Nevertheless, of the 16 individual comparisons between onset-asynchrony and simultaneous conditions (=4 listeners × 4 nonzero levels of spectral contrast), 15 show numerically higher accuracy in the onset-asynchrony condition. A nonparametric sign test shows that the probability of 15 out of 16 outcomes having the same sign, when the probability of an outcome having either sign is 0.5, is 0.0002. In both conditions accuracy deteriorated as the spectral contrast of the monaural vowel was increased [F(4,12)=17.0, p<0.001]. For the onset-asynchrony condition accuracy decreased from 87% at 0-dB spectral contrast to 83% at 12-dB spectral contrast. For the simultaneous condition accuracy decreased from 62% at 0-dB spectral contrast to 50% at 12-dB spectral contrast.
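The sign-test probability quoted above can be reproduced from the binomial distribution. The sketch below is an illustration under stated assumptions: the text does not say whether the reported value is the probability of exactly 15 outcomes or of 15 or more, or whether it is one- or two-tailed, so all three quantities are computed. The function name `sign_test_probabilities` is hypothetical.

```python
from math import comb

def sign_test_probabilities(n, k, p=0.5):
    """Return (exact, one_tailed, two_tailed) probabilities for k of n same-sign outcomes."""
    def pmf(j):
        # Binomial probability of exactly j outcomes in the stated direction.
        return comb(n, j) * p**j * (1 - p)**(n - j)

    exact = pmf(k)                                      # exactly k in that direction
    one_tailed = sum(pmf(j) for j in range(k, n + 1))   # k or more in that direction
    two_tailed = min(1.0, 2 * one_tailed)               # either direction (symmetric when p = 0.5)
    return exact, one_tailed, two_tailed

exact, one_tailed, two_tailed = sign_test_probabilities(16, 15)
print(f"{exact:.5f} {one_tailed:.5f} {two_tailed:.5f}")  # 0.00024 0.00026 0.00052
```

Under these assumptions the exact and one-tailed values both lie close to the reported 0.0002; whichever variant was used, 15 agreements out of 16 would be very unlikely if onset asynchrony had no effect.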

2. Monaural task

The results for the monaural task are shown in the right panel of Fig. 7. The four listeners performed above chance in all conditions. The accuracy of identification of the monaural vowel was significantly higher in the onset-asynchrony conditions than in the simultaneous conditions; the difference in mean identification score was 8.3% [F(1,3)=29, p=0.01]. Fifteen of the 16 comparisons showed higher accuracy in the onset-asynchrony conditions than in the simultaneous conditions (sign test; p=0.0002).

C. Discussion

Making some allowance for individual differences in performance, the introduction of an onset asynchrony between a binaural vowel and a monaural vowel improved the accuracy of identification of the vowel which started second. Numerically, the average advantage was larger when listeners attended to the binaural vowel (27%) than the monaural vowel (8%). In the binaural task, there was a wider spread of performance among listeners in the simultaneous condition than in the onset-asynchrony condition. The introduction of the onset asynchrony reduced the spread and raised performance to over 70% correct for each listener, independent of the spectral contrast of the competing monaural vowel. In other words, listeners could identify the binaural vowel relatively accurately, provided that their attention was drawn to it. The authors’ experience supports this explanation. The onset asynchrony increased the clarity of the differences in tonality and lateralization which distinguished the monaural and binaural vowels.

The poor performance of Listener C on the binaural task with the simultaneous stimuli is puzzling, given that he performed as well as the other listeners in the other three conditions. While the possibility cannot be ruled out that his motivation was simply low in this condition, the difficulties created by the context in which the stimuli were presented (discussed in Sec. II C) may also have contributed to his poor performance.

The monaural task produced a different pattern of results: the spread of performance among listeners was smaller than in the binaural task and the onset asynchrony led to only a small increase in accuracy. Subjectively, the onset asynchrony did not cause the monaural vowel to stand out strongly as a separate object from the binaural vowel and the background noise.

IV. CONCLUSIONS

The main result of these experiments is the demonstration that evidence of vowel formants can be combined across the monaural and binaural excitation patterns (experiment 1), despite listeners having some ability to attend selectively to either excitation pattern (experiment 2). The vowel in noise illustrated in Fig. 1 shows why the strategy of integrating evidence can be advantageous. The formants create peaks in both the monaural and binaural excitation patterns. If the cues that might cause listeners to segregate the peaks into separate auditory objects are weak, as the results of the three experiments show they are, then the peaks will be integrated into a single representation. This outcome accords with the physical reality that a single source of sound has generated both sets of peaks. Integration is therefore appropriate. In contrast to this situation, the binaural vowels were heard out consistently when they started after the monaural vowel (experiment 3). In this situation, spectral peaks in one excitation pattern start before those in the other excitation pattern. The peaks cannot, therefore, have originated from the same source. Logically, they should be segregated.

Evidence of spectral peaks in the binaural excitation pattern can be difficult to hear out when there is competing structure in the monaural excitation pattern. Listeners performed relatively poorly when required to report binaural vowels in experiments 2 and 3, even when those vowels were not in competition with monaural vowels. Poor performance occurred despite listeners having received training in identifying binaural vowels, and having performed adequately in experiment 1.

These findings set limits on the circumstances in which listeners are likely to benefit from the strategy of integrating monaural and binaural evidence of formants. There are two primary requirements for successful application of the strategy to recover speech from competing sounds. First, the competing sounds must be interaurally highly correlated so that they can be selectively decorrelated at specific frequencies by the formants of the speech. Second, the competing sounds must have a uniform spectral structure so that all the peaks in the two excitation patterns derive from the speech. These conditions are not met when there is reverberation, which itself decorrelates the competing sounds, or when the competing sounds consist of voices, which themselves possess spectral structure. Instead, these conditions generally are met when speech is masked by interaurally coherent white or pink noise. Although such noises are encountered outside the laboratory, they are unusual. Instead, the major application of the strategy of integration is likely to be in demonstrations of the BILD.

V. SUMMARY

(1) It was hypothesized that the binaural identification of speech in noise would be facilitated over a range of signal-to-noise ratios by the ability to combine information found only in the monaural excitation pattern (i.e., spectral peaks in intensity) with information found only in the binaural excitation pattern (i.e., spectral peaks in interaural decorrelation).

(2) The results of experiment 1 confirmed that such integration can occur. Listeners attained levels of performance that were significantly above chance when identifying two-formant vowel-like sounds in which one formant was defined by a peak in the monaural excitation pattern while the other formant was defined by a peak in the binaural excitation pattern.

(3) Experiments 2 and 3 demonstrated that integration is not mandatory, although the intrinsic cues for segregating evidence of formants in the monaural and binaural excitation patterns are relatively weak. Experiment 2 showed that the accuracy of identification of a monaural vowel is impaired by the presence of a simultaneous binaural vowel, and vice versa. Experiment 3 showed that segregation, particularly of the binaural vowel, can be increased by introducing an onset asynchrony between the competing vowels.

(4) Overall, the results show that listeners can integrate evidence of formants from the binaural and monaural excitation patterns. That strategy should be advantageous when identifying speech binaurally in noise because sounds from a single source, the talker, create peaks simultaneously in both excitation patterns. The strategy is likely to contribute to demonstrations of the binaural intelligibility level difference.

ACKNOWLEDGMENTS

We thank John Foster for technical assistance in the early stages of this project. Hedwig Gockel, two anonymous reviewers, and associate editor Wes Grantham made helpful comments on a previous version of the paper. Part of this work was presented at the meetings of the Acoustical Society of America held in Spring 1998 [M. A. Akeroyd, A. Q. Summerfield, and J. R. Foster, "Integrating monaural and binaural spectral information," J. Acoust. Soc. Am. 103, 2976(A) (1998) and Proceedings of the 16th ICA and 135th Meeting ASA, Vol. III, pp. 1975–1976], and Spring 1999 [A. Q. Summerfield, J. F. Culling, and M. A. Akeroyd, J. Acoust. Soc. Am. 105, 1158(A) (1999)].

APPENDIX: ANALYSIS OF SINGLE-FORMANT RESPONSES IN EXPERIMENT 1

Four methods were evaluated for generating numerical predictions of the response patterns in the two-formant conditions from the patterns in the single-formant conditions. Table AI lists the observed proportions of correct responses made to each of the two-formant stimuli in each of the four conditions, along with the proportions predicted by the four methods. The following terminology is used in describing the methods. The proportions of responses to an isolated F1 are u1, i1, Ä1, and }1 (where u1 is the proportion of /u/ responses, i1 of /i/ responses, Ä1 of /Ä/ responses, and }1 of /}/ responses, respectively). The proportions of responses to an isolated F2 are u2, i2, Ä2, and }2.

TABLE AI. Top panel: Observed percentages of correct responses to individual two-formant stimuli and the mean across stimuli in four conditions (MM, BB, MB, BM). Bottom panel: percentages predicted by the four methods described in the Appendix. In cases marked by underlining, the predicted percentage equals or exceeds the observed percentage.

Observed responses

                        Formant frequencies (Hz)
Condition    250–950   250–1850   650–950   650–1850   Mean
MM              66        88         96        92       85
BB              77        71         95        72       79
MB              64        58         79        80       70
BM              64        86         89        70       77

Predicted responses

                                   Formant frequencies (Hz)
Method     Condition    250–950   250–1850   650–950   650–1850   Mean
1 and 4    MM              46        42         53        48       47
           BB              37        41         56        46       45
           MB              42        34         55        47       45
           BM              41        49         54        46       48
2          MM              27        61         71        37       49
           BB              55        38         36        55       46
           MB             _66_       23         35        58       46
           BM              27        61         71        37       49
3          MM             _66_       61         71        58       64
           BB              55        45         76        55       58
           MB             _66_       45         76        58       61
           BM              55        61         71        55       61
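The underlining rule in the caption of Table AI can be checked mechanically. The following Python sketch is not part of the original analysis; it copies the percentages from the table and counts, for each method, the stimuli whose predicted percentage equals or exceeds the observed percentage. The dictionary names and the function `count_matches` are illustrative.

```python
# Percentages copied from Table AI. Column order within each row:
# 250-950, 250-1850, 650-950, 650-1850 Hz.
observed = {
    "MM": [66, 88, 96, 92],
    "BB": [77, 71, 95, 72],
    "MB": [64, 58, 79, 80],
    "BM": [64, 86, 89, 70],
}

predicted = {
    "1 and 4": {
        "MM": [46, 42, 53, 48],
        "BB": [37, 41, 56, 46],
        "MB": [42, 34, 55, 47],
        "BM": [41, 49, 54, 46],
    },
    "2": {
        "MM": [27, 61, 71, 37],
        "BB": [55, 38, 36, 55],
        "MB": [66, 23, 35, 58],
        "BM": [27, 61, 71, 37],
    },
    "3": {
        "MM": [66, 61, 71, 58],
        "BB": [55, 45, 76, 55],
        "MB": [66, 45, 76, 58],
        "BM": [55, 61, 71, 55],
    },
}


def count_matches(pred):
    """Count stimuli whose predicted percentage equals or exceeds the observed one."""
    return sum(
        p >= o
        for condition, row in pred.items()
        for p, o in zip(row, observed[condition])
    )


matches = {method: count_matches(pred) for method, pred in predicted.items()}
print(matches)
```

On these rounded percentages the counts are 0 of 16 for methods 1 and 4, 1 of 16 for method 2, and 2 of 16 for method 3, in agreement with the counts given in the descriptions of the methods.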

Method 1 was based on the assumptions that listeners (a) attend to only one of the formants on each trial, but (b) attend equally often to the lower and higher formant on different trials, and (c) make the response which the attended formant would receive if it were presented in isolation. In effect, therefore, the method predicts that the response pattern to a two-formant vowel is the average of the response patterns made to the constituent single formants. The proportions of the four responses were predicted to be (u1 × 0.5 + u2 × 0.5), (i1 × 0.5 + i2 × 0.5), (Ä1 × 0.5 + Ä2 × 0.5), and (}1 × 0.5 + }2 × 0.5). The percentages of correct responses predicted by this method are plotted as the solid symbols in Fig. 3. Table AI shows that they are lower than the observed percentages for all 16 stimuli.
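The averaging rule of method 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the labels "A" and "E" are placeholder ASCII names for the third and fourth response categories, the values 0.66 and 0.71 are quoted in the Appendix, 0.27 is read from the method-2 MM row of Table AI, and the remaining proportions are invented so that each pattern sums to 1.

```python
# Placeholder ASCII labels for the four response categories.
VOWELS = ("u", "i", "A", "E")

# Response proportions to each formant presented alone (250-Hz monaural F1,
# 950-Hz monaural F2). Only 0.66, 0.71 and 0.27 come from the paper;
# the other entries are illustrative fill-ins.
f1_alone = {"u": 0.66, "i": 0.23, "A": 0.06, "E": 0.05}
f2_alone = {"u": 0.27, "i": 0.01, "A": 0.71, "E": 0.01}


def method1(p1, p2):
    """Average the two single-formant response patterns (weights 0.5 and 0.5)."""
    return {v: p1[v] * 0.5 + p2[v] * 0.5 for v in VOWELS}


prediction = method1(f1_alone, f2_alone)
print({v: round(p, 3) for v, p in prediction.items()})
```

With these inputs the predicted proportion of /u/ responses is about 0.465, close to the 46% listed for the 250–950 Hz stimulus in the MM condition of Table AI.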

The weighting of formants adopted in method 1 can be denoted "(0.5, 0.5)," where the first number in brackets is the proportion of trials on which evidence is taken from F1 and the second number is the proportion of trials on which evidence is taken from F2. Method 2 was a variant of method 1. It explored the consequences of attending to one formant to the exclusion of the other on all trials in a condition. In other words, within a condition, the weighting was either (1, 0) or (0, 1). The effect of adopting these extreme settings of the weights, rather than (0.5, 0.5), was small. In the MM condition, for example, it caused the predicted overall accuracy of identifications to range from 46% (1, 0) to 49% (0, 1). Table AI lists the highest accuracy that could be achieved with the weights fixed across the formant pairings within a condition, but varying between conditions. The (0, 1) weighting gave the higher mean score in the MM and BM conditions. The (1, 0) weighting gave the higher mean score in the BB and MB conditions. For only 1 of the 16 stimuli did the method predict an accuracy equal to, or higher than, that observed in the two-formant conditions.
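The comparison of weightings in method 2 can be sketched as follows for the MM condition. The code is a hedged illustration, not the original analysis. The list `f2_correct` is the method-2 MM row of Table AI, read here as the (0, 1) weighting. In `f1_correct`, 66 is quoted in the Appendix, and the other three values are illustrative numbers chosen to be consistent with Table AI and with the reported mean of about 46% for the (1, 0) weighting.

```python
# Percentage of correct responses to each formant alone for the four MM stimuli
# (250-950, 250-1850, 650-950, 650-1850 Hz).
f1_correct = [66, 23, 35, 58]  # 66 from the text; 23, 35, 58 are illustrative
f2_correct = [27, 61, 71, 37]  # method-2 MM row of Table AI


def weighted_accuracy(weights):
    """Predicted accuracy per stimulus when F1 and F2 are attended with the given weights."""
    w1, w2 = weights
    return [w1 * a + w2 * b for a, b in zip(f1_correct, f2_correct)]


def mean(values):
    return sum(values) / len(values)


# Mean predicted accuracy for the two extreme weightings and for method 1.
means = {w: mean(weighted_accuracy(w)) for w in [(1, 0), (0, 1), (0.5, 0.5)]}

# Method 2 keeps whichever extreme weighting gives the higher mean in a condition.
best = max([(1, 0), (0, 1)], key=lambda w: means[w])
print(means, best)
```

Under these assumed inputs the (0, 1) weighting gives the higher mean in the MM condition (49.0 against 45.5), and the (0.5, 0.5) weighting of method 1 falls between the two.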

Method 3 was a further variant of method 1. It was based on the assumption that, given a two-formant vowel, listeners attend to the formant which, individually, is most likely to yield the correct response. It is not clear how listeners could learn this post hoc strategy. Nonetheless, it was evaluated to test whether performance in the two-formant conditions exceeds that predicted by the optimal, though implausible, one-formant strategy. It is illustrated by considering the pairing of a 250-Hz monaural F1 and a 950-Hz monaural F2. The correct response to this pairing is /u/. As a single formant, the 250-Hz monaural F1 is heard as /u/ on 66% of presentations, while the 950-Hz monaural F2 is heard as /Ä/ on 71% of presentations. Thus, the correct response to the two-formant stimulus is obtained by attending to the 250-Hz F1. If listeners could adopt this strategy, they would identify the two-formant stimulus as /u/ on 66% of trials. Table AI shows that this method predicts an accuracy of identification equal to, or higher than, the observed accuracy for only 2 of the 16 two-formant stimuli.
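The selection rule of method 3 amounts to taking, for each stimulus, the larger of the two single-formant accuracies. The sketch below illustrates this for the MM condition and is not the authors' code. As an assumption, `f2_correct` is taken from the method-2 MM row of Table AI, and in `f1_correct` only the value 66 is quoted in the text; the other three entries are illustrative.

```python
# Percentage of correct responses to each formant alone for the four MM stimuli
# (250-950, 250-1850, 650-950, 650-1850 Hz).
f1_correct = [66, 23, 35, 58]  # 66 from the text; 23, 35, 58 are illustrative
f2_correct = [27, 61, 71, 37]  # method-2 MM row of Table AI


def method3(f1_scores, f2_scores):
    """For each stimulus, attend to whichever formant is more often correct alone."""
    choices = []
    for a, b in zip(f1_scores, f2_scores):
        choices.append(("F1", a) if a >= b else ("F2", b))
    return choices


choices = method3(f1_correct, f2_correct)
predicted = [score for _, score in choices]
print(choices)
```

For the 250–950 Hz stimulus the rule selects F1 and predicts 66%, as in the worked example above. The four predictions reproduce the method-3 MM row of Table AI.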

Method 4 was based on the idea that listeners attend to the two formants individually, and assess the probability that each provides evidence of each of the four vowels. This analysis yields four response probabilities for each formant. Two responses are selected randomly in accordance with these probabilities. They are combined to generate a single response according to the following rules: (i) if both responses are the same, that response is chosen; (ii) if the responses differ, a random choice is made between them with an equal probability of either being chosen. Thus, for example, the probability of the response /u/ being chosen is given by the sum of seven products: (u1 × u2) + (u1 × i2)/2 + (u1 × Ä2)/2 + (u1 × }2)/2 + (i1 × u2)/2 + (Ä1 × u2)/2 + (}1 × u2)/2. The accuracy of correct responses predicted by this method is the same as that predicted by method 1 and hence, like that method, underestimates the accuracy observed in the two-formant conditions.
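The equivalence of methods 1 and 4 follows from the fact that the four probabilities for each formant sum to 1: the three products containing u1 and a different F2 response sum to u1 × (1 − u2), the three containing u2 and a different F1 response sum to u2 × (1 − u1), and the seven products therefore reduce to (u1 + u2)/2. The Python sketch below checks this numerically. It is an illustration only: "A" and "E" are placeholder labels for the third and fourth response categories, and apart from 0.66, 0.71, and 0.27 the proportions are invented.

```python
# Placeholder ASCII labels for the four response categories.
VOWELS = ("u", "i", "A", "E")

# Illustrative single-formant response patterns; each sums to 1.
f1_alone = {"u": 0.66, "i": 0.23, "A": 0.06, "E": 0.05}
f2_alone = {"u": 0.27, "i": 0.01, "A": 0.71, "E": 0.01}


def method4(p1, p2):
    """Draw one response per formant; keep it if they agree, else pick either at random."""
    prediction = {}
    for v in VOWELS:
        same = p1[v] * p2[v]
        # Cases in which exactly one of the two drawn responses is v.
        differ = sum(p1[v] * p2[w] + p1[w] * p2[v] for w in VOWELS if w != v)
        prediction[v] = same + differ / 2
    return prediction


def method1(p1, p2):
    """Average of the two single-formant response patterns."""
    return {v: (p1[v] + p2[v]) / 2 for v in VOWELS}


m4 = method4(f1_alone, f2_alone)
m1 = method1(f1_alone, f2_alone)
max_gap = max(abs(m4[v] - m1[v]) for v in VOWELS)
print(max_gap)
```

The largest difference between the two sets of predictions is zero to within floating-point rounding, for any pair of patterns that each sum to 1.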

Unlike methods 2 and 3, methods 1 and 4 do not assume that listeners are biased to attend predominantly to one or other of the formants in a pair. Such a lack of bias is compatible with the circular pattern of dominance among the single formants (Sec. I B 3). For this reason, the predictions selected for inclusion in Fig. 3 are those provided by method 1 (and identically by method 4).

1 The frequencies of the higher formants were 2547 Hz (F3), 3272 Hz (F4), and 4500 Hz (F5).

2 The binaural excitation patterns defined in this paper are similar to the "residual activation" spectra (occasionally referred to as "minimal activation spectra") generated by Culling and Summerfield's (1995) modified equalization–cancellation model.

3 The range of values of Δt was constant across frequency. It was chosen to accommodate the internal delay required to cancel an interaurally inverted noise in order to yield a masking-level difference at 100 Hz. Each summation is performed over the 250-ms duration of the stimulus.

4 Grantham (1995, pp. 302–303) provided a succinct definition of interaural correlation: "Interaural correlation is loosely defined as the point-by-point correlation coefficient computed for a stimulus segment after an appropriate delay is imposed on one of the inputs to maximize the correlation."

5 In these practice blocks the MM stimuli had a spectral contrast of 12 dB, not 6 dB.

6 With 6 dB of spectral contrast in the binaural task, Listener C scored 31.3%, compared with a chance level of 30.5%.
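The definition of interaural correlation quoted in footnote 4 can be illustrated with a short Python sketch. It is a minimal reading of that definition, not the model used in the paper: it assumes the Pearson coefficient as the "point-by-point correlation coefficient", delays expressed in whole samples, and hypothetical function names (`pearson`, `interaural_correlation`).

```python
import math
import random


def pearson(x, y):
    """Point-by-point (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interaural_correlation(left, right, max_lag):
    """Return (lag, r) for the delay in -max_lag..max_lag samples that maximizes r.

    A positive lag pairs left[n] with right[n + lag].
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = left[: len(left) - lag], right[lag:]
        else:
            x, y = left[-lag:], right[: len(right) + lag]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Demonstration: the right channel is the left channel delayed by 3 samples.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(503)]
left = source[3:]
right = source[:500]
best_lag, best_r = interaural_correlation(left, right, max_lag=10)
print(best_lag, round(best_r, 6))
```

In the demonstration the two channels are identical apart from a 3-sample delay, so the correlation reaches its maximum, close to 1, when that delay is compensated.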

Akeroyd, M. A., and Summerfield, A. Q. (1999). "A binaural analog of gap detection," J. Acoust. Soc. Am. 105, 2807–2820.

Akeroyd, M. A., and Summerfield, A. Q. (2000). "The lateralization of simple dichotic pitches," J. Acoust. Soc. Am. (to be published).

Blauert, J. (1983). Spatial Hearing (MIT, Cambridge).

Cramer, E. M., and Huggins, W. H. (1958). "Creation of pitch through binaural interaction," J. Acoust. Soc. Am. 30, 413–417.

Culling, J. F., and Summerfield, Q. (1995). "Perceptual separation of concurrent speech sounds: Absence of across-frequency grouping by common interaural delay," J. Acoust. Soc. Am. 98, 785–797.

Culling, J. F., Summerfield, Q., and Marshall, D. H. (1994). "Effects of simulated reverberation on the use of binaural cues and fundamental-frequency differences for separating concurrent vowels," Speech Commun. 14, 71–95.

Culling, J. F., Summerfield, A. Q., and Marshall, D. H. (1998a). "Dichotic pitches as illusions of binaural unmasking. I. Huggins' pitch and the 'binaural edge pitch,'" J. Acoust. Soc. Am. 103, 3509–3526.

Culling, J. F., Marshall, D. H., and Summerfield, A. Q. (1998b). "Dichotic pitches as illusions of binaural unmasking. II. The Fourcin pitch and the dichotic repetition pitch," J. Acoust. Soc. Am. 103, 3527–3539.

Darwin, C. J., and Carlyon, R. P. (1995). "Auditory grouping," in Hearing, edited by B. C. J. Moore (Academic, London).

Delattre, P., Liberman, A. M., Cooper, F. S., and Gerstman, L. J. (1952). "An experimental study of the acoustic determinants of vowel color: Observations on one- and two-formant vowels synthesized from spectrographic patterns," Word 8, 195–210.

Durlach, N. I., Gabriel, K. J., Colburn, H. S., and Trahiotis, C. (1986). "Interaural correlation discrimination. II. Relation to binaural unmasking," J. Acoust. Soc. Am. 79, 1548–1557.

Gabriel, K. J., and Colburn, H. S. (1981). "Interaural correlation discrimination. I. Bandwidth and level dependence," J. Acoust. Soc. Am. 69, 1394–1401.

Gabriel, K. J., Koehnke, J., and Colburn, H. S. (1992). "Frequency dependence of binaural performance in listeners with impaired binaural hearing," J. Acoust. Soc. Am. 91, 336–347.

Glasberg, B. R., and Moore, B. C. J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hear. Res. 47, 103–138.

Grantham, D. W. (1995). "Spatial hearing and related phenomena," in Hearing, edited by B. C. J. Moore (Academic, London).

Klatt, D. H. (1980). "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am. 67, 971–995.

Klein, M. A., and Hartmann, W. M. (1981). "Binaural edge pitch," J. Acoust. Soc. Am. 70, 51–61.

Levitt, H., and Rabiner, L. R. (1967). "Binaural release from masking for speech and gain in intelligibility," J. Acoust. Soc. Am. 42, 601–608.

Licklider, J. C. R. (1948). "The influence of interaural phase transitions upon the masking of speech by white noise," J. Acoust. Soc. Am. 20, 150–159.

Patterson, R. D., Allerhand, M. H., and Giguère, C. (1995). "Time-domain modeling of peripheral auditory processing: A model architecture and a software platform," J. Acoust. Soc. Am. 98, 1890–1894.

Pollack, I., and Trittipoe, W. J. (1959). "Binaural listening and interaural noise cross correlation," J. Acoust. Soc. Am. 31, 1250–1252.
