Citation: The Journal of the Acoustical Society of America 96, 19 (1994); doi: 10.1121/1.410465. Published by the Acoustical Society of America.


The contribution of the murmur and vowel to the place of articulation distinction in nasal consonants

Jonathan Harrington
Speech Hearing and Language Research Centre, Macquarie University, Sydney 2109, Australia

(Received 6 May 1993; revised 16 December 1993; accepted 16 March 1994)

Recent studies have shown that the acoustic relationship between the murmur and the vowel at the nasal-vowel boundary is highly informative for the [m]-[n] distinction. In the present paper, the contribution of relational information is reassessed by classifying 1946 syllable-initial and 2848 syllable-final nasal consonants taken from continuous speech data. Relational information in the acoustic waveform is based on difference spectra, in which spectral information in the vowel is subtracted from spectral information in the murmur, and on combined spectra, in which classifications are made from combinations of murmur and vowel spectra. These two kinds of relational spectra are compared with static spectra, in which single spectral slices are taken in either the murmur or the vowel. Contrary to recent theoretical predictions, difference spectra are shown to perform more poorly than some kinds of static spectra. However, since classification scores from combined spectra are better than from either static or difference spectra, cues to nasal place of articulation can nevertheless be defined as relational. In the best scoring combined spectra, classification scores on open tests are just under 94% correct for syllable-initial nasals and just under 82% correct for syllable-final nasals. The high classification scores show that there is considerable information in the acoustic waveform for identifying nasal place of articulation from continuous speech data.

PACS numbers: 43.72.Ar, 43.70.Fq, 43.70.Hs, 43.71.An

INTRODUCTION

The production of nasal consonants is characterized acoustically by a murmur which corresponds to the closure phase of the oral tract, and by transitions into, or out of, a preceding segment. The acoustic structure of the murmur includes nasal resonances and paired oral resonances/antiresonances which depend on the closed mouth cavity that acts as a side-branching resonator to the main nasal-pharyngeal tube. The frequency location of the oral antiresonance is largely determined by the length of the mouth cavity, being highest for [ŋ], intermediate for [n], and lowest for [m] which has the longest closed mouth cavity (Fant, 1960; Fujimura, 1962; Hattori et al., 1958). The importance of transitions is closely related to the concept of a fixed locus frequency associated with different places of articulation: Almost all such studies are based on synthesis and labeling experiments and have shown that the starting frequency of the second formant frequency provides a cue for the distinction between [m] and [n] in CV syllables (Delattre, 1958; Liberman et al., 1954; Larkey et al., 1978).

Several studies, beginning with Malécot's (1956) perception experiments, have sought to assess the relative importance of murmurs and transitions as nasal place cues (e.g., Garcia, 1966, 1967; Nakata, 1959; Nord, 1976; Zee, 1981). Prior to Kurowski and Blumstein's (1984) study, there was some support for the view that nasal place of articulation was cued primarily by transitions, with the murmur providing listeners predominantly with information about manner of articulation. In Malécot's (1956) study, listeners' judgments of nasal place were guided mostly by transitions when they were presented with stimuli that had been spliced together from murmurs and transitions that had different places of articulation. In a more recent study with synthetic speech, in which variable F2/F3 transitions were combined with murmurs that were optimal for different places of articulation in Catalan, Recasens (1983) also concludes that listeners' judgments of nasal place were cued predominantly by transitions rather than murmurs. Nevertheless, Recasens (1983) shows that murmurs contribute significantly to nasal place distinctions in some cases (e.g., /•/ vs /n/), while in Kurowski and Blumstein (1984), the nasal murmur was found to be as effective in cueing nasal place of articulation as transitions.

A current perspective on the acoustic cues to nasal consonants is that nasal place of articulation is cued by both the murmur and transitions together (in CV and VC syllables). In a perception experiment in which listeners were presented with various kinds of edited speech stimuli from naturally produced CV syllables (C=[m n]; V=[i e a o u]) produced by a single male speaker, Kurowski and Blumstein (1984) demonstrated that a total of six glottal pulses on either side of the nasal release, spanning the offset of the murmur and the onset of transitions, cued place of articulation more effectively than stimuli presented from either the murmur or transitions on their own. Similar results were obtained by Repp (1986) in listener judgments of a variety of different types of stimuli produced by six speakers.

Taken together, the speech perception studies by Kurowski and Blumstein (1984) and Repp (1986) suggest that a section of the speech signal encompassing the offset of the murmur and onset of the vowel in CV syllables provides the most salient cues to the place of articulation distinction in nasal consonants. Based on these findings, Kurowski and Blumstein (1987) and Seitz et al. (1990) subsequently devised metrics for classifying nasal consonants into place of articulation categories from the changing part of the speech signal at nasal-vowel boundaries (additionally vowel-nasal boundaries in Seitz et al., 1990). The metrics in both these studies are in essence similar to the "dynamic relative" metric which has been proposed for oral stop classification in CV syllables by Lahiri et al. (1984): They are dynamic because place classification depends on two spectra (in the murmur and the vowel onset/offset), rather than on a single "static" spectral slice as in Blumstein and Stevens (1979); and they are relative because in both cases, place of articulation is determined from relative changes in spectral energy between the murmur and the vowel.

J. Acoust. Soc. Am. 96 (1), July 1994 0001-4966/94/96(1)/19/14/$6.00 © 1994 Acoustical Society of America

Kurowski and Blumstein's (1987) metric derives from considerations of the frequency of the first antiformant in nasal consonants and their visual inspection of Bark spectra of nasal consonants produced by three male speakers. They conclude that, for bilabials, there should be a greater change in energy from the murmur to the release in Bark 5-7 (395-770 Hz) than in Bark 11-14 (1265-2310 Hz). On the other hand, for [n], the corresponding changes in energy should be greater in Bark 11-14 than in Bark 5-7. (The Bark 5-7 and Bark 11-14 regions are predicted to encompass the first antiformants for [m] and [n], respectively.) Using this simple metric, Kurowski and Blumstein (1987) achieved 89% correct classification scores for 150 nasal consonants produced by three male speakers in (citation-form) CV syllables, and 84% correct classification in 100 /sCV/ syllables.

The metric developed by Seitz et al. (1990) is conceptually similar to that of Kurowski and Blumstein (1987), but is different in being based on relative, rather than absolute, frequencies. In their classification of nasal consonants, Seitz et al. initially derived a difference spectrum by subtracting the vowel spectrum from the murmur spectrum. They then found the maximum and minimum amplitude values of the difference spectrum and also noted the frequencies at which these maximum and minimum values occur. Classifications were made from over 2000 syllable-initial and syllable-final nasals taken from a database of isolated words, sentences, and passages produced by 20 male and 20 female subjects. Using seven different kinds of auditory rescalings prior to classification, Seitz et al. achieved a 77% correct classification in separating [m] from [n] in syllable-initial position, which is 5% better than the scores they obtained when they applied Kurowski and Blumstein's (1987) metric to the same data. For the three-way separation between [m], [n], and [ŋ] in syllable-final position, correct classification scores were 51% using Seitz et al.'s metric (Kurowski and Blumstein's metric was not tested in syllable-final position).

Relational dynamic metrics such as these are interesting, not only because they are consistent with results from perception studies of nasal consonants, but also because they corroborate other models which suggest that a changing, often highly coarticulated part of the signal, provides some of the most salient cues to speech sounds (e.g., Strange, 1989a, b for vowels). Nevertheless, there are certain aspects of the relational dynamic classification as developed in Kurowski and Blumstein (1987), and more recently Seitz et al. (1990), which deserve further attention.

First, the perception experiments in Kurowski and Blumstein (1984) and Repp (1986) have demonstrated that a section of the speech signal surrounding the nasal release conveys more salient cues to place of articulation than either the murmur, or the transitions in the vowel onglide on their own. Compatibly, both Kurowski and Blumstein (1987) and Seitz et al. (1990) have shown that high scores can be obtained from metrics that are based on the spectral differences between murmur and vowel spectra. However, it has not so far been demonstrated that nasals are better classified from a metric based on spectral differences than from either a single static murmur spectrum close to the nasal release, or from a single static spectrum in the vowel onglide, even if the experiments from speech perception suggest that they will be.

Second, as discussed in Kurowski and Blumstein (1984), the results from their perception experiments can be interpreted in at least two different ways: Either the murmur and vowel spectra are integrated at an early stage of auditory processing, or else the spectra are independently processed and then integrated at a higher central stage of processing. Kurowski and Blumstein (1984) admit that there is little evidence to choose between the hypotheses, but clearly favor the former: Relatedly, their metric for nasal place of articulation is based on spectral change from the murmur to the vowel onglide. However, although Repp (1986) found some evidence to support the auditory integration hypothesis, some of the experiments in Repp (1987) suggest by contrast that the information from the murmur and vowel onglide signals is integrated centrally, without the signals themselves necessarily being integrated at a lower, auditory level. From this point of view at least, it may be inappropriate to classify nasals from difference spectra in which only the gradient between the murmur and vowel onglide is preserved (as is the case in classifications based on spectral change). It may instead be the case that place of articulation is better encoded in the murmur and vowel spectra together than from the spectral change between the two. While it is not possible to attribute direct support for a theory of speech perception from classification algorithms, a metric based on combined murmur and vowel spectra would be as compatible with the outcome of the perception studies of Kurowski and Blumstein (1984) and Repp (1986, 1987) as one based solely on spectral change.

Third, in one of the very few studies of nasal consonants in syllable-final position, Repp and Svastikula (1988) showed that spectral change provides listeners with less information about place of articulation in syllable-final than in syllable-initial nasals. They offer various reasons for why this should be so, including the diminished prominence of an abrupt spectral change between the vowel and nasal in VC syllables due to the nasalisation of the preceding vowel. They also showed that sections from either the murmur or the vowel are as informative to listeners in VC as in CV syllables and from this they conclude that the spectra in the vowel and the murmur may function as independent cues. These considerations would suggest that a metric based on spectral change may be inappropriate for nasal place of articulation distinctions in VC syllables, and perhaps not surprisingly Seitz et al. (1990) found that their metric classified syllable-final nasals very poorly. Since they did not test whether combined vowel and murmur spectra encode place of articulation in syllable-final nasals, it would seem to be important to do so, in the light of Repp and Svastikula's (1988) results from speech perception.

The focus of this paper is to compare how effectively place of articulation is encoded in difference spectra compared with either static spectra from the murmur and vowel in isolation, or from combined murmur and vowel spectra. The comparisons are made for both syllable-initial and syllable-final nasals in a speech database of Australian English continuous speech (sentences and passages) produced by five male speakers. The experiments are divided into three main sections which compare static spectra with difference spectra, static and difference spectra with combined spectra, and spectra of syllable-final with syllable-initial nasals. Experiment I includes a pilot study in which the metrics of Kurowski and Blumstein (1987) and Seitz et al. (1990) are replicated and applied to nasal consonants in a database of Australian English.

I. MATERIALS

A. Speech database

The speech database which is used in this study forms part of the Australian National Database of Spoken Language (Millar et al., 1990a,b) that has been digitised and phonetically annotated at the Speech Hearing and Language Research Centre, Macquarie University since 1990. The materials for the database are taken from the Spoken Corpus Recordings in British English (SCRIBE) project (Hieronymus et al., 1990) and they include 660 phonetically dense and balanced sentences, and a passage. Currently, 400 sentences (200 balanced, 200 dense) and the SCRIBE passage have been produced by five male speakers between the ages of 20 and 50 with no known speech or hearing disorders. All speakers produced a variety of Australian English that can be described as intermediate between General Australian and Cultivated Australian (Bernard, 1970; Mitchell and Delbridge, 1965). The materials (2000 sentences, five passages) were recorded under excellent recording conditions, digitised at 20 kHz, and phonetically annotated at acoustic phonetic, lexical, and prosodic levels at the Speech Hearing and Language Research Centre, Macquarie University.

B. Segmentation

The phonetic segmentation of the entire database has been carried out by up to five trained Phoneticians; all segmentations are checked for consistency by a Transcription Coordinator. Further details on the database and levels of labeling are given in Croot et al. (1992). The criteria for acoustic phonetic segmentation follow very closely the principles set out in Barry and Fourcin (1992).

The segmentation points at nasal consonant boundaries are marked using the waves+ speech signal processing system from a combination of the audio waveform, spectrographic displays with superimposed first four formant center frequencies, and facilities for speech playback. The acoustic criteria for segmenting syllable-initial nasals from vowels, and vowels from syllable-final nasals, are very similar to those described in Seitz et al. (1990). In most cases, boundaries are marked on spectrogram displays at a time point corresponding to an abrupt change in energy at the nasal-vowel or vowel-nasal boundary. In many cases, there is also a clear, and sudden, jump in the automatically tracked formant frequencies at this point. If a token cannot be reliably segmented, then no boundary is placed. In the case of nasal consonants, boundaries can often not be reliably marked in abutting nasal consonants across word boundaries (e.g., some more), or in the context of weak oral sonorants (e.g., minimum). Only those tokens that could be segmented using well-defined acoustic criteria have been used in this study.

C. Selection of nasal consonants from the database

The section of the database which is used in this study includes prevocalic nasal consonants in syllable-initial position (henceforth syllable-initial nasals), and postvocalic nasal consonants in syllable-medial or syllable-final position (henceforth syllable-final nasals). Syllables are defined by applying the maximum onset principle (Kahn, 1980; Selkirk, 1982) word internally (thus in terms of the definition used here, syllables do not cross word boundaries). Examples of syllable-initial nasals include the [m] and [n] segments in manuscript, normally, technology, weakness; syllable-final nasals include [m], [n], and [ŋ], in words such as concealing, feeling, length, ran, tempting, seem.

There were a total of 2197 syllable-initial nasals (1220 [m]; 977 [n]) and 5063 syllable-final nasals (886 [m]; 3462 [n]; 715 [ŋ]) in the database. The large imbalance in the distribution across the three nasal categories in syllable-final position is due to the very high frequency of [n] following schwa, and the prevalence of certain function words that have a word-medial or word-final [n]. In order to create a more even distribution across the three place categories, all syllable-final nasals following schwa were removed, as well as all [n] segments in the function words an, and, been, can, can't, in, into, on, than, when. Additionally, all nasals which were less than 26 ms were removed, as well as all syllable-initial nasals preceding vowels of less than 26 ms, and all syllable-final nasals following vowels of less than 26 ms. This was done in order to ensure that the edges of the window used in subsequent frequency analyses would not extend beyond the segmental boundaries of the nasal consonants, or the vowels that preceded or followed them. The final count of nasal consonants used in this study includes 1946 in syllable-initial position, and 2848 in syllable-final position. The selection of these nasals, as described above, and all the subsequent analyses in experiments I-III were carried out using the mu+ system for speech database analysis (Harrington et al., 1993).
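The selection rules above can be sketched as a simple filter. This is only an illustration and not the mu+ implementation: the token record and its field names (nasal, position, word, vowel, nasal_ms, vowel_ms) are hypothetical, and the function-word rule is applied to every [n] in the listed words, as the text states it.

```python
# Function words whose [n] segments were excluded (list taken from the text).
FUNCTION_WORDS = {"an", "and", "been", "can", "can't", "in", "into", "on", "than", "when"}
MIN_DURATION_MS = 26  # shortest nasal or abutting vowel that is retained


def keep_token(token):
    """Return True if a nasal token survives the selection rules.

    `token` is a hypothetical dict with keys: nasal ("m", "n", "ng"),
    position ("initial" or "final"), word, vowel (label of the abutting
    vowel), nasal_ms and vowel_ms (durations in milliseconds).
    """
    # Duration rule: both the nasal and the abutting vowel must be >= 26 ms.
    if token["nasal_ms"] < MIN_DURATION_MS or token["vowel_ms"] < MIN_DURATION_MS:
        return False
    # Syllable-final nasals following schwa are removed.
    if token["position"] == "final" and token["vowel"] == "schwa":
        return False
    # [n] segments in the listed function words are removed.
    if token["nasal"] == "n" and token["word"] in FUNCTION_WORDS:
        return False
    return True
```

The 26 ms threshold is just above the 25.6 ms spanned by a 512-point window at the 20 kHz sampling rate.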

II. EXPERIMENT I

The purpose of experiment I is to compare how effectively nasal place of articulation is classified using static spectra and difference spectra. A static spectrum is defined here as a spectrum taken somewhere in the murmur or the abutting vowel. A difference spectrum is the result of subtracting a spectrum taken in the vowel from a spectrum in the murmur.


[Figure 1: two panels, syllable-initial and syllable-final, marking the numbered window positions against time (ms).]

FIG. 1. The five static windows used in this study. 1,2: murmur (512) and murmur (256); 3: murmur boundary; 4: boundary; 5: vowel boundary. The vertical dotted lines mark the segmentation boundaries of the nasal consonant.

Following the reasoning in Kurowski and Blumstein (1984, 1987) and Seitz et al. (1990), who argue for the primacy of relational cues between the murmur and the vowel, it was expected that classification scores would in general be better from difference spectra than from static spectra.

A. Static spectra

1. Temporal location

Five static spectra were calculated from Hamming windows that were centered at four different time points (Fig. 1). Two windows, murmur (512) and murmur (256) (windows 1 and 2 in Fig. 1) were centered at the temporal midpoint of the murmur and were 512 sampled data points (25.6 ms) and 256 sampled data points (12.8 ms) wide, respectively. The remaining three window types were all 512 sampled data points wide. The murmur-boundary window occurred predominantly in the murmur and preceded the nasal-vowel boundary by 10 ms in syllable-initial nasals, and followed the vowel-nasal boundary by 10 ms in syllable-final nasals (window 3 in Fig. 1). The boundary window (window 4 in Fig. 1) was centered at the nasal-vowel boundary in syllable-initial nasals, and at the vowel-nasal boundary in syllable-final nasals. Finally, the vowel-boundary window (window 5 in Fig. 1) occurred predominantly in the vowel and followed the nasal-vowel boundary by 10 ms in syllable-initial nasals, and preceded the vowel-nasal boundary by 10 ms in syllable-final nasals. The motivation for the 10 ms offset in the murmur-boundary and vowel-boundary windows is that this is the same temporal offset that was used by Seitz et al. (1990) in calculating their difference spectra.
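The window placement just described can be sketched as follows. This is a minimal illustration that assumes the 10 ms offsets refer to window centres; the function names and the dictionary keys are hypothetical, while the 20 kHz sampling rate and the 512 and 256 point widths come from the text.

```python
SAMPLE_RATE_HZ = 20_000  # digitisation rate stated for the database
OFFSET_MS = 10           # offset of the murmur- and vowel-boundary windows


def window_centres_ms(nasal_start_ms, nasal_end_ms, syllable_initial):
    """Return the centre time (ms) of each of the five static windows."""
    midpoint = (nasal_start_ms + nasal_end_ms) / 2
    if syllable_initial:
        boundary = nasal_end_ms    # nasal-vowel boundary; the vowel follows
        toward_vowel = 1
    else:
        boundary = nasal_start_ms  # vowel-nasal boundary; the vowel precedes
        toward_vowel = -1
    return {
        "murmur (512)": midpoint,
        "murmur (256)": midpoint,
        "murmur-boundary": boundary - toward_vowel * OFFSET_MS,
        "boundary": boundary,
        "vowel-boundary": boundary + toward_vowel * OFFSET_MS,
    }


def window_samples(centre_ms, n_points):
    """Return (start, end) sample indices of an n_points window at centre_ms."""
    centre = round(centre_ms * SAMPLE_RATE_HZ / 1000)
    start = centre - n_points // 2
    return start, start + n_points
```

For a syllable-initial nasal spanning 300 to 360 ms, the murmur-boundary, boundary, and vowel-boundary windows would be centred at 350, 360, and 370 ms under these assumptions.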

2. Frequency analysis

Spectra were calculated by applying FFTs to the windows shown in Fig. 1 and described above. No pre-emphasis was applied. The resulting spectral values from each separate FFT were amplitude normalized by dividing them by the spectral value of greatest amplitude. Energy values in the first 22 critical bands were calculated from each amplitude-normalized spectrum by summing the spectral values that fell within the separate critical bands. The frequency ranges for the first 22 critical bands were taken from Zwicker (1961). These are also the frequency ranges (up to 5 kHz) used in Kurowski and Blumstein (1987).
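The steps above (Hamming window, spectrum, peak normalization, summation into 22 critical bands) can be sketched in a few lines. Several details are assumptions rather than the paper's stated procedure: the band edges are the values commonly tabulated for Zwicker's critical bands, magnitude (not power) values are summed, and a naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

SAMPLE_RATE_HZ = 20_000
# Assumed critical-band edges in Hz (commonly tabulated after Zwicker, 1961);
# 23 edges delimit the first 22 bands.
BAND_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500]


def hamming(n):
    """Hamming window coefficients for an n-point frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 of the Hamming-windowed frame (naive DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def critical_band_energies(frame):
    """Sum peak-normalized spectral magnitudes within each of the 22 bands."""
    mags = magnitude_spectrum(frame)
    peak = max(mags)
    energies = [0.0] * (len(BAND_EDGES_HZ) - 1)
    if peak == 0:
        return energies  # silent frame: nothing to normalize
    for k, mag in enumerate(mags):
        freq = k * SAMPLE_RATE_HZ / len(frame)
        for b in range(len(energies)):
            if BAND_EDGES_HZ[b] <= freq < BAND_EDGES_HZ[b + 1]:
                energies[b] += mag / peak
                break
    return energies
```

In practice an FFT routine would replace the naive DFT; the band summation is unchanged.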


TABLE I. The numbers of segments in the training and testing data.

Syllable initial    [m]    [n]           Total
Training            672    539           1211
Testing             453    282            735
Total                                    1946

Syllable final      [m]    [n]    [ŋ]    Total
Training            461    936    430    1827
Testing             221    541    259    1021
Total                                    2848

B. Difference spectra

Three kinds of difference spectra were calculated which are all based on subtracting the vowel-boundary spectrum from the murmur-boundary spectrum. The first of these (henceforth KB) follows exactly the implementation by Seitz et al. (1990) of Kurowski and Blumstein's (1987) relational metric. KB difference spectra were calculated by subtracting vowel-boundary from murmur-boundary spectra in low and high frequency ranges, where low is the range 450-700 Hz (roughly 5-7 bark) and high is the range 1370-2150 Hz (roughly 11-14 bark). As in Seitz et al. (1990), two maximum dB values were found in this subtracted spectrum, one in each of the two fixed frequency ranges under consideration (thus two parameters per segment). KB difference spectra were calculated only for syllable-initial nasals, not for syllable-final nasals.

The second of these (henceforth Seitz) is the relational algorithm in Seitz et al. (1990) which, unlike Kurowski and Blumstein's (1987) metric, does not use predetermined frequency bands. Seitz difference spectra were calculated by subtracting vowel-boundary from murmur-boundary spectra and extracting four parameters: The maximum and minimum dB values in the subtracted spectrum, and the frequencies (Hz) at which the maximum and minimum dB values occur. Seitz difference spectra were calculated only for syllable-initial, not for syllable-final nasals. In order to replicate exactly the dB/Hz difference spectra in Seitz et al. (1990), the four parameters were calculated in the 0-5 kHz range.

The third of these (henceforth bark) involves subtracting vowel-boundary spectra from murmur-boundary spectra across the entire bark frequency range (therefore 22 parameters per segment). Bark difference spectra were calculated for both syllable-initial and syllable-final nasals.
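The three parameter sets described above can be sketched as follows. The frequency ranges and parameter counts come from the text; the function names, the inclusive range limits, and the representation of a spectrum as parallel lists of frequencies and dB values are assumptions made for illustration.

```python
LOW_RANGE_HZ = (450, 700)     # roughly 5-7 bark
HIGH_RANGE_HZ = (1370, 2150)  # roughly 11-14 bark


def difference(murmur, vowel):
    """Murmur-boundary minus vowel-boundary spectrum, value by value."""
    return [m - v for m, v in zip(murmur, vowel)]


def kb_parameters(freqs_hz, murmur_db, vowel_db):
    """KB: maximum dB difference in each of the two fixed ranges (2 values)."""
    diff = difference(murmur_db, vowel_db)

    def max_in(lo, hi):
        # Raises ValueError if no spectral point falls inside the range.
        return max(d for f, d in zip(freqs_hz, diff) if lo <= f <= hi)

    return max_in(*LOW_RANGE_HZ), max_in(*HIGH_RANGE_HZ)


def seitz_parameters(freqs_hz, murmur_db, vowel_db, max_freq_hz=5000):
    """Seitz: (max dB, its Hz, min dB, its Hz) within 0 to max_freq_hz."""
    diff = difference(murmur_db, vowel_db)
    points = [(d, f) for f, d in zip(freqs_hz, diff) if 0 <= f <= max_freq_hz]
    max_db, max_hz = max(points, key=lambda p: p[0])
    min_db, min_hz = min(points, key=lambda p: p[0])
    return max_db, max_hz, min_db, min_hz


def bark_parameters(murmur_bands, vowel_bands):
    """Bark: difference in every critical band (22 values for 22 bands)."""
    return difference(murmur_bands, vowel_bands)
```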

C. Classification

1. Training and testing materials

All classifications in this paper are based on open tests, in which training and testing are carried out on separate bodies of data. The training data was taken from phonetically balanced sentences and the SCRIBE passage, while nasals in the phonetically dense sentences made up the testing data. The number of nasals in the training and testing data for syllable-initial and syllable-final nasals are given in Table I.

Since the metrics of both Kurowski and Blumstein (1987) and Seitz et al. (1990) are based on closed tests, only closed test classifications will be presented for these metrics. In this case, training and testing are carried out on all 1946 (syllable-initial) and 2848 (syllable-final) nasals.

2. Distance metric

An unknown token was classified first by calculating its Mahalanobis distance to each of the class centroids, and then by finding which of these distances was the smallest.

The Mahalanobis distance can be viewed in two ways. First, it is the extension of the standard normal variate to the multidimensional case in which the variance for one parameter is replaced by the covariance between all the parameters. Second, when the parameters are uncorrelated, the Mahalanobis distance becomes equivalent to the Euclidean distance between variance-normalized parameters. (For a more detailed discussion on the use of Mahalanobis distance in speech classification, see e.g., Chan and Cheung, 1986; O'Shaughnessy, 1987.)
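A nearest-centroid classifier of this kind can be sketched as below. The squared distance used is (x - mu)^T S^-1 (x - mu), where mu is a class centroid and S a covariance matrix. Whether the paper used a separate covariance matrix per class or a pooled one is not stated; the sketch assumes one per class, and all function names are hypothetical.

```python
def mean_vector(rows):
    """Centroid of a list of equal-length parameter vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of the rows."""
    mu = mean_vector(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (len(rows) - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))) / m[i][i]
    return y


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of x from centroid mu under cov."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def classify(x, classes):
    """classes maps label -> (centroid, covariance); return the nearest label."""
    return min(classes, key=lambda label: mahalanobis_sq(x, *classes[label]))
```

A singular covariance matrix would cause a division by zero in `solve`; a fuller implementation would need to guard against that.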

3. Data reduction

All high dimensional spaces were transformed to a new set of dimensions using principal components analysis. The transformations are produced by deriving a set of weightings, or eigenvectors, from the original data and matrix multiplying them with the untransformed data. The testing data is transformed using the same eigenvectors derived from the training stage (see Chan and Cheung, 1986 for a similar procedure).

Following the application of principal components analysis, Mahalanobis distance calculations were made separately in the first two, first three, ..., first n transformed dimensions, where n is the number of original dimensions (22 for bark scaled spectra). A fundamental aspect of principal components analysis is that the scores that are obtained from all n transformed parameters are identical to those obtained from all n original, untransformed, parameters.
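The data reduction step can be illustrated in two dimensions, where the eigenvectors of the covariance matrix have a closed form and no linear algebra library is needed. This is a sketch only: the paper works in up to 22 dimensions, mean-centring before projection is an assumption, and the function names are hypothetical. As in the text, the eigenvectors and mean are derived from training data and then reused for testing data.

```python
import math


def mean_2d(rows):
    """Mean of two-dimensional rows."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def covariance_2d(rows):
    """Return (var_x, cov_xy, var_y) with an n - 1 denominator."""
    mx, my = mean_2d(rows)
    n = len(rows) - 1
    var_x = sum((r[0] - mx) ** 2 for r in rows) / n
    var_y = sum((r[1] - my) ** 2 for r in rows) / n
    cov_xy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    return var_x, cov_xy, var_y


def fit_pca_2d(train_rows):
    """Return the training mean and both eigenvectors, largest variance first."""
    var_x, cov_xy, var_y = covariance_2d(train_rows)
    # Orientation of the principal axis of a 2 x 2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    axes = [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]
    return mean_2d(train_rows), axes


def project(rows, mean, axes, n_dims):
    """Project rows onto the first n_dims eigenvectors from training."""
    return [[sum(a * (v - m) for a, v, m in zip(axis, row, mean))
             for axis in axes[:n_dims]] for row in rows]
```

With `n_dims` equal to the full dimensionality the projection is a rotation, so total variance is preserved and the transformed parameters are uncorrelated; smaller values of `n_dims` keep only the leading components.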

D. Results

1. Static spectra

The results of the classifications from the static spectra on syllable-initial and syllable-final nasals are shown in Fig. 2. The peak scores for each condition, together with the number of dimensions on which these occurred, are shown in Table II.

Turning first to the syllable-initial nasals (Fig. 2), the best scores are generally produced by the murmur-boundary and midpoint (512) spectra. The classification scores range from 79.2% for the vowel-boundary spectra to 90.1% for the murmur-boundary spectra (Table II). Focusing on the syllable-final nasals (Fig. 2), the best classification scores are generally produced by the murmur-boundary and vowel-boundary spectra. The scores range from 57.0% correct for boundary (22 dimensions) to 73.5% correct for murmur-boundary (22 dimensions).

In summary, murmur-boundary spectra generally produce the highest classification scores of the five spectral types in both syllable-initial and syllable-final conditions. The best classification scores obtained on an open test with murmur-boundary spectra are over 90% correct for the two-way distinction in syllable-initial nasals, and over 73% correct for the three-way distinction in syllable-final nasals.

[Figure 2: two panels, Syllable-initial and Syllable-final, with one curve each for midpoint (256), midpoint (512), murmur-boundary, vowel-boundary, and boundary spectra; horizontal axis: number of dimensions (2 to 22).]

FIG. 2. Classification scores for the static spectra. The vertical axis corresponds to the percent total correct classification; the horizontal axis denotes the number of dimensions (transformed by principal components analysis) on which these classifications were made.

2. Difference spectra

Three types of difference spectra were considered: KB difference spectra, which are based on Kurowski and Blumstein's (1987) metric; Seitz difference spectra, which are based on Seitz et al.'s (1990) relational metric; and Bark difference spectra, which are obtained by subtracting 22-parameter vowel-boundary spectra from murmur-boundary spectra. These three types of spectra are considered in turn. Classifications from KB spectra and Seitz spectra were only made for syllable-initial nasals.

TABLE II. Peak classification scores for the different types of static spectra in syllable-initial and syllable-final position and the number of dimensions on which these scores were obtained.

                    Number of dimensions    Total correct (%)
Condition           Initial    Final        Initial    Final
Midpoint (256)      16         19           84.9       60.0
Midpoint (512)      15         20           88.3       61.7
Murmur boundary     20         22           90.1       73.5
Vowel boundary      19         15           79.2       72.7
Boundary            17         22           83.7       57.0

a. KB difference spectra. The results of applying the two-parameter KB difference spectra to syllable-initial nasals are shown as ellipses in Fig. 3 and as a confusion matrix in Table III. Figure 3 shows that there is a good deal of overlap between the two nasal categories. There is, however, a tendency for [m] segments to occupy the lower, and rightmost, part of the display, and for [n] segments to occur in the upper, and leftmost, part of the display. This is consistent with Kurowski and Blumstein's (1987) display of [m] and [n] segments in the same plane (their Fig. 3). However, Kurowski and Blumstein (1987) were able to separate over 89% of their two nasal categories with a straight diagonal line, but there is no suggestion of any such clear-cut separation in the data in Fig. 3. The closed test classification score (training and testing on all the nasals shown in Fig. 3) is 63.4% correct, which is much lower than the 89% score obtained in Kurowski and Blumstein (1987). The lower score is almost certainly the result of considering continuous speech nasals from a variety of different contexts compared with the carefully controlled citation-form speech used in Kurowski and Blumstein (1987).

FIG. 3. Ellipse plots for [m] and [n] on KB difference spectra [labeled axis: energy difference between the vowel and murmur in the 5-7 Bark range (dB)]. The axes of the ellipses are proportional to 2.45 standard deviations (include over 95% of the data points). The display includes all data points (from both the training and testing data).
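The figure of 2.45 standard deviations quoted for the ellipse plots can be checked with a short sketch. It assumes that each category is approximately bivariate Gaussian in the plotted plane, an assumption that is not stated in the text; the function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def ellipse_scale(coverage: float) -> float:
    """Radius, in standard deviations, of the ellipse that encloses
    `coverage` of a bivariate Gaussian (square root of the chi-square
    quantile with 2 degrees of freedom)."""
    return math.sqrt(-2.0 * math.log(1.0 - coverage))

def fraction_inside(points: np.ndarray, scale: float) -> float:
    """Fraction of 2-D points whose Mahalanobis distance from the
    sample mean is at most `scale`."""
    centred = points - points.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(centred, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    return float(np.mean(d2 <= scale ** 2))

rng = np.random.default_rng(0)
# Synthetic, correlated 2-D data standing in for one nasal category
# (hypothetical values, not data from the study).
demo = rng.multivariate_normal([10.0, 5.0], [[9.0, 3.0], [3.0, 4.0]], size=20000)

scale_95 = ellipse_scale(0.95)               # close to 2.45
inside_95 = fraction_inside(demo, scale_95)  # close to 0.95
```

Under the Gaussian assumption, a scale of about 2.45 corresponds to 95% coverage, which agrees with the caption's statement that the ellipses include over 95% of the data points.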

b. Seitz difference spectra. The results of classifying syllable-initial nasals using Seitz difference spectra are shown in Table IV. As described earlier, there were four parameters: The energy minimum and maximum values, and their corresponding frequencies. The results show that even for the closed test, [m] segments are classified worse than chance (50%).

A further analysis of the four separate features showed that the energy maximum values, and the frequencies at which they occurred, contributed no information to the place of articulation distinction. In fact, when tests are performed using only the energy minimum values and their frequencies (two parameters), the results are slightly better than when all four parameters are used (Table V). This suggests that the energy maximum values actually hinder, rather than enhance, place of articulation separation in nasal consonants.

However, the two-parameter classification using solely the energy minimum values and their frequencies still only enables a separation between nasal consonants which is just greater than chance. The corresponding ellipse plots of nasals in the plane of frequency minimum/energy minimum value show a considerable overlap between the two categories (Fig. 4).
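The four-parameter and two-parameter feature sets discussed in this subsection can be pictured with a minimal sketch. It only illustrates reading the energy minimum and maximum, and their frequencies, off a difference spectrum; it is not the procedure of Seitz et al. (1990), and the function name, the sampling frequencies, and the spectral values are all hypothetical.

```python
import numpy as np

def extremum_features(freqs_hz, diff_db, use_maximum=True):
    """Return [min energy, frequency of min] and, optionally,
    [max energy, frequency of max] from one difference spectrum."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    diff_db = np.asarray(diff_db, dtype=float)
    if diff_db.ndim != 1 or freqs_hz.shape != diff_db.shape:
        raise ValueError("freqs_hz and diff_db must be 1-D and equal in length")
    i_min = int(np.argmin(diff_db))
    features = [diff_db[i_min], freqs_hz[i_min]]
    if use_maximum:
        i_max = int(np.argmax(diff_db))
        features += [diff_db[i_max], freqs_hz[i_max]]
    return np.array(features)

# Hypothetical difference spectrum sampled at six frequencies (dB values).
demo_freqs = np.array([250.0, 750.0, 1250.0, 1750.0, 2250.0, 2750.0])
demo_diff = np.array([3.0, -6.5, 1.0, 8.0, 2.5, -1.0])

four_params = extremum_features(demo_freqs, demo_diff)                    # all four parameters
two_params = extremum_features(demo_freqs, demo_diff, use_maximum=False)  # minimum only
```

Dropping the two maximum-based parameters corresponds to the reduced feature set that, according to the results above, classified the nasals slightly better than the full set.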

TABLE III. Confusion matrix for classification on KB difference spectra.

(Closed test), 63.4% correct
        m       n
m       64.3    35.7
n       37.8    62.2

TABLE IV. Confusion matrix for classifications on Seitz difference spectra.

(Closed test), 53.8% correct
        m       n
m       43.6    56.4
n       32.3    67.7

TABLE V. Confusion matrix for classifications on Seitz difference spectra using only two parameters.

(Closed test), 56.3% correct
        m       n
m       52.1    47.9
n       38.0    62.0

c. Bark difference spectra. The results in this section are based on a comparison between Bark difference spectra and the two static spectra (murmur-boundary and vowel-boundary) from which the Bark difference spectra are derived. Classification scores on all dimensions for the three conditions are shown in Fig. 5. The peak scores, and the number of dimensions on which these were obtained, are shown in Table VI.

Turning first to syllable-initial nasals, the results show high classification scores on Bark difference spectra between 9 and 22 dimensions. However, murmur-boundary spectra produce higher scores on more dimensions than Bark difference spectra. Additionally, murmur-boundary spectra have significantly higher peak classification scores (Table VI) than Bark difference spectra (χ² = 12.62, df = 1, p < 0.0005). At least for syllable-initial nasals, these results lend no support to the prediction made earlier that nasal place of articulation can be more reliably recovered from difference spectra than from static spectra.

For the syllable-final nasals, the Bark difference spectra have lower classification scores than the murmur-boundary spectra between dimensions 14 and 22 (Fig. 5). The differences in the peak classification scores (Table VI) between murmur-boundary and Bark difference spectra, and between vowel-boundary and Bark difference spectra, are not significant. For syllable-final position, the evidence shows that nasal place of articulation is no more accurately encoded in Bark difference spectra than in either of the two static spectra (murmur boundary, vowel boundary) from which the Bark difference spectra are derived.

E. Discussion of the results of experiment I

The principal aim of experiment I was to compare how effectively static and difference spectra classify nasal consonants into place of articulation categories. An important subpart of this aim was to assess whether difference spectra produce better scores than the separate murmur-boundary and vowel-boundary static spectra from which the difference spectra are calculated. The results have shown that murmur-boundary spectra produce better scores than the Bark difference spectra in syllable-initial nasals, and that murmur-boundary and vowel-boundary spectra perform no differently from Bark difference spectra in syllable-final position. Additionally, the two-parameter metric of Kurowski and Blumstein (1987), and the four-parameter metric of Seitz et al. (1990), both of which are based on further parameterisations of Bark difference spectra, produced classification scores which were lower than any of the five types of static spectra considered in this study.

FIG. 4. Ellipse plots for [m] and [n] on two of the four parameters of the Seitz difference spectra [labeled axis: frequency (Hz)]. The axes of the ellipses are proportional to 2.45 standard deviations (include over 95% of the data points). The display includes all data points (from both the training and testing data).

FIG. 5. Classification scores for two static spectra and the Bark difference spectra [panels: syllable-initial, syllable-final; curves: murmur-boundary, Bark difference, vowel-boundary]. The vertical axis corresponds to the percent total correct classification; the horizontal axis denotes the number of dimensions (transformed by principal components analysis) on which these classifications were made.

The results from experiment I cast doubt on the idea, developed in Kurowski and Blumstein (1987) and subsequently in Seitz et al. (1990), that nasal place of articulation is encoded by the spectral change between the margins of the murmur and the vowel. However, this does not necessarily imply that relational features between the murmur and vowel are irrelevant. Another possibility is that nasal place of articulation can be most accurately recovered by combining the separate murmur and vowel spectra, rather than by subtracting one from the other. As Repp (1987) comments:

"It could be that such relational spectral information is the critical cue for place of articulation distinctions (see Lahiri et al., 1984). This need not be so however, since the murmur, as well as the later portions of the vowel, provides additional spectral (and temporal) information that may feed into a central integration process. Repp's (1986) preliminary acoustic analyses suggest that spectral difference information alone is not sufficient to distinguish [m] and [n] across all vowel contexts, at least not in an invariant fashion... Thus it [spectral difference information] may be only one of several ingredients that enter into phonetic decisions. This means that the inputs to the central decision process probably include the murmur spectrum, the spectral relationship between the murmur and the vowel onset, and the continuing pattern of spectral change during the vowel" (p. 1526).

TABLE VI. Peak classification scores for Bark difference and two types of static spectra in syllable-initial and syllable-final position and the number of dimensions on which these scores were obtained.

                   Number of dimensions    Total correct (%)
Condition          Initial    Final        Initial    Final
Bark difference    18         13           83.7       69.6
Murmur boundary    20         22           90.1       73.5
Vowel boundary     19         15           79.2       72.7

In experiment II, classifications are made from various kinds of combined murmur and vowel spectra for both syllable-initial and syllable-final nasals, and these are compared with the results obtained from static spectra in experiment I. Since various researchers, and most recently Repp (1986, 1987) and Seitz et al. (1990), have argued for relational features between the murmur and vowel in decoding nasal place of articulation, it was hypothesized that the results from combined spectra would, in general, be better than from static spectra.

FIG. 6. Classification scores for the combined spectra [panels: syllable-initial, syllable-final; curves: combined, murmur-boundary, Bark difference]. The vertical axis corresponds to the percent total correct classification; the horizontal axis denotes the number of dimensions (transformed by principal components analysis) on which these classifications were made.

III. EXPERIMENT II

A. Method

1. Combined spectra

The purpose of experiment II was to classify the syllable-initial and syllable-final nasals from experiment I using combined murmur and vowel spectra (henceforth combined spectra). The combined spectra are created by concatenating into a single space the same spectra that were used to derive the Bark difference spectra. Thus, combined spectra are made up of 44 dimensions: 22 from the murmur-boundary spectra, and 22 from the vowel-boundary spectra.
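The distinction between the combined and the Bark difference representations can be illustrated with a short sketch. The function names and the example values are hypothetical and are not taken from the study; only the dimensionalities (22 and 44) and the direction of the subtraction (vowel-boundary subtracted from murmur-boundary) follow the text.

```python
import numpy as np

N_PARAMS = 22  # parameters per spectrum, as stated in the text

def _check(spectrum):
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.shape != (N_PARAMS,):
        raise ValueError(f"expected a {N_PARAMS}-parameter spectrum")
    return spectrum

def combined_spectrum(murmur, vowel):
    """Concatenate murmur-boundary and vowel-boundary spectra (44 dimensions)."""
    return np.concatenate([_check(murmur), _check(vowel)])

def bark_difference_spectrum(murmur, vowel):
    """Subtract the vowel-boundary spectrum from the murmur-boundary one (22 dimensions)."""
    return _check(murmur) - _check(vowel)

# Hypothetical spectra (dB per band), for illustration only.
murmur = np.linspace(60.0, 40.0, N_PARAMS)
vowel = np.linspace(55.0, 50.0, N_PARAMS)

combined = combined_spectrum(murmur, vowel)
difference = bark_difference_spectrum(murmur, vowel)

# The difference spectrum can be recomputed from the combined spectrum,
# but the two original spectra cannot be recovered from the difference.
recovered = combined[:N_PARAMS] - combined[N_PARAMS:]
```

Because the difference spectrum is a linear function of the combined spectrum, the combined representation retains everything the difference spectrum contains, together with the absolute spectral shapes that subtraction discards.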

2. Classification

Classification follows exactly the same procedure as in experiment I. In this case, classifications are made in up to 44 dimensions that are transformed using principal components analysis. The same divisions into training and testing data were made as for experiment I (see Table I).
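The general shape of such a procedure (transform with principal components analysis, classify on an increasing number of dimensions, and report the peak score) can be sketched as follows. This is an illustration under stated assumptions, not the classifier used in the study: the nearest-class-mean rule is a simple stand-in, and the function names and the synthetic two-category data are hypothetical.

```python
import numpy as np

def pca_fit(train_x):
    """Return the training mean and the principal axes (rows, by decreasing variance)."""
    mean = train_x.mean(axis=0)
    _, _, axes = np.linalg.svd(train_x - mean, full_matrices=False)
    return mean, axes

def project(x, mean, axes, k):
    """Project tokens onto the first k principal components."""
    return (x - mean) @ axes[:k].T

def nearest_mean_predict(train_x, train_y, test_x):
    """Stand-in classifier (an assumption): assign each token to the closest class mean."""
    labels = np.unique(train_y)
    means = np.stack([train_x[train_y == lab].mean(axis=0) for lab in labels])
    dists = ((test_x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(dists, axis=1)]

def scores_by_dimension(train_x, train_y, test_x, test_y, max_dims):
    """Percent correct on the test tokens for 2..max_dims transformed dimensions."""
    mean, axes = pca_fit(train_x)  # PCA is estimated on the training data only
    scores = {}
    for k in range(2, max_dims + 1):
        pred = nearest_mean_predict(project(train_x, mean, axes, k), train_y,
                                    project(test_x, mean, axes, k))
        scores[k] = 100.0 * float(np.mean(pred == test_y))
    return scores

rng = np.random.default_rng(1)

def make_tokens(n_per_class):
    """Synthetic 44-dimensional tokens for two categories (not data from the study)."""
    x = np.vstack([rng.normal(0.0, 1.0, size=(n_per_class, 44)),
                   rng.normal(1.0, 1.0, size=(n_per_class, 44))])
    y = np.array(["m"] * n_per_class + ["n"] * n_per_class)
    return x, y

train_x, train_y = make_tokens(200)
test_x, test_y = make_tokens(100)
scores = scores_by_dimension(train_x, train_y, test_x, test_y, max_dims=44)
peak_dims = max(scores, key=scores.get)
peak_score = scores[peak_dims]
```

The pair `(peak_dims, peak_score)` corresponds to the kind of entry reported in the peak-score tables: the number of transformed dimensions at which the open-test score is highest, and that score.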

B. Results

The results are based on a comparison between combined spectra, murmur-boundary spectra, and Bark difference spectra. Classification scores on all dimensions for these three conditions are shown in Fig. 6. The peak scores, and the number of dimensions on which these were obtained, are shown in Table VII.

TABLE VII. Peak classification scores for combined, murmur-boundary, and Bark difference spectra in syllable-initial and syllable-final position and the number of dimensions on which these scores were obtained.

                   Number of dimensions    Total correct (%)
Condition          Initial    Final        Initial    Final
Combined           30         34           93.6       81.7
Murmur boundary    20         22           90.1       73.5
Bark difference    18         13           83.7       69.6

FIG. 7. Classification scores for combined and static spectra [panels: syllable-initial, syllable-final; curves: combined, murmur-boundary; horizontal axis: number of dimensions].

Considering first the syllable-initial nasals, the combined spectra have similar scores to the murmur-boundary spectra on dimensions 11-16, but better scores are obtained from the combined spectra beyond dimension 16. The peak score for the combined spectra (Table VII) is significantly greater than that of the murmur-boundary spectra (χ² = 5.672, df = 1, p < 0.02).

With regard to the syllable-final nasals, the combined spectra have higher scores than the murmur-boundary spectra on all dimensions. The peak score for the combined spectra (Table VII) is significantly greater than that of the murmur-boundary spectra (χ² = 19.391, df = 1, p < 0.00002).

The analysis may appear to bias the results in favor of the combined spectra simply because they are based on a larger number of dimensions (44 in the combined spectra, 22 in the static spectra). However, increasing the number of dimensions does not necessarily produce better scores, and sometimes the scores can deteriorate if the additional dimensions provide no independently useful information for separating the phonetic categories. In Fig. 7, for example, classifications were made from a combination of two static spectra, one taken at the midpoint [midpoint (256)] and the other taken just inside the nasal consonant boundary (murmur boundary). These combined spectra are compared with those from the murmur boundary. In the syllable-initial case, there is a slight improvement for the combined spectra over the static spectrum beyond dimension 20, but a comparison of the peak scores (obtained with 25 and 20 dimensions for the combined and static spectra, respectively) showed nonsignificant differences. With regard to syllable-final position, Fig. 7 shows that the combined spectra generally have lower scores than the static spectrum; once again, the difference between the peak scores (obtained with 29 and 22 dimensions for the combined and static spectra, respectively) is not significant.

In general, the results can be taken to support the hypothesis that nasal place of articulation is more reliably encoded in combined spectra than in static spectra (and therefore also than in Bark difference spectra).

C. Discussion of the results of experiments I and II

The results from experiment I showed that difference spectra performed more poorly than many of the static spectra in classifying nasal place of articulation in syllable-initial and syllable-final positions. These results provided no support for the main theory on which the metrics in Kurowski and Blumstein (1987) and Seitz et al. (1990) are based, that information for nasal place of articulation resides in the spectral change between the murmur and the vowel. In one sense, the results from experiment I cast doubt on the idea that relational features between the murmur and vowel are important in decoding place of articulation distinctions. However, there is another possible interpretation of "relational" which can be defined in terms of a combination of separately preserved murmur and vowel spectra. Thus under the first interpretation of "relational" (Kurowski and Blumstein, 1987), many of the distinct acoustic attributes of the murmur and vowel are discarded, and only the velocity (difference) between the two is assumed to be relevant. Under the second interpretation of "relational" (combined spectra), both the acoustic attributes of the murmur and of the vowel are assumed to be relevant to nasal place distinctions. While the results of experiment I provided no support for the first interpretation, the second interpretation is generally supported by the results of experiment II. This is because the combined spectra, which include information about both the murmur and the vowel, perform better than the static spectra in which murmur or vowel information is represented separately.

The results that have been reported in experiments I and II are based on open tests in which training and testing are carried out on different sets of data. However, these are really only semiopen tests, both because the training and testing data are based on the same speakers, and also because training and testing are carried out on syllable-initial and syllable-final nasals separately. While the speech database used in this study does not currently extend to a new group of speakers, training on syllable-final nasals and testing on syllable-initial nasals would provide a more robust way of assessing how effectively nasal place of articulation can be classified from the various types of spectra considered so far. It would also address the question of the extent to which the information which is encoded in the different types of spectra is invariant with respect to syllable position.

In experiment III, the performance of the static, difference, and combined spectra from experiments I and II is compared by training on syllable-final segments, and testing on syllable-initial segments. Based on the results of experiments I and II, it was hypothesized that the best scores would be obtained from combined spectra, and that some of the static spectra would perform better than the Bark difference spectra.

IV. EXPERIMENT III

A. Method

The present experiment is based on a comparison between the combined spectra, murmur-boundary spectra, and Bark difference spectra. Training, testing, and transformation of dimensions using principal components analysis were the same as for experiments I and II.

The training data include all 2848 syllable-final nasals and the testing data all 1946 syllable-initial nasals (see Table I). As in experiments I and II, classifications were made in up to 22 transformed dimensions for the static and difference spectra, and in up to 44 transformed dimensions for the combined spectra.

FIG. 8. Classification scores for static, difference, and combined spectra based on training on syllable-final segments, and testing on syllable-initial segments [curves: combined, murmur-boundary, Bark difference]. The vertical axis corresponds to the percent total correct classification; the horizontal axis denotes the number of dimensions (transformed by principal components analysis) on which these classifications were made.

B. Results

Classification scores on all dimensions for the three conditions are shown in Fig. 8. The peak scores, and the number of dimensions on which these were obtained, are shown in Table VIII.

TABLE VIII. Peak classification scores for combined, murmur-boundary, and Bark difference spectra, and the number of dimensions on which these scores were obtained.

Condition          Number of dimensions    Total correct (%)
Combined           43                      86.3
Murmur boundary    21                      76.7
Bark difference    20                      73.5

Between dimensions 6-17, the best scores are obtained from the Bark difference spectra while the combined and murmur-boundary spectra have roughly comparable scores; between dimensions 18-22, the combined spectra produce better scores than the other two types of spectra; the scores from combined spectra on higher dimensions exceed those of the murmur-boundary and Bark difference spectra on any number of dimensions. A comparison of peak classification scores (Table VIII) shows the following results: The score for combined spectra is significantly greater than that of murmur-boundary spectra (χ² = 59.6, df = 1, p < 0.00001), and the score for murmur-boundary spectra is significantly greater than that of the Bark difference spectra (χ² = 4.9, df = 1, p < 0.05).
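A comparison of two peak scores of this kind can be sketched as a chi-square test on a 2 x 2 table of correct and incorrect classifications. The sketch below uses Pearson's statistic without a continuity correction, which is an assumption, since the exact form of the test is not given here. The counts are reconstructed from the rounded percentages for combined and murmur-boundary spectra and from the 1946 test tokens, so the result can only approximate the reported statistic. The function name is hypothetical.

```python
def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square statistic (df = 1, no continuity correction)
    for correct/incorrect counts under two conditions."""
    a, b = correct_a, total_a - correct_a
    c, d = correct_b, total_b - correct_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom

N_TEST = 1946  # syllable-initial test tokens, as stated in the Method

# Counts reconstructed from rounded percentages (an approximation).
combined_correct = round(0.863 * N_TEST)  # 86.3% correct
murmur_correct = round(0.767 * N_TEST)    # 76.7% correct

stat = chi_square_2x2(combined_correct, N_TEST, murmur_correct, N_TEST)
significant_at_05 = stat > 3.841  # critical value for df = 1, p = 0.05
```

With these reconstructed counts the statistic comes out near the reported value but not identical to it, as would be expected from rounding in the percentages and from any difference in how the original test was computed.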

These results are therefore consistent with those of experiment II in showing that combined spectra encode more information about place of articulation distinctions in nasal consonants than either static or difference spectra.

V. GENERAL DISCUSSION

As discussed in Seitz et al. (1990), there are at least three different kinds of metric for classifying speech segments from acoustic data that are partly motivated by theories of speech perception. The first of these is the "template" model of Blumstein and Stevens (1979, 1980) in which invariant acoustic cues (for oral stops) are assumed to be derived from a single spectral section with a long time window that may (in voiced stops) encompass both the oral stop release and vowel onset. These templates are sometimes referred to as "static" because the temporal changes within the long time window are smeared to create a single spectral template with global spectral properties that encode place of articulation differences. Like Blumstein and Stevens' templates, many of the static spectra in the present study are derived from a long time window in which temporal changes between nasal and vowel segments are not preserved. The second type of metric is "dynamic" and has been advanced by Kewley-Port (1983), partly because velar place of articulation in oral stops was poorly represented in Blumstein and Stevens' (1979, 1980) static templates. In Kewley-Port's (1983) model, spectral changes in time, as evidenced from running spectral displays, are assumed to be relevant to place of articulation distinctions in oral stops. The classification of nasal place of articulation from some of the combined spectra in this study is similar in many respects to Kewley-Port's (1983) use of running spectral displays. The third type of metric is the "dynamic relative" strategy of Lahiri et al. (1984) which is based on spectral changes between two spectral slices. In one sense, Lahiri et al.'s (1984) metric is a data-reduced version of Kewley-Port's (1983): In Kewley-Port (1983), the spectra of the running spectral displays are assumed to be relevant to place distinctions, whereas in Lahiri et al. (1984), it is only the difference (between two spectra taken at different time points), and therefore the gradient or velocity, between the spectral positions which is assumed to be important.

Both Kurowski and Blumstein (1987) and Seitz et al. (1990) favor a metric based on spectral differences for place classifications in nasal consonants, i.e., one which is closely related to the difference metric of Lahiri et al. (1984). The justification for this position in Kurowski and Blumstein (1987) is the evidence from perception experiments (Kurowski and Blumstein, 1984; Repp, 1986) which shows that a section of the speech signal extending from the margin of the murmur into the acoustic vowel onglide provides listeners with some of the most informative cues for place distinctions. Additionally, Kurowski and Blumstein (1987) reject a metric based on static spectral sections, not only because of the evidence in Kewley-Port (1983) and Lahiri et al. (1984) which points to the importance of dynamic cues in phonetic categorization, but also because a preliminary study by Blumstein and Stevens (1980) showed that static templates failed to classify nasal stops as well as oral stops. However, the link that Kurowski and Blumstein (1987) make between high listener identification scores from a section of the waveform straddling the nasal-vowel boundary and a metric based on a difference spectrum is tenuous and, perhaps surprisingly, this weak relationship remains unchallenged in Seitz et al. (1990), who instead offer a revised metric based on relative frequencies, but one which is also based on difference spectra.

While the results of the present study have less to say about the relationship between listeners' perceptions of nasal consonants and metrics used for their classification in the acoustic waveform, they nevertheless run counter to the view, expressed in both Kurowski and Blumstein (1987) and Seitz et al. (1990), that the primary source of information for nasal place detection resides in a difference spectrum derived from the murmur and abutting vowel. The most challenging evidence is that, for every kind of test on which a comparison was made, difference spectra never produced better scores than the static spectra in the murmur from which they were derived. Finally, Kurowski and Blumstein's (1987) two-parameter metric, and Seitz et al.'s (1990) four-parameter metric were both 20%-30% less accurate in nasal place classifications than all five types of static spectra. These considerations can be used to argue against the view that nasal place of articulation is primarily encoded in difference spectra derived from the murmur and abutting vowel.

The results from experiments II and III are nevertheless consistent with the theory advocated by various researchers (Kurowski and Blumstein, 1984, 1987; Repp, 1986, 1987; Seitz et al., 1990) that relational cues in both the murmur and the vowel are important for identifying nasal place of articulation. Experiments II and III showed that a combination of two spectral slices taken from the murmur and the vowel produced better scores than the separate, static, murmur or vowel spectra in isolation. This was the case for both syllable positions, and also when the training and testing data differed with respect to syllable position (experiment III). These results demonstrate, therefore, that the murmur and vowel contribute independently useful information to place of articulation distinctions. With regard to the three types of metric considered earlier, classifications from combined spectra are most closely related to categorizations from running spectra (Kewley-Port, 1983), in which not only the changing spectrum, but also the actual shape of successive spectra contribute separately useful information to the phonetic identity of the segment.

The results of this study, which show that just under 94% of syllable-initial nasals, and just under 83% of syllable-final nasals, can be correctly classified in open tests in which training and testing are done on different parts of the database, demonstrate that there is considerable information in the acoustic speech signal for the recovery of phonetic segments. This conclusion is further strengthened by the nature of the segments, which were taken from continuous, rather than citation-form, speech, and which, apart from syllable position and the presence of an unspecified abutting vowel, were also uncontrolled for phonetic context. In fact, since just over 86% of syllable-initial nasals could be correctly classified when training was carried out on syllable-final nasals, the acoustic information for nasal place identification must be present, to a certain extent, independently of syllable position. The study has also shown that a highly coarticulated part of the speech signal, encompassing the transition between the consonant and the vowel, provides some of the most salient cues to the place distinction in nasal consonants.

Barry, W. J., and Fourcin, A. J. (1992). "Levels of labelling," Comput. Speech Lang. 6, 1-14.

Bernard, J. (1970). "Towards the acoustic specification of Australian English," Z. Phonetik 23, 113-128.

Blumstein, S., and Stevens, K. (1979). "Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants," J. Acoust. Soc. Am. 66, 1001-1017.

Blumstein, S., and Stevens, K. (1980). "Perceptual invariance and onset spectra for stop consonants in different vowel environments," J. Acoust. Soc. Am. 67, 648-662.

Chan, L., and Cheung, Y. (1986). "Analysis and recognition of isolated Putonghua vowels by Karhunen-Loève transformation techniques," Speech Commun. 5, 299-330.

Croot, K., Fletcher, J., and Harrington, J. (1992). "Levels of segmentation and labelling in the Australian national database of spoken language," in Proceedings of the 4th Australian International Conference on Speech Science and Technology, edited by J. Pittam (Australian Speech Science and Technology Association, Canberra), pp. 86-90.

Delattre, P. (1958). "Les indices acoustiques de la parole," Phonetica 2, 226-251.

Fant, G. (1960). Acoustic Theory of Speech Production (Mouton, The Hague).

Fujimura, O. (1962). "Analysis of nasal consonants," J. Acoust. Soc. Am. 34, 1865-1875.

Garcia, E. (1966). "The identification and discrimination of synthetic nasals," Haskins Lab. Status Rep. Speech Res. 7/8, 3.1-3.16.

Garcia, E. (1967). "Discrimination of three-formant nasal-vowel syllables," Haskins Lab. Status Rep. Speech Res. SR-12, 143-153.

Harrington, J., Cassidy, S., Fletcher, J., and McVeigh, A. (1993). "The mu+ system for corpus based speech research," Comput. Speech Lang. 7, 305-331.

Hattori, S., Yamamoto, K., and Fujimura, O. (1958). "Nasalisation of vowels in relation to nasals," J. Acoust. Soc. Am. 30, 267-274.

Hieronymus, J., Alexander, H., Bennett, C., Cohen, I., Davies, D., Dalby, J., Laver, J., Barry, W., Fourcin, A., and Wells, J. (1990). "Proposed speech segmentation criteria for the SCRIBE project," SCRIBE-Project Report.

Kahn, D. (1980). Syllable-Based Generalizations in English Phonology (Garland, New York).

Kewley-Port, D. (1983). "Time-varying features as correlates of place of articulation in stop consonants," J. Acoust. Soc. Am. 73, 322-335.

Kurowski, K., and Blumstein, S. (1984). "Perceptual integration of the murmur and formant transitions for place of articulation in nasal consonants," J. Acoust. Soc. Am. 76, 383-390.

Kurowski, K., and Blumstein, S. (1987). "Acoustic properties for place of articulation in nasal consonants," J. Acoust. Soc. Am. 81, 1917-1927.

Lahiri, A., Gewirth, L., and Blumstein, S. (1984). "A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: Evidence from a cross-language study," J. Acoust. Soc. Am. 76, 391-404.

Larkey. L., Wald, J., and Strange, W. (1978). "Perception of synthetic nasal consonants in initial and final syllable position," Percept. Psychophys. 23, 299-311.

Liberman, A.M., Delattre, P. C., Cooper, F. S., and Gerstman, L. J. (1954). "The role of consonant-vowel transitions in the perception of the stop and nasal consonants," Psychol. Monographs: Gen. Appl. 68, 1-13.

Malécot, A. (1956). "Acoustic cues for nasal consonants: An experimental study involving a tape-splicing technique," Language 32, 274-284.

Millar, J., Dermody, P., Harrington, J., and Vonwiller, J. (1990a). "A national cluster of spoken language databases for Australia," in Proceedings of the 3rd Australian International Conference on Speech Science and Technology, edited by R. Seidl (Australian Speech Science and Technology Association, Canberra), pp. 440-445.

Millar, J., Dermody, P., Harrington, J., and Vonwiller, J. (1990b). "A national database of spoken language; concept, design, and implementation," in Proceedings of the International Conference on Spoken Language Processing (ICSLP-90) (Kobe, Japan).

Mitchell, A. G., and Delbridge, A. (1965). The Speech of Australian Adolescents (Angus and Robertson, Sydney).

Nakata, K. (1959). "Synthesis and perception of nasal consonants," J. Acoust. Soc. Am. 31, 661-666.

Nord, L. (1976). "Perceptual experiments with nasals," Speech Transmission Laboratory Quarterly Progress and Status Report 2/3, 5-8.

O'Shaughnessy, D. (1987). Speech Communication (Addison-Wesley, Reading, MA).

Recasens, D. (1983). "Place cues for nasal consonants with special reference to Catalan," J. Acoust. Soc. Am. 73, 1346-1353.

Repp, B. (1986). "Perception of the [m]-[n] distinction in CV syllables," J. Acoust. Soc. Am. 79, 1987-1999.

Repp, B. (1987). "On the possible role of auditory short-term adaptation in perception of the prevocalic [m]-[n] contrast," J. Acoust. Soc. Am. 82, 1525-1538.

Repp, B., and Svastikula, K. (1988). "Perception of the [m]-[n] distinction in VC syllables," J. Acoust. Soc. Am. 83, 237-247.

Seitz, P. F., McCormick, M. M., Watson, I. M. C., and Bladon, R. A. (1990). "Relational spectral features for place of articulation in nasal consonants," J. Acoust. Soc. Am. 87, 351-358.

Selkirk, E. O. (1982). "The syllable," in The Structure of Phonological Representations Part II, edited by H. van der Hulst and N. Smith (Foris, Dordrecht).

Strange, W. (1989a). "Evolving theories of vowel perception," J. Acoust. Soc. Am. 85, 2081-2087.

Strange, W. (1989b). "Dynamic specification of coarticulated vowels spoken in sentence context," J. Acoust. Soc. Am. 85, 2135-2153.

Zee, E. (1981). "Effect of vowel quality on perception of post-vocalic nasal consonants in noise," J. Phon. 9, 35-48.

Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.
