the perception of the vowel continuum in british and us...

Papers from the Lancaster University Postgraduate Conference in Linguistics

and Language Teaching, Vol 11–12. Papers from LAEL PG 2016–2017

Edited by Márton Petykó, Olena Rossi, and Shuo Yu

© 2018 by the author

76

The Perception of the Vowel Continuum in British and US English

Speakers

Chad Hall

Michigan State University (USA)

Abstract

In this paper, the perception of the /æ/-/ɛ/ vowel continuum was analysed in

British and United States English speakers by testing their word

identification across the pan-pen continuum. A clear difference was found

between the two speaker groups, with the U.S. speakers continuing to

perceive ‘pan’ beyond the British speakers, presumably due to /æ/-tensing in

U.S. dialects, particularly before nasal codas. It was found that the amount of

/æ/-tensing across phonetic environments in a U.S. speaker’s dialect as well

as their exposure to British English affected how they perceived the

continuum. The results prove Bell-Berti’s (1979) argument that speech

production and perception are closely related, and the steep drop in

perception from ‘pan’ to ‘pen’ displayed by both speaker groups may prove

that vowel perception is categorical, in contrast to popular opinion, though a

discrimination task would have to be run before any reliable claim can be

made.

Keywords: Vowels, Perception, U.S. English, British English,

Sociolinguistics, Sociophonetics, Dialectology

Chad Hall

77

1. Introduction

1.1. British and United States English Short-a Pronunciation

The production of short-a by British and United States speakers is one of the key linguistic

features that differentiates the dialects of these two countries. In England during the Great

Vowel Shift, the short-a or TRAP vowel was fronted and raised to [æ] (Lass 2000). Then in

the early twentieth century, the short-a vowel in Britain was lowered again to a fully open [a]

sound (Wells 1997).

Across the United States, short-a configuration is complex. U.S. speakers follow a system

of /æ/-tensing (fronting and raising) which varies depending on dialect, according to Labov et

al. (2006).

The first system is the nasal system where there is a wide acoustic separation between

vowels that occur before nasals and vowels that do not. Short-a is more tensed before nasals

than before other consonants in this system. The nasal system is concentrated in New England,

New Jersey outside the New York City area, and across the Midland, particularly the large

cities of Pittsburgh, Columbus and Indianapolis.

The second is the raised /æ/ system where all short-a vowels are tensed. They then

develop a second mora and glide inwards ([ɛə], [eə], [iə]). It is one of the triggering events of

the Northern Cities Vowel Shift and is dominant in the Inland Northern area of the United

States. Labov found that for many speakers in this area, the second mora consists of a second

steady state instead of an inglide. As a result, the short-a tends to break in to two morae of

equal length, with one in mid front position and the other in low front or central position.

The third system of /æ/-tensing in the U.S. is the Southern breaking system that occurs in

Southern American English. This is when the vowel begins in a low front position, is then

followed by a [j] and then returns to a position not far acoustically from the origin. It is most

favoured by nasal codas, with /n/ showing the highest percentage of breaking.

The fourth is the short-a split system which occurs in New York City and the Mid-

Atlantic. In New York City, short-a is tensed before voiceless fricatives, voiced stops and

nasals whereas in the Mid-Atlantic, short-a is tensed before nasals and voiceless fricatives

apart from /ʃ/. It is referred to as a split system because the variations of short-a in these areas

can become distinct phonemes.

The final system is the continuous short-a system, the most common short-a configuration

in the West and the Midland. It is a continuum of allophones of the short-a vowel from low

The Perception of the Vowel Continuum in British and US English Speakers

78

front to mid position. The most conservative phonetic environments where short-a is lowest is

before voiceless velars, and the most advanced where short-a is highest is before nasal codas.

From these five systems, it can be deduced that for all speakers in the United States, short-a

undergoes a process of tensing before nasals in closed syllables.

What is clear to see from previous research is that short-a production is markedly

different between British and United States English Speakers, with British speakers far more

likely to produce a more open short-a vowel than U.S. speakers, particularly before nasal

codas. This experiment will seek to find if this difference in short-a production between

British and United States English speakers is reflected in their perception of the /æ/-/ɛ/ vowel

continuum, specifically the pan-pen continuum.

1.2. The Relationship between Vowel Production and Perception

Before being able to make a hypothesis on how British and United States speakers will

perceive the pan-pen continuum, it is first necessary to consider the relationship between a

speaker’s vowel production and that same speaker’s perception. There are numerous studies

that support the claim that an individual’s production and perception are closely related. Bell-

Berti et al. (1979) found that speaker differences in the production of the /i/ vowel were

strongly linked to differences in their perception of /i/. This finding was taken as evidence for

a shared mechanism mediating the production and perception of vowels. There are less

sociolinguistic investigations on the relationship between vowel production and perception,

though there is one notable study that is closely related to this current experiment in terms of

its aim, the subjects investigated and the experimental methods implemented. Fridland &

Kendall (2012) examined how speakers in the U.S. perceived the /e/-/ɛ/ vowel continuum.

Southern speakers, who typically display /e/ centralization and /ɛ/ peripheralization due to the

Southern Vowel Shift (Feagin 1986; Labov et al. 2006; Thomas 2001) sustained a longer /e/

perception along the continuum than Westerners and Northerners, whose vowel production

along this continuum are not dissimilar. Results also suggested that vowel perception depends

on both the speaker’s own production of speech and what that speaker is exposed to in their

region. For example, Southerners who actively engaged in the Southern Vowel Shift displayed

a longer /e/ perception than Southerners who did not, though these non-shifters still displayed

a longer /e/ perception than Northerners and Westerners, presumably due to exposure to vowel

shifters in their region. It is therefore important in this experiment to consider that a speaker’s

Chad Hall

79

dialect may not be similar to those that live in their region, which could influence their

perception of the /æ/-/ɛ/ continuum. Unfortunately however, due to time constraints, speakers

were not recorded in this study and assumptions had to be made on the participant’s dialect

based on region of origin.

It is also worth noting that if a U.S. speaker has spent an extended period of time in the

U.K., their perceptual boundaries may have shifted towards a more British perception of the

/æ/-/ɛ/ continuum. As will be seen in the “Participants” section, many of the U.S. speakers in

this experiment lived in the U.K. for at least eight months, and thus results may be affected.

Literature supports this proposal: Clarke & Garret (2004) and Nygaard (et al. 2005) found that

listener perception adapts rapidly to foreign-accented speech while Clarke & Luce (2005)

discovered that listeners shift their Voice Onset Time categorization boundary for stop

consonants to match a speaker’s production after less than two minutes with that speaker.

Furthermore, Clopper & Pisoni (2007) and Sumner & Samuel (2009) (as well as Fridland

& Kendall (2012)) have found that speaker perception depends on the production of the

individual and the dialect of the speakers around them.

Based on the findings of previous research, it is hypothesized that in this experiment,

speakers from the U.S. who in most cases display /æ/-tensing before nasal codas will maintain

a longer ‘pan’ perception along the pan-pen continuum than British speakers whose

production of short-a before nasal codas is more open, close to the cardinal vowel /a/.

Additionally, in the case that a U.S. speaker has spent a significant amount of time in the

United Kingdom, it is possible that the speaker’s pan-pen perception will be affected. If the

perception of the pan-pen continuum is proven to be different between British and U.S.

speakers, it would support the claim of Bell-Berti et al. (1979) that a speaker’s production and

perception are correlated.

1.3. The Perception and Categorization of Vowels

Liberman et al. (1957) was the first to introduce the concept that speech sounds, in particular

consonants, are perceived categorically. In his experiment, 14 synthetic stimuli were produced

that varied along a particular acoustic continuum, being the direction and extent of the second

formant (F2) transition. The first formant (F1) was kept consistent with a rising transition, a

marker for voiced stops (Delattre et al. 1955). It was found that in an identification task,

participants displayed clear categorical boundaries between the consonants /b/, /d/ and /g/. The

shift in perception from one consonant to another along the continuum was abrupt, indicating


80

that the phoneme boundaries were stable and sharp. In one-step, two-step and three-step

discrimination tasks, participants found it easier to discriminate stimuli that lay on either side

of a phoneme boundary than stimuli within the same category. Figure 1 shows the results of

Liberman’s identification and discrimination experiments.

Figure 1: The results of Liberman’s experiment. (Liberman et al. 1957)

Fry et al. (1962) proposed that the boundaries between vowel phonemes are less sharply

defined than consonants. In his vowel phoneme identification task, speakers displayed a

gradual shift in the perception of one vowel to another compared to Liberman’s experiment

where shifts were sudden. Moreover, speakers performed consistently above chance in the

one-step, two-step and three-step discrimination tasks, meaning that participants were able to

discriminate vowel stimuli even a step apart, regardless of where they lay in relation to

phoneme boundaries. Figures 2 and 3 show the results of Fry’s identification and

discrimination experiment.

Chad Hall

81

Figure 2: The results of Fry’s identification experimen (Fry et al. 1962)

Figure 3: The result of Fry’s discrimination experiment ( Fry et al.1962)

If previous literature is to be assumed true, participants in this study would be expected to

show a more continuous, rather than categorical, perception of the /æ/-/ɛ/ vowel continuum in

this experiment. However, should speakers collectively show an abrupt shift from perceiving

‘pan’ to ‘pen’, it could be argued that perhaps vowel categorization is sharper than previously

believed. This experiment however cannot prove that vowels are perceived categorically since


82

a discrimination task would have to be run in order to provide any credible arguments on

vowel perception. Due to time constraints, this was not possible.

2. Method

2.1. Participants

Fourteen participants took part in the study. Seven were native speakers of Standard Southern

British English aged 21-35 (mean = 24.7 years), two of the seven British speakers were female

and five were male. They were all raised in the South of England and had lived in Oxford for

at least eight months. The other seven participants were native speakers of United States

English aged 18-21 (mean = 20.4 years). Two of them were male and five were female. They

were all raised in the U.S.; two in Massachusetts, one in Michigan, one in Maryland, one in

Iowa, one who had lived for over three years in Massachusetts and Illinois, and one who

moved frequently between Massachusetts, North Carolina and Pennsylvania. They had all

lived in Oxford for at least eight months, apart from one speaker, participant 1, who had been

living in the country for 4 days on the day of her experiment. Table 1 shows the U.S.

participants, their region of origin, and assumed dialect based on region according to Labov et

al. (2006).

Table 1: Names of the U.S. speakers, their region of origin and assumed dialect

(Labov et al. 2006)

Participant Region of Origin Assumed Dialect

1 Highland and Livonia,

Michigan

Inland Northern American

English

2 Middleborough,

Massachusetts

Northeastern New

England English

3 Worcester, Massachusetts Northeastern New

England English

4 Hagerstown, Maryland Western Pennsylvania

English

5 Des Moines, Iowa Midland American English

6 - Sherborn, Massachusetts - Northeastern New

Chad Hall

83

2.2 Materials

Each participant took part in a word identification task, listening to 100 synthetic stimuli

ranging across the pan-pen continuum (10 unique stimuli multiplied by 10 and played in

random order). The stimuli were created with the IPOX Speech Synthesizer (Dirksen &

Coleman 1995) which is able to generate synthetic words using a phonemic input. Initially

two sound files were created; a “pan” .wav file, which would be stimulus 1, and a “pen” .wav

file, which would be stimulus 10. The files were 16-bit sound files sampled at 11025Hz. The

/æ/ vowel in stimulus 1 had an F1 value of 838 Hz, an F2 value of 1560 Hz, an F3 value of

2430 Hz and an F4 value of 3300 Hz. The /ɛ/ vowel in stimulus 10 had an F1 value of 620 Hz,

an F2 value of 1660 Hz, an F3 value of 2430 Hz and an F4 value of 3300 Hz. Eight more

stimuli were then created that moved in equal steps across the acoustic continuum from

stimulus 1 to 10. This process was carried out by creating a parameter file of stimulus 1 and

10 through IPOX, duplicating the stimulus 1 parameter file eight times and then adjusting the

formant values so that there were 10 parameter files moving in uniform steps from stimulus 1

to 10. These parameter files were then converted back to .wav files, giving 10 sound files

ranging across the pan-pen continuum. These sounds were then embedded in to a word

identification task, run on a computer using a Praat script (Boersma & Weenink 2016). The

script played the ten sound files five times in random order, giving 50 stimuli, asking the

listener to identify whether they heard the word “pan” or “pen”, to which they would respond

by pressing a button on the keyboard with their right hand. The script was then run a second

time with the participant using their left hand, meaning each listener heard 100 stimuli.

- Charlotte, North

Carolina

- Kennett Square,

Pennsylvania

England English

- Southern American

English

- Mid-Atlantic American

English

7 - Martha’s Vineyard,

Massachusetts

- Chicago, Illinois

- Southeastern New

England English

- Inland Northern

American English


84

Participants were required to answer with their right hand in the first block and then their left

hand for the second block to reduce any bias towards a particular answer since there is

evidence that language dominance in the right or left hemisphere of the brain is linked to

handedness (Knecht et al. 2000).

2.3. Procedure

Before the task begun, each participant was required to fill out a questionnaire on his or her

language background in order to confirm that they were native English speakers and to extract

information on where they were raised in the U.K. or U.S. The participant was then sat in a

soundproof room in front of a computer screen and keyboard, and was told that they would

hear a block of fifty words through headphones that would sound like the word “pan” or

“pen”. After hearing each sound, they would have to identify which word they heard. The

screen showed the word “pan” on the left hand side and “pen” on the right as shown in Figure

4, and participants were required to press the letter “Q” on the keyboard with their right index

finger if they heard the word “pan” and the letter “W” with their right middle finger if they

heard the word “pen”.

Figure 4: A screenshot of the first block of the identification task

It was specified to the participant that there was no right or wrong answer to each stimulus and

that they must choose their answer as quickly as possible, in order to force the participant in to

acting on instinct as to which word they heard, reflecting normal communication as

realistically as possible. Once the block was over, participants then re-sat the test, this time by

pressing the letter “Q” with their left middle finger if they heard the word “pen” and the letter

“W” with their left index finger if they heard the word “pan”. Moreover, in the second block,

Chad Hall

85

“pen” was on the left hand side of the screen and “pan” was on the right as shown in Figure 5.

A shorter block of 25 stimuli was run before the main experiment in order for participants to

familiarize themselves with the task and to test sound volume.

Figure 5: A screenshot of the second block of the identification task

2.4. Analysis

The responses of each participant were collected to a table on Praat and saved as a comma-

separated file. Responses that had reaction times above 1.5 seconds were removed due to the

fact that they often yielded unusual responses and were thus deemed to not be instinctive.

Since the majority of responses were below 1.5 seconds, it seemed suitable to remove these

anomalies from the final results. Additionally, responses with reaction times below 0.16

seconds were removed since it is not possible to respond to visual stimuli faster than this

(Kosinski 2008). Answers given below this reaction time indicate that a speaker answered too

soon before cognitively processing what they heard. In total, 62 responses were omitted from

the results out of 1400. The percentage of “pan” and “pen” responses were calculated for each

individual and then for both speaker groups.

3. Results

Figure 6 and Table 2 show the results of the experiment for the British and United States

English speaker groups. In Figure 6, across the x-axis is stimulus 1-10, and across the y-axis is

the overall percentage of “pan” responses to each stimulus. The blue line represents the British

English speaker group and the red line represents the United States English speaker group.


86

Figure 6: The percentage of “pan” responses by the British and U.S. speaker groups (Graph)

Table 2: The percentage of “pan” responses by the British and U.S. speaker groups

Stimulus

1 2 3 4 5 6 7 8 9 10

Speaker

Group

British

English 100% 100% 97.1% 93.1% 54.9% 12.4% 4.9% 0% 1.4% 0%

U.S.

English 100% 97.4% 98.7% 85.5% 42.9% 27.6% 11.8% 13.9% 3.8% 2.7%

The results show that from stimulus 1-4, both the British English and United States English

speaker groups performed above chance, identifying the word “pan” over 85% of the time. At

stimulus 5, there was a steep drop in the perception of “pan” for both groups who performed at

chance: The British speakers identified “pan” 54.9% of the time and the U.S. speakers

identified “pan” 42.9% of the time. By stimulus 6, both speaker groups performed above

chance, with the British speakers identifying “pan” 12.4% of the time (meaning they identified

“pen” 87.6% of the time) and the U.S. speakers identifying “pan” 27.6% of the time (meaning

Chad Hall

87

they identified “pen” 72.4% of the time). At stimulus 6-8, the United States English speaker

group displayed a significantly higher percentage of “pan” responses than the British English

speaker group (15.2% more at stimulus 6, 6.9% more at stimulus 7 and 13.9% more at

stimulus 8) and at stimulus 7-8, the United States English speaker group still responded with

“pan” over 10% of the time (11.8% and 13.9% respectively) while less than 5% of the British

English responses were “pan” (4.9% and 0% respectively). At stimulus 9-10, only 1 out of 139

responses by all British speakers was “pan”, while 5 out of 155 U.S. speaker responses were

“pan”. The main finding from these results is that both the British English and United States

English speaker groups sharply dropped in their perception of “pan” at stimulus 5, but from

stimulus 6 onwards, United States English speakers perceived “pan” significantly more than

the British English speakers.

It is crucial to examine the results of the individual U.S. participants to investigate if

particular speakers influenced the U.S. group results considerably from stimulus 6 onwards.

Additionally, one should analyze the variation within the British English speaker group to

ensure that the overall pattern seen in the British group is consistent amongst the individual

speakers. Table 3 and Figure 7 shows the results of the individual U.S. speakers. There

appears to be one U.S. speaker who consistently identified “pan” well above the group

average across stimulus 6, 7 and 8. Participant 1 identified “pan” 50% of the time at stimulus

6 (22.4% above the average), 44.4% of the time at stimulus 7 (32.6% above the average), and

20% of the time at stimulus 8 (6.2% above the average).

Table 3: The percentage of “pan” responses by the individual U.S. speakers

1 2 3 4 5 6 7 8 9 10

1 100% 100% 100% 100% 42.9% 50% 44.4% 20% 20% 0%

2 100% 90% 100% 70% 11.1% 10% 0% 0% 0% 0%

3 100% 100% 100% 100% 57.1% 44.4% 11.1% 0% 0% 0%

4 100% 90% 100% 90% 50% 30% 0% 30% 0% 0%

5 100% 100% 100% 80% 20% 0% 0% 0% 0% 10%

6 100% 100% 100% 70% 60% 20% 20% 20% 10% 20%

7 100% 100% 90% 87.5% 33.3% 22.2% 10% 20% 0% 0%


88

Figure 7: The percentage of “pan” responses by the individual U.S. speakers (Graph)

However, removing participant 1’s results does not dramatically alter the results for stimulus 6

(24.2%), 7 (7.5%) and 8 (13.0%): The U.S. English speaker group still displays a markedly

higher percentage of “pan” responses than the British speaker group. Each U.S. speaker

identified “pan” above the group average at some point from stimulus 6-10, apart from

participant 2 who always identified “pan” below the average from stimulus 6 onwards. This

indicates that beyond stimulus 5, almost the entire U.S. speaker group perceived “pan” more

than the British English group whose “pan” perception never increased beyond 5%. From

stimulus 7-10, only three of the seven British speakers perceived “pan” at any time with four

“pan” responses in total whereas six of the seven U.S. speakers perceived “pan” with 25 “pan”

responses in total. Thus it is clear that the United States English speaker group perceived

“pan” significantly more than the British English speaker group in the second half of the pan-

pen continuum. Table 4 and Figure 8 show the results of the individual British English

speakers.

Chad Hall

89

Table 4: The percentage of “pan” responses by the individual British speakers

Figure 8: The percentage of “pan” responses by the Individual Speakers (Graph)

1 2 3 4 5 6 7 8 9 10

1 100% 100% 100% 89% 63% 20% 0% 0% 0% 0%

2 100% 100% 90% 90% 30% 10% 0% 0% 0% 0%

3 100% 100% 100% 100% 90% 50% 0% 0% 0% 0%

4 100% 100% 100% 100% 63% 11% 0% 0% 0% 0%

5 100% 100% 100% 90% 44% 0% 10% 0% 10% 0%

6 100% 100% 90% 89% 80% 0% 10% 0% 0% 0%

7 100% 100% 100% 100% 40% 20% 11% 0% 0% 0%


90

The table and graph confirm that there was not much inter-group variation within the British

Group. Apart from speaker 3 whose drop in “pan” perception starts at stimulus 6, every other

speaker’s “pan” perception drops at stimulus 5, indicating that the pattern shown in Figure 6

for the British speakers is uniform across six out of seven British speakers. Furthermore,

beyond stimulus 6, the perception of “pan” across the listeners was minimal, unlike the U.S.

English group.

4. Discussion

The results show that both the British and U.S. speaker group’s drop in “pan” perception

occurred at the same stimulus. This was an unanticipated result since it was expected that the

/æ/-tensing before nasal codas by United States English speakers would cause them to

perceive “pan” for longer across the pan-pen continuum than the British English speakers.

However, it is worth noting that the U.S. participant who had the highest percentage of “pan”

responses at stimulus 6 and 7 (50% and 44.4% respectively) was participant 1 who had only

lived in England for four days at the time of the experiment, compared to the other

participants who had been living in England for at least eight months. It is possible that

extensive exposure to British English speakers may have caused the other six speakers to

adjust their perceptual boundary between /æ/ and /ɛ/, having a more British perception of the

pan-pen continuum than the U.S. speaker who had only lived in England for a limited time. It

is also possible that the U.S. speakers dropped in their “pan” perception earlier than expected

due to an unconscious assumption that they were listening to British speech. Since the

experiment was run by a British phonetician in England, U.S. participants may have assumed

that the synthetic speech they were listening to was British, and adjusted their perceptual

vowel boundaries accordingly. Niedzielski (1999) found that altering the national identity of a

speaker as indicated on an answer sheet changed the perception of vowel categories. Though

the national identity of the synthetic speech was never specified, the environment and location

of the experiment, as well as the presence of open British /æ/ in the stimuli may have led to a

subconscious assumption that the participant was listening to British speech.

Although the U.S. English speakers display a sharp drop in “pan” perception earlier than

expected, the drop was not as great as the British English speakers, and the U.S. speakers

continued to perceive “pan” significantly more than the British speakers to the end of the

continuum. This is likely due to two reasons: The first is that although the U.S. speakers may

Chad Hall

91

have shifted their perceptual boundaries, it is not a firmly established, fully shifted boundary

and at times, the U.S. speakers were still susceptible to hearing “pan” past stimulus 6. The

second reason is that in some U.S. dialects, /æ/ before nasal consonants is often tensed beyond

/ɛ/, towards the higher /e/ and even /ɪ/ (Labov et al. 2006). It is therefore unsurprising to

observe the U.S. speakers continuing to hear “pan” at the end of the /æ/-/ɛ/ continuum.

The individual variation of the U.S. participants can also be explained with relation to the

assumed dialect of each speaker. Speakers of dialects that tense /æ/ in more phonetic

environments displayed “pan” perception the most across the pan-pen continuum: The U.S.

speakers who perceived “pan” consistently at 10% or above from stimulus 1-8 were

participant 1, 6 and 7. According to Labov et al. (2006), participant 1 came from an area

where Inland Northern American English is spoken, a dialect with the raised /æ/ system where

all short-a vowels are tensed. Participant 7 also spent time in an area where Inland Northern

American English is spoken; Chicago, Illinois, and it appears to have affected her perception

of the /æ/-/ɛ/ continuum, causing it to be markedly different to the other speakers who lived in

Massachusetts, perceiving “pan” much later in the continuum than them. Participant 6 spent

four years in Charlotte, North Carolina where the Southern American English dialect is

prominent, and most /æ/ vowels are tensed due to the Southern Breaking system. She also

spent three years in Kennett Square, Pennsylvania, where Mid-Atlantic American English is

spoken. This dialect has a short-a split system where /æ/ is tensed before nasals and all

voiceless fricatives apart from /ʃ/. The other four speakers originated from areas with dialects

that have the nasal system where short-a is only tensed before nasals: Participant 2 and

participant 3 were raised in an area where Northeastern New England English is spoken,

participant 4 originated from a region where Western Pennsylvania English is spoken and

participant 5 came from an area where Midland American English dominates. These three

dialects have the nasal system which has the least occurrences of /æ/-tensing. It is therefore

likely that the more accustomed a listener was to /æ/-tensing across phonetic environments,

the more they perceived “pan” along the continuum.

The final point to discuss is whether any claim can be made that the vowel continuum was

perceived categorically. Compared to Fry’s et al. (1962) vowel identification results, the graph

from this experiment shows a steeper and more sudden shift in the perception of one vowel to

another. In his experiment (see Figure 2) there were 8 synthetic stimuli between the /ɪ/ and /ɛ/

vowel, and the decrease in /ɪ/ perception was far more gradual and continuous, decreasing

from 100% to 95%, then 80% to 70%, 50% to 48%, then 15% to just above 0% by stimulus 8.


92

The results of the current experiment are to the contrary, with a sharp drop in “pan”

perception, especially for the British English speaker group, falling from 93.1% at stimulus 4

to 12.4% at stimulus 6. There are a couple of possible reasons why the results of these two

studies differ: Firstly, in Fry’s experiment, participants were asked to identify phonemes, not

words. Sachs & Klatt (1969) found that the Phoneme Boundary Effect was observed across

the /ɑ/‐/æ/ continuum only in word context by U.S. speakers (“bottle” /bɑdɔl/ vs battle

/bædɔl/), meaning that subjects were able to discriminate stimuli in the middle of the

continuum easier than on the edges. In contrast, isolated vowel stimulus pairs were

differentiated equally well over the entire continuum. Their results indicated that the

perceptual boundaries between vowels are more defined in word context than in isolation.

Secondly, Fry’s results may contradict the findings of this experiment because the quality of

speech synthesis has improved since 1962. The machine that Fry used, “Alexander” was a

formant-type analogue speech synthesizer with four formant generators, of which only two

were used in his experiment (Fry et al. 1962). The IPOX speech synthesizer used in this study

generates six formant frequencies (including nasal zero frequency), five source parameters,

three formant bandwidths and six parallel branch amplitudes (Dirksen & Cole 1995), all of

which assist in making the speech sound more authentic.

Regardless if these elements factor in to the difference between the present study and

Fry’s paper, the results of this experiment suggest that perhaps vowel perception isn’t as

continuous as previously believed, motivated further by the fact that the patterns we see are

consistent across most of the individual speakers, though a discrimination task would have to

be run before any reliable claim can be made.

5. Conclusion

This study has found that the difference in short-a production between British and United

States English speakers, especially before nasals, leads to a contrast in the perception of the

pan-pen continuum between these two speaker groups, but not in the expected manner: The

U.S. speaker group did not collectively drop in their perception of “pan” later than the British

English speaker group, however after the initial fall in “pan” perception, the U.S. speakers still

perceived “pan” beyond the phoneme boundary, significantly more so than the British English

speakers, with speakers exposed to /æ/-tensing most frequently across phonetic environments

more likely to perceive “pan” a larger percentage of the time further along the continuum. The

Chad Hall

93

results confirm the findings of Bell-Berti et al. (1979) that speech production and perception

are closely related. It is also suggested in this study that extensive exposure to the British

English dialect by the majority of the U.S. speakers, and perhaps the subconscious assumption

that the synthetic speech was British, caused the U.S. speakers to shift their perceptual

boundary towards /æ/, though the greater percentage of “pan” perception by the U.S. speakers

at the later stages of the continuum suggest that these phoneme boundaries had not fully

shifted nor were they as sharply defined as the British speakers’ perceptual boundary. It would

thus be interesting to test U.S. speakers who have not been as exposed to British speakers to

determine if lack of exposure to British English does lead to having the expected higher

phoneme boundary, closer to /ɛ/. In turn it would be fascinating to investigate the pan-pen

perception of British English speakers who have been living in the U.S. for a considerable

period of time, especially in areas with the raised /æ/ system such as the Northern cities

(including Rochester New York, Cleveland Ohio, Detroit Michigan, Chicago Illinois and

Madison Wisconsin), to see if extensive exposure to the United States English dialect would

induce a shift in their phoneme boundary between /æ/ and /ɛ/ to a higher position, offering

more of an insight in to the effect of dialect exposure to speech perception. In addition, with

less strict of a time constraint, it would have been useful to record the speakers’ speech instead

of assuming the speaker’s dialect based on region of origin. By recording the speakers,

considerations could have been made on how the dialect of their region as well as their own

speech contributed to their perception of the /æ/-/ɛ/ continuum.

A necessary addition to add to this study is a discrimination task to explore the possibility

that vowels are perceived categorically, contrary to popular opinion (Fry et al. 1962). If it is

found that speakers do find it easier to discriminate the vowels towards the middle of the

continuum than the periphery in word context, the next step would be to test the

discrimination of these synthetic vowels outside of word context. Though previous literature

suggests that isolated vowels would be discriminated equally across the continuum and not

categorically (Fry et al. 1962; Sachs & Klatt 1969), an experiment run with better quality

speech synthesis could yield contrasting results, and the argument that vowels are perceived

continuously may yet be proved wrong.


94

References

Bell-Berti, F., Raphael, L.J., Pisoni, D.B. & Sawusch, J.R. (1979). Some relationships between speech

production and perception. Phonetica 36, 373-383.

Boersma, P & Weenink, D. (2016). Praat: doing phonetics by computer [Computer program]. Version

6.0.14, retrieved 27 February 2016 from http://www.praat.org/

Clarke, C.M. & Garrett, M.F. (2004). Rapid adaptation to foreign-accented English. Journal of the

Acoustical Society of America 116, 3647-3658.

Clarke, C.M. & Luce, P.A. (2005). Perceptual adaptation to speaker characteristics: VOT boundaries in

stop voicing categorization in Proceedings of the ISCA Workshop on Plasticity in Speech

Perception. London, U.K.

Clopper, C. & Pisoni, D. (2007). Free classification of regional dialects of American English. Journal

of Phonetics 35, 421-438.

Delattre, P.C., Liberman, A.M., & Cooper, F.S. (1955). Acoustic loci and transitional cues for

consonants. Journal of the Acoustical Society of America 27, 769-773.

Dirksen, A. (1995). IPOX Lecture Notes. [Online]. Available at: www.phon.ox.ac.uk/ipox_notes/

[Accessed: 28th May 2016].

Dirksen, A. & Coleman, J. (1995). IPOX Speech Synthesizer [Computer program]. Retrieved 10th April

2016 from www.phon.ox.ac.uk/ipox/

Feagin, C. (1986). More evidence for vowel change in the South in Sankoff D. (eds.), Diversity and

Diachrony (pp. 83-95). Amsterdam and Philadelphia : Benjamins Publishing Company.

Fridland, V. & Kendall T. (2012). Exploring the relationship between production and perception in the

mid front vowels of U.S. English. Lingua 122, 779-793.

Fry, D.B., Abramson, A.S., Eimas, P.D., Liberman, A.M. (1962). The identification and discrimination

of synthetic vowels. Lang Speech 5, 171-189.

Knecht, S., Dräger, B., Deppe, M., Bobe, L., Lohman, H., Flöel, A., Ringelstein, E.-B. & Henningsen,

H. (2000). Handedness and hemispheric language dominance in healthy humans. Brain 123, 2512-

2518.

Kosinski, R.J. (2008) A Literature Review on Reaction Time. Clemson University.

Labov, W., Ash, S. & Boberg, C. (2006). The short-a and short-o configurations. In Labov, W., Ash, S.

& Boberg (Eds.), The Atlas of North American English (pp. 169-181). Berlin: Mouton de Gruyter.

Lass, R. (2000). Phonology and Morphology. In Lass, R (Ed.), The Cambridge History of the English

Language, Volume III: 1476-1776 (pp. 56-186). Cambridge: Cambridge University Press.

Liberman, A.M., Harris, K.S., Hoffman, H.S. & Griffith, B.C. (1957). The discrimination of speech

sounds within and across phoneme boundaries. Journal of Experimental Psychology 54(5), 358-

368.

http://www.phon.ox.ac.uk/ipox_notes/

http://www.phon.ox.ac.uk/ipox/

Chad Hall

95

Niedzielski, N. (1999). The effect of social information on the perception of sociolinguistic variables.

Journal of Language and Social Psychology 18, 62-85.

Nygaard, L., Sidaras, S. & Duke, J. (2005). Perceptual learning of accented speech in Proceedings of

the ISCA Workshop on Plasticity in Speech. London, U.K.

Sachs, R.M. & Klatt, D.H. (1969). Vowel identification in isolation and in word context. Journal of the

Acoustical Society of America 46, 115.

Sumner, M. & Samuel, A. (2009). The effect of experience on the perception and representation of

dialect variants. Journal of Memory and Language 60, 487-501.

Thomas, E. (2001). An acoustic analysis of vowel variation in New World English. Publication of the

American Dialect Society 85. Duke University, Durham, NC.

Wells, J.C. (1997). Whatever happened to Received Pronunciation? In Medina & Soto (eds.), II

Jornadas de Estudios Ingleses (pp. 19-28). Universidad de Jaén, Spain.

the perception of the vowel continuum in british and us...

Documents