the perception of the vowel continuum in british and us...
TRANSCRIPT
Papers from the Lancaster University Postgraduate Conference in Linguistics
and Language Teaching, Vol 11–12. Papers from LAEL PG 2016–2017
Edited by Márton Petykó, Olena Rossi, and Shuo Yu
© 2018 by the author
76
The Perception of the Vowel Continuum in British and US English
Speakers
Chad Hall
Michigan State University (USA)
Abstract
In this paper, the perception of the /æ/-/ɛ/ vowel continuum was analysed in
British and United States English speakers by testing their word
identification across the pan-pen continuum. A clear difference was found
between the two speaker groups, with the U.S. speakers continuing to
perceive ‘pan’ beyond the British speakers, presumably due to /æ/-tensing in
U.S. dialects, particularly before nasal codas. It was found that the amount of
/æ/-tensing across phonetic environments in a U.S. speaker’s dialect as well
as their exposure to British English affected how they perceived the
continuum. The results prove Bell-Berti’s (1979) argument that speech
production and perception are closely related, and the steep drop in
perception from ‘pan’ to ‘pen’ displayed by both speaker groups may prove
that vowel perception is categorical, in contrast to popular opinion, though a
discrimination task would have to be run before any reliable claim can be
made.
Keywords: Vowels, Perception, U.S. English, British English,
Sociolinguistics, Sociophonetics, Dialectology
Chad Hall
77
1. Introduction
1.1. British and United States English Short-a Pronunciation
The production of short-a by British and United States speakers is one of the key linguistic
features that differentiates the dialects of these two countries. In England during the Great
Vowel Shift, the short-a or TRAP vowel was fronted and raised to [æ] (Lass 2000). Then in
the early twentieth century, the short-a vowel in Britain was lowered again to a fully open [a]
sound (Wells 1997).
Across the United States, short-a configuration is complex. U.S. speakers follow a system
of /æ/-tensing (fronting and raising) which varies depending on dialect, according to Labov et
al. (2006).
The first system is the nasal system where there is a wide acoustic separation between
vowels that occur before nasals and vowels that do not. Short-a is more tensed before nasals
than before other consonants in this system. The nasal system is concentrated in New England,
New Jersey outside the New York City area, and across the Midland, particularly the large
cities of Pittsburgh, Columbus and Indianapolis.
The second is the raised /æ/ system where all short-a vowels are tensed. They then
develop a second mora and glide inwards ([ɛə], [eə], [iə]). It is one of the triggering events of
the Northern Cities Vowel Shift and is dominant in the Inland Northern area of the United
States. Labov found that for many speakers in this area, the second mora consists of a second
steady state instead of an inglide. As a result, the short-a tends to break in to two morae of
equal length, with one in mid front position and the other in low front or central position.
The third system of /æ/-tensing in the U.S. is the Southern breaking system that occurs in
Southern American English. This is when the vowel begins in a low front position, is then
followed by a [j] and then returns to a position not far acoustically from the origin. It is most
favoured by nasal codas, with /n/ showing the highest percentage of breaking.
The fourth is the short-a split system which occurs in New York City and the Mid-
Atlantic. In New York City, short-a is tensed before voiceless fricatives, voiced stops and
nasals whereas in the Mid-Atlantic, short-a is tensed before nasals and voiceless fricatives
apart from /ʃ/. It is referred to as a split system because the variations of short-a in these areas
can become distinct phonemes.
The final system is the continuous short-a system, the most common short-a configuration
in the West and the Midland. It is a continuum of allophones of the short-a vowel from low
The Perception of the Vowel Continuum in British and US English Speakers
78
front to mid position. The most conservative phonetic environments where short-a is lowest is
before voiceless velars, and the most advanced where short-a is highest is before nasal codas.
From these five systems, it can be deduced that for all speakers in the United States, short-a
undergoes a process of tensing before nasals in closed syllables.
What is clear to see from previous research is that short-a production is markedly
different between British and United States English Speakers, with British speakers far more
likely to produce a more open short-a vowel than U.S. speakers, particularly before nasal
codas. This experiment will seek to find if this difference in short-a production between
British and United States English speakers is reflected in their perception of the /æ/-/ɛ/ vowel
continuum, specifically the pan-pen continuum.
1.2. The Relationship between Vowel Production and Perception
Before being able to make a hypothesis on how British and United States speakers will
perceive the pan-pen continuum, it is first necessary to consider the relationship between a
speaker’s vowel production and that same speaker’s perception. There are numerous studies
that support the claim that an individual’s production and perception are closely related. Bell-
Berti et al. (1979) found that speaker differences in the production of the /i/ vowel were
strongly linked to differences in their perception of /i/. This finding was taken as evidence for
a shared mechanism mediating the production and perception of vowels. There are less
sociolinguistic investigations on the relationship between vowel production and perception,
though there is one notable study that is closely related to this current experiment in terms of
its aim, the subjects investigated and the experimental methods implemented. Fridland &
Kendall (2012) examined how speakers in the U.S. perceived the /e/-/ɛ/ vowel continuum.
Southern speakers, who typically display /e/ centralization and /ɛ/ peripheralization due to the
Southern Vowel Shift (Feagin 1986; Labov et al. 2006; Thomas 2001) sustained a longer /e/
perception along the continuum than Westerners and Northerners, whose vowel production
along this continuum are not dissimilar. Results also suggested that vowel perception depends
on both the speaker’s own production of speech and what that speaker is exposed to in their
region. For example, Southerners who actively engaged in the Southern Vowel Shift displayed
a longer /e/ perception than Southerners who did not, though these non-shifters still displayed
a longer /e/ perception than Northerners and Westerners, presumably due to exposure to vowel
shifters in their region. It is therefore important in this experiment to consider that a speaker’s
Chad Hall
79
dialect may not be similar to those that live in their region, which could influence their
perception of the /æ/-/ɛ/ continuum. Unfortunately however, due to time constraints, speakers
were not recorded in this study and assumptions had to be made on the participant’s dialect
based on region of origin.
It is also worth noting that if a U.S. speaker has spent an extended period of time in the
U.K., their perceptual boundaries may have shifted towards a more British perception of the
/æ/-/ɛ/ continuum. As will be seen in the “Participants” section, many of the U.S. speakers in
this experiment lived in the U.K. for at least eight months, and thus results may be affected.
Literature supports this proposal: Clarke & Garret (2004) and Nygaard (et al. 2005) found that
listener perception adapts rapidly to foreign-accented speech while Clarke & Luce (2005)
discovered that listeners shift their Voice Onset Time categorization boundary for stop
consonants to match a speaker’s production after less than two minutes with that speaker.
Furthermore, Clopper & Pisoni (2007) and Sumner & Samuel (2009) (as well as Fridland
& Kendall (2012)) have found that speaker perception depends on the production of the
individual and the dialect of the speakers around them.
Based on the findings of previous research, it is hypothesized that in this experiment,
speakers from the U.S. who in most cases display /æ/-tensing before nasal codas will maintain
a longer ‘pan’ perception along the pan-pen continuum than British speakers whose
production of short-a before nasal codas is more open, close to the cardinal vowel /a/.
Additionally, in the case that a U.S. speaker has spent a significant amount of time in the
United Kingdom, it is possible that the speaker’s pan-pen perception will be affected. If the
perception of the pan-pen continuum is proven to be different between British and U.S.
speakers, it would support the claim of Bell-Berti et al. (1979) that a speaker’s production and
perception are correlated.
1.3. The Perception and Categorization of Vowels
Liberman et al. (1957) was the first to introduce the concept that speech sounds, in particular
consonants, are perceived categorically. In his experiment, 14 synthetic stimuli were produced
that varied along a particular acoustic continuum, being the direction and extent of the second
formant (F2) transition. The first formant (F1) was kept consistent with a rising transition, a
marker for voiced stops (Delattre et al. 1955). It was found that in an identification task,
participants displayed clear categorical boundaries between the consonants /b/, /d/ and /g/. The
shift in perception from one consonant to another along the continuum was abrupt, indicating
The Perception of the Vowel Continuum in British and US English Speakers
80
that the phoneme boundaries were stable and sharp. In one-step, two-step and three-step
discrimination tasks, participants found it easier to discriminate stimuli that lay on either side
of a phoneme boundary than stimuli within the same category. Figure 1 shows the results of
Liberman’s identification and discrimination experiments.
Figure 1: The results of Liberman’s experiment. (Liberman et al. 1957)
Fry et al. (1962) proposed that the boundaries between vowel phonemes are less sharply
defined than consonants. In his vowel phoneme identification task, speakers displayed a
gradual shift in the perception of one vowel to another compared to Liberman’s experiment
where shifts were sudden. Moreover, speakers performed consistently above chance in the
one-step, two-step and three-step discrimination tasks, meaning that participants were able to
discriminate vowel stimuli even a step apart, regardless of where they lay in relation to
phoneme boundaries. Figures 2 and 3 show the results of Fry’s identification and
discrimination experiment.
Chad Hall
81
Figure 2: The results of Fry’s identification experimen (Fry et al. 1962)
Figure 3: The result of Fry’s discrimination experiment ( Fry et al.1962)
If previous literature is to be assumed true, participants in this study would be expected to
show a more continuous, rather than categorical, perception of the /æ/-/ɛ/ vowel continuum in
this experiment. However, should speakers collectively show an abrupt shift from perceiving
‘pan’ to ‘pen’, it could be argued that perhaps vowel categorization is sharper than previously
believed. This experiment however cannot prove that vowels are perceived categorically since
The Perception of the Vowel Continuum in British and US English Speakers
82
a discrimination task would have to be run in order to provide any credible arguments on
vowel perception. Due to time constraints, this was not possible.
2. Method
2.1. Participants
Fourteen participants took part in the study. Seven were native speakers of Standard Southern
British English aged 21-35 (mean = 24.7 years), two of the seven British speakers were female
and five were male. They were all raised in the South of England and had lived in Oxford for
at least eight months. The other seven participants were native speakers of United States
English aged 18-21 (mean = 20.4 years). Two of them were male and five were female. They
were all raised in the U.S.; two in Massachusetts, one in Michigan, one in Maryland, one in
Iowa, one who had lived for over three years in Massachusetts and Illinois, and one who
moved frequently between Massachusetts, North Carolina and Pennsylvania. They had all
lived in Oxford for at least eight months, apart from one speaker, participant 1, who had been
living in the country for 4 days on the day of her experiment. Table 1 shows the U.S.
participants, their region of origin, and assumed dialect based on region according to Labov et
al. (2006).
Table 1: Names of the U.S. speakers, their region of origin and assumed dialect
(Labov et al. 2006)
Participant Region of Origin Assumed Dialect
1 Highland and Livonia,
Michigan
Inland Northern American
English
2 Middleborough,
Massachusetts
Northeastern New
England English
3 Worcester, Massachusetts Northeastern New
England English
4 Hagerstown, Maryland Western Pennsylvania
English
5 Des Moines, Iowa Midland American English
6 - Sherborn, Massachusetts - Northeastern New
Chad Hall
83
2.2 Materials
Each participant took part in a word identification task, listening to 100 synthetic stimuli
ranging across the pan-pen continuum (10 unique stimuli multiplied by 10 and played in
random order). The stimuli were created with the IPOX Speech Synthesizer (Dirksen &
Coleman 1995) which is able to generate synthetic words using a phonemic input. Initially
two sound files were created; a “pan” .wav file, which would be stimulus 1, and a “pen” .wav
file, which would be stimulus 10. The files were 16-bit sound files sampled at 11025Hz. The
/æ/ vowel in stimulus 1 had an F1 value of 838 Hz, an F2 value of 1560 Hz, an F3 value of
2430 Hz and an F4 value of 3300 Hz. The /ɛ/ vowel in stimulus 10 had an F1 value of 620 Hz,
an F2 value of 1660 Hz, an F3 value of 2430 Hz and an F4 value of 3300 Hz. Eight more
stimuli were then created that moved in equal steps across the acoustic continuum from
stimulus 1 to 10. This process was carried out by creating a parameter file of stimulus 1 and
10 through IPOX, duplicating the stimulus 1 parameter file eight times and then adjusting the
formant values so that there were 10 parameter files moving in uniform steps from stimulus 1
to 10. These parameter files were then converted back to .wav files, giving 10 sound files
ranging across the pan-pen continuum. These sounds were then embedded in to a word
identification task, run on a computer using a Praat script (Boersma & Weenink 2016). The
script played the ten sound files five times in random order, giving 50 stimuli, asking the
listener to identify whether they heard the word “pan” or “pen”, to which they would respond
by pressing a button on the keyboard with their right hand. The script was then run a second
time with the participant using their left hand, meaning each listener heard 100 stimuli.
- Charlotte, North
Carolina
- Kennett Square,
Pennsylvania
England English
- Southern American
English
- Mid-Atlantic American
English
7 - Martha’s Vineyard,
Massachusetts
- Chicago, Illinois
- Southeastern New
England English
- Inland Northern
American English
The Perception of the Vowel Continuum in British and US English Speakers
84
Participants were required to answer with their right hand in the first block and then their left
hand for the second block to reduce any bias towards a particular answer since there is
evidence that language dominance in the right or left hemisphere of the brain is linked to
handedness (Knecht et al. 2000).
2.3. Procedure
Before the task begun, each participant was required to fill out a questionnaire on his or her
language background in order to confirm that they were native English speakers and to extract
information on where they were raised in the U.K. or U.S. The participant was then sat in a
soundproof room in front of a computer screen and keyboard, and was told that they would
hear a block of fifty words through headphones that would sound like the word “pan” or
“pen”. After hearing each sound, they would have to identify which word they heard. The
screen showed the word “pan” on the left hand side and “pen” on the right as shown in Figure
4, and participants were required to press the letter “Q” on the keyboard with their right index
finger if they heard the word “pan” and the letter “W” with their right middle finger if they
heard the word “pen”.
Figure 4: A screenshot of the first block of the identification task
It was specified to the participant that there was no right or wrong answer to each stimulus and
that they must choose their answer as quickly as possible, in order to force the participant in to
acting on instinct as to which word they heard, reflecting normal communication as
realistically as possible. Once the block was over, participants then re-sat the test, this time by
pressing the letter “Q” with their left middle finger if they heard the word “pen” and the letter
“W” with their left index finger if they heard the word “pan”. Moreover, in the second block,
Chad Hall
85
“pen” was on the left hand side of the screen and “pan” was on the right as shown in Figure 5.
A shorter block of 25 stimuli was run before the main experiment in order for participants to
familiarize themselves with the task and to test sound volume.
Figure 5: A screenshot of the second block of the identification task
2.4. Analysis
The responses of each participant were collected to a table on Praat and saved as a comma-
separated file. Responses that had reaction times above 1.5 seconds were removed due to the
fact that they often yielded unusual responses and were thus deemed to not be instinctive.
Since the majority of responses were below 1.5 seconds, it seemed suitable to remove these
anomalies from the final results. Additionally, responses with reaction times below 0.16
seconds were removed since it is not possible to respond to visual stimuli faster than this
(Kosinski 2008). Answers given below this reaction time indicate that a speaker answered too
soon before cognitively processing what they heard. In total, 62 responses were omitted from
the results out of 1400. The percentage of “pan” and “pen” responses were calculated for each
individual and then for both speaker groups.
3. Results
Figure 6 and Table 2 show the results of the experiment for the British and United States
English speaker groups. In Figure 6, across the x-axis is stimulus 1-10, and across the y-axis is
the overall percentage of “pan” responses to each stimulus. The blue line represents the British
English speaker group and the red line represents the United States English speaker group.
The Perception of the Vowel Continuum in British and US English Speakers
86
Figure 6: The percentage of “pan” responses by the British and U.S. speaker groups (Graph)
Table 2: The percentage of “pan” responses by the British and U.S. speaker groups
Stimulus
1 2 3 4 5 6 7 8 9 10
Speaker
Group
British
English 100% 100% 97.1% 93.1% 54.9% 12.4% 4.9% 0% 1.4% 0%
U.S.
English 100% 97.4% 98.7% 85.5% 42.9% 27.6% 11.8% 13.9% 3.8% 2.7%
The results show that from stimulus 1-4, both the British English and United States English
speaker groups performed above chance, identifying the word “pan” over 85% of the time. At
stimulus 5, there was a steep drop in the perception of “pan” for both groups who performed at
chance: The British speakers identified “pan” 54.9% of the time and the U.S. speakers
identified “pan” 42.9% of the time. By stimulus 6, both speaker groups performed above
chance, with the British speakers identifying “pan” 12.4% of the time (meaning they identified
“pen” 87.6% of the time) and the U.S. speakers identifying “pan” 27.6% of the time (meaning
Chad Hall
87
they identified “pen” 72.4% of the time). At stimulus 6-8, the United States English speaker
group displayed a significantly higher percentage of “pan” responses than the British English
speaker group (15.2% more at stimulus 6, 6.9% more at stimulus 7 and 13.9% more at
stimulus 8) and at stimulus 7-8, the United States English speaker group still responded with
“pan” over 10% of the time (11.8% and 13.9% respectively) while less than 5% of the British
English responses were “pan” (4.9% and 0% respectively). At stimulus 9-10, only 1 out of 139
responses by all British speakers was “pan”, while 5 out of 155 U.S. speaker responses were
“pan”. The main finding from these results is that both the British English and United States
English speaker groups sharply dropped in their perception of “pan” at stimulus 5, but from
stimulus 6 onwards, United States English speakers perceived “pan” significantly more than
the British English speakers.
It is crucial to examine the results of the individual U.S. participants to investigate if
particular speakers influenced the U.S. group results considerably from stimulus 6 onwards.
Additionally, one should analyze the variation within the British English speaker group to
ensure that the overall pattern seen in the British group is consistent amongst the individual
speakers. Table 3 and Figure 7 shows the results of the individual U.S. speakers. There
appears to be one U.S. speaker who consistently identified “pan” well above the group
average across stimulus 6, 7 and 8. Participant 1 identified “pan” 50% of the time at stimulus
6 (22.4% above the average), 44.4% of the time at stimulus 7 (32.6% above the average), and
20% of the time at stimulus 8 (6.2% above the average).
Table 3: The percentage of “pan” responses by the individual U.S. speakers
1 2 3 4 5 6 7 8 9 10
1 100% 100% 100% 100% 42.9% 50% 44.4% 20% 20% 0%
2 100% 90% 100% 70% 11.1% 10% 0% 0% 0% 0%
3 100% 100% 100% 100% 57.1% 44.4% 11.1% 0% 0% 0%
4 100% 90% 100% 90% 50% 30% 0% 30% 0% 0%
5 100% 100% 100% 80% 20% 0% 0% 0% 0% 10%
6 100% 100% 100% 70% 60% 20% 20% 20% 10% 20%
7 100% 100% 90% 87.5% 33.3% 22.2% 10% 20% 0% 0%
The Perception of the Vowel Continuum in British and US English Speakers
88
Figure 7: The percentage of “pan” responses by the individual U.S. speakers (Graph)
However, removing participant 1’s results does not dramatically alter the results for stimulus 6
(24.2%), 7 (7.5%) and 8 (13.0%): The U.S. English speaker group still displays a markedly
higher percentage of “pan” responses than the British speaker group. Each U.S. speaker
identified “pan” above the group average at some point from stimulus 6-10, apart from
participant 2 who always identified “pan” below the average from stimulus 6 onwards. This
indicates that beyond stimulus 5, almost the entire U.S. speaker group perceived “pan” more
than the British English group whose “pan” perception never increased beyond 5%. From
stimulus 7-10, only three of the seven British speakers perceived “pan” at any time with four
“pan” responses in total whereas six of the seven U.S. speakers perceived “pan” with 25 “pan”
responses in total. Thus it is clear that the United States English speaker group perceived
“pan” significantly more than the British English speaker group in the second half of the pan-
pen continuum. Table 4 and Figure 8 show the results of the individual British English
speakers.
Chad Hall
89
Table 4: The percentage of “pan” responses by the individual British speakers
Figure 8: The percentage of “pan” responses by the Individual Speakers (Graph)
1 2 3 4 5 6 7 8 9 10
1 100% 100% 100% 89% 63% 20% 0% 0% 0% 0%
2 100% 100% 90% 90% 30% 10% 0% 0% 0% 0%
3 100% 100% 100% 100% 90% 50% 0% 0% 0% 0%
4 100% 100% 100% 100% 63% 11% 0% 0% 0% 0%
5 100% 100% 100% 90% 44% 0% 10% 0% 10% 0%
6 100% 100% 90% 89% 80% 0% 10% 0% 0% 0%
7 100% 100% 100% 100% 40% 20% 11% 0% 0% 0%
The Perception of the Vowel Continuum in British and US English Speakers
90
The table and graph confirm that there was not much inter-group variation within the British
Group. Apart from speaker 3 whose drop in “pan” perception starts at stimulus 6, every other
speaker’s “pan” perception drops at stimulus 5, indicating that the pattern shown in Figure 6
for the British speakers is uniform across six out of seven British speakers. Furthermore,
beyond stimulus 6, the perception of “pan” across the listeners was minimal, unlike the U.S.
English group.
4. Discussion
The results show that both the British and U.S. speaker group’s drop in “pan” perception
occurred at the same stimulus. This was an unanticipated result since it was expected that the
/æ/-tensing before nasal codas by United States English speakers would cause them to
perceive “pan” for longer across the pan-pen continuum than the British English speakers.
However, it is worth noting that the U.S. participant who had the highest percentage of “pan”
responses at stimulus 6 and 7 (50% and 44.4% respectively) was participant 1 who had only
lived in England for four days at the time of the experiment, compared to the other
participants who had been living in England for at least eight months. It is possible that
extensive exposure to British English speakers may have caused the other six speakers to
adjust their perceptual boundary between /æ/ and /ɛ/, having a more British perception of the
pan-pen continuum than the U.S. speaker who had only lived in England for a limited time. It
is also possible that the U.S. speakers dropped in their “pan” perception earlier than expected
due to an unconscious assumption that they were listening to British speech. Since the
experiment was run by a British phonetician in England, U.S. participants may have assumed
that the synthetic speech they were listening to was British, and adjusted their perceptual
vowel boundaries accordingly. Niedzielski (1999) found that altering the national identity of a
speaker as indicated on an answer sheet changed the perception of vowel categories. Though
the national identity of the synthetic speech was never specified, the environment and location
of the experiment, as well as the presence of open British /æ/ in the stimuli may have led to a
subconscious assumption that the participant was listening to British speech.
Although the U.S. English speakers display a sharp drop in “pan” perception earlier than
expected, the drop was not as great as the British English speakers, and the U.S. speakers
continued to perceive “pan” significantly more than the British speakers to the end of the
continuum. This is likely due to two reasons: The first is that although the U.S. speakers may
Chad Hall
91
have shifted their perceptual boundaries, it is not a firmly established, fully shifted boundary
and at times, the U.S. speakers were still susceptible to hearing “pan” past stimulus 6. The
second reason is that in some U.S. dialects, /æ/ before nasal consonants is often tensed beyond
/ɛ/, towards the higher /e/ and even /ɪ/ (Labov et al. 2006). It is therefore unsurprising to
observe the U.S. speakers continuing to hear “pan” at the end of the /æ/-/ɛ/ continuum.
The individual variation of the U.S. participants can also be explained with relation to the
assumed dialect of each speaker. Speakers of dialects that tense /æ/ in more phonetic
environments displayed “pan” perception the most across the pan-pen continuum: The U.S.
speakers who perceived “pan” consistently at 10% or above from stimulus 1-8 were
participant 1, 6 and 7. According to Labov et al. (2006), participant 1 came from an area
where Inland Northern American English is spoken, a dialect with the raised /æ/ system where
all short-a vowels are tensed. Participant 7 also spent time in an area where Inland Northern
American English is spoken; Chicago, Illinois, and it appears to have affected her perception
of the /æ/-/ɛ/ continuum, causing it to be markedly different to the other speakers who lived in
Massachusetts, perceiving “pan” much later in the continuum than them. Participant 6 spent
four years in Charlotte, North Carolina where the Southern American English dialect is
prominent, and most /æ/ vowels are tensed due to the Southern Breaking system. She also
spent three years in Kennett Square, Pennsylvania, where Mid-Atlantic American English is
spoken. This dialect has a short-a split system where /æ/ is tensed before nasals and all
voiceless fricatives apart from /ʃ/. The other four speakers originated from areas with dialects
that have the nasal system where short-a is only tensed before nasals: Participant 2 and
participant 3 were raised in an area where Northeastern New England English is spoken,
participant 4 originated from a region where Western Pennsylvania English is spoken and
participant 5 came from an area where Midland American English dominates. These three
dialects have the nasal system which has the least occurrences of /æ/-tensing. It is therefore
likely that the more accustomed a listener was to /æ/-tensing across phonetic environments,
the more they perceived “pan” along the continuum.
The final point to discuss is whether any claim can be made that the vowel continuum was
perceived categorically. Compared to Fry’s et al. (1962) vowel identification results, the graph
from this experiment shows a steeper and more sudden shift in the perception of one vowel to
another. In his experiment (see Figure 2) there were 8 synthetic stimuli between the /ɪ/ and /ɛ/
vowel, and the decrease in /ɪ/ perception was far more gradual and continuous, decreasing
from 100% to 95%, then 80% to 70%, 50% to 48%, then 15% to just above 0% by stimulus 8.
The Perception of the Vowel Continuum in British and US English Speakers
92
The results of the current experiment are to the contrary, with a sharp drop in “pan”
perception, especially for the British English speaker group, falling from 93.1% at stimulus 4
to 12.4% at stimulus 6. There are a couple of possible reasons why the results of these two
studies differ: Firstly, in Fry’s experiment, participants were asked to identify phonemes, not
words. Sachs & Klatt (1969) found that the Phoneme Boundary Effect was observed across
the /ɑ/‐/æ/ continuum only in word context by U.S. speakers (“bottle” /bɑdɔl/ vs battle
/bædɔl/), meaning that subjects were able to discriminate stimuli in the middle of the
continuum easier than on the edges. In contrast, isolated vowel stimulus pairs were
differentiated equally well over the entire continuum. Their results indicated that the
perceptual boundaries between vowels are more defined in word context than in isolation.
Secondly, Fry’s results may contradict the findings of this experiment because the quality of
speech synthesis has improved since 1962. The machine that Fry used, “Alexander” was a
formant-type analogue speech synthesizer with four formant generators, of which only two
were used in his experiment (Fry et al. 1962). The IPOX speech synthesizer used in this study
generates six formant frequencies (including nasal zero frequency), five source parameters,
three formant bandwidths and six parallel branch amplitudes (Dirksen & Cole 1995), all of
which assist in making the speech sound more authentic.
Regardless if these elements factor in to the difference between the present study and
Fry’s paper, the results of this experiment suggest that perhaps vowel perception isn’t as
continuous as previously believed, motivated further by the fact that the patterns we see are
consistent across most of the individual speakers, though a discrimination task would have to
be run before any reliable claim can be made.
5. Conclusion
This study has found that the difference in short-a production between British and United
States English speakers, especially before nasals, leads to a contrast in the perception of the
pan-pen continuum between these two speaker groups, but not in the expected manner: The
U.S. speaker group did not collectively drop in their perception of “pan” later than the British
English speaker group, however after the initial fall in “pan” perception, the U.S. speakers still
perceived “pan” beyond the phoneme boundary, significantly more so than the British English
speakers, with speakers exposed to /æ/-tensing most frequently across phonetic environments
more likely to perceive “pan” a larger percentage of the time further along the continuum. The
Chad Hall
93
results confirm the findings of Bell-Berti et al. (1979) that speech production and perception
are closely related. It is also suggested in this study that extensive exposure to the British
English dialect by the majority of the U.S. speakers, and perhaps the subconscious assumption
that the synthetic speech was British, caused the U.S. speakers to shift their perceptual
boundary towards /æ/, though the greater percentage of “pan” perception by the U.S. speakers
at the later stages of the continuum suggest that these phoneme boundaries had not fully
shifted nor were they as sharply defined as the British speakers’ perceptual boundary. It would
thus be interesting to test U.S. speakers who have not been as exposed to British speakers to
determine if lack of exposure to British English does lead to having the expected higher
phoneme boundary, closer to /ɛ/. In turn it would be fascinating to investigate the pan-pen
perception of British English speakers who have been living in the U.S. for a considerable
period of time, especially in areas with the raised /æ/ system such as the Northern cities
(including Rochester New York, Cleveland Ohio, Detroit Michigan, Chicago Illinois and
Madison Wisconsin), to see if extensive exposure to the United States English dialect would
induce a shift in their phoneme boundary between /æ/ and /ɛ/ to a higher position, offering
more of an insight in to the effect of dialect exposure to speech perception. In addition, with
less strict of a time constraint, it would have been useful to record the speakers’ speech instead
of assuming the speaker’s dialect based on region of origin. By recording the speakers,
considerations could have been made on how the dialect of their region as well as their own
speech contributed to their perception of the /æ/-/ɛ/ continuum.
A necessary addition to add to this study is a discrimination task to explore the possibility
that vowels are perceived categorically, contrary to popular opinion (Fry et al. 1962). If it is
found that speakers do find it easier to discriminate the vowels towards the middle of the
continuum than the periphery in word context, the next step would be to test the
discrimination of these synthetic vowels outside of word context. Though previous literature
suggests that isolated vowels would be discriminated equally across the continuum and not
categorically (Fry et al. 1962; Sachs & Klatt 1969), an experiment run with better quality
speech synthesis could yield contrasting results, and the argument that vowels are perceived
continuously may yet be proved wrong.
The Perception of the Vowel Continuum in British and US English Speakers
94
References
Bell-Berti, F., Raphael, L.J., Pisoni, D.B. & Sawusch, J.R. (1979). Some relationships between speech
production and perception. Phonetica 36, 373-383.
Boersma, P & Weenink, D. (2016). Praat: doing phonetics by computer [Computer program]. Version
6.0.14, retrieved 27 February 2016 from http://www.praat.org/
Clarke, C.M. & Garrett, M.F. (2004). Rapid adaptation to foreign-accented English. Journal of the
Acoustical Society of America 116, 3647-3658.
Clarke, C.M. & Luce, P.A. (2005). Perceptual adaptation to speaker characteristics: VOT boundaries in
stop voicing categorization in Proceedings of the ISCA Workshop on Plasticity in Speech
Perception. London, U.K.
Clopper, C. & Pisoni, D. (2007). Free classification of regional dialects of American English. Journal
of Phonetics 35, 421-438.
Delattre, P.C., Liberman, A.M., & Cooper, F.S. (1955). Acoustic loci and transitional cues for
consonants. Journal of the Acoustical Society of America 27, 769-773.
Dirksen, A. (1995). IPOX Lecture Notes. [Online]. Available at: www.phon.ox.ac.uk/ipox_notes/
[Accessed: 28th May 2016].
Dirksen, A. & Coleman, J. (1995). IPOX Speech Synthesizer [Computer program]. Retrieved 10th April
2016 from www.phon.ox.ac.uk/ipox/
Feagin, C. (1986). More evidence for vowel change in the South in Sankoff D. (eds.), Diversity and
Diachrony (pp. 83-95). Amsterdam and Philadelphia : Benjamins Publishing Company.
Fridland, V. & Kendall T. (2012). Exploring the relationship between production and perception in the
mid front vowels of U.S. English. Lingua 122, 779-793.
Fry, D.B., Abramson, A.S., Eimas, P.D., Liberman, A.M. (1962). The identification and discrimination
of synthetic vowels. Lang Speech 5, 171-189.
Knecht, S., Dräger, B., Deppe, M., Bobe, L., Lohman, H., Flöel, A., Ringelstein, E.-B. & Henningsen,
H. (2000). Handedness and hemispheric language dominance in healthy humans. Brain 123, 2512-
2518.
Kosinski, R.J. (2008) A Literature Review on Reaction Time. Clemson University.
Labov, W., Ash, S. & Boberg, C. (2006). The short-a and short-o configurations. In Labov, W., Ash, S.
& Boberg (Eds.), The Atlas of North American English (pp. 169-181). Berlin: Mouton de Gruyter.
Lass, R. (2000). Phonology and Morphology. In Lass, R (Ed.), The Cambridge History of the English
Language, Volume III: 1476-1776 (pp. 56-186). Cambridge: Cambridge University Press.
Liberman, A.M., Harris, K.S., Hoffman, H.S. & Griffith, B.C. (1957). The discrimination of speech
sounds within and across phoneme boundaries. Journal of Experimental Psychology 54(5), 358-
368.
Chad Hall
95
Niedzielski, N. (1999). The effect of social information on the perception of sociolinguistic variables.
Journal of Language and Social Psychology 18, 62-85.
Nygaard, L., Sidaras, S. & Duke, J. (2005). Perceptual learning of accented speech in Proceedings of
the ISCA Workshop on Plasticity in Speech. London, U.K.
Sachs, R.M. & Klatt, D.H. (1969). Vowel identification in isolation and in word context. Journal of the
Acoustical Society of America 46, 115.
Sumner, M. & Samuel, A. (2009). The effect of experience on the perception and representation of
dialect variants. Journal of Memory and Language 60, 487-501.
Thomas, E. (2001). An acoustic analysis of vowel variation in New World English. Publication of the
American Dialect Society 85. Duke University, Durham, NC.
Wells, J.C. (1997). Whatever happened to Received Pronunciation? In Medina & Soto (eds.), II
Jornadas de Estudios Ingleses (pp. 19-28). Universidad de Jaén, Spain.