Errors in English Spoken Word Recognition:

Effects of Word Frequency, Familiarity, and Phoneme Structure

Yuka YAMAUCHI
Graduate School of Education, Hiroshima University

Kazuhito YAMATO
Kobe University

Shusaku KIDA
Hiroshima University

Abstract

The present study investigated the types of errors in English spoken word recognition by replicating an experiment reported by Field (2004), in which the top-down and bottom-up processes of second/foreign language (L2) listeners were investigated. In Field’s study, 47 L2 learners listened to a series of English words and were asked to write down the final words. The contexts for the target words were manipulated in a way designed to induce top-down listening, using a contextual word or words (e.g., wet, cloudy, dry, cold, got [hot]). The results of Field’s study showed that listeners were not affected by the top-down process, even when listening to semantically associated words. These results were dubious, however, because some limitations of the study may have affected the results. The present study therefore examined the effect of target word frequency, familiarity, and phoneme structure on Japanese-speaking English learners’ bottom-up and top-down listening processes by replicating Field’s study. The results showed that approximately 20%–50% of the listeners employed top-down processing when phonemes in the material were strictly controlled and the frequency and familiarity of substituted words were higher than those of the target words. These results indicated that Field’s material should be further modified and widely replicated.

1. Introduction

1.1 Processing Oral Input: Top-Down Versus Bottom-Up

When listening to a language, regardless of whether it is a first language (L1) or a second/foreign language (L2), listeners actively process oral input in two directions: top-down and bottom-up. A top-down process is a “meaning level of processing that originates in the listener’s memory,” whereas a bottom-up process is “linguistic level [processing], which originates in the speech signal” (Rost, 2011, p. 75). It is rare that listeners understand L2 input employing only a single direction of processing because incoming linguistic and/or nonlinguistic information will unconsciously activate information in listeners’ memories. In this respect, top-down processing serves as compensation in listening comprehension and must be supported by information gained from bottom-up processing (Bonk, 2000; Field, 2003; Graham, Santos, & Vanderplank, 2010; Rost, 2011; Tsui & Fullilove, 1998; Yi’an, 1998). Good L2 listeners flexibly and effectively use two-direction processing (O’Malley & Chamot, 1990). Poor listeners, however, sometimes rely heavily on incoming input at a lower processing level (Graham et al., 2010) because of their limited cognitive resources. On the other hand, L2 learners are not always confident in bottom-up processing because of their limited linguistic resources and skills and tend to use top-down processing more often (Chrabaszcz & Gor, 2014; Tsui & Fullilove, 1998). Field’s (2004) study suggested an interactive–compensatory mechanism between top-down and bottom-up processing of L2 learners.

1.2 Field’s Study of Top-Down and Bottom-Up Processing

Field (2004) conducted three experiments to reveal how listeners of English as a foreign language (EFL) process words, from the viewpoint of top-down and bottom-up processing, particularly in situations where top-down information conflicts with contextual oral input. The participants were 48 students from an EFL school in the UK (31 lower intermediate level and 17 high elementary level) who had never previously visited an English-speaking country. They spoke 11 different L1s. Three were Japanese speakers. In the first experiment, the participants were asked to write down the last word (i.e., the target word) of a series of orally presented English words (see the Appendix for complete materials). It was predicted that if L2 learners were using top-down processing, word clusters preceding a final target word (e.g., June, March, summer) should lead to a word recognition error, inducing a semantically related word (e.g., string, following that sequence, being written down as spring). The results showed that this type of word recognition error was found in only one series, namely June, March, summer, string, which triggered spring in place of string, and even then only at a low rate (17%). Field stated that “[n]o evidence was obtained of subjects reinterpreting what they had heard in order to fit it to the lexical set” (p. 371).

In Experiment 2, participants listened to orally presented sentences and wrote down a target word in each sentence. This target word was a high-frequency word. Although the results of Experiment 2 showed more evidence of top-down processing, it occurred in an extremely constrained context (e.g., The people at the party were Germans, Italians, Spanish, and some friends, to which 42% responded French/France). Experiment 3 replicated Experiment 2; however, in this experiment, the target word was a low-frequency word, resulting in 30% of respondents giving a substituted word.

Field (2004) concluded that “when a [target] word is unfamiliar, learners ... frequently choose to match what they hear with a known word, which is approximately similar” (p. 374). The results may also have indicated that L2 listeners paid more attention to word onset in order to identify a word. These results, however, invite discussion of some limitations of the experimental material, particularly in Experiment 1.

1.3 Limitations of Field’s (2004) Experiment 1

Although Field’s (2004) study may be interesting from both a research and a pedagogical perspective, at least three limitations exist, concerning the following factors: (a) frequency of target and substituted words, (b) phoneme similarities of the words, and (c) participants. First, the term top-down processing, triggered by contextual words in Field’s study, is a limited concept. Field acknowledged two interpretations of context: one is a semantic association between the words occurring with and shortly before a target and the other is various knowledge not actually mentioned in the content, such as world knowledge, discourse knowledge, and lexical knowledge. When hearing sounds as brief as a phoneme, listeners immediately and unconsciously search for a word in their memory. Some factors influence oral word recognition, such as the frequency of a word and the listener’s familiarity with it (Connine, Mullennix, Shernoff, & Yelen, 1990; Takashima, 2009). Field explained that all the words listed in Experiment 1 were within about the 1,000-word level. However, when checked against the spoken-language section of the British National Corpus (BNC), some of the words fell outside the 1,000-word level, although they were within that level in the written language corpus. In Experiment 3, which targeted low-frequency words, the frequency orders of some target words were quite close to those of the words in Experiment 1 (e.g., views within the 2,000-word level and source within the 3,000-word level). Thus, we should consider some target and substituted words in Experiment 1 as low-frequency words. Such relatively low-frequency words may have made word recognition difficult, which led to a “no evidence” conclusion (Field, 2004) for top-down processing.

Second, the target and the substituted words may not actually be similar to each other. Most English words have strong syllables at onset (Cutler & Carter, 1987), and native speakers of English tend to segment a word at the onset of a strong syllable (Cutler & Norris, 1988). While Field’s (2004) results also indicated that word onset was a key for word recognition, most target words in his experiment did not share the same or even a similar onset with their substituted words. Therefore, phoneme structures of target words, substituted words, and participant responses should have been fully investigated to determine whether the word choice was appropriate enough to activate top-down processing.

Finally, study participants included a wide range of nationalities/L1s in a small group (N = 48). This is partially related to the second point; we should expect an effect of L1 transfer in the perception of orally presented English words (Weber & Cutler, 2004). For example, Japanese-speaking learners of English would have difficulty in distinguishing /ʃ/, /s/, and /θ/, while Arabic-speaking learners would have difficulty in distinguishing vowels. Field’s study, however, contained only three Japanese-speaking learners out of 47 participants, and the results described in his study were not shown according to participant L1s. Thus, the sample size is not large enough to generalize the result to, or show an L1 effect for, Japanese learners of English. As a first step toward overcoming these limitations, we reran the experiment.

1.4 Research Questions

As shown in the literature review above, Field’s (2004) Experiment 1 revealed that the sequence of oral word input could not be shown to have induced top-down processing. However, some limitations remain, such as the frequency, familiarity, and phonemic structure of the words as well as the small number of participants with various L1s, all of which possibly affected the results of the study. The present study replicated Field’s experiment with a large group of participants who shared a single L1 (i.e., intermediate Japanese EFL learners), focusing on different perspectives of the top-down process (i.e., the frequency and familiarity of words stored in learners’ long-term memories). The following three research questions were addressed:

1) Do Japanese intermediate EFL learners respond in the same way as Field’s participants?
2) Do word frequency and familiarity have an effect on spoken word recognition?
3) Does phonetic similarity have an effect on the responses by Japanese EFL learners?

To respond to the first research question, we focused on participant responses to Q16, the sequence June, March, summer, string, comparing our results with those of Field (2004), as this is the only point discussed in Field’s Experiment 1. Following this, we further investigated error patterns of Japanese intermediate EFL learners, looking into features of the target words, substituted words, and other responses from the viewpoint of word frequency and familiarity (top-down) and phonetic similarities (bottom-up).

2. Method

2.1 Participants

The participants in the present study were 287 intermediate-level Japanese EFL learners. Of these, 39 did not know at least one target word and were therefore excluded from analysis. Thus, the number of participants in the analysis was 248. Most of them were in the first year of university. They had majors in science, math, engineering, medicine, and education; no participant had a major in English or any other foreign language. The main focus in the English classes they were taking was listening skills; thus, they had exposure to some listening techniques and had practiced this skill, mainly through dictation activities.

2.2 Material

The material of this study was the same vocabulary set as Field’s (2004) Experiment 1 (see Appendix). There were two sets of material, each including nine questions. Each question consisted of four to six words, which prevented the participants from listening only to the last (target) word. Set 1 consisted of three questions with one semantically associated (contextual) word (Q4, Q5, and Q9), two with four contextual words (Q2 and Q8), and four fillers (Q1, Q3, Q6, and Q7). Set 2 consisted of four questions with one contextual word (Q11, Q14, Q15, and Q18), two with three contextual words (Q13 and Q16), and three fillers (Q10, Q12, and Q17). Two-second pauses were embedded between words, and there was a pause of seven and a half seconds after each target word. The audio of the vocabulary sets was downloaded from the online OneLook Dictionary Search, which provides American and British pronunciation mp3 files based on MacMillan Dictionary. In the material, only American pronunciation was included, but the speakers varied. The clarity of the stimulus words was verified by a native speaker of English before the experiment. Five out of the 89 words were flagged as unclear or very unclear. These were replaced with recordings from Google, which were evaluated as clearer than the former versions.

The frequency of the words was ranked by BNC frequency order of spoken English (not lemmatized) (Leech, Rayson, & Wilson, n.d.). Individual letters were omitted from the BNC lists. Among the 18 target words and 11 substituted words (seven fillers did not have substituted words), 17 were ranked within the top 1,000 words, six were within the top 2,000, five fell outside the top 2,000, and one word was not available from the database list. Familiarity was based on Yokokawa (2009), which used a seven-point Likert scale ranging from 1 (I never see/hear it) to 7 (I often see/hear it very much). All target and substituted words were rated above 4.00; however, eight were not available from the database.
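
The frequency banding described above can be illustrated with a minimal Python sketch. The function name `frequency_band` and the band labels are hypothetical and are not taken from the study; the ranks are a handful of BNC spoken-frequency orders as reported in Table 1, with `None` standing for a word that is not available from the list.

```python
from typing import Dict, Optional


def frequency_band(rank: Optional[int]) -> str:
    """Map a BNC spoken-frequency rank to the bands used in the text.

    Hypothetical helper for illustration only; `None` means the word
    was not available from the frequency list.
    """
    if rank is None:
        return "not available"
    if rank <= 1000:
        return "within 1,000"
    if rank <= 2000:
        return "within 2,000"
    return "beyond 2,000"


# A few ranks as reported in Table 1 (target and substituted words).
ranks: Dict[str, Optional[int]] = {
    "got": 35, "hot": 818,
    "clean": 1447, "green": 873,
    "string": 3774, "spring": 2591,
    "fork": None,
}

bands = {word: frequency_band(rank) for word, rank in ranks.items()}
for word, band in bands.items():
    print(f"{word:>7}: {band}")
```

Under this banding, string and spring both fall beyond the 2,000-word level, whereas clean is within 2,000 and green within 1,000.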

2.3 Procedure

The experiment was conducted in October and November 2014. Participants were in groups of around 40, and each took one of the material sets (122 received Set 1, while the remaining 126 received Set 2). The material was presented to the whole group of participants through a loudspeaker located at the front of the classroom. Thus, the experiment was not administered individually. The participants were instructed that (a) they would hear nine sets of four to six words, with a pause of seven and a half seconds after the last word, (b) they were expected to write down the last word during the pause, and (c) word classes varied. Following this, a practice session was carried out using the oral input below:

Practice
No. 1, apple (two seconds), top (two seconds), girl (two seconds), box (seven and a half seconds).
No. 2, dark (two seconds), see (two seconds), keep (two seconds), park (two seconds), enjoy.
(Target words, underlined in the original, are the final word of each sequence: box and enjoy.)

After responding to the main trial session and checking the answers, the participants were asked to indicate unknown words. When scoring, we marked homophones as correct responses (e.g., meet for meat, right for write) but did not allow any spelling mistakes. In addition, as noted in section 2.1, answer sheets that included a word the participant did not know were excluded from analysis.
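
The scoring rule above (homophones accepted, misspellings rejected) can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries `ACCEPTED` and `SUBSTITUTES` and the function `classify` are hypothetical names, the homophone sets are limited to examples mentioned in the text, and trimming whitespace and lowercasing the response is our assumption rather than a documented step of the procedure.

```python
# Accepted spellings per target word: the target plus its homophones.
# Illustrative subset only, based on examples given in the text.
ACCEPTED = {
    "meat": {"meat", "meet"},
    "night": {"night", "knight", "nite"},
    "string": {"string"},
}

# Substituted (semantically induced) words per target, with homophones.
SUBSTITUTES = {
    "meat": {"feet"},
    "night": {"write", "right", "rite"},
    "string": {"spring"},
}


def classify(target: str, response: str) -> str:
    """Sort one written response into target / sub / other / blank."""
    answer = response.strip().lower()  # assumption: case and spaces ignored
    if not answer:
        return "blank"
    if answer in ACCEPTED[target]:
        return "target"
    if answer in SUBSTITUTES[target]:
        return "sub"
    return "other"  # includes misspellings and nonwords


print(classify("meat", "meet"))      # homophone of the target
print(classify("string", "spring"))  # substituted word
print(classify("string", "strng"))   # spelling mistake, not accepted
```

The four return values correspond to the Target, Sub, Other, and Blank columns used when reporting response rates.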

3. Results

3.1 Accuracy Rates

The accuracy rate of all questions (k = 18) was 68% and that of targeted questions (k = 11) was 62%; however, the accuracy rate of each item varied, ranging from 17% to 98% (Table 1). The highest accuracy rate was for Q4 night/knight/nite (94%, 2%, and 2%, respectively), followed by Q5 meat/meet (30% and 61%, respectively) and Q11 tell (90%). On the other hand, the lowest accuracy rate was for Q16 string (17%), followed by Q8 clean (45%) and Q15 think (50%). With regard to the substituted words, no participants answered with substituted words for seven of the questions, as shown in Table 1. Few did for Q9 and Q18 (4% and 9%, respectively). For the two lowest-accuracy questions, Q16 and Q8, however, substituted words were given more often: more than half of the responses to string (53%) were spring, and 20% of the responses to clean were green. The Other column in Table 1 covers any responses other than the target words, the substituted words, and blanks; thus, many nonwords were included in those response rates.

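Table 1
The Response Rates for Target Words, Substituted Words, and Other Words

| No. | Target/Sub | Response rate: Target | Response rate: Sub | Response rate: Other | Response rate: Blank | Frequency order: Target | Frequency order: Sub | Familiarity: Target | Familiarity: Sub |
|---|---|---|---|---|---|---|---|---|---|
| Q2 | got/hot | 50% | 0% | 45% | 5% | 35 | 818 | n.a. | 5.60 |
| Q4 | night/write | 98% | 0% | 1% | 1% | 239 | 437 | 6.65 | (5.78) |
| Q5 | meat/feet | 91% | 0% | 8% | 1% | 1,570 | 904 | 5.79 | (4.34) |
| Q8 | clean/green | 45% | 20% | 34% | 1% | 1,447 | 873 | 5.18 | 5.95 |
| Q9 | hat/cat | 74% | 4% | 20% | 2% | 1,957 | 1,245 | 4.68 | 5.00 |
| Q11 | tell/sell | 90% | 0% | 9% | 1% | 178 | 685 | 4.83 | n.a. |
| Q13 | talk/fork | 88% | 0% | 11% | 1% | 347 | n.a. | 5.56 | n.a. |
| Q14 | quite/right | 63% | 0% | 36% | 2% | 141 | 79 | 5.74 | 5.78 |
| Q15 | think/drink | 50% | 0% | 49% | 1% | 46 | 1,034 | 5.35 | 5.04 |
| Q16 | string/spring | 17% | 53% | 27% | 3% | 3,774 | 2,591 | 5.51 | 6.45 |
| Q18 | wait/late | 71% | 9% | 21% | 0% | 511 | 785 | 6.21 | 6.58 |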

Note. Sub = Substituted words; n.a. = not available from the database. Response rates are sums of homophones (e.g., meat = meat + meet). Figures in parentheses refer to data from homophones (e.g., rite for write, cell for sell).

Table 1 also shows that when participants responded with substituted words (i.e., spring and green), target words (i.e., string and clean) were relatively low-frequency words with lower familiarity than the substituted words. The two highest-frequency words got and think, however, had low accuracy rates. In addition, meat and hat were relatively low-frequency words; however, they had high accuracy rates. In the next section, those words will be further discussed from the viewpoint of frequency order, familiarity, and phoneme similarities with incorrect responses.

3.2 Frequency Order and Familiarity of the Responded Words

Table 2 shows target words, substituted words, and other typical responses. As shown in the table, the two substituted words spring and green, which often replaced the target words string and clean, were higher in frequency order and familiarity than the target words. Nonetheless, the extremely high-frequency words got and think also remained at a lower level of accuracy. Other responses and the expected response to got were relatively high-frequency words within the 1,000-word level, such as god and hot; the responses to think, however, even included words from over the 2,000-word level (e.g., thin). Meanwhile, some relatively low-frequency words, such as meat and hat, were answered correctly. In this respect, it may be misleading to describe meat as a low-frequency word because two-thirds of the correct answers are accounted for by its homophone meet, which is within the 1,000-word level. Finally, although the frequency order and familiarity of hat are not particularly high, Table 2 shows that its alternatives are also low, or even lower, in frequency.

Despite the results for frequency order and familiarity in the two cases string/spring and clean/green, we must admit that the frequency and familiarity of alternatives were not always greater than those of the target words. For instance, the second most frequent responses for clean and think were cream and sing, whose frequency order and familiarity were lower than those of the target and substituted words. Judging from the typical responses shown in Table 2, those errors seem to be influenced by phonetic similarities with the target words. We therefore need to review the phonetic similarities and/or differences between the target words and the perception errors, including substituted words.
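
The comparison in this section can be made concrete with a short Python sketch. It uses the Table 1 values for the six items where both frequency order and familiarity are available for the target and the substituted word; the tuple layout and the helper name `sub_is_favored` are hypothetical, and this is an illustration rather than the analysis actually performed in the study.

```python
# (item, target, sub, sub_rate_%, target_rank, sub_rank, target_fam, sub_fam)
# Values as reported in Table 1; items with "n.a." or homophone-based
# (parenthesized) figures are left out.
ITEMS = [
    ("Q8", "clean", "green", 20, 1447, 873, 5.18, 5.95),
    ("Q9", "hat", "cat", 4, 1957, 1245, 4.68, 5.00),
    ("Q14", "quite", "right", 0, 141, 79, 5.74, 5.78),
    ("Q15", "think", "drink", 0, 46, 1034, 5.35, 5.04),
    ("Q16", "string", "spring", 53, 3774, 2591, 5.51, 6.45),
    ("Q18", "wait", "late", 9, 511, 785, 6.21, 6.58),
]


def sub_is_favored(target_rank, sub_rank, target_fam, sub_fam):
    """True if the substituted word is both more frequent (a smaller
    rank number) and more familiar than the target word."""
    return sub_rank < target_rank and sub_fam > target_fam


favored = [item[0] for item in ITEMS if sub_is_favored(*item[4:])]
rates = {item[0]: item[3] for item in ITEMS}

for q in favored:
    print(f"{q}: substituted word favored, substitution rate {rates[q]}%")
```

In this subset, the substituted word is favored on both measures for Q8, Q9, Q14, and Q16, yet the substitution rates range from 0% (Q14) to 53% (Q16), which is consistent with the point that frequency and familiarity alone do not account for the errors.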

3.3 Phonetic Features Among the Target Words and the Errors

The material presented consisted of one-syllable words; thus, the similarities among target words and mistranscribed words (errors) can be examined from the standpoint of onset, vowel, and offset. Among typical mistakes with regard to the onsets of the target words, the most robust involved the consonant clusters /str/ in string and /kl/ in clean: 61% of responses were mistranscribed using /spr/, /spl/, or /skr/ for /str/, and 49% of the responses started with /kr/ or /ɡr/ for /kl/. The second most difficult onset was /θ/ in think; mistranscribed onsets included /s/ (22%), as in sing, sin, and sink.

Table 2 Typical Responses and Their Response Rates, Frequency Order, and Familiarity

Response                               Rate   Frequency   Familiarity

Low-accuracy, low frequency
(Substituted) spring /sprɪŋ/            53%       2,591          6.45
(Target) string /strɪŋ/                 17%       3,774          5.51
stream /strim/                           9%        n.a.          5.50

(Target) clean /klin/                   45%       1,447          5.18
cream /krim/                            25%       1,570          4.80
(Substituted) green /ɡrin/              20%         873          5.95

Low-accuracy, high frequency
(Target) got /ɡɒt/                      50%          35          n.a.
god /ɡɒd/                               21%         316          5.59
gat /ɡæt/                                7%        n.a.          n.a.
(Substituted) hot /hɒt/                  0%         818          5.60

(Target) think /θɪŋk/                   50%          46          5.35
sing /sɪŋ/                              17%       1,367          3.90
thing /θɪŋ/                             15%         131          3.14
thin /θɪn/                              10%       2,691          3.70
(Substituted) drink /drɪŋk/              0%       1,034          5.04

High-accuracy, low frequency
(Homophone of target) meet /mit/        61%         798          n.a.
(Target) meat /mit/                     30%       1,570          5.79
neat /nit/                               2%        n.a.          n.a.
mint /mɪnt/                              2%        n.a.          n.a.
mit /mɪt/                                2%        n.a.          n.a.
(Substituted) feet /fit/                 0%         904          n.a.

(Target) hat /hæt/                      77%       1,981          4.68
(Substituted) cat /kæt/                  4%       1,267          6.58
pad /pæd/                                4%       4,268          n.a.
pat /pæt/                                4%       2,424          n.a.

With regard to perception errors in vowels, participants did not properly distinguish /e/, /æ/, /ɒ/, and /ɔ/ as in got, perceiving them interchangeably. They also tended to mishear and mistranscribe /ɪ/ for /i/ in meat/meet.

In their perception of word offsets, participants made typical mistakes such as /m/ for /ŋ/ and /n/ as in string and clean, respectively, /ŋ/ for /ŋk/ as in think, and /d/ for /t/ as in hat. As Field noted, both native and nonnative speakers tend to trust the onset of a word rather than its coda. In the same vein, Japanese EFL learners in this study tended to pick up the onset of words and then proceeded to perceive, or in some cases infer, vowels and word offsets.

4. Discussion

The present study addressed three research questions and investigated EFL listeners’ responses to orally presented English words. In this section, we compare our results to Field’s (2004) and discuss types of errors from the perspectives of top-down and bottom-up processing.

First, we would like to focus on the results of Q16 (June, March, summer, string) because this is the only item reported in Field’s (2004) article. Compared with his results, a much larger percentage of participants gave this substitution in our study (53% vs. Field’s 17%). One possible causal factor for this difference is the English proficiency of the participants: we had a monolingual group of intermediate-level learners, as opposed to the high-elementary and lower-intermediate learners in the multilingual group of Field’s study. Thus, this result may provide evidence that poor/beginner listeners process L2 input at a very low level (Graham et al., 2010), being poor at connecting immediate input with preceding input. However, the two participant groups cannot be exactly compared unless both groups take a common standardized test; therefore, future research is expected to compare different proficiency groups.

Second, the types of errors showed that when the frequency order and familiarity of the substituted words are higher than those of the target words, participants are more likely to respond with substituted words. Put simply, the response rates for meat and meet are evidence that a high-frequency word is activated ahead of a low-frequency word. This study further revealed that higher-frequency words could be activated ahead of a lower-frequency target word that was actually played. The result supported previous findings showing that word frequency is a factor affecting oral word recognition (Connie et al., 1990; Takashima, 2009) and implied that the material should be more precisely arranged, taking this factor into account.

Finally, phonetic features also had an effect on word recognition. In this study, various patterns of responses other than the target and the substituted words occurred, including nonwords. Such responses indicate that participants wrote something using sound information as a clue, even if they could not understand the word. The target and substituted words, as Field (2004) noted, were all minimal pairs with the same vowel and offset. Thus, only the word onsets differed, and their features varied from pair to pair. In some combinations, for instance, both onsets are voiceless but do not share the same manner and place of articulation (e.g., talk/fork and hat/cat), whereas some pairs share no feature (e.g., think/drink and meat/feet). The pair that shares the most phoneme similarities is string/spring, differing only in the place of articulation of /t/ and /p/ (i.e., alveolar and bilabial). The
second most mistranscribed substituted word, green, also shares some features with the target word clean: a velar stop preceding an alveolar sound. Not only the substituted words but also many other responses reflect features that Japanese EFL learners tend to mishear, such as /l/ and /r/ and various vowels. In addition, more careful consideration should be given to verifying the effect of consonant clusters such as /spr/ and /str/; in this case, knowledge of phonotactics may have affected participants’ responses. We can infer from these results on frequency, familiarity, and phoneme effects that L2 listeners are more likely to respond with the substituted words (a) when the phoneme structure, particularly the word onset, is similar to that of the target words and (b) when the frequency or familiarity is high. Although there is a top-down effect, it seems that phonetic similarity is a necessary requirement.

As discussed above, several factors remain to be considered in developing Field’s (2004) material; moreover, Field’s result, where there is “no evidence” of the participants’ top-down reinterpretation of a heard word in a meaningful set of words, was not truly evaluated but was a consequence of wrongly selected material. In other words, although Field’s Experiments 2 and 3 showed that listeners reinterpreted unfamiliar input as phonetically similar known words, this process could have been demonstrated even in Experiment 1, with a word set alone, if the material had been appropriately controlled.

For the development of better material, we must consider effects from both top-down and bottom-up processing. For example, we can compare the occurrence probabilities of a substituted word such as green under two conditions: one that includes preceding contextual words (e.g., orange, black, red, blue, clean) and one without contextual words (e.g., thin, name, catch, cup, easy, clean).

The recognition of a known word is a difficulty for L2 listening comprehension (Goh, 2000; Graham, 2006). Although skill in spoken word recognition is closely related to bottom-up phoneme perception skill, language teachers and learners should remember that difficulties also result from top-down prediction, even at the word-set level. Prediction, like all other types of top-down processing, may facilitate or debilitate the comprehension of following input. L2 listeners, however, are often unable to modify previous understandings when a discourse breaks down (Field, 2008); thus, misunderstandings from preceding input may be a crucial difficulty for L2 listening comprehension.

Although we have noted some factors that may have affected the results of Field’s (2004) study and suggested appropriate modification of the material, two limitations should be further investigated after modifying the material set. First, we were not able to compare the proficiency level of participants with a standardized test. Future research with a wider range of proficiency levels among learners will reveal whether higher-proficiency learners really process input at a higher semantic level or whether skilled perception overtakes top-down processing. Second, we were not able to judge whether participants realized the meaningful connections between the substituted words and contextual words. This could be another point for the material development but surely needs to be included in future studies.
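
The two-condition comparison proposed in this section (a word set with preceding contextual words versus one without) could be analyzed along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, the two conditions are assumed to be answered by independent groups of listeners, and a pooled two-proportion z statistic is swapped in as one conventional way of comparing occurrence probabilities. The source does not specify any test, and no data are supplied here.

```python
import math


def substitution_rate(n_substituted: int, n_responses: int) -> float:
    """Proportion of responses that gave the substituted word (e.g., green)."""
    if n_responses <= 0:
        raise ValueError("n_responses must be positive")
    return n_substituted / n_responses


def two_proportion_z(sub_context: int, n_context: int,
                     sub_control: int, n_control: int) -> float:
    """Pooled two-proportion z statistic.

    Positive values mean the substituted word occurred more often in the
    condition with contextual words than in the condition without them.
    Assumes the two conditions were answered by independent groups.
    """
    p_context = substitution_rate(sub_context, n_context)
    p_control = substitution_rate(sub_control, n_control)
    pooled = (sub_context + sub_control) / (n_context + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_context + 1 / n_control))
    if se == 0:
        raise ValueError("z is undefined when the pooled rate is 0 or 1")
    return (p_context - p_control) / se
```

If the same listeners completed both conditions, a paired analysis would be more appropriate than this independent-groups statistic.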

5. Conclusion

The present study replicated an experiment from Field’s (2004) study. The results differed somewhat from Field’s: the responses of Japanese EFL learners showed a greater influence of top-down processing. In addition, we discussed this topic from the viewpoints of frequency and familiarity (top-down) and phoneme structure (bottom-up). The results revealed that both top-down and bottom-up processing had effects on the spoken word recognition of Japanese EFL listeners even with a contextual word set. In other words, listeners reinterpret input on condition that the target and substituted words are phonetically similar for Japanese learners and the frequency or familiarity of the substituted words is high. The material should be carefully reconsidered, and more replications are expected to reveal L2 listeners’ word recognition processes.

References

Bonk, W. J. (2000). Second language lexical knowledge and listening comprehension. International Journal of Listening, 14, 14–31. doi:10.1080/10904018.2000.10499033

Chrabaszcz, A., & Gor, K. (2014). Context effects in the processing of phonolexical ambiguity in L2. Language Learning, 64, 415–455. doi:10.1111/lang.12063

Connie, C. M., Mullennix, J., Shernoff, E., & Yelen, J. (1990). Word familiarity and frequency in visual and auditory word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 1084–1096. doi:10.1037//0278-7393.16.6.1084

Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2, 133–142. doi:10.1016/0885-2308(87)90004-0

Cutler, A., & Norris, D. G. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113–121. doi:10.1037/0096-1523.14.1.113

Field, J. (2003). Promoting perception: Lexical segmentation in L2 listening. ELT Journal, 57, 325–334. doi:10.1093/elt/57.4.325

Field, J. (2004). An insight into listeners’ problems: Too much bottom-up or too much top-down? System, 32, 363–377. doi:10.1016/j.system.2004.05.002

Field, J. (2008). Revising segmentation hypotheses in first and second language listening. System, 36, 35–51. doi:10.1016/j.system.2007.10.003

Leech, G., Rayson, P., & Wilson, A. (n.d.). The frequency lists. In G. Leech, P. Rayson, & A. Wilson (2001). Word frequencies in written and spoken English: Based on the British National Corpus. London, UK: Longman. Retrieved from http://ucrel.lancs.ac.uk/bncfreq/

Goh, C. (2000). A cognitive perspective on language learners’ listening comprehension problems. System, 28, 55–75. doi:10.1016/S0346-251X(99)00060-3

Graham, S. (2006). Listening comprehension: The learners’ perspective. System, 34, 165–182. doi:10.1016/j.system.2005.11.001

Graham, S., Santos, D., & Vanderplank, R. (2010). Strategy clusters and sources of knowledge in French L2 listening comprehension. Innovation in Language Learning and Teaching, 4, 1–20. doi:10.1080/17501220802385866

O’Malley, J. M., & Chamot, A. U. (1990). Learning strategies in second language acquisition. New York, NY: Cambridge University Press.

OneLook Dictionary Search. (n.d.). Retrieved from http://onelook.com/

Rost, M. (2011). Teaching and researching listening (2nd ed.). Edinburgh, UK: Pearson Education.

Takashima, H. (2009). Comparing ease-of-processing values of the same set of words for native English speakers and Japanese learners of English. Journal of Psycholinguistic Research, 38, 549–572. doi:10.1007/s10936-009-9118-2

Tsui, A. B. M., & Fullilove, J. (1998). Bottom-up or top-down processing as a discriminator of L2 listening performance. Applied Linguistics, 19, 432–451. doi:10.1093/applin/19.4.432

Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word recognition. Journal of Memory and Language, 50, 1–25. doi:10.1016/S0749-596X(03)00105-0

Yi’an, W. (1998). What do tests of listening comprehension test?: A retrospection study of EFL test-takers performing a multiple-choice task. Language Testing, 15, 21–44. doi:10.1177/026553229801500102

Yokokawa, H. (2009). Nihonjin Eigo Gakushusha no Eitango shinmitsudo onsei hen: Kyoiku/Kenkyu no tameno dainigengo deta besu [English word familiarity of Japanese English learners oral edition: Second language data base for education and research.] Tokyo: Kuroshio Shuppan.

Appendix

Set 1
Q1   jump big pen give take
Q2   wet cloudy dry cold got (hot)
Q3   tired end live cheap cupboard shelf
Q4   walk earn read night (write)
Q5   look shirt heavy hands meat (feet)
Q6   quiet bag push phone short tall
Q7   shoe broke angry car train
Q8   orange black red blue clean (green)
Q9   knife earth child dog hat (cat)

Set 2
Q10  thin name catch cup easy hard
Q11  friend ill lake buy tell (sell)
Q12  aunt man same hole drive fly
Q13  plate cup knife talk (fork)
Q14  high sorry small near wrong quite (right)
Q15  light time new key eat think (drink)
Q16  June March summer string (spring)
Q17  ten hurry sharp bag case
Q18  old young early wait (late)

Note. Italicized words construct meaningful connections. Parentheses show substituted words. Question numbers of Set 2 were presented as Q1-9 to the participants.
