Computer Assisted Vocabulary Learning: Design and evaluation

Qing Ma (a)* and Peter Kelly (b)

(a) University of Louvain, Belgium; (b) Three Gorges University, China

This paper focuses on the design and evaluation of the computer-assisted vocabulary learning

(CAVL) software WUFUN. It draws on the current research findings of vocabulary acquisition and

CALL, aiming to help Chinese university students to improve their learning of English vocabulary,

particularly that with which they experience most difficulty. It is argued that vocabulary should be

learned explicitly as well as implicitly; learners need to be trained to become good learners, e.g., by

being instructed in useful learning strategies, to enable them to learn vocabulary more efficiently

and effectively. A design model of CALL efficacy is constructed to ensure the quality of vocabulary

learning in CALL programs; it is employed in the design of the software WUFUN. Finally, the

preliminary results of the software evaluation are reported and discussed.

Introduction

Vocabulary learning has always been a popular subject in CALL programs, especially

in the early stages of CALL (1980s) when technology was relatively simple and it was

thought that vocabulary learning could be easily integrated into CALL programs. The

earlier programs typically included a single type of language learning activity, such as

text reconstruction, gap-filling, speed-reading, simulation, and vocabulary games

(Levy, 1997). The range was narrow, probably because previously computers were

less powerful and language teachers did not have sufficient knowledge of

programming (Goodfellow, 1995). It might also have been due to the limited

number of vocabulary learning theories at a time when vocabulary learning was just

starting to attract people’s attention.

*Corresponding author. Rue Grafé 4, App. 206, Namur, 5000, Belgium.

Email: [email protected]

Computer Assisted Language Learning, Vol. 19, No. 1, February 2006, pp. 15–45

ISSN 0958-8221 (print)/ISSN 1744-3210 (online)/06/010015–31

© 2006 Taylor & Francis

DOI: 10.1080/09588220600803998

Nowadays, vocabulary learning is often viewed as a sub-component of a

multimedia package or a CALL program, particularly in commercialised materials.

Some researchers have tried to create CALL programs devoted to vocabulary learning

(Goodfellow, 1994; Groot, 2000; Boers, Eyckmans, & Stengers, 2004). One common

feature is situating vocabulary learning in context instead of treating it as an isolated

activity, as was the case before. Another important trend is for learners to be given as

much freedom as possible to choose what to learn and how to learn. However, this

could be problematic if learners do not know how to deal with the learning tasks and

use the software effectively. Too much freedom will sometimes adversely affect the

learning result. A way forward is for learners to be given some help to become ‘good

learners’—that is, to acquire sufficient knowledge about language learning and have

the ability to take charge of their own learning effectively and efficiently.¹ They can

thus benefit maximally from the freedom of learning.

In this article, we first review the literature on those approaches to vocabulary

learning and CALL programs that take into account vocabulary learning. This is

followed by an introduction to a CALL efficacy model, which aims to help and guide

the learner to the completion of learning tasks as a way to ensure the quality of a

CALL program. This CALL efficacy model is used to design the software WUFUN,

developed for Chinese university students to help them to learn vocabulary perceived

as difficult. A pilot study was carried out to evaluate a prototypical unit of the software

as well as to validate the CALL efficacy model empirically in two settings: individual

use and classroom use. Results are reported and discussed. Finally, possible

improvements regarding both the software design and future research are outlined.

Current Approaches to Vocabulary Learning

Approaches to vocabulary learning can be generally categorized under two broad

paradigms: the implicit and the explicit learning paradigm. In this article, the

meaning of ‘implicit’ and ‘explicit’ is not restricted to what they mean in ‘implicit

learning’ and ‘explicit learning’² in cognitive psychology; rather, the literal meanings

of the two words are used to refer to the main features associated with the two

paradigms. Implicit learning is associated with natural, effortless and meaning-

focused learning; explicit learning implies that learning requires more deliberate

mental effort than simply engaging in meaning-focused activities and that a link has to

be established between meaning and form by various means.

The Implicit Learning Paradigm

The basic assumption of the implicit learning paradigm is that words can be acquired

naturally through repeated exposure in various language contexts with reading as the

major source of input, a notion that is strongly supported by findings in respect of L1

vocabulary acquisition. Incidental learning is perhaps the most important feature of

this learning paradigm. It can be defined as the process of acquiring vocabulary and

grammar through meaning-focused communicative activities such as reading and

listening (Hulstijn, 2003, p. 349). Several studies support the implicit learning

paradigm. Krashen’s input hypothesis (1989, 1993) postulates that vocabulary can be

acquired by reading as long as the input is comprehensible to the learner. Nagy,

Herman, and Anderson (1985) hold the view that children acquire most L1 words

through reading and that they do so incidentally. In the same vein, Sternberg (1987,

p. 89), relying on studies in L1 acquisition, claims that ‘‘most vocabulary is learned

from context’’ by contextual guessing, although whether this process can take place

successfully or not depends on several ‘‘moderating variables’’ (pp. 92 – 94), such as

the density of unknown words; the learner may be overwhelmed by a large number of

unknown words with the result that no learning takes place.

The main problems with acquiring vocabulary incidentally in L2

acquisition seem to be attributable to three sources. First, incidental learning

inevitably involves a great deal of contextual guessing of the unknown words. Context

alone does not always facilitate meaning transfer; in some cases even educated adults

cannot infer the meaning of L1 words in context (Ames, 1966; Beck, McKeown, &

McCaslin, 1983, cited in Duquette, Renee, & Laurier, 1998). Second, as a

consequence, the learning rate is very low (see Hulstijn, 1992). According to Nation

(1990), 5 – 16 exposures are needed to fully acquire a word. This is implicitly

supported by Nagy et al. (1985) who reported a 5% – 15% probability of a word being

learned at first exposure; similarly, Knight (1994) demonstrated a learning rate of

5% – 21% from her studies, also for one exposure. Third, the vocabulary acquired

through incidental learning is mainly for recognition and hardly at all for production

(see Paribakht & Wesche, 1997; Wesche & Paribakht, 2000). This is due to the nature

of incidental learning: the main language activity is reading where the focus is on

meaning and content and only limited attention is paid to the lexical and syntactic

features of the new words. The quality and quantity of lexical processing in incidental

learning is simply insufficient to enable the learner to grasp the precise meanings and

correct usage of words that will lead to correct production.
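As a rough illustration of why a low per-exposure learning rate implies the need for many encounters, the following sketch computes the chance that a word has been picked up after several exposures. It assumes, purely for illustration, that each exposure is an independent event with a fixed probability of success; this simplification is ours and is not made in the studies cited. The function name `prob_learned` is hypothetical.

```python
def prob_learned(p_single: float, exposures: int) -> float:
    """Chance a word is learned within `exposures` encounters.

    Assumption (not from the cited studies): every encounter is an
    independent trial with the same success probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if exposures < 0:
        raise ValueError("exposures must be non-negative")
    # Complement of "never learned in any of the encounters".
    return 1.0 - (1.0 - p_single) ** exposures


if __name__ == "__main__":
    # Per-exposure rates of 5% and 15% (the range reported by Nagy et al., 1985)
    # combined with 1, 5 and 16 exposures (the bounds given by Nation, 1990).
    for p in (0.05, 0.15):
        for n in (1, 5, 16):
            print(f"p={p:.2f}, exposures={n:2d}: {prob_learned(p, n):.2f}")
```

Under this simplified assumption, a 5% rate yields only about a 23% chance of learning after five exposures, whereas a 15% rate exceeds 90% after sixteen. Real learning is unlikely to follow independent trials, so the figures indicate orders of magnitude only.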

The Explicit Learning Paradigm

Authors who adhere to this paradigm argue that vocabulary and vocabulary learning

strategies should be learned or taught explicitly so that learning can be more efficient.

They agree with upholders of incidental learning that context is the main source for

acquiring vocabulary, but they claim that learners need some extra help to build up an

adequate vocabulary and to acquire the strategies necessary to cope with the vast

reading context (see Coady, 1997). There are two main approaches in respect of the

explicit learning paradigm: explicit instruction and strategy instruction.

Authors who favour explicit instruction argue that learners should be taught

vocabulary explicitly by using various means including direct memorization

techniques (Coady, 1993; Nation, 1990, 2001). Here the concern is mainly with

low level learners who do not have enough vocabulary to read extensively. Nation

(2001) suggests that high frequency (2,000 word level) and low frequency vocabulary

should be treated differently. High frequency words have a high coverage (80%) of

text (p. 11) and should be mastered as soon as possible; this can be achieved by direct

teaching (teacher explanation, peer teaching), direct learning (using word cards,

consulting dictionaries), incidental learning (contextual guessing, communicative

activities) and planned encounters with the words (graded reading, vocabulary

exercises) (p. 16). As for the low frequency words, teachers should train learners to

use strategies such as contextual guessing, dictionary use, memory techniques and

vocabulary cards to cope with these words and to enlarge their vocabulary (p. 20).

According to Laufer (1997, p. 23), learners should master a basic vocabulary of 3,000

word families to be able to use the ‘‘high level processing strategies’’ needed to

comprehend a general text. The empirical studies of Paribakht and Wesche (1997)

and Wesche and Paribakht (2000) show that reading plus explicit vocabulary training

enables learners to learn vocabulary both quantitatively and qualitatively better than

by simply relying on context alone. Laufer (2001) demonstrated a superior lexical

gain when decontextualised word-focused activities were used than when learners

were simply engaged in reading comprehension.

The second approach, strategy instruction, emphasizes teaching the learners

specific learning strategies to make learning more efficient (Cohen, 1998; Cohen,

Weaver, & Li, 1996; O’Malley et al., 1985; Oxford & Scarcella, 1994). Researchers of

strategy instruction often hold the view that context can provide the essential means

for learning vocabulary but additional support, such as explicit instruction, is also

needed (Oxford & Scarcella, 1994). The strategies typically recommended for

instruction include word grouping, word association, imagery, mnemonics (for

example, keyword method, hookword method), and semantic mapping.

Traditionally, strategy instruction seems to concern advanced learners rather than

low level learners (Coady, 1997). However, strategy instruction to low level learners

can be very useful. For example, strategies such as imagery and mnemonics will be

very helpful since the greatest difficulty in acquiring a word in the initial stages is to

link the form and the meaning in memory (Kelly, 1986; Laufer, Elder, Hill, &

Congdon, 2004). This is particularly true in respect of an unrelated language and was

the initial driving force behind the keyword method (Atkinson & Raugh, 1975).

It would seem that the explicit learning paradigm is best summarized as a ‘‘mixed

approach’’, to use Coady’s words (1993, p. 17). Supporters of this paradigm combine

a whole variety of activities, including explicit vocabulary instruction, vocabulary

exercises, vocabulary learning strategies, and extensive reading. The strength of the

explicit learning paradigm is that implicit learning is not excluded but rather is seen as

one of the two complementary learning approaches that are necessary to vocabulary

acquisition. The two would work best in combination with each other.

Review of CALL Programs for Vocabulary Learning

Multimedia Packages with Vocabulary Learning Activities

This is perhaps the most popular type in terms of the number of products that have

been sold and their wide use in educational settings. Most are commercialised

programs. The criticism is often made that these programs lack a pedagogical basis.

The investment in such projects is usually considerable but it does not necessarily

mean that solid research has preceded them. They are particularly vulnerable when it

comes to the issue of users’ needs being addressed. Commercialised programs are

often remote from the users; background information, such as the age, sex, cultural

background, other foreign language knowledge, computer knowledge and so on, of

those users for whom the programs are intended is not specified and can only be

guessed (Levy, 1999, 2002). Given their general lack of research basis as well as the

comparatively small amount of time and space devoted to vocabulary learning, the

quality of the vocabulary learning resulting from the utilization of these programs is

often disappointing.

Programs Made up of Written Texts with Electronic Glosses

This is probably the most popular type in research-based programs, and is a

reflection of the prevailing interest in incidental learning. These programs are

written texts with hyperlinks and equipped with an electronic dictionary or glossary.

The main emphasis is on reading comprehension and the acquisition of some new

lexical items is a by-product of the reading process. The advantage of providing

electronic glosses is that the lexical information can be accessed easily simply by a

click (or by typing the word) with little interruption of the reading process.

Moreover, glosses are made much more informative and attractive than traditional

lexical entries by utilizing multimedia effects. Chun and Plass (1996a), Laufer and

Hill (2000), and De Ridder (2002), have carried out studies that demonstrated how

vocabulary can be learned in such a setting, though each of these studies focuses on

a different aspect. The main concern of this type of program is with the information

that should be included about a word and with the way the information should be

presented. The learning rates are reportedly higher in the computer-mediated

situations than in paper materials for incidental learning. Chun and Plass (1996a)

reported a learning rate of 24.1% – 26.6%; Laufer and Hill (2000) reported a

learning rate of 33.3% – 62%. However, the learning rate in each of the studies is,

strictly speaking, only tested at recognition level. It is reasonable to anticipate a

lower learning rate at production level due to the nature of the learning task in this

type of program. It is productive vocabulary learning that this type of program

cannot address adequately.

Programs Dedicated to Vocabulary Learning

Another type of CALL program, which is often based on research, usually takes a

different approach. The CALL authors choose a particular theory of language

learning and implement it via computer technology. A good example is provided by

Groot (2000, p. 64), where the three stages of acquiring a new word in the mental

lexicon (‘‘noticing’’, ‘‘storage’’ and ‘‘consolidation’’) are simulated by the CALL

program ‘‘CAVACO’’. The learning process is composed of four stages in sequential

order: ‘‘deduction’’, ‘‘usage’’, ‘‘examples’’ and ‘‘retrieval’’. A careful look at the

program reveals that learners were encouraged to deduce word meanings and word

usage. However, instead of leading to deeper processing, this may risk inducing mere

guessing since learners are prone to take short-cuts and perform activities requiring

less mental effort. This may explain why the learning result of the experimental

groups was not much higher than that of the control bilingual list groups in Groot’s

investigation. Goodfellow’s ‘Lexica’ (1994) is based on Kukulska-Hulme’s model of

‘‘Journey of a vocabulary item’’ (1988, p. 164) in which a vocabulary item to be

learned goes through the following procedure (Figure 1).

‘Lexica’ adopted this model and elaborated the ‘written record’; the user of Lexica

is asked to group words according to ‘form’, ‘meaning’, and ‘context’ and then find

the meanings and usage of the words with the help of lexical tools (for example,

dictionary, concordancer). The weakness of such a design, according to Goodfellow

(1995, p. 220), is the lack of explicit instruction on how each task should be carried out.

Consequently, word grouping was found to be difficult for some learners and on the

whole they tended to adopt a superficial learning approach, such as using L1

translations. The expected learning rate of eight words per hour was achieved by very

few subjects.

A Design Model for CALL Efficacy

The model that we suggest is an attempt to provide an alternative way of addressing

CALL design, bearing in mind that this is only a starting point and that there still

remains plenty of scope for the better integration of computer technology into the

design of CALL programs. What is provided here is a simple preliminary model;

the major concern is with identifying the most important parameters that determine

the efficacy or the quality of a CALL program (see Figure 2).

CALL Efficacy

CALL efficacy can be interpreted as the quality of the CALL program—that is,

how effective and helpful it is when used by the learner. It can be assessed both

quantitatively and qualitatively. Quantitative data include the performance of the

user on the program’s tasks, which can be revealed by the scoring system of the

program; they also include the progress (or the regression) that takes place

through using the program, which can be assessed by pre-tests and post-tests in

an experimental setting. Qualitative data include the recording of the user

interaction with the program, which could be provided by a profile recording

system built into the program. The user’s own evaluation of the program is

another important source of qualitative data regarding the efficacy of the program.

This can be obtained by a questionnaire and/or an interview on the completion of

the tasks.

Figure 1. Journey of a vocabulary item (adapted from Kukulska-Hulme, 1988)
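The two kinds of evidence described in this section could be held together in a single learner record. The sketch below is a minimal illustration under our own assumptions, not a description of the actual WUFUN implementation: the class `LearnerRecord`, its field names, and the helper `mean_gain` are all hypothetical. The gain (post-test minus pre-test) stands for the quantitative side, and the event log stands for the profile recording of user interaction.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LearnerRecord:
    """Hypothetical per-learner record combining test scores and an action log."""
    learner_id: str
    pre_test: float
    post_test: float
    events: list = field(default_factory=list)  # qualitative interaction profile

    def log(self, action: str, item: str) -> None:
        # Record one interaction, e.g. ("lookup", "some word").
        self.events.append((action, item))

    @property
    def gain(self) -> float:
        # Positive values indicate progress, negative values regression.
        return self.post_test - self.pre_test


def mean_gain(records) -> float:
    """Average pre-test to post-test gain across a group of learners."""
    return mean(r.gain for r in records)


if __name__ == "__main__":
    group = [LearnerRecord("s1", 40, 50), LearnerRecord("s2", 60, 55)]
    group[0].log("lookup", "example-word")
    print(mean_gain(group), group[0].events)
```

A negative gain is kept rather than clipped to zero, since regression through use of the program is itself a finding worth reporting.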

Theory

First, it is commonly agreed that a sound theoretical underpinning is vital to ensure

the quality of a CALL program. It has been demonstrated that the quality of a CALL

program is determined by the methodology behind it rather than the computer

technology itself. Methodology refers to the overall approach to the design of the

program; the underlying theoretical principles constitute a very important component

of the methodology. Here theory mainly means language learning theory, which is used

as a general term to refer to the program designer’s assumptions about the nature of

language, language learning and the process of learning. What specific language

learning theory to choose depends on what language knowledge aspects or skills the

CALL program would like to focus on. In CALL programs for vocabulary learning,

learning theories or research findings specific to vocabulary learning should be

considered first. On the other hand, language is best learned as a whole rather than in

separate components. There are thus some general learning aspects shared by CALL

programs though they have different focuses. The selection of a specific or general

language learning theory will serve as a guide in the selection of the technologies to

be used.

Figure 2. The CALL efficacy model

Computer Technology

Traditionally, computer technology is referred to as the means or the medium used to

deliver learning materials to learners. Clark (1994) distinguished between ‘methods’

and ‘media’. Media are the means of delivering the methods which consist in

‘‘a number of possible representations of a cognitive process or strategy that is

necessary for learning’’ (p. 26). He claimed that a method can be implemented by

many means other than computer technology; thus media (or the computer

technology) might ‘‘influence the cost or speed efficiency of learning but methods

are causal in learning’’ (p. 26). There seems to be a tradition of dividing CALL into

two broad categories: technology-driven and pedagogy-driven projects (Colpaert,

2003; Levy, 1997). Developers in the former category are often accused of producing CALL

materials based on their intuition instead of on research in language learning.

There is a dividing line in conceptualising CALL design between those who do so

according to technologies and those who do so according to methodology; each side

focuses on its own aspect and plays down the other. There is therefore on both sides

an inclination to view method (methodology) and media (computer technology) as

two separate components. Technology alone cannot determine the design, but should

it be viewed solely as a means of implementing the materials? A crucial question

arises: Is there a merging point of technology and pedagogical knowledge in

conceptualising CALL design? If so, where is it? We argue that computer technology

could be thoroughly integrated into the design and become an inseparable part of the

methodology; technology can be used to monitor and control user actions so that

users can be guided in performing language learning activities and achieve high

learning potential.

User Actions

Learner performance, or user actions, is an important source of data for the

evaluation of CALL programs. Chapelle (2001) puts learner performance at the third

level of evaluation after that of the CALL program itself and the teacher’s planned

activity. What the learner has actually done and how s/he interacts with the program is

a good indicator of the learning outcome. In line with the current emphasis on

‘learner autonomy’ and ‘learner focus’, the trend is for the user to be given as much

freedom as possible in the use of the program. Closely associated with these concepts

is ‘learner development’, discussed in detail by Wenden (2002, p. 32) who defines it

as ‘‘a learner-centred innovation in FL/SL instruction that responds to the learner by

aiming to improve the language learner’s ability to learn a language’’. It can be said

that learner development is the process and improved ability in language learning is

the objective. This entails the premise that learners initially do not necessarily possess

good learning ability and efficient learning strategies for language learning; they need

to learn how to become good learners.

Obviously, something has to be done to facilitate learner development and it is very

unlikely that learners could simply learn to become good learners by themselves

without any help. They have to be equipped with metacognitive knowledge, learning

strategies, and skills for self-direction to be able to become good learners. ‘Strategies-

based instruction’, reported by Cohen et al. (1996), has two major components:

explicit strategy instruction and strategy instruction integration. In the first case, students

are explicitly taught how, when, and why strategies can be used to facilitate language

learning and language use tasks. In the second case, strategies are integrated into

everyday class materials and may be explicitly or implicitly embedded in the language

tasks. If we are going to train learners to master good learning strategies in language

activities, we have to draw up rules to constrain what they do instead of giving them

complete freedom, since complete freedom would work against the learning goals of the program. We

therefore propose that user actions be controlled to some degree and that this be done

by integrating computer technology into the overall design. The user is directed to the

completion of the learning tasks as well as the embedded learning strategies

instruction. S/he is guided and not left to wander at will through the program.

Learner Information

Language learning is also a very idiosyncratic process that is subject to a series of

learner characteristics, such as mother tongue, knowledge of other foreign languages,

level of proficiency in the target language, learning difficulty, learning style, learning

strategy, motivation, age, sex, etc. Obviously, different types of learners have different

needs, and these should be taken into account when designing the CALL program. In

the same way, it is suggested that a CALL program should be targeted to a particular

group of learners who have in common a series of characteristics. As much

information as possible should be obtained about the learner before the software is

designed; this information constitutes an important part of ‘analysis’ in Colpaert’s

RBRO design model (2004, p. 135).

From the CALL efficacy model, it can be seen that learner information influences

the other three components by providing background information to inform the

choices made in respect of theory, technology, and user actions. When choosing

language learning theories for the program, we should ask ourselves a series of

questions in respect of the learner. For example:

1. Will the theory chosen help our learners to acquire language knowledge or skills?

And if so, how?

2. Will the theory chosen have the potential to improve our learners’ ability to learn?

3. How can we apply the theory to the learning activities so that learners will enjoy

them?

Other questions can be asked, depending on the specific context of learner

information. As for computer technology, learner information can tell us what type

of specific technology is favoured or rejected by our users. For example, if users

are used to having the right mouse click function to display information for a word,

we need to consider developing this type of technology in the software. If they are

interested in sound or visual effects, we should develop more audio, video, speech, or

animation technologies so that learners’ different sensory learning styles can be

accommodated. Learner information can also provide information on how user

actions should be controlled so that learners can be guided in their language learning.

It should also be borne in mind that, if complete freedom will harm the learning

results, so doubtless will no freedom at all. The degree to which learners’ freedom or

control over the program should be restricted largely depends on the learner

characteristics: learning style, learning strategy, knowledge about language learning,

and perceived useful and harmful effects of the guided instruction, etc. All these have

to be taken into account in deciding what should be allowed and what should be

restricted regarding learner freedom.

Design of WUFUN

Some General Learner Information

The main learning difficulties for vocabulary include fixing the new vocabulary in

memory, mastering the meaning(s) of new items, using vocabulary items correctly,

and incorporating idiomatic expressions into one’s vocabulary. These are the

problems faced by every language learner. Chinese learners of English face other

specific problems in acquiring vocabulary due to the huge linguistic distance between

English and Chinese and the considerable cultural gap. In particular, it is observed

that:

1. The practice of mechanical memorization (rote), which is deep-rooted in

Chinese culture, characterizes their learning.

2. Lack of direct contact with western culture makes it extremely difficult to bridge

the cultural gap and to use language appropriately.

3. The exam orientation of all language teaching and learning has for so

long encouraged rote learning and discouraged a communicative learning

approach.

A CAVL program, named WUFUN, is being developed to help Chinese university

learners of English to overcome the learning difficulties they face by incorporating

learning activities that specifically address their needs. The design of the software is

based on the CALL efficacy model presented earlier.

Integrating the Theory into the Design

Vocabulary learning in WUFUN is addressed in a holistic way; learning is situated in

context with particular attention being paid to the items. Following a series of

systematic studies on Chinese learners (Kelly, Li, Vanparys, & Zimmer, 1996;

Vanparys, Zimmer, Li, & Kelly, 1997; Li et al., 1999), it was decided that the specific

elaboration strategies for item learning that will be potentially useful to Chinese

learners are imagery, verbal association, programmed rehearsal and oral input. The

originality of the approach consists in the integration of a listening approach and

memorization techniques and in sensitising the learner to cultural differences.

Mnemonic techniques have been documented since an interest was first shown in

language strategies (see Atkinson & Raugh, 1975; Cohen, 1987; Paivio &

Desrochers, 1979; Pressley & Levin, 1981) and they have been generally proved

to be much more effective than rote. The effectiveness of most mnemonics consists

in the interplay between images and verbal representations, formulated by Paivio

and Desrochers as the ‘dual-coding theory’ (1980). However, these memory

strategies are investigated largely in language laboratory settings (O’Malley &

Chamot, 1990), and their potential is rarely exploited in classroom teaching/

learning with the result that they remain largely unknown to most learners. Not

only have these mnemonic techniques been demonstrated to be as much as three

times more effective than the traditional rote method (Paivio & Desrochers, 1979)

but, as so many researchers have pointed out, they transform the learning of

vocabulary from what is invariably viewed as a tedious, boring task into one that is

enjoyable and even amusing.

Listening is viewed by a number of leading researchers as the basic skill in SLA

(see Asher, 1983; Krashen & Terrell, 1983; Nord, 1978; Winitz, 1978). Through the

progressive build-up of a language store, involving both hemispheres of the brain,

speaking results, in much the same way as with the acquisition of L1. Many

investigations have demonstrated this transfer (see Asher, 1964; Ervin-Tripp, 1974;

Postovsky, 1975; Winitz & Reeds, 1973). In addition, the auditory perception of the

learner progressively develops and this becomes the basis of a good pronunciation

(Gary, 1975; Winitz, 1977). Furthermore, it has been shown that listening aids in the

long-term retention of vocabulary, whether it be for reading or for listening purposes

(Gary & Gary, 1982; Kelly, 1992).

Language always mirrors the background culture of whoever is speaking. This is

particularly true in respect of vocabulary (see De Saussure, 1974; Miller, 1996).

There should, in consequence, be a strong focus on the cultural aspect³ in respect of

vocabulary learning. This is done in WUFUN via the images, the stories which

introduce the vocabulary, the idioms and proverbs and, in particular, via the humour/

true stories. Humour can be a valuable tool for bringing out salient characteristics of a

culture, without indulging in negative stereotypes. Differences between different

western cultures—their customs, practices, attitudes, behaviour, humour, even

political and economic situation—are also brought out.

When the learning theories are decided on, the next step is to create the learning

content underpinned by these theoretical guidelines. Approximately 300 words were

taken from the 4,000 word list that Chinese university students (non-language

specialists) are required to master; their selection was based on student and teacher

judgement of word learning difficulty (for example, pronunciation, word length,

spelling, confusion with similar forms, cultural connotations, and so on), together

with other criteria, such as usefulness and relevance (Kelly & Li, 2005). These

words were then used to create 20 stories as learning texts combined with other

different learning activities, forming 20 units in all. One of the 20 units has

been developed into a computer program as the prototypical unit of WUFUN,

containing 25 words plus three idioms⁴ to be studied (see Appendix A). Some

of the words may already be known receptively or productively to learners. The

results of the pre-vocabulary tests in the pilot study confirmed that this was the

case. The following provides an overview of the sequence of learning activities in

WUFUN.

First, a preview of the context (overview of the story), serving as an ‘advance

organizer’, a device aiming to activate useful background information (see Chun &

Plass, 1996b, p. 504), is presented to the learner. The user can view a series of

pictures, each of them accompanied by a short spoken sentence (some of the words

to be learned will appear in the sentences for the first time; the word meaning can

be easily guessed from the pictures). This will give the user a general idea of the

story presented later. Then some vocabulary items are presented in the form of a

mini dictionary (Word Focus); the glosses include meanings, collocations, example

sentences and usage. In addition, the learner can listen to the word, view the

picture if available, and ask for a Chinese translation of the word. The user can then

read the text, the complete version of the preview, in which glossed words in Word

Focus will reappear in the context. Then some vocabulary learning strategies are

introduced to the learner in Word Memorisation Aids, the main ones relating to

verbal association, imagery, rhyming or alliteration (see Boers & Lindstromberg,

2005), etc. The user chooses a word s/he wants to know better from a list, and s/he

will be given a useful tip (with the option to display the Chinese translation) on how

to memorise the word. For example, for the word acquaintance, a sentence is given:

The queen is an acquaintance of mine. The user can listen to the sentence and is asked

to form a mental image of the sentence while listening to it. Different tips are given

to facilitate the learning of the word; whether the word contains affixes or roots,

whether it is imageable, whether it can be associated with other known words, etc.

What is central to these tips is the combination of image, sound and verbal

information. Their combination will help word memorization and accommodate

different learning styles.

Next are the exercises where the words will be practised and rehearsed in context.

By doing exercises, the learner becomes familiar with the meaning and usage of the

words. Exercises include supplying synonymous expressions, finding antonyms,

using words in collocations or as they typically occur in contexts, differentiating

words having similar but not identical meanings (for example, ridiculous and funny⁵),

etc. The whole procedure can be repeated. The vocabulary processing procedure in

WUFUN is described in Figure 3.

After the exercises comes the section on idioms followed by that on humour/true

stories. The idioms are usually found to be very difficult to learn as their meaning is

not apparent and is often heavily culture-bound. In accordance with the dual-coding

theory of Paivio and Desrochers (1980), which advocates dual modality input to

enhance vocabulary learning, the user clicks on the idiom s/he wants to study and a

picture that illustrates the meaning of the idiom will pop up on the right of the screen;

in the meantime, the user can listen to an explanation of the idiom. The humour/true

stories are to arouse the learner’s awareness of the cultural elements underlying

language learning. Thus each story or joke to a certain degree reflects a facet of

western culture (though not necessarily the culture of English-speaking countries

since the language is spoken by a much larger population). The learner can read and

listen to the stories.

Integrating Computer Technology into the Design

The computer technology has a two-fold function. It is used to create the multimedia

program and, more importantly, to make the user follow the design model of the

program. Users have restricted freedom in using the software. The idea is that they

can always go back to the previous steps while they have to complete some basic

requirements before going on to the next step. If the user does not obey these rules,

the forward button on the navigation bar to go to the next page will be disabled. Here

are a few examples: the user has to have listened to all the short sentences in the

overview of the story before being able to go to the Word Focus (WF); s/he has to

look up at least one word in WF before reading the story; s/he can only access the

correct answers of the exercises in written form after having listened to them first; s/he

normally has to finish one exercise before starting the next one or s/he can go directly

to the next exercise but will get a score of ‘0’ for the exercise skipped. Every decision

regarding user freedom for each step is thought out so that the user can obtain some

benefit from doing the activities without being frustrated to the point that s/he no

longer wishes to continue. Technology is employed in such a way as to ensure that

each step is completed to a minimum requirement.
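
The restricted-freedom rule described above can be illustrated with a minimal sketch. It is not WUFUN's actual implementation, which the paper does not publish: the step names, the `Session` class and its fields are hypothetical, and only the two rules stated explicitly in the text (listen to every overview sentence, look up at least one word in WF) are modelled. Going back is always allowed, while going forward depends on the current step's minimum requirement.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical step identifiers; the order follows the sequence described in the paper.
STEPS = ["overview", "word_focus", "story", "memorisation_aids", "exercises"]


@dataclass
class Session:
    overview_sentences_total: int
    overview_sentences_heard: Set[int] = field(default_factory=set)
    words_looked_up: Set[str] = field(default_factory=set)
    current: int = 0  # index into STEPS

    def requirement_met(self) -> bool:
        """Check the minimum requirement for leaving the current step."""
        step = STEPS[self.current]
        if step == "overview":
            # Every short spoken sentence must have been listened to.
            return len(self.overview_sentences_heard) == self.overview_sentences_total
        if step == "word_focus":
            # At least one word must have been looked up.
            return len(self.words_looked_up) >= 1
        # Assumption: no gating rule is modelled for the remaining steps.
        return True

    def forward_enabled(self) -> bool:
        """The forward button is disabled until the requirement is met."""
        return self.current < len(STEPS) - 1 and self.requirement_met()

    def go_forward(self) -> None:
        if self.forward_enabled():
            self.current += 1

    def go_back(self) -> None:
        # Returning to a previous step is always permitted.
        self.current = max(0, self.current - 1)
```

A session created with `Session(overview_sentences_total=3)` keeps the forward button disabled until all three sentence indices have been added to `overview_sentences_heard`.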

Figure 3. The vocabulary processing procedure in WUFUN

Taking into Account User Actions in the Design

In order to induce the learner to follow the design model of our program, a learning

metaphor is represented in the menu screen, namely, learning is a cyclic process

and learning tasks are to be finished step by step (see Figure 4). A help system will

be at hand to show the learner how the software should be used. Each learning

activity is accompanied by detailed instructions on how to carry out the task. The

interface design in each page of the software is consistent and easy to understand.

To monitor each user’s performance, some user actions are recorded by the system while

s/he is using the software: the total time spent on the software, the number of words

viewed in WF and in Word Memorisation Aids (WMA),⁶ the time spent on exercises and

the score obtained. These data will provide important information for evaluation of

the software.
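
As an illustration of the five recorded user actions, the following sketch shows one possible shape for such a record. The class and field names are hypothetical, and counting each word only once per section is an assumption; the paper does not say whether repeated views of the same word are counted separately.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UserActionLog:
    total_minutes: float = 0.0      # total time spent on the software
    exercise_minutes: float = 0.0   # time spent on the exercises
    exercise_score: float = 0.0     # score obtained, out of 100
    wf_words: Set[str] = field(default_factory=set)   # words viewed in Word Focus
    wma_words: Set[str] = field(default_factory=set)  # words viewed in Word Memorisation Aids

    def view_word(self, section: str, word: str) -> None:
        """Record that a word was viewed in WF or WMA (distinct words only)."""
        if section == "WF":
            self.wf_words.add(word)
        elif section == "WMA":
            self.wma_words.add(word)
        else:
            raise ValueError(f"unknown section: {section}")

    def summary(self) -> Dict[str, float]:
        """Return the five measures used in the evaluation."""
        return {
            "time": self.total_minutes,
            "WF": len(self.wf_words),
            "WMA": len(self.wma_words),
            "time_on_ex": self.exercise_minutes,
            "score_for_ex": self.exercise_score,
        }
```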

A Pilot Study for Software Evaluation

When the prototypical unit was ready we carried out a pilot study in a Chinese

university to evaluate the software. The study uses a pre-test/post-test design

combined with questionnaires and an interview. The evaluation of the software will be

conducted in terms of: learning outcome as measured by vocabulary learning rate and

the vocabulary learning strategies acquired; learner evaluation as revealed by degree of

satisfaction in the use of the software; restricted freedom impact (on learning outcome

and learner evaluation) as measured by the relationship between user actions and

learning outcome/learner evaluation. Through the software evaluation, the CALL

efficacy model discussed earlier can be empirically validated.

Figure 4. The main menu screen of WUFUN

Research Questions

Our research questions are the following:

1. What is the learning outcome of WUFUN? More specifically:

(a) To what extent will WUFUN help Chinese learners to acquire vocabulary

perceived as difficult at the receptive and the productive level in two

different settings: individual use and classroom use?

(b) Are learners likely to develop vocabulary learning strategies that will

facilitate vocabulary learning in the long run in the two different settings?

2. How do users evaluate WUFUN in the two different settings?

3. How are user actions related to learner evaluation and to the learning results in

the two different settings?

The Study

Subjects. Two groups of first year students at Three Gorges University, Yichang,

China, of various study backgrounds (non-language specialists) participated in the

study. They are low intermediate learners who have a vocabulary of 2,000 – 3,000

words. Initially we tried to include more subjects, but due to some unexpected

practical constraints we only had 35 subjects, divided into two groups according to

the experiment setting. Group 1 (G1) contains 17 students who volunteered to

participate in the experiment after a brief introduction to WUFUN. They made an

appointment with the researcher and completed the experiment on an individual

basis. Group 2 (G2) contains 18 students who did the experiment together in a

computer room as a self-learning class. They were required by their teacher to

participate in the study. It should be noted that individual use and classroom use of

language learning software are the two most prototypical settings for CALL. When

learners volunteer or choose to use a piece of language software, as in the case of G1,

it can be assumed that they are displaying an interest in the task. According to the

process model of motivation (Dörnyei, 2001), this generates motivation⁷ at the start

of the learning task. However, no such assumption can be made in respect of subjects

who are coerced into performing the task, which was the case with G2.

Experiment instruments (see examples for each type of instrument in Appendix B): pre- and

post-vocabulary (receptive/productive) tests. A separate receptive and productive test was

administered before software use to test whether the students knew the new

vocabulary items that appeared in WUFUN receptively or productively. Laufer

(1998) distinguished three types of vocabulary knowledge, namely passive vocabulary,

controlled active vocabulary and free active vocabulary. In a more recent article

(2004), she divides knowledge of a word into four degrees of strength: productive

recall, receptive recall, productive recognition, receptive recognition, which are ranked

hierarchically (from the highest to the lowest) in terms of the strength of the word

knowledge. We chose two test formats for the receptive knowledge test: the receptive

recognition test (the lowest strength) and the vocabulary level test (Laufer & Nation,

1995). For the productive knowledge test, we used the controlled active vocabulary test

(Laufer, 1998), which closely resembles the equivalent of the receptive recall test for

the second highest strength of the word knowledge. To avoid the test-wise effect, we

used some distracters in both tests. There were 25 words to be marked in the

receptive test and 21 words in the productive test. The same two tests were

administered again after software use to see whether there were vocabulary gains and

what these might be.

Pre- and post-questionnaires. A pre-questionnaire (Q1) was administered before

software use to glean information about the students’ vocabulary learning strategies

and their expectations of the software (WUFUN) they were going to use. We mainly

used multiple-choice questions; both the questions and the choice of answers were

carefully designed to ensure the information given would be as complete as possible

and thus give as accurate a picture as possible of the students’ opinions.

A post-questionnaire (Q. 2) was administered after software use. It aimed to find

out to what degree the students were satisfied after using WUFUN and to obtain their

comments and suggestions. It is divided into 13 sections and made up of 44 questions

on a 5-point scale plus a few open questions. Students were asked to give a rating in

terms of their satisfaction regarding the various components (see Figure 3 for a brief

review) of the program. Questions were also asked on the scoring and checking

system (feedback system), interface design, graphic design, sound system, etc. At

the end there was an open section for any comments and suggestions regarding the

software and to find out whether the students had learned or been aware of the

vocabulary learning strategies embedded in the software.

Experiment procedure. The whole experiment follows an eight-step linear sequence:

pre-receptive test, pre-productive test, pre-questionnaire, software use, post-

questionnaire, post-receptive test, post-productive test and an interview. The last

step, the interview, was limited to G1; it was not used with G2 due to the practical

constraints. It took about 2 – 2.5 hours to complete the whole procedure. It should be

noted that learners were told beforehand that they would study a piece of vocabulary

learning software but they did not know about the detailed procedure involved.

Although they were tested before using the software, most students would not have

expected a test afterwards.

Data collection and analysis. For each subject in G1 we collected four scores on

vocabulary tests, two sets of information in the pre- and post-questionnaires, user

actions recorded by the software system, and some follow-up information in the

interview. For G2, we have all the information except the follow-up information.

We obtained each student’s vocabulary gain at both the receptive and the

productive level by subtracting the pre-scores from the post-scores. We performed a

t-test to see whether there was a significant difference between the two groups. We

calculated all the ratings in all the sections for the post-questionnaire and calculated a

mean for each student with a rating from 1 – 5 as the learner evaluation. A profile

recording system built into the software enabled us to examine the user actions

during software use. For both groups we performed a correlation test between the

user actions and the vocabulary gain and another correlation test between the user

actions and the learner evaluation.
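
The gain and correlation steps described above can be sketched as follows. The function names are hypothetical and the Pearson formula is the standard textbook one; the paper does not state which software was used for the analysis, so this is an illustration of the procedure rather than a record of it.

```python
from math import sqrt
from typing import List, Sequence


def gains(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Per-student vocabulary gain: post-test score minus pre-test score."""
    return [b - a for a, b in zip(pre, post)]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

With per-student lists of pre-test scores, post-test scores and a logged user action (for example, the number of words viewed in WMA), `pearson_r(wma_counts, gains(pre, post))` gives the kind of coefficient reported in the correlation tables.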

Results and Discussion

Pre-questionnaire. From Q1, we get a detailed picture of the students’ profile. In

addition, an in-depth study of the quantified results reveals some characteristics of the

students’ learning habits and of their perception of CALL program learning. As for

vocabulary learning, the most popular memorisation strategies are rote accompanied

by periodic review. Other more elaborate techniques, such as mnemonics and word

grouping, are also reported to have been used, but less frequently. The listening

approach is adopted by the fewest students. They tend to be ready to

perform tasks perceived as interesting or less demanding, such as viewing pictures or

reading stories, and are more likely to avoid demanding tasks such as doing exercises

or learning vocabulary. However, the avoidance could be compensated for by the

usefulness they perceived in performing the task. If they received help to make the

task easier, they would certainly be more willing to do it.

Gain in receptive and productive vocabulary. Table 1 presents the mean score of both

the receptive and the productive tests for both groups. Table 2 presents the means of

receptive gain between the pre-test and the post-test for both groups.

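Table 1. Mean and standard deviation (SD) for pre-test and post-test

| Test | Group | Mean (Pre) | Mean (Post) | SD (Pre) | SD (Post) | Minimum (Pre) | Minimum (Post) | Maximum (Pre) | Maximum (Post) |
|---|---|---|---|---|---|---|---|---|---|
| Receptive (Full = 25) | G1 | 15.59 | 21.88 | 2.62 | 1.45 | 11 | 19 | 19 | 24 |
| Receptive (Full = 25) | G2 | 16.06 | 20.5 | 3.11 | 3.84 | 8 | 10 | 20 | 24 |
| Productive (Full = 21) | G1 | 10.91 | 16.12 | 2.31 | 2.64 | 7 | 11.5 | 15 | 20.5 |
| Productive (Full = 21) | G2 | 9.64 | 14.25 | 4.05 | 4.41 | 2 | 2 | 15.5 | 20 |

Table 2. Mean for receptive gain

| Group | Mean | SD | Learning rate* | Minimum | Maximum |
|---|---|---|---|---|---|
| G1 | 6.29 | 2.69 | 40% | 3 | 11 |
| G2 | 4.44 | 2.38 | 28% | 1 | 9 |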

Note: *Learning rate is calculated by dividing the difference between the mean of the

post-test score and the mean of the pre-test score by the mean of the pre-test score, e.g., the

receptive learning rate of 40% for G1 is obtained by dividing the difference between the

post-test score and the pre-test score (6.29) by the mean of the pre-test score (15.59).
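
The learning rates in Table 2 can be checked with a short calculation. The reported percentages are reproduced when the mean gain is divided by the pre-test mean (6.29 / 15.59 is roughly 0.40); the function name below is hypothetical.

```python
def learning_rate(pre_mean: float, gain_mean: float) -> float:
    """Mean gain expressed as a proportion of the pre-test mean."""
    return gain_mean / pre_mean


# Values taken from Tables 1 and 2 (receptive test).
g1_rate = learning_rate(pre_mean=15.59, gain_mean=6.29)
g2_rate = learning_rate(pre_mean=16.06, gain_mean=4.44)
print(f"G1: {g1_rate:.0%}, G2: {g2_rate:.0%}")
```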

Table 3 presents the means of productive gain between the pre-test and the post-

test for both groups.

The mean scores set out in Table 1 revealed that the pre-test scores for both groups

regarding receptive and productive vocabulary are quite similar (receptive: 15.59 –

16.06 out of 25; productive: 10.91 – 9.64 out of 21); a t-test indeed confirms that

there is no difference between the two groups (not reported here for the sake of

space). It would seem that both groups have a similar starting point in terms of pre-

knowledge of the vocabulary items to be studied. However, it is noted that G2 had a

higher SD than G1 for both pre-test and post-test on both vocabulary levels, showing

that there was a bigger difference between the subjects within G2 than within G1.

The gain for both groups was quite satisfactory considering there was a high

baseline for each group (see Table 1). Figures presented in Table 1 imply that G1

had nine words to learn to a receptive level and 10 words to a productive level; G2

had nine words to learn to a receptive level and 11 words to a productive level. Our

first research question was: To what extent will WUFUN help Chinese learners to acquire

vocabulary perceived as difficult at the receptive and the productive level in two different

settings: individual use and classroom use? It seems that both groups achieved a

considerable learning rate at both the receptive and the productive level. Moreover,

both groups have a higher vocabulary learning rate at the productive level than at the

receptive level (47% > 40% for G1; 48% > 28% for G2).

Initially, it appeared that G1 had gained more vocabulary at both the receptive and

productive levels. By performing a t-test to compare the means for both groups we

find, however, that G1 did significantly better than G2 at the receptive level but not at

the productive level. See Tables 4 and 5 for the results.

Table 4 shows that the difference in receptive gain between G1 and G2 is

significant (t Stat 2.16 > t Critical 2.03, df = 33, p < .05); however, the difference in

productive gain between the two groups is not significant, as shown in Table 5 (t Stat 0.78 < t

Critical 2.03, p > .05).
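
The t statistics can be recomputed from the summary figures in Tables 4 and 5. The sketch below assumes an independent-samples t-test with pooled (equal) variances, which is inferred from the reported df of 33 (17 + 18 − 2) rather than stated in the paper; the function name is hypothetical.

```python
from math import sqrt
from typing import Tuple


def pooled_t(mean1: float, var1: float, n1: int,
             mean2: float, var2: float, n2: int) -> Tuple[float, int]:
    """Independent-samples t statistic with pooled variance, plus its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means, variances and group sizes as reported in Tables 4 and 5.
t_receptive, df = pooled_t(6.29, 7.22, 17, 4.44, 5.67, 18)
t_productive, _ = pooled_t(5.32, 6.69, 17, 4.56, 10.03, 18)
print(round(t_receptive, 2), round(t_productive, 2), df)
```

The receptive value comes out at about 2.16 and the productive value at about 0.77, which agrees with the reported statistics to within the rounding of the published means and variances.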

Table 3. Mean for productive gain

| Group | Mean | SD | Learning rate | Minimum | Maximum |
|---|---|---|---|---|---|
| G1 | 5.32 | 2.59 | 47% | 0 | 9.5 |
| G2 | 4.56 | 3.17 | 48% | 0 | 11.5 |

Table 4. T-test of receptive gain between two groups

| T-test | Mean | Variance | Observations | df | T-stat | T-critical | p |
|---|---|---|---|---|---|---|---|
| G1 | 6.29 | 7.22 | 17 | 33 | 2.16 | 2.03 | .038* |
| G2 | 4.44 | 5.67 | 18 | | | | |

Note: *p < .05 (two-tailed).

The two findings given above—that the productive learning rates are higher than

the receptive learning rates for both groups and that there is no significant difference

in vocabulary gain between the two groups at the productive level but the difference is

significant at the receptive level—seem to indicate that WUFUN is slightly more able

to help learners to learn vocabulary productively than receptively regardless of

whether for individual or classroom use.

Post-questionnaire (learner evaluation). As mentioned earlier, there are two types of

questions in Q2: rating scale questions and open questions. We will focus only on the

rating scale questions and leave the open questions to a later stage. See Table 6 for the

means of evaluation for both groups.

A t-test shows there is no significant difference between the two groups (t Stat

0.93 < t Critical 2.03, df = 33, p > .05). Both groups gave a good evaluation of the

program; G1 had a mean of 4 out of 5 and G2 had a mean of 3.83 out of 5. In

answering the question whether they would like to use the software when more units

are developed in the future, all the subjects in G1 unanimously replied “Yes”. Three out of

18 in G2 replied “No”, which still leaves a positive result since G2 were forced, as it

were, into participating in the experiment. In response to the research question: How

do users evaluate WUFUN in the two different settings?, the software evaluation by the

learners in the individual or classroom setting is satisfactory with most students from

both groups expressing their willingness to continue to use the software in the future.

Of the 13 sections of Q2, the favourite section for G1 is the “scores and checking

system”, which has an average of 4.53. It is the same for G2 who have an average

rating of 4.47. The lowest section (3.65) for G1 is the “program sequence” in which

students are asked whether they like the sequence of the program and whether they

feel they should follow the guidelines of the program instead of doing what they want.

The CALL efficacy model described earlier is implemented in the program sequence

as is the restricted user freedom regarding the control of the program. For G2,

this section is rated the second lowest (3.47). Nevertheless, the ratings for both

groups for this section exceed the midpoint of the rating scale. This

suggests that subjects in both settings do not particularly like the constraints but that

they find them acceptable.

Table 5. T-test of productive gain between two groups

| T-test | Mean | Variance | Observations | df | T-stat | T-critical | p |
|---|---|---|---|---|---|---|---|
| G1 | 5.32 | 6.69 | 17 | 33 | 0.78 | 2.03 | .44 |
| G2 | 4.56 | 10.03 | 18 | | | | |

Table 6. Mean of learner evaluation of the software

| Group | Mean | SD | Minimum | Maximum* |
|---|---|---|---|---|
| G1 | 4 | 0.42 | 3.21 | 4.74 |
| G2 | 3.83 | 0.63 | 2.1 | 4.53 |

Note: *Full rating = 5.

User actions. Table 7 presents the mean of user actions: time spent on the program,

number of words viewed in WF, number of words viewed in WMA, time spent on the

exercises and score obtained for the exercises for both groups.

A quick look at this table will reveal that the two groups are very different regarding

the way they use the software. At first sight, it appears that G2 spent more time on the

program than G1 but G2 had a much greater SD (33.5) than G1 (18.72). A careful

look at the data shows that three subjects (all females) in G2 spent 141, 150 and 156

minutes on the program. If the three were taken out, the average time for G2 would

be about 73 minutes. For G1, the longest time spent was 112 minutes. Thus, in fact,

subjects in G2 generally spent less time than those in G1, except the three female

subjects. G2 also spent less time on the exercises and scored much lower than G1. To

answer the research question: How are user actions related to learner evaluation and to the

learning results in the two different settings?, we performed multiple correlation tests

between each selected user action (listed in Table 7) and the receptive, productive

gain (learning results) and the learner evaluation (results of Q2) for both groups. See

Table 8 and Table 9 for the results.
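
The adjusted G2 average of about 73 minutes can be reproduced from the group mean in Table 7 and the three long sessions mentioned above. The helper name is hypothetical; the arithmetic simply removes the three values from the group total.

```python
from typing import Sequence


def mean_without(mean_all: float, n_all: int, removed: Sequence[float]) -> float:
    """Mean of a group after dropping some known values from it."""
    total = mean_all * n_all
    return (total - sum(removed)) / (n_all - len(removed))


# G2: mean of 85.44 minutes over 18 subjects, minus the three longest sessions.
adjusted = mean_without(85.44, 18, [141, 150, 156])
print(round(adjusted, 2))  # about 72.73 minutes, below the G1 mean of 80.77
```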

Table 7. Mean of user actions

| Group | Time (minutes) | WMA | WF | Time on ex. (minutes) | Score for ex. (Max. = 100) |
|---|---|---|---|---|---|
| G1 | 80.77 | 7.94 | 17.65 | 22.53 | 60.08 |
| G2 | 85.44 | 8.56 | 18.67 | 17.44 | 36.30 |

Table 8. Correlation between user actions and their learning results and learner evaluation for G1

Pearson r Time WMA WF Time on ex. Score for ex.

Learner evaluation −.52* −.22

Receptive gain .25 .61** .36

Productive gain .32 .44 .49*

Note: *p < .05. **p < .01 (two-tailed).

Note that in both Tables 8 and 9, correlations whose absolute value is smaller

than 0.2 are excluded. As revealed in Tables 8 and 9, the situations for the two groups are

quite different. For G1, the total time spent on the software has a

significant negative correlation (r = −.52, p < .05) with the learner evaluation; that

is, the more time spent on the software, the lower the evaluation tends to be. This is

the opposite for G2, in which there is a significant positive correlation (r = .51,

p < .05) with the evaluation. The correlations between total time and the receptive

and productive gain are weak and non-significant for both groups. The number of words

viewed in WMA has a significant positive correlation with the

receptive gain for both groups (r = .61, p < .01 for G1; r = .52, p < .05 for G2). The

number of words viewed in WF has little correlation with receptive and productive

gain for G1; it has a significant correlation with the productive gain for G2

and a weaker, non-significant correlation with the receptive gain. Time spent on the

exercises seems to have little to do with the receptive and productive gain for G1; it

has a significant positive correlation (r = .47, p < .05) with the receptive gain for G2. The

score for the exercises has a significant positive correlation (r = .49, p < .05)

with the productive gain for G1 and a significant positive correlation (r = .51,

p < .05) with the receptive gain for G2.
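
The significance markers on these coefficients can be cross-checked by converting each r to a t statistic with n − 2 degrees of freedom. This is a sketch under the assumption that the authors used the standard two-tailed test for a Pearson coefficient; the critical values 2.131 (df = 15, for G1) and 2.120 (df = 16, for G2) are the usual tabulated values at the .05 level, and the names below are hypothetical.

```python
from math import sqrt

# Two-tailed critical t values at the .05 level, keyed by degrees of freedom.
T_CRITICAL_05 = {15: 2.131, 16: 2.120}


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def significant_05(r: float, n: int) -> bool:
    """True if |t| exceeds the tabulated two-tailed .05 critical value."""
    return abs(t_from_r(r, n)) > T_CRITICAL_05[n - 2]


# G1 (n = 17): .49 is starred in Table 8, .44 is not.
print(significant_05(0.49, 17), significant_05(0.44, 17))
# G2 (n = 18): .47 is starred in Table 9, .44 is not.
print(significant_05(0.47, 18), significant_05(0.44, 18))
```

With groups this small, only coefficients of roughly .47 to .48 or above reach the .05 level, which is why several moderate correlations in the two tables carry no asterisk.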

In both tables we find that three factors, total time spent on the program, words viewed

in WMA and score obtained for the exercises, seem to be more closely related to the

learning results and learner evaluation for both groups. The way these factors

correlate with the learning results and evaluation is quite different for both groups.

For example, we are not very clear why the total time spent on the program is

correlated in two opposite directions for G1 and G2. The only common phenomenon

shared by both groups is that WMA has similar positive correlation with receptive

gain. This suggests that WMA, the main section introducing vocabulary learning

strategies, is more likely to be helpful to receptive vocabulary gain. But why is it less

likely to be helpful for productive vocabulary gain? One assumption might be that a

single exposure to vocabulary learning strategies is not enough to help the students to

learn the vocabulary to a productive level. To learn a word productively, one needs, in

addition to deep mental processing of the lexical information, sufficient familiarity

with the word in different contexts. Therefore, the score the subjects obtained for the

exercises would be more likely to account for the productive gain. This is the case

with G1 for which a significant positive correlation is found between the exercise

score and the productive gain. This is not the case with G2, where a significant

positive correlation is only found between the exercise score and the receptive gain.

Finally, an extra correlation test was performed between the learner evaluation and

the learning outcome. No significant correlation was found for either group or for either

type of vocabulary gain. Unlike previous findings, learner attitude toward the

learning tasks does not greatly affect the learning results. For example, the subject in

G1 who had the lowest evaluation of the software (3.21) turned out to have achieved a

high vocabulary gain both receptively (10 words) and productively (eight words).

Table 9. Correlation between user actions and their learning results and learner evaluation for G2

| Pearson r | Time | WMA | WF | Time on ex. | Score for ex. |
|---|---|---|---|---|---|
| Learner evaluation | .51* | .35 | .44 | .35 | .33 |
| Receptive gain | .38 | .52* | .21 | .47* | .51* |
| Productive gain | .31 | .28 | .5* | .42 | .21 |

Note: *p < .05 (two-tailed).

This subject stated frankly in the interview that he did not like the software because

the ‘rigid’ order of the program would not allow him to exercise his individuality and

creativity. He spent 90 minutes on the software, of which 19 minutes were devoted to

the exercises, and obtained a score of 73.9. In addition, he viewed 15 words in WMA

and 17 in WF. Note that he spent more time, viewed more words in WMA and did

better on the exercises than the average (See Table 7). Although he did not like the

restricted freedom regarding the software use, it is this design feature that guided and

controlled his actions which led to his superior learning results over others who gave a

higher evaluation of the software but who spent less time and viewed fewer words.

This, on the one hand, suggests that the design of WUFUN based on the CALL

efficacy model has been preliminarily successful; on the other hand, it indicates that

affective factors such as attitudes towards the learning task do not always predict

learning result. What matters is what learners actually do in the learning process.

Subjects in the two different settings provided rather different pictures of how user

actions are related to vocabulary gain and learner evaluation.

The difference can be attributed to the quantitatively different user

behaviour, as shown in Table 7. It is very likely that the two groups differ in several

respects; for example, individual users in G1 are doubtless more motivated to use the

software than group users in G2 since the former volunteered to participate in the

study while the latter were coerced into doing so. One subject in G2 spent only 44

minutes on the software, including two minutes on the exercises, viewed two words in

WMA and one word in WF. His vocabulary gain turned out to be the lowest: two

words receptively and zero words productively. The comments he gave were negative:

he considered that the software was boring and that it did not differ much from their

textbooks. His limited use of the software and poor learning outcome most likely reflect a

lack of motivation. In addition, the subjects in G1 might have been more at

ease, attentive, and relaxed than those in G2 in the experiment due to the different

settings. It should be remembered that subjects in G1 completed the experiment

individually while all the subjects of G2 were placed together in a computer room.

Other information. This information includes comments and suggestions given by the

subjects in the free open section in Q2 and further information obtained from the

interview (limited to G1). In addition, the subjects were asked to indicate whether

they acquired some useful strategies for learning vocabulary from the software.

Table 10 presents quantitative information regarding answers to the two questions.

The response rates for these two questions are much better for G1 than for G2; in

addition, the quality of the answers for G1 is definitely better in terms of content and

length. We will discuss only the strategies that they acquired from using the software

in order to answer the research question: Are learners likely to develop vocabulary

learning strategies that will facilitate vocabulary learning in the long run in the two different

settings? See Table 11 for the categorization of the strategies both groups claimed to

have acquired from the software.

Table 10. Response rate for questions in the free section of Q2

| | Comments/suggestions | Percentage | Ideas for voc. learning | Percentage |
|---|---|---|---|---|
| G1 (n = 17) | 16 | 94% | 13 | 76% |
| G2 (n = 18) | 15 | 83% | 5 | 27% |

We noted two facts. First, most learners mentioned just one or two strategies.

Second, the learners tended to adopt the strategies that require less mental effort and

showed less interest in those requiring more mental effort, such as imagery and

practising words in different contexts. This could also be due to the perceived

usefulness of each category.

Thus our answer to this research question is that the majority of individual users

acquired one or two strategies perceived to be useful from using the software but

strategies requiring more mental effort are less likely to be appreciated. In contrast,

the embedded vocabulary learning strategies are largely ignored by most group

learners. This does not necessarily mean that those strategies were not perceived to be

useful, but simply that the strategies have not entered into their metacognitive

repertoire. It seems that a single exposure to the software for a short period is not

enough to help students to develop vocabulary learning strategies in a systematic way.

It may also be due to limited mental processing capacity: when learners attend to

both the form and the meaning of the vocabulary items, the cognitive load might be

too heavy to allow them to pay more than limited attention to the embedded learning

strategies.

Table 11. Learning strategies acquired by G1 and G2
G1 G2
Put words in sentences to memorize them 4 1
Put words similar in form and meaning together to study 3
Separate roots or affixes from the words 2
Make word associations 2
Practise words (in diversified contexts) 1
Listen to the words in sentences or a text 1 1
Compare words and group words 1
Image the meaning of words 1 1

Conclusion and Suggestions for Future Study

Our main objective has been to introduce the CALL efficacy model to ensure the

quality of CALL programs. The model is constructed by identifying four main

components (theory, computer technology, user actions, and learner information) and

integrating them into a whole. They influence and interact with each other, thus

strengthening all the fibres or links of the model. It is these that determine the quality

of a CALL program as well as constituting the methodology of a CALL program. It is

shown how the model can be applied to the design of the CAVL software WUFUN

for Chinese university students to learn difficult vocabulary items. A pilot study is

reported in order to evaluate the software and to validate the model empirically in

both individual use and classroom use. The results of the study suggest that

the CALL efficacy model underpinning WUFUN is, on preliminary evidence,

effective in both settings. Owing to the complicated experimental procedure, we

collected a large amount of varied data, and analysing and reporting it was

a painstaking process. It is debatable whether we have chosen the ideal research

methodology for such a complicated study.

Regarding our first research question, the learning outcome of WUFUN, it is

demonstrated that by using the software, learners can acquire vocabulary perceived as

difficult both receptively and productively in both settings. Moreover, the productive

learning rate is slightly higher for both. Learners have acquired a few vocabulary

learning strategies but not in a systematic way that would allow their further

independent use, which is probably due to their limited mental processing capacity. For the

second research question, learner evaluation in both settings is fairly satisfactory

despite the constraints incorporated into the software, and the majority of learners

reported that they would like to use the software when more units are developed.

There is not yet a satisfactory answer to the third question. Some user actions, such as

total time, number of words viewed in WMA and the score obtained for exercises,

seem to be closely related to the learning outcome and learner evaluation; however,

subjects in different settings, individual use or classroom use, revealed very different

pictures. Learner attitudes towards the software do not appear to affect the learning

outcome, which is more closely related to what learners actually do in the learning process.

There are, however, a number of suggestions to be made for the next study.

Improvements will be made to the design of the WUFUN software based on the

results of the pilot study and the comments/suggestions made by learners (for

example, more pictures should be added to the software). More importantly, the

following questions will be addressed:

1. The instruction of vocabulary learning strategies will be made more explicit. The

first step is to make learners notice the existence of vocabulary learning

strategies and convince them of their usefulness. In other words, a (short)

strategy training session can be held before the software is used, to arouse the

learners’ metalinguistic awareness and so maximize the learning potential of

the software.

2. The user data recording system will be elaborated to allow a more detailed

recording of the user actions, e.g., what words are viewed in WF or WMA. This

can enable us to look at user actions more clearly in relation to other learner

information, such as previous vocabulary knowledge. It might also lead to a more

satisfactory answer to the question of how user actions are related to learning

results.

3. We need to further test the CALL efficacy model. In the present study, user

actions and learning results were investigated under the constraints embedded in

the software. In the next study we will make a different version of WUFUN with

all the constraints removed, where the user is given complete freedom to decide

what to do with the software and in what order. We shall compare the user

actions and the learning outcome in two conditions: one with constraints and the

other constraint-free.
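
The more detailed recording proposed in point 2 can be illustrated with a minimal sketch. As Note 6 explains, the pilot version stored only one count per look-up, so a count of 15 in WF could correspond to anything from 1 to 15 different words. The Python sketch below is hypothetical: the class `ViewLog`, its method names, and the sample look-ups are assumptions made for illustration and are not the WUFUN implementation. It only shows that storing the word together with each look-up allows both the total count and the set of distinct words to be derived.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ViewLog:
    """Hypothetical per-user log of word look-ups (not WUFUN's actual recorder)."""

    # One (section, word) pair per look-up, in the order the look-ups happened.
    events: list = field(default_factory=list)

    def record(self, section: str, word: str) -> None:
        """Store one look-up; section is assumed to be 'WF' or 'WMA'."""
        if section not in ("WF", "WMA"):
            raise ValueError(f"unknown section: {section}")
        self.events.append((section, word))

    def view_count(self, section: str) -> int:
        """Total look-ups in a section (the only figure the pilot recorded)."""
        return sum(1 for s, _ in self.events if s == section)

    def distinct_words(self, section: str) -> set:
        """Different words looked up in a section."""
        return {w for s, w in self.events if s == section}

    def views_per_word(self, section: str) -> Counter:
        """How often each word was looked up in a section."""
        return Counter(w for s, w in self.events if s == section)


# Invented sample session using items from Appendix A.
log = ViewLog()
for word in ["dam", "shrink", "dam", "despair", "dam"]:
    log.record("WF", word)
log.record("WMA", "shrink")

print(log.view_count("WF"))            # total look-ups in WF
print(len(log.distinct_words("WF")))   # number of different words in WF
print(log.views_per_word("WF"))        # look-ups per word in WF
```

With such a log, the number of views and the number of different words viewed can be reported separately, and each could then be related to other learner information such as previous vocabulary knowledge.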

Acknowledgements

We wish to extend our thanks to the following: Sylviane Granger (University of

Louvain) for her support and for her constructive comments on earlier versions of this

research; Nora Condon (University of Louvain) for her insightful remarks and her

participation in our lengthy discussions; Frank Boers (Erasmus College of Brussels

and University of Antwerp) for his careful reading of the text, for his many helpful

suggestions and for his keen interest in our research; the two anonymous reviewers on

whose suggestions we have endeavoured to act.

Notes

1. This independence and know-how are essentially what we mean by ‘good learner’. It was a key

feature of the method of language learning for non-language specialists developed by a number

of Belgian linguists in the 1980s (Kelly, 1989; Ostyn & Godin, 1985). The learner assumes

responsibility for his or her learning, and is given the materials and knowledge needed to

progress on their own. It is beyond the scope of this paper to say what that knowledge is as that

would take us into the wide and well-researched world of learning strategies.

2. Implicit learning in cognitive psychology can be defined as ‘‘learning without awareness of what

is learned’’ (Dekeyser, 2003, p. 314). Thus explicit learning can be defined as learning with

awareness of what is learned.

3. This cultural aspect of vocabulary learning is stressed and discussed at some length in one of the

research papers that preceded the development of the software (Vanparys et al., 1997).

4. Idioms are introduced for two purposes: to add them to the learners’ lexicon and to show how

idioms in different languages reflect the culture of the language.

5. Ridiculous and funny can both be translated into hao xiao de in Chinese. Thus if a Chinese learner

remembers only the translation of the two words, s/he will not know that ridiculous

has a negative connotation while funny is always positive.

6. Each time the user goes to WF or WMA to view a word, a count will be recorded. If the data

show that a user has viewed 15 words in WF, this only means s/he has referred to words in WF

15 times and does not necessarily mean s/he has looked up 15 different words, because a given

word can be viewed several times. Due to some technical constraints, the software programmer

was unable to develop the function to record what words were viewed.

7. The source of motivation may be the perceived value of using the software, since most Chinese

learners are keen to improve their English, both to meet exam requirements and to improve

their chances of professional advancement.

Notes on contributors

Qing Ma is currently doing a Ph.D. in applied linguistics at the University of Louvain, Faculty of

Arts, Belgium. Her main research interests include second language vocabulary acquisition

and CALL.

Peter Kelly is a professor of linguistics at China University of Three Gorges, formerly senior

professor at the University of Namur, Belgium, where he directed the School of Modern

Languages. His main research interests are in the area of second language acquisition.

References

Ames, W. S. (1966). The development of a classification scheme of contextual aids. Reading

Research Quarterly, 11(1), 57 – 82.

Asher, J. J. (1964). Towards a neo-field theory of behaviour. Journal of Humanistic Psychology, 4,

85 – 94.

Asher, J. J. (1983). Learning another language through actions. Los Gatos: California Sky Oaks

Productions, Inc.

Atkinson, R. C., & Raugh, M. R. (1975). An application of the mnemonic keyword method to the

acquisition of a Russian vocabulary. Journal of Experimental Psychology: Human Learning and

Memory, 104(2), 126 – 133.

Beck, I. L., McKeown, M. G., & McCaslin, E. S. (1983). Vocabulary development: all contexts are

not created equal. The Elementary School Journal, 83(3), 177 – 181.

Boers, F., Eyckmans, J., & Stengers, H. (2004). Researching mnemonic techniques through CALL:

the case of multiword expressions. Proceedings of The Eleventh International CALL Conference

(pp. 43 – 48). Antwerp: University of Antwerp.

Boers, F., & Lindstromberg, S. (2005). Finding ways to make phrase-learning feasible: the

mnemonic effect of alliteration. System, 33, 225 – 238.

Burt, M., & Dulay, H. (1975). New directions in second language teaching, learning and bilingual

education. Washington, DC: TESOL.

Chapelle, C. A. (2001). Computer applications in second language acquisition. Cambridge: Cambridge

University Press.

Chun, D. M., & Plass, J. L. (1996a). Effects of multimedia annotations on vocabulary acquisition.

The Modern Language Journal, 80(2), 183 – 198.

Chun, D. M., & Plass, J. L. (1996b). Facilitating reading comprehension with multimedia. System,

14(4), 503 – 518.

Clark, R. E. (1994). Media will never influence learning. Educational Technology Research and

Development, 42(2), 21 – 29.

Coady, J. (1993). Research on ESL/EFL vocabulary acquisition: putting it in context. In T. Huckin,

M. Haynes, & J. Coady (Eds.), Second language reading and vocabulary learning (pp. 3 – 23).

Norwood, NJ: Ablex Publishing.

Coady, J. (1997). L2 vocabulary acquisition: a synthesis of research. In J. Coady & T. Huckin

(Eds.), Second language vocabulary acquisition (pp. 273 – 290). Cambridge: Cambridge

University Press.

Cohen, A. D. (1987). The use of verbal and imagery mnemonics in second-language vocabulary

learning. Studies in Second Language Acquisition, 9(1), 43 – 64.

Cohen, A. D. (1998). Strategies in learning and using a second language. Harlow, Essex:

Longman.

Cohen, A., Weaver, S. J., & Li, T. Y. (1996). The impact of strategies-based instruction on

speaking a foreign language. CARLA Working Paper Series, 4. Retrieved January 10, 2005,

from www.carla.umn.edu/about/profiles/CohenPapers/SBIimpact.pdf

Colpaert, J. (2003). Introduction to CALL. Lecture given at the ELSNET Summer School 2003,

June, Lille, France.

Colpaert, J. (2004). Design of online interactive language courseware: conceptualisation, specification and

prototyping. Research into the impact of linguistic-didactic functionality on software architecture.

Unpublished PhD thesis, University of Antwerp, Belgium. Retrieved June 8, 2005, from

www.didascalia.be/doc-design.pdf

Dekeyser, R. (2003). Implicit and explicit learning. In J. Doughty & M. L. Long (Eds.), The

handbook of second language acquisition (pp. 313 – 348). Oxford: Blackwell.

De Ridder, I. (2002). Visible or invisible links: Does the highlighting of hyperlinks affect incidental

vocabulary learning, text comprehension, and the reading process? Language Learning &

Technology, 6(1), 123 – 146.

De Saussure, F. (1974). Course in general linguistics. London: Fontana/Collins.

Duquette, L., Renie, D., & Laurier, M. (1998). The evaluation of vocabulary acquisition when

learning French as a second language in a multimedia environment. Computer Assisted

Language Learning, 11(1), 3 – 34.

Ervin-Tripp, S. (1974). Is second language learning like the first? TESOL Quarterly, 8, 111 – 127.

Gary, J. O. (1975). Delayed oral practice in initial stages of second language learning. In M. Burt &

H. Dulay (Eds.), New directions in second language teaching, learning and bilingual education

(pp. 89 – 95). Washington, DC: TESOL.

Gary, N., & Gary, J. O. (1982). Packaging comprehension materials: towards effective language

instruction in difficult circumstances. System, 10(1), 61 – 69.

Goodfellow, R. (1994). A computer-based strategy for foreign-language vocabulary learning.

Unpublished PhD thesis, Open University, UK.

Goodfellow, R. (1995). A review of the types of CALL programmes for vocabulary instruction.

Computer Assisted Language Learning, 2 – 3, 205 – 226.

Groot, P. J. M. (2000). Computer assisted second language vocabulary acquisition. Language

Learning & Technology, 4(1), 60 – 81.

Hulstijn, J. (1992). Retention of inferred and given word vocabulary learning. In P. J. Arnaud &

H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 113 – 125). London: Macmillan.

Hulstijn, J. (2003). Incidental learning and intentional learning. In J. Doughty & M. L. Long

(Eds.), The handbook of second language acquisition (pp. 349 – 381). Oxford: Blackwell

Publishing Ltd.

Kelly, P. (1986). Solving the vocabulary retention problem. ITL, 74, 1 – 16.

Kelly, P. (1989). A particular application of the RALEX method of foreign language learning.

Le Langage et l’Homme, 24(70), 153 – 160.

Kelly, P. (1992). Does the ear assist the eye in the long-term retention of lexis? International Review

of Applied Linguistics, 30(2), 137 – 145.

Kelly, P., & Li, X. (2005). A new approach to learning English vocabulary: more efficient, more effective,

and more enjoyable. Beijing: Foreign Language Teaching and Research Press.

Kelly, P., Li, X., Vanparys, J., & Zimmer, C. (1996). A comparison of the perceptions and practices

of Chinese and French-speaking Belgian university students in the learning of English: the

prelude to an improved programme of lexical expansion. ITL, 113 – 114, 275 – 303.

Knight, S. (1994). Dictionary: The tool of last resort in foreign language reading? A new

perspective. The Modern Language Journal, 78, 285 – 299.

Krashen, S. (1989). We acquire vocabulary and spelling by reading: additional evidence for the

input hypothesis. The Modern Language Journal, 73(4), 440 – 464.

Krashen, S. (1993). The power of reading. Englewood, CO: Libraries Unlimited Inc.

Krashen, S., & Terrell, T. (1983). The natural approach: language acquisition in the classroom. Oxford:

Pergamon Press.

Kukulska-Hulme, A. (1988). A computerized interactive vocabulary development system for

advanced learners. System, 16(2), 163 – 170.

Laufer, B. (1997). The lexical plight in second language reading: words you don’t know, words you

think you know and words you can’t guess. In J. Coady, & T. Huckin (Eds.), Second language

vocabulary acquisition (pp. 20 – 34). Cambridge: Cambridge University Press.

Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or

different? Applied Linguistics, 19(2), 255 – 271.

Laufer, B. (2001). Reading, word-focused activities and incidental vocabulary acquisition in a

second language. Prospect, 16(3), 44 – 54.

Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: do we need both to

measure vocabulary knowledge? Language Testing, 21(2), 202 – 226.

Laufer, B., & Hill, M. (2000). What lexical information do L2 learners select in a CALL dictionary

and how does it affect word retention? Language Learning & Technology, 3(2), 58 – 76.

Laufer, B., & Nation, I. S. P. (1995). Vocabulary size and use: lexical richness in L2 written

production. Applied Linguistics, 16, 307 – 322.

Levy, M. (1997). Computer assisted language learning. Oxford: Clarendon Press.

Levy, M. (1999). Design processes in CALL: integration theory, research and evaluation. In

K. Cameron (Ed.), Computer assisted language learning: media, design and applications

(pp. 84 – 107). Lisse: Swets & Zeitlinger.

Levy, M. (2002). CALL by design: discourse, products and process. ReCALL, 14(1), 55 – 84.

Li, X., Song, X., Zimmer, C., Vanparys, J., & Kelly, P. (1999). WUFUN: a new approach to more

efficient and effective vocabulary learning. ITL, 125 – 126, 181 – 194.

Miller, G. A. (1996). The science of words. New York: Scientific American Library.

Nagy, W. E., Herman, P. A., & Anderson, R. C. (1985). Learning words from context. Reading

Research Quarterly, 20, 233 – 253.

Nation, I. S. P. (1990). Teaching and learning vocabulary. New York: Newbury House Publishers.

Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Nord, J. R. (1978). Developing listening fluency before speaking: an alternative paradigm. Paper

presented at the 5th World Congress of Applied Linguistics, Montreal, Canada.

O’Malley, J. M., & Chamot, A. U. (1990). Learning strategies in second language acquisition.

Cambridge: Cambridge University Press.

O’Malley, J. M., Chamot, A. U., Stewner-Manzanares, G., Russo, R. P., & Kupper, L. (1985).

Learning strategy applications with students of English as a second language. TESOL

Quarterly, 19, 557 – 584.

Ostyn, P., & Godin, P. (1985). RALEX: An alternative approach to language teaching. The Modern

Language Journal, 6(4), 346 – 355.

Oxford, R. L., & Scarcella, R. C. (1994). Second language vocabulary learning among adults: State

of the art in vocabulary instruction. System, 22(2), 231 – 243.

Paribakht, T. S., & Wesche, M. B. (1997). Vocabulary enhancement activities and reading for

meaning in second language vocabulary acquisition. In J. Coady, & T. Huckin (Eds.), Second

language vocabulary acquisition (pp. 174 – 200). Cambridge: Cambridge University Press.

Paivio, A., & Desrochers, A. (1979). Effects of an imagery mnemonic on second language recall and

comprehension. Canadian Journal of Psychology, 33, 17 – 28.

Paivio, A., & Desrochers, A. (1980). A dual-coding approach to bilingual memory. Canadian Journal

of Psychology, 34, 388 – 399.

Postovsky, V. A. (1975). The priority of aural comprehension in the language acquisition process. Paper

presented at the 4th AILA World Congress, Stuttgart, Germany.

Pressley, M., & Levin, J. R. (1981). The keyword method and recall of vocabulary words from

definitions. Journal of Experimental Psychology: Human Learning, 17(1), 72 – 76.

Sternberg, R. J. (1987). Most vocabulary is learned from context. In M. G. McKeown & M. E. Curtis

(Eds.), The nature of vocabulary acquisition (pp. 89 – 105). London: Lawrence Erlbaum Associates.

Vanparys, J., Zimmer, C., Li, X., & Kelly, P. (1997). Some salient and persistent difficulties

encountered by Chinese and Francophone students in the learning of English vocabulary.

ITL, 115 – 116, 137 – 164.

Wenden, A. L. (2002). Learner development in language learning. Applied Linguistics, 23, 32 – 55.

Wesche, M. B., & Paribakht, T. S. (2000). Reading-based exercises in second language vocabulary

learning: an introspective study. The Modern Language Journal, 84(2), 196 – 213.

Winitz, H. (1977). Nonauditory auditory disorders. Otolaryngologic Clinics of North America, 10, 187 – 192.

Winitz, H. (1978). The learnables. Kansas: International Linguistics Corporation.

Winitz, H., & Reeds, J. A. (1973). Rapid acquisition of a foreign language (German) by the

avoidance of speaking. International Review of Applied Linguistics, 18(3), 245 – 247.

Appendix A. Vocabulary items to be studied in WUFUN

Words

Acquaintance, available, burst, dam, damage, despair, dump, fail, formal, funny,

injury, jump, land, policy, quantity, ridiculous, roof, shallow, shrink, sign, stretch,

suit, utterly, wonder, weight.

Idioms

He is in the depths of despair.

I am fit to burst.

I split my sides laughing.
