Computer Analysis of Chinese Learner English

Gui Shichun, Yang Huizhong

HKUST 2001.6

1. Background

1.1 Corpus linguistics as an outcome of modern technologies (high-speed CPUs, large RAM and hard disks, and scanners).

(Sinclair 1991)

1.2 Three approaches to linguistic study

• Different procedures for obtaining data about a language:

1.2.1 Introspection

• Linguists as their own informants,

• judging the ambiguity, acceptability, and other properties of utterances against their own intuitions.

• Linguistic analysis based on invented examples:

• Flying planes can be dangerous.

• Sincerity may scare the boy.

• John is eager to please.

• John is easy to please.

• A competence-based approach

1.2.2 Elicitation

• A carefully planned intensive field investigation

• using controlled methods for eliciting judgments about sentences or elements they contain.

• Subjects are asked to identify errors, to rate the acceptability of sentences, to make judgments of perception or comprehension, and to carry out a variety of analytical procedures.

Sentences                                   +    ?    -
1. He wants some cake.                     76    0    0
2. Neither he nor they know the answer.    53   16    7
3. The old man chose his son a wife.       31   24   21
4. They aren't very loved.                  4   20   52
5. A nice little car is had by me.          1    2   73

(From J. Svartvik)

• Field investigations help to yield objective and reliable conclusions.

• People's views vary on the acceptability of certain language forms.

• Time and manpower limit the scale of investigation.

• Informants' judgments may be subject to prompts, which influence the validity of the study.

• Partly performance-based and partly competence-based approach.

1.2.3 A corpus-based approach

• A representative sample of language, compiled for the purpose of linguistic analysis.

• Representativeness, randomness and size of sampling.

• Corpus data

• authenticity of data,

• large size of text, and

• not subject to subjective judgments.

• A performance-based approach

1.3 Sharply opposed positions on the appropriate data for linguistic study

• The critical problem for grammatical theory today is not paucity of evidence but rather the inadequacy of present theories of language to account for masses of evidence that are hardly open to serious question.

(Chomsky, 1965, Aspects of the Theory of Syntax)

• “The Corpus, if natural, will be so skewed that the description would be no more than a mere list.”

(Chomsky, 1962)

• Invented sentences based on native speakers' intuitions about well-formedness

• Merit: May reveal the depth of abstractness and complexity of native speakers' knowledge of their language.

• Problems:

Only narrow aspects of human linguistic competence can be revealed by such methods.

A strange notion of data: one normally expects a scientist to develop theories to describe and explain some phenomena which already exist, independently of the scientist, not to make up the data afterwards, in order to illustrate the theory.

• The nature of data for linguistics: Language should be studied in attested, authentic instances of use (not as intuitive, invented sentences); language should be studied as whole texts (not as isolated sentences or text fragments); and texts must be studied comparatively across text corpora.

(Stubbs, 1996, Text and Corpus Analysis)

• The ability to examine large text corpora in a systematic manner allows access to a quality of evidence that has not been available before.

(Sinclair, 1991, Corpus, Concordance, Collocation)

1.4 Corpus linguistics as a new research paradigm

• The orientation of much linguistic research is undergoing change.

• There are calls for fresh approaches.

• This study will hopefully lead to a better understanding of the complexities of natural languages.

• We are working with real language data.

1.5 Characteristics of corpus-based research

• Empirical.

• A large and principled collection of natural texts.

• Extensive use of computer analysis.

• Qualitative as well as quantitative technique.

• Assumption that language is probabilistic in nature.

(Biber et al.,1998)

1.6 The revival of corpus linguistics

• McEnery & Wilson (1996) summarised the growth of linguistic studies by using the corpus-based approach as shown in the table below.

Date         Studies
To 1965      10
1966-1970    20
1971-1975    30
1976-1980    80
1981-1985    160
1986-1991    320

1.7 Corpus linguistics and EFL

Using computerized language data for teaching has become so popular that it is now variously described as 'coming of age', 'becoming mainstream', 'the flower of the decade', etc., leading to the rise of computer learner corpus (CLC) research.

Leech (1997) emphasises the implications of corpus linguistics for language teaching.

2. Computer Learner Corpora

2.1 Learner corpora are powerful tools for inter-language studies.

• Learner corpora provide a more complete picture of inter-language, namely the totality of the learner performance.

• Learner corpora put the variables affecting the learner performance under control and allow for conditioned queries.

Figure 3-1 Inter-language illustrated (Corder 1981:17). Diagram labels: Language A, Inter-language, Target Language.

2.2 Inter-language studies

Inter-language is a system ever approximating the norm of the target language.

Inter-language is dynamic, developmental, and characterized by variations.

Inter-language is the learners' transitional competence.

2.3 Significance of inter-language studies

• Understand language learning.

• Learners observe, make hypotheses, and experiment in the learning process

• Inter-language is the learners' attempted communication in the target language with their own hypotheses.

• Understand learner strategies in the TL production.

• Learners adopt every possible contributory strategy to meet their communicative needs.

• Such strategies are not errors to be prevented.

2.4 Contrastive studies

Learner corpora are accessible for different types of contrastive studies:

Native speakers vs. non-native learners.

Native learners vs. non-native learners.

Non-native learners of different mother tongues.

Non-native learners of different language proficiency.

Computer learner corpora:

• CLEC

• HKUST Learner Corpus

• International Corpus of Learner English

• Learner Corpus of English by the Chinese Learners in Taiwan

• Learner Corpus in Britain & Europe

Contrastive Inter-language Analysis:

• NS vs. NNS

• NNS vs. NNS

• Advanced vs. intermediate learners

• Deviations

• Avoidance

2.5 Different types of learner corpora

Learner Corpus

• Untagged

• Tagged: with grammatical tags, with error tags

3. Chinese Learner English Corpus (CLEC)

3.1 Design and Procedure of CLEC

• CBACLE (Corpus-based Analysis of Chinese Learner English) is an ongoing project in the Ninth Five-Year Plan of the National Foundation of Social Sciences.

• Phase one is to develop CLEC (Chinese Learner English Corpus). Phase two is to conduct a series of studies based on the corpus.

• The project is undertaken by teachers of English in various universities across three cities (Guangzhou, Shanghai, Xinxiang) under the general directorship of Professor Gui Shichun of Guangdong University of Foreign Studies and Professor Yang Huizhong of Shanghai Jiao Tong University.

3.2 Goals of the Research Project

• A contrastive study of CLEC with other corpora of English speakers, such as Brown (American English) and LOB (British English), sometimes with their updated versions Frown and Flob (by Freiburg U), as well as AHI (American Heritage Intermediate, 5 million words of American English) and JDEST (a corpus of academic English, 4.5 million words).

• A quantitative and qualitative study of Chinese learners' errors of English.

3.3 Composition of CLEC

• 1,000,000 words from 5 types of learners

Learner type          Code   Tokens
Middle school         st2    251,558
Non-majors, Band 4    st3    231,436
Non-majors, Band 6    st4    206,906
Majors, junior        st5    237,978
Majors, senior        st6    258,099
Total                        1,185,977

Presentation 1: The construction of CLEC

The components of CLEC:

EM = English Majors

CEL = College English Learners

SHCL = Senior High School Learners

College English Learners Corpus (500,000 words): free writings, writings for the test, guided writings

Presentation 1b: Data Elicitation

Figure: Histogram of the overall sample distribution by score band (x-axis: score band, 6 to 15; y-axis: number of samples; Series 1). The sampling of test writings of Band 4 and Band 6 (Li and Pu, Unpublished)

Tools for tagging and analysis:

• Tagging

• TACT concordancer

• Brown Corpus (for comparison)

• LOB Corpus (for comparison)

• JDEST Corpus (for comparison)

Presentation 2: The flowchart of data processing

Step 1: Key the raw texts into the computer.

Step 2: Tag the corpus with the tagging scheme.

Step 3: Concordance and query the specific lexical patterns and store the results in the databases.

Step 4: Query the databases.

Step 5: Interpret the data and examine their relevance to English learning and teaching in China.

Flowchart labels: Keyboarding Texts; Corpus 1 (Raw Texts); Manual Tagging; Corpus 2 (Tagged Texts); Statistic Processing of Data & Concordancing for Lexical Patterns; Database 1 (Data Management); Database 2 (Data Analysis); Data Interpretation; MS Access; MS Excel.

3.4 Principles of tagging the errors

• A concise but logical and systematic scheme.

• Detailed categorisation of the more common grammatical errors and rougher categorisation of those with a lower frequency of occurrence. The error marking scheme has 61 items under 11 categories (word form, verb phrase, noun phrase, pronoun, adjective, adverb, preposition, conjunction, word usage, collocation, syntax).

• Openness.

• Exclusion of stylistic errors.

An Example

In the past, people were [vp6, 4-] kind to each other…

vp = verb phrase

6 = the 6th type of error: tense

- = position of the error

4 = the context in which the error occurs, 4 words in front of the error.
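The tag format in the example above can be read mechanically. The following is a minimal sketch, assuming every tag has the shape `[xxN, k-]` shown in the example (two-letter category code, type number, optional count of context words before the dash). The names `TAG_PATTERN`, `ErrorTag`, and `find_error_tags` are hypothetical, and the pairing of codes with category names follows the order in which the slides list them, which is an assumption for every code other than vp.

```python
import re
from typing import List, NamedTuple, Optional

# Category codes paired with the 11 category names in listed order
# (assumed correspondence; only "vp = verb phrase" is stated explicitly).
CATEGORIES = {
    "fm": "word form", "vp": "verb phrase", "np": "noun phrase",
    "pr": "pronoun", "aj": "adjective", "ad": "adverb",
    "pp": "preposition", "cj": "conjunction", "wd": "word usage",
    "cc": "collocation", "sn": "syntax",
}

# Matches tags such as "[vp6, 4-]": code, type number, context count, dash.
TAG_PATTERN = re.compile(r"\[([a-z]{2})(\d+),\s*(\d*)-[^\]]*\]")


class ErrorTag(NamedTuple):
    category: str                 # two-letter code, e.g. "vp"
    error_type: int               # type number within the category, e.g. 6
    words_before: Optional[int]   # context words in front of the error, if given
    offset: int                   # character position of the tag in the text


def find_error_tags(text: str) -> List[ErrorTag]:
    """Collect every error tag found in a tagged learner text."""
    tags = []
    for match in TAG_PATTERN.finditer(text):
        before = int(match.group(3)) if match.group(3) else None
        tags.append(ErrorTag(match.group(1), int(match.group(2)), before, match.start()))
    return tags


sample = "In the past, people were [vp6, 4-] kind to each other"
for tag in find_error_tags(sample):
    print(tag.category, CATEGORIES.get(tag.category), tag.error_type, tag.words_before)
```

Counting the returned tags by category and type would give the kind of frequency tables used in the error analysis later in the talk.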

Table 3 Some Examples of the Error Marking Scheme

Form                       Verb Phrase
Code   Type                Code   Type
fm1    spelling            vp1    pattern
fm2    word building       vp2    set phrase
fm3    capitalization      vp3    agreement
                           vp4    finite/non-finite
                           vp5    non-finite
                           vp6    tense
                           vp7    voice
                           vp8    mood
                           vp9    modal/auxiliary

Appendix 1b: The tagging scheme for the learner corpus

Code | Class | Type | Explanation
fm1 | word | spelling | spelling, coinage, abbreviation, apostrophe
fm2 | word | word building | derivation, inflection, compounding, plurality (noun), irregularity (verb), 3rd person singular form (verb), syllabification, hyphenation, word division or fusion
fm3 | word | capitalization | lower initial letter for upper initial letter or vice versa
vp1 | vb phr | pattern | error in transitivity (vi as vt or vice versa), transitive verb pattern/grammatical (cf. Oxford Advanced Learner's Dictionary of Current English, edited by A. S. Hornby)
vp2 | vb phr | set phrase | phrasal verb and verbal phrase: error in form or use
vp3 | vb phr | agreement | number agreement with its subject (noun or

Figure 4-3: The Tagging-with-Word Toolbar (Li Wenzhong, 1997). Toolbar buttons: fm, vp, np, pr, aj, ad, pp, cj, wd, cc, sn; Tagger1, Tagger2.

4. Statistical features

4.1 Compiling the rank list

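• Word.

• Frequency (F).

• D (Dispersion) value:

$$H = \log_2\Bigl(\sum_i p_i\Bigr) - \frac{\sum_i p_i \log_2 p_i}{\sum_i p_i}, \qquad D = \frac{H}{\log_2 n}$$

• U: the estimated frequency per million tokens, derived from F with an adjustment for D:

$$U = \frac{1{,}000{,}000}{N}\,\bigl[F D + (1 - D)\, f_{\min}\bigr]$$

• SFI: Standard Frequency Index:

$$\mathrm{SFI} = 10\,(\log_{10} U + 4)$$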

39Once in 50,000 words

Word        F    D       U       SFI     st2   st3   st4   st5   st6
DEBT        32   0.096   9.43    49.75   0     0     0     1     31
MEMORABLE   32   0.23    11.95   50.8    2     1     0     29    0
FLAG        32   0.366   15.4    51.9    27    1     0     2     2
TROUBLE     32   0.93    29.4    54.7    2     5     7     6     12

Table 4 Rank List (an example). SFI = 50: once in 100,000 words
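The rank-list statistics can be sketched in a few lines. The slides do not define the symbols, so the following are assumptions: p_i is the word's relative frequency in learner group i, n is the number of groups, N is the total token count, and f_min is the sum of each group frequency weighted by that group's share of the corpus. Group sizes are the token counts of Table 6. Under these assumptions the DEBT row comes out close to the tabulated values; the function names are hypothetical.

```python
import math
from typing import Sequence


def dispersion(freqs: Sequence[int], sizes: Sequence[int]) -> float:
    """D = H / log2(n), with p_i the relative frequency in subcorpus i."""
    p = [f / s for f, s in zip(freqs, sizes)]
    total = sum(p)
    if total == 0:
        return 0.0
    # 0 * log(0) is treated as 0, so groups without the word are skipped.
    weighted = sum(x * math.log2(x) for x in p if x > 0)
    h = math.log2(total) - weighted / total
    return h / math.log2(len(p))


def adjusted_frequency(freqs: Sequence[int], sizes: Sequence[int]) -> float:
    """U: frequency per million tokens, adjusted for dispersion."""
    n_tokens = sum(sizes)
    d = dispersion(freqs, sizes)
    # Assumed definition of f_min (not given in the slides).
    f_min = sum(f * s for f, s in zip(freqs, sizes)) / n_tokens
    return (1_000_000 / n_tokens) * (sum(freqs) * d + (1 - d) * f_min)


def sfi(u: float) -> float:
    """Standard Frequency Index."""
    return 10 * (math.log10(u) + 4)


SIZES = [209189, 198970, 178526, 205340, 237770]  # st2..st6 tokens (Table 6)
DEBT = [0, 0, 0, 1, 31]                            # st2..st6 counts (Table 4)

d = dispersion(DEBT, SIZES)
u = adjusted_frequency(DEBT, SIZES)
print(round(d, 3), round(u, 2), round(sfi(u), 2))
```

A word concentrated in one group (DEBT, almost entirely st6) gets a D near 0 and a heavily discounted U, while an evenly spread word such as TROUBLE keeps a D near 1.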

• Trimming the CLEC.

• Misspellings (such as *abilitical, *abilitities, *abilitys, *abillities, *ablelity, *ablity, *abtilities).

• Chinese proper nouns.

• After these trimmings, the total number of types decreased from 23,633 to 14,598. Altogether 8,675 types have been deleted.

• To justify the contrastive studies, we need to check the lognormality of CLEC. This is done by plotting the cumulative proportions of the sample (in terms of tokens) against the logarithms of the corresponding word-type frequencies. The plot should follow the cumulative normal curve.

Figure 3 Word Frequency Distribution of CLEC (x-axis: frequency of occurrence, logarithmic scale)

Figure 4 Word Frequency Distribution of Brown

Figure 5 Word Frequency Distribution of AHI
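The lognormality check described above can be sketched as follows. The first function builds the plotted points (log frequency against cumulative share of tokens) from a word-frequency list. The second, which measures the largest gap to a fitted cumulative normal curve, is an illustrative addition and not a procedure stated in the slides; all names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cumulative_token_proportions(type_freqs: Dict[str, int]) -> List[Tuple[float, float]]:
    """Pairs of (log10 of a type frequency, share of tokens at or below it)."""
    tokens_at = Counter()
    for freq in type_freqs.values():
        tokens_at[freq] += freq  # tokens contributed by types occurring freq times
    total = sum(tokens_at.values())
    points, running = [], 0
    for freq in sorted(tokens_at):
        running += tokens_at[freq]
        points.append((math.log10(freq), running / total))
    return points


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def max_gap_to_normal(points: List[Tuple[float, float]]) -> float:
    """Largest vertical gap between the points and a fitted normal CDF."""
    weights, previous = [], 0.0
    for _, cumulative in points:
        weights.append(cumulative - previous)
        previous = cumulative
    mean = sum(w * x for w, (x, _) in zip(weights, points))
    sd = math.sqrt(sum(w * (x - mean) ** 2 for w, (x, _) in zip(weights, points)))
    if sd == 0:
        return 0.0  # a single distinct frequency: nothing to compare
    return max(abs(c - normal_cdf(x, mean, sd)) for x, c in points)


toy = {"the": 6, "of": 3, "cat": 1}  # toy frequency list, not corpus data
points = cumulative_token_proportions(toy)
print(points, max_gap_to_normal(points))
```

With a real frequency list, the points can be plotted on a logarithmic x-axis and compared by eye with the cumulative normal curve, as in Figures 3 to 5.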

4.2 Some basic findings

• The TTR (Type/Token Ratio) of CLEC is much smaller than those of native speakers' corpora, showing that the Chinese learners have a limited vocabulary. The more proficient the learners, the greater their vocabulary range. The vocabulary of College English students appears to be smaller due to the constraint of topics, because their written works were chosen from examination papers.

Table 5 Comparison of type/token ratios

Corpus   Type    Token     T/T ratio   Log T/T ratio
Brown    50406   1014232   0.049699    0.782946
LOB      39868   1359411   0.029327    0.7501
Frown    45360   1314292   0.0345      0.761
Flob     45093   1307310   0.0345      0.761
CLEC     14598   1029795   0.014176    0.692577

Table 6 TTRs of 5 Types of Chinese Learners

Learner Type Token TTR Log TTR

st2 5965 209189 0.028515 0.709629

st3 4497 198970 0.022601 0.689388

st4 4889 178526 0.027385 0.702481

st5 8305 205340 0.040445 0.737762

st6 9846 237770 0.04141 0.742772
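The two ratios in Tables 5 and 6 can be reproduced from the type and token counts. The slides do not spell out the log ratio; the sketch below assumes it is log(types) / log(tokens), which matches the tabulated figures. Function names are hypothetical.

```python
import math


def type_token_ratio(types: int, tokens: int) -> float:
    return types / tokens


def log_type_token_ratio(types: int, tokens: int) -> float:
    # Assumed definition: ratio of the logarithms (the base cancels out).
    return math.log(types) / math.log(tokens)


# (types, tokens) as given in Tables 5 and 6
COUNTS = {
    "Brown": (50406, 1014232),
    "CLEC": (14598, 1029795),
    "st2": (5965, 209189),
    "st6": (9846, 237770),
}

for name, (types, tokens) in COUNTS.items():
    print(name,
          round(type_token_ratio(types, tokens), 6),
          round(log_type_token_ratio(types, tokens), 6))
```

The plain ratio falls as a corpus grows, so the log version is the safer one to compare across samples of different sizes.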

The average word length of CLEC follows the general pattern of the other corpora, with 3-letter words at the top of the list. But 4-letter words come second instead of 2-letter words. CLEC (4.26) stands between Frown (4.28) and FLOB (4.23). The more proficient the learners are, the longer the words they use.

Figure 6 Comparison of Word Length of 5 Types of Learners Against FLOB (x-axis: number of letters; y-axis: %)

Figure 7 The Average Word Length of 5 Types of Learners

The average sentence length of CLEC is much shorter than that of the English speakers' corpora, though the learners' general tendency seems to be towards longer sentences. One interesting discovery is that college English learners seem to be using longer sentences than the other types of Chinese learners.

Table 7 The Average Sentence Length of 5 Corpora

Corpus   Average Sentence Length
Brown    23.27
Frown    33.55
LOB      21.7
FLOB     37.9
CLEC     17.4

Table 8 The Average Sentence Length of 5 Types of Learners

Learner   Average Sentence Length
St2       14.7
St3       19.1
St4       20.1
St5       16.5
St6       17.2
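Average word and sentence length are simple to compute once a tokenisation is fixed. The sketch below is only illustrative: the slides do not say how words and sentences were delimited, so the regular expressions here (letters and apostrophes as words; full stop, question mark, or exclamation mark as sentence ends) are assumptions.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence (naive punctuation-based split)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences) if sentences else 0.0


def average_word_length(text: str) -> float:
    """Mean number of letters per word."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


sample = "He wants some cake. They are not very loved."
print(average_sentence_length(sample), round(average_word_length(sample), 2))
```

Abbreviations and decimal points would be miscounted by this split, so figures from it are comparable only across texts processed the same way.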

5. Features of Learner Vocabulary

5.1 Over-use of frequently used words as a result of a limited vocabulary. This is demonstrated by the distribution table of most frequently used words of CLEC, Brown and AHI.

Figure 8 Distribution of Most Frequently Used Words of 3 Corpora (cumulative % of tokens)

Top words   100     500     1000    2000    3000    5000
Brown       47.43   62      68.86   76.26   80.66   86.24
AHI         49      66.56   74.03   81.27   85.16   89.36
CLEC        55.38   78.47   86.4    92.6    95.5    98
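The coverage figures above are cumulative shares of tokens accounted for by the N most frequent words. A minimal sketch of that calculation, with a hypothetical function name and toy data rather than corpus data:

```python
from collections import Counter


def coverage(freqs: Counter, top_n: int) -> float:
    """Percentage of all tokens covered by the top_n most frequent words."""
    total = sum(freqs.values())
    covered = sum(count for _, count in freqs.most_common(top_n))
    return 100 * covered / total


toy = Counter("the cat and the dog and the bird".split())
for n in (1, 2, 5):
    print(n, coverage(toy, n))
```

Running the same function over a full frequency list with N = 100, 500, 1000, 2000, 3000, 5000 would give one row of the figure.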

5.2 Under-use and over-use of words. CLEC was compared to Flob using WordSmith (Mike Scott). Most of the over-used words (915) are related to the learners' personal and school activities. Among the under-used words (927) were 3rd person pronouns (she, he, his, her, him, he'd, he's), some prepositions (of, by, within, off, as, with, over, between) and some verbs (had, was, been, were, might). This in a way shows the influence of Chinese.
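A comparison of this kind rests on a keyness statistic. The slides do not say which statistic or settings were used, so the sketch below uses the two-term log-likelihood statistic, a common choice for comparing word frequencies across two corpora, purely as an illustration. The counts for great (1300 in CLEC, 534 in Flob) and the corpus sizes are taken from elsewhere in this talk; pairing them in this way is an assumption, and the function names are hypothetical.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-term log-likelihood for one word in corpus A versus corpus B."""
    combined = freq_a + freq_b
    expected_a = size_a * combined / (size_a + size_b)
    expected_b = size_b * combined / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll


def direction(freq_a: int, size_a: int, freq_b: int, size_b: int) -> str:
    """Whether corpus A over-uses or under-uses the word relative to B."""
    return "over-used" if freq_a / size_a > freq_b / size_b else "under-used"


CLEC_SIZE, FLOB_SIZE = 1029795, 1307310  # token counts as listed in Table 5
score = log_likelihood(1300, CLEC_SIZE, 534, FLOB_SIZE)
print(round(score, 1), direction(1300, CLEC_SIZE, 534, FLOB_SIZE))
```

Values above 3.84 are conventionally treated as significant at p < 0.05; the sign of the difference is read from the relative frequencies, not from the score itself.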

5.3 Over-use leads to misuse. Following Biber (2000), we looked at the usage of great, big and large in CLEC, and found that great was greatly over-used. We then looked at the collocations of great with one word to the right: 22 collocates occurred over 10 times in CLEC. In Flob there were only 2 such collocates (deal, 34; Britain, 16), whereas in Frown there was only one (deal, 30).

Figure 9 Distribution of Great, Big, and Large (frequency)

Corpus   Great   Big   Large
Flob     534     255   385
Frown    450     327   389
CLEC     1300    502   365
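Counting the collocates one word to the right of a node word, as done for great above, can be sketched as follows. The tokenisation and the function name are assumptions, and the sample sentence is invented for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple


def right_collocates(text: str, node: str, min_count: int = 1) -> List[Tuple[str, int]]:
    """Words immediately following `node`, with counts, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pairs are taken across the whole text, so sentence boundaries are ignored.
    counts = Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


sample = "A great deal of great changes. A great deal more."
print(right_collocates(sample, "great"))
```

Setting `min_count=11` would keep only collocates occurring over 10 times, the threshold used in the comparison above.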

5.4 Collocation

• Collocations are co-occurrences of words.

• They are recurrent.

• They are syntactically and semantically constrained.

• They are usage restricted.

• They are arbitrary and domain-dependent.

• Some collocations are culturally loaded.

• Collocations are characterised by syntactic patterning and meaning associations.

• Collocations are on a scale on which at one end are free collocations, and at the other are fixed. It is the collocations in the middle area, the semi-fixed collocations, that are most problematic.

• Unsuccessful L1 transfer has been identified as the major cause for improper collocations.

• The learners have greater problems in Verb + Noun collocations than in other patterns.

Figure 7-6: Distributions of the different patterns of collocation tagged as idiosyncratic (x-axis: cc1 to cc6).

cc1 improper n/n collocation     cc4 improper a/n collocation
cc2 improper n/v collocation     cc5 improper v/ad collocation
cc3 improper v/n collocation     cc6 improper ad/a collocation

• The inappropriacy of the collocations is interpreted in terms of learner strategies in attempted communication in the TL, such as

• Transfer;

• substitution;

• avoidance; and

• repetition.

4. Substitution

Collocating words | Substituted words | Chinese translation | Possible sources
production opened | increased (?), grew | 生产增长 (production grows) | IL based.
living standard develops | (get) improved | 生活水平在提高 (living standards are improving) | IL based and L1 transfer.
skills rise | technology; (get) improved. | 技术提高 (skills improve) | L1 transfer.
life expectancy adds | increases, grow | 寿命增长 (life expectancy increases) | IL based and L1 transfer.
our brain can't keep calm | we | 脑子不冷静 (the mind is not calm) | L1 transfer.
the death will decrease | the death rate | 死亡率将下降 (the death rate will fall) | IL based.
jobs are firm | stable, steady | 工作稳定 (jobs are stable) | IL based and L1 transfer.
news happened | spread | 新闻出现 (news appears) | IL based.
health increases | physical constitution; (get) improved | 体质增强 (physical constitution strengthens) | L1 transfer.

Table 7-20: Analysis of substitution

Table 4.1.1: Collocates of adapt in the learner and native speaker corpora

Colligation | CLEC collocate | CLEC occurrence | Brown, LOB, JDEST collocate | occurrence
V to n | changes, change, society, world, socity | 2153 | difficulties, hardships; circumstances, situations, contexts; science, technology |
V n | society | 8 | policies, operations, techniques, method, designs, tools, engine |
V –self to n | society | 11 | ways, life, situation, realities, events |
V n1 to n2 | | | scheme, plans, decisions, designs, words, lines, qualifications (n1); needs, requirements, demands, interests, purposes (n2) |

5.5 Coinages and creativity

• Coinages are created in a systematic manner.

• Coinages indicate the learners' use of productive strategies with the TL vocabulary knowledge available to them.

• Intermediate learners are the most active creators of new forms.

• Learners demonstrate a great need for explicit word formation knowledge.

• Learners tend to over-generalise word formation rules in derivation and transfer their L1 in compounding.

Coinage | Restructuring | Intended meaning in the context | Description
fastly | fast | Speedy, fast (used in negative sense). | IL based, false hypothesis of TL rules.
hastely | - | In the manner of haste (used in negative sense). |
hurrily | Hurriedly | ↑ | ↑
oftenly | Often, frequently | Indicates frequency, used as a postmodifier. | IL based, avoidance
Increasely | Increasingly | Used in the sense of increase or growth |
Skilly | Skillfully | - | ↑
Uncarefully | Carelessly | Not careful enough | L1 transfer, literal translation of L1 chunks.
Continuely | Continuously | - | IL based, false hypothesis of TL rules.
Practicedly | - | Relating to practice. | IL based.

Coinages | Restructuring | Intended meaning in the context | Description
Possiblement | Possibility | - | IL based, false hypothesis of TL rules.
Increasement | - | The state of increase, or growth. |
Comparation | Comparison | - | ↑
Changenous | - | The quality of variability. | ↑
Useness | - | The action of using. | ↑
Hasteful | - | The state of haste | ↑
Suspectable | - | The quality of being suspected. | L1 transfer, literal translation of the L1 term.
Changement | - | The result or state of changing. |
Effection | Effect | - | ↑
Respectence | - | The state or action of respect. | ↑
Warmdom | Warmth | - | ↑
Trainess | - | The action of training. | ↑
Regulership | Regularity | - | ↑
Workmateship | - | The sense of fraternity between workers. | IL based.
Skillness | Skillfulness | - | IL based, false hypothesis of the TL rules.
Valueness | - | Value, being of invaluable quality. |
Pre-person | - | Forefathers, ancestors. | ↑
Pointment | - | The important point. | ↑
Waterful | - | Water-filled as contrast to water drained. | IL based.

Table 7-5: The expression of quality, state, and action

Coinages | Restructuring | Intended meaning in the context | Description
Successer | - | The person who succeeds in some field. | IL based or L1 transfer.
Colleger | - | A college student. | IL based, false hypothesis of the TL rules.
Takers | - | The people who do a job. |
Disearsers | - | The people who have diseases or patients. |
Typeier | Typist | - | ↑
Societor | - | The person who is a worker as contrasted with a college student. |
Statists | Statistician | - | ↑
Failurer | A failure | A person who fails. | ↑
Sporters | Sportsman | - | Avoidance
Challenger | - | A person who challenges. | IL based, false hypothesis of the TL rules.
Funder | - | The person or organization that provides funds. | IL based.
Constructors | - | The person who constructs or develops a concrete building, or an abstract system. | IL based, false hypothesis of the TL rules.
Servicer | - | A person who does services (often temporarily). |

Table 7-6: The expression of identity, actors, doers, executors of the actions.

2. Compounding

Coinages | Restructuring | Intended meaning in the context | Description
Workmates | - | Fellow workers | Both IL based or L1 transfer.
Advertise-man | - | The person who is engaged in the design of advertisement. | L1 transfer. (cf. Guanggaoren).
Oilman | - | The person who sells oil. | IL based.
Man-at-home | - | The person who likes staying with family. | IL based.
Family-teacher | Home tutor | - | L1 transfer. (cf. Jiajiao).
House-teacher | Home tutor | - | ↑

Table 7-8: The expression of professions

3. Back formation

Coinages | Restructuring | Intended meaning in the context | Description
Solute | Solve | To provide solution. Back formed from solution. | IL based.
Revoluting | Revolutionize | Back formed from revolution. |
Comprehense | Comprehend | Back formed from comprehension. |

Table 7-11: Back formation

Figure 4-2: The distribution of coinages across the score ranges (x-axis: score range, 6 to 14).

6. Error Distribution

• Statistical Analysis of Chinese Learner Errors in English

• A horizontal investigation of errors in terms of error types.

• A vertical investigation of errors in terms of learners.

6.1 Horizontal Investigation

• 71,046 errors were identified in the corpus of 1,185,977 tokens, that is, about 6% of the tokens.

• Of the 11 categories of errors, word form, word usage, syntax, and verb phrase constitute the greatest proportion, together about 5% of the tokens.

• The D (dispersion) value is 0.914, showing that the errors are well distributed among the 5 types of students.

Table 10 Summary of Errors

Error    st2    st3    st4    st5    st6    Subtotal   Proportion of tokens
WForm    4708   4538   3490   3241   2902   18879      0.016
Lexis    3465   5285   4632   2310   1923   17615      0.015
Syntax   3747   2745   2473   1762   925    11652      0.010
VP       3236   2242   2563   1364   1317   10722      0.009
NP       1327   1271   1148   1053   947    5746       0.005
Colloc   480    804    820    185    271    2560       0.002159
Pron     329    582    503    217    138    1769       0.001492
Prep     203    297    217    246    72     1035       0.000873
Adj      90     77     103    82     148    500        0.000422
Adv      107    188    66     46     28     435        0.000367
Conj     40     13     26     33     21     133        0.000112
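The totals quoted for the horizontal investigation can be checked from the table, and dividing each group's errors by its token count gives a per-group error rate. This is a sketch only: the per-group rates are derived here from the counts in Table 10 and the token counts in section 3.3, and are not figures reported in the slides. Variable names are hypothetical.

```python
# Tokens per learner group (section 3.3)
TOKENS = {"st2": 251558, "st3": 231436, "st4": 206906, "st5": 237978, "st6": 258099}

# Errors per category, in the order st2, st3, st4, st5, st6 (Table 10)
ERRORS = {
    "WForm":  (4708, 4538, 3490, 3241, 2902),
    "Lexis":  (3465, 5285, 4632, 2310, 1923),
    "Syntax": (3747, 2745, 2473, 1762, 925),
    "VP":     (3236, 2242, 2563, 1364, 1317),
    "NP":     (1327, 1271, 1148, 1053, 947),
    "Colloc": (480, 804, 820, 185, 271),
    "Pron":   (329, 582, 503, 217, 138),
    "Prep":   (203, 297, 217, 246, 72),
    "Adj":    (90, 77, 103, 82, 148),
    "Adv":    (107, 188, 66, 46, 28),
    "Conj":   (40, 13, 26, 33, 21),
}

groups = list(TOKENS)
# Column sums: total errors made by each learner group.
per_group = {g: sum(row[i] for row in ERRORS.values()) for i, g in enumerate(groups)}
total_errors = sum(per_group.values())
total_tokens = sum(TOKENS.values())
overall_rate = total_errors / total_tokens
# Errors per token for each group.
rates = {g: per_group[g] / TOKENS[g] for g in groups}

print(total_errors, total_tokens, round(overall_rate, 3))
for g in groups:
    print(g, per_group[g], round(rates[g], 4))
```

Normalising by group size matters here because the five groups contribute different numbers of tokens.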

Figure 10 Error Distribution of CLEC

Factor analysis shows that the 11 categories of errors can be grouped under 3 factors. Factor 1 (word form, syntax, verb phrase, noun phrase, & conjunction, 37%) can be interpreted as the syntactic factor, Factor 2 (lexis, collocation, pronoun, adverb, & conjunction, 32%) the semantic factor, and Factor 3 (preposition, adjective, & adverb, 26%) the functional factor.

Table 11 Rotated Component Matrix

           Component
           1       2       3
WDFORM     .774    .281    .449
LEXIS      .400    .826    .363
SYNTAX     .914    .128    .386
VERB       .970    .209    .007
NOUN       .829    .287    .461
COLLOC     .396    .878    .112
PRONOUN    .352    .820    .404
PREP       .189    .324    .913
ADJ        -.320   -.008   -.937
ADV        .383    .629    .547
CONJ       .526    -.809   .005

6.2 Vertical Investigation

• Vertical investigation looks at the 5 types of students representing different levels of language development.

• Factor analysis shows the 5 types of Chinese learners belong to one single group.

• Cluster analysis further shows their hierarchical relation.

Figure 13 Cluster Analysis of Chinese Learners (tree diagram for 5 variables; single linkage; Euclidean distances; leaf order: ST4, ST3, ST6, ST5, ST2)

• The general tendency is that the more proficient the learners are, the fewer errors they make. But there are also fluctuations in their development of the target language.

• As a whole, st3 and st4 commit more lexical errors than st2, and st4 commit more errors in verb phrases than st3. St3 learners also make more collocational errors.

Figure 14 The General Trend of Errors of 5 Types of Learners (x-axis: learner types, st2 to st6; y-axis: number of errors; legend includes WdForm, Syntax, Colloc)

Some more “abnormalities”:

St2 commit more errors in tense, and st4 more errors in agreement.

St3 and st4 commit more errors in spellings and word formation.

St3 and st4 tend to use more run-on sentences, and st5 and st6 tend to use illogical comparisons.

St3 and st4 tend to use the wrong lexical words and parts of speech.

Figure 15 Errors in Verb Phrase (x-axis: learner, St2 to St6; y-axis: number of errors; legend: pattern, set phrase, agreement, finite/nonfinite, nonfinite, modality)

Figure 16 Errors in Word Form (x-axis: learner, st2 to st6; y-axis: number of errors, 1000 to 3500; legend: spelling, word formation, capitalization)

Figure 17 Errors in Syntax (x-axis: learner, st2 to st6; y-axis: number of errors; legend: run-on, fragment, dangling, illogical, coordination, subordination, structural)

7. Some Concluding Remarks

• Contrastive study is possible because CLEC and the other corpora follow the same lognormal model.

• The corpus approach provides us with an interface between the quantitative method that relies on figures, and the qualitative method that relies on words. It is essential to integrate the two methods.

• In terms of log type/token ratio, average word length, and average sentence length, the figures of the Chinese learners are lower than those of the native speakers. The more proficient the learners, the closer they are to native speakers of English.

• The first 5000 words, which occupy 36% of the types, constitute 98% of the tokens. This implies the learners are using a smaller vocabulary on a greater number of occasions, or over-using the frequently-used words. This may cause their misuse, especially of polysemous words.

• The implication is that we should pay more attention to the teaching of frequently-used words (especially polysemous words and word usage) rather than to simply increasing the vocabulary range of the students.

• Chinese students have difficulties in both lexis and syntax. They are all weak in spelling. Apart from lexis, the three types of errors decrease with the language development of the students.

• Among the lexis errors, 40% are related to misuse (substitution), followed by wrong parts of speech (17%) and redundancy (14%). Lexis errors may not be closely related to the students' levels; they may be more dependent on the writing tasks.

• We have paid more attention to the sampling of student types than to the sampling of topics.

• Vocabulary knowledge involves knowledge of the sound system, syntactic functions, and semantic relations of individual words of the TL.

• Vocabulary acquisition is essential to learning success.

• Collocation is one of the important components of vocabulary knowledge.

• Collocations, particularly the semi-fixed collocations, create special problems for the Chinese learners of English.

• Implications for EFL vocabulary learning:

• Words learned in isolation may be the source of problems.

• Equivalence between L1 and TL words is weak and may cause unsuccessful transfer.

• Learners should learn words in chunks.

• Implications for EFL teaching:

• Words must be delivered in association and collocation with adequate context.

• Explicit vocabulary instruction is necessary.

• The CLEC corpus is restricted to the written language only.

• The corpus can only reflect the productive, not the receptive ability of the students.

• Consistency of tagging remains a problem, and the error marking scheme needs to be improved.

The CLEC corpus is now accessible on the Internet (www.eclass.com.cn or www2.gdufs.edu.cn).

Professor Gui's email address is itscgui@gdufs.edu.cn

My email address is hzyang@mail.sjtu.edu.cn

Thank you for your attention !

-----------------------------------------------------------------

Professor Yang Huizhong

Chairman, National College English Testing Committee of China

Shanghai Jiao Tong University

hzyang@mail.sjtu.edu.cn

www.sjtu.edu.cn/cet

------------------------------------------------------------------------
