CORPUS-INFORMED TEACHING AND RESEARCH II

Ken Lau

GOING FROM PRESCRIPTIVE TO DESCRIPTIVE

Prescriptive grammar has its origins in 18th- and 19th-century Europe, where grammar was connected to the idea of ‘orderliness’ in speech and writing. Prescriptive grammar is often concerned with what is ‘correct’ and ‘incorrect’: with a “standard language” and with dictionaries that record correct spelling and set down precise meanings, all part of unifying emerging nation-states. We can see a similar tendency in China over the past 150 years, as Mandarin and Modern Standard Chinese acquired normative status as a national standard.

Many of the prescriptive rules of English that cause us most trouble today come from this period (e.g. the choice between Who did you speak to? and To whom did you speak?) and were based upon analogies with Latin rather than on what English speakers actually said or wrote at the time.

Descriptive grammar has its origins in 20th-century linguistics and other disciplines that began to see language as a vital resource for studying all kinds of aspects of social and cultural life. Descriptive grammar grew out of a concern with the language forms people actually used. In part, this approach was an anthropological one, as different language groups were studied in relation to European languages; in part, it arose as a wave of American missionaries set about translating the Bible into languages that had no written form. The techniques of inducing grammatical rules from spoken data were learnt, but so too was a respect for the variability of language as a system for reflecting thought and relationships.

Out of this work grew the idea of ‘acceptability’ (for communication to succeed) rather than simply a concern for formal ‘correctness’. From a descriptive point of view, Who did you speak to? and To whom did you speak? are both acceptable sentences because both are used and both make sense. But To speak whom? did you is not acceptable because it is neither used nor meaningful.

GOING FROM DEDUCTIVE TO INDUCTIVE

The term ‘data-driven learning’ suggests that it is an inductive approach and therefore comparable with the implicit method, though the emphasis is on gaining insight rather than establishing habits, and in this sense it is mentalistic.

The approach makes high demands on the students in terms of language proficiency, observation and inductive reasoning. It is therefore more suitable for advanced language learners.

REALITY OR NOT?

Corpora do provide authentic language use. However, one question that people ask is whether corpora really capture reality. However large a corpus is, it is still not large enough to capture all the instances of language use in an adult user’s experience.

REALITY OR NOT? - TESTING CLAIMS BASED ON INTUITION

Real: positive connotation?

‘real English’, ‘the real country taste’

100-million-word BNC: ‘real world’, ‘real life/lives’, ‘real term(s)’, ‘real problem(s)’

Positive?
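A claim about the connotation of ‘real’ can be checked by counting what actually follows the word in a corpus. The sketch below is a minimal illustration under stated assumptions: the sample text is invented and merely stands in for corpus data (it is not taken from the BNC), and the function name `right_collocates` is hypothetical.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the word that immediately follows each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(nxt for word, nxt in pairs if word == node)

# Invented sample standing in for a corpus (an assumption, not BNC data).
sample = (
    "In the real world prices rise. In real terms wages fell. "
    "This is a real problem in real life. The real world is complicated."
)

counts = right_collocates(sample, "real")
for word, freq in counts.most_common():
    print(f"real {word}: {freq}")
```

With a genuine corpus, the same count would show whether the most frequent combinations match the positive examples that intuition supplies.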

REALITY OR NOT? - CLARIFYING MOTIVES FOR TEACHING

Carter discusses a number of spoken formulae which carry ‘cultural content’, including expressions referring to other nations: ‘Dutch courage’, ‘to go Dutch’, ‘double-Dutch’, ‘Dutch cap’.

Very low frequencies: are these worth teaching?

Carter suggests two reasons for teaching them: ‘double-Dutch’, ‘go Dutch’ and indeed ‘Dutch cap’ could all be useful expressions for a learner wishing to avoid social embarrassment in Britain; and the study of British insularity, as revealed through linguistic references to foreign nationals and nations, could constitute a stimulating activity which could increase learners’ awareness of cultural issues.

The inclusion in syllabuses of language which is very rare in large corpora thus calls for justification, and the same is equally true for the exclusion of language which is common. As we saw with ‘real’, corpora can remind us of frequent uses which might otherwise tend to be ignored. Thus McCarthy and Carter (1995) notice the frequency in speech of the semi-modal ‘tend to’ (it occurs almost as often as ‘ought’ in the BNC spoken component). Although this verb has traditionally received little attention in teaching, it arguably provides learners with a valid alternative to frequency adverbs such as ‘usually’ and ‘often’.

MATERIALS DEVELOPMENT

CORPUS-BASED ENGLISH FOR GENERAL ACADEMIC PURPOSES (EGAP) MATERIALS

A key endeavor in the production of corpus-based materials to aid students with academic writing of a general nature is that by Thurstun and Candlin (1998a, b). Moving from controlled to more open-ended writing activities would seem to inculcate in students a kind of ‘corpus competence’.

In this corpus-derived material the lexico-grammar is introduced according to its specific rhetorical function, e.g. referring to the literature, reporting the research of others. Within each broad function, each keyword (e.g. argue, suggest) is then examined within the following chain of activities:

LOOK at concordances for the key term and words surrounding it, thinking of meaning (using for instance, BAWE)

FAMILIARIZE yourself with the patterns of language surrounding the key term by referring to the concordances as you complete the tasks.

PRACTISE key terms without referring to the concordances.

CREATE your own piece of writing using the terms studied to fulfill a particular function of academic writing.

(Thurstun and Candlin, 1998)
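The LOOK step relies on concordance lines, in which every occurrence of the key term is aligned with its surrounding words. The following is a minimal key-word-in-context sketch, not the tool used by Thurstun and Candlin: the function name `kwic`, the `width` parameter and the sample sentences are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return key-word-in-context lines for every form starting with `keyword`."""
    lines = []
    # \w* lets "argue" also match "argues" and "argued".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\w*", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the key terms line up in a column.
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

# Invented sample sentences (an assumption, not data from BAWE).
sample = (
    "Smith (2001) argues that corpora reveal patterns. "
    "Others argue against this view. "
    "It has been argued that intuition is unreliable."
)

lines = kwic(sample, "argue")
for line in lines:
    print(line)
```

Reading down the aligned column makes recurrent patterns to the right of the key term (for example, a following that-clause) easier to spot.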

An example of corpus analysis: Hong Kong students overuse ‘I’ in cognitive activities with affective attachment but ‘under-argue’ (Ho, 2012)

Item                  HK_CORP per 1k words   UK_CORP per 1k words   Compared to UK_CORP   LL     p
I + 42 verbs*         3.27                   0.42                   Overused              653    <0.0001
I + 12 C verbs        2.03                   0.27                   Overused              395    <0.0001
I + 20 D verbs        1.18                   0.08                   Overused              311    <0.0001

4 main verbs
I argue               0.01                   0.03                   x                     2.5    >0.05
I suggest             0.18                   0.01                   Overused              52.9   <0.0001
I believe             0.42                   0.10                   Overused              55.3   <0.0001
I think               1.37                   0.11                   Overused              335    <0.0001

argue (total)         0.31                   0.85                   Underused             55     <0.0001
suggest (total)       1.06                   1.12                   x                     0.4    >0.05
believe (total)       1.53                   0.83                   Overused              55     <0.0001
think (total)         3.22                   0.77                   Overused              404    <0.0001

Notables
I agree               0.96                   0.02                   Overused              336    <0.0001
I hope                0.11                   0.00                   Overused              45     <0.0001
I am surprised**      0.20                   0.00                   Overused              81     <0.0001
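The LL column reports a log-likelihood statistic for the difference between the two corpora. The sketch below shows one common way of computing such a value from raw frequencies and corpus sizes; it is an assumption that this formula matches the one used in the study. The counts and corpus sizes are hypothetical, since the table gives only normalised rates, and the function names are invented for the example.

```python
import math

def per_thousand(freq, size):
    """Normalise a raw frequency to occurrences per 1,000 words."""
    return 1000 * freq / size

def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood for one item observed in two corpora."""
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts: 100 hits in a 10,000-word learner corpus,
# 50 hits in a 10,000-word reference corpus.
ll = log_likelihood(100, 10_000, 50, 10_000)
print(per_thousand(100, 10_000), per_thousand(50, 10_000))
print(round(ll, 2))
# With one degree of freedom, 3.84 is the critical value for p < 0.05
# and 15.13 for p < 0.0001.
print("significant at p < 0.0001" if ll > 15.13 else "not significant at p < 0.0001")
```

Normalising to a common base makes corpora of different sizes comparable, while the log-likelihood value indicates whether an observed difference is likely to be due to chance.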

CORPUS-BASED ENGLISH FOR SPECIFIC ACADEMIC PURPOSES (ESAP) MATERIALS

A variety of specialized corpora, consisting of lectures, engineering textbooks, legal essays and research articles, have been used for various types of pedagogic applications, which very often combine initial pen-and-paper awareness-raising activities with follow-up direct consultation of the corpus by students.

Jones and Schmitt (2010) devised discipline-specific vocabulary materials including both technical and colloquial terms, derived from corpora of academic seminars on language and gender, international law and entrepreneurship. Mudraya’s (2006) materials, based on a 2-million-word corpus of engineering textbooks, also targeted vocabulary, but of a sub-technical nature. Mudraya draws attention to this type of vocabulary, i.e. items such as current, solution and tension, which have one sense in general English but are used in a different sense in technical English.

She proposes a set of queries based around solution on the grounds that this word occurs, in its general sense, as a high-frequency word family and also as a frequent sub-technical item. Students are presented with concordance output of carefully selected examples of solution. In one exercise they are asked to identify, for example, the adjectives used with solution (1) in the general sense and (2) in the technical (chemical) sense, and then to underline those adjectives that can be used with both senses of solution, as a means of highlighting collocational sensitivities.

Task: Use the BNC to compare the pre-modifiers of solution in the Magazine and Academic sub-corpora.

At HKU, a legal concordancer (http://www4.caes.hku.hk/lawvocab/tools/inde) was created by the Centre for Applied English Studies to help Law students improve their legal writing skills. One task in the course Writing Solutions to Legal Problem requires each student to present the usage of a legal term of his/her choice by analyzing concordance lines.

Several pedagogic applications approach the corpus consultation from a genre-based perspective. Bhatia et al. propose various move-specific concordancing activities for one genre of legal English, the problem-question genre written by students within academic settings. They note that deductive reasoning plays a major role in this highly specialized genre. One of the major foci, therefore, is to have students examine various types of non-lexical epistemic and pragmatic/discoursal hedges for the role they play in the deductive reasoning.

Another advocate of a concordance- and genre-based approach to academic essay writing in the legal field, specifically formal legal essays written by undergraduates, is Weber (2001). First, Weber’s students were inducted into the genre of legal essays by reading through whole essays taken from the University of London LLB Examinations written by native speakers, and identifying some of the prototypical rhetorical features, e.g. identifying and/or delimiting the legal principle involved in the case. They were then asked to identify any lexical expressions which seemed to correlate with the genre features. This was followed up by consulting the corpus of legal essays to verify and pinpoint regularities in lexico-grammatical expressions. As in the tasks proposed by Bhatia et al., Weber approaches the lexico-grammar from the perspective of a ‘local grammar’, which ‘attempts to describe the resources for only one set of meanings in a language rather than for the language as a whole’ (Hunston 2002: 90).

LEARNER CORPORA

So far we have only looked at expert corpora. We should bear in mind that corpora containing texts from learners have high pedagogical and research value. Mukherjee and Rohrbach (2006) advocated individualizing writing by having students build mini-corpora of their own writing, and localising the database. A pedagogic initiative in which students compare a learner corpus of NNS MBA dissertation writing with a corpus of published journal articles from the field of Business Studies, both compiled by the teacher, is that by Hewings and Hewings (2002).

In spite of the potential advantages of integrating learner corpus data into pedagogy, Nesselhauf (2003) points out that care is needed in presenting learner corpus data to students, as does Mukherjee (2009: 213): ‘It is neither desirable nor useful to establish a rigid dichotomy between good and correct usage in native data on the one hand and bad and incorrect usage in learner output on the other’.

BILINGUAL CORPORA

In this short course, we have not touched upon bilingual corpora, but their value to translation and to language teaching and learning should not be underestimated. Both Teubert (2004) and Barlow (2000) emphasise that parallel corpora are especially useful for examining phraseological queries, with Barlow noting that frequency counts provide ‘a very good indication of the preferred structure in each language’. Frankenberg-Garcia (2005) shows the value of using concordancing output from a parallel corpus in preference to a bilingual dictionary, as students can see the different contexts in which a word is used.

CORPORA IN TEACHER EDUCATION

In teacher education programmes which do include a component on the use of corpora in pedagogy, some insightful observations have been made regarding the following three aspects:

Teaching about corpora (technological awareness of what a corpus and concordancing are)

Teaching through corpora (pedagogic awareness from analyzing corpus samples)

Teaching with corpora (linguistic awareness)

TEACHING ABOUT CORPORA

Key notions to be covered here would include the different types of corpora available (spoken, written, multimodal, etc.), corpus design, size and representativeness. Teachers need to know how to choose among different types of corpora for particular queries. Teachers would also be introduced to concordancing, a key analytical tool for corpus queries. Teachers need to know how to formulate different kinds of queries through specifying searches to the left and right of the node word and how to sort the concordance lines alphabetically.
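Sorting concordance lines by the words to the left or right of the node can be sketched as follows. This is an illustration under assumptions rather than a description of any particular concordancer: the function names, the `span` parameter and the sample text are all invented for the example.

```python
def concordance(tokens, node, span=3):
    """Collect (left, node, right) triples with up to `span` tokens per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            hits.append((left, tok, right))
    return hits

def sort_by_right(hits):
    """Sort alphabetically by the words following the node."""
    return sorted(hits, key=lambda h: h[2])

def sort_by_left(hits):
    """Sort alphabetically by the words preceding the node, nearest word first."""
    return sorted(hits, key=lambda h: h[0][::-1])

# Invented, unpunctuated sample text (an assumption, not real corpus data).
text = (
    "a simple solution to the problem a dilute solution of salt "
    "an elegant solution for teachers a saturated solution in water"
)
hits = concordance(text.split(), "solution")

for left, node, right in sort_by_left(hits):
    print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")
```

Sorting to the left groups the pre-modifiers of the node word together, whereas sorting to the right groups the prepositions and complements that follow it.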

Many other researchers point out that teachers’ IT competence, or lack thereof, and preference for more traditional resources are not to be taken lightly and that technological awareness is a key component of developing teachers’ corpus competence.

TEACHING THROUGH CORPORA

A necessary prerequisite for expert teaching is pedagogical content knowledge, consisting of content knowledge (i.e. linguistic knowledge in the case of EFL teaching), pedagogic knowledge, and content-specific teaching knowledge. Pedagogic and content-specific teaching knowledge have both been addressed in corpus-based modules on teacher education programmes.

O’Keeffe and Farr (2003) outline a series of tasks for raising students’ awareness of pedagogic knowledge through analysis of corpus classroom data. It is of interest to note that they combine this aspect with raising teachers’ technological awareness and also content knowledge of discourse analysis by building hints on searching into the instructions, and by asking teachers to analyse the concordance output based on the classroom discourse model. They also point out that the corpus data chosen was from both expert and non-expert teachers to avoid equating inexperience with lack of expertise or vice versa.

Another initiative aims to raise trainee teachers’ awareness through investigation of the function of discourse markers used in classroom teaching.

TEACHING WITH CORPORA

Teaching with corpora to raise teachers’ linguistic awareness was first introduced in teacher education programmes in the mid-1990s, together with training in using corpora. These studies emphasise the benefits of corpus-based enquiries that focus on phraseological patterns or semantic information which may not be found in grammar books and dictionaries.
