Corpora in grammatical studies Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

Upload: lauren-wiley

Post on 28-Mar-2015


Page 1: Corpora in grammatical studies Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

Corpora in grammatical studies

Corpus Linguistics
Richard Xiao

lancsxiaoz@googlemail.com


Aims of this session
• Lecture
  – Corpus-based grammar: Scope and principles
  – The state of the art of using corpora in grammatical studies
  – Using corpora to improve grammatical descriptions: Infinitival complementation of help
• Lab session
  – Position of if-clauses in ICE-GB


Corpus revolution
• Like lexicographic and lexical studies, grammar is another area which has frequently exploited corpus data
  – A balanced representative corpus provides a reliable basis for quantifying grammatical categories and syntactic features
  – It is also useful in testing hypotheses derived from grammatical theory
• There has been increasing consensus that non-corpus-based grammars can contain biases while corpora can help to improve grammatical descriptions (McEnery & Xiao 2005)
• Corpora have had a strong influence on recently published reference grammar books (at least for English)
  – ‘even people who have never heard of a corpus are using the product of corpus-based investigation’ (Hunston 2002: 96)


Principles of corpus grammar (Leech 2000)
• Data-oriented Grammar
  – allowing the combination of a quantitative and a qualitative description of the data
  – a grammar accountable to observed data of attested language use
• Functional Grammar
  – establishing a relation between phenomena that are external to the language system and system-internal phenomena (form vs. meaning)
  – explaining grammar in terms of the wider context of human psychology and behaviour
• Variety Grammar
  – allowing the description of the full range of varieties (e.g. conversation, fiction writing, news writing, academic writing)
• Integrative Grammar
  – allowing an integrated description of syntactic, lexical, and discourse features
  – close to communicative grammar, as opposed to the ‘autonomous syntax’ view of grammar


A new milestone in English grammar
• Longman Grammar of Spoken and Written English (i.e. LGSWE, Biber et al 1999)
  – A new milestone following Quirk et al (1985) Comprehensive Grammar
  – Based entirely on the 40-million-word Longman Spoken and Written English Corpus
  – Giving “a thorough description of English grammar, which is illustrated throughout with real corpus examples, and which gives equal attention to the ways speakers and writers actually use these linguistic resources” (Biber et al 1999: 45)


Features of corpus-based grammars

• Paying attention to the differences between speech and writing
• Taking account of register/genre variations
• Providing frequency information
• Treating lexis as an integral part of grammatical description
• Giving authentic examples


Some examples of corpus grammars

• Corpus-based English grammars focusing on speech
  – Carter, R. and McCarthy, M. (1997) Exploring Spoken English. Cambridge: Cambridge University Press.
  – McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.


Some examples of corpus grammars

• Corpus-based grammars with a focus on lexis
  – Francis, G., Hunston, S. and Manning, E. (1996) Collins COBUILD Grammar Patterns 1: Verbs. London: HarperCollins.
  – Francis, G., Hunston, S. and Manning, E. (1998) Collins COBUILD Grammar Patterns 2: Nouns and Adjectives. London: HarperCollins.
  – Hunston, S. and Francis, G. (2002) Pattern Grammar. Amsterdam: John Benjamins.


Some examples of corpus grammars

• Corpus-based grammar taking account of register variation
  – Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) Longman Grammar of Spoken and Written English. London: Longman.


A case study

• Using corpora to improve grammatical descriptions
  – Infinitival complementation of HELP


A commonly used word

• In the 100-million-word BNC
  – 245th most frequent word
    • 529 instances per million words
  – 72nd most frequent verb as a lemma


A verb with a distinctive syntax
• English has two main-clause verbs that can control either a full or a bare infinitive: dare and help (Biber et al 1999: 735)
  – The choice between a full and bare infinitive is only available when dare is used as a lexical verb (as a modal verb, always followed by a bare infinitive)
• HELP is the only English verb that can control either a full or bare infinitive AND occur either with or without an intervening NP
  – HELP to V
    • Perhaps the book helped to prevent things from getting even worse.
  – HELP NP to V
    • I thought I could help him to forget.
  – HELP V
    • Savings can help finance other Community projects.
  – HELP NP V
    • We helped him get to his feet and into the chair.
• Dare can occur with or without an intervening NP, but it cannot control a bare infinitive when such an intervening NP is present
  – Ernest <…> dared Archie to punch him in the stomach.
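
The four HELP patterns listed on this slide can be told apart mechanically once a sentence has been part-of-speech tagged. What follows is only a minimal sketch, not the procedure used in the study: the simplified tags (DET, NOUN, PRON, AUX, VERB, TO), the hand-tagged input and the function name classify_help are all assumptions made for illustration, and intervening adverbials are not handled.

```python
# Hypothetical, simplified tags; a real corpus would use its own tagset.
HELP_FORMS = {"help", "helps", "helped", "helping"}
NP_TAGS = {"DET", "ADJ", "NOUN", "PRON"}  # tags allowed inside an intervening NP


def classify_help(tagged):
    """Return 'HELP to V', 'HELP NP to V', 'HELP V', 'HELP NP V' or None.

    `tagged` is a list of (word, tag) pairs for one sentence.
    """
    for i, (word, tag) in enumerate(tagged):
        if word.lower() in HELP_FORMS and tag == "VERB":
            break
    else:
        return None  # no verbal form of HELP in this sentence

    j = i + 1
    # Skip over an intervening noun phrase, if there is one.
    while j < len(tagged) and tagged[j][1] in NP_TAGS:
        j += 1
    has_np = j > i + 1
    if j >= len(tagged):
        return None  # HELP is not followed by an infinitive

    next_tag = tagged[j][1]
    if next_tag == "TO":
        return "HELP NP to V" if has_np else "HELP to V"
    if next_tag == "VERB":
        return "HELP NP V" if has_np else "HELP V"
    return None


# Hand-tagged fragments of the four slide examples (tags are assumptions).
EXAMPLES = [
    [("the", "DET"), ("book", "NOUN"), ("helped", "VERB"), ("to", "TO"), ("prevent", "VERB")],
    [("I", "PRON"), ("could", "AUX"), ("help", "VERB"), ("him", "PRON"), ("to", "TO"), ("forget", "VERB")],
    [("Savings", "NOUN"), ("can", "AUX"), ("help", "VERB"), ("finance", "VERB")],
    [("We", "PRON"), ("helped", "VERB"), ("him", "PRON"), ("get", "VERB")],
]

for sentence in EXAMPLES:
    print(" ".join(word for word, _ in sentence), "->", classify_help(sentence))
```

A tagger would normally supply the tags; they are written out by hand here so that the sketch stays self-contained.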


A unique verb of great interest

• A verb that has often been given prominence in textbooks, grammars and dictionaries
  – E.g. Chalker (1984); Murphy (1985); Quirk et al (1972, 1985); Eastwood (1992); Biber et al (1999)
• A verb that has aroused much interest and debate
  – Language variety
  – Language change
  – Register variation
  – Semantic distinction
  – Syntactic conditions


The corpora


Language variety: AmE vs. BrE
• Bare infinitives are much more common in AmE (cf. Biber et al 1999)
  – 80% (AmE) vs. 52% (BrE)
  – LL=23 (1 df), p<0.001
• British preference for full infinitives
  – You’re going to help me make to make a birthday cake for Jim remember. (BNC)
• A construction of American provenance, which has penetrated rapidly into BrE
  – Zandvoort (1966): ‘except in American English, however, to help usually takes an infinitive with to’
    • No longer valid


Language change: 1961-1991
• Changing labels for bare infinitives
  – (OED, 1933) “vulgar” -> (Vallins 1951) “not seriously questioned now…” -> (Mair 1995) “lost the informal ring”
• An increase in the proportions of bare infinitives over the three decades in both AmE and BrE
  – AmE: 68% -> 82% (+14 percentage points)
    • LL=10.6 (1 df), p=0.001
  – BrE: 22% -> 60% (+38 percentage points)
    • LL=47.5 (1 df), p<0.001
• A greater shift towards the use of bare infinitives in BrE because AmE was already more “tolerant” of bare infinitives in the 1960s
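
The LL figures quoted with 1 df on these slides are log-likelihood (G2) statistics of the kind computed from a 2x2 table of bare vs. full infinitives in two corpora. The slides give percentages but not the raw counts, so the counts below are invented placeholders and the output is not meant to reproduce any reported value; the function name log_likelihood_2x2 is likewise an assumption. A minimal sketch:

```python
import math


def log_likelihood_2x2(a, b, c, d):
    """G2 statistic for the 2x2 table [[a, b], [c, d]].

    Rows are two corpora; columns are bare vs. full infinitives.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    g2 = 0.0
    for r in range(2):
        for col in range(2):
            obs = observed[r][col]
            expected = row_totals[r] * col_totals[col] / total
            if obs > 0:  # a zero cell contributes nothing to the sum
                g2 += obs * math.log(obs / expected)
    return 2 * g2


# Placeholder counts only (not from the study):
# 82 bare / 18 full in one corpus, 60 bare / 40 full in the other.
print(round(log_likelihood_2x2(82, 18, 60, 40), 2))
```

With 1 df, a value above 3.84 corresponds to p<0.05 and a value above 10.83 to p<0.001.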


Spoken vs. written
• Bare infinitives are slightly more frequent in speech than in writing, in both AmE and BrE
• The differences are not statistically significant
  – AmE: LL=2.71 (1 df), p=0.10
  – BrE: LL=2.16 (1 df), p=0.142
• No predictable distribution pattern for bare infinitives in 15 written genres
  – Common in some formal genres (e.g. official documents) but infrequent in other formal genres (e.g. academic writing)
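
The p-values on this slide can be checked from the LL scores alone. For 1 degree of freedom the upper tail of the chi-square distribution has the closed form p = erfc(sqrt(LL / 2)), so the standard library is enough. This is a sketch for checking the arithmetic, with the helper name p_value_1df chosen here for illustration:

```python
import math


def p_value_1df(ll):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(ll / 2))


# LL scores as reported on the slide.
for label, ll in [("AmE", 2.71), ("BrE", 2.16)]:
    print(f"{label}: LL={ll}, p={p_value_1df(ll):.3f}")
```

Both results fall above the conventional 0.05 threshold, which is why the spoken/written differences are reported as not statistically significant.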


Semantic distinction

• The debate has a long history
• Some “pre-corpus” arguments
  – Wood (1962: 107-8): to ‘can be omitted only when the helper does some of the work, or shares in the activity jointly with the person that is helped’
  – Wood’s “unacceptable” examples
    • These tablets will help you sleep.
      – But tablets do not sleep
    • Writing out a poem will help you learn it.
      – But writing does no learning
  – According to Quirk et al (1972: 841), the choice ‘is conditioned by the subject’s involvement’
    • With a bare infinitive, ‘external help is called in’
    • With a full infinitive, ‘assistance is outside the action proper’


Semantic distinction
  – Dixon (1991)
    • John helped Mary eat the pudding
      – John ate part of the pudding as Mary did
    • John helped Mary to eat the pudding
      – John fed the pudding to Mary
  – Duffley (1992)
    • A bare infinitive evokes helping as ‘direct or active involvement’
    • … help to V evokes help as a condition which enables the person being helped to realize the event
  – Lu (1996: 813)
    • When the subject of ‘help’ does not take part in the helping activity, the infinitive must take to
      – The book helped me to see the truth.
  – What do your intuitions tell you?


Semantic distinction
• Not reported in more recent corpus-based works (e.g. Longman 1993/1996; Collins 1995; Biber et al 1999)
  – Quirk et al (1985) dropped the argument for semantic distinction
  – Collins CoBuild Dictionary
    • “If you help someone, you make it easier for them to do something, for example by doing part of the work for them or by giving them advice or money.”
• It is not always easy, or even possible, to determine whether the helper actually takes part in the helping activity
• Counter examples are abundant in corpora
  – I help people stop smoking. (FLOB)
  – oh it says if you have a dose last thing at night it helps you sleep. (BNC)


Syntactic condition: Intervening NP

• The previous claim (Lind 1983; Kjellmer 1985; Biber et al 1999) that an intervening NP increases the proportion of bare infinitives is only partly supported by our corpora
  – Only valid in AmE, both written and spoken
  – Unpredictable results, no statistical significance in BrE


Syntactic condition: Intervening adverbial

[Figure: chart of the frequency (0-16) of bare-inf vs. full-inf after an intervening adverbial in Brown, Frown, CPSA, LOB, FLOB and BNCS]

• Lind (1983) claims that ‘an intervening adverbial will preclude omission of to’
  – The whisky helped me not to stagger under this blow.
• This claim is ungrounded, esp. in AmE (CPSA)
• Some counter examples
  – So, to help people not jump all over it as soon as they see it <…> (CPSA)
  – <…> that would even help perhaps focus some of those responses. (CPSA)
  – Mr. Clinton <…> also helped, to a much lesser degree, organize a huge march in Washington <…> (Frown)
  – ...helping dramatically reduce poverty. (Time Magazine 2005/12/05)
  – Now my daughter...is helping digitally restore the Disney films her grandfather worked on. (Time Magazine 2006/04/10)


Syntactic condition: to preceding help

• To preceding help is a decisive syntactic condition that encourages the omission of to (cf. Lind 1983; Kjellmer 1985; Biber et al 1999)
  – HELP (lemma): 60%
  – help (finite verb): 65%
  – to help (infinitive): 88% (+23 percentage points)
• Consecutive repetition of to tends to be avoided on the grounds of euphony (cf. Lind 1983)
  – They took on an estate manager and wine-maker to help run the business. (FLOB)
• A statistical norm, not a categorical distinction

[Figure: chart of the proportions (0%-100%) of bare-inf vs. full-inf for HELP, help and to help]

In the BNC, to help V (2,161) is 17 times as frequent as to help to V (127)
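
The "17 times" figure follows directly from the two BNC counts on this slide. A short sketch of the arithmetic (the variable names are chosen here for illustration):

```python
bare = 2161  # "to help V" in the BNC, as given on the slide
full = 127   # "to help to V" in the BNC, as given on the slide

ratio = bare / full                # how many times more frequent the bare form is
bare_share = bare / (bare + full)  # proportion of bare infinitives after "to help"

print(f"ratio = {ratio:.1f}, bare share = {bare_share:.1%}")
```

The non-zero count for to help to V illustrates the point that avoiding repeated to is a statistical norm, not a categorical rule.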


Syntactic condition: Passive voice

• Palmer (1965: 169) observes that ‘passive occurs <…> only with to: They were helped to do it.’

• All 9 instances of passivized HELP in our corpora take a full infinitive, without exception

• No instance of BE helped V is found in the whole BNC or the 100-million-word Time corpus of AmE

• Explanation (?): An analogy can be drawn between HELP and verbs such as MAKE, LET, SEE and HEAR: oC = bare infinitive
  – The infinitive shifts from oC to sC in passive transformation
    • So they should be made to bring their prices down. (BNC)
      – So the authorities should make them (*to) bring their prices down.
    • Pupils should be helped to investigate topics on their own. (BNC)
      – Teachers should help pupils (to) investigate topics on their own.


Case study: A summary
• The choice of a full or bare infinitive following HELP is conditioned by a wide range of factors, including language variety, language change and various syntactic conditions

• Non-corpus-based grammars are likely to contain biased descriptions that do not accord with attested language use


Adverbial clauses: Position vs. semantic types

Greenbaum and Nelson (1995)


Exploring if-clauses in ICE-GB
• ICE-GB
  – One million words
  – 500 samples (300 spoken + 200 written)
  – Parsed corpus
• Position of if-clauses
  – Clause-initial position
    • If it’s a really nice day we could walk.
  – Clause-final position
    • We could walk if it’s a really nice day.
• Reference
  – Nelson, G., Wallis, S. and Aarts, B. (2002) Exploring Natural Language: Working with the British Component of ICE. Amsterdam: John Benjamins


ICECUP

+ Expand to see text categories


Fuzzy Tree Fragment (FTF)

Press "Insert after" twice


“Edit Node” menu


Editing 1st node


Editing 2nd node


Editing 3rd node


Specifying word


Complete nodes with specified word

clause (main)

Adverbial clause introduced by the subordinator “if”


Specifying position (initial)

Click on "First: Yes" for initial position; white linking line disappears

Finally press "Start"
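
The query built in these steps looks for a main clause that contains an adverbial clause introduced by "if", with that clause constrained to the first (or last) position. The same logic can be sketched on a toy tree. This is not ICECUP's matching algorithm or ICE-GB's annotation format: the dictionary layout, the labels ("adverbial", "clause") and the function names are all assumptions made for illustration.

```python
def first_word(node):
    """Return the first word under a node (depth-first, left to right)."""
    if "word" in node:
        return node["word"]
    for child in node.get("children", []):
        word = first_word(child)
        if word is not None:
            return word
    return None


def if_clause_position(clause):
    """Classify an if-clause inside `clause` as 'initial', 'final', 'medial' or None."""
    children = clause.get("children", [])
    for index, child in enumerate(children):
        is_if_clause = (
            child.get("function") == "adverbial"
            and child.get("category") == "clause"
            and (first_word(child) or "").lower() == "if"
        )
        if is_if_clause:
            if index == 0:
                return "initial"
            if index == len(children) - 1:
                return "final"
            return "medial"
    return None


# Toy trees for the two example sentences (structure is hypothetical).
IF_CLAUSE = {
    "function": "adverbial",
    "category": "clause",
    "children": [{"word": "if"}, {"word": "it's"}, {"word": "a really nice day"}],
}
SUBJECT = {"function": "subject", "word": "we"}
VERB = {"function": "verb", "word": "could walk"}

initial_tree = {"category": "clause", "children": [IF_CLAUSE, SUBJECT, VERB]}
final_tree = {"category": "clause", "children": [SUBJECT, VERB, IF_CLAUSE]}

print(if_clause_position(initial_tree))
print(if_clause_position(final_tree))
```

Counting the "initial" and "final" results over every parsed unit in a corpus would give the kind of position frequencies that the lab session obtains from the search results.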


Results for initial position


Example of parse tree

Parsing unit


Specifying position (final)


Results for final position


Example of parse tree


Frequencies of initial / final positions

• Initial position appears to be the “unmarked” position for if-clauses
  – Initial position (886, 61.4%)
  – Final position (556, 38.6%)
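
The percentages on this slide are each position's share of all if-clauses retrieved. A brief sketch that recomputes them from the two counts (the dictionary layout is an illustrative choice):

```python
counts = {"initial": 886, "final": 556}  # frequencies as given on the slide

total = sum(counts.values())
shares = {position: n / total for position, n in counts.items()}

for position, share in shares.items():
    print(f"{position}: {counts[position]} of {total} ({share:.1%})")
```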


Written registers

Greenbaum and Nelson's (1995) observation about conditional clauses (64.8% initial and 35.32% final) only applies to written registers


Spoken registers

In the spoken data as a whole, the final position is preferred, though there is considerable internal variation. The more "formal" spoken registers (parliamentary debates, legal presentations and non-broadcast (scripted) speeches) show a marked preference for the initial position.


ICE-GB: Ditransitive verbs