Verb compounds within canonical typology: Chinese separable verb compounds (Anna Siewierska, Jiajin Xu, Richard Xiao)

Post on 27-Dec-2015

Verb compounds within canonical typology:

Chinese separable verb compounds

Anna Siewierska

Jiajin Xu

Richard Xiao

Overview of the talk

1. Separable verb compounds (SVCs)

2. Canonical typological strategy

3. A case study of SVCs in Mandarin

Separable verb compounds

• Some languages have verb compounds made up of two parts: a verbal stem and a movable element that stands before or after the verb, either adjacent to it or in close proximity
– Different terms in the literature
• separable verb compounds, split words, separable verbs, ionised words, discontinuous / detachable / breakable / discrete words, etc.

An example of Chinese SVC

• dan1xin1, lit. carry heart, ‘to worry’

• dan1-le yi1 shang4wu3 xin1, carry ASP one morning heart, ‘to be worried the whole morning’

• xin1 yi4zhi2 dan1-zhe, heart all the time carry ASP, ‘to have been worried all the time’

Sound similar?

• Derivation by infixing (e.g. abso-fucking-lutely) and syntactic interposing (e.g. of bloody course) in English

• Separable complex verbs in Dutch (aankomen ‘arrive’) and German (ankommen ‘arrive’)

• But Chinese SVCs are …

…essentially different

• 1) Insertions in English infixing and interposing
– Almost exclusively restricted to expletives, euphemisms, and amplifiers
– Acting as an ‘emotive intensifier’

• In contrast, discontinuous use of Chinese SVCs has a greater variety of insertions and discourse / pragmatic functions
– Insertions as head / tail satellites: aspect markers, resultative verb complements (RVCs), quantifiers, classifiers, modifiers, etc.
– Providing extra information
– Acting as a mitigator / softener
– Showing casualness
– Expressing negative emotions such as disapproval
– Enhancing rhythm – important in a syllable-timed language like Chinese

…essentially different

• 2) A significant difference between SVCs in Mandarin and the split prefix phenomenon in Dutch (e.g. binnenkomen, ‘to come in’) and German (e.g. abfahren, ‘to drive off/depart’)

• Chinese SVCs are not words with a separable affix
– E.g. dan1xin1 ‘worry’: V + O (dan1 ‘carry’ + xin1 ‘heart’)

…essentially different

• 3) SVCs in Dutch and German can have a wide range of constituents of all types as insertions, including complex NGs and subordinate clauses as in the example below
– A Dutch example of opbellen ‘ring up’
• Ik bel op ‘I ring up’
• Ik bel hem op ‘I ring him up’
• Ik bel hem morgen op ‘I ring him tomorrow up’
• Ik bel de man waarvan ik houd op ‘I ring the man that I love up’

• ...which is completely impossible in Chinese

Why are SVCs interesting?

• 1) SVCs are a large class of verbs in Chinese which cannot be marginalised

• 2) They satisfy none of the ‘universal criteria’ for wordhood (Dixon and Aikhenvald 2002: 19-20)
– ‘A grammatical word consists of a number of grammatical elements which (a) always occur together, rather than scattered through the clause (the criterion of cohesiveness); (b) occur in a fixed order; (c) have a conventionalised coherence and meaning’
• Criterion (c) means that speakers of the language ‘may talk about a word (but are unlikely to talk about a morpheme)’

Why are SVCs interesting?

• 3) SVCs violate one of the most fundamental principles of the theory of word formation
– The Principle of Lexical Integrity: Word-internal structures are not accessible to rules of syntax (Booij 1990: 45)

• 4) SVCs are listed as words, but they clearly have some ‘phrasal’ properties, thus straddling the boundary of morphology and syntax
– E.g. the analysable internal structures of Chinese SVCs

Canonical typology

• To study such fuzzy and cross-border grammatical categories, canonical typology (CT) has proved to be a useful strategy (cf. Bond 2007; Corbett 2007; Nikolaeva 2008), e.g.
– Suppletive forms
– Agreement
– Negation
– Syncretism
– …

Standard strategy in typological research (Croft 2003: 14)

1. Determine the particular structure or situation type of interest

2. Examine the morpho-syntactic construction(s) or strategies used to encode that situation type

3. Search for dependencies between the constructions used for that situation and other linguistic factors

– i.e. other structural features and external functions expressed by the structure, or both

Canonical typological approach

1. Start with a linguistic phenomenon

2. Establish a general definition for identifying that linguistic category

3. Construct a set of features or criteria for the typical (canonical) case of the category

4. Use the criteria to investigate the relevant categories in languages

How can corpora inform CT?

• In CT, the features are usually collected from the literature
– The collection could be selective, subjective and arbitrary

• Can the selection of features be more objective and reliable?
– We seek to answer this question from the corpus linguistic perspective
– The corpus-based approach makes it possible for variational parameters of SVCs to be summarised exhaustively and more objectively by looking at a large amount of attested language use simultaneously

A case study of Chinese SVCs

• What are common types of insertions and external patterns of discontinuous use of SVCs in Mandarin?

• How can canonical features be identified on the basis of frequency?

• How can the study of SVCs in Chinese contribute to the research of similar phenomena in other languages?

Prevalence of SVCs in Mandarin

• The 2002 edition of the Modern Chinese Dictionary includes 3,236 types of SVCs (Zhu 2006: 29)
– Four categories: verb-object (97%), verb-complement, subject-predicate, and coordinative

• Given their prevalence, no grammar of Chinese can turn a blind eye to the ‘verb-object paradox’ (Packard 2003: 108)

Corpora

• Two corpora are used in this study
– The Lancaster Corpus of Mandarin Chinese (LCMC) for written Chinese
– The Lancaster Los Angeles Corpus of Spoken Chinese (LLSCC) for spoken Chinese

• The LCMC is a balanced corpus of written Chinese composed of one million words proportionally sampled from fifteen genres, ranging from news and fiction to academic prose, published in mainland China around 1991 (see McEnery, Xiao & Mo 2003)

Corpora

• The LLSCC comprises one million words of dialogues (55%) and monologues (45%) in Chinese, covering both spontaneous (57%) and scripted (43%) speech in six spoken genres

• The two corpora are also tokenised and POS-tagged

• They provide an empirical basis for our quantitative and qualitative analysis of SVCs in Chinese

Seed SVCs for data extraction

• A total of 1,738 commonly used SVCs listed in A Dictionary of Split Word Usage in Modern Chinese (Yang 1995) were used as seeds to automatically and exhaustively extract all possible instances of SVCs whose head and tail are separated, in either the forward or the backward direction, by a span of 1-10 words
– 2,793 raw concordance lines were extracted from the two corpora
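The extraction step can be sketched as follows. This is a minimal illustration, not the tool used in the study: it assumes each seed is stored as a (head, tail) pair and that the corpus is a list of tokens in which head, inserted material and tail are separate tokens. The names `Candidate` and `find_candidates` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    head: str
    tail: str
    direction: str              # "forward": head ... tail; "backward": tail ... head
    insertion: Tuple[str, ...]  # tokens found between the two parts


def find_candidates(tokens: Sequence[str],
                    seeds: Sequence[Tuple[str, str]],
                    min_gap: int = 1,
                    max_gap: int = 10) -> List[Candidate]:
    """Return every place where a seed's head and tail are min_gap..max_gap tokens apart."""
    found = []
    for head, tail in seeds:
        for first, second, direction in ((head, tail, "forward"),
                                         (tail, head, "backward")):
            for i, token in enumerate(tokens):
                if token != first:
                    continue
                # j - i - 1 is the number of tokens between the two parts.
                start = i + 1 + min_gap
                stop = min(i + 1 + max_gap, len(tokens) - 1)
                for j in range(start, stop + 1):
                    if tokens[j] == second:
                        found.append(Candidate(head, tail, direction,
                                               tuple(tokens[i + 1:j])))
    return found


# Toy input based on the dan1xin1 'to worry' examples; the segmentation is assumed.
seeds = [("dan1", "xin1")]
forward = find_candidates(["dan1", "le", "yi1", "shang4wu3", "xin1"], seeds)
backward = find_candidates(["xin1", "yi4zhi2", "dan1", "zhe"], seeds)
```

A purely positional search like this over-generates, because any co-occurrence of the two parts within the span counts as a candidate. That is why the raw concordance lines need manual checking.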

Human evaluation and annotation

• Each concordance line was evaluated independently by two native Chinese speakers in order to remove noise in automatically extracted results

• Only 565 true instances of discontinuous use of SVCs were retained for further annotation and analysis
– Annotated for: type of insertion, direction of separation, word semantics, sentence semantics (i.e. pragmatic meaning), sentence type, genre
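The filtering step and the resulting retention rate can be sketched as below. The slides do not say how disagreements between the two evaluators were resolved, so keeping only lines accepted by both is an assumption, and `keep_agreed` is a hypothetical helper. The two counts are the ones reported in the talk.

```python
def keep_agreed(lines, judge_a, judge_b):
    """Keep only the lines that both evaluators marked as true discontinuous SVC uses.

    Assumption: a line is retained only when both judgements are True.
    """
    return [line for line, a, b in zip(lines, judge_a, judge_b) if a and b]


raw_lines = 2793   # raw concordance lines extracted automatically
retained = 565     # true instances kept after human evaluation
retention_rate = retained / raw_lines

print(f"{retention_rate:.1%} of the raw lines are true instances")
```

Roughly one raw line in five survives manual checking.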

Syntagmatic pattern of SVCs

SVCH + NEG + ASP/RVC + MC + CL + MOD + SVCT
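A minimal sketch of how the linear pattern above could be checked mechanically. It assumes that every slot between SVCH and SVCT is optional, that ASP and RVC are alternatives for a single slot, and that an annotated instance is given as a list of slot labels. The function name `fits_pattern` and the label strings are illustrative and not part of the study.

```python
import re

# Slots in the fixed order of the pattern; each intermediate slot may be absent.
PATTERN = re.compile(r"SVCH( NEG)?( (?:ASP|RVC))?( MC)?( CL)?( MOD)? SVCT")


def fits_pattern(labels):
    """Return True if the label sequence respects the slot order of the pattern."""
    return PATTERN.fullmatch(" ".join(labels)) is not None


in_order = fits_pattern(["SVCH", "ASP", "MC", "SVCT"])      # slots in pattern order
out_of_order = fits_pattern(["SVCH", "CL", "ASP", "SVCT"])  # CL before ASP
```

Because all intermediate slots are treated as optional, the contiguous form (SVCH directly followed by SVCT) also fits.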

Head satellites of SVCs

• Aspect insertion

Pattern                  SVC types (%)   SVC tokens (%)
SVCH-le SVCT             42 (25%)        74 (13%)
SVCH-guo SVCT            15 (9%)         22 (4%)
SVCH-zhe SVCT            12 (7%)         35 (6%)
Total                    69 (42%)        131 (23%)

• Expanded aspect insertion

Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) ASP (?) SVCT    91 (55%)        244 (43%)

– Note: The ? slot can be filled or left blank

Head satellites of SVCs

• RVC insertion

Pattern                  SVC types (%)   SVC tokens (%)
SVCH RVC SVCT            20 (12%)        26 (5%)

• Expanded RVC insertion

Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) RVC (?) SVCT    20 (12%)        66 (12%)

– …hardly surprising given that RVCs can be analysed as markers of the “completive aspect” in Chinese (Xiao and McEnery 2004)

Tail satellites of SVCs

• Classifier (CL)
– 21% (116 SVCs) contain a classifier
• Nominals in Mandarin are typically preceded by a classifier

• Quantifier (MC)
– 19% (108 SVCs) contain a quantifying construction

• Modifier (MOD), i.e. pre-modifiers of tails
– Possessive pronouns (64 times, 11%)
– Adjectival modifiers (63 times, 11%)
– Nominal items (59 times, 10%)
– Question word (i.e. shen2me ‘what’, 26 times, 5%)
– Also combinations of these elements
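A quick arithmetic check of the token percentages reported for head and tail satellites. It assumes they are all computed over the 565 retained instances; the slides do not state the denominator explicitly. The dictionary keys are shorthand labels chosen for this sketch.

```python
TOTAL_TOKENS = 565  # retained instances of discontinuous use

# label: (reported count, reported percentage)
reported = {
    "SVCH-le SVCT": (74, 13),
    "SVCH-guo SVCT": (22, 4),
    "SVCH-zhe SVCT": (35, 6),
    "aspect total": (131, 23),
    "expanded aspect": (244, 43),
    "RVC": (26, 5),
    "expanded RVC": (66, 12),
    "classifier": (116, 21),
    "quantifier": (108, 19),
    "possessive pronoun": (64, 11),
    "adjectival modifier": (63, 11),
    "nominal item": (59, 10),
    "question word": (26, 5),
}


def percent(count, total=TOTAL_TOKENS):
    """Share of the total, rounded to a whole percentage."""
    return round(100 * count / total)


# Collect every row whose recomputed percentage differs from the reported one.
mismatches = {label: (count, pct, percent(count))
              for label, (count, pct) in reported.items()
              if percent(count) != pct}
aspect_sum = 74 + 22 + 35  # should equal the reported aspect total of 131
```

Under that assumption every reported token percentage is reproduced, and the three aspect rows sum to the reported total.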

SVC network: Lexical and grammatical patterning

[Figure: network linking SVCH and SVCT through the insertion slots. Node labels: LE, DA, GE, HAO, YI. Slot labels: aspect marker (±), resultative verb complement (±), quantifier, classifier, modifier]

Words or phrases?

• Synchronically, located somewhere on the continuum between words and phrases (cf. Guo and Qian 2004)
– Continuum: words, SVCs, idioms, phrases

• Diachronically, wordhood subject to language change
– Many compound words in current use have evolved from phrases (e.g. daoqian ‘apologise’, jugong ‘bow’)
– Givón (1971): ‘Today's morphology is yesterday's syntax.’

• Two criteria, depending on the type and number of morpheme(s) in the insertion
– Over half of the discontinuous uses of SVCs in our data (i.e. 54% if RVCs are seen as quasi-aspect markers), together with their combined cognates, can be analysed as legitimate compound words

Two overarching criteria

• Structural criteria
– Host dependency
• Head dependence enjoys priority over tail dependence

• Phonological criteria
– PrWd restriction (Feng 2001, 2002)
• A disyllabic unit is the typical prosodic foot in Chinese
• A trisyllabic unit can also be a prosodic word

Structural criteria

• According to the host dependency criteria of the canonical typological approach
– a) SVCs with a clitic-like aspect marker alone are compounds rather than phrases
– b) SVCs with an RVC attached to the head verb are quasi-compounds
– c) SVCs with other insertions (classifiers, modifiers, etc.) attached to the tail (typically an object or complement) are the least likely to be compounds

• Priority: a > b > c

Phonological criteria

• Various manifestations of SVCs define a continuum of phonological conditions which complement the structural criteria
– a) The combined uses of head and tail are disyllabic compounds
– b) SVCs in which the head and tail are separated by a single morpheme are possible compounds under the Trisyllabic Foot Rule (TFR) of prosodic morphology (McCarthy & Prince 1993; 1995)
– c) SVCs in which the head and tail are separated by polymorphemic insertions such as quantifiers, adjectival modifiers, etc. are phrases

• Priority: a > b > c
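The two sets of criteria can be read as a pair of simple decision rules. The sketch below is one possible encoding, not the authors' procedure. It assumes an insertion is given as a list of (morpheme, category) pairs with the categories ASP, RVC, MC, CL and MOD. It also assumes that an insertion mixing an RVC with an aspect marker counts as case (b), which the slides do not specify. The function names are hypothetical.

```python
def structural_class(insertion):
    """Apply the structural (host dependency) criteria a > b > c."""
    categories = [category for _, category in insertion]
    if not categories:
        return "compound"               # head and tail are contiguous
    if categories == ["ASP"]:
        return "compound"               # (a) aspect marker alone
    if "RVC" in categories and set(categories) <= {"ASP", "RVC"}:
        return "quasi-compound"         # (b) RVC on the head (mixing with ASP is assumed)
    return "least likely compound"      # (c) material attached to the tail


def phonological_class(insertion):
    """Apply the phonological criteria a > b > c by counting inserted morphemes."""
    if len(insertion) == 0:
        return "disyllabic compound"    # (a) combined use of head and tail
    if len(insertion) == 1:
        return "possible compound"      # (b) single morpheme, Trisyllabic Foot Rule
    return "phrase"                     # (c) polymorphemic insertion


aspect_only = [("le", "ASP")]
long_insertion = [("le", "ASP"), ("yi1", "MC"), ("shang4wu3", "MOD")]
```

Keeping the two rules separate mirrors the slides, where the phonological conditions complement the structural ones rather than replace them.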

Conclusions

• We have used the corpus-based approach to generalise canonical internal structures of Chinese SVCs

• The structural and phonological criteria we have proposed work well to define wordhood of SVCs in Mandarin

• The approach combining canonical typology and corpus methodology could also be useful in research of similar phenomena in other languages

Thank you!
