Factors behind DOM in Nepali
Robert Schikowski
Graduiertenkolloquium Linguistik
8 March 2012
1 Introduction
1.1 Language background
• location: mainly spoken in Nepal
• genealogy: Indo-European > Indo-Aryan
• speakers: 15 mio.+
• status: national language and lingua franca of Nepal; vigorous in all domains; spoken as L1 or L2 by virtually everybody in Nepal
1.2 Overview of morphosyntax
1.2.1 Morphological marking
• high fusion and syntheticity in the verb, 3 categories: person+number of one argument, TMA, polarity; wealth of synthetic tenses
• nominal system much simpler: only category is case marked by postpositions
1.2.2 Expression of syntactic relations
• verbal agreement is usually linked to S/A [1], but there’s also dummy 3SG-AGR with many experiencer verbs and in the impersonal passive
• the case used to mark a role depends on the predicate frame. In addition, each argument has its own specific differential marking pattern. This results in many cases for every role:
◦ S:
  default NOM
  ERG in PRFV of a few verbs
  DAT with experiencers and deontic verbs
◦ A:
  NOM in IPFV
  ERG in PRFV (cf. Li 2007)
  many other factors can trigger ERG, e.g. animacy (op. cit.), deontic semantics (Abadie 1974), A focus (own research)
  → ERG is becoming default
◦ P:
  NOM for “low” referents
  DAT for “high” referents (see below for details)
  other cases such as NOM/LOC, DAT in a couple of smallish verb classes
◦ T:
  basically NOM/DAT as with P, but DAT rare with G-DAT in same clause
  ERG for instrument-like T
◦ G:
  NOM/LOC with motion predicate
  NOM/DAT with instrumental predicate
  DAT on animate recipient
  occasionally NOM/LOC/DAT and others
[1] The role system used here is based on Dowty (1991) and Bickel et al. (2010).
1.2.3 Alignment
= as defined by argument marking/indexing in default verb class:
• agreement: S/A vs rest
• case: completely chaotic because of multitude of splits; e.g. intransitive/transitive alignment:
◦ accusative {S-NOM}, {A-NOM P-DAT} in IPFV + high P
◦ ergative {S-NOM}, {A-ERG P-NOM} in PRFV + low P
◦ tripartite {S-NOM}, {A-ERG P-DAT} in PRFV + high P
◦ neutral {S-NOM}, {A-NOM P-NOM} in IPFV + low P
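The four alignment patterns listed above fall out of two binary choices (aspect and referential status of P). The following is a minimal sketch in Python, assuming only the default verb class and deliberately ignoring all other ERG and DAT triggers mentioned in 1.2.2; the function name `case_alignment` and the labels "high"/"low" as argument values are illustrative choices, not part of the analysis itself.

```python
def case_alignment(aspect: str, p_status: str) -> dict:
    """Case frames and alignment type for the default verb class.

    aspect:   "IPFV" or "PRFV"
    p_status: "high" or "low" (referential status of P)
    Simplification: A is ERG only in PRFV, P is DAT only when "high".
    """
    if aspect not in ("IPFV", "PRFV"):
        raise ValueError("aspect must be 'IPFV' or 'PRFV'")
    if p_status not in ("high", "low"):
        raise ValueError("p_status must be 'high' or 'low'")

    s_case = "NOM"                                  # default S marking
    a_case = "ERG" if aspect == "PRFV" else "NOM"
    p_case = "DAT" if p_status == "high" else "NOM"

    # Alignment is read off by comparing A and P with S.
    if a_case == s_case and p_case == s_case:
        alignment = "neutral"
    elif a_case == s_case:
        alignment = "accusative"
    elif p_case == s_case:
        alignment = "ergative"
    else:
        alignment = "tripartite"

    return {"S": s_case, "A": a_case, "P": p_case, "alignment": alignment}


for aspect in ("IPFV", "PRFV"):
    for status in ("high", "low"):
        print(aspect, status, case_alignment(aspect, status))
```

The sketch only restates the list; it makes visible that the "chaos" in case alignment is the product of two independent splits rather than four unrelated patterns.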
1.2.4 Other remarks on syntax
• word order is “free”, i.e. directly governed by information structure; defaults are SV, APV, AGTV
• right-headed NPs, more than two elements rare, contiguous except with quantifier floating
• pro-drop frequent, especially with objects
2 Differential object marking
Certain arguments can be marked by NOM -Ø or DAT -lai but no other cases:
• all P with A-NOM/ERG (i.e. not P of transitive motion verbs and not P of experiencer verbs with A-DAT)
• all T except instrumental T-ERG; high threshold for T-DAT G-DAT
• G with T-ERG
Only this alternation will be referred to as DOM here. The arguments selected by it will be referred to as “objects” (or “O”). (1) shows an example:
(1) Hijo       dekh-eko      bhale(-lai)  kin-erʌ   a-u!
    yesterday  see-PST.PTCP  cock-DAT     buy-CVB1  come-IMP.2MH
    ‘Buy the cock we saw yesterday!’ (elicitation SR 2011)
The case marker -lai is also used for marking certain G (mostly animate recipients as in (2a), hence the label DAT), many experiencers (2b), S of deontic expressions (2c) and a couple of more marginal functions.
(2) a. Joseph-le   daju-hʌru-lai   ʌnnʌ   di-yo.
       Joseph-ERG  brother-PL-DAT  grain  give-PST.3SG
       ‘Joseph gave grain to his brothers.’ (NNC book-criticism-paschimka-kehi-sahityakar-2062.5034)
    b. Mʌ-lai   bhok    lag-yo.
       1SG-DAT  hunger  be.on-PST.3SG
       ‘I’m hungry.’ (NNC book-essay-hindai-garda-2061.935)
    c. Pʌti-lai     jel   ja-nu    pʌr-yo.
       husband-DAT  jail  go-INF1  fall-PST.3SG
       ‘Her husband had to go to jail.’ (NNC book-fiction-sanghu-tarepachhi-2062.2302)
3 Formal properties of DOM
DOM concerns the case of a single argument and therefore is not expected to interact formally with a lot of other phenomena. There are only two contexts for interaction:
• DOM is generally less likely in the presence of DAT on other arguments. It is impossible on P with A-DAT and very rare on T with G-DAT.
• Many Indo-Aryan languages have a strict rule saying that S/A-AGR must always be linked to a NOM-marked argument. As a consequence, S/A-AGR in perfective tenses (A-ERG!) is linked to O-NOM or dummy 3SG with O-DAT. Nepali is exceptional in requiring S/A-AGR with A even when it is marked by ERG. However, a remnant of the rule can be seen in the passive, where S/A-AGR is by default linked to O-NOM. If O is marked by DAT, AGR must be 3SG.
Here are two examples for T-DAT G-DAT:
(3) a. Yo    tʌrika     goppe   ho.
       PROX  technique  secret  be.NPST.3SG
       Es-lai    kʌs-ʌi-lai    pʌni  sik-au-nu        hu-dʌinʌ.
       PROX-DAT  who-EMPH-DAT  also  learn-CAUS-INF1  be.there-NEG.NPST.3SG
       ‘This technique is secret. It must not be taught to anyone.’ (elicitation SR 2011)
    b. Ram-le   aph-no    nokʌr-lai    Sita-lai  pʌʈha-yo.
       Ram-ERG  REFL-GEN  servant-DAT  Sita-DAT  send-PST.3SG
       ‘Ram sent his own servant to Sita.’ (elicitation SR 2011)
Here is an example of how AGR is linked to P-NOM but not P-DAT in PASS. Also note that DAT is obligatory on the active P in (4a) but optional on the passive P.
(4) a. Iskul-ma    mʌ*(-lai)  kuʈ-chʌn.
       school-LOC  1SG-DAT    beat-NPST.3PL
       ‘In school they beat me.’ (elicitation GP 2010)
    b. Iskul-ma    mʌ(*-lai)  kuʈ-in-chu.
       school-LOC  1SG        beat-PASS-NPST.1SG
       ‘I am beaten in school.’ (elicitation GP 2010)
    c. Iskul-ma    mʌ*(-lai)  kuʈ-in-chʌ.
       school-LOC  1SG-DAT    beat-PASS-NPST.3SG
       ‘I am beaten in school.’ (elicitation GP 2010)
4 Functional properties of DOM
4.1 Various factors
4.1.1 Animacy
Animacy plays a central role in Nepali DOM. Many grammarians (e.g. Gupta & Karmacharya (1981), Matthews (1984), Sommer (1993)) even assume it to be the only relevant factor, which is, however, an oversimplification.
Two strong tendencies can be observed:
• The higher an object referent is on the animacy scale, the more common DAT.
• As one moves down the animacy scale, the commonness of DAT can only stay equal or go down.
For instance, various O in the framing sentence
(5) Ajʌ    rati      mʌi-le   sʌpʌna-ma  ...  dekh-e.
    today  at.night  1SG-ERG  dream-LOC  ...  see-PST.1SG
    ‘Yesterday night I saw ... in a dream.’
were judged by speaker SR (elicitation 2011) as in table 1 [2].
The extreme ends of the commonness scale were never given, which means that there are no cases where either case is completely ungrammatical but also no cases where both are equally possible. As a tendency this holds in all elicitation work I have done on Nepali DOM.
[2] I used a special commonness scale for binary variables where the commonness of one value (DAT) implies that of the other (NOM). The scale is: only possibility/ungrammatical - normal/odd - common/uncommon - more common/less common - equal.
O noun                   commonness of DAT
manche ‘person’          more common
kukur ‘dog’              less common
putʌli ‘butterfly’       uncommon
ɖhuŋga ‘stone’           uncommon
camʌl ‘(polished) rice’  odd
Table 1: Commonness of -lai [DAT] on various nouns
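The second tendency (DAT can only stay equal or become less common when moving down the animacy scale) can be checked mechanically against Table 1. The sketch below is in Python and rests on one assumption that goes beyond the text: the paired labels of the commonness scale are unfolded into a single ordering for DAT, from "ungrammatical" (lowest) to "only possibility" (highest). The names `DAT_SCALE`, `TABLE_1` and `is_non_increasing` are illustrative.

```python
# Assumed unfolding of the commonness scale for DAT, least to most common.
DAT_SCALE = [
    "ungrammatical", "odd", "uncommon", "less common", "equal",
    "more common", "common", "normal", "only possibility",
]
RANK = {label: rank for rank, label in enumerate(DAT_SCALE)}

# Judgements from Table 1, ordered from high to low on the animacy scale.
TABLE_1 = [
    ("manche", "more common"),
    ("kukur", "less common"),
    ("putʌli", "uncommon"),
    ("ɖhuŋga", "uncommon"),
    ("camʌl", "odd"),
]


def is_non_increasing(judgements):
    """True if the commonness of DAT never rises from one row to the next."""
    ranks = [RANK[label] for _, label in judgements]
    return all(upper >= lower for upper, lower in zip(ranks, ranks[1:]))


print(is_non_increasing(TABLE_1))
```

On the five judgements of Table 1 the check succeeds; a single row judged more common than the row above it would make it fail.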
There are two deviations from the standard animacy scale:
• There is a distinction between “high” animals (mammals and birds) and “low” animals (the rest). The motivation for this was that some animals are behaviourally much more similar to human beings than others. It indeed turned out that low animals often pattern with inanimates, as in the example above.
• The distinction between singular object and mass inanimates may appear dubious because it actually belongs to a different variable. However, mass inanimates are much more frequent than mass animates like cattle or people, so there is a strong correlation. In addition, singular object inanimates are conceptually more similar to animates than mass inanimates in one important respect. One of the main characteristics of animate beings is that they can move around freely. Singular object inanimates may not be able to move by themselves, but in contrast to mass inanimates they can be moved easily. This distinction also turned out to be fruitful as the table above shows.
4.1.2 Definiteness and specificity
Factors related to the identifiability of referents are the second-most frequently cited in the literature on Nepali DOM. Interestingly, all works that mention them also acknowledge the importance of animacy (which does not apply the other way round), e.g. Korolev (1965), Abadie (1974), Li (2007).
Whereas definiteness did not have an effect in my own elicitation work, specificity can account for minimal pairs such as the following:
(6) a. Mʌnoj  bida       hu-da    manche  bheʈ-nʌ    ja-nchʌ.
       Manoj  free.time  be-CVB4  person  meet-INF2  go-NPST.3SG
       ‘In his free time, Manoj goes to meet people.’ (elicitation SR 2011)
    b. Mʌnoj  bida       hu-da    manche-lai  bheʈ-nʌ    ja-nchʌ.
       Manoj  free.time  be-CVB4  person-DAT  meet-INF2  go-NPST.3SG
       ‘In his free time, Manoj goes to meet someone.’ (elicitation SR 2011)
The non-specific human referent in (6a) is marked by NOM, its specific counterpart in (6b) by DAT.
As before with animacy, nothing is impossible - specific referents can be marked by NOM as in (7), and non-specific referents can be marked by DAT as in (8):
(7) Kehi  nikasikʌrta-hʌru  phlor  mulyʌ  ghʌʈ-au-ne                mag
    some  exporter-PL       floor  value  decrease-CAUS-NPST.PTCP  request
    gʌr-i-rʌh-ek-a           ch-ʌn.
    do-LNK-CONT-PST.PTCP-PL  be.there-NPST.3p
    ‘Some exporters have been making a request to decrease the floor value (= lowest possible price).’ (NNC a01.17)
(8) Hal-sʌmmʌ-ma           ek   sʌi      ek-jʌna      bhʌnda  bʌɖi  birami-lai
    present.time-TERM-LOC  one  hundred  one-HUM.CLF  COMP    more  ill-DAT
    upʌcar  gʌr-i    ghʌrphʌrkai  sʌk-i-eko             ch-ʌ.
    cure    do-CVB2  homecoming   finish-PASS-PST.PTCP  be.there-NPST.3s
    ‘Until now more than 101 patients have been cured and have returned home.’ (NNC a02.38)
However, when animacy and specificity point in the same direction (high-high or low-low), the outcome is almost certain to be DAT or NOM, respectively. Exceptions can be found when searching long enough, but they are always problematic for some reason. For instance, (9) contains a specific human P on which DAT is optional according to two informants:
(9) Tʌpaı  Netrʌ-ji(-lai)  cin-nuhunchʌ?
    2HH    Netra-HON-DAT   know-NPST.2/3HH
    ‘Do you know Netra?’ (elicitation BP/NP 2012)
The problem here is that Netrʌ-ji has two readings. The usual reading would be the person called Netrʌ, which is why DAT is more common here. On the other hand it can also refer to the appellation itself, which is obviously neither animate nor necessarily specific and may therefore motivate NOM. A more appropriate translation for the variant with NOM would then be ‘Does “Netra-ji” ring a bell with you?’.
Similarly, (10) contains a DAT-marked non-specific referent:
(10) Kʌiyʌu    rʌcʌna-hʌru-lai     mʌi-le   Mʌithili-ma   ʈyun  gʌr-eko      ch-u.
     numerous  composition-PL-DAT  1SG-ERG  Maithili-LOC  tune  do-PST.PTCP  be.there-NPST.1SG
     ‘I’ve done numerous compositions in Maithili.’ (NNC A001013001.671)
However, although the speaker probably wouldn’t be able to enumerate the relevant compositions offhand, one may argue that he is still likely to be familiar enough with these songs as a group.
The addition of the effects of animacy and specificity leaves the bulk of variation to low-high and high-low constellations.
The connection between animacy and identifiability is well-known in the literature on other Indo-Aryan languages with similar DOM patterns (esp. Hindi, e.g. Masica (1982), Mohanan (1994), Kachru (2006)), though the claim that the statuses on the two scales add up is always implicit.
4.1.3 Topicality
A large number of mentions (up to a certain point or in a text as a whole) can motivate DAT. For instance, the short story Ṭēbalamāthiko tyasa ākāśavāṇī by Paraśu Pradhāna starts with the following sentence:
(11) Ʈebʌl-mathi-ko  tes      akasvaɳi-lai  pheri  pʌɖ-yo.
     table-on-GEN    MED.OBL  telegram-DAT  again  read-PST.3SG
     ‘Again he read that telegram on the table.’ (Pradhana 1997:75)
The telegram that is marked by DAT here plays a central role for the plot. This status is mirrored by its frequency [3]:
• The most frequent referent in this story is its protagonist, Kṛṣṇa, with absolute/relative [4] frequencies of 79/0.31.
• Kṛṣṇa is followed by the telegram (21/0.08).
• The third-most frequent referent, a friend of Kṛṣṇa’s, is clearly below that (8/0.03).
[3] Note that I did not only count overt representations but also zeros in argument position.
[4] The relative frequencies were calculated relative to the number of all referent representations.
Topicality is, however, not always as easy to measure as in this example. Frequency in a text ultimately reflects the importance of a referent in the mind of a speaker. (12) is a later sentence from the same story where the telegram is marked by NOM. The reason for this seems to be that it refers back to a time when the protagonist didn’t know yet about the telegram’s content and therefore didn’t ascribe the same importance to it as he does now:
(12) Us-ko    lʌkche-k-a   nimittʌ  saed      tei       akasvaɳi  praptʌ    hu-nu
     3SG-GEN  aim-GEN-OBL  reason   probably  MED.EMPH  telegram  obtained  be-INF1
     awʌsek     thi-yo      rʌ   tyo  pa-erʌ    saed,     u    khusi  ch-ʌ.
     necessary  be-PST.3SG  and  MED  get-CVB1  probably  3SG  happy  be.there-NPST.3SG
     ‘For his aim it was probably necessary that that telegram was obtained, and having received it he is probably happy.’ (Pradhana 1997:76)
Conversely, a referent may be marked by DAT upon its first and only mention if it has been an important topic on the speaker’s mind. Towards the end of the film Yatīko khojīmā by Santoś Ḍhakāl, Mohit says to his beloved Lucy:
(13) Mʌi-le   tim-ro      maya-lai  buj-nʌ           sʌk-inʌ.
     1SG-ERG  2SG.MH-GEN  love-DAT  understand-INF2  be.able-NEG.PST.1SG
     ‘I couldn’t understand your love.’ (Ḍhakāl 2008)
This is the first time and the last time in the movie that this love is mentioned as an independent referent (Lucy dies right after this utterance).
A factor that is closely related to topicality is pronominality. Pronouns are used to represent referents with a high activation status and are thus frequently marked by DAT. Pronouns with human referents and SAP pronouns necessitate DAT. This is independent of their honorific status, i.e. NOM is ungrammatical even with the lowest 2nd person pronoun:
(14) Tʌ*(-lai)  dekh-chʌ.
     2LH-DAT    see-NPST.3SG
     ‘He sees you.’ (elicitation GP 2011)
4.1.4 Possession
One often finds DAT-marked low referents with a high possessor. It thus seems that referential status can “rub off”:
(15) Uni-hʌru-le  me-ro    parthʌkyʌ-lai  buj-nʌ           sʌk-enʌn.
     3SG-PL-ERG   1SG-GEN  secession-DAT  understand-INF2  be.able-NEG.PST.3PL
     ‘They couldn’t understand my secession.’ (NNC book-criticism-samakalin-samalochanako-swarup-2061.3505)
Note that this effect is probably not syntactic since the possessor can be coded in various ways:
(16) Mʌ-sʌ    bhʌ-ek-a        kwaliʈi-hʌru-lai  bistari  bistari  rekʌɖ-ma    lyau-nu    pʌr-chʌ.
     1s-COM1  be-PST.PTCP-PL  quality-PL-DAT    slowly   slowly   record-LOC  take-INF1  must-NPST.3s
     ‘The qualities I have should be taken into a record slowly.’ (NNC A001013001.823)
4.1.5 Focus
For objects whose case marking cannot be explained by the factors discussed so far, focus often helps. There are two kinds of focus at play.
The first type has the following prerequisites:
• a referent R1
• a type T to which R1 belongs
• an attribute A of R1
If then R1 is contrasted with another referent R2 which also is a T but has a different A, R2 may be marked by DAT.
For instance in (17), the present situation of Iraq has been discussed for quite some time; now the future situation is mentioned for the first time and marked by DAT:
(17) Phrans-ko   bidesi   mʌntralʌi-dwara  ajʌ    prʌsarit   bʌktʌbyʌ-ma
     France-GEN  foreign  ministry-by      today  broadcast  statement-LOC
     Irak-ko   pʌchillo  sthiti-lai     dhyan-ma       rakh-erʌ
     Iraq-GEN  later     situation-DAT  attention-LOC  put-CVB1
     Ʌmerika,  Sobhiyʌt  Sʌŋgʌ,  Cin,   Phrans  tʌtha  Belayʌt-ko  bʌiʈhʌk
     America   Soviet    Union   China  France  and    U.K.-GEN    meeting
     ayojʌna      gʌr-i-nu      pʌr-ne          awʌsekta   ʌulya-eko          ch-ʌ.
     arrangement  do-PASS-INF1  must-NPST.PTCP  necessity  indicate-PST.PTCP  be.there-3SG
     ‘In a statement broadcast today it was indicated by the French foreign ministry that it is necessary to consider the future situation of Iraq and arrange a meeting between the U.S., the Soviet Union, China, France, and the U.K. (for that).’ (NNC a01.65)
This is a subtype of standard contrastive focus, where R1 and R2 do not have to share T but can also contrast themselves. Contrast without a shared T does not license DAT in Nepali:
(18) Ɗhoka  h-oinʌ,          jhyal(*-lai)   khol-ʌ!
     door   be-NEG.NPST.3SG  window(*-DAT)  open-IMP.2MH
     ‘Don’t open the door but the window!’ (elicitation GP 2010)
The other kind of focus is more elusive. It is found where a particular unique object referent is crucial for the success of an action or its very possibility.
For instance in (19), there is only one right that can be made use of. If it weren’t for this right it wouldn’t even make sense to talk of using them:
(19) Ʌbʌ  teskarʌn   caı       pa-ko         euɖa     ʌdhikar-lai  hami-le
     now  therefore  SPEC.TOP  get-PST.PTCP  one.CLF  right-DAT    1PL-ERG
     prʌyog  gʌr-nʌ   sʌk-nʌ        pʌr-yo.
     use     do-INF2  be.able-INF2  fall-PST.3SG
     ‘Therefore we have to be able to make use of the one right we’ve got.’ (NNC A001017001.511)
Similarly in (20), the two areas that make up the district Kabhrepʌlancok, Kabhre and Pʌlancok, also gave it its name. If it hadn’t been for precisely these two areas, the A of the predicate sʌmeʈerʌ would not have existed, and therefore the whole action would have been impossible:
(20) Bhʌn-ʌu     duiʈ-ʌi-lai       sʌmeʈ-erʌ           Kabhrepʌlancok
     say-OPT.1p  two.CLF-EMPH-DAT  keep.together-CVB1  Kabhrepalancok
     bhʌ-eko                   h-o.
     come.into.being-PST.PTCP  be-NPST.3SG
     ‘One could say Kabhrepalancok has come into being by keeping precisely these two (areas) together.’ (NNC A001011002.106-112)
4.1.6 Disambiguation
Ambiguous O are preferably marked by DAT. In (21), one speaker saw a cat that stood still. He asked another speaker who had a better view of the scene what had happened and got the answer that the cat was alarmed because it was being watched by a big dog. When the cat occupies the default position for O between A and V as in (21a), DAT is possible. When the cat is fronted to O-A-V as in (21b), however, DAT becomes obligatory:
(21) a. Tyo  thulo  kukur  tel(-lai)  her-dʌi     ch-ʌ.
        MED  big    dog    MED-DAT    watch-PROG  be.there-NPST.3SG
        ‘That big dog is watching it.’ (elicitation NP 2012)
     b. Tel*(-lai)  tyo  thulo  kukur  her-dʌi     ch-ʌ.
        MED-DAT     MED  big    dog    watch-PROG  be.there-NPST.3SG
        ‘It is being watched by that big dog.’ (elicitation NP 2012)
This effect is not very strong and disappears as soon as position or semantics give hints to the role distribution. Still, it is interesting because it is different from all other factors in that it cannot be neatly brought together with the fuzzy idea of “high” referents. Instead, its motivation seems to lie in processing. According to van Gompel & Pickering (2007:289), speakers tend to integrate words as soon as possible into the syntactic structure they have built so far, which is why temporary ambiguities regularly create a delay in processing.
The urge to disambiguate O becomes greater the more material is inserted between it and the predicate. This is because predicates themselves often give important cues to the distribution of roles (compare, for instance, the sets {‘bear’, ‘hunter’, ‘kill’} and {‘bear’, ‘hunter’, ‘shoot’}).
In the presence of an ambiguous N-NOM, the predicate frequently becomes the most important cue itself. The more material is inserted between the argument in question and that cue, the longer the hearer has to wait until the ambiguity is resolved, which explains why O-NOM is even less popular in that case.
(22) shows an example where O is separated from the associated predicate bec- ‘sell’ by a converbial clause. Leaving out DAT is marginally possible but according to informants makes the sentence harder to understand. By contrast, if O is moved next to bec-, both DAT and NOM become equally possible.
(22) Ʌbʌ  aphu-le   ʌŋsʌ   pa-eko        sʌmpʌti-lai   ʌbʌ
     now  REFL-ERG  share  get-PST.PTCP  property-DAT  now
     chorachori-sʌŋgʌ  mʌnjuri     nʌ-li-i        bec-nʌ     pa-e      bhʌnerʌ
     children-COM      permission  NEG-take-CVB2  sell-INF2  get-COND  CIT
     tes-lai      caı       ʌbʌ  durupʌyog  gʌr-nʌ   bhʌ-enʌ,        hʌinʌ.
     MED.OBL-DAT  SPEC.TOP  now  abuse      do-INF2  be-NEG.PST.3SG  QTAG
     ‘Now just because one gets the chance to sell property one holds a share of without taking one’s children’s permission that doesn’t mean one will abuse this right.’ (NNC A001017001.519)
4.2 Some theoretical problems
4.2.1 What does -lai mark?
Ideal of marking:
• 1 marker → 1 function (ease of processing)
• 1 function → 1 marker (ease of production)
-lai as a case marker is a far cry from this ideal:
• It can mark every role (S, A, P, T, G), so it is maximally ambiguous.
• No role always triggers DAT.
→ The question is justified whether -lai is a case marker at all! There are, however, two important arguments for this:
• -lai forms a paradigm with other, less problematic case markers.
• Cases where -lai serves to disambiguate between two roles (esp. A vs P) have a high token frequency (also cf. 4.1.6).
Is there anything that is common to all instances of -lai (role taken aside)?
→ tentative answer: -lai marks referents that would have been “high” enough to be a subject but that ended up as objects or “weak” subjects (experiencer S/A, deontic S/A), or shorter: -lai marks a mismatch between referential status and role.
4.2.2 Functional minimalism vs maximalism
Two extreme options when asking for the function of a marker:
• There is one main function and possibly additional conditioningfactors.
• Everything that conditions a marker is part of its function.
For instance for Nepali -lai:
• function is “mark mismatch between status and role”.
• plethora of functions, e.g. roles, high animacy, specificity, ...
One-function view is elegant, but...
• DAT does not always imply a status-role mismatch, cf. DAT for disambiguation.
• Many status-role mismatches are not marked by DAT, e.g. A lacking control can also have NOM or ERG.
• The notion of weak subjects is shaky: experiencers and deontic A are certainly weakly agentive, but that does not explain why DAT is also common on similar S.
• Status can’t be determined independently but is the product of many factors, e.g. high animacy can be neutralised by very low specificity and vice versa.
→ One-function view does not work well - status-role mismatch may be saved as diachronic principle, but not as synchronic function.
→ -lai is polyfunctional, DAT just a label!
4.2.3 How can -lai be predicted?
If -lai does not have a single function, the question arises how its occurrence can be predicted - this is, after all, one of the main objectives of linguistic description. Two possibilities:
• rule system that finds a given combination of conditioning factors in a list and outputs the associated case (NOM or DAT)
• probabilistic system that uses the given factors to calculate probabilities for each case marker
It’s easy to see that a simplistic rule system (e.g. “inanimate → NOM, animate → DAT”) is inferior to a probabilistic system. But this is not because it gives rules but because it doesn’t use enough input factors. There is the question, thus, whether a complex rule system can compete with a probabilistic system. This question is as yet unanswered, but here are some preliminary ideas:
• Unless organised in the form of a decision tree, a rule system is likely to be less elegant than a probabilistic system.
  But: aesthetics shouldn’t matter in science!
• A rule system has a conceptual problem with two identical constellations producing different outputs.
  But: there is no problem on the practical side; one can simply assume whichever rule covers more cases and then compare the coverage with the probabilistic system.
• It may be less easy to generalise from a rule system that has been generated on the base of a small corpus.
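To make the comparison concrete, here is a minimal sketch in Python of both kinds of system. Everything in it is an assumption for illustration: the toy observations are invented (they are not corpus data), only two factors are used, and the "probabilistic system" is simply the relative frequency of each case per constellation, standing in for whatever statistical model would actually be fitted. The names `fit_rules`, `fit_probabilities` and `coverage` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: ((animacy, specificity), observed case).
DATA = [
    (("high", "high"), "DAT"), (("high", "high"), "DAT"),
    (("high", "low"), "DAT"), (("high", "low"), "NOM"), (("high", "low"), "NOM"),
    (("low", "high"), "NOM"), (("low", "high"), "DAT"), (("low", "high"), "NOM"),
    (("low", "low"), "NOM"), (("low", "low"), "NOM"),
]


def count_cases(data):
    """Count observed cases per constellation of factors."""
    counts = defaultdict(Counter)
    for factors, case in data:
        counts[factors][case] += 1
    return counts


def fit_rules(data):
    """Rule system: each constellation is listed with its majority case."""
    return {f: c.most_common(1)[0][0] for f, c in count_cases(data).items()}


def fit_probabilities(data):
    """Probabilistic stand-in: relative frequency of each case per constellation."""
    return {
        f: {case: n / sum(c.values()) for case, n in c.items()}
        for f, c in count_cases(data).items()
    }


def coverage(rules, data):
    """Share of observations whose case is the one the rule list outputs."""
    hits = sum(rules.get(factors) == case for factors, case in data)
    return hits / len(data)


rules = fit_rules(DATA)
probs = fit_probabilities(DATA)
print(rules)
print(coverage(rules, DATA))
print(probs[("high", "low")])
```

The sketch illustrates the second point above: identical constellations with different outputs are resolved by the majority rule, and the price paid shows up as coverage below 1, which can then be compared with the fit of a probabilistic model.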
4.3 Corpus annotation
To compare the rule-based and the probabilistic approach I am tagging parts of the Nepali National Corpus (NNC) with the help of two native speakers. Presently there are about 30,000 annotated words containing about 2000 objects. Below is a list of the annotated variables together with their values.
• DOM: NOM - DAT - GEN - NA
  NA (“not applicable”) is used for P/T/G that are not eligible for DOM.
• role: S - A - P - T - G - CT - CR
  CT and CR denote the two arguments of copular predicates (CT = copular theme, CR = copular rheme).
• animacy: human - human familiar - human proper noun - human group - high animal - high animal proper noun - middle animal - low animal - inanimate - abstract
• quantification: fixed quantity - open quantity
• situation: concrete - exemplary - general - abstract
  Quantification and situation together replace specificity, for which it was too difficult to reach a satisfying degree of inter-annotator agreement.
• topicality: continuous values; split into several indicators: absolute/relative frequency so far/overall, distance to last mention, number of competing topics
  All topicality-related values are calculated on the base of referential IDs that are assigned to every overt referent and every argument zero.
• part of speech: noun - pronoun - adjective - possessive pronoun - determiner - other
• modifiers: none - adjective - relative clause - genitive human possessor - other human possessor - numeral - demonstrative - interrogative - sortal modifier - several modifiers - other
• focus: none - contrastive - fragile success
  “Fragile success” is a preliminary label for the second kind of focus discussed in 4.1.5.
• alternation: none - passive
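As an illustration of how the topicality indicators can be derived from referential IDs, here is a minimal sketch in Python. It assumes that a text is available as a flat list of referential IDs in order of mention (overt referents and argument zeros alike), that "so far" counts only mentions before the current one, and that distance is measured in mentions; the indicator "number of competing topics" is left out because its definition is not given here. The function and key names are hypothetical.

```python
from collections import Counter


def topicality_indicators(mentions):
    """Compute frequency and distance indicators for every mention.

    mentions: list of referential IDs in text order.
    Returns one dict per mention.
    """
    total = len(mentions)
    overall = Counter(mentions)   # frequency in the text as a whole
    seen = Counter()              # frequency up to the current mention
    last_position = {}            # index of the previous mention per ID
    rows = []
    for position, ref in enumerate(mentions):
        rows.append({
            "ref": ref,
            "abs_so_far": seen[ref],
            # relative to all referent representations seen so far
            "rel_so_far": seen[ref] / position if position else 0.0,
            "abs_overall": overall[ref],
            "rel_overall": overall[ref] / total,
            # None marks a first mention
            "distance": position - last_position[ref] if ref in last_position else None,
        })
        seen[ref] += 1
        last_position[ref] = position
    return rows


# Invented toy sequence of referential IDs, for illustration only.
rows = topicality_indicators(["K", "T", "K", "F", "T", "K"])
for row in rows:
    print(row)
```

Each row can then be joined with the other annotated variables of the object that the mention belongs to.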
References
Abadie, Peggy. 1974. Nepali as an ergative language. Linguistics of the Tibeto-Burman Area 1. 156–177.
Bickel, Balthasar, Manoj Rai, Netra Paudyal, Goma Banjade, Toya Bhatta, Martin Gaenszle, Elena Lieven, Ichchha Rai, Novel Kishor Rai & Sabine Stoll. 2010. The syntax of three-argument verbs in Chintang and Belhare (Southeastern Kiranti). In Andrej Malchukov, Martin Haspelmath & Bernard Comrie (eds.), Studies in ditransitive constructions, Berlin: Mouton de Gruyter.
Ḍhakāl, Santoś (सन्तोष ढकाल). 2008. Yatīko khojīmā (यतीको खोजीमा). Kathmandu: Visual Production Center.
Dowty, David. 1991. Thematic proto-roles and argument selection. Language 67(3). 547–619.
van Gompel, Roger P.G. & Martin J. Pickering. 2007. Syntactic parsing. In M. Gareth Gaskell (ed.), The Oxford Handbook of Psycholinguistics, chap. 17, 289–307. Oxford University Press.
Gupta, Bidhu Bhudan Das & Madhav Lal Karmacharya. 1981. Nepali self-taught. Calcutta: Das Gupta Prakashan.
Kachru, Yamuna. 2006. Hindi. Amsterdam/Philadelphia: John Benjamins.
Korolev, I. (И. Королев). 1965. Jazyk Nepali (Язык Непали). Moscow: Nauka.
Li, Chao. 2007. Split ergativity and split intransitivity in Nepali. Lingua 117(8). 1462–1482.
Masica, Colin. 1982. Identified Object Marking in Hindi and other Languages. In Topics in Hindi Linguistics, vol. 2, 16–51. Chandigarh: Bahri.
Matthews, David. 1984. A course in Nepali. New Delhi: Heritage.
Mohanan, Tara. 1994. Argument Structure in Hindi. Stanford: CSLI.
Pradhāna, Paraśu (परशु प्रधान). 1997. Ṭēbalamāthiko tyasa ākāśavāṇī (टेबलमाथिको त्यस आकाशवाणी). In Michael Hutt (ed.), Modern literary Nepali, 75–79. New Delhi: Oxford University Press.
Sommer, Anton F.W. 1993. Einführung in das Nepali. Wien: Self-published.
Abbreviations
1 1st person, 2 2nd person, 3 3rd person, A agent, CAUS causative, CIT citation marker, CLF classifier, COM comitative, COMP comparative, COND conditional, CONT continuous, CVB1 converb I -erʌ, CVB2 converb II -i, CVB4 converb IV -da, DAT dative, EMPH emphatic, ERG ergative, GEN genitive, G ditransitive goal, HH high honorific, HUM human, IMP imperative, INF1 infinitive I -nu, INF2 infinitive II -nʌ, LNK verbal linker, LOC locative, MED medial, MH middle honorific, NEG negative, LH low honorific, OBL oblique, PASS passive, PL plural, PROG progressive, PROX proximative, PST past, PTCP participle, P patient, QTAG question tag, REFL reflexive, RETRV retriever, SG singular, S intransitive subject, T ditransitive theme, TERM terminative, V verb
[Figure 1 panels, each plotting DOM (DAT vs. NOM) against one tagged variable: animacy; newness in discourse; frequency (quartiles of log of frequency relative to all referents); focus; quantification; situation.]
Figure 1: Visualisation of contingency tables for some tagged variables