parsing sanskrit texts: some relation specific issues

Post on 18-Mar-2022

Parsing Sanskrit Texts: Some relation specific issues

Amba Kulkarni (1) and K V Ramakrishnamacharyulu (2)

(1) Department of Sanskrit Studies, University of Hyderabad, Hyderabad
(2) Department of Vyakarana, Rashtriya Sanskrit Vidyapeetha, Tirupati

Sentence Level Analysis

Parsing: a linear string of words -> a structure showing the relations between words.

Structure: Constituency / Dependency

Constituency Structure: English Sentence

For positional languages such as English, the constituency structure more or less leads to the dependency relations.

Figure: Constituency Parse of an English Sentence

Constituency Structure: Sanskrit Compound

In the case of Sanskrit compounds, such a constituency structure is very useful in determining the parse.

Figure: Constituency Structure of a Sanskrit Compound

Structure of a Sanskrit Sentence

Sanskrit:
- Morphologically rich
- Free word order to a large extent

Dependency structure makes more sense than the constituency structure.

Dependency Structure: Sanskrit Sentence

Dependency Parse: rājā viprāya gām dadāti
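A dependency parse of this sentence can be written down as a list of labelled edges. The sketch below assumes the usual kāraka reading (rājā as kartā, viprāya as sampradānam, gām as karma, all depending on the verb dadāti); the `Edge` type and the two helper functions are illustrative names, not part of the slides.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    dependent: str
    relation: str
    head: str


# Assumed analysis: every noun depends directly on the finite verb.
PARSE = [
    Edge("rājā", "kartā", "dadāti"),
    Edge("viprāya", "sampradānam", "dadāti"),
    Edge("gām", "karma", "dadāti"),
]


def children_of(head, edges):
    """Return {relation: dependent} for all edges pointing at `head`."""
    return {e.relation: e.dependent for e in edges if e.head == head}


def root_of(words, edges):
    """The root is the only word that is not a dependent of anything."""
    dependents = {e.dependent for e in edges}
    roots = [w for w in words if w not in dependents]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]
```

Because word order plays no role in the edge list, any permutation of the four words yields the same structure, which is the point of preferring dependencies for a free word order language.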

Sentence Level Analysis

Traditional Approaches:
- Daṇḍānvayaḥ [Anvayamukhī]
- Khaṇḍānvayaḥ [Kathambhūtinī]

Sentence Level Analysis ...

Modern Times:
- Dependency Parsing: credited to Tesnière [1959]

Tagset

Dependency Relations:
- Compilation of all relations: 90 [KVRK, 2009]
- Proposed a tagset

Tagset ...

Is this tagset suitable?
- For Manual Annotation
- For Statistical Parsing
- For Rule Based Parsing

Tagset ...

Concerns related to Manual Annotation / design of a Statistical Parser:
- The inter-annotator agreement
- The grey / fuzzy boundaries between the semantics of tags lead to errors in annotation.

Concerns related to the design of a Rule Based Parser:
- Can the relations be decided purely on the basis of morphological and syntactic information?

Annotation Convention

1 rAmaH kartA,3
2 vanam karma,3
3 gacchati
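The convention reads as one word per row: the word's index, the word, and then the relation label with the index of its head after a comma; the root verb carries no label. A minimal reader for that layout might look like the sketch below. The one-row-per-word layout is inferred from the slide, and `Token` and `parse_annotation` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    index: int
    word: str
    relation: Optional[str]  # None for the root
    head: Optional[int]      # None for the root


def parse_annotation(text):
    """Parse rows of the form 'index word relation,head' (root: 'index word')."""
    tokens = []
    for line in text.strip().splitlines():
        fields = line.split()
        index, word = int(fields[0]), fields[1]
        if len(fields) > 2:
            relation, head = fields[2].rsplit(",", 1)
            tokens.append(Token(index, word, relation, int(head)))
        else:
            tokens.append(Token(index, word, None, None))
    return tokens


SAMPLE = """\
1 rAmaH kartA,3
2 vanam karma,3
3 gacchati
"""
```

Calling `parse_annotation(SAMPLE)` gives three tokens, with both nouns pointing at word 3 (gacchati) as their head.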

Clues for extracting the relations

- Abhihitatva (property of being expressed)
- Vibhakti
- Indeclinables (avyaya)
- Sāmānādhikaraṇya (being in g-n-p agreement)
- Nitya sambandhaḥ

Clues: Abhihitatva

- tiṅ
  - rāmaḥ vanam gacchati. (marks the kartā relation)
  - rāmeṇa vanam gamyate. (marks the karma relation)
- kṛt
  - dhāvan aśvaḥ. (marks the kartā)
- taddhitaḥ
- samāsaḥ
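For the tiṅ case, the two examples amount to a lookup: the voice of the finite verb decides which relation the verbal suffix expresses, and the nominative noun receives that relation. The sketch below covers only the two voices shown on the slide (impersonal bhāve constructions are left out), and the voice names `kartari` / `karmaṇi` and the case labels are assumed to come from a morphological analyser.

```python
# Voice (prayoga) of the finite verb -> relation expressed (abhihita) by tiṅ.
ABHIHITA_BY_TIN = {
    "kartari": "kartā",  # active:  rāmaḥ vanam gacchati
    "karmaṇi": "karma",  # passive: rāmeṇa vanam gamyate
}


def label_nominative(nouns, voice):
    """nouns: list of (word, case) pairs. Returns {word: relation} for nominatives."""
    relation = ABHIHITA_BY_TIN[voice]
    return {word: relation for word, case in nouns if case == "nominative"}
```

The remaining nouns (vanam in the active sentence, rāmeṇa in the passive one) are not expressed by the verb and have to be resolved from their own vibhakti.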

Clues: Vibhakti

- kāraka vibhakti
  - Marks noun-verb relations.
  - e.g. kartā, karma, etc.
- upapada vibhakti
  - Marks noun-noun relations through the upapadas.
  - More of a morphological requirement.
  - E.g. rāmeṇa saha sītā vanam gacchati.
- special vibhakti
  - kriyā-viśeṣaṇam: vegena dhāvati
  - aṅgavikāraḥ: akṣṇā kāṇaḥ
  - nirdhāraṇa: nareṣu śreṣṭhaḥ
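The upapada case can be illustrated with the slide's own example: saha demands an instrumental, so rāmeṇa attaches to saha rather than to the verb. The sketch below is only an illustration under stated assumptions: the table holds the single upapada from the example, case labels are taken as given, and picking the closest noun in the required case is a heuristic introduced here, not a rule from the slides.

```python
# Upapada -> case it demands of the noun it governs (only the slide's example).
UPAPADA_CASE = {"saha": "instrumental"}


def upapada_links(tokens):
    """tokens: list of (word, case-or-None). Returns (noun, upapada) pairs."""
    links = []
    for i, (word, _case) in enumerate(tokens):
        required = UPAPADA_CASE.get(word)
        if required is None:
            continue
        # Assumption: choose the closest noun carrying the required case.
        candidates = [j for j, (_w, c) in enumerate(tokens) if c == required]
        if candidates:
            j = min(candidates, key=lambda k: abs(k - i))
            links.append((tokens[j][0], word))
    return links
```

Once rāmeṇa is consumed by saha in this way, it is no longer a candidate for a kāraka relation such as karaṇam with gacchati.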

Clues: Indeclinables

- negation: rāmaḥ gṛham na gacchati.
- emphasis: rāmaḥ eva tatra upaviśati.

Clues: sāmānādhikaraṇya

- śvetaḥ aśvaḥ dhāvati.
- aśvaḥ śvetaḥ asti.

Clues: Nitya Sambandhaḥ

Relation between yat-tat pair:
- yadā - tadā
- yatra - tatra
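Since each relative word has a fixed correlative, candidate pairs can be found by lookup. The sketch below uses only the two pairs listed here and matches surface forms exactly; inflected forms of yat / tat would need a morphological analyser, which is not modelled. `pair_correlatives` is an illustrative name.

```python
# Relative word -> its fixed correlative (only the pairs named on the slide).
CORRELATIVES = {"yadā": "tadā", "yatra": "tatra"}


def pair_correlatives(words):
    """Return (relative_index, correlative_index) pairs, 0-based."""
    pairs = []
    for i, word in enumerate(words):
        partner = CORRELATIVES.get(word)
        if partner is None:
            continue
        # Take the first occurrence of the correlative in the sentence.
        for j, other in enumerate(words):
            if other == partner:
                pairs.append((i, j))
                break
    return pairs
```

For the sentence yatra nāryaḥ pūjyante ramante tatra devatāḥ, this links word 0 (yatra) with word 4 (tatra).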

Choice of the relations

- Should the inflectional suffixes and derivational suffixes be treated on a par?
- How to treat the function words? Should they be treated as a node in a tree or an edge?
- How to represent the inter-sentential relations?
- Should anaphora resolution be part of this annotation?

Basic Principles

1. Preserve a one-to-one mapping between the nodes of a tree and the words in a sentence.
2. In the case of derived nouns, consider only the inflectional suffix for establishing the relations.
3. In the case of derived indeclinables, use the derivational suffix to mark the relations.
4. A suffix or a word can represent one and only one relation.
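Principles 1 and 4 are mechanical enough to check on an annotated sentence. The sketch below assumes edges given as (dependent index, relation, head index) with 1-based word indices; `check_principles` and the wording of its messages are illustrative only.

```python
def check_principles(n_words, edges):
    """edges: list of (dependent_index, relation, head_index), 1-based.

    Returns a list of human-readable problems; empty if none are found.
    """
    problems = []
    seen = set()
    for dep, _relation, head in edges:
        # Principle 1: every node must correspond to a word of the sentence.
        for idx in (dep, head):
            if not 1 <= idx <= n_words:
                problems.append(f"node {idx} is not a word of the sentence")
        # Principle 4: a word may carry one and only one relation.
        if dep in seen:
            problems.append(f"word {dep} carries more than one relation")
        seen.add(dep)
    return problems
```

Principles 2 and 3 concern which suffix licenses a relation and need morphological analysis, so they are not covered by this check.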

Which relation to mark?

Example: dhāvantam aśvam paśya.

Which relation to mark? Contd...

Example: dhāvantam aśvam paśya.

A loop destroys the nice tree structure of a parse. Hence mark only relations indicated through the inflectional suffixes and not the ones indicated through the derivational suffixes.
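The loop can be made concrete with a small check. The edge lists below are one assumed reading of the example: the inflectional suffixes give aśvam as karma of paśya and dhāvantam as viśeṣaṇam of aśvam, while the kṛt suffix of dhāvantam would add aśvam as kartā of the running. `has_loop` is an illustrative helper.

```python
def has_loop(edges):
    """edges: list of (dependent, relation, head).

    True if following head links from some word leads back to that word.
    """
    heads = {}
    for dependent, _relation, head in edges:
        heads.setdefault(dependent, []).append(head)

    def reaches(start, node, seen):
        for head in heads.get(node, []):
            if head == start:
                return True
            if head not in seen:
                seen.add(head)
                if reaches(start, head, seen):
                    return True
        return False

    return any(reaches(word, word, set()) for word in heads)


# Assumed reading of "dhāvantam aśvam paśya".
INFLECTIONAL = [
    ("aśvam", "karma", "paśya"),
    ("dhāvantam", "viśeṣaṇam", "aśvam"),
]
DERIVATIONAL = [
    ("aśvam", "kartā", "dhāvantam"),  # expressed by the kṛt suffix
]
```

With the inflectional edges alone the structure is a tree; adding the derivational edge makes aśvam and dhāvantam depend on each other.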

Which relation? ... contd

What about the faithfulness?

Information is available through the kṛt suffix, which is available for marking the relation of kartā at a later stage.

Treating indeclinables:

(1) kṛdanta avyayas:
    rāmaḥ dugdham pītvā śālām gacchati.

Treating indeclinables: Content / function?

(2) upapada avyayas:
    sītā rāmeṇa saha vanam gacchati.

Treating indeclinables: Content / function?

(3) Rest of the avyayas:
    eva – avadhāraṇā
    na – niṣedhya

The number of relations explodes.

Inter Sentential Connectives

yadi tvam icchasi tarhi aham bhavataḥ gṛham āgacchāmi.

Treatment of Anaphoras

yatra nāryaḥ pūjyante ramante tatra devatāḥ

Treatment of Conjunction and Disjunction

rāmaḥ sītā lakṣmaṇaḥ ca vanam gacchanti

Granularity

Criterion for Granularity:
If one can tell one relation from the other purely on the basis of syntax or morphology, then the two relations may be treated as distinct.

Tradition classifies kartā into the following subcategories:
- anubhavī kartā. Ex: ghaṭo naśyati
- amūrtaḥ kartā. Ex: krodhaḥ āgacchati
- prayojaka kartā. Ex: devadattaḥ viṣṇumitreṇa pācayati.
- prayojya kartā. Ex: devadattaḥ viṣṇumitreṇa pācayati.
- madhyastha kartā. Ex: devadattaḥ yajñadattena viṣṇumitreṇa pācayati.
- abhipreraka / utpreraka kartā. Ex: modakaḥ rocate.

Granularity ...

- karma-kartṛ. Ex: kāṣṭhaḥ svayameva bhidyate.
- karaṇa-kartṛ. Ex: asiḥ chinatti.
- ṣaṣṭhī kartā. Ex: ācāryasya anuśāsanam

Necessary conditions for
- prayojaka kartā
- prayojya kartā
- ṣaṣṭhī kartā
are
- Morphology
- Syntax

devadattena annam pācayati. (prayojya kartā)
devadattaḥ agninā annam pācayati. (karaṇa)
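The two sentences show why morphology is necessary but not sufficient: both devadattena and agninā are instrumentals with the causative verb pācayati, yet only the first is a prayojya kartā. A rule-based parser can therefore only propose candidates at this stage. The sketch below is limited to the two relations contrasted here; other readings of the instrumental are deliberately left out, and `instrumental_candidates` is a hypothetical helper.

```python
def instrumental_candidates(verb_is_causative):
    """Candidate relations for an instrumental noun (two-way contrast only)."""
    candidates = ["karaṇam"]
    if verb_is_causative:
        # Causative (ṇic) morphology on the verb is a necessary condition
        # for prayojyakartā, but it does not rule out karaṇam.
        candidates.append("prayojyakartā")
    return candidates
```

For pācayati both candidates survive, and choosing between devadattena (an agent who can cook) and agninā (fire as an instrument) needs information beyond morphology and syntax.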

asiḥ chinatti

Here asiḥ is karaṇa-kartṛ of chinn only because it is a karaṇa for chinn.

Semantics is involved in the decision.

31 relations: the necessary information for deciding the possibility involves only morphological and syntactic information.

Set of Relations

- kartā
- prayojyakartā
- prayojakakartā
- karma
- karaṇam
- sampradānam
- apādānam
- ṣaṣṭhīsambandhaḥ
- adhikaraṇam
- sambodhanasūcakam
- sambodhyaḥ
- hetuḥ
- prayojanam
- tādarthya
- niṣedhyaḥ
- kriyāviśeṣaṇam
- viśeṣaṇam
- nirdhāraṇam
- upapadasambandhaḥ
- pratiyogī
- anuyogī
- samuccitam
- anyataraḥ
- kartṛsamānādhikaraṇam
- karmasamānādhikaraṇam
- samānakālaḥ
- anantarakālaḥ
- pūrvakālaḥ
- vīpsā
- śeṣasambandhaḥ
- sambandhaḥ
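For annotation tooling, the tagset can be kept as a closed list against which labels are validated. In the sketch below the segmentation of the relation names is a reading of the slide text, chosen so that the count matches the 31 relations mentioned earlier; `validate_labels` is an illustrative helper.

```python
# The relation names as read from the slide (segmentation is an assumption).
RELATIONS = (
    "kartā", "prayojyakartā", "prayojakakartā", "karma", "karaṇam",
    "sampradānam", "apādānam", "ṣaṣṭhīsambandhaḥ", "adhikaraṇam",
    "sambodhanasūcakam", "sambodhyaḥ", "hetuḥ", "prayojanam", "tādarthya",
    "niṣedhyaḥ", "kriyāviśeṣaṇam", "viśeṣaṇam", "nirdhāraṇam",
    "upapadasambandhaḥ", "pratiyogī", "anuyogī", "samuccitam", "anyataraḥ",
    "kartṛsamānādhikaraṇam", "karmasamānādhikaraṇam", "samānakālaḥ",
    "anantarakālaḥ", "pūrvakālaḥ", "vīpsā", "śeṣasambandhaḥ", "sambandhaḥ",
)


def validate_labels(labels):
    """Return the labels that are not part of the tagset, in input order."""
    known = set(RELATIONS)
    return [label for label in labels if label not in known]
```

A check such as `validate_labels(["kartā", "karta"])` flags the second label, which is the kind of slip that otherwise lowers inter-annotator agreement.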

Towards a more useful parse

- Treat upapadas as function words rather than content words
- Show the co-indexing for anaphora resolution
- Show the sharing of kārakas

DEMO
