partial dependency parsing for irish

39
1 Partial Dependency Parsing for Irish Elaine Uí Dhonnchadha & Josef Van Genabith

Upload: luigi

Post on 25-Feb-2016

57 views

Category:

Documents


1 download

DESCRIPTION

Partial Dependency Parsing for Irish. Elaine Uí Dhonnchadha & Josef Van Genabith. Aims of the Research. To be able to parse and/or chunk unrestricted Irish text To account for as much of the syntactic phenomena of Irish as possible in an efficient and principled way - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Partial Dependency Parsing  for Irish

1

Partial Dependency Parsing for Irish

Elaine Uí Dhonnchadha & Josef Van Genabith

Page 2: Partial Dependency Parsing  for Irish

2

Aims of the Research

To be able to parse and/or chunk unrestricted Irish text

To account for as much of the syntactic phenomena of Irish as possible in an efficient and principled way

To use open-source software a far as possible

Page 3: Partial Dependency Parsing  for Irish

3

Outline of the Talk

Background Stages of Development for Dependency

Parser Chunker Future Work

Page 4: Partial Dependency Parsing  for Irish

4

Irish Language – some facts

Celtic Language Goidelic (Irish, Manx, Scottish Gaelic) Brittonic (Breton, Cornish, Welsh)

Verb – Subject – Object sentence word order Chaith Seán an liathróid.Threw Seán the ball.V S O‘Seán threw the ball’

Fixed word order

Page 5: Partial Dependency Parsing  for Irish

5

Irish Language

Inflectional language gender: fem/masc case: common/genitive/vocative verbs inflected for number and person

chuala mé, I heard (analytic) chualamar, we heard (synthetic)

Initial mutation of words cailín ‘girl’, an chailín ‘the girl’ arán ‘bread’, an t-arán ‘the bread’ seachtain ‘week’, an tseachtain ‘the week’ bord ‘table’, ar an mbord ‘on the table’

Page 6: Partial Dependency Parsing  for Irish

6

Irish Language

Prepositions inflected for person and number. Labhair sé liom faoiSpoke he with-me about-it‘He spoke to me about it’

Tabhair dom éGive to-me it‘Give it to me’

Full paradigm for every preposition liom ‘with-me’leat ‘with-you’leis ‘with-him/it ETC. ETC.

Page 7: Partial Dependency Parsing  for Irish

7

Irish Language Verbal noun - used in progressives, perfects, infinitives,

etc. De-verbal nouns: bris(v) ‘break’, briseadh(vn) ‘breaking De-agentive nouns: feirmeoir(n) ‘farmer’, feirmeoireacht(vn)

‘farming’

Progressive Tá mé ag oscailt an dorais

Is he at opening(vn) the door(gen)‘He is opening the door’

After Perfect Tá mé tar_éis an doras a oscailt

Is me after the door PRT opening(vn) ‘I am after opening the door’

Page 8: Partial Dependency Parsing  for Irish

8

Parsing Methodology

Dependency Analysis & Constituency Analysis Dependency Analysis

Relationships between pairs of words Grammatical Functions and Head-Modifier dependencies Root and terminal nodes

Constituency Analysis Phrase Structure Rules, e.g. S = NP VP Hierarchical structure; root, phrase categories,

leaf/terminal nodes

Page 9: Partial Dependency Parsing  for Irish

9

Dependency Analysis

Issues in the theoretical syntax of Irish on which there is no clear concensus … The non-adjacency of verb and object in a VSO language,

i.e. difficulties with VP Some periphrastic aspectual constructions in Irish, e.g.

progressive aspect has more nominal than verbal characteristics …

Dependency Analysis includes semantic as well as synactic information

Page 10: Partial Dependency Parsing  for Irish

10

Dependency Parsing

A dependency analysis looks at dependencies between pairs of words (which do not have to be adjacent) in a sentence

The tokens present in the input string are annotated without introducing any abstract categories (e.g. phrasal nodes) i.e. dependency analysis consists of a root, and leaf nodes,

without intermediate levels Grammatical functions such as subject, object, predicate,

as well as various types of prepositional phrase, e.g. adverbial, aspectual, predicative, etc. are annotated

Clauses and head-modifier dependencies are identified

Page 11: Partial Dependency Parsing  for Irish

11

Dependency Parsing Surface-oriented, bottom-up parsing Dependency relations between pairs of tokens

Grammatical functions Head-modifier relations

Tokens not necessarily adjacent.

V Det N Det N Bhris an fear a rúitínBroke the man his ankle‘The man broke his ankle’

DO

S

Page 12: Partial Dependency Parsing  for Irish

12

Previous NLP Work Tokenization & Morphological Analysis

Finite-State Morphology: (Karttunen, Beesley, 1999; 2003) Finite-State Morphological Analyser & Generator for Irish: (Uí

Dhonnchadha, 2002)

POS Tagging and Parsing Constraint Grammar (CG): Karlsson et al (1995), Constraint Grammar Parser CG-2 (Tapanainen, 1996), VISL CG3 (Bick et al, 2003 ...) http://visl.sdu.dk

Chunking: Partial Parsing via Finite-State Cascades (Abney, 1996)

Page 13: Partial Dependency Parsing  for Irish

13

Stages of Development

Define the Syntactic Phenomena to include Gather Test Data Decide on Parsing Methodology Decide a Tag-Set for dependency and

grammatical relations Develop Linguistic Rules for dependency

analysis Test the rules Evaluate the results

Page 14: Partial Dependency Parsing  for Irish

14

Syntactic Phenomena

Sources of Information Grammar books Previous research on aspects of Irish Syntax

Simple declarative sentences (incl. neg. and interrogative)

Relative clauses Copular constructions Non-finite complements Adjuncts

Page 15: Partial Dependency Parsing  for Irish

15

Test Data (Gold Standard) Sample Sentences

Short invented grammatical sentences (225) based on grammar books etc.

Automatically POS tagged and manually checked and corrected

Dependency tagged and manually checked and corrected Chunked and manually checked and corrected

Corpus Data Corpus data – 250 real sentences

randomly selected from the 3000 sentence Gold Standard POS Tagged Corpus

Dependency tagged, chunked and manually checked and corrected

Page 16: Partial Dependency Parsing  for Irish

16

Tag Set Grammatical Functions

@SUBJ, @OBJ, @FMV, @FAUX @CLB, etc. Unlabelled depedencies

@>N, @N<, @P<, etc. Start with the @ symbol, by convention, to distinguish them from

morphosyntatic tags “Fuair” faigh +Verb+VTI+PastInd+Len+@FMV

This tagset follows the style of tags described for English (Karlsson, 1995), and for Danish (Bick, 2003),[1]

However, there is not a prescribed list of tags for CG, which allows us to tailor the tagset to the language.

[1] Other languages are also detailed on the VISL website: http://visl.sdu.dk/corpus_linguistics.html

Page 17: Partial Dependency Parsing  for Irish

17

Dependency Tags: Verbs and Copulas@FMV finite main verb rith 'run'

@FMV_SUBJ finite main verb including subject ritheamar 'we ran'

@FMV_REL relative finite main verb a chuala mé, 'that I heard'

@FMV_REL_SUBJ relative finite main verb incl. subject a chualamar, 'that we heard'

@FAUX finite auxiliary verb Tá sé ag cócaireacht 'He is cooking'

@FAUX_SUBJ finite auxiliary verb including subject táimid 'we are'

@FAUX_REL relative finite auxiliary verb atá siad 'that/which they are'

@FAUX_REL_SUBJ relative finite auxiliary verb including subject

atáimid 'that/which we are'

@COP copula Is

@COP_SUBJ copula including subject Seo an fear...'This is the man...'

@COP_WH interrogative copula cé leis an leabhar 'whose is the book'

@INF bare infinitive Ba mhaith liom fanacht 'I would like to stay'

Page 18: Partial Dependency Parsing  for Irish

18

Dependency Tags: Grammatical Relations@SUBJ subject Chonaic Seán Máire, 'Seán saw Máire'

@SUBJ_ASP subject of aspectual phrase bhí sé ag obair 'he was working'

@SUBJ_INF subject of infinitive (intrans) an obair a bheith déanta 'the work to be done'

@SUBJ_OR_OBJ subject or obj. of relative clause a chonaic an bhean, 'that the woman saw' OR 'that saw the woman'

@SUBJ_REL subject of relative clause a rinne sé 'that he made'

@OBJ object Chonaic Seán Máire, 'Seán saw Máire'

@OBJ_ASP object of aspectual ag déanamh oibre, 'doing work'

@OBJ_INF object of infinitive bainne a ól, 'to drink milk'

@PRED predicate Tá sé mór 'It is big'

@NP unlabelled noun head, e.g. list item, apposition, or fragment

1) dathuithe, 2) leasaithigh, '1) colours, 2) additives'

@CC co-ordinating conjunction agus 'and'

@CLB clause boundary e.g. agus ‘and’ when followed by a verb, and subordinating conjs.etc.

Page 19: Partial Dependency Parsing  for Irish

19

Dependency Tags: Head Modifiers (Unlabelled Dependencies)@>ADJ adverbial particle dependent on the adjective to the

rightgo ciúin 'quietly'

@>N pre-modifier dependent on the first noun to the right an 'the'

@>V pre-verbal particle dependent on a verb to the right ní 'not'

@ADVL< adverbial post modifier

@N< noun post-modifier teach mór 'big house'

@P< noun dependent on the preceding prep. ag an doras 'at the door'

@PC< noun dependent on compound preposition is in genitive case

tar éis na Nollag, after Christmas

@PN< pronoun post-mod. é féin 'himself'

@PRED< dependent on predicate Is deas an lá é 'It is a nice day' i.e. Is nice the day it

@ADVL adverbial anocht 'tonight'

@AUG>SUBJ augment pronoun dependent on subj. to the right Is é Seán …, It/He, Seán is…

Page 20: Partial Dependency Parsing  for Irish

20

Dependency Tags: Prepositional Phrases@PP_ADVL head adverbial adjunct ag an doras 'at the door'

@PP_ASP head of an aspectual ag rith '(at) running'

@PP_HAS ‘at X’ meaning ‘X has’ ag Seán, 'Seán has' i.e. at Seán (possession)

@PP_NEG negative gan dul 'without going'

@PP_OBL oblique PP head do Mháire ‘to Máire’

@PP_SUBJ prep + subj pronoun D'éirigh liom, 'I succeeded' i.e. success was with me'

@PP_PRED Predicative Is liom é 'It is mine' i.e. Is with me it (ownership)

@PP_STAT stative ina rí 'is a king' i.e. 'in his king(hood)'

Page 21: Partial Dependency Parsing  for Irish

21

Parsing Methodology: Constraint Grammar Aims (Karlsson et al., 1995)

assign the appropriate morphological and syntactic information according to the context of each token or larger structure in the text;

assign an analysis to every string in the input, bearing in mind that unrestricted text will contain typographical errors, non-sentential fragments, dialectal and colloquial material;

if an ambiguity cannot be resolved, the alternative analyses are retained rather than forcing a (possibly incorrect) choice

Page 22: Partial Dependency Parsing  for Irish

22

Constraint Grammar Principles Differences between CG and other parsing

methodologies (Karlsson, 1995, p37). Unlike a context-free grammar, a Constraint Grammar

does not attempt to define the set of grammatical sentences in a language.

‘... everything is licensed which is not explicitly ruled out’

makes it more robust in handling unrestricted text Does not aim to produce a minimal set of general

rules – a CG grammar can contain many specific lexically-specific rules to handle special cases.

Doesn’t attempt to determine constituency structure.

Page 23: Partial Dependency Parsing  for Irish

23

CG Dependency Rules

MAP (@TAG) TARGET (POS) IF (CONDITIONS);

e.g. MAP (@FMV) TARGET (Verb) IF

(NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART);

SETS LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ;

LIST AUX = ("bí") ("téigh") ("tosaigh") ("tosnaigh") ("féad") ("caith") ("féach");

LIST RELPART = (Vb Rel) (Prep Rel) ;

Page 24: Partial Dependency Parsing  for Irish

24

Order of Implementation of Rules Dependency Analysis is carried out in the following

order: Clause Boundaries Verbs and/or Copulas Preposition Heads All Dependent Modifiers Subject Predicates of Copular Constructions Object(s) Adverbials Other

Page 25: Partial Dependency Parsing  for Irish

25

Example (1)

Fuair faigh+Verb+VT+PastIndsé sé+Pron+Pers+3P+Sg+Masc+Sbjleabhar leabhar+Noun+Masc+Com+Sgins i+Prep+Art+Sgan an+Art+Sg+Defsiopa siopa+Noun+Masc+Com+Sg+DefArt

Fuair sé leabhar ins an siopaGot he book in the shopV Pro N Prep Det N@FMV @SUBJ @OBJ @PP_ADVL @>N @<P ’He got a book in the shop’

Page 26: Partial Dependency Parsing  for Irish

26

Example (1)

Fuair sé leabhar ins an siopa Got he book in the shop V Pro N Prep Det N @FMV @SUBJ @OBJ @PP_ADVL @>N @P< ’He got a book in the shop’

Page 27: Partial Dependency Parsing  for Irish

27

Example (1)

root Fuair sé leabhar ins an siopa Got he book in the shop V Pro N Prep Det N @FMV @SUBJ @OBJ @PP_ADVL @>N @<P ’He got a book in the shop’

Page 28: Partial Dependency Parsing  for Irish

28

Example (2)

Chonaic Máire an fear a bhí ag itheSaw Máire the man that was at eatingV N Det N Rel V Prep VN@FMV @SUBJ @>N @SUBJ_REL @>V @FAUX @PP_ASP @<P ‘Máire saw the man that was eating’

ag ithePrep VN FORM@PP_ASP @<P FUNCTION‘eating’

Page 29: Partial Dependency Parsing  for Irish

29

Development/Test Cycle

CG Mapping Rules

Test against Gold Std.

Dependency Analysis

POS Tagged Text

Page 30: Partial Dependency Parsing  for Irish

30

Evaluation of Dependency Analysis

Sample Sentences: 225 short grammatical sentences

Precision (Test Suite): 97.66%1

1001,2411,212

1100

agsTotalAutoToTagsCorrectAut

Gold Standard Dependency Analysis Corpus 250 sentences randomly selected from the 3,000 sentence Gold

Standard POS Tagged Corpus

Gold Standard Development Set (150 Sentences)Tot

TokensPunct.

TokensTokens Correc

tIncorrec

tF-Score

4403 444 3959 3706 253 93.60Gold Standard Test Set (150 Sentences)

Tot Tokens

Punct. Tokens

Tokens Correct

Incorrect

F-Score

2555 282 2273 2143 130 94.28

Page 31: Partial Dependency Parsing  for Irish

31

Chunking

Using the Dependency Annotations and a Regular Expression Grammar (implemented using Xerox Finite-State Tools[1]) we can identify phrase-like structures, described by Abney (1991) as 'chunks'.

[1] For details see http://www.cis.upenn.edu/~cis639/docs/xfst.html

Page 32: Partial Dependency Parsing  for Irish

32

Implementation Regular expressions and Xerox FST

Chunks [NP .. ] , [V .. ] etc.

PP with embedded NP [PP .. [NP .. ] ]

Conjunction with embedded conjoint [CJ2 .. [?] ]

[NP úlla ] [CJ2 agus [NP oráistí NP] ] ‘apples and oranges’

Aspectual phrases [ASP [PP-ASP .. [NP ..] ] ([OA ..]) ]

[ASP [PP-ASP ag [NP dúnadh ] ] [OA an dorais] ]] ‘closing the door’

Page 33: Partial Dependency Parsing  for Irish

33

Example (3)

"<Tá>""bí" Verb VI PresInd @FAUX Is"<sé>""sé" Pron Pers 3P Sg Masc Sbj @SUBJ he"<ag>""ag" Prep Simp @PP_ASP at"<rith>" "rith" Verbal Noun VTI @P< running

‘He is running’

[S [V Tá bí+Verb+VI+PresInd+@FAUX ] [NP sé sé+Pron+Pers+3P+Sg+Masc+Sbj+@SUBJ NP] [ASP [PP-ASP ag ag+Prep+Simp+@PP_ASP

[NP rith rith+Verbal+Noun+VTI+@P< NP] PP-ASP] ASP]

S]

Page 34: Partial Dependency Parsing  for Irish

34

Regular Expession Chunker ########################################################### # Verb Chunk Dependency Tags ########################################################### define VTag [%@FAUX|%@FAUX%_REL|%@FMV|%@FMV%_REL]; define VSTag [%@FAUX%_SUBJ|%@FAUX%_REL%_SUBJ|

%@FMV%_SUBJ|%@FMV%_REL%_SUBJ]; define PreVTag [%@%>V]; # Verb Pre & Post Modifiers define PreVStr [TokLemMTag PreVTag SP]; # Verb Chunk define VStr [TokLemMTag VTag SP]; define VChunk [PreVStr* VStr]; define VChunkBr [VChunk @-> "[V " ... " ] "]; # Verb_Subject Chunk define VSStr [TokLemMTag VSTag SP]; define VSChunk [PreVStr* VSStr]; define VSChunkBr [VSChunk @-> "[VS " ... " ] "];

Page 35: Partial Dependency Parsing  for Irish

35

Example (4) Tá bí+Verb+VI+PresInd+@FAUXIs

mé mé+Pron+Pers+1P+Sg+@SUBJ_ASP Iag ag+Prep+Simp+@PP_ASP atdéanamh déanamh+Verbal+Noun+VTI+@P< making cáca cáca+Noun+Masc+Gen+Sg+@OBJ_ASPcake. .+Punct+Fin+.

‘I am making a cake’

[S [V Tá bí+Verb+VI+PresInd+@FAUX V] [NP mé mé+Pron+Pers+1P+Sg+@SBJ_ASP NP] [ASP [PP-ASP ag ag+Prep+Simp+@PP_ASP [NP déanamh déanamh+Verbal+Noun+@P< NP] PP-ASP] [OA cáca cáca+Nn+Msc+Gen+Sg+@OBJ_ASP OA]

ASP] . .+Punct+Fin S]

Page 36: Partial Dependency Parsing  for Irish

36

Corpus Data

Ach sin an toradh is measa a fhéadfadh tarlú don pháirtí agus déarfaidís leat nár cóir an iomad airde a thabhairt do na pobalbhreitheanna nach raibh riamh fabhrach do na páirtithe beaga.

'But that is the worst possible result for the party and they would say to you that it is not right to pay too much attention to the opinion polls that were never favourable to small parties.‘

Page 37: Partial Dependency Parsing  for Irish

37

Dependency Analysis [S [CONJ Ach ach+Conj+Subord+@CLB ] [COP Sin sin+Cop+Pro+Dem+@COP_SUBJ ] [NP an an+Art+Sg+Def+@>N toradh

toradh+Noun+Msc+Com+Sg+DefArt+@PRED is is+Part+Sup+@>ADJ measa olc+Adj+Comp+@N< NP] [VP a a+Part+Vb+Rel+Direct+@CLB fhéadfadh

féad+Verb+VTI+Cond+Len+@FAUX_REL ] [INF tarlú tarlú+Verbal+Noun+VTI+@INF

INF] [PP don do+Prep+Art+Sg+@PP_ADVL [NP pháirtí

páirtí+Noun+Masc+Com+Sg+Len+@P< NP] PP]

[CB agus agus+Conj+Coord+@CLB ] [VS déarfaidís

abair+Verb+VTI+Cond+3P+Pl+@FMV+SUBJ] [PP leat le+Pron+Prep+2P+Sg+@PP_ADVL

PP] [COP nár is+Cop+Past+Rel+Neg+@CLB ] [PRED cóir cóir+Adj+Base+@PRED ]

[INF an an+Art+Sg+Def+@>N iomad iomad+Subst+Noun+Sg@OBJ_INF airde aird+Noun+Fem+Gen+Sg+@N< [I a a+Prep+Simp+@PP_INF thabhairt tabhairt+Verbal+Noun+VTI+Len+@P<I] INF] [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N pobalbhreitheanna pobalbhreith+Noun+Fem+Com+Pl+@P< NP]PP][V nach nach+Part+Vb+Neg+Rel+@CLB raibh bí+Verb+PastInd+Neg+Len+@FMV_REL ] [PRED riamh riamh+Adv+Its+@>ADJ ] fabhrach fabhrach+Adj+Base+@PRED ] [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N páirtithe páirtí+Noun+Masc+Com+Pl+DefArt+@P< beaga beag+Adj+Com+NotSlen+Pl+@N< NP] PP] . +Punct+FinS]

Page 38: Partial Dependency Parsing  for Irish

38

Evaluation of Chunker Evalb program used to evaluate bracketing of 250 sens.

150 Development Set SentencesALL SENTENCES SENTENCES Len<40

Number of sentence 150 Number of sentence 120

Bracketing Recall 96.26 Bracketing Recall 97.31

Bracketing Precision 98.15 Bracketing Precision 98.57

Bracketing F-Measure 97.20 Bracketing F-Measure 97.94

ALL SENTENCES SENTENCES Len<40

Number of sentence 100 Number of sentence 85

Bracketing Recall 92.89 Bracketing Recall 94.09Bracketing Precision 94.12 Bracketing Precision 94.09Bracketing FMeasure 93.50 Bracketing FMeasure 94.09

100 Test Set Sentences

Page 39: Partial Dependency Parsing  for Irish

39

Future Work Partial Parsing to date as we have not addressed

Co-ordination He packed his [clothes] and [shoes] [He packed his clothes] and [left]

PP-attachment [He] [stabbed] [the man with the knife] [He] [stabbed] [the man] [with the knife]

PP-function locative vs. stative adjunct v.s. indirect object adding additional info in the FS Lexicons, e.g. noun sub-classes,

subcategorisation frames for verbs

Irish Text Processing Tools: http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm