an fst grammar for verb chain transfer in a spanish-basque mt systemixa.si.ehu.es › sites ›...

1
Spanish-Basque MT system: Traditional transfer model Based on shallow and dependency parsing Integrated in OpenTrad initiative: MT engines for translation among main languages in Spain Open, reusable and interoperable framework Government-funded and shared among different universities and companies An FST grammar for verb chain transfer in a Spanish-Basque MT System IÒaki Alegria, Arantza DÌaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Aingeru Mayor and Kepa Sarasola Structural transfer of verb chains Complex due to the high distance between both languages Corpus coverage: 92% non-finite forms (21%), indicative forms (65%), periphrases (6%) Context Task FST grammar Using XRCE Finite State Tools Inplementation Verb chain transfer Spanish verb chain Finite verbs Morphological information of the nodes of the Spanish verb chain Basque form corresponding to the Spanish main verb of the chain, and information about its transitivity Agreement information about the objects (absolutive and dative) and the type of subordination of the sentence The list of the nodes of the Basque verb chain, each one with the information necessary to decide the order of the words (between parenthesis) carry out the morphological generation (between brackets) Spanish verb type identification and Basque schema adding rules Attribute replacement rules Cleaning rules haber[vaif1s]+tener[vmpp]+que[cs]+comer[vmn] // jan [trans] // [obj3p] [caus] =>P1> (main)Aspm / Per Aspp / Dum Aspd / Aux TenseM Erg Abs + RelM ez ditudalako patatak jan behar izango Input FST Grammar Output [ esVerbChainType @-> ... "=>" euVerbChainSchema ] [ "euAttr" @-> "euVal" || ?* esVals ?* "=>" ?* euVals ?* _ ] A rule identifies the input as a Spanish periphrastic verb chain of type 1 and adds the schema for the Basque verb for this type [ esVerbChainTypePerif1 @->... "=>" euVerbChainSchemaP1 ] An example meaning, voice, mood, aspect, tense, person and number meaning Add one of the possible Basque verb chain schema depending on the type of the Spanish verb chain: non-finite, non-periphrastic verbs and four types of periphrastic verbs Replace attributes in the schema with their corresponding values, depending on the values of some attributes in the Spanish verb chain and/or in the Basque schema Remove the unnecessary information haber[vaif1s] + tener[vmpp] + que[cs] + comer[vmn] // jan [trans] // [o3p] [caus] jan(main)[partPerf]/ behar(per)[partPerf]/ izan(dum)[partFut]/ edun(aux)[indPres][s1s][o3p]+lako[causMorph] Output Input 6 rules 80 rules 2 rules The last rules eliminate the information of the input f como Simple or compound tense main verb (finite) habrÈ terminado de comer optional auxiliary (finite) periphrastic verb (non-finite) optional particle main verb (non-finite) optional auxiliary voice, mood, aspect, tense, person and number Periphrases Rules porque no habrÈ tenido que comer patatas (because I won't have had to eat potatoes) ixa ixa IXA Research Group on NLP http://ixa.si.ehu.es Faculty of Computer Scienc University of the Basque Country Basque verb chain jaten f ditut Analytical or synthetic (a single word) auxiliary (finite) optional dummy auxiliary main verb (non-finite) meaning, aspect and tense bazkaltzen amaitu izango dut auxiliary (finite) opt. dummy auxiliary (non- finite) main verb (non-finite) periphrastic verb (non-finite) meaning (or a modal particle or an adverb) aspect FST [ìPerî @-> ìbehar(per)î || ?* ìtenerî ?* ìqueî ?* "=>" ìP1î ?* _ ] [ìAsppî @-> ì[partPerf]î || ?* VAIF ?* "=>" ìP1î ?* _ ] [ìAuxî @-> ìedun(aux)î || ?* "=>" ?* ìbehar(per)î ?* _ ] [ìErgî @-> î[s1s]î || ?* ì1sî ?* ì=>î ?* ìedun(aux)î ?* _ ] ... haber[vaif1s]+tener[vmpp]+que[cs]+comer[vmn] // jan [trans] // [obj3p] [caus] =>P1> (main)[partPerf]/ behar(per)[partPerf]/ izan(dum)[partFut]/ edun(aux)[indPres][s1s][o3p]+lako[causMorph] Other rules replace one by one the attributes of the Basque verb schema ergative, absolutive and dative agreement; tense and mood ergative, absolutive and dative agreement; tense and mood d o Yo habrÈ terminado de comer Nik bazkaltzen amaitu izango dut I will have finished eating Yo como manzanas Nik sagarrak jaten ditut I eat apples

Upload: others

Post on 25-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An FST grammar for verb chain transfer in a Spanish-Basque MT Systemixa.si.ehu.es › sites › default › files › dokumentuak › 3715 › ... · 2016-08-18 · An FST grammar

Spanish-Basque MT system: Traditional transfer model Based on shallow and dependency parsing

Integrated in OpenTrad initiative: MT engines for translation among main languages in Spain Open, reusable and interoperable framework Government-funded and shared among different universities and companies

An FST grammar for verb chain transfer in a Spanish-Basque MT SystemIÒaki Alegria, Arantza DÌaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Aingeru Mayor and Kepa Sarasola

Structural transfer of verb chainsComplex due to the high distance between both languagesCorpus coverage: 92%

non-finite forms (21%), indicative forms (65%), periphrases (6%)

Context Task

FST grammar Using XRCE Finite State Tools

Inplementation

Verb chain transfer Spanish verb chain

Finite verbs

Morphological information of the nodes of the Spanish verb chain Basque form corresponding to the Spanish main verb of the chain,

and information about its transitivity Agreement information about the objects (absolutive and dative)

and the type of subordination of the sentence

The list of the nodes of the Basque verb chain, each one with the information necessary to

decide the order of the words (between parenthesis) carry out the morphological generation (between brackets)

Spanish verb type identification and Basque schema adding rules

Attribute replacement rules

Cleaning rules

haber[vaif1s]+tener[vmpp]+que[cs]+comer[vmn] // jan [trans] // [obj3p] [caus]=>P1> (main)Aspm / Per Aspp / Dum Aspd / Aux TenseM Erg Abs + RelM

ez ditudalako patatak jan behar izango

InputFST Grammar

Output

[ esVerbChainType @-> ... "=>" euVerbChainSchema ]

[ "euAttr" @-> "euVal" || ?* esVals ?* "=>" ?* euVals ?* _ ]

A rule identifies the input as a Spanishperiphrastic verb chain of type 1and adds the schema for the Basque verb for this type

[ esVerbChainTypePerif1 @->... "=>" euVerbChainSchemaP1 ]

An example

meaning, voice, mood, aspect, tense,person and number

meaning

Add one of the possible Basque verb chain schemadepending on the type of the Spanish verb chain:non-finite, non-periphrastic verbs and four types of periphrastic verbs

Replace attributes in the schema with their correspondingvalues, depending on the values of some attributesin the Spanish verb chain and/or in the Basque schema

Remove the unnecessary information

haber[vaif1s] + tener[vmpp] + que[cs] + comer[vmn] // jan [trans] // [o3p] [caus]

jan(main)[partPerf]/ behar(per)[partPerf]/ izan(dum)[partFut]/ edun(aux)[indPres][s1s][o3p]+lako[causMorph]Output

Input

6 rules

80 rules

2 rules

The last rules eliminate the information of the input

f como

Simple or compound tensemain verb

(finite)

habrÈ terminado de comer

optionalauxiliary

(finite)

periphrasticverb

(non-finite)optionalparticle

mainverb

(non-finite)

optionalauxiliary

voice, mood,aspect, tense,

person and number

Periphrases

Rules

porque no habrÈ tenido que comer patatas (because I won't have had to eat potatoes)

ixaixaIXA Research Group on NLP

http://ixa.si.ehu.esFaculty of Computer SciencUniversity of the Basque Country

Basque verb chain

jaten f ditut

Analytical or synthetic (a single word)auxiliary

(finite)optional

dummy auxiliarymain verb(non-finite)

meaning, aspectand tense

bazkaltzen amaitu izango dut

auxiliary(finite)

opt. dummyauxiliary

(non- finite)

mainverb

(non-finite)

periphrasticverb

(non-finite)

meaning (or a modal particleor an adverb)

aspect

FST

[ìPerî @-> ìbehar(per)î || ?* ìtenerî ?* ìqueî ?* "=>" ìP1î ?* _ ][ìAsppî @-> ì[partPerf]î || ?* VAIF ?* "=>" ìP1î ?* _ ][ìAuxî @-> ìedun(aux)î || ?* "=>" ?* ìbehar(per)î ?* _ ][ìErgî @-> î[s1s]î || ?* ì1sî ?* ì=>î ?* ìedun(aux)î ?* _ ]...

haber[vaif1s]+tener[vmpp]+que[cs]+comer[vmn] // jan [trans] // [obj3p] [caus]=>P1> (main)[partPerf]/ behar(per)[partPerf]/ izan(dum)[partFut]/ edun(aux)[indPres][s1s][o3p]+lako[causMorph]

Other rules replace one by onethe attributes of theBasque verb schema

ergative, absolutiveand dative agreement;

tense and mood

ergative, absolutiveand dative agreement;

tense and mood

habrÈ terminado de comerhabrÈ terminado de comerhabrÈ terminado de comer

mainverb

(non-finite)

voice, mood,aspect, tense,

person and number

bazkaltzen amaitu izango dutbazkaltzen amaitu izango dut(or a modal particle

or an adverb)

bazkaltzen amaitu izango dut

opt. dummyauxiliary

(non- finite)

(or a modal particleor an adverb)

bazkaltzen amaitu izango dut

opt. dummyauxiliary

(non- finite)

jaten f ditut

Analytical or synthetic (a single word)optional

dummy auxiliaryjaten f ditut

Analytical or synthetic (a single word)

f como

Simple or compound tense

Yo habrÈ terminado de comerNik bazkaltzen amaitu izango dut

I will have finished eating

Yo como manzanasNik sagarrak jaten ditut

I eat apples