Machine Translation

Marathi to UNL

Presented by

Ashwini, Salil

Center for Indian Language Technology Solutions

CSE, IIT Powai

Characteristics of Marathi

a. Syntactic structure

– Subject-object-verb, e.g. राम भात खातो. (rāma bhāta khāto, "Ram eats rice.")

– Similarity with Hindi

b. Morphology

– प्रत्यय (pratyaya, suffixes)

– Differences with Hindi

Main tasks

1. Marathi-UW dictionary building

2. Rulebase building for converting Marathi language phenomena to UNL expressions

3. Testing using corpus sentences

4. Verification with Hindi and Marathi deconverters.

Analysis consists of

• Morphology

• Syntax

• Semantics

• Pragmatics

Marathi analysis done so far

We focus on Marathi morphology

• Noun morphology

• Pronoun morphology

• Verb morphology

• Relation label morphology

• Adjective morphology

Types of adjectives in Marathi

1. Pronominal adjectives

1.1 Pronoun adjectives: the nine pronouns used as adjectives

1.2 Adjectives derived from the nine pronouns

2. Qualitative adjectives

2.1 Adjectives ending with the vowel आ (ā)

2.2 Adjectives ending with vowels other than आ (ā)

2.3 Postposition adjectives

Types of adjectives [contd.]

3. Numerical adjectives

• 3.1 Cardinal

3.1.1 (whole number)

3.1.2 (fractional number)

3.1.3 (entirety, totality, completeness)

• 3.2 Ordinal

• 3.3 Occurrencial

6 types

• 3.4 Distinctive

Does [pAvaNedonashe] mean 175 or 199.75?

- There is no word assigned to 199.75, 299.75, etc.

- The problems with paun, pauvane and savva.

- The word is analysed as (pAvaNedon) times 100 (she), i.e. 1.75 × 100 = 175. she and shambhar both mean 100. pAUNashe means 75. pAvaNeshambhar means 99.75.

- The powers of ten for which there is a distinct word in Marathi need to be stored separately.

- The pronunciation is not pAvaNedona-[pause]-she but pAvaNe-[pause]-donashe.
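
The analysis on this slide, a fractional prefix applied to a base number and the result then scaled by a word for a power of ten, can be sketched as follows. This is only an illustration under stated assumptions, not the project's rulebase: the table names and the function `compound_value` are hypothetical, the keys follow the slide's romanisation, and the reading of savva as "a quarter more than" is ordinary Marathi usage that the slide itself does not spell out.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)

# Hypothetical lookup tables; keys use the slide's romanisation.
PREFIX_OFFSETS = {
    "pAvaNe": -QUARTER,  # "a quarter less than"
    "savva": QUARTER,    # "a quarter more than" (assumption, not on the slide)
}
BASE_NUMBERS = {"don": 2, "shambhar": 100}
# Powers of ten that have their own word or suffix are stored separately.
MULTIPLIER_SUFFIXES = {"she": 100}


def compound_value(prefix, base, suffix=None):
    """Value of prefix+base(+suffix): offset the base first, then scale."""
    value = BASE_NUMBERS[base] + PREFIX_OFFSETS[prefix]
    if suffix is not None:
        value *= MULTIPLIER_SUFFIXES[suffix]
    return value


print(compound_value("pAvaNe", "don", "she"))       # (2 - 1/4) * 100
print(float(compound_value("pAvaNe", "shambhar")))  # 100 - 1/4
```

Under these assumptions pAvaNedonashe evaluates to 175 and pAvaNeshambhar to 99.75, in line with the values given on the slide.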

Tables of numbers: continuous and random access

• Some forms of numbers are used for verbalizing the tables of numbers: सात / साता / साते / साती / सत्ते.

• Marathi: A, B times, (is C), occurring in the table for A. English: B A’s (are C).

• Usage of forms:

1. only for the expression ‘A’

2. only for ‘B times’

3. only while recalling the number directly without going through the table.

• Some forms occur especially for square. The repetition is emphasized.

Words used to familiarise a child with numbers

• Some words are used mostly to familiarise a child with numbers: एकी एक, दुर्की दोन, तिर्की तीन, etc. The similarity of each word with the number is used to help a child remember the number. The words used as familiarisers are: एकी, दुर्की, तिर्की, चौकी, पाची, साही, साती, आठी, नवे, दाही.

Playing cards and the game of cricket

1. Playing cards:

ekka, durri / durra, tirri / tirra, chavvi / chouka, panji / panja, chhakki / chhakka, satti / satta, atthi / attha, navvi / nashsha, dashshi / dashsha.

2. Shots scoring multiple runs in the game of cricket:

चौकार (chaukār, a four), षटकार (ṣhaṭkār, a six).

The current status of the dictionary

Number of entries: 375

• Dictionary

• Nouns

• Noun morphology suffixes

• Verbs

• Verb morphology suffixes

The current status of the rulebase

Number of rules: 1050

• Verb morphology (simple and conjunct verbs)

– Tense (past, present, future)

– Aspect of tense (progress, complete, custom)

– Voice (passive voice)

– अर्थ (artha, mood: imperative, should, negative)

– Ability, intention, etc. (for conjunct verbs only)

The current status of the rulebase [contd.]

• Noun morphology

– Number

– With case marker (सामान्यरूप, sāmānyarūpa, the oblique form)

• Case when the penultimate vowel is either ऊ or ई, e.g. मूल - मुले (plural)
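
The penultimate-vowel case on this slide can be illustrated with a small string rule: when the stem ends in a bare consonant preceded by a long vowel sign (ū or ī), the vowel is shortened before the suffix is attached. The sketch below is an assumption-laden illustration, not the rulebase format used by the EnConverter; the function name `inflect` and the table `SHORTEN` are hypothetical, and independent vowel letters and other stem shapes are not handled.

```python
# Hypothetical sketch: shorten a long penultimate vowel sign before a suffix.
SHORTEN = {
    "\u0942": "\u0941",  # Devanagari vowel sign UU -> vowel sign U
    "\u0940": "\u093f",  # Devanagari vowel sign II -> vowel sign I
}


def inflect(stem, suffix):
    """Attach suffix, shortening a long vowel sign before the final consonant."""
    # Assumes the stem ends in a bare consonant, so stem[-2] is the
    # vowel sign of the preceding syllable (if that syllable has one).
    if len(stem) >= 2 and stem[-2] in SHORTEN:
        stem = stem[:-2] + SHORTEN[stem[-2]] + stem[-1]
    return stem + suffix


mula = "\u092e\u0942\u0932"      # mUla, "child"
plural_e = "\u0947"              # vowel sign E, the plural ending in the slide's example
print(inflect(mula, plural_e))   # the plural form from the slide
```

Stems without a long vowel sign in that position pass through unchanged and simply receive the suffix.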

The current status of the rulebase [contd.]

• Relation labels used so far: agt, obj, gol, aoj, and, or

e.g. मुलांनी आंबे खाल्ले नव्हते. ("The children had not eaten mangoes.")

obj(eat(icl>do).@entry.@pred.@past.@not.@complete, mango(icl>fruit):08.@pl)

agt(eat(icl>do).@entry.@pred.@past.@not.@complete, child(icl>person):00.@pl)
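
To make the shape of these expressions explicit, the following sketch builds the two relations from their parts: a universal word, an optional instance id, and a list of attributes. It is an illustrative data structure only; the class `Node` and the function `relation` are hypothetical names and do not correspond to the EnConverter's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A universal word with optional instance id and attributes."""
    uw: str
    attributes: tuple = ()
    instance_id: str = ""

    def render(self):
        text = self.uw
        if self.instance_id:
            text += ":" + self.instance_id
        # Each attribute is appended in the form .@name
        return text + "".join(".@" + name for name in self.attributes)


def relation(label, head, dependent):
    """Render one binary relation as label(head, dependent)."""
    return f"{label}({head.render()}, {dependent.render()})"


eat = Node("eat(icl>do)", ("entry", "pred", "past", "not", "complete"))
mango = Node("mango(icl>fruit)", ("pl",), "08")
child = Node("child(icl>person)", ("pl",), "00")

for line in (relation("obj", eat, mango), relation("agt", eat, child)):
    print(line)
```

Both relations share the same head node, which is why the verb and its attributes appear identically in the obj and agt lines.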

Plans

• Adjective morphology

• Pronoun morphology

• Relation label handling for corpus sentences (simple sentences only).

THANK YOU
