

Parallel Syntactic Annotation of Multiple Languages

Owen Rambow, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Florence Reeder, Advaith Siddharthan


Interlingual Annotation of Multi-lingual Text Corpora (IAMTC)

• CMU: Lori Levin, Teruko Mitamura
• Columbia: Owen Rambow, Advaith Siddharthan
• ISI: Eduard Hovy
• MITRE: Keith Miller, Flo Reeder
• New Mexico State University: David Farwell, Stephen Helmreich
• University of Maryland: Bonnie Dorr, Rebecca Green, Nizar Habash


Goals of IAMTC

• Design an Interlingua
  – Language-independent representation of text meaning
  – Useful for MT, IR, IE, QA, …
• Develop an Annotation Methodology
  – Manuals, tools, evaluations
• Annotate multi-lingual, multi-parallel texts
  – Foreign language original and 2 English translations
  – Foreign languages: Arabic, French, Hindi, Japanese, Korean, Spanish


IL Development: Three Levels

• IL0: syntactic dependency tree
• IL1: semantic annotations
  – Concepts:
    • ‘senses’ from ISI’s Omega ontology
    • for Nouns, Verbs, Adjs, Advs
  – Semantic Roles
    • Theta Roles from Dorr’s LCS work
• IL2: reconciliation of different IL1s with same meaning but different syntax:
  – Predicate argument structure
  – Sentence plan: main and embedded clauses


Outline

• Goals of IAMTC
• IL0: A deep syntactic dependency representation
  – How and why it is different from other dependency representations
• Examples:
  – Copula
  – Future tense
  – Causative
  – Light verbs
• Comparison to other work
  – Prague tectogrammatical representation
  – PropBank


Example of IL0

TrEd, Pajas, 1998

Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”


IL0 Design: Reduce cross-linguistic Differences

• Retain content words
• Replace function words with syntactic features
  – Tense, definiteness, etc.

• Retain information about the event and participants

• Neutralize information about the organization of the information or how it is communicated


IL0 Features

• Parts of Speech
  – Verb, noun, proper noun, adjective, adverb, preposition, conjunction, determiner, aux (modal), punctuation, symbols, speech sounds, misc
• Features of Nouns
  – Number, Definiteness
• Features of Verbs
  – Progressive, Perfective, Tense, Mood


Summary of IL0

• No auxiliary verbs
• No determiners
• Add empty arguments
  – I want ___ to go
• “Undo” passives and clefts
• Copular sentences are headed by the predicate
  – The umbrella is red

• Retain causative markers and light verbs only if they affect the argument structure of the sentence or have a literal meaning

• Includes syntactic roles (Subj, Obj, IndObj, Mod)
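
The conventions summarized above can be made concrete with a small data structure. What follows is a minimal sketch, not the project's actual annotation format: the class name `IL0Node`, its fields, and the rendering method are assumptions made for illustration. It encodes "The umbrella is red" with the predicate as head and with no node for the copula or the determiner. The slides do not say where tense is recorded for a copular predicate, so it is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class IL0Node:
    """One node of an IL0-style dependency tree (hypothetical class)."""
    lemma: str
    pos: str
    features: tuple = ()
    children: list = field(default_factory=list)  # list of (role, IL0Node)

    def add(self, role, child):
        """Attach a dependent under a syntactic role such as SUBJ or OBJ."""
        self.children.append((role, child))
        return self

    def render(self, role="", depth=0):
        """Return indented lines in lemma[POS,feature,...] notation."""
        label = f"{self.lemma}[{','.join((self.pos, *self.features))}]"
        lines = ["  " * depth + f"{role} {label}".strip()]
        for child_role, child in self.children:
            lines.extend(child.render(child_role, depth + 1))
        return lines


# Predicate adjective is the head; definiteness and number are features
# on the noun instead of a separate determiner node.
umbrella_is_red = IL0Node("red", "Adj").add(
    "SUBJ", IL0Node("umbrella", "N", ("sg", "def"))
)
print("\n".join(umbrella_is_red.render()))
```

The printed tree has two nodes only, `red[Adj]` and its `SUBJ` dependent `umbrella[N,sg,def]`.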


Annotations done so far

• Annotations of 6 English Texts
• Each translated from a different source language
• Two translations of each text
• 10–12 annotators for each text
• Approximately 144 annotated texts total
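
One reading of these figures, offered as an assumption since the slide does not spell out the calculation, is that each annotator's pass over each translation counts as one annotated text. Under that reading the total of 144 corresponds to the upper end of the annotator range:

```python
# Assumed reading: annotated texts = source texts x translations x annotators.
source_texts = 6            # one per source language
translations_each = 2       # two English translations of each text
annotators_min, annotators_max = 10, 12

english_texts = source_texts * translations_each
low = english_texts * annotators_min
high = english_texts * annotators_max
print(english_texts, low, high)  # 12 120 144
```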


IL0 Annotation Manuals

• English

• Arabic

• French

• Hindi

• Japanese

• Korean

• Spanish


Outline

• Goals of IAMTC
• IL0: A deep syntactic dependency representation
  – How and why it is different from other dependency representations
• Examples:
  – Copula
  – Future tense
  – Causative
  – Light verbs
• Comparison to other work
  – Prague tectogrammatical representation
  – PropBank


Copula

• English: overt copula
  – The umbrella was red.
• Arabic: overt copula in past tense
  – kAnat AlmiZl~apu HamrA’F
• Japanese: optional copula (desu)
  – Kasa wa akai.


IL0 for Copula Sentences


IL1 for Copula Sentence


Future Tense

English: Juan will arrive

Spanish: Llegará Juan
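
The contrast here is that English marks the future with an auxiliary while Spanish marks it on the verb itself. Under the IL0 conventions (no auxiliary verbs, tense as a feature), both can end up with the same kind of verb node. The sketch below illustrates that idea only: the function name, the token dictionaries, the tags `Aux` and `PropN`, and the feature name `fut` are hypothetical, and the Spanish morphological analysis is assumed to be given.

```python
def to_il0_verb(tokens):
    """Return (lemma, features) for the main verb, folding 'will' into a feature."""
    feats = set()
    verb = None
    for tok in tokens:
        if tok["pos"] == "Aux" and tok["lemma"] == "will":
            feats.add("fut")   # the auxiliary becomes a feature, not a node
        elif tok["pos"] == "V":
            verb = tok
    return verb["lemma"], sorted(feats | set(verb.get("feats", ())))


# "Juan will arrive": future carried by a separate auxiliary.
english = [
    {"lemma": "Juan", "pos": "PropN"},
    {"lemma": "will", "pos": "Aux"},
    {"lemma": "arrive", "pos": "V"},
]
# "Llegará Juan": future carried by verb inflection (analysis assumed given).
spanish = [
    {"lemma": "llegar", "pos": "V", "feats": ["fut"]},
    {"lemma": "Juan", "pos": "PropN"},
]

print(to_il0_verb(english))  # ('arrive', ['fut'])
print(to_il0_verb(spanish))  # ('llegar', ['fut'])
```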


Causative Sentences in English, Japanese, and Arabic

• English: main clause and embedded clause
  I made [the cat eat the fish]
• Japanese: productive causative morpheme
  Watashi-wa neko-ni sakana-wo tabe-sase-ta
  I TOP cat DAT fish ACC eat CAUSE-PAST
• Arabic: lexical causatives
  >ak~altu AlqiT~apa Alsamakpa
  Eat-CAUSE cat.DEF.ACC fish.DEF.ACC


IL0 for causative sentences in English, Japanese, and Arabic

Arabic:
>ak~al[V,cause,past]
  SUBJ Empty[N]
  IOBJ cat[N,sg,def]
  OBJ fish[N,sg,def]

English:
Make[V,past]
  SUBJ I[N]
  OBJ eat[V]
    SUBJ cat[N,sg,def]
    OBJ fish[N,sg,def]

Japanese:
sase[V,past]
  SUBJ watashi[N]
  OBJ tabe[V]
    SUBJ neko[N,sg,def]
    OBJ sakana[N,sg,def]

Reduce differences between languages but only to the extent allowed by the syntax, morphology, and lexical items
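
A rough way to see this point is to compare the three trees after discarding the lemmas. The sketch below is illustrative only: the helper names `node` and `shape` are hypothetical, and the attachment of `cat` and `fish` under the embedded verb in the English and Japanese trees is inferred from the example sentences, since the tree layout did not survive in this transcript. With lemmas removed, the English and Japanese trees coincide, while the Arabic lexical causative keeps a different shape.

```python
def node(lemma, pos, feats=(), **deps):
    """Build a tree node as (lemma, pos, feats, {role: child})."""
    return (lemma, pos, tuple(feats), deps)


def shape(tree):
    """Drop lemmas; keep POS, features, and the roles of all dependents."""
    _lemma, pos, feats, deps = tree
    return (pos, feats, tuple(sorted((r, shape(c)) for r, c in deps.items())))


english = node("make", "V", ["past"],
               SUBJ=node("I", "N"),
               OBJ=node("eat", "V",
                        SUBJ=node("cat", "N", ["sg", "def"]),
                        OBJ=node("fish", "N", ["sg", "def"])))
japanese = node("sase", "V", ["past"],
                SUBJ=node("watashi", "N"),
                OBJ=node("tabe", "V",
                         SUBJ=node("neko", "N", ["sg", "def"]),
                         OBJ=node("sakana", "N", ["sg", "def"])))
arabic = node(">ak~al", "V", ["cause", "past"],
              SUBJ=node("Empty", "N"),
              IOBJ=node("cat", "N", ["sg", "def"]),
              OBJ=node("fish", "N", ["sg", "def"]))

print(shape(english) == shape(japanese))  # True
print(shape(english) == shape(arabic))    # False
```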


Hindi Light Verbs

Hum santre kha gaye

We oranges eat went

“We ate oranges”


Hindi Light Verbs

Ram santra kha-kar jayega

Ram orange eat-then go

“Ram will eat the orange and leave”


Outline

• Goals of IAMTC
• IL0: A deep syntactic dependency representation
• Examples:
  – Copula
  – Future tense
  – Causative
  – Light verbs
• Comparison to other work
  – Prague tectogrammatical representation
  – PropBank


Comparison to other work

• Compared to annotation projects
  – IAMTC is an interlingua project
  – IAMTC annotates multi-lingual, multi-parallel texts in order to reconcile differences between languages
• Compared to interlingua design projects
  – IAMTC is a corpus-driven project
  – IAMTC is an annotation project


Comparison to Tectogrammatical Representation

• IL0 has only syntactic relation labels
  – In IL0: all adjuncts are marked “adj”
• IL0 retains strongly governed prepositions
  – give X to Y
• IL0: prepositions are heads
  – But there is some flexibility for each language to decide


Comparison to PropBank

• IAMTC is more syntactic

• Thematic paraphrases: same arguments filling the same roles for the same verb
  – Load hay on truck/load truck with hay
  – Same in PropBank
  – Different in IL0
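
The difference can be pictured by writing each variant down at two levels. This is only a sketch: the role labels (`cargo`, `container`, `PP(on)`, `PP(with)`) are illustrative placeholders, not PropBank's roleset or the IAMTC label inventory. A role-based annotation assigns the same roles to both variants, while a syntactic annotation such as IL0 records which argument is the object and which is introduced by a preposition.

```python
# Role labels below are illustrative placeholders only.
load_hay_on_truck = {
    "syntactic": {"OBJ": "hay", "PP(on)": "truck"},
    "thematic": {"cargo": "hay", "container": "truck"},
}
load_truck_with_hay = {
    "syntactic": {"OBJ": "truck", "PP(with)": "hay"},
    "thematic": {"cargo": "hay", "container": "truck"},
}


def same_at(level, a, b):
    """True if two annotated variants are identical at the given level."""
    return a[level] == b[level]


print(same_at("thematic", load_hay_on_truck, load_truck_with_hay))   # True
print(same_at("syntactic", load_hay_on_truck, load_truck_with_hay))  # False
```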


End


Extra Slides


IL0 Differences Between Languages

• Morphological features on nodes different between languages

• No raising verbs in Arabic, Hindi, Japanese, Korean; raising verbs have no subject
  John seems to like beans
• Serial verbs in Hindi: additional verb with only aspectual meaning (?) treated as dependent on main verb
  hum santre kha gaye
  we oranges eat went
  ‘We ate the oranges’


IL0 Differences Between Languages (2)

• Morphological causatives in Japanese: causative morpheme is head

  私は ( 猫に 魚を 食べ -) - させた
  1sg-TOP (cat-DAT fish-OBJ eat-) -CAUSE-PAST
  I made the cat eat the fish

• Prepositions as heads in all our languages, but probably not others (Czech)


Summary: What is Normalized Where?

• Syntactic variation: IL0
  – The gangster killed at least 3 innocent bystanders
  – At least 3 innocent bystanders were killed by the gangster
• Lexical synonymy: IL1
  – The toddler sobbed, and he attempted to console her
  – The baby wailed, and he tried to comfort her
• Diathesis alternation: IL1 (caveat)
  – The men loaded hay into the trucks
  – The men loaded the trucks with hay
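
For the first item in this list, the active/passive pair, normalization at IL0 amounts to "undoing" the passive so that both sentences receive the same analysis. The following is a minimal sketch under stated assumptions: clauses are flat dictionaries, and the key names (`voice`, `BY`) and the function `undo_passive` are hypothetical, not part of the IAMTC tools. An agentless passive is given an empty subject here, which is also an assumption.

```python
def undo_passive(clause):
    """Return an active-voice copy of a passive clause; copy others unchanged."""
    if clause.get("voice") != "passive":
        return dict(clause)
    active = {k: v for k, v in clause.items()
              if k not in ("voice", "SUBJ", "BY")}
    # The by-phrase agent becomes the subject; "Empty" if there is none.
    active["SUBJ"] = clause.get("BY", "Empty")
    # The surface subject of the passive becomes the object.
    active["OBJ"] = clause["SUBJ"]
    return active


active = {"verb": "kill", "tense": "past",
          "SUBJ": "gangster", "OBJ": "bystanders"}
passive = {"verb": "kill", "tense": "past", "voice": "passive",
           "SUBJ": "bystanders", "BY": "gangster"}

print(undo_passive(passive) == undo_passive(active))  # True
```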


Summary: What is Normalized Where?

• Part-of-speech class and derivational morphology: IL1/2
  – I was surprised that he destroyed the old house
  – I was surprised by his destruction of the old house
• Possession: IL1
  – Dubai’s oil, oil of Dubai
• Clause combination: IL2
  – This is Joe’s new car, which he bought in New York
  – This is Joe’s new car. He bought it in New York


Summary: What is Normalized Where?

• Different argument realizations: IL1/2
  – Bob enjoys playing with his kids
  – Playing with his kids pleases Bob
• Noun-noun compounds: IL2
  – She loves velvet dresses
  – She loves dresses made of velvet
• Head switching: IL2
  – Mike Mussina excels at pitching
  – Mike Mussina pitches well
  – Mike Mussina is a good pitcher


Summary: What is Normalized Where?

• Overlapping meanings: IL2
  – Lindbergh flew across the Atlantic Ocean
  – Lindbergh crossed the Atlantic Ocean by plane
• Locus of Negation: IL2
  – I have not bought any cheese
  – I have bought no cheese


Summary: What is Normalized Where?

• Light verbs: IL2
  – conduct a tightening = tighten
  – witness a growth rate of = grow by
• Direct and indirect discourse: IL2
  – said “X” vs. said that X


Not Normalized at IL0, IL1, nor IL2

• Logical inferences
  – He’s smarter than everybody else
  – He’s the smartest one
• Real-World Inference
  – The tight end caught the ball in the end zone
  – The tight end scored a touchdown
• Different syntactic sentence types, same pragmatic meaning
  – Who composed the Brandenburg Concertos?
  – Tell me who composed the Brandenburg Concertos


Not Normalized at IL0, IL1, nor IL2

• Viewpoint variation
  – The U.S.-led invasion/liberation/occupation of Iraq
  – He is getting in the way vs. He is only trying to help


Differences from other projects

• Eurotra, EuroWordNet, UNL
  – Share the goal of defining an interlingua
  – Don’t share the goal of producing an annotated corpus
• ParGram
  – Grammars for several languages developed in close consultation
  – Based on the assumption of universal grammar
  – Not an annotation project
  – Not corpus-based


Getting at Meaning (Two translations of Korean original text)

Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality.

Starting January 1st of next year customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced.


black: same words, same meaning


green: small syntactic differences


blue: lexical differences


red: not contained in other text


purple: more complex relations