Annotation for Hindi PropBank

Upload: lambert-burns

Post on 04-Jan-2016


Page 1: Annotation for Hindi PropBank. Outline Introduction to the project Basic linguistic concepts – Verb & Argument – Making information explicit – Null arguments

Annotation for Hindi PropBank


Outline

• Introduction to the project

• Basic linguistic concepts
  – Verb & Argument
  – Making information explicit
  – Null arguments

• Tasks to be carried out

• Tools for annotation

• Timesheets, tips

• Practice


Creation of Resources

• For machines rather than humans

• Imagine a dictionary/thesaurus for computers

• A requirement for Natural Language Processing
  – Large annotated resources

• Annotation implies addition of linguistic information

• Tailored to language-specific requirements

• Needs to be as consistent as possible
  – Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation


Hindi-Urdu Treebank Project

• One of the first efforts to make a large-scale resource for Hindi-Urdu

• Similar resources exist for Chinese, Arabic and English

• Three main components
  – Hindi-Urdu dependency treebank
  – Hindi-Urdu PropBank
  – Hindi-Urdu phrase structure treebank [derived]


PropBank

• PropBank resource creation at CU Boulder

• We annotate semantic information on top of syntactic information

• PropBank involves annotation of predicate argument structure
  – Mainly concerned with verbs & their arguments
  – And the semantic nature of the arguments


What are verbs?

• Verbs are predicating elements, e.g. daud, pii, baras etc.

• Encode (very broadly) actions and states

• Also have two kinds of grammatical information
  – Tense, aspect (present, future; perfect, continuous)
  – Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)


What are arguments?

• In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya:
  – A verb, ‘eat’ or ‘khaa’: PREDICATE
  – A person eating, ‘Raam’: ARGUMENT
  – Thing eaten, ‘apple’ / ‘seb’: ARGUMENT

• Without arguments, the meaning of the verb ‘ate’ is not realized completely

• Together, they make up the predicate argument structure of the sentence


Arguments show what’s important

• Raam ne jaldi se seb khaaya
  – Raam, seb are arguments
  – But ‘jaldi se’ is not

• It’s all about the verb
  – It projects its need for certain arguments
  – Sift what’s mandatory from what’s optional


Like Unix commands

• Some commands require only one argument.
  – cd /home/student/ashwini

• Others require two.
  – cp hmwk1.txt hmwk2.txt

• If the command is typed with too many or too few arguments…


Error!
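
The Unix analogy can be sketched in a few lines of Python. This is only an illustration: the `REQUIRED_ARGS` table and the `check_arguments` function are hypothetical names, and the argument counts are assumptions made for the example, not entries from the Hindi PropBank.

```python
# Hypothetical table: how many core arguments each verb expects.
# The counts are illustrative assumptions, not real PropBank frames.
REQUIRED_ARGS = {
    "khaa": 2,  # eater, thing eaten
    "pii": 2,   # drinker, thing drunk
    "daud": 1,  # runner
}


def check_arguments(verb, args):
    """Return "ok" if the verb received the expected number of arguments,
    otherwise an error message, much like a shell command printing usage."""
    expected = REQUIRED_ARGS[verb]
    if len(args) != expected:
        return f"error: '{verb}' expects {expected} argument(s), got {len(args)}"
    return "ok"


print(check_arguments("khaa", ["Raam", "seb"]))  # both arguments present
print(check_arguments("khaa", ["Raam"]))         # the thing eaten is missing
```

Modifiers such as ‘jaldi se’ are left out of the count on purpose, since they are optional and not required by the verb.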


Making information explicit

• As speakers of Hindi or English, we already have knowledge of predicate argument structure

• E.g. hari ___ pahuMcaa
  – Capturing this knowledge for the machine is essential
  – Ram ne seb khaaya aur paani piyaa
  – Who drank the water?


Identify arguments

• In PropBank, we first identify arguments of a verb

• When explicitly present, they are called ARG

• Further, they are numbered as ARG0, ARG1, ARG2 etc.

• Often, you have ARG as well as ARG-M
  – Ram[ARG0] ne jaldi se[ARG-M] seb[ARG1] khaaya
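
One way to picture the labelled sentence is as a list of labelled spans. The sketch below is an assumption for illustration only: `LabelledSpan`, `numbered_arguments` and the `PRED` label for the verb are hypothetical, and this is not the storage format used by the treebank or by Sanchay.

```python
from dataclasses import dataclass


@dataclass
class LabelledSpan:
    text: str
    label: str  # e.g. "ARG0", "ARG1", "ARG-M"; "PRED" is a hypothetical tag for the verb


# The labelled words of: Ram ne jaldi se seb khaaya
annotation = [
    LabelledSpan("Ram", "ARG0"),
    LabelledSpan("jaldi se", "ARG-M"),
    LabelledSpan("seb", "ARG1"),
    LabelledSpan("khaaya", "PRED"),
]


def numbered_arguments(spans):
    """Keep only the numbered arguments (ARG0, ARG1, ...),
    dropping ARG-M modifiers and the predicate itself."""
    return [s for s in spans if s.label.startswith("ARG") and s.label[3:].isdigit()]


for span in numbered_arguments(annotation):
    print(span.label, span.text)
```

Separating the numbered arguments from ARG-M mirrors the split between what the verb requires and what is optional.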


Null arguments

• What if arguments are not explicit?
  – E.g. Ram ne seb khaaya aur ___ paani piyaa
  – Ram is also the person drinking water
  – It can be dropped, because of the conjunction aur
  – For the machine, it must be retrieved from the sentence

• We also mark these missing or null arguments
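
The idea of retrieving a dropped argument can be sketched as a toy rule in Python. Everything here is an assumption for illustration: the clause dictionaries, the `fill_null_subjects` function and the "copy the most recent overt subject" rule are hypothetical. In the project itself, null arguments are inserted by annotators, not by a rule like this one.

```python
import copy


def fill_null_subjects(clauses):
    """Return a copy of the clauses in which each missing subject is replaced
    by a null-argument placeholder pointing to the most recent overt subject."""
    result = copy.deepcopy(clauses)
    last_subject = None
    for clause in result:
        if clause["subject"] is None:
            if last_subject is not None:
                clause["subject"] = {"null": True, "refers_to": last_subject}
        else:
            last_subject = clause["subject"]
    return result


# Ram ne seb khaaya aur ___ paani piyaa
sentence = [
    {"verb": "khaaya", "subject": "Ram", "object": "seb"},
    {"verb": "piyaa", "subject": None, "object": "paani"},
]

filled = fill_null_subjects(sentence)
print(filled[1]["subject"])
```

The placeholder keeps the gap visible in the data while recording that the person drinking the water is Ram.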


Tasks to be carried out

• Null argument insertion

• Argument annotation


Tools to be used

• Sanchay – GUI for annotators. We use it especially for Null argument insertion

• Use your verbs account to access Sanchay

• Wiki for annotator resources


Timesheets & tips

• Being honest about filling out timesheets is quite important

• We can access the amount of time you spend on verbs

• I will ask you to keep track of the number of annotations per hour as a cross-check

• Turn in the timesheets at my CINC mailbox in physical form, with your signature


Practice

• We need to learn about four kinds of empty categories

• Plan to proceed
  – Recognizing syntactic constructions
  – Getting familiar with the tool
  – Practice with the corpus
  – Q & A based on null argument insertion