Annotation for Hindi PropBank
Outline
• Introduction to the project
• Basic linguistic concepts
  – Verb & Argument
  – Making information explicit
  – Null arguments
• Tasks to be carried out
• Tools for annotation
• Timesheets, tips
• Practice
Creation of Resources
• For machines rather than humans
• Imagine a dictionary/thesaurus for computers
• A requirement for Natural Language Processing
  – Large annotated resources
  – Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation
• Annotation implies addition of linguistic information
• Tailored to language-specific requirements
• Needs to be as consistent as possible
Hindi-Urdu Treebank Project
• One of the first efforts to make a large-scale resource for Hindi-Urdu
• Similar resources exist for Chinese, Arabic and English
• Three main components
  – Hindi-Urdu dependency treebank
  – Hindi-Urdu PropBank
  – Hindi-Urdu phrase structure treebank [derived]
PropBank
• PropBank resource creation at CU Boulder
• We annotate semantic information on top of syntactic information
• PropBank involves annotation of predicate argument structure
  – Mainly concerned with verbs & their arguments
  – And the semantic nature of the arguments
What are verbs?
• Verbs are predicating elements, e.g. daud (run), pii (drink), baras (rain)
• Encode (very broadly) actions and states
• Also have two kinds of grammatical information
  – Tense, aspect (present, future; perfect, continuous)
  – Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)
What are arguments?
• In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya:
  – A verb, ‘eat’ or ‘khaa’: PREDICATE
  – A person eating, ‘Raam’: ARGUMENT
  – Thing eaten, ‘apple’ / ‘seb’: ARGUMENT
• Without arguments, the meaning of the verb ‘ate’ is not realized completely
• Together, they make up the predicate argument structure of the sentence
Arguments show what’s important
• Raam ne jaldi se seb khaaya
  – Raam, seb are arguments
  – But ‘jaldi se’ is not
• It’s all about the verb
  – It projects its need for certain arguments
  – Sift what’s mandatory from what’s optional
Like Unix commands
• Some commands require only one argument, others two.
  – cd /home/student/ashwini
  – cp hmwk1.txt hmwk2.txt
• If the command is typed with too many or too few arguments… Error!
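The analogy can be sketched in a few lines of Python. This is only an illustration, not part of the PropBank tooling: the table REQUIRED_ARGS and the function check_arity are hypothetical names, and the argument counts are simplified (real shell commands accept options and variable numbers of arguments).

```python
# Hypothetical table: how many arguments each predicate requires.
# Simplified for illustration; real commands are more flexible.
REQUIRED_ARGS = {
    "cd": 1,    # the directory to change to
    "cp": 2,    # source file, destination file
    "khaa": 2,  # the eater, the thing eaten
}


def check_arity(predicate, args):
    """Return True if the predicate has exactly the arguments it requires.

    Raises ValueError otherwise, like a command typed with too many
    or too few arguments.
    """
    expected = REQUIRED_ARGS[predicate]
    if len(args) != expected:
        raise ValueError(
            f"{predicate}: expected {expected} argument(s), got {len(args)}"
        )
    return True


check_arity("cd", ["/home/student/ashwini"])
check_arity("cp", ["hmwk1.txt", "hmwk2.txt"])
check_arity("khaa", ["Raam", "seb"])
```

Calling check_arity("cp", ["hmwk1.txt"]) raises ValueError, which is the "Error!" case above.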
Making information explicit
• As speakers of Hindi or English, we already have knowledge of predicate argument structure
• E.g. hari ___ pahuMcaa
  – Capturing this knowledge for the machine is essential
  – Ram ne seb khaaya aur paani piyaa
  – Who drank the water?
Identify arguments
• In PropBank, we first identify arguments of a verb
• When explicitly present, they are called ARG
• Further, they are numbered as ARG0, ARG1, ARG2 etc.
• Often, you have ARG as well as ARG-M
  – [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
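One way to picture the labelled example is as a small data structure. The sketch below is an assumption made for illustration: the dictionary layout and the helper names numbered_arguments and modifiers are hypothetical, and this is not the file format used by PropBank or Sanchay.

```python
# The example sentence "Ram ne jaldi se seb khaaya", with the labels
# from the slide. The layout is a hypothetical illustration.
annotation = {
    "predicate": "khaaya",
    "arguments": [
        ("Ram", "ARG0"),
        ("jaldi se", "ARG-M"),
        ("seb", "ARG1"),
    ],
}


def numbered_arguments(annotation):
    """Return the numbered arguments (ARG0, ARG1, ...), skipping ARG-M."""
    return [(text, label) for text, label in annotation["arguments"]
            if label != "ARG-M"]


def modifiers(annotation):
    """Return only the ARG-M phrases."""
    return [(text, label) for text, label in annotation["arguments"]
            if label == "ARG-M"]


print(numbered_arguments(annotation))
print(modifiers(annotation))
```

Keeping the numbered arguments apart from ARG-M mirrors the earlier point that Raam and seb are arguments of the verb while jaldi se is not.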
Null arguments
• What if arguments are not explicit?
  – E.g. Ram ne seb khaaya aur ___ paani piyaa
  – Ram is also the person drinking water
  – It can be dropped, because of the conjunction aur
  – For the machine, it must be retrieved from the sentence
• We also mark these missing or null arguments
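The conjunction example can be sketched as code to show what "retrieving" a dropped argument means. This is a toy illustration under stated assumptions: in the project, null arguments are inserted by annotators, not by a script; the function name fill_shared_subject and the clause layout are hypothetical; and only the coordination case from this slide is handled.

```python
def fill_shared_subject(clauses):
    """Copy ARG0 from the first clause into later clauses that lack one.

    Covers only the 'aur' coordination case from the example.
    Returns new clause dicts; the input list is left unchanged.
    """
    shared_arg0 = clauses[0].get("ARG0")
    filled = []
    for clause in clauses:
        clause = dict(clause)  # work on a copy
        if clause.get("ARG0") is None:
            clause["ARG0"] = shared_arg0
            clause["ARG0_is_null"] = True  # mark it as a retrieved null argument
        filled.append(clause)
    return filled


# "Ram ne seb khaaya aur ___ paani piyaa"
clauses = [
    {"predicate": "khaaya", "ARG0": "Ram", "ARG1": "seb"},
    {"predicate": "piyaa", "ARG0": None, "ARG1": "paani"},
]

for clause in fill_shared_subject(clauses):
    print(clause)
```

After the call, the second clause carries Ram as its ARG0 together with a flag recording that the argument was not explicit in the sentence.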
Tasks to be carried out
• Null argument insertion
• Argument annotation
Tools to be used
• Sanchay – GUI for annotators. We use it especially for Null argument insertion
• Use your verbs account to access Sanchay
• Wiki for annotator resources
Timesheets & tips
• Being honest about filling out timesheets is quite important
• We can access the amount of time you spend on verbs
• I will ask you to keep track of the number of annotations per hour, as a cross-check
• Turn in the timesheets at my CINC mailbox in physical form, with your signature
Practice
• We need to learn about four kinds of empty categories
• Plan to proceed
  – Recognizing syntactic constructions
  – Getting familiar with the tool
  – Practice with the corpus
  – Q & A based on null argument insertion