Natural Language Processing: Parsing

Post on 19-Jan-2015

DESCRIPTION

This lecture covers parsing. It gives a brief overview of the lexicon, categorization, grammar rules, syntactic trees, word senses, and various challenges of natural language processing.

TRANSCRIPT

Artificial Intelligence

Natural Language Processing: Parsing

Rushdi Shams
Computational Linguistics Lab
Western University
rshams@uwo.ca

Natural Language

• Natural Language means any language we speak

• We need to process natural language (in text, speech, etc.) so that machines can exploit it.

• Applications: numerous!
– Watson (Jeopardy)
– MS Word

Parsing

• The first task for any NLP-based system is to read (or to parse) the text

• Parsing depends on three components of a language:

1. Lexicon
2. Categorization
3. Grammar Rules

Rushdi Shams, Dept of CSE, KUET, Bangladesh 4

Lexicon

stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | …

is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …

right | left | east | south | back | smelly | …

here | there | nearby | ahead | right | left | east | south | back | …

me | you | I | it | S=HE | Y’ALL …

John | Mary | Boston | UCB | PAJC | …

the | a | an | …

to | in | on | near | …

and | or | but | …

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Categorization

Noun > stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | …

Verb > is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …

Adjective > right | left | east | south | back | smelly | …

Adverb > here | there | nearby | ahead | right | left | east | south | back | …

Pronoun > me | you | I | it | S=HE | Y’ALL …

Name > John | Mary | Boston | UCB | PAJC | …

Article > the | a | an | …

Preposition > to | in | on | near | …

Conjunction > and | or | but | …

Digit > 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
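The lexicon and its categories map naturally onto a lookup table. The following is a minimal sketch in Python that uses only a subset of the words listed above; the names `CATEGORIES` and `categories_of` are illustrative assumptions, not part of the lecture.

```python
# Category -> words, copied from a subset of the lexicon above.
CATEGORIES = {
    "Noun": ["stench", "breeze", "glitter", "wumpus", "pit", "gold", "east"],
    "Verb": ["is", "see", "smell", "shoot", "feel", "go", "grab"],
    "Adjective": ["right", "left", "east", "south", "back", "smelly"],
    "Adverb": ["here", "there", "nearby", "ahead", "right", "left", "east"],
    "Pronoun": ["me", "you", "I", "it"],
    "Article": ["the", "a", "an"],
    "Preposition": ["to", "in", "on", "near"],
    "Conjunction": ["and", "or", "but"],
}


def categories_of(word):
    """Return every category that lists the word (a word may have several)."""
    return [cat for cat, words in CATEGORIES.items() if word in words]


print(categories_of("wumpus"))  # ['Noun']
print(categories_of("east"))    # ['Noun', 'Adjective', 'Adverb']
```

A word such as east belongs to several categories at once, which is one reason a parser cannot rely on the lexicon alone.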

Grammar Rules

• “The large cat”

• This phrase can be parsed by an NLP system if it has a grammar rule like
Noun Phrase -> Determiner + Adjective + Noun

• If your system finds a phrase or sentence whose pattern is not covered by its set of grammar rules, it won’t be able to parse it.
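A grammar rule of this kind can be checked by tagging each word and comparing the tag sequence with the rule. This is only a sketch: the mini-lexicon, the `GRAMMAR` table, and the function `parse_phrase` are hypothetical names chosen for illustration.

```python
# Hypothetical mini-lexicon: word -> category.
LEXICON = {
    "the": "Determiner", "a": "Determiner",
    "large": "Adjective", "small": "Adjective",
    "cat": "Noun", "rat": "Noun",
}

# Grammar rules: phrase type -> list of allowed category sequences.
GRAMMAR = {"Noun Phrase": [["Determiner", "Adjective", "Noun"]]}


def parse_phrase(phrase):
    """Return the phrase type whose rule matches the phrase, or None."""
    tags = [LEXICON.get(word.lower()) for word in phrase.split()]
    for phrase_type, patterns in GRAMMAR.items():
        if tags in patterns:
            return phrase_type
    return None


print(parse_phrase("The large cat"))  # Noun Phrase
print(parse_phrase("The cat"))        # None: no rule covers Determiner + Noun
```

The second call fails because the grammar has no rule for Determiner + Noun, which is exactly the limitation described above.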

Therefore...

• Parsing is the process of using grammar rules to determine whether a sentence is legal, and to obtain its syntactic tree

Syntactic Tree

‘The large cat eats the small rat’

http://www.digitalenema.com/2012_07_01_archive.html

Syntactic Tree: built bottom-up for “The large cat eats the small rat”

• Step 1: label each word with its category
The (Article) large (Adjective) cat (Noun) eats (Verb) the (Article) small (Adjective) rat (Noun)

• Step 2: Article + Adjective + Noun -> Noun Phrase
(The large cat) and (the small rat)

• Step 3: Verb + Noun Phrase -> Verb Phrase
(eats the small rat)

• Step 4: Noun Phrase + Verb Phrase -> Sentence
(The large cat) (eats the small rat)
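The bottom-up construction of the syntactic tree can be sketched in a few lines of Python. This is a deliberately simple, greedy reducer that applies three rules in a fixed order, not a general parsing algorithm; the lexicon, the rule list, and the nested-tuple tree format are assumptions made for this example.

```python
LEXICON = {
    "the": "Article", "large": "Adjective", "small": "Adjective",
    "cat": "Noun", "rat": "Noun", "eats": "Verb",
}

# Rules applied bottom-up, in this order: (result, sequence of child labels).
RULES = [
    ("Noun Phrase", ("Article", "Adjective", "Noun")),
    ("Verb Phrase", ("Verb", "Noun Phrase")),
    ("Sentence", ("Noun Phrase", "Verb Phrase")),
]


def parse(sentence):
    """Return a syntactic tree as nested tuples, or None if the sentence is not legal."""
    words = sentence.lower().split()
    if any(word not in LEXICON for word in words):
        return None
    # Leaves are (category, word) pairs.
    nodes = [(LEXICON[word], word) for word in words]
    for parent, pattern in RULES:
        i = 0
        while i + len(pattern) <= len(nodes):
            window = nodes[i:i + len(pattern)]
            if tuple(node[0] for node in window) == pattern:
                # Replace the matched children with a single parent node.
                nodes[i:i + len(pattern)] = [(parent, *window)]
            i += 1
    if len(nodes) == 1 and nodes[0][0] == "Sentence":
        return nodes[0]
    return None


tree = parse("The large cat eats the small rat")
print(tree[0])     # Sentence
print(tree[1][0])  # Noun Phrase
print(tree[2][0])  # Verb Phrase
```

A sentence that the rules cannot reduce to a single Sentence node, such as "cat the eats", yields None, which corresponds to the sentence being judged not legal.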

Label Bracketing

• Label bracketing is another way of representing the syntactic tree: each constituent is written inside brackets together with its label.
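As an illustration, a tree stored as nested tuples can be turned into a labelled bracketing with a short recursive function. The tuple format, the square-bracket notation, and the name `label_bracket` are assumptions for this sketch; the example uses only a small noun phrase.

```python
def label_bracket(tree):
    """Render a nested-tuple tree as a labelled bracketing string."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Leaf: a category followed by its word.
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(label_bracket(child) for child in children) + "]"


# Hypothetical tree for the noun phrase "the large cat".
np_tree = ("NP", ("Article", "the"), ("Adjective", "large"), ("Noun", "cat"))
print(label_bracket(np_tree))
# [NP [Article the] [Adjective large] [Noun cat]]
```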

Do yourself: Label Bracket the tree

Evaluation of Parsing

• The two most frequent and basic measures to evaluate parsing are precision and recall.

Precision, Recall, and F1-Score

• The notions are much clearer with a contingency table.
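A minimal sketch of the three measures, assuming parses are compared as sets of labelled constituents (label, start, end). That unit of comparison and the name `evaluate` are assumptions; the slides do not specify them. With TP = constituents found in both parses, FP = constituents only in the predicted parse, and FN = constituents only in the gold parse: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall).

```python
def evaluate(predicted, gold):
    """Precision, recall and F1 over sets of (label, start, end) constituents."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold constituents for "The large cat eats the small rat" (word positions 0..7).
gold = {("NP", 0, 3), ("NP", 4, 7), ("VP", 3, 7), ("S", 0, 7)}
# A hypothetical parser output: three correct constituents and two wrong ones.
predicted = {("NP", 0, 3), ("VP", 3, 7), ("S", 0, 7), ("NP", 5, 7), ("NP", 4, 6)}

print(tuple(round(score, 2) for score in evaluate(predicted, gold)))
# (0.6, 0.75, 0.67)
```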

However…

http://www.cafepress.com/barrysworld/1486105

And…

Ambiguity

• There are 2 types of ambiguity:

1. Lexical Ambiguity: the sentence contains an idiom, word, or term that has more than one meaning. For example, “glasses” means both drinking glasses and spectacles.

Ambiguity

2. Structural Ambiguity: the sentence has more than one syntactic tree. For example: I saw the boy with the telescope

Did you use a telescope to see the boy? Or did you see the boy who had a telescope?
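One way to see structural ambiguity is to count the syntactic trees that a grammar allows for a sentence. The sketch below fills a CKY-style chart for a small hypothetical grammar; the rule set, the category names, and the function `count_parses` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical binary grammar rules: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the prepositional phrase modifies the seeing
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the prepositional phrase modifies the boy
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "boy": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(sentence, start="S"):
    """Count the distinct syntactic trees for the sentence under the grammar."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(int)  # (i, j, label) -> number of trees spanning words[i:j]
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1, label)] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]


print(count_parses("I saw the boy"))                     # 1
print(count_parses("I saw the boy with the telescope"))  # 2
```

The two trees differ only in where "with the telescope" attaches: to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the boy has the telescope).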

Structural Ambiguity

Ambiguity

• Which of the following examples have lexical ambiguity and which have structural ambiguity? Justify your answer.

1. The painter put on another coat
2. We like flying planes
3. Visiting relatives can be tiresome

Ambiguity

• He wrote the note yesterday
• You mean you carried the information by a bus?
• Connecting wires are tiring in electronics lab
• Squad helps dog bite victim

Word Sense

• Most lexical ambiguity arises from differences in word sense.

• Word senses vary due to several factors:
– Synonymy
– Antonymy
– Homonymy
– Polysemy and
– Heteronymy

Synonymy

• Synonyms are different words (or sometimes phrases) with identical or very similar meanings.

• Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy

Synonymy

• student and pupil (noun)
• buy and purchase (verb)
• sick and ill (adjective)
• quickly and speedily (adverb)
• on and upon (preposition)

Synonymy is a relation between senses rather than words

• Note that synonyms are defined with respect to certain senses of words

• pupil as the "aperture in the iris of the eye" is not synonymous with student.

• Similarly, he expired means the same as he died, yet my passport has expired cannot be replaced by my passport has died.
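The point that synonymy holds between senses rather than words can be made concrete by keying synonyms on a (word, sense) pair. The sense labels and the names `SENSE_SYNONYMS` and `synonymous` below are hypothetical, chosen only to mirror the examples above.

```python
# Hypothetical sense inventory: (word, sense) -> words synonymous in that sense.
SENSE_SYNONYMS = {
    ("pupil", "learner"): {"student"},
    ("pupil", "aperture in the iris of the eye"): set(),
    ("expire", "stop living"): {"die"},
    ("expire", "cease to be valid"): set(),
}


def synonymous(word, other, sense):
    """True if `other` can replace `word` when `word` is used in the given sense."""
    return other in SENSE_SYNONYMS.get((word, sense), set())


print(synonymous("pupil", "student", "learner"))                          # True
print(synonymous("pupil", "student", "aperture in the iris of the eye"))  # False
print(synonymous("expire", "die", "cease to be valid"))                   # False
```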

Synonymy is a relation between senses rather than words

• Consider the words big and large
• Are they synonyms?
– How big is the plane?
– Are we travelling with a large or small plane?

• How about these?
– Mrs Benjamin became a big sister to him
– Mrs Benjamin became a large sister to him

Heteronymy

• Heteronyms (also known as heterophones) are words with
– identical spellings (or characters)
– but different pronunciations and meanings.

• lead (to guide) vs lead (a metal)

Antonymy

• Antonyms are words with opposite or nearly opposite meanings.

• short and tall
• dead and alive
• increase and decrease

Homonymy

• A homonym is one of a group of words that
– share the same spelling or the same pronunciation but
– have different, distinct meanings

• Bank (financial institution) vs Bank (sloping land)
• Bat (a club for hitting the ball) vs Bat (mammal)

• Homographs (Bank/Bank, Bat/Bat)
• Homophones (Right/Write, Piece/Peace)
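The distinctions between homographs, homophones, and heteronyms come down to comparing spelling and pronunciation for two words with different meanings. A rough sketch follows; the `Entry` type, the informal pronunciation respellings, and the function `relation` are assumptions made for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    spelling: str
    pronunciation: str  # informal respelling, an assumption of this sketch
    meaning: str


def relation(a, b):
    """Name the relation between two entries that differ in meaning."""
    if a.meaning == b.meaning:
        return "same sense"
    same_spelling = a.spelling.lower() == b.spelling.lower()
    same_sound = a.pronunciation == b.pronunciation
    if same_spelling and same_sound:
        return "homograph"
    if same_spelling:
        # Heteronyms are homographs that are pronounced differently.
        return "heteronym"
    if same_sound:
        return "homophone"
    return "unrelated"


bank_money = Entry("bank", "bank", "financial institution")
bank_river = Entry("bank", "bank", "sloping land")
right = Entry("right", "rite", "correct")
write = Entry("write", "rite", "to put words on paper")
lead_verb = Entry("lead", "leed", "to guide")
lead_metal = Entry("lead", "led", "a metal")

print(relation(bank_money, bank_river))  # homograph
print(relation(right, write))            # homophone
print(relation(lead_verb, lead_metal))   # heteronym
```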

Polysemy

• A word is polysemous when its different senses are related to each other
– The bank was constructed in 1971 (building related to a financial institution)
– I draw money from the bank (financial institution)

Hypernymy and Hyponymy

• Superclass-subclass structure
– Car is a hypernym of Honda
– Honda is a hyponym of Car
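The superclass-subclass structure can be modelled as a map from each word to its direct hypernym. In this sketch only the Honda/Car pair comes from the slide; the levels above Car and the function names are hypothetical.

```python
# Hypothetical is-a hierarchy: word -> its direct hypernym.
HYPERNYM = {
    "Honda": "Car",
    "Car": "Motor vehicle",
    "Motor vehicle": "Vehicle",
}


def hypernyms(word):
    """Return the chain of hypernyms from the word up to the root."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, ancestor):
    """True if `ancestor` appears anywhere above `word` in the hierarchy."""
    return ancestor in hypernyms(word)


print(hypernyms("Honda"))             # ['Car', 'Motor vehicle', 'Vehicle']
print(is_hyponym_of("Honda", "Car"))  # True
print(is_hyponym_of("Car", "Honda"))  # False
```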

Zeugma Test

• A test to see whether or not two uses of a word have the same sense
– Which flights serve breakfast?
– Does Lufthansa serve Philadelphia?

• Simply make a conjunction:
– Does Lufthansa serve breakfast and Philadelphia?

• The conjunction sounds odd, so the two uses of serve are different senses

WordNet 3.0

• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
• Some other languages available or under development
– (Arabic, Finnish, German, Portuguese…)

Category     Unique Strings
Noun         117,798
Verb         11,529
Adjective    22,479
Adverb       4,481

Senses of “bass” in WordNet

WordNet Hypernym Hierarchy for “bass”

WordNet Noun Relations

WordNet 3.0

• Where it is:
– http://wordnetweb.princeton.edu/perl/webwn

• Libraries
– Python: WordNet from NLTK (http://www.nltk.org/Home)
– Java: JWNL, extJWNL on sourceforge

Difficulties with Natural Language: Anaphora

• Using pronouns to refer back to entities already introduced in the text

– After Mary proposed to John, they found a preacher and got married. For the honeymoon, they went to Hawaii

– Mary saw a ring through the window and asked John for it

– Mary threw a rock at the window and broke it

Difficulties with Natural Language: Indexicality

• Indexical sentences refer to the utterance situation (place, time, etc.)
– I am over here
– Why did you do that?

Difficulties with Natural Language: Metonymy

• Using one noun phrase to stand for another
– I've read Shakespeare
– Chrysler announced record profits
– The ham sandwich on Table 4 wants another beer

Difficulties with Natural Language: Metaphor

• "Non-literal" usage of words and phrases, often systematic.

– I've tried killing the process but it won't die. Its parent keeps it alive.

Summary

• The components of a language
– Lexicon
– Categorization
– Grammar rules

• Syntactic Tree
• Label Bracketing
• Evaluation of Parsing
• Word sense
• Problems of Parsing
