
1
Artificial Intelligence
Universitatea Politehnica Bucuresti
Academic year 2003-2004
Adina Magda Florea
http://turing.cs.pub.ro/ia_2005

2
Lecture 12
Natural Language Processing

3
Defining Languages with Backus-Naur Form (BNF)
A formal language is defined as a set of strings, where each string is a sequence of symbols
The languages we are interested in consist of an infinite set of strings, so we need a concise way to characterize the set: use a grammar
Terminal Symbols
– Symbols or words that make up the strings of the language
Example
– Set of symbols for the language of simple arithmetic expressions
– {0,1,2,3,4,5,6,7,8,9,+,-,*,/,(,)}

4
Components in a BNF Grammar
Nonterminal Symbols
– Categorize subphrases of the language
Example
– The nonterminal symbol NP (NounPhrase) denotes an infinite set of strings, including “you” and “the big dog”

5
Components in a BNF Grammar
Start Symbol
– Nonterminal symbol that denotes the complete strings of the language
Set of rewrite rules or productions
– LHS → RHS
– LHS is a nonterminal
– RHS is a sequence of zero or more symbols (either terminal or nonterminal)

6
Example: BNF Grammar for Simple Arithmetic Expressions
Exp → Exp Operator Exp | ( Exp ) | Number
Number → Digit | Number Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Operator → + | - | * | /
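The grammar above can be written down as data and used to test whether a string belongs to the language. The sketch below is only an illustration, not part of the lecture: the names `GRAMMAR`, `in_language`, `derives` and `matches` are hypothetical, and the recognizer simply tries every way of splitting a span of the input among the symbols of a rule (memoized so that the left-recursive rules terminate).

```python
from functools import lru_cache

# The arithmetic grammar from the slide; every symbol that is not a key
# of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "Exp": [["Exp", "Operator", "Exp"], ["(", "Exp", ")"], ["Number"]],
    "Number": [["Digit"], ["Number", "Digit"]],
    "Digit": [[d] for d in "0123456789"],
    "Operator": [[op] for op in "+-*/"],
}


def in_language(text, start="Exp"):
    """Return True if text (one character per token) is derivable from start."""
    tokens = tuple(text)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` derive exactly tokens[i:j]?
        if symbol not in GRAMMAR:  # terminal symbol
            return j == i + 1 and tokens[i] == symbol
        return any(matches(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        # Does the sequence of symbols `rhs` derive exactly tokens[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Every symbol must cover at least one token (no empty rules).
        return any(
            derives(rhs[0], i, k) and matches(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return len(tokens) > 0 and derives(start, 0, len(tokens))
```

For example, `in_language("(1+2)*34")` holds, while `in_language("1+")` does not. Spaces are not part of the terminal set, so they are rejected.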

7
The Component Steps of Communication
A typical communication, in which the speaker S wants to transmit the proposition P to the hearer H using words W, is composed of 7 processes.
3 take place in the speaker
4 take place in the hearer

8
Processes in the Speaker
Intention
– S wants H to believe P (where S typically believes P)
Generation
– S chooses the words W (because they express the meaning P)
Synthesis
– S utters the words W (usually addressing them to H)

9
Processes in the Hearer
Perception
– H perceives W’ (ideally W’ = W, but misperception is possible)
Analysis
– H infers that W’ has possible meanings P1, …, Pn (words and phrases can have several meanings)

10
Processes in the Hearer
Disambiguation
– H infers that S intended to express Pi (where ideally Pi = P, but misinterpretation is possible)
Incorporation
– H decides to believe Pi (or rejects it if it is out of line with what H already believes)

11
Observations
If the perception refers to spoken expressions, this is speech recognition
If the perception refers to handwritten expressions, this is handwriting recognition
Neural networks have been successfully applied to both speech recognition and handwriting recognition

12
Observations
Analysis, disambiguation and incorporation together form natural language understanding; they rely on the assumption that the words of the sentence are known
Often, recognition of individual words is guided by the sentence structure, so perception and analysis interact, as do analysis, disambiguation, and incorporation

13
Defining a Grammar
Lexicon - list of allowable vocabulary words, grouped in categories (parts of speech):
– open classes - words are added to the category all the time (natural language is dynamic, it constantly evolves)
– closed classes - small number of words; new words are generally not expected to be added

14
Example - A Small Lexicon
Noun → stench | breeze | wumpus …
Verb → is | see | smell …
Adjective → right | left | smelly …
Adverb → here | there | ahead …
Pronoun → me | you | I | it
RelPronoun → that | who
Name → John | Mary
Article → the | a | an
Preposition → to | in | on
Conjunction → and | or | but

15
The Grammar Associated to the Lexicon
Combine the words into phrases
Use nonterminal symbols to define different kinds of phrases
– sentence S
– noun phrase NP
– verb phrase VP
– prepositional phrase PP
– relative clause RelClause

16
Example - The Grammar Associated to the Lexicon
S → NP VP | S Conjunction S
NP → Pronoun | Noun | Article Noun | NP PP | NP RelClause
VP → Verb | VP NP | VP Adjective | VP PP | VP Adverb
PP → Preposition NP
RelClause → RelPronoun VP
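A grammar can also be read generatively. The sketch below, which is an illustration and not part of the lecture, stores the lexicon and the grammar as Python dictionaries (only the words listed explicitly on the slide are included) and expands the start symbol by choosing rules at random. The function `generate` and its `max_depth` cut-off are assumptions introduced here so that recursive rules such as NP → NP PP cannot expand forever.

```python
import random

LEXICON = {
    "Noun": ["stench", "breeze", "wumpus"],
    "Verb": ["is", "see", "smell"],
    "Adjective": ["right", "left", "smelly"],
    "Adverb": ["here", "there", "ahead"],
    "Pronoun": ["me", "you", "I", "it"],
    "RelPronoun": ["that", "who"],
    "Name": ["John", "Mary"],
    "Article": ["the", "a", "an"],
    "Preposition": ["to", "in", "on"],
    "Conjunction": ["and", "or", "but"],
}

GRAMMAR = {
    "S": [["NP", "VP"], ["S", "Conjunction", "S"]],
    "NP": [["Pronoun"], ["Noun"], ["Article", "Noun"],
           ["NP", "PP"], ["NP", "RelClause"]],
    "VP": [["Verb"], ["VP", "NP"], ["VP", "Adjective"],
           ["VP", "PP"], ["VP", "Adverb"]],
    "PP": [["Preposition", "NP"]],
    "RelClause": [["RelPronoun", "VP"]],
}


def generate(symbol, rng, depth=0, max_depth=3):
    """Expand `symbol` into a list of words by choosing rules at random."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, drop the self-recursive alternatives so
        # that the expansion is guaranteed to finish.
        options = [rhs for rhs in options if symbol not in rhs]
    rhs = rng.choice(options)
    words = []
    for child in rhs:
        words.extend(generate(child, rng, depth + 1, max_depth))
    return words


rng = random.Random(0)  # fixed seed only to make runs repeatable
print(" ".join(generate("S", rng)))
```

Because the grammar ignores case and agreement, this generator can produce strings such as "me smell", which is the overgeneration problem discussed later in the lecture.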

17
Syntactic Analysis (Parsing)
Parsing is the problem of constructing a derivation tree for an input string from a formal definition of a grammar.
Parsing algorithms may be divided into two classes:
– top-down parsing
– bottom-up parsing

18
Top-Down Parsing
Start with the top-level sentence symbol and attempt to build a tree whose leaves match the target sentence's words (the terminals)
Better when there are many alternative terminal symbols for each word
Worse when there are many alternative rules for a phrase

19
Example for Top-Down Parsing
"John hit the ball"
1. S
2. S → NP, VP
3. S → Noun, VP
4. S → John, Verb, NP
5. S → John, hit, NP
6. S → John, hit, Article, Noun
7. S → John, hit, the, Noun
8. S → John, hit, the, ball

20
Bottom-Up Parsing
Start with the words in the sentence (the terminals) and attempt to find a series of reductions that yield the sentence symbol
Better when there are many alternative rules for a phrase
Worse when there are many alternative terminal symbols for each word

21
Example for Bottom-Up Parsing
1. John, hit, the, ball
2. Noun, hit, the, ball
3. Noun, Verb, the, ball
4. Noun, Verb, Article, ball
5. Noun, Verb, Article, Noun
6. NP, Verb, Article, Noun
7. NP, Verb, NP
8. NP, VP
9. S
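The reductions above can be found by a simple search. The sketch below is illustrative only: `RULES` is a hypothetical fragment covering the example sentence, and `bottom_up` tries every applicable reduction depth-first with backtracking, so the order of its steps may differ from the slide while reaching the same result.

```python
# Hypothetical rules written as (left-hand side, right-hand side).
RULES = [
    ("Noun", ("John",)), ("Noun", ("ball",)),
    ("Verb", ("hit",)), ("Article", ("the",)),
    ("NP", ("Noun",)), ("NP", ("Article", "Noun")),
    ("VP", ("Verb", "NP")), ("S", ("NP", "VP")),
]


def bottom_up(words):
    """Return a list of forms from `words` down to ("S",), or None."""
    seen = set()  # forms already explored, to avoid repeating dead ends

    def search(form):
        if form == ("S",):
            return [form]
        if form in seen:
            return None
        seen.add(form)
        for i in range(len(form)):
            for lhs, rhs in RULES:
                if form[i:i + len(rhs)] == rhs:
                    # Replace the matched right-hand side by its left-hand side.
                    reduced = form[:i] + (lhs,) + form[i + len(rhs):]
                    rest = search(reduced)
                    if rest is not None:
                        return [form] + rest
        return None

    return search(tuple(words))


for step in bottom_up("John hit the ball".split()):
    print(", ".join(step))
```

Backtracking is needed because a locally valid reduction (for instance turning the final Noun into an NP before the Article has been attached) can lead to a form from which S is unreachable.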

22
Definite Clause Grammar (DCG)
Problems with BNF Grammar
– BNF only talks about strings, not meanings
– Want to describe context-sensitive grammars, but BNF is context-free
Introduce a formalism that can handle both of these problems
Use first-order logic to talk about strings and their meanings

23
Definite Clause Grammar (DCG)
We are interested in using language for communication, so we need some way of associating a meaning with each string
Each nonterminal symbol becomes a one-place predicate that is true of strings that are phrases of that category
Example
– Noun(“ball”) is a true logical sentence
– Noun(“the”) is a false logical sentence

24
Definite Clause Grammar (DCG)
A definite clause grammar (DCG) is a grammar in which every sentence must be a definite clause.
A definite clause is a type of Horn clause that, when written as an implication, has exactly one atom in the conclusion and a conjunction of zero or more atoms in the hypothesis, for example A1 ∧ A2 ∧ … ⇒ C1

25
Example 1
In BNF notation, we have: S → NP VP
In First-Order Logic notation, we have: NP(s1) ∧ VP(s2) ⇒ S(Append(s1, s2))
We read: If there is a string s1 that is a noun phrase and a string s2 that is a verb phrase, then the string formed by appending them together is a sentence

26
Example 2
In BNF notation, we have: Noun → ball | book
In First-Order Logic notation, we have: (s = “ball” ∨ s = “book”) ⇒ Noun(s)
We read: If s is the string “ball” or the string “book”, then the string s is a noun

27
Rules to Translate BNF in DCG
BNF            DCG
X → Y Z        Y(s1) ∧ Z(s2) ⇒ X(Append(s1, s2))
X → word       X(["word"])
X → Y | Z      Y(s) ⇒ X(s)    Z(s) ⇒ X(s)
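The translation rules can be mimicked with ordinary predicates over lists of words. The sketch below is an illustration under assumptions: only Noun → ball | book comes from the slides, while the other words and the (non-left-recursive) NP and VP rules are hypothetical. A rule X → Y Z becomes "there is a split of s into s1 and s2 with Y(s1) and Z(s2)", and X → Y | Z becomes a disjunction.

```python
# X -> word  becomes  X(["word"])
def Article(s):
    return s in (["the"], ["a"])          # hypothetical words


def Noun(s):
    return s in (["ball"], ["book"])      # Noun -> ball | book (from the slide)


def Verb(s):
    return s in (["hit"], ["reads"])      # hypothetical words


def NP(s):
    # NP -> Noun | Article Noun   (hypothetical rule)
    return Noun(s) or any(
        Article(s[:k]) and Noun(s[k:]) for k in range(len(s) + 1)
    )


def VP(s):
    # VP -> Verb | Verb NP        (hypothetical rule)
    return Verb(s) or any(
        Verb(s[:k]) and NP(s[k:]) for k in range(len(s) + 1)
    )


def S(s):
    # S -> NP VP: NP(s1) and VP(s2) imply S(Append(s1, s2))
    return any(NP(s[:k]) and VP(s[k:]) for k in range(len(s) + 1))


print(S(["the", "book", "hit", "a", "ball"]))
print(Noun(["the"]))
```

Each predicate is true exactly of the word lists that are phrases of its category, which is the reading given to nonterminals in a DCG.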

28
Augmenting the DCG
Extend the notation to incorporate grammars that cannot be expressed in BNF
Nonterminal symbols can be augmented with extra arguments

29
Augmenting the DCG
Add one argument for semantics
In DCG, the nonterminal NP translates as a one-place predicate where the single argument is a string: NP(s)
In the augmented DCG, we can write NP(sem) to express “an NP with semantics sem”. This gets translated into logic as the two-place predicate NP(sem, s)

30
Augmenting the DCG
Add one argument for semantics
DCG: S(sem) → NP(sem1) VP(sem2) {compose(sem1, sem2, sem)}
FOPL: NP(s1, sem1) ∧ VP(s2, sem2) ∧ compose(sem1, sem2, sem) ⇒ S(append(s1, s2), sem)
PROLOG: see later on

31
Semantic Interpretation
Compositional semantics - the semantics of any phrase is a function of the semantics of its subphrases; it does not depend on any other phrase before, after, or encompassing the given phrase
But natural languages do not have a compositional semantics in the general case.

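32
% A sentence is an NP followed by a VP; its semantics is the NP semantics
% placed in front of the VP semantics.
sentence(S, Sem) :- np(S1, Sem1), vp(S2, Sem2), append(S1, S2, S), Sem = [Sem1 | Sem2].
np([S1, S2], Sem) :- article(S1), noun(S2, Sem).
% A verb alone expresses a property; verb + adjective a color; verb + noun the parts.
vp([S], Sem) :- verb(S, Sem1), Sem = [property, Sem1].
vp([S1, S2], Sem) :- verb(S1), adjective(S2, color, Sem1), Sem = [color, Sem1].
vp([S1, S2], Sem) :- verb(S1), noun(S2, Sem1), Sem = [parts, Sem1].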
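The Prolog clauses above leave the lexical predicates (`article`, `noun`, `verb`, `adjective`) undefined. The following Python sketch mirrors the same composition with a small hypothetical lexicon; all the words, the dictionaries and the function names are assumptions made for illustration. The semantics of the sentence is built only from the semantics of its NP and its VP.

```python
# Hypothetical lexicon: word -> semantic symbol
ARTICLES = {"the", "a"}
NOUNS = {"ball": "ball", "car": "car", "wheels": "wheel"}
INTRANSITIVE = {"rolls": "roll"}            # plays the role of verb(S, Sem)
LINKING = {"is", "has"}                     # plays the role of verb(S)
COLORS = {"red": "red", "blue": "blue"}     # plays the role of adjective(S, color, Sem)


def np(words):
    """NP -> Article Noun; returns the noun's semantics or None."""
    if len(words) == 2 and words[0] in ARTICLES and words[1] in NOUNS:
        return NOUNS[words[1]]
    return None


def vp(words):
    """Returns [property, X], [color, X] or [parts, X], or None."""
    if len(words) == 1 and words[0] in INTRANSITIVE:
        return ["property", INTRANSITIVE[words[0]]]
    if len(words) == 2 and words[0] in LINKING:
        if words[1] in COLORS:
            return ["color", COLORS[words[1]]]
        if words[1] in NOUNS:
            return ["parts", NOUNS[words[1]]]
    return None


def sentence(words):
    """S -> NP VP with Sem = [Sem1 | Sem2], as in the Prolog clause."""
    for k in range(len(words) + 1):
        sem1, sem2 = np(words[:k]), vp(words[k:])
        if sem1 is not None and sem2 is not None:
            return [sem1] + sem2
    return None


print(sentence("the ball is red".split()))
print(sentence("the car has wheels".split()))
```

Like the Prolog version, this sketch does not check which verb goes with which complement, so it also accepts strings such as "the ball has red".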

33
Problems with Augmented DCG
The previous grammar will generate sentences that are not grammatically correct
Natural language is not a context-free language
Must deal with
– cases
– agreement between subject and main verb in the sentence (predicate)
– verb subcategorization: the complements that a verb can accept

34
Solution
Augment the existing rules of the grammar to deal with context issues
Start by parameterizing the categories NP and Pronoun so that they take a parameter indicating their case

35
Cases
Nominative case (subjective case) + agreement
  I take the bus           Je prends l’autobus     Eu iau autobuzul
  You take the bus         Tu prends l’autobus     Tu iei autobuzul
  He takes the bus         Il prend l’autobus      El ia autobuzul
Accusative case (objective case)
  He gives me the book     Il me donne le livre    El imi da cartea
Dative case
  He is talking to me      Il parle avec moi       El vorbeste cu mine

36
Example - The Grammar Using Augmentations to Represent Noun Cases
S → NP(Subjective) VP
NP(case) → Pronoun(case) | Noun | Article Noun
Pronoun(Subjective) → I | you | he | she
Pronoun(Objective) → me | you | him | her

37
sentence(S) :- np(S1, subjective), vp(S2), append(S1, S2, S).
np([S], Case) :- pronoun(S, Case).
np([S], _) :- noun(S).
np([S1, S2], _) :- article(S1), noun(S2).
pronoun(i, subjective).
pronoun(you, _).
pronoun(he, subjective).
pronoun(she, subjective).
pronoun(me, objective).
pronoun(him, objective).
pronoun(her, objective).
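The effect of the case parameter can be tried out in a few lines of Python. This is a sketch under assumptions: the pronoun table follows the slide, but the nouns, the verbs and the rule VP → Verb NP(Objective) are hypothetical additions, since the slide does not define `vp`.

```python
# Pronoun cases as on the slide ("you" is valid in both cases).
PRONOUNS = {
    "i": {"subjective"}, "he": {"subjective"}, "she": {"subjective"},
    "you": {"subjective", "objective"},
    "me": {"objective"}, "him": {"objective"}, "her": {"objective"},
}
NOUNS = {"wumpus", "stench"}     # hypothetical
ARTICLES = {"the", "a"}          # hypothetical
VERBS = {"smell", "see"}         # hypothetical


def np(words, case):
    """NP(case) -> Pronoun(case) | Noun | Article Noun"""
    if len(words) == 1:
        word = words[0]
        if word in PRONOUNS:
            return case in PRONOUNS[word]
        return word in NOUNS
    return len(words) == 2 and words[0] in ARTICLES and words[1] in NOUNS


def vp(words):
    """Hypothetical rule: VP -> Verb NP(Objective)"""
    return len(words) >= 2 and words[0] in VERBS and np(words[1:], "objective")


def sentence(words):
    """S -> NP(Subjective) VP"""
    return any(
        np(words[:k], "subjective") and vp(words[k:])
        for k in range(1, len(words))
    )


print(sentence("i smell him".split()))
print(sentence("me smell him".split()))
```

Nouns ignore the case argument, exactly as in the clauses `np([S], _)` above. Subject-verb agreement is still not checked, so "he see the wumpus" is accepted.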

38
Verb Subcategorization
Augment the DCG with a new parameter to describe the verb subcategorization
The grammar must state which verbs can be followed by which other categories. This is the subcategorization information for the verb
Each verb has a list of complements

39
Integrate Verb Subcategorization into the Grammar
A subcategorization list is a list of complement categories that the verb accepts
Augment the category VP to take a subcategorization argument that indicates the complements that are needed to form a complete VP

40
Integrate Verb Subcategorization into the Grammar
Change the rule for S to say that it requires a verb phrase that has all its complements, and thus a subcategorization list of [ ]
Rule S → NP(Subjective) VP([ ])
– The rule can be read as “A sentence can be composed of an NP in the subjective case, followed by a VP which has a null subcategorization list”

41
Integrate Verb Subcategorization into the Grammar
– Verb phrases can take adjuncts, which are phrases that are not licensed by the individual verb, but rather may appear in any verb phrase
– Phrases representing time and place are adjuncts, because almost any action or event can have a time or a place
VP(subcat) → VP(subcat) PP | VP(subcat) Adverb
– I smell the wumpus now

42
VP(subcat) → VP([NP | subcat]) NP(Objective)
  | VP([Adjective | subcat]) Adjective
  | VP([PP | subcat]) PP
  | Verb(subcat)
  | VP(subcat) PP
  | VP(subcat) Adverb
The first line can be read as “A VP, with a given subcategorization list, subcat, can be formed by a VP followed by an NP in the objective case, as long as that VP has a subcategorization list that starts with the symbol NP and is followed by the elements of the list subcat”

43
Verb       Subcats        Example verb phrase
give       [NP, PP]       give the gold in box to me
           [NP, NP]       give me the gold
smell      [NP]           smell a wumpus
           [Adjective]    smell awful
           [PP]           smell like a wumpus
is         [Adjective]    is smelly
           [PP]           is in box
           [NP]           is a pit
died       []             died
believe    [S]            believe the wumpus is dead
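The way a subcategorization list is consumed can be sketched in Python. The table `SUBCATS` copies the verbs and lists above; the function `is_complete_vp` and the idea of passing the phrases after the verb as a list of category names are assumptions made for this illustration. Each phrase either satisfies the first pending complement or, if it is a PP or an Adverb, is taken as an adjunct that leaves the list unchanged; the VP is complete when some reading ends with the empty list.

```python
SUBCATS = {
    "give": [["NP", "PP"], ["NP", "NP"]],
    "smell": [["NP"], ["Adjective"], ["PP"]],
    "is": [["Adjective"], ["PP"], ["NP"]],
    "died": [[]],
    "believe": [["S"]],
}
ADJUNCTS = {"PP", "Adverb"}  # may attach to any VP


def is_complete_vp(verb, phrases):
    """True if `verb` followed by `phrases` (category names) can form VP([])."""
    # Each state is the list of complements still required on one reading.
    states = {tuple(subcat) for subcat in SUBCATS.get(verb, [])}
    for phrase in phrases:
        next_states = set()
        for state in states:
            if state and state[0] == phrase:
                next_states.add(state[1:])   # VP(subcat) -> VP([X | subcat]) X
            if phrase in ADJUNCTS:
                next_states.add(state)       # VP(subcat) -> VP(subcat) adjunct
        states = next_states
    return () in states


print(is_complete_vp("give", ["NP", "PP"]))   # give the gold to me
print(is_complete_vp("give", ["NP"]))         # give the gold
```

A PP after "give" is ambiguous between complement and adjunct, which is why the function keeps a set of possible states rather than a single list.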

44
VP(subcat) → VP([NP | subcat]) NP(Objective)
  | VP([Adjective | subcat]) Adjective
  | VP([PP | subcat]) PP
  | Verb(subcat)
  | VP(subcat) PP
  | VP(subcat) Adverb
vp(S, Subcat) :- vp(S1, [np | Subcat]), np(S2, objective), append(S1, S2, S).
vp([give], [np, pp]).
vp([give], [np, np]).
vp([smell], [np]).
vp([smell], [adjective]).
vp([smell], [pp]).

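45
But it is dangerous to translate directly the left-recursive rule
VP(subcat) → VP(subcat) PP
because the resulting Prolog clause would call vp again before consuming any input and would never terminate
Solution
vp(S, Subcat) :- vp1(S1, Subcat), pp(S2), append(S1, S2, S).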
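The danger can be observed directly in a top-down recognizer written in Python. In this sketch every name (`vp_naive`, `vp1`, `vp`, `pp`, `verb`) and the tiny word lists are hypothetical; each function takes the token list and a start position and returns the positions where the phrase can end. The literal translation of VP → VP PP calls itself with unchanged arguments, while the version with a separate non-recursive `vp1` terminates, as in the Prolog solution above.

```python
VERBS = {"smell", "give"}          # hypothetical
PREPOSITIONS = {"in", "to", "on"}  # hypothetical
NOUNS = {"box", "wumpus"}          # hypothetical


def verb(tokens, i):
    return [i + 1] if i < len(tokens) and tokens[i] in VERBS else []


def pp(tokens, i):
    # Simplified PP -> Preposition Noun
    if i + 1 < len(tokens) and tokens[i] in PREPOSITIONS and tokens[i + 1] in NOUNS:
        return [i + 2]
    return []


def vp_naive(tokens, i):
    # Literal translation of VP -> VP PP | Verb: the first step is a call
    # to vp_naive with the very same arguments, so it never returns.
    ends = []
    for mid in vp_naive(tokens, i):
        ends += pp(tokens, mid)
    return ends + verb(tokens, i)


def vp1(tokens, i):
    # Non-recursive core of the verb phrase (here just the verb).
    return verb(tokens, i)


def vp(tokens, i):
    # vp(S) :- vp1(S1), pp(S2), append(S1, S2, S).
    ends = []
    for mid in vp1(tokens, i):
        ends += pp(tokens, mid)
    return ends


tokens = "smell in box".split()
print(vp(tokens, 0))
try:
    vp_naive(tokens, 0)
except RecursionError:
    print("vp_naive recursed without consuming input")
```

As on the slide, the rewritten rule covers a core verb phrase followed by exactly one PP; allowing several adjuncts would need a further rule or a loop.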

46
Generative Capacity of Augmented Grammars
The generative capacity of augmented grammars depends on the number of values for the augmentations
If there is a finite number, then the augmented grammar is equivalent to a context-free grammar

47
Semantic Interpretation
The semantic interpretation is responsible for getting all possible interpretations, and disambiguation is responsible for choosing the best one.
Disambiguation is done starting from the pragmatic interpretation of the sentence.

48
Pragmatic Interpretation
Complete the semantic interpretation by adding information about the current situation
Pragmatics shows how the language is used and its effects on the listener
Pragmatics will tell why it is not appropriate to answer "Yes" to the question "Do you know what time it is?"

49
Indexicals
Indexical - a phrase that refers directly to the current situation
Example
– I am in Bucharest today.

50
Anaphora
Anaphora - the occurrence of phrases referring to objects that have been mentioned previously
Example
– John was hungry. He entered a restaurant.
– The ball hit the house. It broke the window.

51
Ambiguity
Lexical Ambiguity
Syntactic Ambiguity
Referential Ambiguity
Pragmatic Ambiguity

52
Lexical Ambiguity
A word has more than one meaning
Examples
– A clear sky
– A clear profit
– The way is clear
– John is clear
– It is clear that ...

53
Syntactic Ambiguity
Can occur with or without lexical ambiguity
Examples
– I saw the Statue of Liberty flying over New York.
– I saw John in a restaurant with a telescope.
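Syntactic ambiguity can be made concrete by counting parse trees. The sketch below is only an illustration: the lexicon and the grammar fragment (with Name added to the NP rule so that "John" is covered) are assumptions, and `count_parses` counts, for each span of the sentence, in how many ways a symbol can derive it.

```python
from functools import lru_cache

# Hypothetical lexicon (one category per word) and grammar fragment.
LEXICON = {
    "I": "Pronoun", "saw": "Verb", "John": "Name",
    "in": "Preposition", "with": "Preposition", "a": "Article",
    "restaurant": "Noun", "telescope": "Noun",
}
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pronoun",), ("Name",), ("Article", "Noun"), ("NP", "PP")],
    "VP": [("Verb",), ("VP", "NP"), ("VP", "PP")],
    "PP": [("Preposition", "NP")],
}


def count_parses(sentence):
    """Number of distinct parse trees of `sentence` rooted in S."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` derives words[i:j].
        if symbol not in GRAMMAR:  # lexical category
            return int(j == i + 1 and LEXICON.get(words[i]) == symbol)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives words[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        return sum(
            count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return count("S", 0, len(words)) if words else 0


print(count_parses("I saw John in a restaurant with a telescope"))
```

Under this grammar the sentence has five parses, one for each way of attaching the two prepositional phrases to the verb phrase or to a preceding noun phrase; with a single PP ("I saw John in a restaurant") there are two.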

54
Referential Ambiguity
Occurs because natural languages consist almost entirely of words for categories, not for individual objects
Example
– John met Mary and Tom. They went to a restaurant.
– Block A is on block B and it is not clear.

55
Pragmatic Ambiguity
Occurs when the speaker and the hearer disagree on what the current situation is
Example
– I will meet you tomorrow.