Syntax and Processing it: Definite Clause Grammars in Prolog (optional material)
John Barnden
School of Computer Science, University of Birmingham
Natural Language Processing 1
2014/15 Semester 2
DCGs: Introduction
• A way of writing syntactic recognizers and parsers directly in Prolog.
• We write Prolog rules of a special type. These look very much like CF grammar productions.
Recognition or parsing happens by the normal Prolog computation process.
Different structures can be recognized/created for the same sentence, by the normal alternative-answer process of Prolog: i.e., natural handling of syntactic ambiguity.
• In the parsing case, syntax trees are produced.
Grammatical constraints such as agreement are also easy to include.
• The rules can be translated into ordinary Prolog, but with a lot of extra parameters that are tedious to write and that obscure the main information.
The Prolog system translates the rules into ordinary Prolog clauses automatically when the program is loaded.
• Caution: DCGs provide only top-down depth-first parsing, because of Prolog’s approach to using rules.
But other strategies may be better. More on this later.
DCGs, contd: Recognition
• See the link on the Slides page to a toy recognizer in DCG that you can examine and play with.
• Example DCG rules for recognition of non-terminal categories:
s --> np, vp.
np --> noun, pp.
np --> det, adj, noun, pp.
• Example DCG rules for recognition of terminal categories:
det --> [a].
det --> [an].
det --> [the].
noun --> [cat].
noun --> [dog].
noun --> [dogs].
verb --> [dogs].
(There is another, more economical method.)
• The program can be run in two ways, either by calling the category predicates directly:
s([a, dog, sits, on, a, mat], []).
np([a, dog], []).
or via phrase:
phrase(s, [a, dog, sits, on, a, mat]).
phrase(np, [a, dog]).
• The second argument of s, np etc. catches any words left over after the recognized phrase:
np([a, dog, sits, on, a], X).
gives X = [sits, on, a].
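The recognition rules and the two calling styles can be tried with a small self-contained grammar. This is only a sketch: the np, vp and pp rules, the prep category and the words mat, sits and on are assumptions added so that the example queries have something to match. They are not taken from the toy recognizer on the Slides page.

```prolog
% Toy DCG recognizer (a sketch under the assumptions stated above).
s  --> np, vp.
np --> det, noun.
vp --> verb, pp.      % assumed rule
pp --> prep, np.      % assumed rule

det  --> [a].
det  --> [the].
noun --> [dog].
noun --> [mat].       % assumed word
verb --> [sits].      % assumed word
prep --> [on].        % assumed category and word

% ?- phrase(s, [a, dog, sits, on, a, mat]).   % succeeds
% ?- s([a, dog, sits, on, a, mat], []).       % same query, called directly
% ?- np([a, dog, sits, on, a], X).            % X = [sits, on, a]
```

Calling np with a variable as its second argument shows how much of the input was left unconsumed, which is useful when debugging a grammar one category at a time.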
Advantage of DCGs over ordinary Prolog
• Consider the abstract grammar rules
S → NP VP
NP → Det Noun
• Here’s how they could be implemented in ordinary Prolog (for just recognition, but syntax-tree constructing and grammatical-category checking [see later] can be added) :
s(WordList, Residue):-
np(WordList, Residue_to_pass_on), vp(Residue_to_pass_on, Residue).
np(WordList, Residue):-
det(WordList, Residue_to_pass_on), noun(Residue_to_pass_on, Residue).
det([the | Residue], Residue).
noun([dog | Residue], Residue).
• Can be called as in: s([a, dog, sits, on, a, dog], []).
Exercise: see the ordinary-Prolog version of the recognizer linked from the Slides page.
• Compared with the DCG form, every syntactic-category predicate has the extra WordList and Residue arguments. Tedious, error-prone.
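For comparison, here is a complete hand-written recognizer in the same ordinary-Prolog style. It is a sketch, not the version linked from the Slides page: the vp, pp and prep predicates and the extra lexical entries are assumptions added so that a whole sentence can be recognized.

```prolog
% Ordinary Prolog recognizer: every predicate takes the word list and
% returns the residue, i.e. the words it did not consume.
s(WordList, Residue) :-
    np(WordList, Rest),
    vp(Rest, Residue).
np(WordList, Residue) :-
    det(WordList, Rest),
    noun(Rest, Residue).
% vp, pp and prep are assumed here to complete the grammar.
vp(WordList, Residue) :-
    verb(WordList, Rest),
    pp(Rest, Residue).
pp(WordList, Residue) :-
    prep(WordList, Rest),
    np(Rest, Residue).

% Lexicon: each fact consumes exactly one word from the front of the list.
det([a | Residue], Residue).
det([the | Residue], Residue).
noun([dog | Residue], Residue).
noun([mat | Residue], Residue).
verb([sits | Residue], Residue).
prep([on | Residue], Residue).

% ?- s([a, dog, sits, on, a, mat], []).   % succeeds
```

Every rule has to thread the intermediate residue by hand, and a single misnamed variable silently breaks the chain. The DCG notation generates this threading automatically.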
DCGs: Additions
• Can embed ordinary Prolog within grammar rules.
• Can use disjunction and cuts.
• Can add arguments to the category symbols (np, det, etc.) so as to
– Build syntax trees, i.e. do parsing, not just recognition
– Include “grammatical categories” (used to enforce constraints such as agreement)
– Build semantic structures.
• Will see some of this in following slides.
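As a small illustration of the first two additions, the sketch below uses disjunction in one rule and an embedded Prolog goal in another. The particular rules are assumptions chosen for brevity, not rules from the course parsers.

```prolog
:- use_module(library(lists)).   % for member/2

% Disjunction inside a rule body: one rule instead of three.
det --> [a] ; [an] ; [the].

% Ordinary Prolog embedded in braces: the goal is called as normal
% Prolog and consumes no words itself.
noun --> [Word], { member(Word, [cat, dog, dogs]) }.

np --> det, noun.

% ?- phrase(np, [the, cat]).   % succeeds
% ?- phrase(np, [the, mat]).   % fails: mat is not in the noun list
```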
DCGs: Parsing
• Add a parameter to each category symbol, delivering a node of the syntax tree:
vp(vp_node(Verb_node, PP_node) ) --> verb(Verb_node), pp(PP_node).
verb(verb_node(sits)) --> [sits].
• The program can again be run in two ways:
s(ST, [a, dog, sits, on, a, mat], []).
phrase(s(ST), [a, dog, sits, on, a, mat]).
• See links on Slides page to toy parsers in DCG that you can examine and play with.
So far: “basic” parser1.
An initial exercise: add new words and new NP rules.
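A minimal tree-building grammar in this style might look as follows. It is a sketch and not the "basic" parser from the Slides page: the node names other than vp_node and verb_node, and the s, np, pp and prep rules, are assumptions.

```prolog
% Each category carries one extra argument: the syntax-tree node it builds.
s(s_node(NP, VP))   --> np(NP), vp(VP).
np(np_node(Det, N)) --> det(Det), noun(N).
vp(vp_node(V, PP))  --> verb(V), pp(PP).
pp(pp_node(P, NP))  --> prep(P), np(NP).

det(det_node(a))      --> [a].
det(det_node(the))    --> [the].
noun(noun_node(dog))  --> [dog].
noun(noun_node(mat))  --> [mat].
verb(verb_node(sits)) --> [sits].
prep(prep_node(on))   --> [on].

% ?- phrase(s(ST), [a, dog, sits, on, a, mat]).
% ST = s_node(np_node(det_node(a), noun_node(dog)),
%             vp_node(verb_node(sits),
%                     pp_node(prep_node(on),
%                             np_node(det_node(a), noun_node(mat))))).
```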
DCGs: Syntactic Ambiguity
• Suppose we add two extra rules:
vp( vp_node(Verb_node, PP_node1, PP_node2) ) -->
verb(Verb_node), pp(PP_node1), pp(PP_node2).
np( np_node(Det_node, N_node, PP_node) ) -->
det(Det_node), noun(N_node), pp(PP_node).
• Then we get two different structures for
A dog sits on the mat with the flowers.
• Exercise:
– Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms.
– Try it out using the relevant parser on the Slides page.
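One way to check an answer to the exercise is to collect every parse with findall/3. The sketch below assumes a small grammar around the two extra rules; the helper parses/2, the s, pp and prep rules and the lexicon are hypothetical. The resulting trees are deliberately not printed here.

```prolog
s(s_node(NP, VP))          --> np(NP), vp(VP).
np(np_node(Det, N))        --> det(Det), noun(N).
np(np_node(Det, N, PP))    --> det(Det), noun(N), pp(PP).
vp(vp_node(V, PP))         --> verb(V), pp(PP).
vp(vp_node(V, PP1, PP2))   --> verb(V), pp(PP1), pp(PP2).
pp(pp_node(P, NP))         --> prep(P), np(NP).

det(det_node(W))   --> [W], { det_word(W) }.
noun(noun_node(W)) --> [W], { noun_word(W) }.
verb(verb_node(W)) --> [W], { verb_word(W) }.
prep(prep_node(W)) --> [W], { prep_word(W) }.

% Assumed lexicon for the example sentence.
det_word(a).      det_word(the).
noun_word(dog).   noun_word(mat).   noun_word(flowers).
verb_word(sits).
prep_word(on).    prep_word(with).

% parses(+Words, -Trees): all syntax trees for Words, found by backtracking.
parses(Words, Trees) :-
    findall(Tree, phrase(s(Tree), Words), Trees).

% ?- parses([a, dog, sits, on, the, mat, with, the, flowers], Ts),
%    length(Ts, N).
```

Typing `;` at the top level after the first answer to `phrase(s(ST), ...)` shows the same alternatives one at a time.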
Terminals: A Better Implementation
• verb(verb_node(Word)) --> [Word], {verb_pred(Word)}.
The part in braces is ordinary Prolog.
• Individual verbs are included as follows: verb_pred(sit). verb_pred(sits). verb_pred(hates).
• This is less writing per individual verb, and concentrates the node-building into one place.
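• Looks possibly less efficient, because of the extra step. BUT in modern Prologs it speeds up execution:
– by making the DCG terminal symbol call (verb in top line above) deterministic, since there is now only one verb rule;
– by making the call of the lexical predicates (verb_pred, etc.) deterministic, since Word is already bound and first-argument indexing can pick the matching fact directly.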
• Exercise: amend one of the toy parsers by using the above method.
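A minimal sketch of this lexicon style for two categories is given below. The noun rule, noun_node and noun_pred/1 are assumptions that simply mirror the verb case.

```prolog
% One rule per terminal category; node building happens in one place.
verb(verb_node(Word)) --> [Word], { verb_pred(Word) }.
noun(noun_node(Word)) --> [Word], { noun_pred(Word) }.   % assumed, by analogy

% Lexicon: one short fact per word.
verb_pred(sit).
verb_pred(sits).
verb_pred(hates).
noun_pred(dog).
noun_pred(dogs).

% ?- phrase(verb(Node), [hates]).   % Node = verb_node(hates)
% ?- phrase(verb(Node), [dog]).     % fails: dog is not listed as a verb
```

Adding a new word now takes a single fact, and changing the shape of the node only requires editing one rule per category.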
Grammatical Categories
• A grammatical category is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English):
– Number: singular or plural. Lexically: nouns, verbs, determiners, numerals.
– Person: first, second and third. Lexically: only for verbs, nouns and some pronouns.
– Tense: present, past (various forms), future. Lexically: only for verbs.
– Gender: M, F, N [neither/neuter]. Lexically: only some pronouns and some nouns.
• Syntactic constituents can sometimes inherit grammatical category values from their components, e.g. (without showing all possible GC values):
– the big dog: 3rd person M/F/N singular NP
– the big dogs: 3rd person M/F/N plural NP
– we in the carpet trade: 1st person M/F plural NP
– you silly idiot: 2nd person M/F singular NP
– eloped with the gym teacher: past-tense VP
– will go: future-tense VP
– the woman with the long hair: female NP
– the radio with the red knobs: neuter NP
• A lexical or syntactic constituent can be ambiguous as to a GC value, e.g. sheep: singular/plural; manage: singular/plural 1st/2nd person.
Grammatical Categories in DCGs, contd
• Or, using the better lexicon representation:
noun(n_node(Word), gcs(numb(Numb), person(third)) )
--> [Word], {noun_pred(Word, Numb)}.
noun_pred(dog, singular).
noun_pred(dogs, plural).
Grammatical Categories in DCGs, contd
• Enforcing agreement in an NP syntax rule:
np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc) )
--> det(Det_node, gcs(Number_gc, Person_gc) ),
noun(N_node, gcs(Number_gc, Person_gc) ).
OR more simply, if we don’t need to enforce a particular shape for gcs(...):
np(np_node(Det_node, N_node), GCs)
--> det(Det_node, GCs), noun(N_node, GCs).
• Enforcing subject-NP / VP agreement (NB: doesn’t handle the case GC):
s(s_node(NP_node, VP_node), GCs)
--> np(NP_node, GCs), vp(VP_node, GCs).
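The effect of sharing one GCs variable can be seen at the NP level alone. In the sketch below, det_pred/2, det_node and the choice to give determiners person(third) are assumptions made so that the determiner's gcs(...) term unifies with the noun's. They are not taken from the course parser.

```prolog
% Agreement: det and noun must unify on the same GCs term.
np(np_node(Det_node, N_node), GCs) -->
    det(Det_node, GCs), noun(N_node, GCs).

det(det_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.

det_pred(a, singular).
det_pred(the, _).            % "the" is compatible with either number
noun_pred(dog, singular).
noun_pred(dogs, plural).

% ?- phrase(np(T, G), [a, dog]).     % G = gcs(numb(singular), person(third))
% ?- phrase(np(T, G), [the, dogs]).  % G = gcs(numb(plural), person(third))
% ?- phrase(np(T, G), [a, dogs]).    % fails: number clash
```

No explicit agreement test is written anywhere. The clash is detected by unification, because singular and plural cannot both be bound to the same variable.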
Grammatical Categories in DCGs, contd
• Not enforcing agreement within part of a VP rule:
vp(vp_node(Verb_node, PP_node), GCs )
--> verb(Verb_node, GCs), pp(PP_node).
OR if you needed PP to return some GCs that didn’t matter:
vp(vp_node(Verb_node, PP_node), GCs )
--> verb(Verb_node, GCs), pp(PP_node, _ ).
• Exercise: understand and play around with the GC version of the parser linked from Slides page.
• The program can again be run in two ways:
s(ST, GCs, [a, dog, sits, on, a, mat], []).
phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
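Putting the pieces together, a small parser with subject/verb agreement and an unconstrained PP could be sketched as below. It is not the GC parser from the Slides page: det_pred/2, verb_pred/3, prep_pred/1, the pp and prep rules and the lexicon are assumptions, and the verb entries are simplified (for instance, sit is listed only as plural).

```prolog
s(s_node(NP, VP), GCs)   --> np(NP, GCs), vp(VP, GCs).   % subject/verb agreement
np(np_node(Det, N), GCs) --> det(Det, GCs), noun(N, GCs).
vp(vp_node(V, PP), GCs)  --> verb(V, GCs), pp(PP).       % the PP does not have to agree
pp(pp_node(P, NP))       --> prep(P), np(NP, _).         % the inner NP's GCs are ignored

det(det_node(W), gcs(numb(Numb), person(third))) -->
    [W], { det_pred(W, Numb) }.
noun(n_node(W), gcs(numb(Numb), person(third))) -->
    [W], { noun_pred(W, Numb) }.
verb(verb_node(W), gcs(numb(Numb), person(Pers))) -->
    [W], { verb_pred(W, Numb, Pers) }.
prep(prep_node(W)) --> [W], { prep_pred(W) }.

% Assumed lexicon.
det_pred(a, singular).
det_pred(the, _).
noun_pred(dog, singular).
noun_pred(dogs, plural).
noun_pred(mat, singular).
verb_pred(sits, singular, third).
verb_pred(sit, plural, _).        % simplification: plural use only
prep_pred(on).

% ?- phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
%    % GCs = gcs(numb(singular), person(third))
% ?- phrase(s(ST, GCs), [the, dogs, sit, on, a, mat]).   % succeeds, plural subject
% ?- phrase(s(ST, GCs), [a, dog, sit, on, a, mat]).      % fails: agreement violated
```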