Syntax and Processing it: Definite Clause Grammars in Prolog (optional material)

John Barnden

School of Computer Science, University of Birmingham

Natural Language Processing 1

2014/15 Semester 2

DCGs: Introduction

• A way of writing syntactic recognizers and parsers directly in Prolog.

• We write Prolog rules of a special type. These look very much like CF grammar productions.

  Recognition or parsing happens by the normal Prolog computation process.

  Different structures can be recognized/created for the same sentence, by the normal alternative-answer process of Prolog: i.e., natural handling of syntactic ambiguity.

• In the parsing case, syntax trees are produced.

  Grammatical constraints such as agreement are also easy to include.

• The rules can be translated into ordinary Prolog, but with a lot of extra parameters that are tedious to write and that obscure the main information.

  The Prolog system translates the rules into normal Prolog automatically when the program is loaded.

• Caution: DCGs provide only top-down depth-first parsing, because of Prolog’s approach to using rules.

  But other strategies may be better. More on this later.

DCGs, contd: Recognition

• See link on Slides page to a toy recognizer in DCG that you can examine and play with.

• Example DCG rules for recognition of non-terminal categories:

  s --> np, vp.
  np --> noun, pp.
  np --> det, adj, noun, pp.

• Example DCG rules for recognition of terminal categories:

  det --> [a].    det --> [an].    det --> [the].
  noun --> [cat].    noun --> [dog].    noun --> [dogs].
  verb --> [dogs].

  (There is another, more economical method.)

• The program can be run in two ways:

  s([a, dog, sits, on, a, mat], []).    np([a, dog], []).

  phrase(s, [a, dog, sits, on, a, mat]).    phrase(np, [a, dog]).

• The second argument of s, np etc. catches any extra words: np([a, dog, sits, on, a], X). gives X = [sits, on, a].
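The two ways of running a recognizer can be tried on a small self-contained grammar. The following is only a sketch, not the recognizer linked from the Slides page: the pp and prep rules and the words in the lexicon are assumptions chosen so that the example sentence is covered.

```prolog
% Toy recognizer (illustrative sketch).
% Non-terminal categories.
s  --> np, vp.
np --> det, noun.
vp --> verb, pp.
pp --> prep, np.

% Terminal categories (a tiny assumed lexicon).
det  --> [a].
det  --> [the].
noun --> [dog].
noun --> [mat].
verb --> [sits].
prep --> [on].

% Example queries:
% ?- phrase(s, [a, dog, sits, on, a, mat]).   % succeeds
% ?- s([a, dog, sits, on, a, mat], []).       % same test, remainder given explicitly
% ?- np([a, dog, sits, on, a], X).            % X = [sits, on, a]
```

Calling phrase/2 is usually the safer choice, since it hides the two list arguments that the translation adds to every category predicate.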

Advantage of DCGs over ordinary Prolog

• Consider the abstract grammar rules S → NP VP and NP → Det Noun.

• Here’s how they could be implemented in ordinary Prolog (for just recognition, but syntax-tree constructing and grammatical-category checking [see later] can be added) :

  s(WordList, Residue) :-
      np(WordList, Residue_to_pass_on), vp(Residue_to_pass_on, Residue).

  np(WordList, Residue) :-
      det(WordList, Residue_to_pass_on), noun(Residue_to_pass_on, Residue).

  det([the | Residue], Residue).

  noun([dog | Residue], Residue).

• Can be called as in: s([a, dog, sits, on, a, dog], []).  Exercise: see the ordinary-Prolog version of the recognizer linked from the Slides page.

• Compared to the DCG form, every syntactic-category predicate carries the extra WordList and Residue arguments, which is tedious and error-prone.
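To make the hand-written style concrete, here is a complete ordinary-Prolog recognizer in the same spirit. It is a sketch only: the vp and verb clauses and the extra determiner are assumptions added so that the program runs on its own, and they are not taken from the linked recognizer.

```prolog
% Ordinary-Prolog recognizer: each predicate takes the word list and
% returns the residue left once its constituent has been consumed.
s(WordList, Residue) :-
    np(WordList, AfterNP),
    vp(AfterNP, Residue).

np(WordList, Residue) :-
    det(WordList, AfterDet),
    noun(AfterDet, Residue).

% Hypothetical VP rule (not on the slide): an intransitive verb only.
vp(WordList, Residue) :-
    verb(WordList, Residue).

% Lexical entries consume exactly one word from the front of the list.
det([the | Residue], Residue).
det([a | Residue], Residue).
noun([dog | Residue], Residue).
verb([sits | Residue], Residue).

% ?- s([the, dog, sits], []).       % succeeds
% ?- np([the, dog, sits], Rest).    % Rest = [sits]
```

In systems that provide expand_term/2, such as SWI-Prolog, a query like expand_term((s --> np, vp), Clause) can be used to inspect the clause generated for a DCG rule; it should have the same shape as the s/2 clause written by hand here.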

DCGs: Additions

• Can embed ordinary Prolog within grammar rules.

• Can use disjunction and cuts.

• Can add arguments to the category symbols (np, det, etc.) so as to

– Build syntax trees, i.e. do parsing, not just recognition

– Include “grammatical categories” (used to enforce constraints such as agreement)

– Build semantic structures.

• Will see some of this in following slides.
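Before the details, a small sketch of the first and third kinds of addition: disjunction, embedded Prolog in braces, and an extra argument on a category symbol. The predicate names is_noun/1, is_adj/1 and adjs//1, and the idea of counting adjectives, are hypothetical and chosen only for illustration.

```prolog
% Disjunction inside a rule body: one rule instead of three.
det --> [a] ; [an] ; [the].

% Embedded ordinary Prolog in braces: accept any word listed as a noun.
noun --> [Word], { is_noun(Word) }.

is_noun(cat).
is_noun(dog).

% An extra argument on a category symbol: here it counts the adjectives.
adjs(0) --> [].
adjs(N) --> [Word], { is_adj(Word) }, adjs(M), { N is M + 1 }.

is_adj(big).
is_adj(brown).

% The NP rule passes the count up to its caller.
np(AdjCount) --> det, adjs(AdjCount), noun.

% ?- phrase(np(N), [the, big, brown, dog]).   % N = 2
```

The same argument-passing mechanism is what later carries syntax-tree nodes and grammatical-category values.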

DCGs: Parsing

• Add a parameter to each category symbol, delivering a node of the syntax tree:

  vp(vp_node(Verb_node, PP_node) ) --> verb(Verb_node), pp(PP_node).

  verb(verb_node(sits)) --> [sits].

• The program can again be run in two ways:

  s(ST, [a, dog, sits, on, a, mat], []).

  phrase(s(ST), [a, dog, sits, on, a, mat]).

• See links on Slides page to toy parsers in DCG that you can examine and play with.

  So far: “basic” parser1.

  An initial exercise: add new words and new NP rules.
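A minimal parser along these lines might look as follows. This is a sketch under assumptions, not the "basic" parser1 from the Slides page: the s, np, pp, det, noun and prep rules and their node names are invented to complete the two rules shown above.

```prolog
% Each category symbol carries one argument: its syntax-tree node.
s(s_node(NP_node, VP_node))         --> np(NP_node), vp(VP_node).
np(np_node(Det_node, N_node))       --> det(Det_node), noun(N_node).
vp(vp_node(Verb_node, PP_node))     --> verb(Verb_node), pp(PP_node).
pp(pp_node(Prep_node, NP_node))     --> prep(Prep_node), np(NP_node).

% Lexical rules build the leaf nodes.
det(det_node(a))       --> [a].
det(det_node(the))     --> [the].
noun(noun_node(dog))   --> [dog].
noun(noun_node(mat))   --> [mat].
verb(verb_node(sits))  --> [sits].
prep(prep_node(on))    --> [on].

% ?- phrase(s(ST), [a, dog, sits, on, a, mat]).
%    ST = s_node(np_node(det_node(a), noun_node(dog)),
%                vp_node(verb_node(sits),
%                        pp_node(prep_node(on),
%                                np_node(det_node(a), noun_node(mat))))).
```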

DCGs: Syntactic Ambiguity

• Suppose we add two extra rules:

  vp( vp_node(Verb_node, PP_node1, PP_node2) ) -->
      verb(Verb_node), pp(PP_node1), pp(PP_node2).

  np( np_node(Det_node, N_node, PP_node) ) -->
      det(Det_node), noun(N_node), pp(PP_node).

• Then we get two different structures for

  A dog sits on the mat with the flowers.

• Exercise:

– Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms.

– Try it out using the relevant parser on the Slides page.
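One way to check an answer to the exercise is to let Prolog enumerate every parse. The sketch below assumes a small grammar in which only the two extra rules come from the slide; the other rules, the lexicon and the helper parse_count/2 are hypothetical. It reports only the number of parses, so the trees themselves are still left to work out.

```prolog
s(s_node(NP, VP))            --> np(NP), vp(VP).
np(np_node(Det, N))          --> det(Det), noun(N).
np(np_node(Det, N, PP))      --> det(Det), noun(N), pp(PP).      % extra rule
vp(vp_node(V, PP))           --> verb(V), pp(PP).
vp(vp_node(V, PP1, PP2))     --> verb(V), pp(PP1), pp(PP2).      % extra rule
pp(pp_node(P, NP))           --> prep(P), np(NP).

det(det_node(W))   --> [W], { member(W, [a, the]) }.
noun(noun_node(W)) --> [W], { member(W, [dog, mat, flowers]) }.
verb(verb_node(W)) --> [W], { member(W, [sits]) }.
prep(prep_node(W)) --> [W], { member(W, [on, with]) }.

% parse_count(+Words, -Count): Count is the number of distinct syntax trees.
parse_count(Words, Count) :-
    findall(ST, phrase(s(ST), Words), Trees),
    length(Trees, Count).

% ?- parse_count([a, dog, sits, on, the, mat, with, the, flowers], N).
%    N = 2.
```

At the top level the same alternatives appear one at a time: run phrase(s(ST), Words) and type ; to ask for the next answer.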

Terminals: A Better Implementation

• verb(verb_node(Word)) --> [Word], {verb_pred(Word)}.

  The part in braces is ordinary Prolog.

• Individual verbs are included as follows:  verb_pred(sit). verb_pred(sits). verb_pred(hates).

• This is less writing per individual verb, and concentrates the node-building into one place.

• Looks possibly less efficient, because of the extra step.

  BUT in modern Prologs it speeds up execution:

  – by making the DCG terminal symbol call (verb in top line above) deterministic

  – by making the call of the lexical predicates (verb_pred, etc.) deterministic.

• Exercise: amend one of the toy parsers by using the above method.
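As a starting point for the exercise, the lexicon-predicate method can be tried in isolation. This is a sketch: the noun rule and noun_pred/1 are assumed by analogy with the verb rule above.

```prolog
% One rule per terminal category; the node is built in a single place.
verb(verb_node(Word)) --> [Word], { verb_pred(Word) }.
noun(noun_node(Word)) --> [Word], { noun_pred(Word) }.

% The lexicon: one short fact per word.
verb_pred(sit).
verb_pred(sits).
verb_pred(hates).

noun_pred(cat).
noun_pred(dog).

% ?- phrase(verb(Node), [hates]).   % Node = verb_node(hates)
% ?- phrase(verb(Node), [dog]).     % fails: dog is not listed as a verb
```

Adding a new verb now means adding one verb_pred fact, with no change to the grammar rules.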

Grammatical Categories

• A grammatical category is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English):

  Number: singular or plural; lexically, nouns, verbs, determiners, numerals

  Person: first, second and third; lexically, only for verbs, nouns and some pronouns

  Tense: present, past (various forms), future; lexically, only for verbs

  Gender: M, F, N [neither/neuter]; lexically, only some pronouns and some nouns

• Syntactic constituents can sometimes inherit grammatical category values from their components, e.g. (without showing all possible GC values):

  the big dog: 3rd person M/F/N singular NP // the big dogs: 3rd person M/F/N plural NP

  we in the carpet trade: 1st person M/F plural NP // you silly idiot: 2nd person M/F singular NP

  eloped with the gym teacher: past-tense VP // will go: future-tense VP

  the woman with the long hair: female NP // the radio with the red knobs: neuter NP

• A lexical or syntactic constituent can be ambiguous as to a GC value:   e.g. sheep: singular/plural; manage: singular/plural 1st/2nd person

Grammatical Categories in DCGs, contd

• Or, using the better lexicon representation:

  noun(n_node(Word), gcs(numb(Numb), person(third)) )
      --> [Word], {noun_pred(Word, Numb)}.

  noun_pred(dog, singular).

  noun_pred(dogs, plural).

Grammatical Categories in DCGs, contd

• Enforcing agreement in an NP syntax rule:

  np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc) )
      --> det(Det_node, gcs(Number_gc, Person_gc) ),
          noun(N_node, gcs(Number_gc, Person_gc) ).

  OR more simply, if you don’t need to enforce a particular shape for gcs(...):

  np(np_node(Det_node, N_node), GCs)
      --> det(Det_node, GCs), noun(N_node, GCs).

• Enforcing subject-NP / VP agreement (NB: doesn’t handle the case GC):

  s(s_node(NP_node, VP_node), GCs)
      --> np(NP_node, GCs), vp(VP_node, GCs).
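The effect of sharing one GCs variable can be seen on the NP rule alone. The following is a sketch under assumptions: det_pred/2 and the det rule are hypothetical, modelled on the noun rule, and the determiner leaves person unconstrained.

```prolog
% NP rule: determiner and noun must unify with the same GC structure.
np(np_node(Det_node, N_node), GCs) -->
    det(Det_node, GCs), noun(N_node, GCs).

det(det_node(Word), gcs(numb(Numb), person(_))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.

% Hypothetical lexicon entries.
det_pred(a, singular).
det_pred(the, _).            % "the" goes with either number
noun_pred(dog, singular).
noun_pred(dogs, plural).

% ?- phrase(np(T, GCs), [the, dogs]).
%    T = np_node(det_node(the), n_node(dogs)),
%    GCs = gcs(numb(plural), person(third)).
% ?- phrase(np(_, _), [a, dogs]).     % fails: singular vs plural
```

Agreement needs no explicit test: the clash is detected by ordinary unification when the noun's GCs meet the determiner's.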

Grammatical Categories in DCGs, contd

• Not enforcing agreement within part of a VP rule:

  vp(vp_node(Verb_node, PP_node), GCs )

  --> verb(Verb_node, GCs), pp(PP_node).

  OR if you needed PP to return some GCs that didn’t matter:

  vp(vp_node(Verb_node, PP_node), GCs )

  --> verb(Verb_node, GCs), pp(PP_node, _ ).

• Exercise: understand and play around with the GC version of the parser linked from Slides page.

• The program can again be run in two ways:

  s(ST, GCs, [a, dog, sits, on, a, mat], []).

  phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
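Putting the pieces together, a whole-sentence sketch shows subject/verb agreement being enforced while the PP's categories are ignored. It is not the GC parser linked from the Slides page: the pp rule, the lexicon predicates det_pred/2, verb_pred/2 and prep_pred/1, and the restriction of verbs to third person are all simplifying assumptions.

```prolog
s(s_node(NP_node, VP_node), GCs)  --> np(NP_node, GCs), vp(VP_node, GCs).
np(np_node(Det_node, N_node), GCs) --> det(Det_node, GCs), noun(N_node, GCs).
% The PP computes its own GCs, but the VP deliberately ignores them.
vp(vp_node(Verb_node, PP_node), GCs) --> verb(Verb_node, GCs), pp(PP_node, _).
pp(pp_node(prep_node(Word), NP_node), GCs) -->
    [Word], { prep_pred(Word) }, np(NP_node, GCs).

det(det_node(Word), gcs(numb(Numb), person(_))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.
verb(verb_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { verb_pred(Word, Numb) }.

% Hypothetical lexicon.
det_pred(a, singular).
det_pred(the, _).
noun_pred(dog, singular).
noun_pred(dogs, plural).
noun_pred(mat, singular).
noun_pred(mats, plural).
verb_pred(sits, singular).
verb_pred(sit, plural).
prep_pred(on).

% ?- phrase(s(ST, GCs), [a, dog, sits, on, the, mats]).
%    succeeds with GCs = gcs(numb(singular), person(third)):
%    the plural "mats" inside the PP does not clash with the singular subject.
% ?- phrase(s(_, _), [a, dog, sit, on, the, mat]).   % fails: subject/verb clash
```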