GRAMMAR & PARSING (Syntactic Analysis), NLP WEEK 4


Uploaded by: elaine-cook

Posted on 17-Dec-2015



SYNTACTIC STRUCTURE

To compute the syntactic structure of a sentence, we must consider TWO things:
– GRAMMAR = a formal specification of the structures allowable in a language
– PARSING TECHNIQUE = the method of analysing a sentence to determine its structure according to the grammar


TREE Representation

The most common way to represent how a sentence is broken into its major subparts, and how these subparts are in turn broken up, is a TREE.

Eg: Fatin ate the papaya.

LIST notation:

(S (NP (NAME Fatin))
   (VP (V ate)
       (NP (ART the)
           (N papaya))))

* The corresponding tree structure is shown in Fig 3.1 (Allen, pg 42).
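One way to hold this tree in a program is as nested tuples, with the node label first and the children after it. The sketch below is only an illustration: the tuple encoding and the function names `to_list_notation` and `words` are assumptions, not from Allen. It prints the LIST notation for "Fatin ate the papaya" and reads the sentence back off the leaves.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Leaves (the words) are plain strings. This encoding is an assumption.
FATIN_TREE = (
    "S",
    ("NP", ("NAME", "Fatin")),
    ("VP", ("V", "ate"),
           ("NP", ("ART", "the"), ("N", "papaya"))),
)


def to_list_notation(tree) -> str:
    """Render a tree in bracketed LIST notation on one line."""
    if isinstance(tree, str):  # a leaf: just the word itself
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_list_notation(c) for c in children) + ")"


def words(tree) -> list:
    """Collect the leaves from left to right, i.e. the original sentence."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


print(to_list_notation(FATIN_TREE))
# (S (NP (NAME Fatin)) (VP (V ate) (NP (ART the) (N papaya))))
print(" ".join(words(FATIN_TREE)))
# Fatin ate the papaya
```

The LIST notation and the drawn tree carry the same information: each pair of brackets is one node, and the words at the bottom, read left to right, give back the sentence.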


Tree Representation : Terminology

Trees = a special form of GRAPH structure, consisting of:
– NODES (e.g. labeled S, NP)
– LINKS (connecting lines/arrows)
– ROOT (the node at the top; it dominates all other nodes)
– LEAVES (the nodes at the bottom)
– A LINK points from a PARENT node to a CHILD node
– Every CHILD node has a UNIQUE PARENT
– A PARENT node may point to MANY CHILD nodes
– An ANCESTOR of a node N is N's parent, or any ancestor of N's parent
– A node is DOMINATED by its ancestor nodes


CONSTRUCT a TREE Structure

To construct the tree structure of a sentence, one MUST know which structures are legal in English.

A set of REWRITE rules:
– describes which tree structures are allowable
– says that a certain symbol may be expanded in the tree by a sequence of other symbols (e.g. S → NP VP says that an S may be expanded into an NP followed by a VP)
– Example rules: Grammar 3.2 (Allen, pg 42)
– Grammars consisting entirely of rules with a single symbol on the LHS (called the MOTHER) = Context-Free Grammars (CFGs).
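A minimal sketch of rewrite rules as data, assuming a four-rule grammar that is just enough for "Fatin ate the papaya" (it is an illustration and may not match Grammar 3.2 exactly). It shows two things from this slide: rules describe which trees are allowable, and a grammar is context-free when every LHS is a single symbol. The two-symbol-LHS rule used for contrast is hypothetical.

```python
# Rewrite rules as (LHS, RHS) pairs of symbol tuples. These four rules are an
# assumed grammar that is just enough for "Fatin ate the papaya".
RULES = [
    (("S",), ("NP", "VP")),
    (("VP",), ("V", "NP")),
    (("NP",), ("NAME",)),
    (("NP",), ("ART", "N")),
]
LEXICAL = {"NAME", "V", "ART", "N"}  # categories that sit directly above a word


def is_context_free(rules) -> bool:
    """A CFG has exactly one symbol (the MOTHER) on the LHS of every rule."""
    return all(len(lhs) == 1 for lhs, _ in rules)


def allowed(tree) -> bool:
    """True if every node of a (label, child, ...) tree is licensed by RULES."""
    label, *children = tree
    if label in LEXICAL:  # a lexical node must dominate exactly one word
        return len(children) == 1 and isinstance(children[0], str)
    if any(isinstance(c, str) for c in children):
        return False  # only lexical nodes may have words as children
    rhs = tuple(child[0] for child in children)
    return ((label,), rhs) in RULES and all(allowed(c) for c in children)


good = ("S",
        ("NP", ("NAME", "Fatin")),
        ("VP", ("V", "ate"), ("NP", ("ART", "the"), ("N", "papaya"))))
bad = ("S", ("VP", ("V", "ate")), ("NP", ("NAME", "Fatin")))

# Hypothetical rule with two LHS symbols: NP -> ART N only right after a V.
context_sensitive = RULES + [(("V", "NP"), ("V", "ART", "N"))]

print(allowed(good), allowed(bad))  # True False
print(is_context_free(RULES), is_context_free(context_sensitive))  # True False
```

The second tree is rejected because no rule rewrites S as VP NP, and VP → V alone is not a rule either.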


CFGs

CFGs are a very important class of grammars because:
1. The formalism is powerful enough to describe most of the structure in natural languages.
2. Yet it is restricted enough that efficient parsers can be built to analyze sentences.


Terminology cont.

Symbols that cannot be further decomposed in a grammar = TERMINAL symbols (namely the words)

The other symbols such as S, VP, NP = NON-TERMINAL symbols.

The grammatical symbols, such as N and V, that describe word categories = LEXICAL symbols

Some words are listed under multiple categories. Eg: the word "can" is listed under both V and N.
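The three kinds of symbols can be separated mechanically once a grammar and a lexicon are written down. In this sketch the grammar and the lexicon are assumptions chosen for illustration; the lexicon maps each word to a set of categories so that "can" can sit under both V and N.

```python
# An assumed grammar and lexicon, used only to sort symbols into kinds.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
}
# The lexicon maps each word to every category it may belong to.
LEXICON = {
    "Fatin": {"NAME"},
    "ate": {"V"},
    "the": {"ART"},
    "papaya": {"N"},
    "can": {"V", "N"},  # listed under more than one category
}

terminals = set(LEXICON)                  # the words themselves
lexical = set().union(*LEXICON.values())  # word categories such as N and V
all_symbols = set(RULES) | {s for rhss in RULES.values() for rhs in rhss for s in rhs}
non_terminals = all_symbols - lexical     # the remaining symbols: S, NP, VP

print(sorted(terminals))
print(sorted(lexical), sorted(non_terminals))
print(sorted(LEXICON["can"]))  # ['N', 'V']
```

Storing a set of categories per word, rather than a single one, is what lets a parser later try both readings of an ambiguous word such as "can".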


Grammars and Parsing

A grammar has a special symbol called the START symbol (here, S).

A grammar is said to DERIVE a sentence if there is a sequence of rules that allow you to rewrite the start symbol into the sentence.
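A derivation can be replayed step by step. This is a sketch under stated assumptions: the rule numbers below are hypothetical, and the lexical entries are written as rules too so that the derivation ends in words. Each rule is applied to the leftmost occurrence of its LHS, starting from the START symbol S.

```python
# Hypothetical numbered rules, with the lexical entries written as rules too
# so that a derivation can end in words.
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("VP", ["V", "NP"]),
    3: ("NP", ["NAME"]),
    4: ("NP", ["ART", "N"]),
    5: ("NAME", ["Fatin"]),
    6: ("V", ["ate"]),
    7: ("ART", ["the"]),
    8: ("N", ["papaya"]),
}
START = "S"


def derive(rule_ids):
    """Rewrite the START symbol with the given rules, in order, each applied to
    the leftmost occurrence of its LHS. Returns every intermediate string."""
    form = [START]
    steps = [form]
    for rid in rule_ids:
        lhs, rhs = RULES[rid]
        i = form.index(lhs)  # ValueError if the LHS is absent: rule cannot apply
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


def derives(rule_ids, sentence: str) -> bool:
    """True if the rule sequence rewrites S into exactly this sentence."""
    try:
        return derive(rule_ids)[-1] == sentence.split()
    except ValueError:
        return False


for step in derive([1, 3, 5, 2, 6, 4, 7, 8]):
    print(" ".join(step))
print(derives([1, 3, 5, 2, 6, 4, 7, 8], "Fatin ate the papaya"))  # True
```

The printed steps run S, NP VP, NAME VP, Fatin VP, Fatin V NP, Fatin ate NP, Fatin ate ART N, Fatin ate the N, Fatin ate the papaya: a sequence of rules that rewrites the start symbol into the sentence, which is exactly what it means for the grammar to derive it.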


DERIVATIONS

Two important processes are based on derivations:

1. Sentence Generation – uses derivations to construct legal sentences

2. Parsing – identifies the structure of sentences given a grammar.
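Sentence generation can be sketched by expanding the start symbol in every possible way. The grammar is an assumption, and the word "mango" is added only so that there is more than one sentence to generate. This brute-force enumeration only terminates for non-recursive grammars, whose set of sentences is finite.

```python
from itertools import product

# An assumed grammar; "mango" is an extra word added only for illustration.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["Fatin"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["papaya"], ["mango"]],
}


def generate(symbol="S"):
    """Yield every word sequence derivable from `symbol`.
    Only safe for non-recursive grammars, whose language is finite."""
    if symbol not in GRAMMAR:  # a terminal (a word) derives only itself
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Expand each RHS symbol on its own, then combine every choice.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]


for sentence in generate():
    print(" ".join(sentence))
```

This grammar yields nine sentences, including "the papaya ate Fatin": a grammar licenses legal structures, not sensible meanings.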


SEARCHING TECHNIQUES

Two basic methods of searching:
1. A Top-down Strategy: start with the S symbol and search through the different ways to rewrite the symbols until the input sentence is generated, or until all possibilities have been explored.
2. A Bottom-up Strategy: start with the words in the sentence and use the rewrite rules backward to reduce the sequence of symbols until it consists solely of S. The LHS of each rule is used to rewrite the symbols on its RHS: wherever a sequence of symbols matches the RHS of a rule, it is replaced by that rule's LHS.
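Both strategies can be sketched as simple recognizers over the same assumed grammar and lexicon. `top_down` rewrites the leftmost symbol and backtracks over alternative rules. `bottom_up` starts from the words' categories and replaces any RHS sequence with its LHS, keeping a set of visited states so that no state is explored twice. The function names and data layout are illustrative assumptions, and both functions only answer yes or no; a full parser would also record which rule was used at each step to build the tree.

```python
from itertools import product

# An assumed grammar (no left-recursive rules) and lexicon.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("NAME",)),
    ("NP", ("ART", "N")),
]
LEXICON = {"Fatin": {"NAME"}, "ate": {"V"}, "the": {"ART"}, "papaya": {"N"}}


def top_down(symbols, words):
    """Top-down: rewrite the leftmost symbol, backtracking over rules,
    until the symbols are matched against the words one by one."""
    if not symbols:
        return not words  # succeed only if every word has been used
    first, rest = symbols[0], symbols[1:]
    if words and first in LEXICON.get(words[0], set()):
        if top_down(rest, words[1:]):  # lexical symbol matches the next word
            return True
    return any(top_down(rhs + rest, words)  # try each rule for `first`
               for lhs, rhs in GRAMMAR if lhs == first)


def bottom_up(words):
    """Bottom-up: start from the words' categories and apply rules backward,
    replacing any RHS sequence with its LHS, until only S remains."""
    stack = list(product(*(sorted(LEXICON.get(w, ())) for w in words)))
    seen = set()
    while stack:
        state = stack.pop()
        if state == ("S",):
            return True
        if state in seen:
            continue
        seen.add(state)
        for lhs, rhs in GRAMMAR:
            n = len(rhs)
            for i in range(len(state) - n + 1):
                if state[i:i + n] == rhs:  # RHS found: reduce it to the LHS
                    stack.append(state[:i] + (lhs,) + state[i + n:])
    return False


sentence = "Fatin ate the papaya".split()
print(top_down(("S",), tuple(sentence)), bottom_up(sentence))  # True True
```

The naive top-down version relies on the grammar having no left-recursive rule (such as NP → NP PP); with one, it would keep rewriting NP forever without consuming a word.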