
Posted on 12-Jan-2016


Student: Alexandru Iliescu

A unification-based syntactic parser

PATR

What is “parsing”?

Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars, meaning part (of speech).

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a pedagogical exercise, especially in inflected languages such as the Romance languages or Latin, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Parsing a computer language

Computer languages are often parsed with two levels of grammar: lexical and syntactic.

The first stage is token generation, or lexical analysis. In this stage the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2. Each of these is a meaningful symbol in the context of an arithmetic expression.
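The lexical stage can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of any particular calculator. Tokens are taken to be unsigned integers or single-character operators and parentheses, and one regular expression stands in for the token grammar. The name `tokenize` is hypothetical.

```python
import re

# Token grammar assumed for this sketch: an unsigned integer or one operator/parenthesis.
TOKEN_RE = re.compile(r"\d+|[-+*/^()]")

def tokenize(text: str) -> list[str]:
    """Split an arithmetic expression into tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("12*(3+4)^2"))  # ['12', '*', '(', '3', '+', '4', ')', '^', '2']
```

Note that the lexer only decides where tokens begin and end. It does not check whether "12*+" is a sensible expression; that is left to the next stage.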

The next stage is parsing, or syntactic analysis, which checks that the tokens form an allowable expression.
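A recursive-descent recognizer is one common way to perform this check. The sketch below is an assumption, not part of the original text. It takes an already tokenized input and uses a hypothetical arithmetic grammar in which `^` binds tightest and is right-associative. It returns `True` only if the whole token list forms one expression.

```python
from typing import Optional

class ParseError(Exception):
    """Raised internally when the tokens cannot continue the expression."""

def is_allowable(tokens: list[str]) -> bool:
    """Check tokens against this sketch's grammar:

        expr   -> term (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> base ("^" factor)?
        base   -> NUMBER | "(" expr ")"
    """
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected: str) -> None:
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r} at token {pos}")
        pos += 1

    def expr() -> None:
        term()
        while peek() in ("+", "-"):
            advance(peek())
            term()

    def term() -> None:
        factor()
        while peek() in ("*", "/"):
            advance(peek())
            factor()

    def factor() -> None:
        base()
        if peek() == "^":
            advance("^")
            factor()  # recursion makes 2^3^2 group as 2^(3^2)

    def base() -> None:
        token = peek()
        if token is not None and token.isdigit():
            advance(token)
        elif token == "(":
            advance("(")
            expr()
            advance(")")
        else:
            raise ParseError(f"unexpected token {token!r} at position {pos}")

    try:
        expr()
    except ParseError:
        return False
    return pos == len(tokens)  # leftover tokens also make the input invalid

print(is_allowable(["12", "*", "(", "3", "+", "4", ")", "^", "2"]))  # True
print(is_allowable(["12", "*", "+", "3"]))                          # False
```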

D-PATR and PC-PATR

D-PATR

D-PATR is a development environment for unification-based grammars on Xerox 1100 series workstations.

The first version of D-PATR was written at the Scandinavian Summer Workshop for Computational Linguistics in Helsinki, Finland, in 1985.


The PATR formalism on which D-PATR is based is suitable for encoding a wide variety of grammars.


D-PATR consists of four basic parts:

A unification package;
An interpreter for rules and lexical items;
Input/output routines for directed graphs;
An Earley-style chart parser.

Parsing and Unification

[Figure: x and y are unified into z, z is copied to z′, and x and y are then restored.]

The method entails making only one copy, not two, when the operation succeeds. In the event of failure, D-PATR simply restores the original structures without copying anything.
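The copy-once-then-restore strategy can be illustrated with a small Python sketch. This is not D-PATR's code. Feature structures are modeled as nested dicts with string atoms. Unification is done destructively, and every added feature is logged on an undo "trail". The names `unify`, `_unify_in_place` and `_restore`, and the trail mechanism itself, are assumptions made for illustration.

```python
import copy
from typing import Optional

def _unify_in_place(x: dict, y: dict, trail: list) -> bool:
    """Destructively merge y into x, recording every added feature on the trail."""
    for feature, y_value in y.items():
        if feature not in x:
            x[feature] = y_value
            trail.append((x, feature))  # remember how to undo this change
            continue
        x_value = x[feature]
        if isinstance(x_value, dict) and isinstance(y_value, dict):
            if not _unify_in_place(x_value, y_value, trail):
                return False
        elif x_value != y_value:  # clash between atoms, or between an atom and a structure
            return False
    return True

def _restore(trail: list) -> None:
    """Undo the destructive changes, newest first."""
    for structure, feature in reversed(trail):
        del structure[feature]

def unify(x: dict, y: dict) -> Optional[dict]:
    """Return a new unified structure, or None on failure; x and y end up unchanged."""
    trail: list = []
    succeeded = _unify_in_place(x, y, trail)
    result = copy.deepcopy(x) if succeeded else None  # the single copy, made only on success
    _restore(trail)  # restore the originals whether or not unification succeeded
    return result

x = {"cat": "NP", "agr": {"num": "sg"}}
print(unify(x, {"agr": {"per": "3"}}))   # {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(x, {"agr": {"num": "pl"}}))  # None
print(x)                                 # {'cat': 'NP', 'agr': {'num': 'sg'}}
```

On failure, the only work done is undoing the trail. This mirrors the point above that nothing is copied unless unification succeeds.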

Rules

A rule in D-PATR is a list of atomic constituent labels that may be followed by specifications.


For example, the rule S -> NP VP is written in D-PATR notation as

(S NP VP)
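Reading this list notation takes only a few lines. The Python sketch below relies on several assumptions. It handles only the bare list of atomic labels, with no trailing specifications. It takes the first label as the mother and the rest as daughters. The name `parse_rule` is hypothetical.

```python
def parse_rule(text: str) -> tuple[str, list[str]]:
    """Read list notation such as "(S NP VP)" into (mother, daughters)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a parenthesized rule: {text!r}")
    labels = text[1:-1].split()
    if not labels:
        raise ValueError("a rule needs at least a mother label")
    mother, *daughters = labels
    return mother, daughters

mother, daughters = parse_rule("(S NP VP)")
print(mother, "->", " ".join(daughters))  # S -> NP VP
```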


Before a rule is used by the parser, D-PATR compiles it to a feature set. A feature set can be displayed in different ways, for example as a matrix or as a directed graph.
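The sketch below is a hypothetical illustration of what "compiling to a feature set" could look like. It numbers the constituents ("0" for the mother, "1" to "n" for the daughters) and stores each label under a `cat` feature. This encoding is an assumption, not D-PATR's documented format. The sketch then shows two displays of the same set: an indented matrix, and the list of label paths through the directed graph.

```python
def compile_rule(labels: list[str]) -> dict:
    """Encode a rule like ["S", "NP", "VP"] as a feature set keyed by constituent position.

    Hypothetical encoding: "0" is the mother, "1".."n" are the daughters,
    and each constituent's label is stored under a "cat" feature.
    """
    return {str(i): {"cat": label} for i, label in enumerate(labels)}

def as_matrix(fs: dict, indent: int = 0) -> str:
    """Render a nested feature set as an indented attribute-value matrix."""
    lines = []
    for feature, value in fs.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{feature}:")
            lines.append(as_matrix(value, indent + 2))
        else:
            lines.append(" " * indent + f"{feature}: {value}")
    return "\n".join(lines)

def as_paths(fs: dict, prefix: tuple = ()):
    """Yield (path, atom) for every arc sequence from the root to an atomic value."""
    for feature, value in fs.items():
        path = prefix + (feature,)
        if isinstance(value, dict):
            yield from as_paths(value, path)
        else:
            yield path, value

rule = compile_rule(["S", "NP", "VP"])
print(as_matrix(rule))
for path, value in as_paths(rule):
    print(f"<{' '.join(path)}> = {value}")  # <0 cat> = S, then <1 cat> = NP, ...
```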

Lexical Rules

A lexical rule is a special kind of template with two attributes: in and out.


In applying a lexical rule to a graph, the graph is first unified with the value of in. If the operation succeeds, the value of out is passed on as the result.
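Returning out only makes sense when in and out share substructure, so that what unification adds to in also appears in out. The Python sketch below models that sharing by making one dict object appear under both attributes. It unifies the graph destructively into a deep copy of the rule; `copy.deepcopy` keeps shared objects shared within the copy. The `past_tense` rule, its features, and the function names are hypothetical.

```python
import copy
from typing import Optional

def merge(x: dict, y: dict) -> bool:
    """Destructively unify y into x; return False on a feature-value clash."""
    for feature, y_value in y.items():
        if feature not in x:
            x[feature] = y_value
        elif isinstance(x[feature], dict) and isinstance(y_value, dict):
            if not merge(x[feature], y_value):
                return False
        elif x[feature] != y_value:
            return False
    return True

def apply_lexical_rule(rule: dict, graph: dict) -> Optional[dict]:
    """Unify graph with the rule's "in" value; on success return the "out" value."""
    # deepcopy preserves the in/out sharing inside the copy and leaves rule untouched
    instance = copy.deepcopy(rule)
    if not merge(instance["in"], copy.deepcopy(graph)):
        return None
    return instance["out"]

# Hypothetical rule: one subcat node is shared by "in" and "out" (structure sharing).
subcat: dict = {}
past_tense = {
    "in": {"cat": "V", "form": "base", "subcat": subcat},
    "out": {"cat": "V", "form": "past", "subcat": subcat},
}
verb = {"cat": "V", "form": "base", "subcat": {"first": "NP"}}
print(apply_lexical_rule(past_tense, verb))          # {'cat': 'V', 'form': 'past', 'subcat': {'first': 'NP'}}
print(apply_lexical_rule(past_tense, {"cat": "N"}))  # None
```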


D-PATR is not a commercial product. It is made available to users outside SRI who might wish to develop unification-based grammars.

PC-PATR

PC-PATR is an implementation of the PATR-II computational linguistic formalism for personal computers. It is available for MS-DOS, Microsoft Windows, Macintosh, and Unix, and is still under development.


PC-PATR has the following parts:

A chart parser;
A unification package;
An interpreter for grammar and lexical rules.


PC-PATR uses a left-corner chart parser with these characteristics:

bottom-up parsing with top-down filtering based on the categories;
left-to-right order: after each word is added to the chart, the edges that can be derived up to that point are computed.
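These two characteristics can be illustrated with a compact left-corner recognizer in Python. This is a generic textbook-style sketch, not PC-PATR's implementation. Categories are atomic symbols, with no feature structures or unification, and empty rules are not allowed. The toy grammar, the lexicon, and the function names are hypothetical.

The recognizer works as follows:
- Edges are built bottom-up from each completed constituent.
- A rule is tried only when its left-hand side can begin one of the categories predicted at that position (the top-down filter).
- All edges ending at a word are computed before the next word is read.

```python
from collections import defaultdict

def left_corner_closure(grammar):
    """Map each category to all categories that can start it (reflexive, transitive)."""
    direct = defaultdict(set)
    for lhs, rhs in grammar:
        direct[lhs].add(rhs[0])
    categories = set(direct) | {rhs[0] for _, rhs in grammar}
    closure = {}
    for cat in categories:
        seen, stack = {cat}, [cat]
        while stack:
            for corner in direct.get(stack.pop(), ()):
                if corner not in seen:
                    seen.add(corner)
                    stack.append(corner)
        closure[cat] = seen
    return closure

def recognize(words, grammar, lexicon, start="S"):
    """Left-corner chart recognizer: bottom-up, left to right, filtered by top-down goals."""
    closure = left_corner_closure(grammar)
    goals = defaultdict(set)   # position -> categories predicted to start there
    goals[0].add(start)
    passive = set()            # complete edges (start, end, category)
    active = defaultdict(set)  # end -> incomplete edges (start, lhs, categories still needed)
    agenda = []

    def add_passive(edge):
        if edge not in passive:
            passive.add(edge)
            agenda.append(edge)

    def add_active(i, j, lhs, needed):
        if not needed:
            add_passive((i, j, lhs))
        elif (i, lhs, needed) not in active[j]:
            active[j].add((i, lhs, needed))
            goals[j].add(needed[0])  # top-down prediction for the next position

    for j, word in enumerate(words):
        for cat in lexicon.get(word, ()):
            add_passive((j, j + 1, cat))
        while agenda:  # finish every edge ending at j + 1 before reading the next word
            i, k, cat = agenda.pop()
            # Fundamental rule: extend incomplete edges that end at i and need cat next.
            for start_i, lhs, needed in list(active[i]):
                if needed[0] == cat:
                    add_active(start_i, k, lhs, needed[1:])
            # Left-corner step, filtered: lhs must be able to begin some goal at i.
            for lhs, rhs in grammar:
                if rhs[0] == cat and any(lhs in closure.get(g, {g}) for g in goals[i]):
                    add_active(i, k, lhs, rhs[1:])
    return (0, len(words), start) in passive

GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]
LEXICON = {"the": ["Det"], "dog": ["N"], "cat": ["N"], "saw": ["V"]}

print(recognize("the dog saw the cat".split(), GRAMMAR, LEXICON))  # True
print(recognize("saw the dog".split(), GRAMMAR, LEXICON))          # False
```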


Unification

Unification is the basic operation applied to feature structures in PC-PATR. It consists of the merging of the information from two feature structures. Two feature structures can unify if their common features have the same values, but do not unify if any feature values conflict.
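The sketch below makes this definition concrete in Python. Feature structures are modeled as nested dicts whose values are either strings (atoms) or further dicts. Features found in only one structure are carried over, and common complex features are unified recursively. Any conflict makes the whole operation fail. This is a simplification: PATR-style feature structures can also share values (reentrancy), which this sketch does not model. The function name is hypothetical.

```python
from typing import Optional

def unify(a: dict, b: dict) -> Optional[dict]:
    """Merge two feature structures; return None if any feature values conflict.

    Neither input is modified, though the result may reuse their unchanged sub-dicts.
    """
    result = dict(a)  # shallow copy, so a itself is never changed
    for feature, b_value in b.items():
        if feature not in result:
            result[feature] = b_value  # information found only in b is added
            continue
        a_value = result[feature]
        if isinstance(a_value, dict) and isinstance(b_value, dict):
            merged = unify(a_value, b_value)  # common complex feature: unify recursively
            if merged is None:
                return None
            result[feature] = merged
        elif a_value != b_value:
            return None  # conflicting values, so the structures do not unify
    return result

print(unify({"cat": "NP", "agr": {"num": "sg"}}, {"agr": {"per": "3"}}))
# {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))  # None
```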


Grammar rules

A PC-PATR grammar rule has these parts, in the order listed:
1. the keyword Rule;
2. an optional rule identifier enclosed in braces ({});
3. the nonterminal symbol to be expanded;
4. an arrow (->) or equal sign (=);
5. zero or more terminal or nonterminal symbols;
6. an optional colon (:);
7. zero or more feature constraints;
8. an optional period (.).
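To illustrate the part order, here is a rough Python reader for rule text, with one regular-expression group per part. This is not PC-PATR's own reader, and `read_rule` is a hypothetical name. The sketch makes two assumptions:
- Constraints are only path equations of the form `<path> = <path>` or `<path> = value`.
- Colons, `<` and periods appear only in the positions listed above.

```python
import re

# One group per part of a rule, in the order the parts must appear.
RULE_RE = re.compile(
    r"""^\s*Rule\s+                    # 1. the keyword Rule
        (?:\{(?P<ident>[^}]*)\}\s*)?   # 2. optional identifier in braces
        (?P<lhs>[^\s{]+)\s*            # 3. nonterminal symbol to be expanded
        (?:->|=)                       # 4. arrow or equal sign
        (?P<rhs>[^:<.]*)               # 5. zero or more terminal/nonterminal symbols
        :?                             # 6. optional colon
        (?P<constraints>[^.]*)         # 7. zero or more feature constraints
        \.?\s*$                        # 8. optional period
    """,
    re.VERBOSE,
)
# Assumed constraint shape: <path> = <path> or <path> = atomic value.
CONSTRAINT_RE = re.compile(r"(<[^>]*>)\s*=\s*(<[^>]*>|[^\s<]+)")

def read_rule(text: str) -> dict:
    """Split rule text into its parts (a rough sketch, not the real PC-PATR reader)."""
    match = RULE_RE.match(text)
    if match is None:
        raise ValueError("not a well-formed rule")
    return {
        "id": match["ident"],
        "lhs": match["lhs"],
        "rhs": match["rhs"].split(),
        "constraints": CONSTRAINT_RE.findall(match["constraints"]),
    }

print(read_rule("Rule {S rule} S -> NP VP : <NP agr> = <VP agr> ."))
# {'id': 'S rule', 'lhs': 'S', 'rhs': ['NP', 'VP'], 'constraints': [('<NP agr>', '<VP agr>')]}
```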


The optional rule identifier consists of one or more words enclosed in braces.


For example, this rule says that any category in the grammar rules can be replaced by two copies of the same category separated by a CJ.

Rule X -> X_1 CJ X_2
    <X cat> = <X_1 cat>
    <X cat> = <X_2 cat>
    <X arg1> = <X_1 arg1>
    <X arg1> = <X_2 arg1>
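The effect of these path equations can be simulated in Python. The sketch rests on the following simplifications:
- Each constituent (`X`, `X_1`, `X_2`) is a dict of atomic features.
- Each equation copies a known value to the other path, or reports a clash if both values are known and differ.
- The equations are re-applied until nothing changes.

This only approximates PATR unification: real path equations make both paths share one value rather than copying it. The helper names and the `arg1` value `"a1"` are invented for illustration.

```python
def get_path(fs: dict, path: list[str]):
    """Follow a feature path; return None if any feature along it is missing."""
    for feature in path:
        if not isinstance(fs, dict) or feature not in fs:
            return None
        fs = fs[feature]
    return fs

def set_path(fs: dict, path: list[str], value) -> None:
    """Store a value at the end of a path, creating intermediate structures."""
    for feature in path[:-1]:
        fs = fs.setdefault(feature, {})
    fs[path[-1]] = value

def solve(constituents: dict, equations: list[tuple[str, str]]) -> bool:
    """Enforce <path> = <path> equations over atomic values; False on a clash.

    Values are copied between paths (not truly shared) until nothing changes,
    and constituents is updated in place, even when a clash is found.
    """
    paths = [(left[1:-1].split(), right[1:-1].split()) for left, right in equations]
    changed = True
    while changed:
        changed = False
        for left, right in paths:
            left_value = get_path(constituents, left)
            right_value = get_path(constituents, right)
            if left_value is not None and right_value is not None:
                if left_value != right_value:
                    return False
            elif left_value is not None:
                set_path(constituents, right, left_value)
                changed = True
            elif right_value is not None:
                set_path(constituents, left, right_value)
                changed = True
    return True

EQUATIONS = [
    ("<X cat>", "<X_1 cat>"),
    ("<X cat>", "<X_2 cat>"),
    ("<X arg1>", "<X_1 arg1>"),
    ("<X arg1>", "<X_2 arg1>"),
]

nps = {"X": {}, "X_1": {"cat": "NP", "arg1": "a1"}, "X_2": {"cat": "NP"}}
print(solve(nps, EQUATIONS), nps["X"])  # True {'cat': 'NP', 'arg1': 'a1'}

mixed = {"X": {}, "X_1": {"cat": "NP"}, "X_2": {"cat": "VP"}}
print(solve(mixed, EQUATIONS))  # False
```

The second call fails because the equations force `X_1` and `X_2` to have the same `cat`, which is the "two copies of the same category" condition stated above.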


Lexical rules

A PC-PATR lexical rule has these parts, in the order listed:
1. the keyword Define;
2. the name of the lexical rule;
3. the keyword as;
4. the rule definition;
5. an optional period (.).


Several people have contributed to the development of PC-PATR over the past few years. Alan Buseman, Jim Skon, Bob Kasper, and Nathan Miles all contributed to an earlier program named SILPATR that contained the same basic parsing and unification functions.

