the simplest nl applications: text searching and pattern matching

The Simplest NL Applications: Text Searching and Pattern Matching Read J & M Chapter 2

Upload: leola

Post on 25-Feb-2016



TRANSCRIPT

Page 1: The Simplest NL Applications: Text Searching and Pattern Matching

The Simplest NL Applications: Text Searching and Pattern Matching

Read J & M Chapter 2

Page 2: The Simplest NL Applications: Text Searching and Pattern Matching

Searching for a Single String Using a Nondeterministic FSM

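[Figure: nondeterministic FSM with states 1 through 8; the transitions between consecutive states are labeled c, o, c, o, n, u, t.]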
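The machine on this slide can be simulated by tracking the set of states that are active after each input character. The following is a minimal sketch, not the slide's own construction: the function name `nfa_search` is hypothetical, states are numbered from 0 rather than 1, and it assumes the start state loops on every character so that a match may begin anywhere in the text.

```python
def nfa_search(pattern, text):
    """Return the end positions (exclusive) of every occurrence of pattern in text.

    State i means "the first i characters of pattern have just been read".
    State 0 stays active on every character: that is the nondeterministic
    guess that a match might start at the current position.
    """
    accept = len(pattern)
    active = {0}
    hits = []
    for pos, ch in enumerate(text):
        next_active = {0}  # assumed self-loop on the start state
        for state in active:
            if state < accept and pattern[state] == ch:
                next_active.add(state + 1)
        active = next_active
        if accept in active:
            hits.append(pos + 1)
    return hits


print(nfa_search("coconut", "lococonut"))
```

Keeping a set of active states is what makes the simulation nondeterministic: several partial matches can be alive at once, as happens after reading `coco`.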

Page 3: The Simplest NL Applications: Text Searching and Pattern Matching

Searching for a Single String Using the Boyer Moore Algorithm
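The slide's illustration did not survive extraction, so the sketch below is only an approximation of the idea. It implements the simplified Horspool variant, which keeps Boyer-Moore's right-to-left comparison and a bad-character shift table but omits the good-suffix rule of the full algorithm. The name `horspool_search` is hypothetical.

```python
def horspool_search(pattern, text):
    """Return the start index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: distance from the last occurrence of each character
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    start = 0
    while start <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[start + j] == pattern[j]:
            j -= 1
        if j < 0:
            return start
        # Slide the pattern according to the text character under its last position.
        start += shift.get(text[start + m - 1], m)
    return -1


print(horspool_search("coconut", "lococonut"))
```

The payoff is that a character absent from the pattern lets the search skip a full pattern length without examining the characters in between.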

Page 4: The Simplest NL Applications: Text Searching and Pattern Matching

Searching for Multiple Strings

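[Figure: nondeterministic FSM for two strings. One path runs through states 1 to 8 on c, o, c, o, n, u, t; a second path runs through states 2 to 6, with the labels o, c, o, s and l surviving in the extracted text.]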

Example: lococonut
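Searching for several strings at once can be sketched by keeping one set of partial matches across all patterns. In the code below, `multi_search` is a hypothetical name, and the second pattern `locos` is a guess at the figure's second path based on the labels that survived extraction; patterns are assumed to be non-empty.

```python
def multi_search(patterns, text):
    """Return sorted (end_position, pattern) pairs for every occurrence of any pattern."""
    active = set()  # partial matches as (pattern, characters matched so far)
    hits = []
    for pos, ch in enumerate(text):
        # Every pattern may begin at the current character (the start-state guess).
        candidates = active | {(p, 0) for p in patterns}
        active = set()
        for pat, i in candidates:
            if pat[i] == ch:
                if i + 1 == len(pat):
                    hits.append((pos + 1, pat))
                else:
                    active.add((pat, i + 1))
    return sorted(hits)


# "locos" is an assumed second pattern, not confirmed by the slide.
print(multi_search(["coconut", "locos"], "lococonut"))
```

On `lococonut`, the `locos` branch advances through `loco` and then dies at the second `c`, while a `coconut` branch that started two characters later runs to completion.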

Page 5: The Simplest NL Applications: Text Searching and Pattern Matching

Converting to a Deterministic FSM

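[Figure: the two-string machine from the previous slide, with states 1 to 8 on c, o, c, o, n, u, t and a second path through states 2 to 6 labeled o, c, o, s and l, as the starting point for conversion to a deterministic FSM.]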
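One standard way to do the conversion is the subset construction, in which each deterministic state is a set of nondeterministic states. The sketch below applies it to the single-string searcher for brevity; `build_dfa` and `dfa_contains` are hypothetical names, the pattern is assumed to be non-empty, and the start state is again assumed to loop on every character.

```python
def build_dfa(pattern, alphabet):
    """Subset construction for the NFA that searches for pattern."""
    accept = len(pattern)

    def step(states, ch):
        nxt = {0}  # the start state is always reachable
        for s in states:
            if s < accept and pattern[s] == ch:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    table = {}
    todo = [start]
    while todo:
        states = todo.pop()
        if states in table:
            continue
        table[states] = {ch: step(states, ch) for ch in alphabet}
        todo.extend(t for t in table[states].values() if t not in table)
    return start, table, accept


def dfa_contains(pattern, text):
    """Run the deterministic machine: one table lookup per input character."""
    start, table, accept = build_dfa(pattern, set(pattern) | set(text))
    state = start
    for ch in text:
        state = table[state][ch]
        if accept in state:
            return True
    return False


print(dfa_contains("coconut", "lococonut"))
```

For `coconut` over its own alphabet this yields 8 deterministic states, the same number as the nondeterministic machine, which is typical for string-search automata even though the subset construction can be exponential in general.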

Page 6: The Simplest NL Applications: Text Searching and Pattern Matching

Regular Expressions

Two different (but related) uses of the term:

•Expressions that define all and only the regular languages

•$(aa \cup ab \cup ba \cup bb)^*$

•Expressions in a useful pattern language

Matching IP addresses:

s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!

Finding doubled words:

\<([A-Za-z]+)\s+\1\>
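For readers who want to try the two patterns, here is a rough Python equivalent. Python's `re` module has no `\<` and `\>` word anchors, so the sketch substitutes `\b`; the function names `tag_ip` and `doubled_words` are hypothetical.

```python
import re

# Same shape as the substitution above: four dot-separated digit runs
# inside <emphasis> tags are rewrapped in <inet> tags.
IP_IN_EMPHASIS = re.compile(r"<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>")

# \b stands in for the \< and \> word-boundary anchors.
DOUBLED_WORD = re.compile(r"\b([A-Za-z]+)\s+\1\b")


def tag_ip(text):
    """Replace <emphasis>IP</emphasis> with <inet>IP</inet>."""
    return IP_IN_EMPHASIS.sub(r"<inet>\1</inet>", text)


def doubled_words(text):
    """Return each word that is immediately repeated."""
    return DOUBLED_WORD.findall(text)


print(tag_ip("<emphasis>192.168.0.1</emphasis>"))
print(doubled_words("this is is a test"))
```

The backreference `\1` is what takes this pattern language beyond the formal regular expressions defined on the next slides: it requires remembering an arbitrary matched string.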

Page 7: The Simplest NL Applications: Text Searching and Pattern Matching

REs: Syntax and Semantics

Syntax

The regular expressions over an alphabet $\Sigma$ are all strings over the alphabet $\Sigma \cup \{(, ), \emptyset, \cup, *\}$ that can be obtained as follows:

1. $\emptyset$ and each member of $\Sigma$ is a regular expression.

2. If $\alpha$, $\beta$ are regular expressions, then so is $\alpha\beta$.

3. If $\alpha$, $\beta$ are regular expressions, then so is $\alpha \cup \beta$.

4. If $\alpha$ is a regular expression, then so is $\alpha^*$.

5. If $\alpha$ is a regular expression, then so is $(\alpha)$.

6. Nothing else is a regular expression.

Page 8: The Simplest NL Applications: Text Searching and Pattern Matching

REs: Syntax and Semantics

Regular expressions define languages via a semantic interpretation function we'll call L:

1. $L(\emptyset) = \emptyset$ and $L(a) = \{a\}$ for each $a \in \Sigma$.

2. If $\alpha$, $\beta$ are regular expressions, then $L(\alpha\beta) = L(\alpha)L(\beta)$ = all strings that can be formed by concatenating to some string from $L(\alpha)$ some string from $L(\beta)$.

3. If $\alpha$, $\beta$ are regular expressions, then $L(\alpha \cup \beta) = L(\alpha) \cup L(\beta)$.

4. If $\alpha$ is a regular expression, then $L(\alpha^*) = L(\alpha)^*$.

5. If $(\alpha)$ is a regular expression, then $L((\alpha)) = L(\alpha)$.

A language is regular if and only if it can be described by a regular expression.

Note: L is compositional.
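Compositionality means that the meaning of an expression is computed from the meanings of its parts, which maps directly onto a recursive function. The sketch below is one illustrative encoding, not part of the slides: expressions are nested tuples with hypothetical tags (`"empty"`, `"sym"`, `"cat"`, `"union"`, `"star"`), parentheses are implicit in the nesting, and because starred languages are infinite the function only returns strings up to a length bound.

```python
def L(expr, max_len):
    """Strings of length <= max_len in the language denoted by expr."""
    kind = expr[0]
    if kind == "empty":   # rule 1: the empty language
        return set()
    if kind == "sym":     # rule 1: a single symbol
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "cat":     # rule 2: concatenation
        left, right = L(expr[1], max_len), L(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "union":   # rule 3: union
        return L(expr[1], max_len) | L(expr[2], max_len)
    if kind == "star":    # rule 4: Kleene star, built up by repeated concatenation
        base = L(expr[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


# (aa U ab U ba U bb)*: the even-length strings over {a, b}
a, b = ("sym", "a"), ("sym", "b")
pairs = ("union",
         ("union", ("cat", a, a), ("cat", a, b)),
         ("union", ("cat", b, a), ("cat", b, b)))
even = ("star", pairs)
print(sorted(L(even, 2)))
```

Each branch of the function mirrors one numbered rule, and no branch looks at anything except the values returned for its subexpressions.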

Page 9: The Simplest NL Applications: Text Searching and Pattern Matching

The Importance of Compositionality

What is the meaning of:

Mary cooked the yujutes.

Mary tyroked the yujutes.

Page 10: The Simplest NL Applications: Text Searching and Pattern Matching

Morphological Analysis

•Read J & M Chapter 3

•Recognize words

•Parse words

Page 11: The Simplest NL Applications: Text Searching and Pattern Matching

Morphological Parsing

Goal: to represent the facts declaratively so that a single representation can be used for both recognition and generation.

Note: ^ marks morpheme boundaries. # marks word boundaries.

Page 12: The Simplest NL Applications: Text Searching and Pattern Matching

From Lexical to Intermediate

Note: All the transducers in the book are described as lexical:intermediate, but they can also run in the other direction.

Page 13: The Simplest NL Applications: Text Searching and Pattern Matching

Where Did reg-noun-stem Come From?

Page 14: The Simplest NL Applications: Text Searching and Pattern Matching

We Can Cascade or Compose

Page 15: The Simplest NL Applications: Text Searching and Pattern Matching

From Intermediate to Surface

For text, we need spelling rules.

$\varepsilon \rightarrow e \;/\; \{x, s, z\}\ \hat{}\ \_\_\_\ s\ \#$

Read this as "Replace $\varepsilon$ with e in the context after the /."
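The effect of the e-insertion rule can be approximated with a regular-expression substitution. This is a sketch of the rule's input/output behavior only, not the finite-state transducer developed on the following slides; the function name is hypothetical, and stripping the `^` and `#` boundary symbols at the end is an assumption about what the surface form should look like.

```python
import re

# Insert e after a morpheme boundary (^) that follows x, s, or z
# and precedes a word-final s (s#).
E_INSERTION = re.compile(r"([xsz]\^)(?=s#)")


def intermediate_to_surface(form):
    """Apply e-insertion to an intermediate form, then drop boundary symbols."""
    form = E_INSERTION.sub(r"\1e", form)
    return form.replace("^", "").replace("#", "")


print(intermediate_to_surface("fox^s#"))
print(intermediate_to_surface("cat^s#"))
```

The lookahead `(?=s#)` checks the right-hand context without consuming it, which matches how the rule notation separates the change from its environment.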

Page 16: The Simplest NL Applications: Text Searching and Pattern Matching

Turning the Rule into a Transducer

foxes

xerox

fox#sat

Page 17: The Simplest NL Applications: Text Searching and Pattern Matching

Disambiguation - Local

Local ambiguities:

asses#

s#luxury

Page 18: The Simplest NL Applications: Text Searching and Pattern Matching

Disambiguation - Harder

Sometimes additional knowledge is necessary:

foxes: fox +N + PL or fox +V +SG
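The ambiguity shows up as soon as a parser is allowed to return every analysis licensed by the lexicon. The sketch below uses a hypothetical two-entry toy lexicon and a deliberately crude spelling function; it recognizes a surface form by generating candidate forms and comparing them, so it illustrates the ambiguity rather than the transducer machinery.

```python
# Hypothetical toy lexicon: stem -> set of categories.
LEXICON = {"fox": {"N", "V"}, "cat": {"N"}}


def s_form(stem):
    """Surface form of stem + s, with e-insertion after x, s, or z."""
    return stem + ("es" if stem[-1] in "xsz" else "s")


def analyze(surface):
    """Return every lexical analysis of surface that the toy lexicon allows."""
    out = []
    for stem, cats in sorted(LEXICON.items()):
        if surface == stem and "N" in cats:
            out.append(f"{stem} +N +SG")
        if surface == s_form(stem):
            if "N" in cats:
                out.append(f"{stem} +N +PL")
            if "V" in cats:
                out.append(f"{stem} +V +SG")
    return out


print(analyze("foxes"))
print(analyze("cats"))
```

Nothing inside the word distinguishes the two analyses of `foxes`; choosing between them needs the surrounding sentence, which is the additional knowledge the slide refers to.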

Can we think of nouns that cannot also be verbs?

Page 19: The Simplest NL Applications: Text Searching and Pattern Matching

Search

•For FSMs, we can build a deterministic machine.

•In other cases, we will have to search:

•Depth-first

•Breadth-first – chart parsing

[Figure: parse trees for the sentence "I hit the boy with a bat."; node labels include S, VP, NP, PP, V, N, DET, and PREP.]