regular expressions & automata

32
Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park

Upload: ayala

Post on 22-Feb-2016

172 views

Category:

Documents


1 download

DESCRIPTION

Regular Expressions & Automata . Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park. Complexity to Computability. Just looked at algorithmic complexity How many steps required At high end of complexity is computability Decidable Undecidable. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Regular Expressions & Automata

Regular Expressions & Automata

Fawzi EmadChau-Wen Tseng

Department of Computer ScienceUniversity of Maryland, College Park

Page 2: Regular Expressions & Automata

Complexity to Computability

Just looked at algorithmic complexityHow many steps required

At high end of complexity is computabilityDecidableUndecidable

Page 3: Regular Expressions & Automata

Complexity to Computability

Approach complexity from different directionLook at simple models of computation

Regular expressionsFinite automataTuring Machines

Page 4: Regular Expressions & Automata

Overview

Regular expressionsNotationPatternsJava support

AutomataLanguagesFinite State MachinesTuring MachinesComputability

Page 5: Regular Expressions & Automata

Regular Expression (RE)

Notation for describing simple string patternsVery useful for text processing

Finding / extracting pattern in textManipulating stringsAutomatically generating web pages

Page 6: Regular Expressions & Automata

Regular Expression

Regular expression is composed ofSymbolsOperators

Concatenation ABUnion A | BClosure A*

Page 7: Regular Expressions & Automata

Definitions

AlphabetSet of symbols Examples {a, b}, {A, B, C}, {a-z,A-Z,0-9}…

StringsSequences of 0 or more symbols from alphabetExamples , “a”, “bb”, “cat”, “caterpillar”…

LanguagesSets of stringsExamples , {}, {“a”}, {“bb”, “cat”}…

empty string

Page 8: Regular Expressions & Automata

More Formally

Regular expression describes a language over an alphabetL(E) is language for regular expression E

Set of strings generated from regular expressionString in language if it matches pattern specified by regular expression

Page 9: Regular Expressions & Automata

Regular Expression Construction

Every symbol is a regular expressionExample “a”

REs can be constructed from other REs usingConcatenationUnion |Closure *

Page 10: Regular Expressions & Automata

Regular Expression Construction

ConcatenationA followed by BL(AB) = { ab | a L(A) AND b L(B) }

Example a

{“a”}ab

{“ab”}

Page 11: Regular Expressions & Automata

Regular Expression Construction

UnionA or BL(A | B) = { a | a L(A) OR a L(B) }

Example a | b

{“a”, “b”}

Page 12: Regular Expressions & Automata

Regular Expression Construction

ClosureZero or more AL(A*) = { a | a = OR a L(A) OR a L(A)L(A)… }

Examplea*

{, “a”, “aa”, “aaa”, “aaaa” …}(ab)*c

{“c”, “abc”, “ababc”, “abababc”…}

Page 13: Regular Expressions & Automata

Regular Expressions in Java

Java supports regular expressions In java.util.regex.*Applies to String class in Java 1.4

Introduces additional specification methodsSimplifies specificationDoes not increase power of regular expressionsCan simulate with concatenation, union, closure

Page 14: Regular Expressions & Automata

Regular Expressions in Java

Concatenationab “ab”(ab)c “abc”

Union ( bar | or square brackets [ ] )a | b “a”, “b”[abc] “a”, “b”, “c”

Closure (star *)(ab)* , “ab”, “abab”, “ababab” …[ab]* , “a”, “b”, “aa”, “ab”, “ba”, “bb” …

Page 15: Regular Expressions & Automata

Regular Expressions in Java

One or more (plus +)(a)+ One or more “a”s

Range (dash –)[a–z] Any lowercase letters[0–9] Any digit

Complement (caret ^ at beginning of RE)[^a] Any symbol except “a”[^a–z] Any symbol except lowercase letters

Page 16: Regular Expressions & Automata

Regular Expressions in Java

PrecedenceHigher precedence operators take effect first

Precedence orderParentheses ( … )Closure a* b+Concatenation abUnion a | bRange [ … ]

Page 17: Regular Expressions & Automata

Regular Expressions in Java

Examplesab+ “ab”, “abb”, “abbb”, “abbbb”…(ab)+ “ab”, “abab”, “ababab”, …ab | cd “ab”, “cd”a(b | c)d “abd”, “acd”[abc]d “ad”, “bd”, “cd”

When in doubt, use parentheses

Page 18: Regular Expressions & Automata

Regular Expressions in Java

Predefined character classes[.] Any character except end of line[\d] Digit: [0-9][\D] Non-digit: [^0-9][\s] Whitespace character: [ \t\n\x0B\f\r][\S] Non-whitespace character: [^\s][\w] Word character: [a-zA-Z_0-9][\W] Non-word character: [^\w]

Page 19: Regular Expressions & Automata

Regular Expressions in Java

Literals using backslash \Need two backslashJava compiler will interpret 1st backslash for String

Examples\\] “]”\\. “.”\\\\ “\”

4 backslashes interpreted as \\ by Java compiler

Page 20: Regular Expressions & Automata

Using Regular Expressions in Java

1. Compile patternimport java.util.regex.*;Pattern p = Pattern.compile("[a-z]+");

2. Create matcher for specific piece of text Matcher m = p.matcher("Now is the time");

3. Search textBoolean found = m.find();

Returns true if pattern is found in text

Page 21: Regular Expressions & Automata

Using Regular Expressions in Java

If pattern is found in textm.group() string found m.start() index of the first character matchedm.end() index after last character matchedm.group() is same as substring(m.start(), m.end())

Calling m.find() againStarts search after end of current pattern matchIf no more matches, return to beginning of string

Page 22: Regular Expressions & Automata

Complete Java ExampleCode

Outputow – is – the – time –

import java.util.regex.*;public class RegexTest { public static void main(String args[]) { Pattern p = Pattern.compile(“[a-z]+”); Matcher m = p.matcher(“Now is the time”); while (m.find()) { System.out.print(m.group() + “ – ”);} } }

Page 23: Regular Expressions & Automata

Language Recognition

Accept string if and only if in language Abstract representation of computationPerforming language recognition can be

SimpleStrings with even number of 1’s

HardStrings representing legal Java programs

Impossible!Strings representing nonterminating Java programs

Page 24: Regular Expressions & Automata

Automata

Simple abstract computersCan be used to recognize languagesFinite state machine

States + transitions

Turing machineStates + transitions + tape

Page 25: Regular Expressions & Automata

Finite State Machine

StatesStartingAcceptingFinite number allowed

TransitionsState to stateLabeled by symbol

L(M) = { w | w ends in a 1}

q1 q2

0

10 1

Start State

Accept Statea

Page 26: Regular Expressions & Automata

Finite State Machine

Operations Move along transitions based on symbol Accept string if ends up in accept stateReject string if ends up in non-accepting state

q1 q2

0

10 1

“011” Accept

“10” Reject

Page 27: Regular Expressions & Automata

Finite State Machine

PropertiesPowerful enough to recognize regular expressionsIn fact, finite state machine regular expression

Languages recognized by

finite state machines

Languages recognized by

regular expressions

1-to-1 mapping

Page 28: Regular Expressions & Automata

Turing Machine

Defined by Alan Turing in 1936Finite state machine + tapeTape

Infinite storageRead / write one symbol at tape headMove tape head one space left / right

Tape Head

… …q1 q2

0

10 1

Page 29: Regular Expressions & Automata

Turing Machine

Allowable actionsRead symbol from current squareWrite symbol to current squareMove tape head leftMove tape head rightGo to next state

Page 30: Regular Expressions & Automata

Turing Machine

* 1 0 0 1 0 *… …

Current State

Current Content

Value to Write

Direction to Move

New state to enter

START * * Left MOVING

MOVING 1 0 Left MOVING

MOVING 0 1 Left MOVING

MOVING * * No move HALT

Tape Head

Page 31: Regular Expressions & Automata

Turing Machine

OperationsRead symbol on current squareSelect action based on symbol & current stateAccept string if in accept stateReject string if halts in non-accepting stateReject string if computation does not terminate

Halting problemIt is undecidable in general whether long-running computations will eventually accept

Page 32: Regular Expressions & Automata

Computability

ComputabilityA language is computable if it can be recognized by some algorithm with finite number of steps

Church-Turing thesisTuring machine can recognize any language computable on any machine

IntuitionTuring machine captures essence of computing

Program (finite state machine) + Memory (tape)