
Post on 19-Dec-2015


Courtesy Costas Buch - RPI 1

Simplifications of Context-Free Grammars

A Substitution Rule

S -> aB
A -> aaA
A -> abBc
B -> aA
B -> b

Substitute B -> b

Equivalent grammar:

S -> aB | ab
A -> aaA
A -> abBc | abbc
B -> aA

Substitute B -> aA

Equivalent grammar:

S -> aB | ab | aaA
A -> aaA
A -> abBc | abbc | abaAc

In general:

A -> xBz
B -> y_1

Substitute B -> y_1

Equivalent grammar:

A -> xBz | xy_1z
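The substitution rule can be sketched in Python. This is a minimal illustration, assuming a grammar is stored as a dict from a variable to a list of right-hand-side strings with one character per symbol. The name `substitute` and this representation are assumptions made for the example, not part of the slides. The sketch also assumes the substituted right-hand side does not contain the variable itself.

```python
from itertools import product


def substitute(grammar, var, rhs):
    """Drop the production var -> rhs and, for every right-hand side that
    mentions var, add the variants where occurrences of var become rhs."""
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if head == var and body == rhs:
                continue  # the substituted production itself is removed
            # Each occurrence of var may either stay or be replaced by rhs.
            choices = [(s, rhs) if s == var else (s,) for s in body]
            for variant in product(*choices):
                candidate = "".join(variant)
                if candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
after_b = substitute(grammar, "B", "b")
after_aA = substitute(after_b, "B", "aA")
```

Applied to the example grammar, the first call reproduces the grammar obtained after substituting B -> b, and the second call the grammar obtained after substituting B -> aA.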

Nullable Variables

ε-production: A -> ε

Nullable Variable: A => ... => ε

Removing Nullable Variables

Example Grammar:

S -> aMb
M -> aMb
M -> ε

Nullable variable: M

Substitute M -> ε

Final Grammar:

S -> aMb
S -> ab
M -> aMb
M -> ab
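The removal of nullable variables can be sketched as follows. This is a minimal sketch under assumptions: the grammar is a dict from a variable to a list of right-hand-side strings, one character per symbol, with the empty string standing for ε. The function names are hypothetical. If the start variable is nullable, the resulting grammar no longer produces ε.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables that can derive the empty string (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body, or a body made only of nullable variables.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_nullable(grammar):
    """Add every variant with nullable variables omitted, then drop ε-bodies."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for variant in product(*choices):
                candidate = "".join(variant)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


example = {"S": ["aMb"], "M": ["aMb", ""]}
final = remove_nullable(example)
```

On the example grammar this yields S -> aMb | ab and M -> aMb | ab, the final grammar shown above.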

Unit-Productions

Unit Production: A -> B

(a single variable on both sides)

Removing Unit Productions

Observation:

A -> A is removed immediately

Example Grammar:

S -> aA
A -> a
A -> B
B -> A
B -> bb

Substitute A -> B

S -> aA | aB
A -> a
B -> A | B
B -> bb

Remove B -> B

S -> aA | aB
A -> a
B -> A
B -> bb

Substitute B -> A

S -> aA | aB | aA
A -> a
B -> bb

Remove repeated productions

Final grammar:

S -> aA | aB
A -> a
B -> bb
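Unit productions can also be removed in one pass. The sketch below does not follow the substitution sequence of the slides. It swaps in the common closure-based formulation: for each variable, collect the variables reachable through unit productions only, then copy their non-unit right-hand sides. The dict-of-strings representation and the function name are assumptions. The result is equivalent to the final grammar above but not identical in form.

```python
def remove_unit_productions(grammar):
    """Closure-based removal of unit productions (A -> B, B a variable)."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body in variables

    result = {}
    for head in grammar:
        # Variables reachable from head using unit productions only.
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # Copy the non-unit bodies of every reachable variable.
        bodies = []
        for var in grammar:  # grammar order keeps the output deterministic
            if var in reachable:
                for body in grammar[var]:
                    if not is_unit(body) and body not in bodies:
                        bodies.append(body)
        result[head] = bodies
    return result


example = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
without_units = remove_unit_productions(example)
```

For the example grammar this gives S -> aA, A -> a | bb, B -> a | bb. It generates the same two strings (aa and abb) as the final grammar of the slides. B is no longer reachable from S, so a later useless-production pass would delete it.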

Useless Productions

S -> aSb
S -> ε
S -> A
A -> aA    (useless production)

Some derivations never terminate...

S => A => aA => aaA => ... => aa...aA => ...

Another grammar:

S -> A
A -> aA
A -> ε
B -> bA    (useless production: B is not reachable from S)

In general:

If S =>* xAy =>* w with w in L(G), then variable A is useful; otherwise, variable A is useless.

(w contains only terminals)

A production A -> x is useless if any of its variables is useless

Productions                 Variables
S -> aSb
S -> ε
S -> A       useless
A -> aA      useless        A useless
B -> C       useless        B useless
C -> D       useless        C useless

Removing Useless Productions

Example Grammar:

S -> aS | A | C
A -> a
B -> aa
C -> aCb

First: find all variables that can produce strings with only terminals

Round 1: {A, B}
Round 2: {A, B, S}    (because S -> A)

Keep only the variables that produce terminal symbols: {A, B, S}
(the remaining variables are useless)

Remove useless productions:

S -> aS | A
A -> a
B -> aa

Second: find all variables reachable from S

Use a Dependency Graph:

S --> A        B (not reachable)

Keep only the variables reachable from S
(the remaining variables are useless)

Remove useless productions

Final Grammar:

S -> aS | A
A -> a
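Both steps can be sketched together in Python. The sketch assumes a dict from a variable to a list of right-hand-side strings, with uppercase letters as variables and every other character as a terminal. The function name `remove_useless` is hypothetical.

```python
def remove_useless(grammar, start="S"):
    """Step 1: keep variables that derive a terminal string.
    Step 2: keep variables reachable from the start variable."""

    def only_known(body, known):
        # True if every variable in body is already in `known`.
        return all(s in known or not s.isupper() for s in body)

    # Step 1: fixed-point computation, one round per pass of the loop.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                only_known(b, generating) for b in bodies
            ):
                generating.add(head)
                changed = True
    pruned = {
        head: [b for b in bodies if only_known(b, generating)]
        for head, bodies in grammar.items()
        if head in generating
    }

    # Step 2: walk the dependency graph from the start variable.
    reachable, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for body in pruned.get(current, []):
            for symbol in body:
                if symbol.isupper() and symbol not in reachable:
                    reachable.add(symbol)
                    stack.append(symbol)
    return {h: bodies for h, bodies in pruned.items() if h in reachable}


example = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
final = remove_useless(example)
```

The order matters: the reachability step runs on the grammar that has already lost its non-generating variables, as in the slides.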


Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

Normal Forms for Context-free Grammars

Chomsky Normal Form

Each production has the form:

A -> BC    (B and C are variables)

or

A -> a     (a is a terminal)

Examples:

Chomsky Normal Form:

S -> AS
S -> a
A -> SA
A -> b

Not Chomsky Normal Form:

S -> AS
S -> AAS
A -> SA
A -> aa

Conversion to Chomsky Normal Form

Example (initial grammar, not in Chomsky Normal Form):

S -> ABa
A -> aab
B -> Ac

Introduce variables for terminals: T_a, T_b, T_c

S -> A B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c

Introduce intermediate variable V_1:

S -> A V_1
V_1 -> B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c

Introduce intermediate variable V_2:

S -> A V_1
V_1 -> B T_a
A -> T_a V_2
V_2 -> T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c

This last grammar is the final grammar in Chomsky Normal Form. It is equivalent to the initial grammar S -> ABa, A -> aab, B -> Ac.

In general:

From any context-free grammar (which doesn't produce ε) not in Chomsky Normal Form, we can obtain an equivalent grammar in Chomsky Normal Form

The Procedure

First remove:

Nullable variables

Unit productions

Then, for every symbol a:

New variable: T_a

Add production T_a -> a

In productions whose right-hand side has length at least 2: replace a with T_a

Replace any production A -> C_1 C_2 ... C_n

with

A -> C_1 V_1
V_1 -> C_2 V_2
...
V_(n-2) -> C_(n-1) C_n

New intermediate variables: V_1, V_2, ..., V_(n-2)
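The procedure can be sketched in Python under some assumptions. The grammar is a dict from a variable to a list of right-hand sides, each a tuple of symbols, so that multi-character names such as T_a fit. Every symbol that is not a key of the dict counts as a terminal. Nullable variables and unit productions are assumed to be removed already, and the generated names T_x and V_n are assumed not to clash with existing variables. The function name `to_cnf` is hypothetical.

```python
def to_cnf(grammar):
    """Convert a grammar without ε- and unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    terminal_vars = {}  # terminal -> its new variable T_x

    # Step 1: in bodies of length >= 2, replace each terminal a by T_a.
    step1 = {}
    for head, bodies in grammar.items():
        step1[head] = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(
                    s if s in variables else terminal_vars.setdefault(s, "T_" + s)
                    for s in body
                )
            step1[head].append(body)
    for terminal, var in terminal_vars.items():
        step1[var] = [(terminal,)]

    # Step 2: split bodies longer than 2 using intermediate variables V_1, V_2, ...
    result, counter = {}, 0
    for head, bodies in step1.items():
        result.setdefault(head, [])
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                new_var = "V_" + str(counter)
                result.setdefault(current, []).append((body[0], new_var))
                current, body = new_var, body[1:]
            result.setdefault(current, []).append(body)
    return result


example = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
cnf = to_cnf(example)
```

Run on the example S -> ABa, A -> aab, B -> Ac, the sketch produces the same productions as the worked conversion, including the names V_1 and V_2.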

Theorem: For any context-free grammar (which doesn't produce ε) there is an equivalent grammar in Chomsky Normal Form

Observations

• Chomsky normal forms are good for parsing and proving theorems

• It is very easy to find the Chomsky normal form for any context-free grammar

Greibach Normal Form

All productions have the form:

A -> a V_1 V_2 ... V_k,    k >= 0

(a is a terminal symbol; V_1, ..., V_k are variables)

Examples:

Greibach Normal Form:

S -> cAB
A -> aA | bB | b
B -> b

Not Greibach Normal Form:

S -> abSb
S -> aa

Conversion to Greibach Normal Form:

S -> abSb
S -> aa

becomes

S -> a T_b S T_b
S -> a T_a
T_a -> a
T_b -> b

Theorem: For any context-free grammar (which doesn't produce ε) there is an equivalent grammar in Greibach Normal Form

Observations

• Greibach normal forms are very good for parsing

• It is hard to find the Greibach normal form of any context-free grammar

Compilers

Program -> Compiler -> Machine Code

Program:

v = 5;
if (v>5)
  x = 12 + v;
while (x !=3) {
  x = x - 3;
  v = 10;
}
......

Machine Code:

Add v,v,0
cmp v,5
jmplt ELSE
THEN: add x, 12,v
ELSE:
WHILE:
cmp x,3
...

Lex

Lex: a lexical analyzer

• A Lex program recognizes strings

• For each kind of string found the lex program takes an action

Input:

Var = 12 + 9;
if (test > 20)
  temp = 0;
else
  while (a < 20)
    temp++;

Output of the Lex program:

Identifier: Var
Operator: =
Integer: 12
Operator: +
Integer: 9
Semicolon: ;
Keyword: if
Parenthesis: (
Identifier: test
....

In Lex strings are described with regular expressions

Regular expressions in a Lex program:

"if"
"then"                       /* keywords */

"+"
"-"
"="                          /* operators */

(0|1|2|3|4|5|6|7|8|9)+       /* integers */

(a|b|..|z|A|B|...|Z)+        /* identifiers */

Shorthand:

integers:     (0|1|2|3|4|5|6|7|8|9)+    is written    [0-9]+

identifiers:  (a|b|..|z|A|B|...|Z)+     is written    [a-zA-Z]+

Each regular expression has an associated action (in C code)

Examples:

Regular expression    Action
\n                    linenum++;
[a-zA-Z]+             printf("identifier");
[0-9]+                printf("integer");

Default action: ECHO;

Prints the string identified to the output

A small lex program

%%
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
[ \t\n]      ; /*skip spaces*/

Input:

1234 test
var 566 78
9800

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer

Another program

%{
int linenum = 1;
%}
%%
[ \t]        ; /*skip spaces*/
\n           linenum++;
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
.            printf("Error in line: %d\n", linenum);

Input:

1234 test
var 566 78
9800 +
temp

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer
Error in line: 3
Identifier
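The behaviour of the second Lex program can be imitated in Python with the standard `re` module. This is only a sketch of what the generated scanner does, not how Lex works internally. The function name `scan` and the group names are assumptions made for the example.

```python
import re

# One alternative per Lex rule; named groups tell us which rule matched.
TOKEN_RE = re.compile(
    r"(?P<ident>[a-zA-Z]+)"
    r"|(?P<integer>[0-9]+)"
    r"|(?P<space>[ \t])"
    r"|(?P<newline>\n)"
    r"|(?P<other>.)"
)


def scan(text):
    """Return the lines the Lex program would print for the given input."""
    linenum = 1
    output = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "ident":
            output.append("Identifier")
        elif kind == "integer":
            output.append("Integer")
        elif kind == "newline":
            linenum += 1
        elif kind == "other":
            output.append(f"Error in line: {linenum}")
        # kind == "space": skipped, like the empty action in the Lex program
    return output


lines = scan("1234 test\nvar 566 78\n9800 +\ntemp\n")
```

For the input shown above, `lines` holds the same eight output lines, with the error reported for line 3.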

Lex matches the longest input string

Example:

Regular Expressions: "if", "ifend"

Input: ifend if

Matches: "ifend" "if"
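The longest-match rule can be sketched in Python by trying every pattern at the current position and keeping the longest result. The function name `tokenize_longest` is hypothetical, and treating whitespace purely as a separator is an assumption of this sketch.

```python
import re


def tokenize_longest(patterns, text):
    """Split text into tokens, taking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # assumption: whitespace only separates tokens
            continue
        best_end = pos
        for pattern in patterns:
            match = re.compile(pattern).match(text, pos)
            # Strict ">" keeps the earlier pattern when two matches tie.
            if match and match.end() > best_end:
                best_end = match.end()
        if best_end == pos:
            raise ValueError(f"no pattern matches at position {pos}")
        tokens.append(text[pos:best_end])
        pos = best_end
    return tokens


tokens = tokenize_longest(["if", "ifend"], "ifend if")
```

Although "if" is listed first and matches the start of the input, the longer match "ifend" wins, so the tokens are "ifend" followed by "if".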

Internal Structure of Lex

Lex: Regular expressions -> NFA -> DFA -> Minimal DFA

The final states of the DFA are associated with actions

Compiler: program (input) -> Lexical analyzer -> parser -> machine code (output)

A parser knows the grammar of the programming language

Parser grammar:

PROGRAM -> STMT_LIST
STMT_LIST -> STMT; STMT_LIST | STMT;
STMT -> EXPR | IF_STMT | WHILE_STMT | { STMT_LIST }

EXPR -> EXPR + EXPR | EXPR - EXPR | ID
IF_STMT -> if (EXPR) then STMT | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT

The parser finds the derivation of a particular input

Grammar known to the parser:

E -> E + E | E * E | INT

Input: 10 + 2 * 5

Derivation:

E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5

Derivation tree:

        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5

Machine code generated from the derivation tree:

mult a, 2, 5
add b, 10, a
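How the machine code follows from the derivation tree can be sketched with a post-order walk. The sketch assumes the tree is already reduced to nested tuples of the form (operator, left, right), and that temporaries are named a, b, c, and so on. The function name `emit` is hypothetical.

```python
def emit(tree):
    """Generate three-address instructions for an expression tree."""
    code = []
    names = iter("abcdefghijklmnopqrstuvwxyz")  # names for temporaries
    opcodes = {"+": "add", "*": "mult"}

    def visit(node):
        if not isinstance(node, tuple):
            return str(node)  # a leaf: an integer constant
        operator, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        target = next(names)
        code.append(f"{opcodes[operator]} {target}, {left_name}, {right_name}")
        return target

    visit(tree)
    return code


# Tree for 10 + 2 * 5, with the multiplication below the addition.
instructions = emit(("+", 10, ("*", 2, 5)))
```

Because the subtrees are visited before their parent, the multiplication is emitted first and its result feeds the addition, as in the machine code above.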

Parsing

Parser: takes a grammar and an input string, and produces a derivation

Example:

Grammar: S -> SS | aSb | bSa | ε

Input: aabb

Derivation: ?

Exhaustive Search

S -> SS | aSb | bSa | ε

Find derivation of aabb

Phase 1: all possible derivations of length 1

S => SS
S => aSb
S => bSa    (cannot lead to aabb, discarded)
S => ε      (cannot lead to aabb, discarded)

Phase 2: extend the derivations kept in Phase 1

S => SS => SSS
S => SS => aSbS
S => SS => bSaS     (discarded)
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb
S => aSb => abSab   (discarded)
S => aSb => ab      (discarded)

Phase 3:

S => aSb => aaSbb => aabb

Final result of exhaustive search (top-down parsing):

Input: aabb

Derivation: S => aSb => aaSbb => aabb
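The phases above can be sketched as a breadth-first search over sentential forms. This is one possible arrangement, not the slides' exact bookkeeping. It always expands the leftmost variable and discards forms whose terminal prefix already disagrees with the input or that contain too many terminals. The limit `max_phases` guarantees termination in the presence of ε-productions. All names are hypothetical; uppercase letters are variables.

```python
from collections import deque


def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter), or None."""
    for i, symbol in enumerate(form):
        if symbol.isupper():
            return i
    return None


def viable(form, target):
    """Keep a form only if it could still derive the target."""
    index = leftmost_variable(form)
    if index is None:
        return False  # all terminals, but not equal to the target
    terminals = sum(1 for s in form if not s.isupper())
    return terminals <= len(target) and target.startswith(form[:index])


def exhaustive_parse(grammar, target, start="S", max_phases=10):
    """Return a leftmost derivation of target as a list of forms, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        if len(derivation) - 1 >= max_phases:
            continue
        form = derivation[-1]
        index = leftmost_variable(form)
        for body in grammar[form[index]]:
            new_form = form[:index] + body + form[index + 1:]
            if new_form == target:
                return derivation + [new_form]
            if new_form not in seen and viable(new_form, target):
                seen.add(new_form)
                queue.append(derivation + [new_form])
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}
derivation = exhaustive_parse(grammar, "aabb")
```

For the input aabb the search returns the three-step derivation found in Phase 3.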

Time complexity of exhaustive search

Suppose there are no productions of the form

A -> ε
A -> B

Number of phases for string w: 2|w|

For grammar with k rules:

Time for phase 1: k    (k possible derivations)

Time for phase 2: k^2    (k^2 possible derivations)

Time for phase 2|w|: k^(2|w|)    (k^(2|w|) possible derivations)

Total time needed for string w:

k + k^2 + ... + k^(2|w|)

(phase 1, phase 2, ..., phase 2|w|)

Extremely bad!!!

There exist faster algorithms for specialized grammars

S-grammar: A -> ax

(a is a symbol, x is a string of variables)

Each pair (A, a) appears once

S-grammar example:

S -> aS
S -> bSS
S -> c

Derivation of abcc:

S => aS => abSS => abcS => abcc

Each string has a unique derivation

For S-grammars:

In the exhaustive search parsing there is only one choice in each phase

Total time for parsing string w: |w|

Time for a phase: 1
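Since each pair (A, a) appears at most once, an S-grammar can be stored as a lookup table and parsed with one lookup per input symbol. The sketch below assumes the rules are given as a dict from (variable, terminal) to the string of variables that follows the terminal. The function name `parse_s_grammar` is hypothetical.

```python
def parse_s_grammar(rules, text, start="S"):
    """Return the unique leftmost derivation of text, or None if none exists."""
    pending = start  # variables still to be expanded, leftmost first
    derivation = [start]
    for i, symbol in enumerate(text):
        if not pending:
            return None  # input remains but nothing is left to expand
        body = rules.get((pending[0], symbol))
        if body is None:
            return None  # no production for this (variable, terminal) pair
        pending = body + pending[1:]
        derivation.append(text[: i + 1] + pending)
    return derivation if not pending else None


# S -> aS, S -> bSS, S -> c
rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
steps = parse_s_grammar(rules, "abcc")
```

Each input symbol triggers exactly one step, so the number of steps equals the length of the string.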

For general context-free grammars:

There exists a parsing algorithm that parses a string w in time |w|^3
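The slides do not name the algorithm. One well-known algorithm with a cubic bound in |w| is CYK, which assumes the grammar is in Chomsky Normal Form. The sketch below uses it as an illustration. Right-hand sides are tuples of symbols, and the function name `cyk` is an assumption.

```python
def cyk(grammar, word, start="S"):
    """Decide membership of word for a grammar in Chomsky Normal Form."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined here cannot produce ε
    # table[i][length] = variables deriving word[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        for head, bodies in grammar.items():
            if (symbol,) in bodies:
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length].add(head)
    return start in table[0][n]


# CNF grammar equivalent to S -> ABa, A -> aab, B -> Ac
cnf = {
    "S": [("A", "V_1")],
    "V_1": [("B", "T_a")],
    "A": [("T_a", "V_2")],
    "V_2": [("T_a", "T_b")],
    "B": [("A", "T_c")],
    "T_a": [("a",)],
    "T_b": [("b",)],
    "T_c": [("c",)],
}
accepted = cyk(cnf, "aabaabca")
```

The three nested loops over length, start position, and split point give the |w|^3 factor; the grammar size contributes a constant factor for a fixed grammar.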