Simplifications of Context-Free Grammars
Courtesy Costas Buch - RPI
Post on 19-Dec-2015
A Substitution Rule

Grammar:
S -> aB
A -> aaA
A -> abBc
B -> aA
B -> b

Substitute B -> b

Equivalent grammar:
S -> aB | ab
A -> aaA
A -> abBc | abbc
B -> aA
A Substitution Rule (continued)

Grammar:
S -> aB | ab
A -> aaA
A -> abBc | abbc
B -> aA

Substitute B -> aA

Equivalent grammar:
S -> aB | ab | aaA
A -> aaA
A -> abBc | abbc | abaAc
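The substitution step can be sketched in Python. This is a minimal illustration, not part of the original slides: the name `substitute` and the dictionary representation (one character per symbol, upper case for variables) are assumptions, and the sketch assumes the substituted variable is not the start symbol and does not occur in the substituted body.

```python
from itertools import product


def substitute(grammar, var, body):
    """Substitute the production var -> body into all other productions.

    grammar maps each variable to a list of bodies (strings, one char per symbol).
    Assumption: var is not the start symbol and does not occur in body.
    """
    assert var not in body, "sketch only covers bodies that do not contain var"
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for b in bodies:
            if head == var and b == body:
                continue  # the substituted production itself is dropped
            # every occurrence of var is either kept or replaced by body
            options = [(s, body) if s == var else (s,) for s in b]
            for choice in product(*options):
                variant = "".join(choice)
                if variant not in new_bodies:
                    new_bodies.append(variant)
        result[head] = new_bodies
    return result


grammar = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
step1 = substitute(grammar, "B", "b")
print(step1)  # S -> aB | ab, A -> aaA | abBc | abbc, B -> aA
```

Applying `substitute(step1, "B", "aA")` then adds S -> aaA and A -> abaAc, as in the second substitution above.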
Removing Nullable Variables

Example Grammar:
S -> aMb
M -> aMb
M -> λ

Nullable variable: M
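A possible way to compute the nullable variables and remove them is sketched below. The function names and the grammar encoding (the empty string stands for λ) are assumptions made for illustration; note that if the start variable is nullable, the resulting grammar no longer produces λ.

```python
from itertools import product


def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # a body made only of nullable symbols (or the empty body) is nullable
            if any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_nullable(grammar):
    """Drop lambda-productions and add variants with nullable symbols omitted."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for b in bodies:
            options = [(s, "") if s in nullable else (s,) for s in b]
            for choice in product(*options):
                variant = "".join(choice)
                if variant and variant not in new_bodies:
                    new_bodies.append(variant)
        result[head] = new_bodies
    return result


grammar = {"S": ["aMb"], "M": ["aMb", ""]}
print(nullable_variables(grammar))  # {'M'}
print(remove_nullable(grammar))     # S -> aMb | ab, M -> aMb | ab
```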
Remove repeated productions

S -> aA | aB | aA
A -> a
B -> bb

Final grammar:
S -> aA | aB
A -> a
B -> bb
Useless Productions

S -> aSb
S -> λ
S -> A
A -> aA    (useless production)

Some derivations never terminate...
S => A => aA => aaA => ... => aa...aA => ...
In general:
if S =>* xAy =>* w, where w ∈ L(G) contains only terminals,
then variable A is useful
otherwise, variable A is useless
A production A -> x is useless if any of its variables is useless

Productions:
S -> aSb
S -> λ
S -> A     (useless)
A -> aA    (useless)
B -> C     (useless)
C -> D     (useless)

Variables marked useless: A, B, C
First: find all variables that can produce strings with only terminals

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Round 1: {A, B}
Round 2: {A, B, S}    (using S -> A)
Keep only the variables that produce terminal symbols: {A, B, S}
(the remaining variables are useless)

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Remove useless productions:
S -> aS | A
A -> a
B -> aa
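The first step can be sketched as a fixed-point computation. The names `generating_variables` and `keep_variables` and the encoding (upper-case letters are variables, everything else is a terminal) are assumptions for illustration. Unlike the rounds above, the loop may add several variables in one pass, but it ends with the same set.

```python
def generating_variables(grammar):
    """Variables that can produce a string consisting only of terminals."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            # a body qualifies if each symbol is a terminal or already generating
            if any(all(not s.isupper() or s in generating for s in b)
                   for b in bodies):
                generating.add(head)
                changed = True
    return generating


def keep_variables(grammar, keep):
    """Remove every production that mentions a variable outside `keep`."""
    return {
        head: [b for b in bodies
               if all(not s.isupper() or s in keep for s in b)]
        for head, bodies in grammar.items()
        if head in keep
    }


grammar = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
useful = generating_variables(grammar)
print(sorted(useful))                   # ['A', 'B', 'S']
print(keep_variables(grammar, useful))  # S -> aS | A, A -> a, B -> aa
```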
Second: find all variables reachable from S

S -> aS | A
A -> a
B -> aa

Use a Dependency Graph:
S --> A        B  (not reachable)
Keep only the variables reachable from S
(the remaining variables are useless)

S -> aS | A
A -> a
B -> aa

Remove useless productions

Final Grammar:
S -> aS | A
A -> a
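The reachability step amounts to a traversal of the dependency graph. A minimal sketch follows; the function names and the grammar encoding (upper-case letters are variables) are assumptions, not part of the slides.

```python
def reachable_variables(grammar, start="S"):
    """Variables reachable from the start variable in the dependency graph."""
    reachable = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in grammar.get(head, []):
            for symbol in body:
                # an edge head -> symbol exists for every variable in a body
                if symbol.isupper() and symbol not in reachable:
                    reachable.add(symbol)
                    stack.append(symbol)
    return reachable


def keep_reachable(grammar, start="S"):
    """Drop the productions of variables that cannot be reached from start."""
    reachable = reachable_variables(grammar, start)
    return {head: bodies for head, bodies in grammar.items() if head in reachable}


grammar = {"S": ["aS", "A"], "A": ["a"], "B": ["aa"]}
print(sorted(reachable_variables(grammar)))  # ['A', 'S']
print(keep_reachable(grammar))               # S -> aS | A, A -> a
```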
Removing All
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
Chomsky Normal Form

Each production has the form:
A -> BC    (variable variable)
or
A -> a     (terminal)
Examples:

Chomsky Normal Form:
S -> AS
S -> a
A -> SA
A -> b

Not Chomsky Normal Form:
S -> AS
S -> AAS
A -> SA
A -> aa
Conversion to Chomsky Normal Form

Example:
S -> ABa
A -> aab
B -> Ac
(Not Chomsky Normal Form)
Introduce variables for terminals: T_a, T_b, T_c

S -> ABa
A -> aab
B -> Ac

becomes

S -> A B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c
Introduce intermediate variable V_1:

S -> A B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c

becomes

S -> A V_1
V_1 -> B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c
Introduce intermediate variable V_2:

S -> A V_1
V_1 -> B T_a
A -> T_a T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c

becomes

S -> A V_1
V_1 -> B T_a
A -> T_a V_2
V_2 -> T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c
Initial grammar:
S -> ABa
A -> aab
B -> Ac

Final grammar in Chomsky Normal Form:
S -> A V_1
V_1 -> B T_a
A -> T_a V_2
V_2 -> T_a T_b
B -> A T_c
T_a -> a
T_b -> b
T_c -> c
In general:
From any context-free grammar (which doesn't produce λ) not in Chomsky Normal Form
we can obtain an equivalent grammar in Chomsky Normal Form
Then, for every symbol a:
New variable: T_a
Add production T_a -> a
In productions: replace a with T_a
Replace any production A -> C_1 C_2 ... C_n
with
A -> C_1 V_1
V_1 -> C_2 V_2
...
V_{n-2} -> C_{n-1} C_n

New intermediate variables: V_1, V_2, ..., V_{n-2}
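The two replacement steps can be sketched together in Python. This is an illustration under stated assumptions: the function name `to_cnf` is made up, bodies are lists of single-character symbols (lower case for terminals), the grammar has no λ-productions or unit productions, and the generated names T_x and V_i are assumed not to clash with existing variables.

```python
def to_cnf(grammar):
    """Convert a grammar without lambda- and unit-productions to CNF."""
    cnf = {}
    used_terminals = []
    counter = 0
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # A -> a is already in Chomsky Normal Form
                cnf.setdefault(head, []).append(list(body))
                continue
            # step 1: replace every terminal a by the new variable T_a
            symbols = []
            for s in body:
                if s.islower():
                    if s not in used_terminals:
                        used_terminals.append(s)
                    symbols.append("T_" + s)
                else:
                    symbols.append(s)
            # step 2: split long bodies using intermediate variables V_i
            current = head
            while len(symbols) > 2:
                counter += 1
                new_var = f"V_{counter}"
                cnf.setdefault(current, []).append([symbols[0], new_var])
                current, symbols = new_var, symbols[1:]
            cnf.setdefault(current, []).append(symbols)
    for t in used_terminals:
        cnf["T_" + t] = [[t]]
    return cnf


grammar = {"S": [["A", "B", "a"]], "A": [["a", "a", "b"]], "B": [["A", "c"]]}
for head, bodies in to_cnf(grammar).items():
    for body in bodies:
        print(head, "->", " ".join(body))
```

On the example grammar S -> ABa, A -> aab, B -> Ac this reproduces the productions of the worked conversion, with V_1 introduced for S and V_2 for A.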
Theorem: For any context-free grammar (which doesn't produce λ) there is an equivalent grammar in Chomsky Normal Form
Observations
• Chomsky normal forms are good for parsing and proving theorems
• It is very easy to find the Chomsky normal form for any context-free grammar
Greibach Normal Form

All productions have the form:
A -> a V_1 V_2 ... V_k,   k >= 0
where a is a symbol and V_1, ..., V_k are variables
Examples:

Greibach Normal Form:
S -> cAB
A -> aA | bB | b
B -> b

Not Greibach Normal Form:
S -> abSb
S -> aa
Conversion to Greibach Normal Form:

S -> abSb
S -> aa

becomes

S -> a T_b S T_b
S -> a T_a
T_a -> a
T_b -> b
(Greibach Normal Form)
Theorem: For any context-free grammar (which doesn't produce λ) there is an equivalent grammar in Greibach Normal Form
Observations
• Greibach normal forms are very good for parsing
• It is hard to find the Greibach normal form of any context-free grammar
Compiler

Program -> Compiler -> Machine Code

Program:
v = 5;
if (v>5)
  x = 12 + v;
while (x != 3) {
  x = x - 3;
  v = 10;
}
......

Machine Code:
Add v,v,0
cmp v,5
jmplt ELSE
THEN: add x, 12,v
ELSE:
WHILE: cmp x,3
...
Lex: a lexical analyzer
• A Lex program recognizes strings
• For each kind of string found the lex program takes an action
Input to the Lex program:
Var = 12 + 9;
if (test > 20)
  temp = 0;
else
  while (a < 20)
    temp++;

Output of the Lex program:
Identifier: Var
Operator: =
Integer: 12
Operator: +
Integer: 9
Semicolon: ;
Keyword: if
Parenthesis: (
Identifier: test
....
In Lex strings are described with regular expressions

Regular expressions in a Lex program:
/* keywords */
"if"
"then"
/* operators */
"+"
"-"
"="
Regular expressions in a Lex program:
(0|1|2|3|4|5|6|7|8|9)+     /* integers */
(a|b|..|z|A|B|...|Z)+      /* identifiers */
Each regular expression has an associated action (in C code)

Examples:
Regular expression    Action
\n                    linenum++;
[a-zA-Z]+             printf("identifier");
[0-9]+                printf("integer");
A small lex program

%%
[a-zA-Z]+   printf("Identifier\n");
[0-9]+      printf("Integer\n");
[ \t\n]     ; /* skip spaces */
Input:
1234 test
var 566 78
9800

Output:
Integer
Identifier
Identifier
Integer
Integer
Integer
Another program

%{
int linenum = 1;
%}
%%
[ \t]       ; /* skip spaces */
\n          linenum++;
[a-zA-Z]+   printf("Identifier\n");
[0-9]+      printf("Integer\n");
.           printf("Error in line: %d\n", linenum);
Input:
1234 test
var 566 78
9800 +
temp

Output:
Integer
Identifier
Identifier
Integer
Integer
Integer
Error in line: 3
Identifier
Lex matches the longest input string

Example:
Regular Expressions: "if" "ifend"
Input: ifend if
Matches: "ifend" "if"
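The longest-match rule can be imitated in a few lines of Python. This is only a simulation for illustration, not how Lex is implemented: the rule names and the `tokenize` function are assumptions, and each pattern is tried separately because Python's `re` alternation picks the first alternative that matches rather than the longest one.

```python
import re

# (rule name, regular expression); on equal length the earlier rule wins
RULES = [
    ("IF", r"if"),
    ("IFEND", r"ifend"),
    ("SPACE", r"[ \t\n]+"),
]


def tokenize(text, rules=RULES):
    """Split text into tokens, always taking the longest possible match."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    position, tokens = 0, []
    while position < len(text):
        best_name, best_length = None, 0
        for name, regex in compiled:
            match = regex.match(text, position)
            if match and len(match.group()) > best_length:
                best_name, best_length = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {position}")
        if best_name != "SPACE":
            tokens.append((best_name, text[position:position + best_length]))
        position += best_length
    return tokens


print(tokenize("ifend if"))  # [('IFEND', 'ifend'), ('IF', 'if')]
```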
Internal Structure of Lex

Lex: Regular expressions -> NFA -> DFA -> Minimal DFA

The final states of the DFA are associated with actions
Parser

PROGRAM -> STMT_LIST
STMT_LIST -> STMT; STMT_LIST | STMT;
STMT -> EXPR | IF_STMT | WHILE_STMT | { STMT_LIST }
EXPR -> EXPR + EXPR | EXPR - EXPR | ID
IF_STMT -> if (EXPR) then STMT | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT
The parser finds the derivation of a particular input

Input: 10 + 2 * 5
Parser grammar: E -> E + E | E * E | INT
Derivation: E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5
Derivation: E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5

Derivation tree:
        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5
Derivation tree:
        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5

Machine code:
mult a, 2, 5
add b, 10, a
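One way to obtain such code from the tree is a post-order walk that emits one instruction per operator node. The sketch below is an assumption-laden illustration: the tuple encoding of the tree, the names `emit` and `compile_expression`, and the choice of temporaries a, b, c, ... are not from the slides.

```python
OPCODES = {"+": "add", "*": "mult"}


def emit(tree, code, names):
    """Post-order walk: emit code for both operands, then for the operator."""
    if not isinstance(tree, tuple):
        return str(tree)  # a leaf (integer) is used directly as an operand
    operator, left, right = tree
    left_operand = emit(left, code, names)
    right_operand = emit(right, code, names)
    target = next(names)  # fresh temporary that holds the result
    code.append(f"{OPCODES[operator]} {target}, {left_operand}, {right_operand}")
    return target


def compile_expression(tree):
    code = []
    emit(tree, code, iter("abcdefghijklmnopqrstuvwxyz"))
    return code


# derivation tree of 10 + 2 * 5, written as (operator, left, right)
tree = ("+", 10, ("*", 2, 5))
print("\n".join(compile_expression(tree)))
# mult a, 2, 5
# add b, 10, a
```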
Exhaustive Search

Grammar: S -> SS | aSb | bSa | λ
Find derivation of aabb

Phase 1: all possible derivations of length 1
S => SS
S => aSb
S => bSa
S => λ
Grammar: S -> SS | aSb | bSa | λ
Target: aabb

Phase 1 (derivations kept):
S => SS
S => aSb

Phase 2:
S => SS => SSS
S => SS => aSbS
S => SS => bSaS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb
S => aSb => abSab
S => aSb => ab
Grammar: S -> SS | aSb | bSa | λ
Target: aabb

Phase 2 (derivations kept):
S => SS => SSS
S => SS => aSbS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb

Phase 3:
S => aSb => aaSbb => aabb
Final result of exhaustive search (top-down parsing)

Input: aabb
Parser grammar: S -> SS | aSb | bSa | λ
Derivation: S => aSb => aaSbb => aabb
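The phases above can be sketched as a breadth-first search over leftmost derivations. The code is a minimal illustration under assumptions: the names `exhaustive_parse` and `viable`, the pruning tests, and the `max_phases` cut-off (needed because this grammar has a λ-production, so the search would otherwise not stop for strings outside the language) are not part of the slides.

```python
def viable(form, target):
    """Cheap tests that discard sentential forms which cannot yield target."""
    terminals = [s for s in form if not s.isupper()]
    if len(terminals) > len(target):
        return False                      # too many terminals already
    if len(terminals) == len(form):
        return False                      # no variables left, and not the target
    first_variable = next(i for i, s in enumerate(form) if s.isupper())
    return target.startswith(form[:first_variable])


def exhaustive_parse(grammar, target, start="S", max_phases=10):
    """Return a leftmost derivation of target as a list of forms, or None."""
    frontier = [[start]]                  # each entry is one derivation so far
    seen = {start}
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            i = next(k for k, s in enumerate(form) if s.isupper())
            for body in grammar[form[i]]:
                new_form = form[:i] + body + form[i + 1:]
                if new_form == target:
                    return derivation + [new_form]
                if new_form in seen or not viable(new_form, target):
                    continue
                seen.add(new_form)
                next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}   # "" stands for lambda
print(" => ".join(exhaustive_parse(grammar, "aabb")))
# S => aSb => aaSbb => aabb
```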
Time complexity of exhaustive search

Suppose there are no productions of the form
A -> λ
A -> B

Number of phases for string w: at most 2|w|
Total time needed for string w (k: number of productions in the grammar):
k + k^2 + ... + k^(2|w|)
(phase 1: k, phase 2: k^2, ..., phase 2|w|: k^(2|w|))

Extremely bad!!!
There exist faster algorithms for specialized grammars

S-grammar: every production has the form A -> ax
where a is a symbol and x is a string of variables

Pair (A, a) appears once
S-grammar example:
S -> aS
S -> bSS
S -> c

Derivation of abcc:
S => aS => abSS => abcS => abcc

Each string has a unique derivation
For S-grammars:
In the exhaustive search parsing there is only one choice in each phase

Time for a phase: 1
Total time for parsing string w: |w|
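Because each pair (A, a) selects at most one production, parsing an S-grammar needs one table lookup per input symbol. A minimal sketch follows; the name `parse_s_grammar` and the table encoding keyed by (variable, symbol) are assumptions for illustration.

```python
def parse_s_grammar(rules, text, start="S"):
    """Return the leftmost derivation of text as a list of forms, or None.

    rules maps (variable, terminal) to the string of variables x
    of the unique production variable -> terminal x.
    """
    stack = [start]                 # pending variables, leftmost one on top
    derivation = [start]
    for position, symbol in enumerate(text):
        if not stack:
            return None             # input left over but nothing to expand
        variable = stack.pop()
        body = rules.get((variable, symbol))
        if body is None:
            return None             # no production for this pair
        stack.extend(reversed(body))
        derivation.append(text[:position + 1] + "".join(reversed(stack)))
    return derivation if not stack else None


# S -> aS, S -> bSS, S -> c
rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(" => ".join(parse_s_grammar(rules, "abcc")))
# S => aS => abSS => abcS => abcc
```

Each input symbol is handled by a single step, which matches the total time |w| stated above.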