context free grammar. introduction why do we want to learn about context free grammars? used in...
TRANSCRIPT
Context Free Grammar
Introduction
Why do we want to learn about Context Free
Grammars?
Used in many parsers in compilers
Yet another compiler-compiler, yacc
GNU project parser generator, bison
Web application
In describing the formats of XML (eXtensible Markup
Language) documents
Context Free Grammar
G = (V, T, P, S)
V – a set of variables, e.g. {S, A, B, C, D, E}
T – a set of terminals, e.g. {a, b, c}
P – a set of productions rules
In the form of A , where AV, (VT)* e.g. S aB
S is a special variable called the start symbol
CFG – Example 1
Construct a CFG for the following language :
L1={0n1n | n>0}
Thought: If is a string in L1, so is 01, i.e.
0k+11k+1 = 0(0k1k)1
S 01 | 0S1
CFG – Example 2
Construct a CFG for the following language :
L2={0i1j | ij and i, j>0}
Thought: Similar to L1, but at least one more ‘1’ or
at least one more ‘0’
S AC | CB
A 0A | 0
B B1 | 1
C 0C1 |
CFG – Example 3
Determine the language generated by the CFG:
S AS |
A A1 | 0A1 | 01
S generates consecutive ‘A’s
A generates 0i1j where i j
Languages: each block of ‘0’s is followed by at least
as many ‘1’s
Derivation
How a string is generated by a grammar G
Consider the grammar G
S AS |
A A1 | 0A1 | 01
S 0110011 ?
Derivation
A S
S
A SA 1
0 1 0 1A
0 1
S AS
A1S
011S
011AS
0110A1S
0110011S
0110011
S AS
AAS
AA
A0A1
A0011
A10011
0110011
Parse Tree orDerivation Tree Leftmost
DerivationRightmost Derivation
Ambiguity
Each parse tree has one unique leftmost
derivation and one unique rightmost derivation.
A grammar is ambiguous if some strings in it
have more than one parse trees, i.e., it has
more than one leftmost derivations (or more
than one rightmost derivations).
Ambiguity
Consider the following grammar G: S AS | a | b A SS | ab A string generated by this grammar can
have more than one parse trees. Consider the string abb:
ba
A S
S S b
S
A S
a b b
S
Ambiguity
As another example, consider the following grammar:
E E + E | E * E | (E) | x | y | z
There are 2 leftmost derivations for x + y + z: E E + E
x + E x + E + E x + y + E x + y + z
E E + E E + E + E x + E + E x + y + E x + y + z
E + E
x y + z
E
E + E
x y
E
z+
Simplification of CFG
Remove useless variables Generating Variables
Reachable Variables
Remove -productions, e.g. A
Remove unit-productions, e.g. AB
A variable X is useless if:
• X does not generate any string of terminals, or
• The start symbol, S, cannot generate X.
S X
where , (VT)*, X V and T*
Useless Variables
Removal of Non-generating Variables
1 Mark each production of the form:
X where T*
2 Repeat
Mark X where consists of terminals or variables
which are on the left hand side of some marked
productions.
Until no new productions is marked.
3 Remove all unmarked productions.
Example
CFG:
S AB|CAA a
B BC|AB C aB|b
S CA A a
C b
Example
CFG:
S aAa|aBA aS|bDB aBa|bC abb|DDD aDa
S aAa|aB A aS
B aBa|bC abb
Removal of Non-Reachable Variables
1 Mark each production of the form:
S where (VT)*
2 Repeat
Mark X where X appears on the right
hand side of some marked productions.
Until no new productions is marked.
3 Remove all unmarked productions.
Example
CFG:S aAa|aBA aSB aBa|bC abb
S aAa|aB A aS
B aBa|b
Example
CFG:S Aab|ABA aB bD | bBC A | BD a
S Aab|AB A a
B bD | bBD a
Remove Useless Variables
A Bac|bTC|baT aB|TBB aTB|bBCC TBc|aBC|ac
A ba C ac
A ba
Removal of -productions
Step 1: Find nullable variables
Step 2: Remove all -productions by replacing
some productions
How to find nullable variables ?
1 Mark all variables A for which there exists
a production of the form A .
2 Repeat
Mark X for which there exists X where V*
and all symbols in have been marked.
Until no new variables is marked.
Removal of -Productions
We can remove all -productions (except S
if S is nullable) by rewriting some other
productions.
e.g., If X1 and X3 are nullable, we should
replace
A X1X2X3X4 by:
A X1X2X3X4 | X2X3X4 | X1X2X4 | X2X4
Removal of -productions
Example:S A | B | CA aAa | BB bB | bbC aCaa | DD baD | abD |
Step1: nullable variables are D, C and S
Removal of -productions(3)
Step 2:eliminate D by replacing:
D baD | abD into D baD | abD | ba | ab
eliminate C by replacing: C aCaa | D into C aCaa | D | aaa S A | B | C into S A | B | C |
The new grammar: S A | B | C | A aAa | B B bB | bb C aCaa | D | aaa D baD | abD | ba | ab
Example:S A | B | CA aAa | BB bB | bbC aCaa | DD baD | abD |
Example
S ABCDA a B C ED | D BCE b
S ABCD|ABC|ABD|ACD|AB|AC|AD|AA aC ED|ED BC|B|CE b
Removal of Unit Productions
Unit production: A B, where A and B V
1. Detect cycles of unit productions:
A1 A2 A3 … A1
Replaced A1, A2, A3 , …, Ak by any one of them
2. Detect paths of unit productions:
A1 A2 A3 …, Ak , where (VT)*/V
Add productions A1 , A2 , …, Ak and
remove all the unit productions
Removal of Unit Productions
Example:S A | B | C A aa | BB bb | CC cc | A
Cycle: A B C ARemove by replace B,C by
A
S A | A | AA aa | AA bb | AA cc | A
Becomes:S A A aa | bb | cc
Path: S A Remove by adding
productions
S aa | bb | cc
Example
CFG:S → A | Bb A → C | a B → aBa | b C → aSa
S → Bb | aSa |a → A → a | aSa
B → aBa | b C → aSa
Example
bbB
AB
BA
aA
aAS
Substitute
BA
bbB
BAB
aA
aBaAS
|
|
Remove
bbB
BAB
aA
aBaAS
|
|
bbB
AB
aA
aBaAS
|
BB
XX Unit productions of form can be removed immediately
Example
Example
Substitute
ABbbB
aA
aAaBaAS
||
bbB
AB
aA
aBaAS
|
Example
Remove repeated productions
bbB
aA
aBaAS
|
bbB
aA
aAaBaAS
||
Final grammar
Thank You