general overview of compiler - sumit...
General Overview of Compiler
Compiler: - It is a complex program that converts a program written in a high level
programming language (source code) into machine readable code.
Interpreter: - It performs the same task as a compiler, but in a line by line
fashion.
Assembler: - It converts assembly level instructions into machine level
instructions in binary code.
Translator: - It is a program which converts a program written in one language into
another language. A compiler is also a translator.
Compiler and its Stages or Phases of Compiler or The structure of a
Compiler
Up to this point we have treated a compiler as a single box that maps a
source program into a semantically equivalent target program. If we open up
this box a little, we see that there are two parts to this mapping: analysis
and synthesis. The analysis part breaks up the
source program into constituent pieces and imposes a grammatical
structure on them. It also collects information about the
source program and stores it in a data structure called the symbol table.
The synthesis part constructs the desired target program from the
intermediate representation and the information in the symbol table. The
analysis part is often called the front end of the compiler and the synthesis
part is the back end of the compiler.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
(Phases 1 to 3: Front end of compiler)
4. Intermediate code generation
5. Code optimization
6. Code generation
(Phases 4 to 6: Back end of compiler)
[Figure: Phases of a compiler. Character stream (input file) → Lexical Analysis → token stream → Syntax Analysis → syntax tree → Semantic Analysis → syntax tree → Intermediate Code Generator → intermediate representation → Code Optimizer → intermediate representation → Code Generator → target-machine code → Code Optimizer → target-machine code (output file). The SYMBOL TABLE and the ERROR HANDLER are shared by all phases.]
The above figure shows the phases of a compiler. Now we are
going to give a general description of each phase.
1. Lexical Analysis:-
It is a scanner which scans the input characters one by one, from left to right, and
groups them into lexemes. For each lexeme it produces a token that describes it.
E.g.
Position := Initial + Rate * 60
In this example:
Id1 = Position
:= = Assignment Operator
Id2 = Initial
+ = Addition Operator
Id3 = Rate
* = Multiplication Operator
60 = A number
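The scanning step can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler does it: the token names (NUMBER, ID, ASSIGN, PLUS, TIMES) and the function name tokenize are assumptions made for this example. Identifiers are numbered id1, id2, ... in order of first appearance, as in the example above.

```python
import re

# Token classes for the example statement; the names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Scan text left to right and return a list of (token, lexeme) pairs."""
    tokens, symbol_table, pos = [], {}, 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":              # white space is dropped
            continue
        if kind == "ID":
            # Number identifiers in order of first appearance: id1, id2, ...
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append((f"id{index}", lexeme))
        else:
            tokens.append((kind, lexeme))
    return tokens

for token in tokenize("Position := Initial + Rate * 60"):
    print(token)
```

The dictionary symbol_table plays the role of a very small symbol table: it only remembers which number each identifier received.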
2. Syntax Analysis (parser):-
This phase validates the syntax of the expression. For this purpose, we construct
a syntax tree or parse tree.
E.g. Make a tree for the given equation.
c := a + b
Note: - Priority of symbols: ( ) > (*, /) > (+, -) > (= or :=)

      :=
     /  \
    c    +
        / \
       a   b

The above tree is for c := a + b. The following tree is
for Position := Initial + Rate * 60

          :=
         /  \
  Position   +
            / \
     Initial   *
              / \
          Rate   60
3. Semantic Analysis :-
This phase checks that the data types and the context of each statement follow the rules
of the programming language. Where needed, it inserts a type conversion, for example
converting the integer 60 to a real number when Rate is real.
Output of the above tree by the semantic analysis:-
id1 = id2 + id3 * 60
4. Intermediate Code Generation : -
In this phase, we construct TAC (Three Address Code) by using temporary
variables (temporary registers). In TAC, each statement uses a maximum of three operands and
a minimum of two, and a maximum of two operators including the necessary assignment
operator.
The tree after semantic analysis, with 60 converted from int to real:

        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   Int to Real
                   |
                   60

1) t1 = a + b (three operands, two operators: the TAC condition applies)
2) t2 = a (two operands: a copy into a temporary register)
Example:-
id1 = id2 + id3 * 60
t1 = 60.0                t1 = id3
t2 = id3 * t1     OR     t2 = t1 * 60.0
t3 = id2 + t2            t3 = id2 + t2
id1 = t3                 id1 = t3
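As a rough sketch of how three address code can be produced from a syntax tree, the Python function below walks the tree bottom up and emits one statement per operator. The tuple layout (operator, left, right) and the name gen_tac are assumptions made for this illustration. The 60 is written as 60.0, i.e. the int to real conversion is taken as already done.

```python
import itertools

def gen_tac(node, code, temps):
    """Return the name that holds node's value, appending TAC lines to code."""
    if isinstance(node, str):            # leaf: identifier or constant
        return node
    op, left, right = node
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    if op == ":=":                       # assignment needs no temporary
        code.append(f"{left_name} = {right_name}")
        return left_name
    temp = f"t{next(temps)}"             # fresh temporary: t1, t2, ...
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp

# Syntax tree for id1 := id2 + id3 * 60.0
tree = (":=", "id1", ("+", "id2", ("*", "id3", "60.0")))
code = []
gen_tac(tree, code, itertools.count(1))
print("\n".join(code))
```

Each emitted line has at most three operands and at most two operators (counting the assignment), which is the TAC condition stated above.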
5. Code Optimization:-
It is a technique where we modify, re-arrange or minimize the
intermediate code sequence for better utilization of memory and to increase the
speed of execution without changing the meaning of the original code.
Example: -
t1 = id3 * 60.0
id1 = id2 + t1
6. Code Generation: -
The code generator takes as input the intermediate representation of the
source program and maps it into the target language. If the target language
is machine code, registers or memory locations are selected for each of the
variables used by the program.
For example: Using registers R0 and R1, the intermediate code given below
might get translated into the following machine code.
t1 = id3 * 60.0
id1 = id2 + t1
Operation Name   Operation From   Operation To   Comments
MOV              id3              R0             id3 is in R0
MUL              60.0             R0             t1 is in R0
MOV              id2              R1             id2 is in R1
ADD              R0               R1             id1 is in R1, R0 is empty
MOV              R1               id1            R1 is empty after its value is stored in id1
***********************************************************
-
Basic Concepts
1) The scanning work is also known as lexical analysis.
2) Mike Lesk and Eric Schmidt were the authors of Lex, the lexical analyzer generator.
3) The strings matched during lexical analysis are called lexemes.
If c = a + b
Then c, a and b are lexemes.
4) Our eye is a good example of lexical analysis because it first scans the thing and
then identifies it.
5) The program which is used for lexical analysis is called a Lex Program.
6) A token is a class (collection) of lexemes.
c = a + b
Where a, b and c are all identifiers
7) The lexemes are of three types:-
1. Static
2. Dynamic
3. Variable
8) Example of white spaces:-
endl, \n, extra spaces etc.
9) Regular expressions are a predefined syntax for writing patterns.
10) Some basic formulas:-
1. a* = Kleene closure = {ε, a, aa, aaa, aaaa, aaaaa, ...}
2. a+ = Positive closure = {a, aa, aaa, aaaa, aaaaa, ...}
3. (a + b)* = {ε, a, aa, aaaab, ba, aba, ...}
Note:- This is called the regular set of (a + b)*
4. Id = letter(letter/digit)*
Note:- It implies that the first position of any id is always a letter.
5. <> is the sign of not equals in programming language.
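The closures and the identifier rule can be tried out with Python's re module. This is only a sketch: the constant names are made up for this example, and Python writes the OR operation as | where these notes write a + b or a/b.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # letter(letter/digit)*
A_STAR = re.compile(r"a*")                         # Kleene closure of a
A_PLUS = re.compile(r"a+")                         # positive closure of a
A_OR_B_STAR = re.compile(r"(a|b)*")                # (a + b)* in the notes

def matches(pattern, text):
    """True when the whole of text belongs to the regular set of pattern."""
    return pattern.fullmatch(text) is not None

for name in ["c12", "c12a", "21"]:
    print(name, matches(IDENTIFIER, name))
```

The empty string is accepted by a* but rejected by a+, which is the only difference between the two closures.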
***********************************************************
-
Some More Basic Concepts
1) Yacc (Yet Another Compiler Compiler). It is used to help the syntax analysis phase
make the tree after lexical analysis.
2) a*, a+, ab, a+b, abb, (a+b)*, a/b, (a,b), aUb are all regular expressions.
AND Operation (a followed by b): ab, a.b, a∩b
OR Operation (a or b): a/b, (a,b), aUb, a+b
a* = Kleene Closure
(a+b)* = Universal closure expression or universal regular expression.
= {ε, a, aa, aba, bab, abb, ...}
3) letter(letter/digit)*, it means, we can write:
c12 = a + b or
c12a = a + b
But we cannot write the following one:
21 = a + b
4) Lex is the super scanner of lexical analysis.
5) In the following figure a, b and c are known as lexeme values.
6) yylval stands for yy lexical value.
It means the lexeme value is going directly to yacc.
Install_id( ): Install and forward the id or identifier.

      :=
     /  \
    c    +
        / \
       a   b
Overview of Finite State Automata
1. It is used in lexical analysis or scanning phase of compiler.
2. It is used to implement statement or regular expression of any programming
language. Finite Automata and regular expressions are acting as foundation of
lexical analysis.
3. Automata:- It is an automatic machine developed or designed by a developer,
programmer or manufacturer to complete any desired task.
E.g. Automobiles, calculators, computers, home appliances, super computers,
microwave technologies, generators etc.
4. Finite Automata or Finite State Machine:-
4.1. A machine which has a finite number of states is called a finite
automaton or finite state machine.
4.2. Formal Definition:- It consists of five tuples:-
M = {Q, Σ, δ, q0, F}
Where
M = Machine
Q = Non empty set of all states.
  = {q0, q1, q2, q3, ..., qn}
Σ = Non empty set of input values.
  = {a, b, c, 1, 2, 3, *, /, (, ), ...}
δ = Input transition function represented by a transition table
    or by a transition diagram.
q0 = Default initial state.
F = Set of final states.
E.g. An example of transition table (Rotation of a fan).
State      Switch   Rotation (next state)
OFF        ON       100rpm
100rpm     ON       200rpm
200rpm     ON       300rpm
300rpm     ON       OFF
Note:- Here rpm is rotations per minute.
4.3. Types of Finite Automata:-
There are two types of Finite Automata:-
1. NFA or NDFA
(Nondeterministic Finite Automata)
2. DFA (Deterministic Finite Automata)
4.4. Technical definition :-
1. DFA:-
δ: (Q X Σ) → Q
2. NFA:-
δ: (Q X Σ) → 2^Q
Where 2^Q is the power set of all the states (multiple outputs)
E.g. If A = {a, b, c}
Then 2^A = {∅, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}
Implementation of Lex Program with DFA
Rules:-
1. For the given lex program, check the regular definitions or regular expressions
associated with it.
2. Construct an NFA for each individual regular expression and define its initial
state and final state.
3. Give a unique name to each state.
4. Assume a common initial state and connect it with all NFAs by using
ε-transitions.
5. Draw the empty DFA transition table and initialize it with the joint initial
state combination.
6. Construct new output states by checking input values and apply DFA
construction rules accordingly.
7. Check the input patterns associated with each state and enter the
matched pattern value in the pattern announced column of the DFA table. In
this way, the lex program will be implemented by a DFA with associated
patterns.
Question:-
%
{ C declaration (empty) }
%
{ regular definition }
a
abb
a*b+
%
{ translation rules (empty)}
%
Solution:-
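1) NFA for a
start → (1) --a--> ((2))
2) NFA for abb
start → (3) --a--> (4) --b--> (5) --b--> ((6))
3) NFA for a*b+
start → (7) --b--> ((8)), with a loop on a at state 7 and a loop on b at state 8
Here (( )) marks a final state.
4) Now according to rule number 4, combine all NFAs with the help of ε.
(0) --ε--> (1)
(0) --ε--> (3)
(0) --ε--> (7)
State 0 is the common initial state.
Note: - In the following table, the Pattern announced are the common
ways to reach the input state.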
5) Transition table :-
State       a        b        Pattern announced
[0137]      [247]    [8]      No pattern
[247]       [7]      [58]     a
[8]         -        [8]      a*b+
[7]         [7]      [8]      a*
[58]        -        [68]     ab
[68]        -        [8]      abb
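The table can be run directly as a DFA. The sketch below is an illustration under stated assumptions: the names DFA, FINAL, pattern_for and longest_match are invented here, and a DFA state is taken to recognise the pattern of any NFA final state it contains (2 for a, 6 for abb, 8 for a*b+), with the pattern listed first in the lex program winning a tie. It returns the longest prefix of the input that matches some pattern.

```python
# DFA states are the sets of NFA states from the transition table above,
# written as strings of single-digit state numbers.
DFA = {
    "0137": {"a": "247", "b": "8"},
    "247":  {"a": "7",   "b": "58"},
    "8":    {"b": "8"},
    "7":    {"a": "7",   "b": "8"},
    "58":   {"b": "68"},
    "68":   {"b": "8"},
}
# NFA final states and their patterns, in lex program order (assumption:
# the earlier pattern wins when a DFA state contains two final states).
FINAL = [("2", "a"), ("6", "abb"), ("8", "a*b+")]

def pattern_for(state):
    """Pattern recognised in a DFA state, or None if it is not accepting."""
    for nfa_state, pattern in FINAL:
        if nfa_state in state:
            return pattern
    return None

def longest_match(text):
    """Return (lexeme, pattern) for the longest matching prefix of text."""
    state, best = "0137", None
    for i, ch in enumerate(text):
        state = DFA[state].get(ch)
        if state is None:                # no transition: stop scanning
            break
        pattern = pattern_for(state)
        if pattern is not None:
            best = (text[: i + 1], pattern)
    return best

for text in ["a", "abb", "aab", "abbb"]:
    print(text, longest_match(text))
```

For the input abb the scanner passes through [247], [58] and [68] and reports abb, because state 6 (final state of abb) is checked before state 8.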
***********************************************************
Grammar & Language
1. Language:- It is an alphabetic, numeric and symbolic way of
representation by which we form words, and by arranging words in a
meaningful sequence we form sentences. Sentences are helpful to
establish a communication link or interaction between two machines, two
humans, or a human being and a machine.
E.g. Programming languages (C, C++, and JAVA), frameworks (Dot Net), general
languages (English, Hindi, Urdu, Marathi, Telugu etc.), interfaces and drivers.
2. Grammar: - A set of rules that defines a language so that the communication will
be meaningful.
e.g. #include cannot be written as include#.h. Every
programming language has to follow some language and grammar rules.
-
Formal definition of Grammar
It consists of four tuples:-
G = {V, T, P, S}
Where
V = Non empty set of variables or non-terminals.
  = {A, B, C, D, E, ..., Z}
T = Non empty set of terminals.
  = {a, b, c, d, e, ..., z}
P = Non empty set of production rules.
S = Default starting production variable.
We can understand with the following example.
Example:-
S → aB / bA
A → d
B → g
Note: - S → aB / bA can also be written as S → aB & S → bA separately.
Derivation: - The process of producing a string value from the grammar productions is
known as derivation.
Acceptability: - If a string can be generated from the starting production variable, then the
string is accepted by the grammar.
There are two types of Derivation:-
1. LMD (Left Most Derivation):- It is used in the top down parsing approach. To
generate any string, if we open the left most non-terminal before the other non-terminals,
then, it is an LMD. This technique is based on backtracking.
E.g.:
S → ABC
Where A → a
B → b
C → c
Then according to the rule of LMD, we can solve the above expression as follows
S ⇒ ABC
⇒ aBC
⇒ abC
⇒ abc
Note: In the given expression S → ABC, S is called the left hand side or derivative part and
ABC is called the right hand side or derived part
2. RMD (Right Most Derivation):- It is used in the bottom up parsing approach.
Whenever we open the right most non-terminal before the others to generate any string,
then, it is an RMD. This technique is also based on backtracking.
E.g.:
S → ABC
Where A → a
B → b
C → c
Then according to the rule of RMD, we can solve the above expression as follows
S ⇒ ABC
⇒ ABc
⇒ Abc
⇒ abc
Derivation Tree:- The step by step expansion process of any string can be
expressed by a tree known as a derivation tree or parse tree.
E.g.:
S → ABC
Where A → a
B → b
C → c
The derivation tree of the above expression can be made as follows.

      S
    / | \
   A  B  C
   |  |  |
   a  b  c
Question: - Generate the following strings:-
(1) id + id * id
(2) (id + id) * id
By the grammar as follows:-
E → E + T / T
T → T * F / F
F → (E) / id
Solve the above by LMD and RMD.
Solution:-
(1) id + id * id
Solve by LMD :-
First we take the production E → E + T and solve it by LMD
E ⇒ E + T
⇒ T + T
⇒ F + T
⇒ id + T
⇒ id + T * F
⇒ id + F * F
⇒ id + id * F
⇒ id + id * id
Now take the other production, E → T
E ⇒ T
⇒ F
⇒ (E)
This case is not possible because the brackets ( ) are not there in the
string id + id * id. Now, consider the following case.
E ⇒ T
⇒ T * F
⇒ F * F
⇒ id * F
Now, this case is also not possible because the * sign comes first here after
id, whereas the string has + after the first id.
Now solve by RMD :-
E ⇒ E + T
⇒ E + T * F
⇒ E + T * id
⇒ E + F * id
⇒ E + id * id
⇒ T + id * id
⇒ F + id * id
⇒ id + id * id
E → T is also not possible here in RMD.
(2) (id + id) * id
Solve by LMD :-
First we take the production E → E + T and try to solve it by LMD
E ⇒ E + T
This production is not possible in this case, because the + of the string is inside the
brackets. So, without wasting our time, we go to the further case.
E ⇒ T
⇒ T * F
⇒ F * F
⇒ (E) * F
⇒ (E + T) * F
⇒ (T + T) * F
⇒ (F + T) * F
⇒ (id + T) * F
⇒ (id + F) * F
⇒ (id + id) * F
⇒ (id + id) * id
Solved by RMD :-
E → E + T is not possible here.
E ⇒ T
⇒ T * F
⇒ T * id
⇒ F * id
⇒ (E) * id
⇒ (E + T) * id
⇒ (E + F) * id
⇒ (E + id) * id
⇒ (T + id) * id
⇒ (F + id) * id
⇒ (id + id) * id
If you want to make a derivation tree for the above derivations, then I will make a tree
for you as an example. I am going to make a tree for the LMD of E → T for the string
(id + id) * id, which is given as follows:-
LMD of E → T

              E
              |
              T
            / | \
           T  *  F
           |     |
           F     id
         / | \
        (  E  )
         / | \
        E  +  T
        |     |
        T     F
        |     |
        F     id
        |
        id
-
Left recursion: - Whenever any non-terminal produces itself at the left most position of
a grammar production, then, it is a left recursion.
Example:-
S → Sab / b
Note: - S produces itself at the leftmost part of Sab.
Drawbacks of recursion (also known as repetition):-
1. Ambiguous: a top down parser cannot decide when to stop expanding S.
2. Repetition: the parser can keep expanding S without reading any input.
Example:-
S ⇒ Sab ⇒ Sabab ⇒ Sababab ⇒ ...
Format of Left Recursion Technique:-
If A → Aα / β
Then it is a single left recursion.
Where A ∈ V
α ∈ (V U T)* (any value)
β ∈ (V U T)* (any value)
Note: - Consider the following case:-
S → S + T / T
A → Aα / β
Where S represents A
+ T represents α
T represents β
Multiple Left Recursions: - Whenever any non-terminal produces itself at the left most
position of a grammar production in more than one alternative, then, it is a multiple left
recursion.
Consider an example as follows:-
Example:-
If A → Aα1 / Aα2 / Aα3 / ... / Aαn
And A → β1 / β2 / β3 / β4 / ... / βn
It can also be written in the following manner:-
E → E + T / E F / E * T / a / b
A → Aα1 / Aα2 / Aα3 / β1 / β2
Where E = A
+T = α1
F = α2
* T = α3
a = β1
And b = β2
Removal of left recursion:-
1. Single
A → Aα / β
The removal formula of the above expression is:-
A → βB
B → αB / ε
2. Multiple
A → Aα1 / Aα2 / ... / Aαn / β1 / β2 / ... / βn
Then the removal formula of the above expression is:-
A → β1B / β2B / β3B / β4B / ... / βnB
B → α1B / α2B / α3B / α4B / ... / αnB / ε
Question:- Remove the recursion for the following grammar.
E → E + T / T
-
Solution:-
E → TB
B → +TB / ε
Note :-
When E → E + T / T
Then E → TB
B → +TB / ε
We can check that both grammars produce the same string:-
E ⇒ E + T (Convert E in this equation into E + T)
⇒ E + T + T (Convert E in this equation into T)
⇒ T + T + T
Now take
E ⇒ TB (Convert B here into +TB)
⇒ T + TB (Convert B here into +TB)
⇒ T + T + TB (Convert B here into ε)
⇒ T + T + T
Question:- Remove the recursion.
E → E + T / E * F / a / b / d
Solution:-
E → aB / bB / dB
B → +TB / *FB / ε
Question :- Remove the left recursion.
1. E → E + T / T
2. T → T * F / F
3. F → (E) / id
Solution:-
1. E → TB
B → +TB / ε
2. T → FC
C → *FC / ε
(A new non-terminal C is used here because B already belongs to E.)
3. No left recursion is there.
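The removal formula for direct left recursion can be sketched as a small Python function. The representation is an assumption made for this example: each alternative is a list of symbols, and the caller supplies the name of the new non-terminal. It does not handle indirect left recursion or a β that is itself ε.

```python
def remove_left_recursion(head, alternatives, new_head):
    """Rewrite A -> A a1 / ... / b1 / ... as A -> b1 B / ... and B -> a1 B / ... / ε.

    alternatives is a list of bodies; each body is a list of symbols.
    """
    # Bodies that start with the head are the recursive ones (the α parts).
    alphas = [body[1:] for body in alternatives if body and body[0] == head]
    # All other bodies are the β parts.
    betas = [body for body in alternatives if not body or body[0] != head]
    if not alphas:                       # no left recursion is there
        return {head: alternatives}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [["ε"]],
    }

print(remove_left_recursion("E", [["E", "+", "T"], ["T"]], "B"))
print(remove_left_recursion("E", [["E", "+", "T"], ["E", "*", "F"], ["a"], ["b"], ["d"]], "B"))
print(remove_left_recursion("F", [["(", "E", ")"], ["id"]], "B"))
```

The three calls correspond to the single case, the multiple case and the case with no left recursion.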
Indirect Left Recursion:- Whenever any non-terminal produces itself at the left most position
through another production (indirectly), then, it is an indirect left recursion.
Example:-
S → AA / 0 ...(1)
A → SS / 1 ...(2)
Now put equation (2) in (1), then we get
S → SSA / 1A / 0
We have to take the following steps for the removal of these types of recursions. The
steps are as follows:-
1. Reduce it to a direct left recursion.
2. Removal of it.
Now we are going to apply these steps in the following example.
Example:-
1) Reduce:
S → SSA / 1A / 0
A → SS / 1
2) Remove:
S → 1AB / 0B
B → SAB / ε
A → SS / 1
There is another way to solve this problem, by putting equation (1) in (2), which is given as
follows:-
1) Reduce:
S → AA / 0
A → AAS / 0S / 1
2) Remove:
S → AA / 0
A → 0SB / 1B
B → ASB / ε
Left Factor: - Whenever any value repeats itself at the leftmost position of more than one
alternative of a grammar production, it is a left factored value.
Example:-
S → ab / ac / ad / a / b / g (One common factor i.e. a)
S → aBc / aBd / aB / g (Combination of left common factor i.e. aB)
Note: - There can be one or more than one left factors.
Format for Left Factor:-
A → αβ1 / αβ2 / αβ3 / ... / αβn / γ1 / γ2 / γ3 / ... / γn
Example:-
1. S → aB / aC / d
Then the format is:-
S → aA / d
A → B / C
2. S → aBD / aBG / aB / a / d
Take aB as left factor.
S → aBA / a / d
A → D / G / ε
Now take a as left factor.
S → aC / d
C → BA / ε
A → D / G / ε
Alternatively, take a as left factor first.
S → aA / d
A → BD / BG / B / ε
Now take B as left factor.
S → aA / d
A → BC / ε
C → D / G / ε
Parsing: - It is the technique where we construct a fixed parser record (a parsing table) for a
grammar. By using the parser, we can check the acceptability or rejection of any string by the
grammar for which we constructed the parser. It is a predictive technique.
Note:- LMD and RMD are non-predictive techniques, i.e. techniques with backtracking.
Classification of Parsing:-
The classification of parsing is given below. Because of lack of space, first I will define
the abbreviations used in the classification and then provide the hierarchical diagram
or classification diagram of parsing. The definitions are:-
SLR: (Simple) (Left to right scan) (RMD)
LR (0): (Left to right scan) (RMD) (No look ahead values)
SLR (1): (Simple) (Left to right scan) (RMD) (One look ahead symbol)
LR: (Left to right scan)
CLR: (Canonical) (Left to right scan) (RMD)
LR (1): (Left to right scan) (RMD) (One set of look ahead values)
LALR: (Look Ahead) (Left to right scan) (RMD)
LALR (1): (Look Ahead) (Left to right scan) (RMD) (One look ahead symbol)
Note:-Canonical = One shape with different names.
LALR = One shape with different names but are merged together to form a single
entity.
Now the classification diagram is given as follows:-
Parsing
  Top Down Parser (LMD)
    With backtracking
      Recursive descent parser
    Without backtracking
      Non-recursive parser or table driven parser [LL (1)]
  Bottom Up Parser (RMD)
    Shift reduce parsing
      Operator precedence parser
      LR parser
        SLR or LR (0) or SLR (1)
        LR or CLR or LR (1)
        LALR or LALR (1) or Merged LR
FIRST & FOLLOW
FIRST and FOLLOW:-
FIRST: It is the first terminal value produced by any non-terminal at the derived side, in all possible ways.
If S → aB
Then FIRST(S) = a
FOLLOW: It is also a terminal value, which appears after any non-terminal at the derived side of a grammar production.
If S → aAd
Then FOLLOW (A) = d
Algorithms for FIRST and FOLLOW:-
Algorithm for FIRST:-
Rules
1. If A → ε is any production, then FIRST (A) = ε
2. It is the first terminal value produced by any non-terminal in all possible ways, which will be discussed in the next lemmas or rules.
3. If A → αβ is any production where A ∈ V, α ∈ T, β ∈ (V U T)*
or in other words A is the derivative, α is a single terminal and β can contain any value,
Then FIRST (A) = α
NOTE: In S → bD, b is α and D is β.
Example:-
If S → aBCDEFGH
Then FIRST (S) = a
4. If A → αβ is any production, where α contains a single non-terminal and never tends to ε anywhere in the grammar, then:-
FIRST (A) = FIRST (α)
Example:-
If S → AB
A → aB
B → d
Then FIRST (S) = FIRST (A) = a
5. If A → αβ is any production, where α contains a single non-terminal and produces ε
anywhere in the grammar, then:-
FIRST (A) = FIRST (α)
But, for the ε possibility, we check next to α and apply rule 1, 3, 4 and 5.
Example:-
S → AB
A → aB / ε
B → d
Non-terminal   FIRST
S              a, d
A              a, ε
B              d
Algorithm for FOLLOW:-
1. A non-terminal for which we calculate the FOLLOW value always appears on the derived
side of a production. The terminal value arriving after the non-terminal will be the FOLLOW value of that non-terminal.
2. Add $ in FOLLOW of the starting production variable directly.
3. If A → αBβ is any production where FOLLOW (B) is to be calculated,
α ∈ (V U T)*, A ∈ V, β ∈ T
Then FOLLOW (B) = β
Example:-
S → aBd
A → aBg
B → bBe
Then FOLLOW (B) = {d, g, e}
4. If A → αBβ is any production where β contains a single non-terminal and never
tends to ε, then:-
FOLLOW (B) = FIRST (β)
Example:-
S → BA
A → aB / bA
Then FOLLOW (B) = a, b = FIRST (A)
5. If A → αBβ is any production where β contains a single non-terminal and
produces ε, then :-
FOLLOW (B) = FIRST (β) without ε, and for the ε possibility, or for A → αB, FOLLOW (B) = FOLLOW (A)
Example:-
S → BA
A → aB / bA / ε
Non-terminal   FOLLOW
S              $
A              $
B              a, b, $
6. If A → αBβ is any production where β contains any value, then FOLLOW of B is
totally dependent on FIRST of β. Apply rules 3, 4, 5 and 6 accordingly after checking FIRST of β. Also check next to β if ε is possible.
Example:-
If S → aBdefgh Then FOLLOW (B) = d
If S → BAefgh Then FOLLOW (B) = FIRST (A) without ε, together with e if A can produce ε
Question:- Find the FIRST and FOLLOW for the following.
E → TE'
E' → +TE' / ε
T → FT'
T' → *FT' / ε
F → (E) / id
Solution:-
Non-terminal   FIRST        FOLLOW
E              {( , id}     {$ , )}
E'             {+ , ε}      {$ , )}
T              {( , id}     {+ , $ , )}
T'             {* , ε}      {+ , $ , )}
F              {( , id}     {* , + , $ , )}
1. FIRST (F) = FIRST (T) = FIRST (E) = {(, id}. To see why, note that the two
productions for F have bodies that start with these two terminal symbols, id and the left parenthesis. T has only one production, and its body starts with F. Since F does not derive ε, FIRST (T) must be the same as FIRST (F). The same argument covers FIRST (E).
2. FIRST (E') = {+, ε}. The reason is that one of the two productions for E' has a body that begins with terminal +, and the other's body is ε. Whenever a nonterminal derives ε, we place ε in FIRST for that nonterminal.
3. FIRST (T') = {*, ε}. The reasoning is analogous to that for FIRST (E').
4. FOLLOW (E) = FOLLOW (E') = {), $}. Since E is the start symbol, FOLLOW (E) must
contain $. The production body (E) explains why the right parenthesis is in FOLLOW (E). For E', note that this nonterminal appears only at the ends of bodies of E-productions. Thus, FOLLOW (E') must be the same as FOLLOW (E).
5. FOLLOW (T) = FOLLOW (T') = {+, ), $}. Notice that T appears in bodies only followed by E'. Thus, everything except ε that is in FIRST (E') must be in FOLLOW (T); that explains the symbol +. However, since FIRST (E') contains ε, and E' is the entire string following T in the bodies of the E-productions, everything in FOLLOW (E) must also be in FOLLOW (T). That explains the symbols $ and the right parenthesis. As for T', since it appears only at the ends of the T-productions, it must be that FOLLOW (T') = FOLLOW (T).
6. FOLLOW (F) = {+, *, ), $}. The reasoning is analogous to that for T in point (5).
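The FIRST and FOLLOW sets above can be checked mechanically. The following Python sketch repeats the rules until nothing changes any more. The dictionary layout of GRAMMAR and the function names first_of, compute_first and compute_follow are assumptions made for this illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(symbols, first):
    """FIRST of a string of grammar symbols (an empty string gives {ε})."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal or ε
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # every symbol in the string can produce ε
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:             # repeat until no set grows any more
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")     # rule 2: $ follows the start variable
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:          # what follows can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```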
***********************************************************
-
LL (1)Parser
Rules:-
1. Remove left recursion and left factors from the given grammar, if available.
2. Calculate FIRST and FOLLOW.
3. Construct the LL (1) parsing table according to the table construction rules.
4. Check the LL (1) parsing table for multiple entries. If found, then, declare that the parser
is NOT an LL (1) parser.
5. Check the acceptability of the string by the LL (1) parsing table.
Practice questions for FIRST and FOLLOW:-
Question: - Calculate the FIRST and FOLLOW for the following:-
S → CC
C → cC / d
Solution:-
Non-terminal   FIRST   FOLLOW
S              c, d    $
C              c, d    c, d, $
Question: - Calculate the FIRST and FOLLOW for the following:-
S → aAB
A → aBd / B
B → bA / ε
Solution:-
Non-terminal   FIRST      FOLLOW     Depends On
S              a          $
A              a, b, ε    b, $, d    B
B              b, ε       $, d, b    A
Question: - Calculate the FIRST and FOLLOW for the following:-
S → aSD / ABC
A → BC / bAC
B → cB / CD / eCf
C → gBA / hDi / D
D → jD / Dk / ε
Solution:-
Non-terminal   FIRST                        FOLLOW                           Depends On
S              a, b, c, e, g, h, j, k, ε    $, j, k
A              b, c, e, g, h, j, k, ε       g, h, j, k, c, e, $, f, b        C
B              c, e, g, h, j, k, ε          g, h, j, k, $, b, c, e, f        A, C
C              g, h, j, k, ε                f, j, k, $, g, h, c, e, b        B, A
D              j, k, ε                      i, k, f, j, $, g, h, c, e, b     B, C
Note: - Put D as ε in D → Dk to get the k in the FIRST (D).
Questions for LL (1):-
Question: - Make LL (1) for the following grammar:-
E → E + T / T
T → T * F / F
F → (E) / id
And strings:-
(1) id + id * id
(2) (id + id) * id
Solution:-
1. Removal of left recursion.
E → TE'
E' → +TE' / ε
T → FT'
T' → *FT' / ε
F → (E) / id
2. Calculate FIRST and FOLLOW.
Non-terminal   FIRST        FOLLOW
E              {( , id}     {$ , )}
E'             {+ , ε}      {$ , )}
T              {( , id}     {+ , $ , )}
T'             {* , ε}      {+ , $ , )}
F              {( , id}     {* , + , $ , )}
3. Arrange the non-terminals row wise & all terminals column wise including $
and excluding ε. (A cell with - is empty.)
Non-terminal   +            *            (           )          id          $
E              -            -            E → TE'     -          E → TE'     -
E'             E' → +TE'    -            -           E' → ε     -           E' → ε
T              -            -            T → FT'     -          T → FT'     -
T'             T' → ε       T' → *FT'    -           T' → ε     -           T' → ε
F              -            -            F → (E)     -          F → id      -
Table entry rules (To fill out the above table):-
1) Enter the FIRST generating production in the row of the FIRST generating non-terminal, in the
column of the FIRST terminal value.
2) Enter the ε production in the row of the derivative, in the columns of FOLLOW of the
derivative.
Example:-
If E' → ε, then
N-T    $         )
E'     E' → ε    E' → ε
Answer: - The above table has no multiple entries, so the grammar is an LL (1) grammar.
4. Parse the string id + id * id with the LL (1) parsing table.
Now what do we do with this table? This table forms one part in a three part data structure. The other two parts are a stack of grammar symbols (E, E', T, T', F, +, *, (, ), id, and $), and an input stream (the expression we want to parse, already tokenized into lexemes by the scanner). We start our stack with the starting non-terminal E here. The top of the stack is written at the right.
Stack        Input             Action
$E           id + id * id $    E → TE'
$E'T         id + id * id $    T → FT'
$E'T'F       id + id * id $    F → id
$E'T'id      id + id * id $    POP id
$E'T'        + id * id $       T' → ε
$E'          + id * id $       E' → +TE'
$E'T+        + id * id $       POP +
$E'T         id * id $         T → FT'
$E'T'F       id * id $         F → id
$E'T'id      id * id $         POP id
$E'T'        * id $            T' → *FT'
$E'T'F*      * id $            POP *
$E'T'F       id $              F → id
$E'T'id      id $              POP id
$E'T'        $                 T' → ε
$E'          $                 E' → ε
$            $                 ACCEPTED
5. Now, parse the string (id + id) * id with the LL (1) parsing table.
Stack             Input               Action
$E                (id + id) * id $    E → TE'
$E'T              (id + id) * id $    T → FT'
$E'T'F            (id + id) * id $    F → (E)
$E'T')E(          (id + id) * id $    POP (
$E'T')E           id + id) * id $     E → TE'
$E'T')E'T         id + id) * id $     T → FT'
$E'T')E'T'F       id + id) * id $     F → id
$E'T')E'T'id      id + id) * id $     POP id
$E'T')E'T'        + id) * id $        T' → ε
$E'T')E'          + id) * id $        E' → +TE'
$E'T')E'T+        + id) * id $        POP +
$E'T')E'T         id) * id $          T → FT'
$E'T')E'T'F       id) * id $          F → id
$E'T')E'T'id      id) * id $          POP id
$E'T')E'T'        ) * id $            T' → ε
$E'T')E'          ) * id $            E' → ε
$E'T')            ) * id $            POP )
$E'T'             * id $              T' → *FT'
$E'T'F*           * id $              POP *
$E'T'F            id $                F → id
$E'T'id           id $                POP id
$E'T'             $                   T' → ε
$E'               $                   E' → ε
$                 $                   ACCEPTED
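The stack moves in the two traces above follow one fixed loop, which can be sketched in Python as below. The names TABLE, NON_TERMINALS and ll1_accepts are assumptions for this example. An ε production is stored as an empty list, and a missing key stands for an empty table cell.

```python
# LL (1) parsing table for the grammar without left recursion.
TABLE = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_accepts(tokens, start="E"):
    """Return True when the token list is accepted by the LL (1) table."""
    stack = ["$", start]              # the top of the stack is the list end
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NON_TERMINALS:
            body = TABLE.get((top, look))
            if body is None:          # empty table cell: reject
                return False
            stack.extend(reversed(body))   # push the body, leftmost on top
        elif top == look:             # POP a matched terminal (or the last $)
            pos += 1
        else:
            return False
    return pos == len(tokens)

print(ll1_accepts(["id", "+", "id", "*", "id"]))
print(ll1_accepts(["(", "id", "+", "id", ")", "*", "id"]))
print(ll1_accepts(["id", "+"]))
```

Pushing the production body in reverse order is what puts the leftmost symbol on top of the stack, matching the way the Stack column is written in the traces.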
***********************************************************
Bottom-Up Parsing
SLR/LR (0)/SLR (1)
Rules:-
1. Calculate FIRST and FOLLOW for given grammar.
2. Numbering of productions.
3. Augmentation of grammar. (Initialization)
4. Construction of LR (0) item set.
5. Construction of LR (0) parsing table.
6. Parsing table entries. (SHIFT, REDUCE, GOTO & ACCEPT)
7. Declaration of parser by checking conflict in parsing table.
8. SHIFT or GOTO or GOTO SHIFT graph. (Optional)
9. Parsing of string or acceptability of any string.
Question: - Construct SLR parser for given grammar and check the acceptability of
ccdd.
S → CC
C → cC / d
Solution:-
1. Calculate FIRST and FOLLOW:-
Non Terminal   FIRST   FOLLOW
S              c, d    $
C              c, d    $, c, d
2. Numbering of productions:-
S → CC ... R1
C → cC ... R2
C → d ... R3
3. Augmentation: - The process where we initialize the starting production variable with
an auxiliary variable.
Example: - Ignition of a matchstick before lighting the gas stove. So, ignition is an
augmentation.
S' → .S
Scanning Rule:-
1) Whenever . (Dot) scans any non-terminal then we write all productions of it
with . (Dot) at the start.
2) Whenever . scans any terminal then we stop for only such possibilities.
Now:-
I0 ; S' → .S
S → .CC
C → .cC
C → .d
Dot Scanning Rules:-
a. Similar scanning always moves together.
b. At a time, only one scanning movement is possible.
c. For a non-terminal, we use the GOTO operation and for a terminal, we use the SHIFT
operation.
d. Whenever any new collection is found, then, declare a new item set name,
otherwise refer to the previous name for it.
Note:-
If
S' → .S
S → .SC
Then after one scanning move.
S' → S.
S → S.C
Now, move on to the question.
4. Construction of LR (0) item set.
I0 ; S GOTO = I1
S' → S.
I0 ; C GOTO = I2
S → C.C
C → .cC
C → .d
I0 ; c SHIFT = I3
C → c.C
C → .cC
C → .d
I0 ; d SHIFT = I4
C → d.
I2 ; C GOTO = I5
S → CC.
I2 ; c SHIFT = I3
I2 ; d SHIFT = I4
I3 ; C GOTO = I6
C → cC.
I3 ; c SHIFT = I3
I3 ; d SHIFT = I4
5. Construction of parsing table:-
a. Arrange all item sets row wise.
b. Arrange all terminals including $ column wise in the ACTION columns.
c. Arrange all non-terminals column wise in the GOTO columns.
Note: - In the following table, S stands for SHIFT moves, R stands for REDUCE moves and - is an empty cell.
Items   ACTION                  GOTO
        c      d      $         S     C
I0      S3     S4     -         1     2
I1      -      -      ACCEPT    -     -
I2      S3     S4     -         -     5
I3      S3     S4     -         -     6
I4      R3     R3     R3        -     -
I5      -      -      R1        -     -
I6      R2     R2     R2        -     -
Declaration:-
There is no conflict in the table (no dual values in a single cell). So, it is an SLR
parser.
Acceptability of String by SLR, LR (1) and LALR:-
Rules:-
1. Draw three columns for STACK, INPUT and ACTION, and do the following:-
a. Enter the entire input string in the input column followed by $.
b. Initialize the stack with $ and the initial item set number.
c. Check the top of the stack with the first input and:-
1) If a SHIFT entry is found then PUSH the first input on top of the stack followed by
the shifting number, then repeat step (c) for the next input with the new top of the stack.
2) If a REDUCE entry is found, then we enter the reduction production in the ACTION
column. We check the derived side of the reduction production and we POP
twice as many values as there are symbols on the derived side (one symbol and one number for each). After the
POP operation, we PUSH the derivative of the reduction production on top of the stack.
We check the GOTO entry of the previous top of the stack with the new top of the stack, and PUSH that number.
Then, we repeat step (c).
3) If we find the ACCEPT entry, then only the string will be accepted.
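The rules above can be sketched in Python using the SLR parsing table of the grammar S → CC, C → cC / d. The names PRODUCTIONS, ACTION, GOTO and lr_parse are assumptions for this example. One simplification is made: the stack holds only the item set numbers, so a reduction pops one entry per symbol of the derived side instead of two.

```python
# Production number: (derivative, number of symbols on the derived side).
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}
ACTION = {
    (0, "c"): "S3", (0, "d"): "S4",
    (1, "$"): "ACCEPT",
    (2, "c"): "S3", (2, "d"): "S4",
    (3, "c"): "S3", (3, "d"): "S4",
    (4, "c"): "R3", (4, "d"): "R3", (4, "$"): "R3",
    (5, "$"): "R1",
    (6, "c"): "R2", (6, "d"): "R2", (6, "$"): "R2",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 6}

def lr_parse(text):
    """Return the list of actions taken, or None when the string is rejected."""
    stack = [0]                       # item set numbers only
    symbols = list(text) + ["$"]
    pos, trace = 0, []
    while True:
        entry = ACTION.get((stack[-1], symbols[pos]))
        if entry is None:             # empty cell: reject
            return None
        trace.append(entry)
        if entry == "ACCEPT":
            return trace
        number = int(entry[1:])
        if entry[0] == "S":           # SHIFT: push the state, read next input
            stack.append(number)
            pos += 1
        else:                         # REDUCE: pop the body, then GOTO
            head, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])

print(lr_parse("ccdd"))
print(lr_parse("cc"))
```

The same loop works for LR (1) and LALR tables; only the contents of ACTION and GOTO change.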
STACK        INPUT    ACTION
$0           ccdd$    S3
$0c3         cdd$     S3
$0c3c3       dd$      S4
$0c3c3d4     d$       R3
$0c3c3C6     d$       R2
$0c3C6       d$       R2
$0C2         d$       S4
$0C2d4       $        R3
$0C2C5       $        R1
$0S1         $        ACCEPTED
***********************************************************
LR (1) / CLR / LR
1. Numbering of production.
2. Augmentation.
3. Construction of canonical collection of LR (1) item set.
4. Construction of LR (1) parsing table.
5. Fill out parsing table entries.
6. Declaration of parsers by checking conflicts.
7. Construct graph (SHIFT, GOTO and GOTO SHIFT).
8. Acceptability of string.
Look Ahead:-
1. It is a collection of values used for the reduce entry.
2. $ is the default LOOK AHEAD of the augmentation variable.
3. We calculate LOOK AHEADS for each new production in three possible ways.
Question: - Make LR (1) parser for the following grammar.
S → CC
C → cC / d
Solution:-
1) Numbering
1. S → CC ... R1
2. C → cC ... R2
3. C → d ... R3
2) Augmentation
S' → .S
-
3) Construction of canonical collection of LR (1) item set. (The LOOK AHEADS are written after the comma.)
I0 ;
S' → .S , $
S → .CC , $
C → .cC , c/d
C → .d , c/d
I0 ; S GOTO = I1
S' → S. , $
I0 ; C GOTO = I2
S → C.C , $
C → .cC , $
C → .d , $
I0 ; c SHIFT = I3
C → c.C , c/d
C → .cC , c/d
C → .d , c/d
I0 ; d SHIFT = I4
C → d. , c/d
I2 ; C GOTO = I5
S → CC. , $
I2 ; c SHIFT = I6
C → c.C , $
C → .cC , $
C → .d , $
I2 ; d SHIFT = I7
C → d. , $
I3 ; C GOTO = I8
C → cC. , c/d
I3 ; c SHIFT = I3
I3 ; d SHIFT = I4
I6 ; C GOTO = I9
C → cC. , $
I6 ; c SHIFT = I6
I6 ; d SHIFT = I7
4) Construction of LR (1) parsing table. (A cell with - is empty.)
ITEMS   ACTION                    GOTO
        c      d      $           S     C
I0      S3     S4     -           1     2
I1      -      -      ACCEPTED    -     -
I2      S6     S7     -           -     5
I3      S3     S4     -           -     8
I4      R3     R3     -           -     -
I5      -      -      R1          -     -
I6      S6     S7     -           -     9
I7      -      -      R3          -     -
I8      R2     R2     -           -     -
I9      -      -      R2          -     -
***********************************************************
Question:- Check that the grammar is SLR and LR.
S → L = R / R
L → *R / id
R → L
Solution:-
1) Numbering
1. S → L=R ... R1
2. S → R ... R2
3. L → *R ... R3
4. L → id ... R4
5. R → L ... R5
2) Augmentation
S' → .S
3) Construction of canonical collection of LR (1) item set. (The LOOK AHEADS are written after the comma.)
I0 ;
S' → .S , $
S → .L=R , $
S → .R , $
L → .*R , =/$
L → .id , =/$
R → .L , $
I0 ; S GOTO = I1
S' → S. , $
I0 ; L GOTO = I2
S → L.=R , $
R → L. , $
I0 ; R GOTO = I3
S → R. , $
I0 ; * SHIFT = I4
L → *.R , =/$
R → .L , =/$
L → .*R , =/$
L → .id , =/$
I0 ; id SHIFT = I5
L → id. , =/$
I2 ; = SHIFT = I6
S → L=.R , $
R → .L , $
L → .*R , $
L → .id , $
I4 ; R GOTO = I7
L → *R. , =/$
I4 ; L GOTO = I8
R → L. , =/$
I4 ; * SHIFT = I4
I4 ; id SHIFT = I5
I6 ; R GOTO = I9
S → L=R. , $
I6 ; L GOTO = I10
R → L. , $
I6 ; * SHIFT = I11
L → *.R , $
R → .L , $
L → .*R , $
L → .id , $
I6 ; id SHIFT = I12
L → id. , $
I11 ; R GOTO = I13
L → *R. , $
I11 ; L GOTO = I10
I11 ; * SHIFT = I11
I11 ; id SHIFT = I12
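4) Construct LR (1) parsing table. (A cell with - is empty.)
ITEMS   ACTION                           GOTO
        =      *      id     $           S     L     R
I0      -      S4     S5     -           1     2     3
I1      -      -      -      ACCEPTED    -     -     -
I2      S6     -      -      R5          -     -     -
I3      -      -      -      R2          -     -     -
I4      -      S4     S5     -           -     8     7
I5      R4     -      -      R4          -     -     -
I6      -      S11    S12    -           -     10    9
I7      R3     -      -      R3          -     -     -
I8      R5     -      -      R5          -     -     -
I9      -      -      -      R1          -     -     -
I10     -      -      -      R5          -     -     -
I11     -      S11    S12    -           -     10    13
I12     -      -      -      R4          -     -     -
I13     -      -      -      R3          -     -     -
As there are no multiple values in the same cell of the table, the grammar is said to be LR (1).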
LALR (Direct Method)
Rules:-
1. Numbering of productions.
2. Augmentation.
3. Construction of LALR item set.
4. LALR parsing table.
5. Fill out the table entries.
6. Declaration of parser after checking conflicts.
7. Construction of graph (SHIFT, GOTO and GOTO SHIFT).
8. Acceptability of string.
Note: - There is also an indirect method. We only have to use the indirect method
when the question is asking for both LR (1) and then LALR. An example of this is given
as follows:-
Example of INDIRECT method
Question: - Construct the LR (1) and LALR for the following grammar.
S → CC
C → cC / d
Solution:-
For LR (1) - See previous method.
For LALR - As we have noted, in the LR (1) item sets, item I3 = I6, item I4 = I7 and item I8 = I9
except for the look aheads. So, we have to
merge these items and make a single item by combining the equal items as
I3 = I6 = I3, 6
I4 = I7 = I4, 7
I8 = I9 = I8, 9
Now, construction of LALR table. (A cell with - is empty.)
ITEMS    ACTION                       GOTO
         c        d        $          S     C
I0       S3, 6    S4, 7    -          1     2
I1       -        -        ACCEPTED   -     -
I2       S3, 6    S4, 7    -          -     5
I3, 6    S3, 6    S4, 7    -          -     8, 9
I4, 7    R3       R3       R3         -     -
I5       -        -        R1         -     -
I8, 9    R2       R2       R2         -     -
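The indirect method can be checked with a short Python sketch that builds the canonical collection of LR (1) item sets for S → CC, C → cC / d and then groups item sets that differ only in their look aheads. Everything named here (GRAMMAR, closure, goto, canonical_collection, core) is an assumption for this illustration, and first_of relies on the fact that this grammar has no ε productions. An item is stored as (production number, dot position, look ahead).

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NON_TERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}    # no ε productions here

def first_of(symbols):
    """FIRST of a non-empty symbol string (valid because nothing derives ε)."""
    head = symbols[0]
    return FIRST[head] if head in NON_TERMINALS else {head}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NON_TERMINALS:
            # Look aheads of the new items: FIRST of what follows, then look.
            for la in first_of(body[dot + 1:] + (look,)):
                for i, (head, _) in enumerate(GRAMMAR):
                    item = (i, 0, la)
                    if head == body[dot] and item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)

def goto(items, symbol):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0, "$")})]
    for state in states:                  # the list grows while we scan it
        for symbol in ["S", "C", "c", "d"]:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states

def core(state):
    """The items of a state without their look aheads."""
    return frozenset((p, d) for p, d, _ in state)

lr1_states = canonical_collection()
merged = {}
for index, state in enumerate(lr1_states):
    merged.setdefault(core(state), []).append(index)
print(len(lr1_states), "LR (1) item sets,", len(merged), "LALR item sets")
print(sorted(group for group in merged.values() if len(group) > 1))
```

Because the symbols are tried in the order S, C, c, d, the item sets come out numbered I0 to I9 in the same order as in the solution above.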
-
Question:- Construct LR (1) and LALR for the following grammar.
S → L=R / R
L → *R / id
R → L
Solution:-
Hint:
I4 = I11 = I4, 11
I5 = I12 = I5, 12
I7 = I13 = I7, 13
I8 = I10 = I8, 10
***********************************************************
Direct Method for LALR
Question: - Check that the following grammar is LALR or not.
S → CC
C → cC / d
Solution:-
1. Numbering of productions.
S → CC ... 1
C → cC ... 2
C → d ... 3
2. Augmentation.
S' → .S
3. Construction of LALR item set. (The LOOK AHEADS are written after the comma.)
I0 ;
S' → .S , $
S → .CC , $
C → .cC , c/d
C → .d , c/d
I0 ; S GOTO = I1
S' → S. , $
I0 ; C GOTO = I2
S → C.C , $
C → .cC , $
C → .d , $
I0 ; c SHIFT = I3
C → c.C , $/c/d
C → .cC , $/c/d
C → .d , $/c/d
I0 ; d SHIFT = I4
C → d. , $/c/d
I2 ; C GOTO = I5
S → CC. , $
I2 ; c SHIFT gives the same items as I3 with LOOK AHEAD $, so we merge the LOOK AHEADS into I3 (and in the same way into I4).
I2 ; c SHIFT = I3
I2 ; d SHIFT = I4
I3 ; C GOTO = I6
C → cC. , $/c/d
I3 ; c SHIFT = I3
I3 ; d SHIFT = I4
4. LALR parsing table with entries. (A cell with - is empty.)
ITEMS   ACTION                    GOTO
        c      d      $           S     C
I0      S3     S4     -           1     2
I1      -      -      ACCEPTED    -     -
I2      S3     S4     -           -     5
I3      S3     S4     -           -     6
I4      R3     R3     R3          -     -
I5      -      -      R1          -     -
I6      R2     R2     R2          -     -
Question: - Check that the following grammar is LALR or not.
S → L=R / R
L → *R / id
R → L
Solution:-
1. Numbering of productions.
1. S → L=R ... R1
2. S → R ... R2
3. L → *R ... R3
4. L → id ... R4
5. R → L ... R5
2. Augmentation
S' → .S
3. Construction of LALR item set. (The LOOK AHEADS are written after the comma.)
I0 ;
S' → .S , $
S → .L=R , $
S → .R , $
L → .*R , =/$
L → .id , =/$
R → .L , $
I0 ; S GOTO = I1
S' → S. , $
I0 ; L GOTO = I2
S → L.=R , $
R → L. , $
I0 ; R GOTO = I3
S → R. , $
I0 ; * SHIFT = I4
L → *.R , =/$
R → .L , =/$
L → .*R , =/$
L → .id , =/$
I0 ; id SHIFT = I5
L → id. , =/$
I2 ; = SHIFT = I6
S → L=.R , $
R → .L , $
L → .*R , $
L → .id , $
I4 ; R GOTO = I7
L → *R. , =/$
I4 ; L GOTO = I8
R → L. , =/$
I4 ; * SHIFT = I4
I4 ; id SHIFT = I5
I6 ; R GOTO = I9
S → L=R. , $
I6 ; L GOTO = I8
I6 ; * SHIFT = I4
I6 ; id SHIFT = I5
4. LALR parsing table with entries. (The merged names of the indirect method are used; a cell with - is empty.)
ITEMS     ACTION                                 GOTO
          id        *         =      $           S     L        R
I0        S5, 12    S4, 11    -      -           1     2        3
I1        -         -         -      ACCEPTED    -     -        -
I2        -         -         S6     R5          -     -        -
I3        -         -         -      R2          -     -        -
I4, 11    S5, 12    S4, 11    -      -           -     8, 10    7, 13
I5, 12    -         -         R4     R4          -     -        -
I6        S5, 12    S4, 11    -      -           -     8, 10    9
I7, 13    -         -         R3     R3          -     -        -
I8, 10    -         -         R5     R5          -     -        -
I9        -         -         -      R1          -     -        -
***********************************************************