general overview of compiler - sumit...

General Overview of Compiler

Compiler: - A program that translates a program written in a high-level programming language (source code) into an equivalent machine-readable (target) program.

Interpreter: - It performs the same task as a compiler, but it translates and executes the source program line by line instead of producing a target program first.

Assembler: - It converts assembly-level instructions into machine-level instructions in binary code.

Translator: - A program that converts a program in one language into an equivalent program in another language. A compiler is also a translator.

Phases of a Compiler (The Structure of a Compiler)

Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program. If we open up this box a little, we see that there are two parts to this mapping: analysis and synthesis.

The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It uses this structure to create an intermediate representation of the source program. The analysis part also collects information about the source program and stores it in a data structure called the symbol table.

The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. The analysis part is often called the front end of the compiler and the synthesis part the back end.

Front end of compiler:

1. Lexical analysis
2. Syntax analysis
3. Semantic analysis

Back end of compiler:

4. Intermediate code generation
5. Code optimization
6. Code generation

[Figure: Phases of a compiler. Character stream (input file) -> Lexical Analysis -> token stream -> Syntax Analysis -> syntax tree -> Semantic Analysis -> syntax tree -> Intermediate Code Generator -> intermediate representation -> Code Optimizer -> intermediate representation -> Code Generator -> target-machine code -> Code Optimizer -> target-machine code (output file). The SYMBOL TABLE and the ERROR HANDLER are connected to all phases.]

The figure above shows the phases of a compiler. A general description of each phase follows.

1. Lexical Analysis:-

The lexical analyzer is a scanner that reads the input characters one by one from left to right, groups them into lexemes, and outputs a description (token) for each lexeme.

E.g.

Position := Initial + Rate * 60

In this example:

Position = identifier (id1)
:=       = assignment operator
Initial  = identifier (id2)
+        = addition operator
Rate     = identifier (id3)
*        = multiplication operator
60       = a number
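The scanning step above can be sketched with Python's re module. This is only a minimal illustration, not how Lex itself is implemented; the token names (ID, ASSIGN and so on) and the tokenize function are assumptions made for this example.

```python
import re

# Token patterns, tried in this order. The names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Scan text from left to right and return (token_name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # white space is discarded
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token in tokenize("Position := Initial + Rate * 60"):
    print(token)
```

Each pair holds the token name and the lexeme it matched, which is the kind of description the scanner hands to the parser.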

2. Syntax Analysis (parser):-

This phase validates the syntax of an expression. For this purpose, we construct a syntax tree (parse tree).

E.g. Make a tree for the given equation.

c := a + b

Note: - Priority of symbols: ( ) > (*, /) > (+, -) > (= or :=)

Syntax tree for c := a + b:

      :=
     /  \
    c    +
        / \
       a   b

Syntax tree for Position := Initial + Rate * 60:

           :=
          /  \
  Position    +
             / \
      Initial   *
               / \
           Rate   60

3. Semantic Analysis :-

This phase checks that the data types and the context of each construct agree with the rules of the programming language, and inserts type conversions where they are needed.

Output of the above tree by the semantic analysis:-

id1 = id2 + id3 * 60

4. Intermediate Code Generation : -

In this phase, we construct TAC (Three Address Code) by using temporary names (t1, t2, ...). Each TAC instruction has at most three operands and at least two, and at most two operators, counting the assignment operator.

1) t1 = a + b (three operands, two operators)
2) t2 = a (two operands, one operator: a copy into a temporary)

The tree received from semantic analysis, with the integer 60 converted to real:

       :=
      /  \
   id1    +
         / \
      id2   *
           / \
        id3   Int to Real
                  |
                  60

Example:-

id1 = id2 + id3 * 60

t1 = 60.0               t1 = id3
t2 = id3 * t1     OR    t2 = t1 * 60.0
t3 = id2 + t2           t3 = id2 + t2
id1 = t3                id1 = t3
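One way to see where the three-address instructions come from is to walk the syntax tree bottom-up and create one temporary per operator. The sketch below assumes a tuple-based tree and uses hypothetical helper names (gen_tac, translate); it does not copy 60.0 into a temporary first, so its output is one instruction shorter than the hand-written sequences.

```python
from itertools import count


def gen_tac(node, code, temps):
    """Emit three-address code for an expression tree.

    A node is either a string (identifier or constant) or a tuple
    (operator, left, right). Returns the name that holds the node's value.
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def translate(target, expr):
    """Return the TAC instructions for the assignment target = expr."""
    code = []
    result = gen_tac(expr, code, count(1))
    code.append(f"{target} = {result}")
    return code


# id1 = id2 + id3 * 60.0
for line in translate("id1", ("+", "id2", ("*", "id3", "60.0"))):
    print(line)
```

Every emitted instruction has at most three addresses, because each operator node contributes exactly one temporary, one operator and two operands.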

5. Code Optimization:-

It is a technique in which we modify, re-arrange or shorten the intermediate code sequence to use memory better and to increase the speed of execution, without changing the meaning of the original code.

Example: -

t1 = id3 * 60.0
id1 = id2 + t1

6. Code Generation: -

The code generator takes the intermediate representation of the source program as input and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.

For example: Using registers R0 and R1, the intermediate code given below might get translated into the following machine code.

t1 = id3 * 60.0
id1 = id2 + t1

Operation   From   To    Comments
MOV         id3    R0    id3 is copied to R0
MUL         60.0   R0    t1 is in R0
MOV         id2    R1    id2 is in R1
ADD         R0     R1    id1 is in R1; R0 is free
MOV         R1     id1   R1 is stored into id1; R1 is free

***********************************************************

Basic Concepts

1) The scanning work is also known as lexical analysis.

2) Mike Lesk and Eric Schmidt were the authors of Lex, the lexical analyzer generator.

3) The output of lexical analysis is a stream of tokens; the character strings matched by the tokens are called lexemes.
   If c = a + b
   Then c, a and b are lexemes.

4) Our eye is a good example of lexical analysis because it first scans a thing and then identifies it.

5) The program which is used for lexical analysis is called a Lex program.

6) A token is the name given to a collection (class) of lexemes.
   c = a + b
   Where a, b and c are identifiers.

7) The lexemes are of three types:-
   1. Static
   2. Dynamic
   3. Variable

8) Examples of white space:-
   endl, \n, extra spaces etc.

9) Regular expressions are a predefined notation for writing the patterns of lexemes.

10) Some basic formulas:-

   1. a* = Kleene closure = {ε, a, aa, aaa, aaaa, aaaaa, ...}
   2. a+ = Positive closure = {a, aa, aaa, aaaa, aaaaa, ...}
   3. (a + b)* = {ε, a, b, aa, aaaab, ba, aba, ...}
      Note:- This is called the regular set of (a + b)*.
   4. id = letter(letter/digit)*
      Note:- It implies that the first position of any id is always a letter.

   5. <> is the sign of not equals in some programming languages (for example Pascal).

***********************************************************

Some More Basic Concepts

1) Yacc (Yet Another Compiler Compiler): it is used to build the syntax analyzer, which makes the tree after lexical analysis.

2) a*, a+, ab, a+b, abb, (a+b)*, a/b, (a,b), aUb are all regular expressions.

   ab, anb                 AND operation
   a/b, a,b, aUb, a+b      OR operation
   a*                      Kleene closure
   (a+b)*                  Universal closure expression or universal regular expression
                           = {ε, a, aa, aba, bab, abb, ...}

3) letter(letter/digit)* means that we can write:
   c12 = a + b or
   c12a = a + b
   But we cannot write the following one:
   21 = a + b

4) Lex is the scanner generator used for lexical analysis.

5) In the following figure a, b and c are known as lexeme values.

      :=
     /  \
    c    +
        / \
       a   b

6) yylval stands for yy lexical value. It means the lexeme value is passed directly to yacc.

   Install_id( ): installs the id (identifier) in the symbol table and forwards it.

Overview of Finite State Automata

1. It is used in the lexical analysis or scanning phase of a compiler.

2. It is used to implement the statements or regular expressions of any programming language. Finite automata and regular expressions act as the foundation of lexical analysis.

3. Automata:- An automaton is an automatic machine developed or designed by a developer, programmer or manufacturer to complete any desired task.
   E.g. Automobiles, calculators, computers, home appliances, super computers, microwave technologies, generators etc.

4. Finite Automata or Finite State Machine:-

4.1. A machine which has a finite number of states is called a finite automaton or finite state machine.

4.2. Formal Definition:- It consists of five tuples:-

   M = {Q, Σ, δ, q0, F}

   Where
   M  = Machine
   Q  = Non empty set of all states
      = {q0, q1, q2, q3, ..., qn}
   Σ  = Non empty set of input values
      = {a, b, c, 1, 2, 3, *, /, (, ), ...}
   δ  = Input transition function, represented by a transition table or by a transition diagram
   q0 = Default initial state
   F  = Set of final states

   E.g. An example of a transition table (rotation of a fan).

   State      Next state after one step of the switch
   OFF        100rpm (fan is ON)
   100rpm     200rpm
   200rpm     300rpm
   300rpm     OFF

   Note:- Here rpm is rotations per minute.

4.3. Types of Finite Automata:-

There are two types of Finite Automata:-

1. NFA or NDFA (Non-Deterministic Finite Automata)
2. DFA (Deterministic Finite Automata)

4.4. Technical definition :-

1. DFA:-
   δ: (Q × Σ) → Q

2. NFA:-
   δ: (Q × Σ) → 2^Q

Where 2^Q is the power set of Q, i.e. the set of all subsets of the states (one input can lead to multiple states).

E.g. If A = {a, b, c}
Then 2^A = {∅, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}
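A DFA transition function can be stored directly as a lookup table from (state, input) to the next state. The sketch below is a minimal example for the pattern letter(letter/digit)*; the state names q0 and q1, the classify helper and the accepts function are assumptions made for illustration.

```python
def classify(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# delta: (Q x Sigma) -> Q, stored as a dictionary. Missing entries mean "reject".
DELTA = {
    ("q0", "letter"): "q1",
    ("q1", "letter"): "q1",
    ("q1", "digit"): "q1",
}
START = "q0"
FINAL = {"q1"}


def accepts(text):
    """Run the DFA over text and report whether it ends in a final state."""
    state = START
    for ch in text:
        state = DELTA.get((state, classify(ch)))
        if state is None:
            return False
    return state in FINAL


print(accepts("c12"), accepts("c12a"), accepts("21"))
```

Because every (state, input) pair has at most one entry, the machine is deterministic; an NFA would map each pair to a set of states instead.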

Implementation of Lex Program with DFA

Rules:-

1. For the given lex program, check the regular definitions or regular expressions associated with it.

2. Construct an NFA for each individual regular expression and define its initial state and final state.

3. Give a unique name to each state.

4. Assume a common initial state and connect it with all the NFAs by using ε-transitions.

5. Draw the empty DFA transition table and initialize it with the joint initial state combination.

6. Construct new output states by checking input values and apply the DFA construction rules accordingly.

7. Check the patterns associated with each DFA state and enter the matched pattern in the "Pattern announced" column of the DFA table. In this way, the lex program is implemented by a DFA with associated patterns.

Question:-

%

{ C declaration (empty) }

%

{ regular definition }

a

abb

a*b+

%

{ translation rules (empty)}

%

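Solution:-

In the diagrams below, double parentheses mark final states.

1) NFA for a

   start -> (1) --a--> ((2))

2) NFA for abb

   start -> (3) --a--> (4) --b--> (5) --b--> ((6))

3) NFA for a*b+

   start -> (7) --b--> ((8))
   with a loop on a at state 7 and a loop on b at state 8

4) Now according to rule number 4, combine all NFAs with the help of ε-transitions from a common initial state 0.

   (0) --ε--> (1) --a--> ((2))
   (0) --ε--> (3) --a--> (4) --b--> (5) --b--> ((6))
   (0) --ε--> (7) --b--> ((8)), with a loop on a at state 7 and a loop on b at state 8

Note: - In the following table, the pattern announced for a state is the pattern that is recognized when the input ends in that state.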

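5) Transition table :-

State      a        b        Pattern announced
[0137]     [247]    [8]      No pattern
[247]      [7]      [58]     a
[8]        -        [8]      a*b+
[7]        [7]      [8]      No pattern
[58]       -        [68]     a*b+
[68]       -        [8]      abb

State [7] contains no final NFA state, so it announces no pattern. State [58] contains the final state 8, so it announces a*b+. State [68] contains the final states of both abb and a*b+; the pattern listed first in the lex program, abb, is announced.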
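The table can be cross-checked with a small subset construction. The sketch below encodes the combined NFA for a, abb and a*b+ with states 0 to 8 as numbered in this solution; the dictionary layout and the function names (closure, step, build_dfa) are assumptions for illustration. When a DFA state contains several final NFA states, it announces the pattern listed first, which is the usual Lex convention.

```python
from collections import deque

# Combined NFA: state 0 has epsilon moves to the start of each pattern's NFA.
EPS = {0: {1, 3, 7}}
MOVE = {
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACCEPT = {2: "a", 6: "abb", 8: "a*b+"}
PRIORITY = ["a", "abb", "a*b+"]      # order of the rules in the lex program


def closure(states):
    """Epsilon-closure of a set of NFA states."""
    stack, seen = list(states), set(states)
    while stack:
        for t in EPS.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def step(dfa_state, ch):
    """DFA state reached from dfa_state on input ch (may be empty)."""
    out = set()
    for s in dfa_state:
        out |= MOVE.get((s, ch), set())
    return closure(out)


def build_dfa():
    """Return (start_state, table) where table maps each DFA state to its row."""
    start = closure({0})
    table, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue
        row = {ch: step(state, ch) for ch in "ab"}
        patterns = [ACCEPT[s] for s in state if s in ACCEPT]
        row["pattern"] = min(patterns, key=PRIORITY.index) if patterns else None
        table[state] = row
        queue.extend(t for t in (row["a"], row["b"]) if t)
    return start, table


start, table = build_dfa()
for state, row in table.items():
    print(sorted(state), sorted(row["a"]), sorted(row["b"]), row["pattern"])
```

The construction produces six DFA states, one per row of the transition table.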

***********************************************************

Grammar & Language

1. Language:- It is an alphanumeric, symbolic, syntactical way of representation in which we form words and, by arranging words in a meaningful sequence, form sentences. Sentences help to establish a communication link between two machines, two humans, or a human and a machine.

E.g. Programming languages (C, C++ and Java), frameworks (Dot Net), general languages (English, Hindi, Urdu, Marathi, Telugu etc.), interfaces and drivers.

2. Grammar: - A set of rules that defines a language so that the communication is meaningful.

e.g. #include cannot be written as include#.h. Every programming language has to follow some language and grammar rules.

Formal definition of Grammar

It consists of four tuples:-

G = {V, T, P, S}

Where
V = Non empty set of variables or non-terminals
  = {A, B, C, D, E, ..., Z}
T = Non empty set of terminals
  = {a, b, c, d, e, ..., z}
P = Non empty set of production rules
S = Default starting production variable

We can understand this with the following example.

Example:-

S → aB / bA
A → d
B → g

Note: - S → aB / bA can also be written as S → aB and S → bA separately.

Derivation: - The process of obtaining a string by repeatedly applying the grammar productions is known as derivation.

Acceptability: - If a string can be derived from the starting production variable, then the string is accepted by the grammar.

There are two types of Derivation:-

1. LMD (Left Most Derivation):- It is used in the top-down parsing approach. To generate a string, if we always expand the leftmost non-terminal before any other non-terminal, then it is an LMD. Used on its own, this technique is based on backtracking.

E.g.:

S → ABC
Where A → a
      B → b
      C → c

Then according to the rule of LMD, we can derive the string as follows:

S ⇒ ABC
  ⇒ aBC
  ⇒ abC
  ⇒ abc

Note: - In the production S → ABC, S is called the left-hand side or derivative part and ABC is called the right-hand side or derived part.

2. RMD (Right Most Derivation):- It is used in the bottom-up parsing approach. Whenever we expand the rightmost non-terminal before the others to generate a string, it is an RMD. Used on its own, this technique is also based on backtracking.

E.g.:

S → ABC
Where A → a
      B → b
      C → c

Then according to the rule of RMD, we can derive the string as follows:

S ⇒ ABC
  ⇒ ABc
  ⇒ Abc
  ⇒ abc

Derivation Tree:- The step by step expansion process of any string can be expressed by a tree known as a derivation tree or parse tree.

E.g.:

S → ABC
Where A → a
      B → b
      C → c

The derivation tree of the above expression can be made as follows.

      S
    / | \
   A  B  C
   |  |  |
   a  b  c

Question: - Derive the following strings:-

(1) id + id * id
(2) (id + id) * id

By the grammar as follows:-

E → E + T / T
T → T * F / F
F → (E) / id

Solve the above by LMD and RMD.

Solution:-

(1) id + id * id

Solve by LMD :-

First we take the production E → E + T and derive by LMD:

E ⇒ E + T
  ⇒ T + T
  ⇒ F + T
  ⇒ id + T
  ⇒ id + T * F
  ⇒ id + F * F
  ⇒ id + id * F
  ⇒ id + id * id

Now try the other production, E → T:

E ⇒ T
  ⇒ F
  ⇒ (E)

This case is not possible because the brackets ( ) are not there in the string id + id * id. Now, consider the following case.

E ⇒ T
  ⇒ T * F
  ⇒ F * F
  ⇒ id * F

This case is also not possible because the * sign comes directly after the first id, where the string has +.

Now solve by RMD :-

E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * id
  ⇒ E + F * id
  ⇒ E + id * id
  ⇒ T + id * id
  ⇒ F + id * id
  ⇒ id + id * id

Starting with E → T is also not possible here in RMD.

(2) (id + id) * id

Solve by LMD :-

Starting with E → E + T is not possible in this case, because the only + in the string is inside the brackets. So we go on to the other production, E → T.

E ⇒ T
  ⇒ T * F
  ⇒ F * F
  ⇒ (E) * F
  ⇒ (E + T) * F
  ⇒ (T + T) * F
  ⇒ (F + T) * F
  ⇒ (id + T) * F
  ⇒ (id + F) * F
  ⇒ (id + id) * F
  ⇒ (id + id) * id

Solve by RMD :-

E → E + T is not possible here.

E ⇒ T
  ⇒ T * F
  ⇒ T * id
  ⇒ F * id
  ⇒ (E) * id
  ⇒ (E + T) * id
  ⇒ (E + F) * id
  ⇒ (E + id) * id
  ⇒ (T + id) * id
  ⇒ (F + id) * id
  ⇒ (id + id) * id

A derivation tree can be made for each of the derivations above. As an example, the tree for the LMD of (id + id) * id, which starts with E → T, is given as follows:-

            E
            |
            T
          / | \
         T  *  F
         |     |
         F     id
       / | \
      (  E  )
       / | \
      E  +  T
      |     |
      T     F
      |     |
      F     id
      |
      id

Left recursion: - Whenever a non-terminal produces itself at the leftmost position of its own grammar production, it is a left recursion.

Example:-

S → Sab / b

Note: - S produces itself at the leftmost position of the body Sab.

Drawbacks of left recursion (also known as repetition):-

1. Ambiguity for a top-down parser: it cannot tell how many times to apply the recursive production.
2. Repetition: the parser can expand S again and again without consuming any input.

Example:-

S ⇒ Sab ⇒ Sabab ⇒ Sababab ⇒ ...

Format of Left Recursion:-

If A → Aα / β
Then it is a single left recursion.

Where A ∈ V
      α ∈ (V ∪ T)* (any value)
      β ∈ (V ∪ T)* (any value)

Note: - Consider the following case:-

S → S + T / T
A → A α   / β

Where S represents A
      + T represents α
      T represents β

Multiple Left Recursions: - Whenever a non-terminal produces itself at the leftmost position in more than one of its grammar productions, it is a multiple left recursion.

Example:-

If A → Aα1 / Aα2 / Aα3 / ... / Aαn
And A → β1 / β2 / β3 / β4 / ... / βm

For instance:-

E → E + T / E F / E * T / a / b
A → A α1 / A α2 / A α3 / β1 / β2

Where E = A
      + T = α1
      F = α2
      * T = α3
      a = β1
And   b = β2

Removal of left recursion:-

1. Single

A → Aα / β

The removal formula of the above expression is:-

A → βB
B → αB / ε

2. Multiple

A → Aα1 / Aα2 / ... / Aαn / β1 / β2 / ... / βm

Then the removal formula of the above expression is:-

A → β1 B / β2 B / β3 B / ... / βm B
B → α1 B / α2 B / α3 B / ... / αn B / ε

Question:- Remove the left recursion from the following grammar.

E → E + T / T

Solution:-

E → T B
B → + T B / ε

Note :- Here α = + T and β = T.

We can check that both grammars generate the same strings. With the original grammar:-

E ⇒ E + T          (replace E by E + T)
  ⇒ E + T + T      (replace E by T)
  ⇒ T + T + T

With the new grammar:-

E ⇒ T B            (replace B by + T B)
  ⇒ T + T B        (replace B by + T B)
  ⇒ T + T + T B    (replace B by ε)
  ⇒ T + T + T

Question:- Remove the left recursion.

E → E + T / E * F / a / b / d

Solution:-

E → a B / b B / d B
B → + T B / * F B / ε

Question :- Remove the left recursion.

1. E → E + T / T
2. T → T * F / F
3. F → (E) / id

Solution:-

1. E → T E'
   E' → + T E' / ε
2. T → F T'
   T' → * F T' / ε
3. F → (E) / id (no left recursion is there)
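The removal formula can be applied mechanically. The sketch below is a minimal version for direct left recursion only; the function name remove_left_recursion and the representation of a production body as a list of symbols are assumptions made for this example.

```python
EPSILON = "ε"


def remove_left_recursion(head, bodies, new_head):
    """Rewrite  A -> A a1 / ... / A an / b1 / ... / bm  as
    A -> b1 B / ... / bm B   and   B -> a1 B / ... / an B / ε.

    head:     the non-terminal A
    bodies:   list of production bodies, each a list of symbols
    new_head: the name to use for the new non-terminal B
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:                       # nothing to remove
        return {head: bodies}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[EPSILON]],
    }


print(remove_left_recursion("E", [["E", "+", "T"], ["T"]], "E'"))
print(remove_left_recursion("T", [["T", "*", "F"], ["F"]], "T'"))
print(remove_left_recursion("F", [["(", "E", ")"], ["id"]], "F'"))
```

For F the grammar is returned unchanged, which matches the answer that no left recursion is there.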

Indirect Left Recursion:- Whenever a non-terminal produces itself at the leftmost position not directly but through other non-terminals, it is an indirect left recursion.

Example:-

S → AA / 0   (1)
A → SS / 1   (2)

Now substitute equation (2) for the leftmost A in (1); then we get

S → SSA / 1A / 0

We have to take the following steps for the removal of this type of recursion:-

1. Reduce it to direct left recursion (by substitution).
2. Remove the direct left recursion.

Now we apply these steps to the example.

1) S → SSA / 1A / 0
   A → SS / 1

2) S → 1AB / 0B
   B → SAB / ε
   A → SS / 1

The problem can also be solved the other way round, by substituting (1) for the leftmost S in (2):-

1) S → AA / 0
   A → AAS / 0S / 1

2) S → AA / 0
   A → 0SB / 1B
   B → ASB / ε

Left Factor: - Whenever a value repeats itself at the leftmost position of more than one alternative of a grammar production, it is a left factored value (common left factor).

Example:-

S → ab / ac / ad / a / b / g      (One common factor i.e. a)
S → aBc / aBd / aB / g            (Combination of left common factor i.e. aB)

Note: - There can be one or more than one left factors.

Format for Left Factor:-

A → αβ1 / αβ2 / αβ3 / ... / αβn / γ1 / γ2 / γ3 / ... / γm

where α is the common left factor.

Example:-

1. S → aB / aC / d

Then the left factored form is:-

S → aA / d
A → B / C

2. S → aBD / aBG / aB / a / d

Take aB as left factor.

S → aBA / a / d
A → D / G / ε

Then take a as left factor.

S → aC / d
C → BA / ε
A → D / G / ε

Alternatively, take a as left factor first.

S → aA / d
A → BD / BG / B / ε

Then take B as left factor.

S → aA / d
A → BC / ε
C → D / G / ε

Parsing: - It is the technique in which we construct a fixed parser record (parsing table) for a grammar. By using the parser, we can check the acceptability or rejection of any string by the grammar for which we constructed the parser. It is a predictive technique.

Note:- LMD and RMD on their own are non-predictive techniques, i.e. they work with backtracking.

Classification of Parsing:-

The abbreviations used in the classification are defined first, followed by the classification diagram of parsing. The definitions are:-

SLR       (Simple) (Left to right scan) (RMD)
LR (0)    (Left to right scan) (RMD) (No look ahead values)
SLR (1)   (Simple) (Left to right scan) (RMD) (One entry is permitted in parsing table)
LR        (Left to right scan)
CLR       (Canonical) (Left to right scan) (RMD)
LR (1)    (Left to right scan) (RMD) (One set of look ahead values)
LALR      (Look Ahead) (Left to right scan) (RMD)
LALR (1)  (Look Ahead) (Left to right scan) (RMD) (Single entry in the table)

Note:- Canonical = One shape with different names.
LALR = One shape with different names, merged together to form a single entity.

Now the classification diagram is given as follows:-

Parsing
  Top Down Parser (LMD)
    With backtracking
      Recursive descent parser
    Without backtracking
      Non-recursive parser or table driven parser [LL (1)]
  Bottom Up Parser (RMD)
    Shift reduce parsing
      Operator precedence parser
      LR parser
        SLR or LR (0) or SLR (1)
        LR or CLR or LR (1)
        LALR or LALR (1) or Merge (LR)

FIRST & FOLLOW

FIRST and FOLLOW:-

FIRST: It is the first terminal value produced by a non-terminal at the derived side, in all possible ways.

If S → aB
Then FIRST (S) = a

FOLLOW: It is also a terminal value, one which appears after a non-terminal at the derived side of a grammar production.

If S → aAd
Then FOLLOW (A) = d

Algorithms for FIRST and FOLLOW:-

Algorithm for FIRST:-

Rules

1. If A → ε is any production, then FIRST (A) = ε.

2. FIRST is the first terminal value produced by a non-terminal in all possible ways, which are discussed in the next rules.

3. If A → αβ is any production, where A ∈ V, α ∈ T and β ∈ (V ∪ T)* (in other words, A is the derivative, α is a single terminal and β can contain any value), then FIRST (A) = α.

   NOTE: In S → bD, α = b and β = D.

   Example:-
   If S → aBCDEFGH
   Then FIRST (S) = a

4. If A → αβ is any production, where α is a single non-terminal that never derives ε anywhere in the grammar, then:-

   FIRST (A) = FIRST (α)

   Example:-
   If S → AB
      A → aB
      B → d
   Then FIRST (S) = FIRST (A) = a

5. If A → αβ is any production, where α is a single non-terminal that derives ε somewhere in the grammar, then:-

   FIRST (A) = FIRST (α) without ε

   But, for the ε possibility, we also check what comes next to α and apply rules 1, 3, 4 and 5 to it.

   Example:-
   S → AB
   A → aB / ε
   B → d

   Non-terminal   FIRST
   S              a, d
   A              a, ε
   B              d

Algorithm for FOLLOW:-

1. A non-terminal for which we calculate the FOLLOW value always appears on the derived side of a production. The terminal value that arrives after the non-terminal is the FOLLOW value of that non-terminal.

2. Add $ to FOLLOW of the starting production variable directly.

3. If A → αBβ is any production where FOLLOW (B) is to be calculated, with α ∈ (V ∪ T)*, A ∈ V and β ∈ T, then FOLLOW (B) = β.

   Example:-
   S → aBd
   A → aBg
   B → bBe
   Then FOLLOW (B) = {d, g, e}

4. If A → αBβ is any production where β is a single non-terminal that never derives ε, then:-

   FOLLOW (B) = FIRST (β)

   Example:-
   S → BA
   A → aB / bA
   Then, by this rule, FOLLOW (B) contains a, b = FIRST (A)

5. If A → αBβ is any production where β is a single non-terminal that derives ε, then:-

   FOLLOW (B) = FIRST (β) without ε, and for β ⇒ ε, or for A → αB, FOLLOW (B) also includes FOLLOW (A)

   Example:-
   S → BA
   A → aB / bA / ε

   Non-terminal   FOLLOW
   S              $
   A              $
   B              a, b, $

6. If A → αBβ is any production where β contains any value, then FOLLOW of B depends entirely on FIRST of β. Apply rules 3, 4, 5 and 6 accordingly after checking FIRST of β. Also check what comes next if ε is possible.

   Example:-
   If S → aBdefgh  Then FOLLOW (B) = d
   If S → BAefgh   Then FOLLOW (B) = FIRST (Aefgh)

Question:- Find the FIRST and FOLLOW for the following.

E → T E'
E' → + T E' / ε
T → F T'
T' → * F T' / ε
F → (E) / id

Solution:-

Non-terminal   FIRST       FOLLOW
E              {(, id}     {$, )}
E'             {+, ε}      {$, )}
T              {(, id}     {+, $, )}
T'             {*, ε}      {+, $, )}
F              {(, id}     {*, +, $, )}

1. FIRST (F) = FIRST (T) = FIRST (E) = {(, id}. To see why, note that the two productions for F have bodies that start with these two terminal symbols, id and the left parenthesis. T has only one production, and its body starts with F. Since F does not derive ε, FIRST (T) must be the same as FIRST (F). The same argument covers FIRST (E).

2. FIRST (E') = {+, ε}. The reason is that one of the two productions for E' has a body that begins with terminal +, and the other's body is ε. Whenever a nonterminal derives ε, we place ε in FIRST for that nonterminal.

3. FIRST (T') = {*, ε}. The reasoning is analogous to that for FIRST (E').

4. FOLLOW (E) = FOLLOW (E') = {), $}. Since E is the start symbol, FOLLOW (E) must contain $. The production body (E) explains why the right parenthesis is in FOLLOW (E). For E', note that this nonterminal appears only at the ends of bodies of E-productions. Thus, FOLLOW (E') must be the same as FOLLOW (E).

5. FOLLOW (T) = FOLLOW (T') = {+, ), $}. Notice that T appears in bodies only followed by E'. Thus, everything except ε that is in FIRST (E') must be in FOLLOW (T); that explains the symbol +. However, since FIRST (E') contains ε, and E' is the entire string following T in the bodies of the E-productions, everything in FOLLOW (E) must also be in FOLLOW (T). That explains the symbols $ and the right parenthesis. As for T', since it appears only at the ends of the T-productions, it must be that FOLLOW (T') = FOLLOW (T).

6. FOLLOW (F) = {+, *, ), $}. The reasoning is analogous to that for T in point (5).
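The FIRST and FOLLOW rules can be run as a fixed-point computation: keep applying them until no set grows any more. The sketch below is one common way to do this, shown for the expression grammar of this question; the dictionary layout of the grammar and the function names (first_of, compute_first, compute_follow) are assumptions made for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """FIRST of a string of symbols, using the FIRST sets found so far."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                     # every symbol (or none) can vanish
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:           # only non-terminals
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:                  # the rest can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

The printed sets agree with the solution table for this grammar.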

***********************************************************

LL (1) Parser

Rules:-

1. Remove left recursion and left factors from the given grammar, if present.
2. Calculate FIRST and FOLLOW.
3. Construct the LL (1) parsing table according to the table construction rules.
4. Check the LL (1) parsing table for multiple entries in a cell. If any are found, declare that the grammar is NOT LL (1).
5. Check the acceptability of the string by the LL (1) parsing table.

Practice questions for FIRST and FOLLOW:-

Question: - Calculate the FIRST and FOLLOW for the following:-

S → CC
C → cC / d

Solution:-

Non-terminal   FIRST   FOLLOW
S              c, d    $
C              c, d    c, d, $

Question: - Calculate the FIRST and FOLLOW for the following:-

S → aAB
A → aBd / B
B → bA / ε

Solution:-

Non-terminal   FIRST      FOLLOW     Depends On
S              a          $
A              a, b, ε    b, $, d    B
B              b, ε       $, d, b    A


Question: - Calculate the FIRST and FOLLOW for the following:-

S → aSd / ABC
A → BC / bAC
B → cB / CD / eCf
C → gBA / hDi / D
D → jD / Dk / ε

Solution:-

Non-terminal   FIRST                        FOLLOW                           Depends On
S              a, b, c, e, g, h, j, k, ε    $, d
A              b, c, e, g, h, j, k, ε       g, h, j, k, c, e, $, d, f, b     C
B              c, e, g, h, j, k, ε          g, h, j, k, $, d, b, c, e, f     A, C
C              g, h, j, k, ε                f, j, k, $, d, g, h, c, e, b     B, A
D              j, k, ε                      i, k, f, j, $, d, g, h, c, e, b  B, C

Note: - Put D as ε in D → Dk to get the k in FIRST (D).

Questions for LL (1):-

Question: - Make LL (1) for the following grammar:-

E → E + T / T
T → T * F / F
F → (E) / id

And strings:-

(1) id + id * id
(2) (id + id) * id

Solution:-

1. Removal of left recursion.

E → T E'
E' → + T E' / ε
T → F T'
T' → * F T' / ε
F → (E) / id

2. Calculate FIRST and FOLLOW.

Non-terminal   FIRST       FOLLOW
E              {(, id}     {$, )}
E'             {+, ε}      {$, )}
T              {(, id}     {+, $, )}
T'             {*, ε}      {+, $, )}
F              {(, id}     {*, +, $, )}

3. Arrange the non-terminals row wise and all terminals column wise, including $ and excluding ε.

Input symbols:
Non-terminal   +             *              (            )          id           $
E                                           E → TE'                 E → TE'
E'             E' → +TE'                                 E' → ε                  E' → ε
T                                           T → FT'                 T → FT'
T'             T' → ε        T' → *FT'                   T' → ε                  T' → ε
F                                           F → (E)                 F → id

Table entry rules (To fill out the above table):-

1) Enter each FIRST generating production in the row of its non-terminal, under the column of each terminal in the FIRST of that production.

2) Enter each ε production in the row of its derivative (left-hand side), under the columns of the FOLLOW of the derivative.

Example:-

If E' → ε, then with FOLLOW (E') = {$, )}:

N-T    $          )
E'     E' → ε     E' → ε

Answer: - No cell of the table holds more than one entry, so the above grammar is an LL (1) grammar.

4. Parse the string id + id * id with the LL (1) parsing table.

Now what do we do with this table? This table forms one part of a three part data structure. The other two parts are a stack of grammar symbols (E, E', T, T', F, +, *, (, ), id, and $), and an input stream (the expression we want to parse, already split into tokens by the scanner). We start our stack with the starting non-terminal E. The top of the stack is written at the right.

Stack        Input               Action
$E           id + id * id $      E → TE'
$E'T         id + id * id $      T → FT'
$E'T'F       id + id * id $      F → id
$E'T'id      id + id * id $      POP id
$E'T'        + id * id $         T' → ε
$E'          + id * id $         E' → +TE'
$E'T+        + id * id $         POP +
$E'T         id * id $           T → FT'
$E'T'F       id * id $           F → id
$E'T'id      id * id $           POP id
$E'T'        * id $              T' → *FT'
$E'T'F*      * id $              POP *
$E'T'F       id $                F → id
$E'T'id      id $                POP id
$E'T'        $                   T' → ε
$E'          $                   E' → ε
$            $                   ACCEPTED

5. Now, parse the string (id + id) * id with the LL (1) parsing table.

Stack              Input                 Action
$E                 (id + id) * id $      E → TE'
$E'T               (id + id) * id $      T → FT'
$E'T'F             (id + id) * id $      F → (E)
$E'T')E(           (id + id) * id $      POP (
$E'T')E            id + id) * id $       E → TE'
$E'T')E'T          id + id) * id $       T → FT'
$E'T')E'T'F        id + id) * id $       F → id
$E'T')E'T'id       id + id) * id $       POP id
$E'T')E'T'         + id) * id $          T' → ε
$E'T')E'           + id) * id $          E' → +TE'
$E'T')E'T+         + id) * id $          POP +
$E'T')E'T          id) * id $            T → FT'
$E'T')E'T'F        id) * id $            F → id
$E'T')E'T'id       id) * id $            POP id
$E'T')E'T'         ) * id $              T' → ε
$E'T')E'           ) * id $              E' → ε
$E'T')             ) * id $              POP )
$E'T'              * id $                T' → *FT'
$E'T'F*            * id $                POP *
$E'T'F             id $                  F → id
$E'T'id            id $                  POP id
$E'T'              $                     T' → ε
$E'                $                     E' → ε
$                  $                     ACCEPTED
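The stack moves in these traces follow one fixed loop, so they can be reproduced by a short table-driven program. The sketch below hard-codes the LL (1) table of this question; the dictionary layout and the function name ll1_accepts are assumptions made for illustration, and an empty list stands for an ε production.

```python
# LL(1) table: (non-terminal, lookahead) -> production body ([] means ε).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_accepts(tokens, start="E"):
    """Return True if the token list is accepted by the LL(1) table."""
    stack = ["$", start]                  # top of the stack is the last element
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:              # empty table cell: reject
                return False
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == lookahead:            # POP a matched terminal (or $)
            pos += 1
        else:
            return False
    return pos == len(tokens)


print(ll1_accepts("id + id * id".split()))
print(ll1_accepts("( id + id ) * id".split()))
print(ll1_accepts("id + * id".split()))
```

The body is pushed in reverse so that its first symbol is expanded first, which is what makes the parse a leftmost derivation.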

***********************************************************

Bottom-Up Parsing

SLR/LR (0)/SLR (1)

Rules:-

1. Calculate FIRST and FOLLOW for given grammar.

2. Numbering of productions.

3. Augmentation of grammar. (Initialization)

4. Construction of LR (0) item set.

5. Construction of LR (0) parsing table.

6. Parsing table entries. (SHIFT, REDUCE, GOTO & ACCEPT)

7. Declaration of parser by checking conflict in parsing table.

8. SHIFT or GOTO or GOTO SHIFT graph. (Optional)

9. Parsing of string or acceptability of any string.

Question: - Construct an SLR parser for the given grammar and check the acceptability of ccdd.

S → CC
C → cC / d

Solution:-

1. Calculate FIRST and FOLLOW:-

Non-terminal   FIRST   FOLLOW
S              c, d    $
C              c, d    $, c, d

2. Numbering of productions:-

S → CC     R1
C → cC     R2
C → d      R3

3. Augmentation: - The process in which we initialize the starting production variable by an auxiliary variable.

Example: - Ignition of a matchstick before lighting the gas stove. So, ignition is an augmentation.

S' → .S

Scanning Rule:-

1) Whenever . (dot) is just before a non-terminal, we write all productions of that non-terminal with . (dot) at the start of their bodies.

2) Whenever . (dot) is just before a terminal, we stop; no more productions are added for it.

Now:-

I0 ;  S' → .S
      S → .CC
      C → .cC
      C → .d

Dot Scanning Rules:-

a. Items scanning the same symbol always move together.
b. At a time, only one scanning movement is possible.
c. For a non-terminal, we use the GOTO operation and for a terminal, we use the SHIFT operation.
d. Whenever a new collection is found, declare a new item set name; otherwise refer to the previous name for it.

Note:-

If
S' → .S
S → .SC
Then after one scanning move over S:
S' → S.
S → S.C

Now, move on to the question.

4. Construction of LR (0) item set.

I0 ; S GOTO
      S' → S.
      = I1

I0 ; C GOTO
      S → C.C
      C → .cC
      C → .d
      = I2

I0 ; c SHIFT
      C → c.C
      C → .cC
      C → .d
      = I3

I0 ; d SHIFT
      C → d.
      = I4

I2 ; C GOTO
      S → CC.
      = I5

I2 ; c SHIFT = I3

I2 ; d SHIFT = I4

I3 ; C GOTO
      C → cC.
      = I6

I3 ; c SHIFT = I3

I3 ; d SHIFT = I4
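The scanning rules amount to two operations, usually called closure and goto. The sketch below builds the LR (0) item sets for this grammar; representing an item as a pair (production index, dot position) and the function names closure, goto and canonical_collection are assumptions made for illustration.

```python
# Augmented grammar: production 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {"S'", "S", "C"}
SYMBOLS = ["S", "C", "c", "d"]


def closure(items):
    """Scanning rule 1: add every production of a non-terminal after the dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    """Return (list of item sets, transitions between them)."""
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                 # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states))
print(transitions)
```

With the symbols tried in the order S, C, c, d, the item sets come out numbered I0 to I6 in the same order as in the construction above.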

Construction of parsing table:-

a. Arrange all item sets row wise.
b. Arrange all terminals including $ column wise in the ACTION columns.
c. Arrange all non-terminals column wise in the GOTO columns.

Note: - In the following table, S stands for SHIFT moves and R for REDUCE moves. A reduce entry Rn is entered under the FOLLOW of the derivative of production n.

          ACTION                  GOTO
Items     c      d      $         S     C
I0        S3     S4               1     2
I1                      ACCEPT
I2        S3     S4                     5
I3        S3     S4                     6
I4        R3     R3     R3
I5                      R1
I6        R2     R2     R2

Declaration:-

There is no conflict in the table (no dual values in a single cell). So, it is an SLR parser.

Acceptability of String by SLR, LR (1) and LALR:-

Rules:-

1. Draw three columns for STACK, INPUT and ACTION, and do the following:-

   a. Enter the entire input string in the INPUT column, followed by $.

   b. Initialize the stack with $ and the initial item set number.

   c. Check the top of the stack against the first input symbol and:-

      1) If a SHIFT entry is found, PUSH the first input symbol on the stack followed by the shift number, then repeat step (c) for the next input with the new top of the stack.

      2) If a REDUCE entry is found, enter the reduction production in the ACTION column. POP twice as many values as there are symbols on the derived side of the reduction production (one symbol and one state number for each). After the POP operation, PUSH the derivative of the reduction production on the stack. Look up the GOTO entry for the state now below it and the derivative, and PUSH that state number. Then repeat step (c).

      3) Only if an ACCEPT entry is found is the string accepted.
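The rules above can be sketched as a small driver loop. The example below hard-codes the SLR table for S → CC, C → cC / d; the dictionary layout and the function name lr_parse are assumptions made for illustration. The stack here holds state numbers only, so a reduction pops one entry per symbol of the derived side instead of two.

```python
ACTION = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4),
    (1, "$"): ("acc",),
    (2, "c"): ("s", 3), (2, "d"): ("s", 4),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", 3), (4, "d"): ("r", 3), (4, "$"): ("r", 3),
    (5, "$"): ("r", 1),
    (6, "c"): ("r", 2), (6, "d"): ("r", 2), (6, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 6}
# Production number -> (derivative, number of symbols on the derived side).
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}


def lr_parse(tokens):
    """Return (accepted, list of actions taken) for the given tokens."""
    stack = [0]                            # state numbers only
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:                 # empty cell: error
            return False, trace
        if action[0] == "s":               # SHIFT
            trace.append(f"S{action[1]}")
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":             # REDUCE
            head, length = PRODUCTIONS[action[1]]
            trace.append(f"R{action[1]}")
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])
        else:                              # ACCEPT
            trace.append("ACCEPT")
            return True, trace


print(lr_parse("ccdd"))
print(lr_parse("cd"))
```

Because the same loop works for any LR table, only ACTION, GOTO and PRODUCTIONS change when the driver is used with an LR (1) or LALR table.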

STACK          INPUT     ACTION
$0             ccdd$     S3
$0c3           cdd$      S3
$0c3c3         dd$       S4
$0c3c3d4       d$        R3
$0c3c3C6       d$        R2
$0c3C6         d$        R2
$0C2           d$        S4
$0C2d4         $         R3
$0C2C5         $         R1
$0S1           $         ACCEPTED

***********************************************************

LR (1) / CLR / LR

Rules:-

1. Numbering of productions.
2. Augmentation.
3. Construction of canonical collection of LR (1) item set.
4. Construction of LR (1) parsing table.
5. Fill out parsing table entries.
6. Declaration of parser by checking conflicts.
7. Construct graph (SHIFT, GOTO and GOTO SHIFT).
8. Acceptability of string.

Look Ahead:-

1. It is a collection of values used for reduce entries.
2. $ is the default LOOK AHEAD of the augmentation variable.
3. We calculate LOOK AHEADS for each new production in three possible ways.

Question: - Make an LR (1) parser for the following grammar.

S → CC
C → cC / d

Solution:-

1) Numbering

1. S → CC     R1
2. C → cC     R2
3. C → d      R3

2) Augmentation

S' → .S

3) Construction of canonical collection of LR (1) item set.

                      LOOK AHEADS
I0 ;  S' → .S         $
      S → .CC         $
      C → .cC         c, d
      C → .d          c, d

I0 ; S GOTO
      S' → S.         $
      = I1

I0 ; C GOTO
      S → C.C         $
      C → .cC         $
      C → .d          $
      = I2

I0 ; c SHIFT
      C → c.C         c, d
      C → .cC         c, d
      C → .d          c, d
      = I3

I0 ; d SHIFT
      C → d.          c, d
      = I4

I2 ; C GOTO
      S → CC.         $
      = I5

I2 ; c SHIFT
      C → c.C         $
      C → .cC         $
      C → .d          $
      = I6

I2 ; d SHIFT
      C → d.          $
      = I7

I3 ; C GOTO
      C → cC.         c, d
      = I8

I3 ; c SHIFT = I3

I3 ; d SHIFT = I4

I6 ; C GOTO
      C → cC.         $
      = I9

I6 ; c SHIFT = I6

I6 ; d SHIFT = I7

4) Construction of LR (1) parsing table.

          ACTION                    GOTO
ITEMS     c      d      $           S     C
I0        S3     S4                 1     2
I1                      ACCEPTED
I2        S6     S7                       5
I3        S3     S4                       8
I4        R3     R3
I5                      R1
I6        S6     S7                       9
I7                      R3
I8        R2     R2
I9                      R2

***********************************************************

Question:- Check whether the grammar is SLR and LR (1).

S → L = R / R
L → * R / id
R → L

Solution:-

1) Numbering

1. S → L = R     R1
2. S → R         R2
3. L → * R       R3
4. L → id        R4
5. R → L         R5

2) Augmentation

S' → .S

3) Construction of canonical collection of LR (1) item set.

                      LOOK AHEADS
I0 ;  S' → .S         $
      S → .L = R      $
      S → .R          $
      L → .* R        =, $
      L → .id         =, $
      R → .L          $

I0 ; S GOTO
      S' → S.         $
      = I1

I0 ; L GOTO
      S → L. = R      $
      R → L.          $
      = I2

I0 ; R GOTO
      S → R.          $
      = I3

I0 ; * SHIFT
      L → *.R         =, $
      R → .L          =, $
      L → .* R        =, $
      L → .id         =, $
      = I4

I0 ; id SHIFT
      L → id.         =, $
      = I5

I2 ; = SHIFT
      S → L = .R      $
      R → .L          $
      L → .* R        $
      L → .id         $
      = I6

I4 ; R GOTO
      L → * R.        =, $
      = I7

I4 ; L GOTO
      R → L.          =, $
      = I8

I4 ; * SHIFT = I4

I4 ; id SHIFT = I5

I6 ; R GOTO
      S → L = R.      $
      = I9

I6 ; L GOTO
      R → L.          $
      = I10

I6 ; * SHIFT
      L → *.R         $
      R → .L          $
      L → .* R        $
      L → .id         $
      = I11

I6 ; id SHIFT
      L → id.         $
      = I12

I11 ; R GOTO
      L → * R.        $
      = I13

I11 ; L GOTO = I10

I11 ; * SHIFT = I11

I11 ; id SHIFT = I12

4) Construct LR (1) parsing table.

          ACTION                             GOTO
ITEMS     =      *       id      $           S     L      R
I0               S4      S5                  1     2      3
I1                               ACCEPTED
I2        S6                     R5
I3                               R2
I4               S4      S5                        8      7
I5        R4                     R4
I6               S11     S12                       10     9
I7        R3                     R3
I8        R5                     R5
I9                               R1
I10                              R5
I11              S11     S12                       10     13
I12                              R4
I13                              R3

As there are no multiple values in the same cell of the table, the grammar is LR (1). It is not SLR: in an SLR table, state I2 would get R5 under every symbol of FOLLOW (R) = {=, $}, so the cell for = would hold both S6 and R5.

LALR (Direct Method)

Rules:-

1. Numbering of productions.
2. Augmentation.
3. Construction of LALR item set.
4. LALR parsing table.
5. Fill out the table entries.
6. Declaration of parser after checking conflicts.
7. Construction of graph (SHIFT, GOTO and GOTO SHIFT).
8. Acceptability of string.

Note: - There is also an indirect method. We use the indirect method only when the question asks for LR (1) first and then LALR. An example of this is given as follows:-

Example of INDIRECT method

Question: - Construct the LR (1) and LALR for the following grammar.

S → CC
C → cC / d

Solution:-

For LR (1): see the previous method. For LALR:

In the LR (1) item sets, I3 and I6 contain the same items and differ only in their look aheads. The same holds for I4 and I7, and for I8 and I9. So, we merge each pair into a single item set:

I3 = I6 = I3, 6
I4 = I7 = I4, 7
I8 = I9 = I8, 9

Now, construction of LALR table.

          ACTION                         GOTO
ITEMS     c         d         $          S     C
I0        S3, 6     S4, 7                1     2
I1                            ACCEPTED
I2        S3, 6     S4, 7                      5
I3, 6     S3, 6     S4, 7                      8, 9
I4, 7     R3        R3        R3
I5                            R1
I8, 9     R2        R2        R2
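The merging step of the indirect method can be written as "group the LR (1) item sets by their items without look aheads, then union the look aheads". The sketch below does this for some of the LR (1) item sets of S → CC, C → cC / d; writing items as strings, the helper items and the function merge_by_core are assumptions made for illustration.

```python
from collections import defaultdict


def items(cores, lookaheads):
    """Build a set of (item, lookahead) pairs for one LR(1) item set."""
    return {(core, la) for core in cores for la in lookaheads}


# Some LR(1) item sets of the grammar, keyed by their number.
LR1_STATES = {
    2: items(["S → C.C", "C → .cC", "C → .d"], "$"),
    3: items(["C → c.C", "C → .cC", "C → .d"], "cd"),
    4: items(["C → d."], "cd"),
    5: items(["S → CC."], "$"),
    6: items(["C → c.C", "C → .cC", "C → .d"], "$"),
    7: items(["C → d."], "$"),
    8: items(["C → cC."], "cd"),
    9: items(["C → cC."], "$"),
}


def merge_by_core(states):
    """Merge item sets whose items are equal once look aheads are ignored."""
    groups = defaultdict(list)
    for number, state in states.items():
        core = frozenset(item for item, _ in state)
        groups[core].append(number)
    merged = {}
    for numbers in groups.values():
        name = tuple(sorted(numbers))
        merged[name] = set().union(*(states[n] for n in numbers))
    return merged


for name, state in sorted(merge_by_core(LR1_STATES).items()):
    print(name, sorted(state))
```

The merged names (3, 6), (4, 7) and (8, 9) correspond to the rows I3, 6, I4, 7 and I8, 9 of the LALR table, and each merged set carries the look aheads c, d and $.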

Question:- Construct LR (1) and LALR for the following grammar.

S → L = R / R
L → * R / id
R → L

Solution:-

Hint: merge the item sets that differ only in their look aheads:

I4 = I11 = I4, 11
I5 = I12 = I5, 12
I7 = I13 = I7, 13
I8 = I10 = I8, 10

***********************************************************

Direct Method for LALR

Question: - Check whether the following grammar is LALR or not.

S → CC
C → cC / d

Solution:-

1. Numbering of productions.

S → CC ...1
C → cC ...2
C → d  ...3

2. Augmentation.

S' → .S

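3. Construction of LALR item set.

                      LOOK AHEADS
I0 ;  S' → .S         $
      S → .CC         $
      C → .cC         c, d
      C → .d          c, d

I0 ; S GOTO
      S' → S.         $
      = I1

I0 ; C GOTO
      S → C.C         $
      C → .cC         $
      C → .d          $
      = I2

I0 ; c SHIFT
      C → c.C         $, c, d
      C → .cC         $, c, d
      C → .d          $, c, d
      = I3

I0 ; d SHIFT
      C → d.          $, c, d
      = I4

I2 ; C GOTO
      S → CC.         $
      = I5

I2 ; c SHIFT gives the same items as I3 with look ahead $, and I2 ; d SHIFT gives the same item as I4 with look ahead $. Merging these LOOK AHEADS into I3 and I4, we get the following.

I2 ; c SHIFT = I3

I2 ; d SHIFT = I4

I3 ; C GOTO
      C → cC.         $, c, d
      = I6

I3 ; c SHIFT = I3

I3 ; d SHIFT = I4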

4. LALR parsing table with entries.

          ACTION                    GOTO
ITEMS     c      d      $           S     C
I0        S3     S4                 1     2
I1                      ACCEPTED
I2        S3     S4                       5
I3        S3     S4                       6
I4        R3     R3     R3
I5                      R1
I6        R2     R2     R2

No cell holds more than one entry, so the grammar is LALR (1).

Question: - Check whether the following grammar is LALR or not.

S → L = R / R
L → * R / id
R → L

Solution:-

1. Numbering

1. S → L = R     R1
2. S → R         R2
3. L → * R       R3
4. L → id        R4
5. R → L         R5

2. Augmentation

S' → .S

3. Construction of LALR item set.

                      LOOK AHEADS
I0 ;  S' → .S         $
      S → .L = R      $
      S → .R          $
      L → .* R        =, $
      L → .id         =, $
      R → .L          $

I0 ; S GOTO
      S' → S.         $
      = I1

I0 ; L GOTO
      S → L. = R      $
      R → L.          $
      = I2

I0 ; R GOTO
      S → R.          $
      = I3

I0 ; * SHIFT
      L → *.R         =, $
      R → .L          =, $
      L → .* R        =, $
      L → .id         =, $
      = I4

I0 ; id SHIFT
      L → id.         =, $
      = I5

I2 ; = SHIFT
      S → L = .R      $
      R → .L          $
      L → .* R        $
      L → .id         $
      = I6

I4 ; R GOTO
      L → * R.        =, $
      = I7

I4 ; L GOTO
      R → L.          =, $
      = I8

I4 ; * SHIFT = I4

I4 ; id SHIFT = I5

I6 ; R GOTO
      S → L = R.      $
      = I9

I6 ; L GOTO = I8

I6 ; * SHIFT = I4

I6 ; id SHIFT = I5

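4. LALR parsing table with entries.

The merged names follow the LR (1) numbering: I4, 11 is the item set I4 of the direct method, I5, 12 is I5, I7, 13 is I7 and I8, 10 is I8.

           ACTION                                   GOTO
ITEMS      id         *          =      $           S     L         R
I0         S5, 12     S4, 11                        1     2         3
I1                                      ACCEPTED
I2                               S6     R5
I3                                      R2
I4, 11     S5, 12     S4, 11                              8, 10     7, 13
I5, 12                           R4     R4
I6         S5, 12     S4, 11                              8, 10     9
I7, 13                           R3     R3
I8, 10                           R5     R5
I9                                      R1

No cell holds more than one entry, so the grammar is LALR (1).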

***********************************************************
