general overview of compiler - sumit...


Upload: letram

Post on 28-Mar-2018


  • General Overview of Compiler

    Compiler: - It is a program that translates a program written in a high level
    programming language (source code) into machine readable code.

    Interpreter: - It performs the same task as a compiler, but in a line by line
    fashion: it executes the source program statement by statement instead of
    first producing a complete target program.

    Assembler: - It converts assembly level instructions into machine level
    instructions in binary form.

    Translator: - It is a program which converts a program in one language into
    an equivalent program in another language. A compiler is also a translator.

    Compiler and its Stages (Phases of a Compiler, or the Structure of a
    Compiler)

    Up to this point we have treated a compiler as a single box that maps a
    source program into a semantically equivalent target program. If we open up
    this box a little, we see that there are two parts to this mapping: analysis
    and synthesis.

    The analysis part breaks up the source program into constituent pieces and
    imposes a grammatical structure on them. The analysis part also collects
    information about the source program and stores it in a data structure
    called the symbol table.

    The synthesis part constructs the desired target program from the
    intermediate representation and the information in the symbol table.

    The analysis part is often called the front end of the compiler and the
    synthesis part is the back end of the compiler.

  • 1. Lexical analysis              }
    2. Syntax analysis               }  Front end of compiler
    3. Semantic analysis             }
    4. Intermediate code generation  }
    5. Code optimization             }  Back end of compiler
    6. Code generation               }

    Figure: Phases of a compiler

    Character stream (input file)
        -> Lexical Analysis             -> token stream
        -> Syntax Analysis              -> syntax tree
        -> Semantic Analysis            -> syntax tree
        -> Intermediate Code Generator  -> intermediate representation
        -> Code Optimizer               -> intermediate representation
        -> Code Generator               -> target-machine code
        -> Code Optimizer               -> target-machine code (output file)

    The SYMBOL TABLE and the ERROR HANDLER are connected to every phase.

  • The figure above shows the phases of a compiler. We now give a general
    description of each phase.

    1. Lexical Analysis:-

    The lexical analyzer is a scanner which reads the input characters one by
    one, from left to right, and groups them into tokens. For each token it
    outputs a description of the scanned value.

    E.g.

    Position := Initial + Rate * 60

    In this example:

    id1 = Position
    :=  = Assignment operator
    id2 = Initial
    +   = Addition operator
    id3 = Rate
    *   = Multiplication operator
    60  = A number
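The scan described above can be sketched in Python. This is only an illustration under stated assumptions: the token names (NUMBER, ASSIGN, PLUS, TIMES) and the function `tokenize` are hypothetical and do not come from the notes, and identifiers are numbered id1, id2, ... in order of first appearance.

```python
import re

# Token categories for the running example; the names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Scan text left to right and return a list of (token, lexeme) pairs."""
    tokens = []
    symbol_table = []  # identifiers in order of first appearance
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # white space separates tokens but is not a token itself
        if kind == "ID":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            kind = f"id{symbol_table.index(lexeme) + 1}"
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("Position := Initial + Rate * 60"):
    print(token)
```

Each pair holds the token on the left and the lexeme (the matched text) on the right, mirroring the listing above.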

    2. Syntax Analysis (parser):-

    This phase validates the syntax of the expression. For this purpose, we
    construct a syntax tree (parse tree).

    E.g. Make a tree for the given equation.

    c := a + b

    Note: - Priority of symbols: ( ) > (*, /) > (+, -) > (= or :=)

  • The syntax tree for c := a + b is:

            :=
           /  \
          c    +
              / \
             a   b

    The syntax tree for Position := Initial + Rate * 60 is:

               :=
              /  \
      Position    +
                 / \
          Initial   *
                   / \
               Rate   60

  • 3. Semantic Analysis :-

    This phase checks that data types match and that the program is consistent
    with the context rules of the programming language. Where needed, it also
    inserts type conversions: in the tree below the integer 60 is converted to
    a real number (Int to Real).

    Output of the above tree by the semantic analysis:-

    id1 = id2 + id3 * 60

            :=
           /  \
        id1    +
              / \
           id2   *
                / \
             id3   Int to Real
                       |
                       60

    4. Intermediate Code Generation : -

    In this phase, we construct TAC (Three Address Code) by using temporary
    names (t1, t2, ...). Each TAC instruction uses at most three operands and
    at least two, and at most two operators including the necessary assignment
    operator.

  • 1) t1 = a + b (TAC condition applies)
    2) t2 = a (temporary register)

    Example:-

    id1 = id2 + id3 * 60

    This can be written as TAC in either of two ways:

    t1 = 60.0                 t1 = id3
    t2 = id3 * t1      OR     t2 = t1 * 60.0
    t3 = id2 + t2             t3 = id2 + t2
    id1 = t3                  id1 = t3
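As a rough sketch of how such three-address code can be produced from a syntax tree, the Python function below walks a tree bottom-up and emits one instruction per operator. The nested-tuple tree encoding and the name `gen_tac` are assumptions made for this illustration; the exact sequence it prints is one more valid variant, not necessarily either of the two shown above.

```python
from itertools import count


def gen_tac(tree):
    """Return a list of three-address instructions for (":=", target, expr)."""
    temps = count(1)   # supplies t1, t2, t3, ...
    code = []

    def walk(node):
        if not isinstance(node, tuple):   # leaf: identifier or constant
            return str(node)
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    code.append(f"{target} = {walk(expr)}")
    return code


# id1 := id2 + id3 * 60.0, written as nested (operator, left, right) tuples
tree = (":=", "id1", ("+", "id2", ("*", "id3", "60.0")))
for line in gen_tac(tree):
    print(line)
```

Every emitted line has at most one operator besides the assignment, which is the TAC condition stated earlier.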

    5. Code Optimization:-

    It is a technique where we modify, rearrange or minimize the intermediate
    code sequence to make better use of memory and to increase the speed of
    execution, without changing the meaning of the original code.

    Example: - t1 = id3 * 60.0, id1 = id2 + t1

    6. Code Generation: -

    The code generator takes the intermediate representation of the source
    program as input and maps it into the target language. If the target
    language is machine code, registers or memory locations are selected for
    each of the variables used by the program.

    For example: Using registers R0 and R1, the intermediate code given below
    might get translated into the following machine code.

    t1 = id3 * 60.0, id1 = id2 + t1

  • Operation   Operation   Operation   Comments
    Name        From        To
    ---------   ---------   ---------   ---------------------------------
    MOV         id3         R0          id3 is in R0
    MUL         60.0        R0          t1 is in R0
    MOV         id2         R1          id2 is in R1
    ADD         R0          R1          id1 is in R1, R0 is empty
    MOV         R1          id1         R1 is empty because its value has
                                        been moved to the leftmost side (id1)

    ***********************************************************

  • Basic Concepts

    1) The scanning work is also known as lexical analysis.

    2) Mike Lesk and Eric Schmidt were the authors of Lex, a generator of
    lexical analyzers.

    3) The pieces of text matched during lexical analysis are called lexemes.
    If c = a + b
    Then c, a and b are lexemes (and so are = and +).

    4) Our eye is a good example of lexical analysis because it first scans a thing and
    then identifies it.

    5) The program which is used for lexical analysis is called a Lex Program.

    6) A token is a class (collection) of lexemes.
    c = a + b
    Here a, b and c are lexemes of the token identifier.

    7) The lexemes are of three types:-
    1. Static
    2. Dynamic
    3. Variable

    8) Examples of white space:-
    endl, \n, extra blanks etc.

    9) Regular expressions are a predefined notation for describing patterns.

    10) Some basic formulas:-

    1. a* = Kleene closure = {ε, a, aa, aaa, aaaa, aaaaa, ...}
    2. a+ = Positive closure = {a, aa, aaa, aaaa, aaaaa, ...}
    3. (a + b)* = {ε, a, b, aa, ab, ba, aba, aaaab, ...}
    Note:- This is called the regular set of (a + b)*.
    4. id = letter(letter/digit)*
    Note:- It implies that the first position of any id is always a letter.

    5. is the sign of not equals in programming language.

    ***********************************************************

  • Some More Basic Concepts

    1) Yacc (Yet Another Compiler Compiler). It is used to help syntax analysis
    build the tree after lexical analysis.

    2) a*, a+, ab, a+b, abb, (a+b)*, a/b, (a,b), a∪b are all regular expressions.

    ab, a∩b                : AND operation (a followed by b)
    a/b, (a,b), a∪b, a+b   : OR operation (a or b)

    a* = Kleene closure
    (a+b)* = Universal closure expression or universal regular expression
           = {ε, a, aa, aba, bab, abb, ...}

    3) letter(letter/digit)*, it means, we can write:

    c12 = a + b or

    c12a = a + b

    But we cannot write the following one:

    21 = a + b
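The rule letter(letter/digit)* can be checked mechanically. The Python sketch below assumes that letters are the ASCII letters and digits are 0 to 9; the name `is_identifier` is illustrative only.

```python
import re

# letter(letter/digit)* : one letter, then any number of letters or digits
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """Return True when the whole text matches letter(letter/digit)*."""
    return IDENTIFIER.fullmatch(text) is not None


for name in ["c12", "c12a", "21"]:
    print(name, is_identifier(name))
```

The names c12 and c12a are accepted, while 21 is rejected because its first position is a digit.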

    4) Lex is the super scanner of lexical analysis.

    5) In the following figure a, b and c are known as lexeme values.

    6) yylval stands for yy lexical value.
    It means the lexeme value is passed directly to yacc.
    Install_id( ) installs the id (identifier) and forwards it.

            :=
           /  \
          c    +
              / \
             a   b

  • Overview of Finite State Automata

    1. It is used in the lexical analysis or scanning phase of a compiler.

    2. It is used to implement the statements or regular expressions of any programming
    language. Finite automata and regular expressions act as the foundation of
    lexical analysis.

    3. Automata:- An automaton is an automatic machine developed or designed by a developer,
    programmer or manufacturer to complete a desired task.
    E.g. Automobiles, calculators, computers, home appliances, super computers,
    microwave technologies, generators etc.

    4. Finite Automata or Finite State Machine:-

    4.1. A machine which has a finite number of states, and moves between them
    as it reads its input, is called a finite automaton or finite state machine.

    4.2. Formal Definition:- It consists of five tuples:-

    M = {Q, Σ, δ, q0, F}

    Where
    M  = Machine
    Q  = Non empty set of all states.
       = {q0, q1, q2, q3, ..., qn}
    Σ  = Non empty set of input values.
       = {a, b, c, 1, 2, 3, *, /, (, ), ...}
    δ  = Input transition function, represented by a transition table
         or by a transition diagram.
    q0 = Default initial state.
    F  = Set of final states.

    E.g. An example of a transition table (rotation of a fan).

    Current state    Rotation after the switch is turned one step
    OFF              100rpm (the fan is switched ON)
    100rpm           200rpm
    200rpm           300rpm
    300rpm           OFF

  • Note:- Here rpm is rotations per minute.
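The fan example can be written out as a five-tuple and simulated. In this Python sketch the single input symbol "switch" and the choice of OFF as the only final state are assumptions made for illustration; the notes do not name the input symbol or the final states.

```python
# The fan as M = (Q, Sigma, delta, q0, F)
FAN = {
    "states": {"OFF", "100rpm", "200rpm", "300rpm"},
    "alphabet": {"switch"},
    "delta": {
        ("OFF", "switch"): "100rpm",
        ("100rpm", "switch"): "200rpm",
        ("200rpm", "switch"): "300rpm",
        ("300rpm", "switch"): "OFF",
    },
    "start": "OFF",
    "finals": {"OFF"},   # assumption: the run "accepts" when the fan is off again
}


def run_dfa(dfa, inputs):
    """Feed the inputs to the machine; return (last state, accepted?)."""
    state = dfa["start"]
    for symbol in inputs:
        state = dfa["delta"][(state, symbol)]   # exactly one next state: deterministic
    return state, state in dfa["finals"]


print(run_dfa(FAN, ["switch"] * 3))
print(run_dfa(FAN, ["switch"] * 4))
```

Because every (state, input) pair has exactly one entry in the table, the machine is deterministic.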

    4.3. Types of Finite Automata:-

    There are two types of Finite Automata:-
    1. NFA or NDFA (Non-deterministic Finite Automata)
    2. DFA (Deterministic Finite Automata)

    4.4. Technical definition :-

    1. DFA:-
    δ: (Q × Σ) → Q

    2. NFA:-
    δ: (Q × Σ) → 2^Q

    Where 2^Q is the power set of all the states (multiple outputs)

    E.g. If A = {a, b, c}
    Then 2^A = {∅, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}
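The power set in the example can be enumerated with a few lines of Python. This is a small sketch; the function name `power_set` is illustrative.

```python
from itertools import chain, combinations


def power_set(items):
    """Return every subset of items, from the empty set up to the full set."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [set(subset) for subset in subsets]


subsets = power_set(["a", "b", "c"])
print(len(subsets))
print(subsets)
```

A set with n elements has 2^n subsets, which is why an NFA with n states can need up to 2^n states when it is converted to a DFA.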

    Implementation of Lex Program with DFA

    Rules:-

    1. For the given lex program, check the regular definitions or regular
    expressions associated with it.
    2. Construct an NFA for each individual regular expression and define its
    initial state and final state.
    3. Give a unique name to each state.
    4. Assume a common initial state and connect it with all the NFAs by using
    ε-transitions.
    5. Draw the empty DFA transition table and initialize it with the joint
    initial state combination.
    6. Construct new output states by checking the input values and apply the
    DFA construction rules accordingly.
    7. Check the input patterns associated with each state and enter the
    matched pattern in the "Pattern announced" column of the DFA table. In
    this way, the lex program will be implemented by a DFA with associated
    patterns.

    Question:-

    %

    { C declaration (empty) }

    %

    { regular definition }

    a

    abb

    a*b+

    %

    { translation rules (empty)}

    %

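    Solution:-

    In the diagrams below, double parentheses mark final states.

    1) NFA for a

    start -> (1) --a--> ((2))

    2) NFA for abb

    start -> (3) --a--> (4) --b--> (5) --b--> ((6))

  • 3) NFA for a*b+

    start -> (7) --b--> ((8))
    State 7 has a loop on a; state 8 has a loop on b.

    4) Now according to rule number 4, combine all NFAs with the help of ε.

    start -> (0) --ε--> (1) --a--> ((2))
             (0) --ε--> (3) --a--> (4) --b--> (5) --b--> ((6))
             (0) --ε--> (7) --b--> ((8))
    State 7 has a loop on a; state 8 has a loop on b.

    Note: - In the following table, the pattern announced for a DFA state is
    the pattern whose final NFA state (2, 6 or 8) is contained in it. State
    [68] contains both 6 and 8; abb is announced because it is listed before
    a*b+ in the lex program.

  • 5) Transition table :-

    State     a        b        Pattern announced
    [0137]    [247]    [8]      No pattern
    [247]     [7]      [58]     a
    [8]       -        [8]      a*b+
    [7]       [7]      [8]      No pattern
    [58]      -        [68]     a*b+
    [68]      -        [8]      abb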
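The table can be re-derived with the subset construction. The Python sketch below encodes the combined NFA with the state numbers used in the solution (0 as the common start state, 2, 6 and 8 as final states); the function names and the dictionary encoding are assumptions made for this illustration.

```python
from collections import deque

EPSILON = {0: {1, 3, 7}}                      # epsilon moves of the combined NFA
MOVES = {
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACCEPTING = [(2, "a"), (6, "abb"), (8, "a*b+")]   # in lex-program order


def eps_closure(states):
    """All NFA states reachable from `states` using epsilon moves only."""
    closure, stack = set(states), list(states)
    while stack:
        for target in EPSILON.get(stack.pop(), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on one input symbol."""
    result = set()
    for state in states:
        result |= MOVES.get((state, symbol), set())
    return result


def subset_construction(alphabet="ab"):
    """Return (start state, transition table) of the equivalent DFA."""
    start = eps_closure({0})
    table, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue
        table[state] = {}
        for symbol in alphabet:
            target = eps_closure(move(state, symbol))
            table[state][symbol] = target if target else None
            if target and target not in table:
                queue.append(target)
    return start, table


def announced(state):
    """Pattern announced by a DFA state; earlier patterns win ties."""
    for nfa_state, pattern in ACCEPTING:
        if nfa_state in state:
            return pattern
    return None


start, table = subset_construction()
for state, row in table.items():
    print(sorted(state), row, announced(state))
```

A `None` entry in a row corresponds to a "-" in the table: there is no move on that input.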

    ***********************************************************

    Grammar & Language

    1. Language:- It is an alphabetic, numeric or symbolic way of
    representation in which we form words and, by arranging words in a
    meaningful sequence, sentences. Sentences help to
    establish communication or interaction between two machines, two
    humans, or a human being and a machine.

    E.g. Programming languages (C, C++ and Java), frameworks (.NET), general
    languages (English, Hindi, Urdu, Marathi, Telugu etc.), interfaces and drivers.

    2. Grammar: - A set of rules that defines a language so that the communication will
    be meaningful.

    e.g. In C, the directive #include can't be written as include#. Every
    programming language has to follow some language and grammar rules.

  • Formal definition of Grammar

    It consists of four tuples:-

    G = {V, T, P, S}

    Where
    V = Non empty set of variables or non-terminals.
      = {A, B, C, D, E, ..., Z}
    T = Non empty set of terminals.
      = {a, b, c, d, e, ..., z}
    P = Non empty set of production rules.
    S = Default starting production variable.

    We can understand with the following example.

    Example:-
    S → aB/bA
    A → d
    B → g

    Note: - S → aB/bA can also be written as S → aB and S → bA separately.

    Derivation: - The process of producing a string by repeatedly applying
    grammar productions is known as derivation.

    Acceptability: - If a string can be generated from the starting production
    variable, then the string is accepted by the grammar.

    There are two types of Derivation:-

    1. LMD (Left Most Derivation):- It is used in the top down parsing approach. To
    generate a string, if we always expand the leftmost non-terminal before any other
    non-terminal, then it is an LMD. This technique is based on backtracking.

    E.g.:
    S → ABC
    Where A → a
          B → b
          C → c

  • Then according to the rule of LMD, we can solve the above expression as follows
    S → ABC
    S → aBC
    S → abC
    S → abc

    Note: In the given production S → ABC, S is called the left-hand side or
    derivative part and ABC is called the right-hand side or derived part.

    2. RMD (Right Most Derivation):- It is used in the bottom up parsing approach.
    Whenever we expand the rightmost non-terminal before the others to generate a string,
    it is an RMD. This technique is also based on backtracking.

    E.g.:
    S → ABC
    Where A → a
          B → b
          C → c

    Then according to the rule of RMD, we can solve the above expression as follows
    S → ABC
    S → ABc
    S → Abc
    S → abc
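Both derivation orders can be reproduced with a short Python sketch. It assumes that every grammar symbol is a single character and that each non-terminal has exactly one alternative, which holds for this example only; the name `derive` is illustrative.

```python
GRAMMAR = {"S": ["ABC"], "A": ["a"], "B": ["b"], "C": ["c"]}


def derive(start, leftmost=True):
    """Return the sentential forms of a leftmost or rightmost derivation."""
    form = start
    steps = [form]
    while True:
        positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
        if not positions:          # only terminals are left: derivation finished
            return steps
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][0] + form[i + 1:]
        steps.append(form)


print(" => ".join(derive("S", leftmost=True)))
print(" => ".join(derive("S", leftmost=False)))
```

The two runs end in the same string abc; they differ only in which non-terminal is expanded at each step.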

    Derivation Tree:- The step by step expansion process of any string can be
    expressed by a tree known as a derivation tree or parse tree.

    E.g.:
    S → ABC
    Where A → a
          B → b
          C → c

    The derivation tree of the above expression can be made as follows.

  •       S
        / | \
       A  B  C
       |  |  |
       a  b  c

    Question: - Generate the string for the following:-
    (1) id + id * id
    (2) (id + id) * id

    By the grammar as follows:-
    E → E + T/T
    T → T * F/F
    F → (E)/id

    Solve the above by LMD and RMD.

    Solution:-

    (1) id + id * id

    Solve by LMD :-
    First we take the production E → E + T/T and start with E → E + T:

    E → E + T
    E → T + T
    E → F + T
    E → id + T
    E → id + T * F
    E → id + F * F
    E → id + id * F
    E → id + id * id

    If we start with E → T instead:

    E → T
    E → F
    E → (E)

    This case is not possible because the brackets ( ) do not appear in the
    string id + id * id. Now, consider the following case.

  • E → T
    E → T * F
    E → F * F
    E → id * F

    This case is also not possible because the * sign comes directly after the
    first id, whereas the string has + there.

    Now solve by RMD :-

    E → E + T
    E → E + T * F
    E → E + T * id
    E → E + F * id
    E → E + id * id
    E → T + id * id
    E → F + id * id
    E → id + id * id

    E → T is also not possible here in RMD.

    (2) (id + id) * id

    Solve by LMD :-
    First we take the production E → E + T/T and try E → E + T.

    E → E + T

    This production cannot lead to the string in this case, because the string
    has no + outside the brackets. So, without wasting our time, we go to the
    next case.

    E → T
    E → T * F
    E → F * F
    E → (E) * F
    E → (E + T) * F
    E → (T + T) * F
    E → (F + T) * F
    E → (id + T) * F
    E → (id + F) * F
    E → (id + id) * F
  • E → (id + id) * id

    Solved by RMD :-
    E → E + T is not possible here.

    E → T
    E → T * F
    E → T * id
    E → F * id
    E → (E) * id
    E → (E + T) * id
    E → (E + F) * id
    E → (E + id) * id
    E → (T + id) * id
    E → (F + id) * id
    E → (id + id) * id

    As an example, here is the derivation tree for the LMD of (id + id) * id,
    which starts with E → T:-

    LMD of E → T

    E
    `-- T
        |-- T
        |   `-- F
        |       |-- (
        |       |-- E
        |       |   |-- E
        |       |   |   `-- T
        |       |   |       `-- F
        |       |   |           `-- id
        |       |   |-- +
        |       |   `-- T
        |       |       `-- F
        |       |           `-- id
        |       `-- )
        |-- *
        `-- F
            `-- id

  • Left recursion: - Whenever a non-terminal produces itself at the leftmost position of
    a grammar production, it is a left recursion.

    Example:-
    S → Sab/b

    Note: - S produces itself at the leftmost position of Sab.

    Drawbacks of left recursion (also known as repetition):-
    1. Ambiguous.
    2. Repetition.

    Example:-
    S → Sab
    S ⇒ Sab ⇒ Sabab ⇒ Sababab ⇒ ... (S keeps producing itself)

    Format of Left Recursion Technique:-

    If A → Aα/β
    Then it is a single left recursion.
    Where A ∈ V
          α ∈ (V ∪ T)* (any value)
          β ∈ (V ∪ T)* (any value)

    Note: - Consider the following case:-
    S → S + T/T
    A → A α /β
    Where S represents A
          + T represents α
          T represents β

  • Multiple Left Recursions: - Whenever a non-terminal produces itself at the leftmost
    position of several alternatives of a grammar production, it is a multiple left
    recursion.

    Consider an example as follows:-

    Example:-
    If A → Aα1/Aα2/Aα3/.../Aαn
    And A → β1/β2/β3/β4/.../βn

    It can also be written in the following manner:-
    E → E + T/E F/E * T/a/b
    A → A α1 /A α2 /A α3 /β1 /β2
    Where E = A
          + T = α1
          F = α2
          * T = α3
          a = β1
    And   b = β2

    Removal of left recursion:-

    1. Single
    A → Aα/β
    The removal formula of the above expression is:-
    A → βB
    B → αB/ε

    2. Multiple
    A → Aα1/Aα2/.../Aαn/β1/β2/.../βn
    Then the removal formula of the above expression is:-
    A → β1B/β2B/β3B/β4B/.../βnB
    B → α1B/α2B/α3B/α4B/.../αnB/ε
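The removal formula can be applied mechanically. In the Python sketch below each alternative is a list of symbols, the empty list stands for ε, and the caller supplies the name of the new non-terminal; the function name `remove_left_recursion` and this encoding are assumptions made for illustration. It handles direct left recursion only.

```python
def remove_left_recursion(head, alternatives, new_head):
    """Apply A -> A alpha / beta  ==>  A -> beta B,  B -> alpha B / epsilon."""
    # alpha parts: what follows the head in the left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # beta parts: the alternatives that do not start with the head
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}          # nothing to remove
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],   # [] is epsilon
    }


# E -> E + T / T
print(remove_left_recursion("E", [["E", "+", "T"], ["T"]], "B"))
# E -> E + T / E * F / a / b / d
print(remove_left_recursion(
    "E", [["E", "+", "T"], ["E", "*", "F"], ["a"], ["b"], ["d"]], "B"))
```

The first call corresponds to the single case and the second to the multiple case of the formula.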

    Question:- Remove the recursion for the following grammar.
    E → E + T/T

  • Solution:-
    E → TB
    B → +TB/ε

    Note :-
    When E → E + T/T
    Then E → TB
         B → +TB/ε

    We can check that both grammars generate the same strings:-

    With the original grammar:
    E → E + T (Convert E in this equation into E + T)
    E → E + T + T (Convert E in this equation into T)
    E → T + T + T

    With the new grammar:
    E → TB (Convert B here into +TB)
    E → T + TB (Convert B here into +TB)
    E → T + T + TB (Convert B here into ε)
    E → T + T + T

    Question:- Remove the recursion.
    E → E + T/E*F/a/b/d

    Solution:-
    E → aB/bB/dB
    B → +TB/*FB/ε

    Question :- Remove the left recursion.
    1. E → E + T/T
    2. T → T * F/F
    3. F → (E)/id

    Solution:-
    1. E → TB
       B → +TB/ε
    2. T → FC
       C → *FC/ε
       (a second new non-terminal, C, is used here so that it does not clash
       with B from the first production)

  • 3. No left recursion is there.

    Indirect Left Recursion:- Whenever a non-terminal produces itself at the leftmost
    position only through the productions of other non-terminals (indirectly), it is an
    indirect left recursion.

    Example:-
    S → AA/0 (1)
    A → SS/1 (2)

    Now put equation (2) in (1), then we get
    S → SSA/1A/0

    We have to take the following steps for the removal of these types of recursions. The
    steps are as follows:-
    1. Reduce it to a direct left recursion (by substitution).
    2. Remove the direct left recursion.

    Now we are going to apply these steps in the following example.

    Example:-
    1) S → SSA/1A/0
       A → SS/1
    2) S → 1AB/0B
       B → SAB/ε
       A → SS/1

    There is another way to solve this problem, putting equation (1) in (2)
    instead:-
    1) S → AA/0
       A → AAS/0S/1
    2) S → AA/0
       A → 0SB/1B
       B → ASB/ε

    Left Factor: - Whenever the same value repeats itself at the leftmost position of more
    than one alternative of a grammar production, it is a left factor.

    Example:-

  • S → ab/ac/ad/a/b/g (One common factor i.e. a)
    S → aBc/aBd/aB/g (Combination of left common factor i.e. aB)

    Note: - There can be one or more than one left factors.

    Format for Left Factor:-
    A → αβ1/αβ2/αβ3/.../αβn/γ1/γ2/γ3/.../γn

    Example:-

    1. S → aB/aC/d
    Then the format is:-
    S → aA/d
    A → B/C

    2. S → aBD/aBG/aB/a/d
    Take aB as left factor.
    S → aBA/a/d
    A → D/G/ε
    Now take a as left factor.
    S → aC/d
    C → BA/ε
    A → D/G/ε

    Alternatively, take a as left factor first.
    S → aA/d
    A → BD/BG/B/ε
    Then take B as left factor.
    S → aA/d
    A → BC/ε
    C → D/G/ε
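One left-factoring step can be sketched in Python as follows. Alternatives are lists of symbols, the empty list stands for ε, and the function factors the first group of alternatives that share a leading symbol, using their longest common prefix. The names `left_factor` and `common_prefix` and this strategy are assumptions made for illustration; applying the function twice reproduces the second way of solving example 2.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, alternatives, new_head):
    """Factor the first group of alternatives that start with the same symbol."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None        # None collects the epsilon alternative
        groups.setdefault(key, []).append(alt)
    for key, group in groups.items():
        if key is not None and len(group) > 1:
            prefix = common_prefix(group)
            rest = [alt for alt in alternatives if alt not in group]
            return {
                head: [prefix + [new_head]] + rest,
                new_head: [alt[len(prefix):] for alt in group],   # [] is epsilon
            }
    return {head: alternatives}              # nothing to factor


step1 = left_factor(
    "S", [["a", "B", "D"], ["a", "B", "G"], ["a", "B"], ["a"], ["d"]], "A")
print(step1)
step2 = left_factor("A", step1["A"], "C")
print(step2)
```

The first step factors a out of S, and the second step factors B out of the new non-terminal A.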

    Parsing: - It is the technique where we construct a fixed parsing record (table) for a
    grammar. By using the parser, we can check whether a string is accepted or rejected by the
    grammar for which we constructed the parser. It is a predictive technique.

    Note:- LMD and RMD are non-predictive techniques, i.e. techniques with backtracking.

  • Classification of Parsing:-

    The abbreviations used in the classification are defined first, followed by
    the classification diagram of parsing. The definitions are:-

    SLR       (Simple) (Left to right scan) (RMD)
    LR (0)    (Left to right scan) (RMD) (No look ahead values)
    SLR (1)   (Simple) (Left to right scan) (RMD) (One entry is permitted in parsing table)
    LR        (Left to right scan)
    CLR       (Canonical) (Left to right scan) (RMD)
    LR (1)    (Left to right scan) (RMD) (One set of look ahead values)
    LALR      (Look Ahead) (Left to right scan) (RMD)
    LALR (1)  (Look Ahead) (Left to right scan) (RMD) (Single entry in the table)

    Note:- Canonical = One shape with different names.
    LALR = One shape with different names but merged together to form a single
    entity.

    Now the classification diagram is given as follows:-

    Parsing
    |-- Top Down Parser (LMD)
    |   |-- With backtracking
    |   |   `-- Recursive descent parser
    |   `-- Without backtracking
    |       `-- Non-recursive parser or table driven parser [LL (1)]
    `-- Bottom Up Parser (RMD)
        `-- Shift reduce parsing
            |-- Operator precedence parser
            `-- LR parser
                |-- SLR or LR(0) or SLR(1)
                |-- LR or CLR or LR(1)
                `-- LALR or LALR(1) or Merge (LR)

  • FIRST & FOLLOW

    FIRST and FOLLOW:-

    FIRST: It is the first terminal value produced by a non-terminal at the
    derived side, in all possible ways.

    If S → aB
    Then FIRST (S) = a

    FOLLOW: It is also a terminal value, which appears after a non-terminal at
    the derived side of a grammar production.

    If S → aAd
    Then FOLLOW (A) = d

    Algorithms for FIRST and FOLLOW:-

    Algorithm for FIRST:-

    Rules
    1. If A → ε is any production, then FIRST (A) = ε
    2. FIRST is the first terminal value produced by a non-terminal in all
       possible ways, which are discussed in the next rules.
    3. If A → αβ is any production where
           A ∈ V
           α ∈ T
           β ∈ (V ∪ T)*
       or in other words A → αβ is derived, α is a single terminal and β can
       contain any value,
       then FIRST (A) = α
       NOTE: In S → bD, b is α and D is β.
       Example:-
       If S → aBCDEFGH
       Then FIRST (S) = a
    4. If A → αβ is any production, where α is a single non-terminal and α
       never derives ε anywhere in the grammar, then:-
       FIRST (A) = FIRST (α)
       Example:-
       If S → AB
          A → aB
          B → d
       Then FIRST (S) = FIRST (A) = a
    5. If A → αβ is any production, where α is a single non-terminal and α
       produces ε somewhere in the grammar, then:-
       FIRST (A) = FIRST (α)
       But, for the ε possibility, we check the value next to α and apply
       rules 1, 3, 4 and 5 to it.
       Example:-
       S → AB
       A → aB/ε
       B → d

       Non-terminal    FIRST
       S               a, d
       A               a, ε
       B               d

    Algorithm for FOLLOW:-

    1. A non-terminal for which we calculate the FOLLOW value always appears on
       the derived side of a production. The terminal value that comes after
       the non-terminal is the FOLLOW value of that non-terminal.
    2. Add $ to FOLLOW of the starting production variable directly.
    3. If A → αBβ is any production where FOLLOW (B) is to be calculated, and
           α ∈ (V ∪ T)*
           A ∈ V
           β ∈ T
       then FOLLOW (B) = β
       Example:-
       S → aBd
       A → aBg
       B → bBe
       Then FOLLOW (B) = {d, g, e}
    4. If A → αBβ is any production where β is a single non-terminal that never
       derives ε, then:-
       FOLLOW (B) = FIRST (β)
       Example:-
       S → BA
       A → aB/bA
       Then FOLLOW (B) = a, b = FIRST (A)
    5. If A → αBβ is any production where β is a single non-terminal that
       produces ε, then:-
       FOLLOW (B) = FIRST (β) without ε
       and, for the ε case or for A → αB,
       FOLLOW (B) also includes FOLLOW (A)
       Example:-
       S → BA
       A → aB/bA/ε

       Non-terminal    FOLLOW
       S               $
       A               $
       B               a, b, $
    6. If A → αBβ is any production where β contains any value, then FOLLOW of
       B is totally dependent on FIRST of β. Apply rules 3, 4, 5 and 6
       accordingly after checking FIRST of β. Also check the value next to β
       if ε is possible.
       Example:-
       If S → aBdefgh
       Then FOLLOW (B) = d
       If S → BAefgh
       Then FOLLOW (B) = FIRST (A) without ε, plus e if A can produce ε

    Question:- Find the FIRST and FOLLOW for the following.
    E → TE'
    E' → +TE'/ε
    T → FT'
    T' → *FT'/ε
    F → (E)/id

    Solution:-

    Non-terminal    FIRST        FOLLOW
    E               {(, id}      {$, )}
    E'              {+, ε}       {$, )}
    T               {(, id}      {+, $, )}
    T'              {*, ε}       {+, $, )}
    F               {(, id}      {*, +, $, )}

    1. FIRST (F) = FIRST (T) = FIRST (E) = {(, id}. To see why, note that the
    two productions for F have bodies that start with these two terminal
    symbols, id and the left parenthesis. T has only one production, and its
    body starts with F. Since F does not derive ε, FIRST (T) must be the same
    as FIRST (F). The same argument covers FIRST (E).

    2. FIRST (E') = {+, ε}. The reason is that one of the two productions for
    E' has a body that begins with terminal +, and the other's body is ε.
    Whenever a non-terminal derives ε, we place ε in FIRST for that
    non-terminal.

    3. FIRST (T') = {*, ε}. The reasoning is analogous to that for FIRST (E').

    4. FOLLOW (E) = FOLLOW (E') = {), $}. Since E is the start symbol,
    FOLLOW (E) must contain $. The production body (E) explains why the right
    parenthesis is in FOLLOW (E). For E', note that this non-terminal appears
    only at the ends of bodies of E-productions. Thus, FOLLOW (E') must be the
    same as FOLLOW (E).

    5. FOLLOW (T) = FOLLOW (T') = {+, ), $}. Notice that T appears in bodies
    only followed by E'. Thus, everything except ε that is in FIRST (E') must
    be in FOLLOW (T); that explains the symbol +. However, since FIRST (E')
    contains ε, and E' is the entire string following T in the bodies of the
    E-productions, everything in FOLLOW (E) must also be in FOLLOW (T). That
    explains the symbols $ and the right parenthesis. As for T', since it
    appears only at the ends of the T-productions, it must be that
    FOLLOW (T') = FOLLOW (T).

    6. FOLLOW (F) = {+, *, ), $}. The reasoning is analogous to that for T in
    point (5).
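The FIRST and FOLLOW sets in the table can be recomputed with a fixed-point iteration: keep applying the rules until no set changes. The Python sketch below hard-codes the grammar of this question; the function names and the list-of-symbols encoding are assumptions made for illustration.

```python
EPS = "ε"
END = "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets so far."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in GRAMMAR else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result          # this symbol cannot vanish: stop here
    result.add(EPS)                # every symbol can vanish (or the string is empty)
    return result


def compute_first():
    first = {head: set() for head in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {head: set() for head in GRAMMAR}
    follow[start].add(END)                       # rule 2: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in GRAMMAR:
                        continue
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:              # what follows can vanish
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for head in GRAMMAR:
    print(head, sorted(FIRST[head]), sorted(FOLLOW[head]))
```

The printed sets can be compared row by row with the solution table above.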

    ***********************************************************

  • LL (1)Parser

    Rules:-

    1. Remove left recursion and left factors from the given grammar, if present.

    2. Calculate FIRST and FOLLOW.

    3. Construct the LL (1) parsing table according to the table construction rules.

    4. Check the LL (1) parsing table for multiple entries. If any are found, declare
    that the parser is NOT an LL (1) parser.

    5. Check the acceptability of string by LL (1) parsing table.

    Practice questions for FIRST and FOLLOW:-

    Question: - Calculate the FIRST and FOLLOW for the following:-
    S → CC
    C → cC/d

    Solution:-

    Non-terminal    FIRST    FOLLOW
    S               c, d     $
    C               c, d     c, d, $

    Question: - Calculate the FIRST and FOLLOW for the following:-
    S → aAB
    A → aBd/B
    B → bA/ε

    Solution:-

    Non-terminal    FIRST      FOLLOW      Depends On
    S               a          $
    A               a, b, ε    b, $, d     B
    B               b, ε       $, d, b     A

    Question: - Calculate the FIRST and FOLLOW for the following:-
    S → aSD/ABC
    A → BC/bAC
    B → cB/CD/eCf
    C → gBA/hDi/D
    D → jD/Dk/ε

    Solution:-

    Non-terminal    FIRST                        FOLLOW                          Depends On
    S               a, b, c, e, g, h, j, k, ε    $, j, k
    A               b, c, e, g, h, j, k, ε       g, h, j, k, c, e, $, f, b       C
    B               c, e, g, h, j, k, ε          g, h, j, k, $, b, c, e, f       A, C
    C               g, h, j, k, ε                f, j, k, $, g, h, c, e, b       B, A
    D               j, k, ε                      i, k, f, j, $, g, h, c, e, b    B, C

    Note: - Put D as ε in D → Dk to get the k in FIRST (D). Wherever D follows
    a non-terminal, the terminals j and k of FIRST (D) enter that
    non-terminal's FOLLOW.

    Questions for LL (1):-

    Question: - Make LL (1) for the following grammar:-
    E → E + T/T
    T → T * F/F
    F → (E)/id

    And strings:-
    (1) id + id * id
    (2) (id + id) * id

    Solution:-

    1. Removal of left recursion.
    E → TE'
    E' → +TE'/ε
    T → FT'
    T' → *FT'/ε
    F → (E)/id

    2. Calculate FIRST and FOLLOW.

    Non-terminal    FIRST        FOLLOW
    E               {(, id}      {$, )}
    E'              {+, ε}       {$, )}
    T               {(, id}      {+, $, )}
    T'              {*, ε}       {+, $, )}
    F               {(, id}      {*, +, $, )}

  • 3. Arrange the non-terminals row wise and all terminals column wise, including $
    and excluding ε.

    Non-terminals versus input symbols:

    N-T     +           *           (           )           id          $
    E                               E → TE'                 E → TE'
    E'      E' → +TE'                           E' → ε                  E' → ε
    T                               T → FT'                 T → FT'
    T'      T' → ε      T' → *FT'               T' → ε                  T' → ε
    F                               F → (E)                 F → id

    Table entry rules (to fill out the above table):-
    1) Enter each FIRST generating production in the row of its non-terminal, in the
    column of each terminal of its FIRST.
    2) Enter each ε production in the row of its derivative, in the columns of the
    FOLLOW of the derivative.

    Example:-
    If E' → ε, then, since FOLLOW (E') = {$, )}:

    N-T     $           )
    E'      E' → ε      E' → ε

    Answer: - No cell of the table holds more than one entry, so the above grammar
    is an LL (1) grammar.

    4. Make the LL (1) parsing table for string id + id * id.

    Now what do we do with this table? This table forms one part in a three part data
    structure. The other two parts are a stack of grammar symbols (E, E', T, T', F, +, *,
    (, ), id, and $), and an input stream (the expression we want to parse, already
    tokenized by the scanner). We start our stack with the starting non-terminal E.
    The top of the stack is written at the right.

    Stack        Input               Action
    $E           id + id * id $      E → TE'
    $E'T         id + id * id $      T → FT'
    $E'T'F       id + id * id $      F → id
    $E'T'id      id + id * id $      POP id
    $E'T'        + id * id $         T' → ε
    $E'          + id * id $         E' → +TE'
    $E'T+        + id * id $         POP +
    $E'T         id * id $           T → FT'
    $E'T'F       id * id $           F → id
    $E'T'id      id * id $           POP id
    $E'T'        * id $              T' → *FT'
    $E'T'F*      * id $              POP *
    $E'T'F       id $                F → id
    $E'T'id      id $                POP id
    $E'T'        $                   T' → ε
    $E'          $                   E' → ε
    $            $                   ACCEPTED

    5. Now, make the LL (1) table for (id + id) * id.

    Stack            Input                 Action
    $E               (id + id) * id $      E → TE'
    $E'T             (id + id) * id $      T → FT'
    $E'T'F           (id + id) * id $      F → (E)
    $E'T')E(         (id + id) * id $      POP (
    $E'T')E          id + id) * id $       E → TE'
    $E'T')E'T        id + id) * id $       T → FT'
    $E'T')E'T'F      id + id) * id $       F → id
    $E'T')E'T'id     id + id) * id $       POP id
    $E'T')E'T'       + id) * id $          T' → ε
    $E'T')E'         + id) * id $          E' → +TE'
    $E'T')E'T+       + id) * id $          POP +
    $E'T')E'T        id) * id $            T → FT'
    $E'T')E'T'F      id) * id $            F → id
    $E'T')E'T'id     id) * id $            POP id
    $E'T')E'T'       ) * id $              T' → ε
    $E'T')E'         ) * id $              E' → ε
    $E'T')           ) * id $              POP )
    $E'T'            * id $                T' → *FT'
    $E'T'F*          * id $                POP *
    $E'T'F           id $                  F → id
    $E'T'id          id $                  POP id
    $E'T'            $                     T' → ε
    $E'              $                     E' → ε
    $                $                     ACCEPTED
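The two traces above follow a simple loop: pop the top of the stack, expand it with the table if it is a non-terminal, or match it against the input if it is a terminal. A minimal Python sketch of that loop is given below, with the LL (1) table of this question hard-coded; the dictionary encoding and the name `ll1_parse` are assumptions made for illustration, and an empty list stands for an ε production.

```python
# LL(1) table: (non-terminal, lookahead) -> production body
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return True when the token list is accepted by the LL(1) table."""
    stack = ["$", start]               # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:           # empty table cell: syntax error
                return False
            stack.extend(reversed(body))   # push the body, leftmost symbol on top
        elif top == lookahead:
            pos += 1                   # POP: terminal (or $) matches the input
        else:
            return False
    return pos == len(tokens)


print(ll1_parse(["id", "+", "id", "*", "id"]))
print(ll1_parse(["(", "id", "+", "id", ")", "*", "id"]))
print(ll1_parse(["id", "+"]))
```

The body is pushed in reverse so that its leftmost symbol ends up on top of the stack, which is why the stack column of the traces reads E'T after applying E → TE'.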

    ***********************************************************

    Bottom-Up Parsing

    SLR/LR (0)/SLR (1)

    Rules:-

    1. Calculate FIRST and FOLLOW for given grammar.

    2. Numbering of productions.

    3. Augmentation of grammar. (Initialization)

    4. Construction of LR (0) item set.

    5. Construction of LR (0) parsing table.

    6. Parsing table entries. (SHIFT, REDUCE, GOTO & ACCEPT)

    7. Declaration of parser by checking conflict in parsing table.

    8. SHIFT or GOTO or GOTO SHIFT graph. (Optional)

    9. Parsing of string or acceptability of any string.

    Question: - Construct an SLR parser for the given grammar and check the acceptability of
    ccdd.

    S → CC
    C → cC/d

    Solution:-

    1. Calculate FIRST and FOLLOW:-

    Non-terminal    FIRST    FOLLOW
    S               c, d     $
    C               c, d     $, c, d

    2. Numbering of productions:-

    S → CC     R1
    C → cC     R2
    C → d      R3

    3. Augmentation: - The process where we add a new starting production whose left side is an auxiliary variable (S') and whose right side is the original starting variable.

    Example: - Igniting a matchstick before lighting the gas stove. So, the ignition is the augmentation.

    S' → S

    Scanning Rule:-

    1) Whenever the . (Dot) stands just before a non-terminal, we write all productions of that non-terminal, each with the . (Dot) at the start of its right side.

    2) Whenever the . stands just before a terminal, we stop; nothing more is added for that item.

    Now:-

    I0 ;
    S' → .S
    S → .CC
    C → .cC
    C → .d

    Dot Scanning Rules:-

    a. Items that scan the same symbol always move together into the same item set.

    b. At a time, only one scanning movement (over one symbol) is possible.

    c. For a non-terminal, we use the GOTO operation and for a terminal, we use the SHIFT operation.

    d. Whenever a new collection of items is found, declare a new item set name; otherwise refer to the previous name for it.

    Note:-

    If
    S' → .S
    S → .SC

    Then after one scanning move (over S):
    S' → S.
    S → S.C

    Now, move on to the question.

    4. Construction of LR (0) item set.

    I0 ; S GOTO
    S' → S.
    I1

    I0 ; C GOTO
    S → C.C
    C → .cC
    C → .d
    I2

    I0 ; c SHIFT
    C → c.C
    C → .cC
    C → .d
    I3

    I0 ; d SHIFT
    C → d.
    I4

    I2 ; C GOTO
    S → CC.
    I5

    I2 ; c SHIFT
    I3

    I2 ; d SHIFT
    I4

    I3 ; C GOTO
    C → cC.
    I6

    I3 ; c SHIFT
    I3

    I3 ; d SHIFT
    I4
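    The scanning rules and the item sets above amount to the usual closure and GOTO operations on LR (0) items. Here is a minimal sketch; an item is stored as a pair (production index, dot position), and the names closure, goto and canonical_collection are assumptions made for this example. Symbols are tried in the order S, C, c, d so that the numbering comes out as I0 to I6 as in the notes.

```python
# Augmented grammar: index 0 is S' -> S, then R1, R2, R3 of the notes.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = ["S", "C", "c", "d"]      # fixed order gives the same state numbers


def closure(items):
    """Dot before a non-terminal: add all its productions with the dot at the start."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):
        for sym in SYMBOLS:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:        # new collection: new item set name
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


if __name__ == "__main__":
    states, transitions = canonical_collection()
    for (src, sym), dst in sorted(transitions.items()):
        kind = "GOTO" if sym in NONTERMINALS else "SHIFT"
        print(f"I{src} ; {sym} {kind} -> I{dst}")
```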

    Construction of parsing table:-

    a. Arrange all item sets row wise.

    b. Arrange all terminals, including $, column wise under ACTION.

    c. Arrange all non-terminals column wise under GOTO.

    Note: - In the following table, S stands for SHIFT moves and R for REDUCE moves. A completed item (dot at the end) gives a reduce entry under every terminal in the FOLLOW of its left side.

    Items     ACTION                      GOTO
              c       d       $           S       C
    I0        S3      S4                  1       2
    I1                        ACCEPT
    I2        S3      S4                          5
    I3        S3      S4                          6
    I4        R3      R3      R3
    I5                        R1
    I6        R2      R2      R2

    Declaration:-

    There is no conflict in the table (no cell holds two values). So, the grammar is SLR (1).

    Acceptability of String by SLR, LR (1) and LALR:-

    Rules:-

    1. Draw three columns for STACK, INPUT and ACTION, and do the following:-

    a. Enter the entire input string in the INPUT column, followed by $.

    b. Initialize the stack with $ and the initial item set number.

    c. Look up the table entry for the item set number on top of the stack and the first input symbol:-

    1) If a SHIFT entry is found, PUSH the first input symbol on the stack followed by the shift number, then repeat step (c) for the next input symbol with the new top of stack.

    2) If a REDUCE entry is found, enter the reduction production in the ACTION column. POP twice as many values as there are symbols on the right side of the production (every symbol sits on the stack together with its item set number). After the POP operation, PUSH the left side of the production, look up the GOTO entry for the item set number now exposed and that left side, and PUSH the number found. Then repeat step (c).

    3) The string is accepted only if an ACCEPT entry is found; an empty entry means the string is rejected.
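    The rules above can be written as a short driver loop. The sketch below hard-codes the SLR table of this question (S3 means shift and go to I3, R2 means reduce by production 2); the names ACTION, GOTO, PRODUCTIONS and lr_parse are assumptions for the illustration.

```python
# Production number -> (left side, number of symbols on the right side)
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION = {
    (0, "c"): "S3", (0, "d"): "S4",
    (1, "$"): "ACCEPT",
    (2, "c"): "S3", (2, "d"): "S4",
    (3, "c"): "S3", (3, "d"): "S4",
    (4, "c"): "R3", (4, "d"): "R3", (4, "$"): "R3",
    (5, "$"): "R1",
    (6, "c"): "R2", (6, "d"): "R2", (6, "$"): "R2",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 6}


def lr_parse(text):
    """Return (accepted, list of ACTION entries used)."""
    stack = [0]                       # item set numbers and symbols alternate
    tokens = list(text) + ["$"]       # one character per terminal
    pos = 0
    trace = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        trace.append(entry or "ERROR")
        if entry is None:
            return False, trace       # empty cell: reject
        if entry == "ACCEPT":
            return True, trace
        number = int(entry[1:])
        if entry[0] == "S":           # SHIFT: push symbol and item set number
            stack += [tokens[pos], number]
            pos += 1
        else:                         # REDUCE: pop 2 * |right side| values
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - 2 * length:]
            stack += [head, GOTO[(stack[-1], head)]]


if __name__ == "__main__":
    print(lr_parse("ccdd"))
```

    The same driver works unchanged for LR (1) and LALR tables; only the contents of ACTION and GOTO differ.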

    STACK          INPUT      ACTION
    $0             ccdd$      S3
    $0c3           cdd$       S3
    $0c3c3         dd$        S4
    $0c3c3d4       d$         R3
    $0c3c3C6       d$         R2
    $0c3C6         d$         R2
    $0C2           d$         S4
    $0C2d4         $          R3
    $0C2C5         $          R1
    $0S1           $          ACCEPTED

    ***********************************************************

    LR (1) / CLR / LR

    1. Numbering of productions.

    2. Augmentation.

    3. Construction of canonical collection of LR (1) item set.

    4. Construction of LR (1) parsing table.

    5. Fill out parsing table entries.

    6. Declaration of parser by checking conflicts.

    7. Construct graph (SHIFT, GOTO and GOTO SHIFT).

    8. Acceptability of string.

    Look Ahead:-

    1. It is the set of terminals under which the reduce entry of an item is written.

    2. $ is the default LOOK AHEAD of the augmentation production.

    3. We calculate LOOK AHEADS for each new production in three possible ways. For an item A → α.Bβ with LOOK AHEAD a (in a grammar without ε productions), the new productions of B get: the terminal itself if β starts with a terminal; FIRST of that non-terminal if β starts with a non-terminal; a itself if β is empty.

    Question: - Make LR (1) parser for the following grammar.

    S → CC

    C → cC / d

    Solution:-

    1) Numbering

    1. S → CC     R1
    2. C → cC     R2
    3. C → d      R3

    2) Augmentation

    S' → .S

    3) Construction of canonical collection of LR (1) item set. Each item is followed by its LOOK AHEADS.

    I0 ;
    S' → .S,   $
    S → .CC,   $
    C → .cC,   c/d
    C → .d,    c/d

    I0 ; S GOTO
    S' → S.,   $
    I1

    I0 ; C GOTO
    S → C.C,   $
    C → .cC,   $
    C → .d,    $
    I2

    I0 ; c SHIFT
    C → c.C,   c/d
    C → .cC,   c/d
    C → .d,    c/d
    I3

    I0 ; d SHIFT
    C → d.,    c/d
    I4

    I2 ; C GOTO
    S → CC.,   $
    I5

    I2 ; c SHIFT
    C → c.C,   $
    C → .cC,   $
    C → .d,    $
    I6

    I2 ; d SHIFT
    C → d.,    $
    I7

    I3 ; C GOTO
    C → cC.,   c/d
    I8

    I3 ; c SHIFT
    I3

    I3 ; d SHIFT
    I4

    I6 ; C GOTO
    C → cC.,   $
    I9

    I6 ; c SHIFT
    I6

    I6 ; d SHIFT
    I7
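    The collection above can be generated with the LR (1) versions of closure and GOTO. The sketch below stores an item as (production index, dot position, look ahead). It assumes the grammar has no ε productions, so FIRST of a string is decided by its first symbol; the FIRST sets are hard-coded and all names are made up for this example.

```python
# Augmented grammar: index 0 is S' -> S, then R1, R2, R3 of the notes.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # assumed: no nullable symbols
SYMBOLS = ["S", "C", "c", "d"]               # fixed order, as in the notes


def first_of(beta, lookahead):
    """FIRST(beta a) for a grammar without eps productions."""
    if not beta:
        return {lookahead}
    return FIRST[beta[0]] if beta[0] in NONTERMINALS else {beta[0]}


def closure(items):
    items = set(items)
    work = list(items)
    while work:
        prod, dot, la = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for b in first_of(body[dot + 1:], la):
                for i, (head, _) in enumerate(GRAMMAR):
                    item = (i, 0, b)
                    if head == body[dot] and item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)


def goto(items, symbol):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({(0, 0, "$")})]        # $ is the default look ahead
    transitions = {}
    i = 0
    while i < len(states):
        for sym in SYMBOLS:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


if __name__ == "__main__":
    states, transitions = canonical_collection()
    print(len(states), "item sets")
    for (src, sym), dst in sorted(transitions.items()):
        print(f"I{src} ; {sym} -> I{dst}")
```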

    4) Construction of LR (1) parsing table.

    ITEMS     ACTION                      GOTO
              c       d       $           S       C
    I0        S3      S4                  1       2
    I1                        ACCEPTED
    I2        S6      S7                          5
    I3        S3      S4                          8
    I4        R3      R3
    I5                        R1
    I6        S6      S7                          9
    I7                        R3
    I8        R2      R2
    I9                        R2

    ***********************************************************

    Question: - Check whether the following grammar is SLR (1) and LR (1).

    S → L = R / R

    L → *R / id

    R → L

    Solution:-

    1) Numbering

    1. S → L = R     R1
    2. S → R         R2
    3. L → *R        R3
    4. L → id        R4
    5. R → L         R5

    2) Augmentation

    S' → S

    3) Construction of canonical collection of LR (1) item set. Each item is followed by its LOOK AHEADS.

    I0 ;
    S' → .S,     $
    S → .L=R,    $
    S → .R,      $
    L → .*R,     =/$
    L → .id,     =/$
    R → .L,      $

    I0 ; S GOTO
    S' → S.,     $
    I1

    I0 ; L GOTO
    S → L.=R,    $
    R → L.,      $
    I2

    I0 ; R GOTO
    S → R.,      $
    I3

    I0 ; * SHIFT
    L → *.R,     =/$
    R → .L,      =/$
    L → .*R,     =/$
    L → .id,     =/$
    I4

    I0 ; id SHIFT
    L → id.,     =/$
    I5

    I2 ; = SHIFT
    S → L=.R,    $
    R → .L,      $
    L → .*R,     $
    L → .id,     $
    I6

    I4 ; R GOTO
    L → *R.,     =/$
    I7

    I4 ; L GOTO
    R → L.,      =/$
    I8

    I4 ; * SHIFT
    L → *.R,     =/$
    R → .L,      =/$
    L → .*R,     =/$
    L → .id,     =/$
    I4

    I4 ; id SHIFT
    I5

    I6 ; R GOTO
    S → L=R.,    $
    I9

    I6 ; L GOTO
    R → L.,      $
    I10

    I6 ; * SHIFT
    L → *.R,     $
    R → .L,      $
    L → .*R,     $
    L → .id,     $
    I11

    I6 ; id SHIFT
    L → id.,     $
    I12

    I11 ; R GOTO
    L → *R.,     $
    I13

    I11 ; L GOTO
    I10

    I11 ; * SHIFT
    I11

    I11 ; id SHIFT
    I12

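    4) Construct LR (1) parsing table.

    ITEMS     ACTION                              GOTO
              =       *       id      $           S       L       R
    I0                S4      S5                  1       2       3
    I1                                ACCEPTED
    I2        S6                      R5
    I3                                R2
    I4                S4      S5                          8       7
    I5        R4                      R4
    I6                S11     S12                         10      9
    I7        R3                      R3
    I8        R5                      R5
    I9                                R1
    I10                               R5
    I11               S11     S12                         10      13
    I12                               R4
    I13                               R3

    As no cell of the table holds multiple values, the grammar is LR (1). It is not SLR (1), however: in the SLR table the item set containing S → L.=R and R → L. gets both S6 and R5 under =, because = is in FOLLOW (R), and this is a SHIFT-REDUCE conflict.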

    LALR (Direct Method)

    Rules:-

    1. Numbering of productions.

    2. Augmentation.

    3. Construction of LALR item set.

    4. LALR parsing table.

    5. Fill out the table entries.

    6. Declaration of parser after checking conflicts.

    7. Construction of graph (SHIFT, GOTO and GOTO SHIFT).

    8. Acceptability of string.

    Note: - There is also an indirect method. We use the indirect method when the question asks for LR (1) first and then LALR. An example of this is given as follows:-

    Example of INDIRECT method

    Question: - Construct the LR (1) and LALR parsers for the following grammar.

    S → CC

    C → cC / d

    Solution:-

    For LR (1) - see the previous method. For LALR:-

    In the LR (1) item sets, I3 and I6 contain the same items and differ only in their LOOK AHEADS. The same holds for I4 and I7, and for I8 and I9. So, we merge each such pair into a single item set whose LOOK AHEADS are the union of both:

    I3 = I6 = I3, 6
    I4 = I7 = I4, 7
    I8 = I9 = I8, 9

    Now, construction of LALR table.

    ITEMS     ACTION                          GOTO
              c         d         $           S       C
    I0        S3, 6     S4, 7                 1       2
    I1                            ACCEPTED
    I2        S3, 6     S4, 7                         5
    I3, 6     S3, 6     S4, 7                         8, 9
    I4, 7     R3        R3        R3
    I5                            R1
    I8, 9     R2        R2        R2
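    The merging step of the indirect method can be sketched as grouping LR (1) item sets by their core (the items without LOOK AHEADS) and taking the union of the LOOK AHEADS. The item sets below are typed in by hand from the LR (1) construction for this grammar; the data layout and the name merge_by_core are assumptions for this illustration, and writing the LOOK AHEADS as a string of one-character terminals only works for grammars like this one.

```python
# LR (1) item sets I0..I9: each item is (item text, look aheads as a string).
LR1_STATES = {
    0: {("S' -> .S", "$"), ("S -> .CC", "$"), ("C -> .cC", "cd"), ("C -> .d", "cd")},
    1: {("S' -> S.", "$")},
    2: {("S -> C.C", "$"), ("C -> .cC", "$"), ("C -> .d", "$")},
    3: {("C -> c.C", "cd"), ("C -> .cC", "cd"), ("C -> .d", "cd")},
    4: {("C -> d.", "cd")},
    5: {("S -> CC.", "$")},
    6: {("C -> c.C", "$"), ("C -> .cC", "$"), ("C -> .d", "$")},
    7: {("C -> d.", "$")},
    8: {("C -> cC.", "cd")},
    9: {("C -> cC.", "$")},
}


def merge_by_core(states):
    """Group item sets with equal cores; union the look aheads item by item."""
    groups = {}
    for number, items in states.items():
        core = frozenset(text for text, _ in items)
        groups.setdefault(core, []).append(number)
    merged = {}
    for core, numbers in groups.items():
        lookaheads = {text: set() for text in core}
        for n in numbers:
            for text, las in states[n]:
                lookaheads[text] |= set(las)   # one character per terminal
        merged[tuple(numbers)] = lookaheads
    return merged


if __name__ == "__main__":
    for numbers, items in merge_by_core(LR1_STATES).items():
        name = "I" + ", ".join(str(n) for n in numbers)
        for text, las in sorted(items.items()):
            print(f"{name:8} {text:12} {'/'.join(sorted(las))}")
```

    The seven groups that come out are the seven rows of the LALR table; a reduce entry for a merged item set is written under every LOOK AHEAD in the union.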

    Question: - Construct LR (1) and LALR for the following grammar.

    S → L = R / R

    L → *R / id

    R → L

    Solution:-

    Hint:
    I4 = I11 = I4, 11
    I5 = I12 = I5, 12
    I7 = I13 = I7, 13
    I8 = I10 = I8, 10

    ***********************************************************

    Direct Method for LALR

    Question: - Check whether the following grammar is LALR or not.

    S → CC

    C → cC / d

    Solution:-

    1. Numbering of productions.

    S → CC     ...1
    C → cC     ...2
    C → d      ...3

    2. Augmentation.

    S' → S

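    3. Construction of LALR item set.

    I0 ;
    S' → .S,   $
    S → .CC,   $
    C → .cC,   c/d
    C → .d,    c/d

    I0 ; S GOTO
    S' → S.,   $
    I1

    I0 ; C GOTO
    S → C.C,   $
    C → .cC,   $
    C → .d,    $
    I2

    I0 ; c SHIFT
    C → c.C,   c/d/$
    C → .cC,   c/d/$
    C → .d,    c/d/$
    I3

    I0 ; d SHIFT
    C → d.,    c/d/$
    I4

    I2 ; C GOTO
    S → CC.,   $
    I5

    From I0 the LOOK AHEADS of I3 and I4 are c/d. From I2 the same items are reached with LOOK AHEAD $. In the direct method we do not declare new item sets for them; we merge the LOOK AHEADS into I3 and I4, which is why they are written as c/d/$ above.

    I2 ; c SHIFT
    I3

    I2 ; d SHIFT
    I4

    I3 ; C GOTO
    C → cC.,   c/d/$
    I6

    I3 ; c SHIFT
    I3

    I3 ; d SHIFT
    I4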

    4. LALR parsing table with entries.

    ITEMS     ACTION                      GOTO
              c       d       $           S       C
    I0        S3      S4                  1       2
    I1                        ACCEPTED
    I2        S3      S4                          5
    I3        S3      S4                          6
    I4        R3      R3      R3
    I5                        R1
    I6        R2      R2      R2

    No cell of the table holds multiple values, so the grammar is LALR (1).

    Question: - Check whether the following grammar is LALR or not.

    S → L = R / R

    L → *R / id

    R → L

    Solution:-

    1. Numbering of productions.

    1. S → L = R     R1
    2. S → R         R2
    3. L → *R        R3
    4. L → id        R4
    5. R → L         R5

    2. Augmentation

    S' → S

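    3. Construction of LALR item set. Each item is followed by its LOOK AHEADS.

    I0 ;
    S' → .S,     $
    S → .L=R,    $
    S → .R,      $
    L → .*R,     =/$
    L → .id,     =/$
    R → .L,      $

    I0 ; S GOTO
    S' → S.,     $
    I1

    I0 ; L GOTO
    S → L.=R,    $
    R → L.,      $
    I2

    I0 ; R GOTO
    S → R.,      $
    I3

    I0 ; * SHIFT
    L → *.R,     =/$
    R → .L,      =/$
    L → .*R,     =/$
    L → .id,     =/$
    I4

    I0 ; id SHIFT
    L → id.,     =/$
    I5

    I2 ; = SHIFT
    S → L=.R,    $
    R → .L,      $
    L → .*R,     $
    L → .id,     $
    I6

    I4 ; R GOTO
    L → *R.,     =/$
    I7

    I4 ; L GOTO
    R → L.,      =/$
    I8

    I4 ; * SHIFT
    I4

    I4 ; id SHIFT
    I5

    I6 ; R GOTO
    S → L=R.,    $
    I9

    I6 ; L GOTO
    R → L.,      $ (same item as I8; its LOOK AHEAD $ is already among those of I8)
    I8

    I6 ; * SHIFT
    I4

    I6 ; id SHIFT
    L → id.,     $ (same item as I5; its LOOK AHEAD $ is already among those of I5)
    I5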

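    4. LALR parsing table with entries. The merged names show which LR (1) item sets each LALR item set stands for (I4 above is I4, 11; I5 is I5, 12; I7 is I7, 13; I8 is I8, 10).

    ITEMS      ACTION                                    GOTO
               id         *          =       $           S      L        R
    I0         S5, 12     S4, 11                         1      2        3
    I1                                       ACCEPTED
    I2                               S6      R5
    I3                                       R2
    I4, 11     S5, 12     S4, 11                                8, 10    7, 13
    I5, 12                           R4      R4
    I6         S5, 12     S4, 11                                8, 10    9
    I7, 13                           R3      R3
    I8, 10                           R5      R5
    I9                                       R1

    No cell of the table holds multiple values, so the grammar is LALR (1).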

    ***********************************************************