Cd2 [autosaved]

Upload: ankur-srivastava
Post on 13-Apr-2017

TRANSCRIPT

Page 1: Cd2 [autosaved]

DEFINITION OF PARSING

A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language.

A parser takes input in the form of a sequence of tokens or program instructions and usually builds a data structure in the form of a parse tree or an abstract syntax tree.

Page 2: Cd2 [autosaved]

ROLE OF PARSER

Page 3: Cd2 [autosaved]

In the compiler model, the parser obtains a string of tokens from the lexical analyser and verifies that the string can be generated by the grammar for the source language.

The parser reports any syntax error in the source language. It collects a sufficient number of tokens and builds a parse tree.

Page 4: Cd2 [autosaved]
Page 5: Cd2 [autosaved]

There are basically two types of parser:

Top-down parser:
- starts at the root of the derivation tree and fills in
- picks a production and tries to match the input
- may require backtracking
- some grammars are backtrack-free (predictive)

Bottom-up parser:
- starts at the leaves and fills in
- starts in a state valid for legal first tokens
- uses a stack to store both state and sentential forms

Page 6: Cd2 [autosaved]

TOP DOWN PARSING

A top-down parser starts with the root of the parse tree, labeled with the start or goal symbol of the grammar.

To build a parse, it repeats the following steps until the fringe of the parse tree matches the input string:
STEP 1: At a node labeled A, select a production A → α and construct the appropriate child for each symbol of α.
STEP 2: When a terminal is added to the fringe that doesn't match the input string, backtrack.
STEP 3: Find the next node to be expanded.

The key is selecting the right production in step 1.

Page 7: Cd2 [autosaved]

EXAMPLE FOR TOP DOWN PARSING

Suppose the given production rules are as follows:
S -> aAd | aB
A -> b | c
B -> ccd
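The grammar above can be parsed top-down by trying each alternative in turn. Below is a minimal backtracking sketch in Python (the function names and the generator-based search are illustrative, not from the slides):

```python
# Backtracking top-down parser for:
#   S -> aAd | aB
#   A -> b | c
#   B -> ccd
GRAMMAR = {
    "S": [["a", "A", "d"], ["a", "B"]],
    "A": [["b"], ["c"]],
    "B": [["c", "c", "d"]],
}

def parse(symbol, s, pos):
    """Try to derive a prefix of s[pos:] from `symbol`; yield every end position."""
    if symbol not in GRAMMAR:                 # terminal symbol
        if pos < len(s) and s[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:        # try each alternative (backtracking)
        positions = [pos]
        for sym in production:
            positions = [p2 for p in positions for p2 in parse(sym, s, p)]
        yield from positions

def accepts(s):
    return any(end == len(s) for end in parse("S", s, 0))

print(accepts("abd"))   # True  via S -> aAd, A -> b
print(accepts("accd"))  # True  via S -> aB,  B -> ccd
print(accepts("abc"))   # False
```

Trying S -> aAd first on the input "accd" fails at the final d, so the parser backs up and succeeds with S -> aB.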

Page 8: Cd2 [autosaved]

PROBLEMS WITH TOP-DOWN PARSING

1) BACKTRACKING
Backtracking is a technique in which, to expand a non-terminal symbol, we choose one alternative and, if a mismatch occurs, try another alternative, if any. If a non-terminal has multiple production rules beginning with the same input symbol, then to get the correct derivation we need to try all these alternatives.

Page 9: Cd2 [autosaved]

EXAMPLE OF BACKTRACKING

Suppose the given production rules are as follows:
S -> cAd
A -> a | ab
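For this grammar, committing to A -> a on the input "cabd" leaves "b" where "d" is expected, so the parser must back up and retry with A -> ab. A small sketch (helper names are illustrative):

```python
# Grammar from the slide: S -> cAd, A -> a | ab
def match_A(s, pos):
    """Try the alternatives of A in order; return every possible end position."""
    ends = []
    if s[pos:pos + 1] == "a":
        ends.append(pos + 1)          # A -> a
    if s[pos:pos + 2] == "ab":
        ends.append(pos + 2)          # A -> ab
    return ends

def match_S(s):
    if not s.startswith("c"):
        return False
    # If we commit to A -> a on "cabd", the next symbol is "b", not "d";
    # keeping *all* end positions models the backtracking retry.
    return any(s[end:] == "d" for end in match_A(s, 1))

print(match_S("cad"))   # True  (A -> a)
print(match_S("cabd"))  # True  (only after retrying with A -> ab)
```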

Page 10: Cd2 [autosaved]

2) LEFT RECURSION
Left recursion is a case where the left-most symbol in a production of a non-terminal is that non-terminal itself (direct left recursion), or where, through some other non-terminal definitions, a derivation rewrites to the non-terminal again (indirect left recursion). Consider these examples:

(1) A -> Aq (direct)
(2) A -> Bq
    B -> Ar (indirect)

Left recursion has to be removed if the parser performs top-down parsing.

Page 11: Cd2 [autosaved]

REMOVING LEFT RECURSION

To eliminate left recursion we need to modify the grammar. Let G be a grammar having a left-recursive pair of productions:
A -> Aa
A -> B
We eliminate the left recursion by rewriting the productions as:
A -> BA'
A' -> aA'
A' -> ε
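The rewrite above can be mechanised for immediate left recursion. A minimal sketch, assuming productions are given as lists of symbols (the function name is illustrative):

```python
# Eliminate immediate left recursion:
#   A -> Aa | B   becomes   A -> BA',  A' -> aA' | ε
def remove_left_recursion(nonterminal, productions):
    """productions: list of right-hand sides (each a list of symbols)."""
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}      # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [p + [new] for p in others],
        new: [p + [new] for p in recursive] + [["ε"]],
    }

print(remove_left_recursion("A", [["A", "a"], ["B"]]))
# {'A': [['B', "A'"]], "A'": [['a', "A'"], ['ε']]}
```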

Page 12: Cd2 [autosaved]

3) LEFT FACTORING
Left factoring is removing the common left factor that appears in two productions of the same non-terminal. It is done to avoid backtracking by the parser. Suppose the parser has a one-symbol look-ahead; consider this example:
A -> qB | qC
where A, B, C are non-terminals and q is a terminal string. In this case the parser cannot tell which of the two productions to choose and may have to backtrack. After left factoring, the grammar is converted to:
A -> qD
D -> B | C
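Left factoring a single shared leading symbol can be sketched as follows (function and parameter names are illustrative):

```python
# A -> qB | qC  becomes  A -> qD,  D -> B | C
def left_factor(nonterminal, productions, new_name):
    """Factor out one common leading symbol, if every alternative shares it."""
    head = productions[0][0]
    if not all(p and p[0] == head for p in productions):
        return {nonterminal: productions}      # no common left factor
    tails = [p[1:] if len(p) > 1 else ["ε"] for p in productions]
    return {nonterminal: [[head, new_name]], new_name: tails}

print(left_factor("A", [["q", "B"], ["q", "C"]], "D"))
# {'A': [['q', 'D']], 'D': [['B'], ['C']]}
```

After this transformation a one-symbol look-ahead suffices: on seeing q the parser commits to A -> qD, and the choice between B and C is deferred to D.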

Page 13: Cd2 [autosaved]

RECURSIVE DESCENT PARSING

A recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the productions of the grammar.

Page 14: Cd2 [autosaved]

EXAMPLE OF RECURSIVE DESCENT PARSING

Suppose the grammar given is as follows:
E -> iE'
E' -> +iE' | ε

Program (in C; E' is written Eprime, since ' is not legal in a C identifier, and l holds the current look-ahead character, initialised with l = getchar()):

void E()
{
    if (l == 'i') { match('i'); Eprime(); }
}

Page 15: Cd2 [autosaved]

void Eprime()
{
    if (l == '+') {
        match('+'); match('i'); Eprime();
    } else
        return;              /* E' -> ε */
}

void match(char t)
{
    if (l == t) l = getchar();
    else printf("Error");
}

Page 16: Cd2 [autosaved]

int main()
{
    l = getchar();
    E();
    if (l == '$') {
        printf("parsing successful");
    }
}
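The same parser can be written as a runnable sketch in Python, reading from an input string instead of getchar() (the helper names mirror the C version but are otherwise illustrative):

```python
# Recursive descent for  E -> i E',  E' -> + i E' | ε
def parse(s):
    pos = 0

    def look():
        return s[pos] if pos < len(s) else "$"   # "$" marks end of input

    def match(t):
        nonlocal pos
        if look() == t:
            pos += 1
        else:
            raise SyntaxError(f"expected {t!r} at position {pos}")

    def E():                 # E -> i E'
        match("i")
        Eprime()

    def Eprime():            # E' -> + i E' | ε
        if look() == "+":
            match("+")
            match("i")
            Eprime()
        # else: take E' -> ε

    E()
    if look() != "$":
        raise SyntaxError("trailing input")

def accepts(s):
    try:
        parse(s)
        return True
    except SyntaxError:
        return False

print(accepts("i+i+i"))  # True
print(accepts("+i"))     # False
```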

Page 17: Cd2 [autosaved]

PREDICTIVE LL(1) PARSING

The first “L” in LL(1) refers to the fact that the input is processed from left to right.

The second “L” refers to the fact that LL(1) parsing determines a leftmost derivation for the input string.

The “1” in parentheses implies that LL(1) parsing uses only one symbol of input to predict the next grammar rule that should be used.

The data structures used by an LL(1) parser are:
1. Input buffer
2. Stack
3. Parsing table

Page 18: Cd2 [autosaved]

The construction of a predictive LL(1) parser is based on two very important functions: FIRST and FOLLOW.

To construct a predictive LL(1) parser we follow these steps:
STEP 1: Compute the FIRST and FOLLOW functions.
STEP 2: Construct the predictive parsing table using FIRST and FOLLOW.
STEP 3: Parse the input string with the help of the predictive parsing table.

Page 19: Cd2 [autosaved]

FIRST
- If X is a terminal then FIRST(X) is just {X}.
- If there is a production X → ε then add ε to FIRST(X).
- If there is a production X → Y1Y2..Yk then add FIRST(Y1Y2..Yk) to FIRST(X).

FIRST(Y1Y2..Yk) is either:
- FIRST(Y1), if FIRST(Y1) doesn't contain ε; or
- if FIRST(Y1) does contain ε, everything in FIRST(Y1) except for ε, as well as everything in FIRST(Y2..Yk).
- If FIRST(Y1), FIRST(Y2), .., FIRST(Yk) all contain ε, then add ε to FIRST(Y1Y2..Yk) as well.
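The rules above can be run to a fixed point. A sketch, assuming the grammar is a dict mapping each non-terminal to its right-hand sides, with "ε" marking the empty string (names are illustrative):

```python
def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbols):
        """FIRST of a sentential form Y1 Y2 ... Yk."""
        result = set()
        for sym in symbols:
            f = first[sym] if sym in grammar else {sym}   # terminal: FIRST is itself
            result |= f - {"ε"}
            if "ε" not in f:
                return result
        result.add("ε")          # every Yi can derive ε
        return result

    changed = True
    while changed:               # iterate until no set grows
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of(rhs)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["ε"]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["ε"]],
    "F": [["(", "E", ")"], ["id"]],
}
print(sorted(compute_first(GRAMMAR)["E"]))   # ['(', 'id']
```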

Page 20: Cd2 [autosaved]

FOLLOW
- First put $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
- If there is a production A → αBβ (where α can be a whole string), then everything in FIRST(β) except for ε is placed in FOLLOW(B).
- If there is a production A → αB, then everything in FOLLOW(A) is in FOLLOW(B).
- If there is a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Page 21: Cd2 [autosaved]

EXAMPLE OF FIRST AND FOLLOW

The Grammar:
E → TE'
E' → +TE'
E' → ε
T → FT'
T' → *FT'
T' → ε
F → (E)
F → id
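For this grammar the FOLLOW rules can be evaluated directly, with the FIRST sets filled in by hand since they are small enough to read off (names are illustrative):

```python
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", "ε"}, "T'": {"*", "ε"}}

def first_of(symbols):
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})          # terminal: FIRST is the symbol itself
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")
    return out

def compute_follow(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for a, productions in GRAMMAR.items():
            for rhs in productions:
                for i, b in enumerate(rhs):
                    if b not in GRAMMAR:
                        continue        # terminals have no FOLLOW set
                    beta = first_of(rhs[i + 1:])
                    new = beta - {"ε"}
                    if "ε" in beta:     # B can end the production
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

follow = compute_follow()
print(sorted(follow["T"]))   # ['$', ')', '+']
```

This yields FOLLOW(E) = FOLLOW(E') = { ), $ }, FOLLOW(T) = FOLLOW(T') = { +, ), $ }, and FOLLOW(F) = { +, *, ), $ }.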

Page 22: Cd2 [autosaved]

PROPERTIES OF LL(1) GRAMMARS

1. No left-recursive grammar is LL(1).
2. No ambiguous grammar is LL(1).
3. Some languages have no LL(1) grammar.
4. An ε-free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar.

Example:
S → aS | a
is not LL(1) because FIRST(aS) = FIRST(a) = { a }.
S → aS'
S' → aS | ε
accepts the same language and is LL(1).

Page 23: Cd2 [autosaved]

PREDICTIVE PARSING TABLE

Method:
1. For each production A → α:
   a) For each a ∈ FIRST(α), add A → α to M[A,a].
   b) If ε ∈ FIRST(α):
      I. For each b ∈ FOLLOW(A), add A → α to M[A,b].
      II. If $ ∈ FOLLOW(A), add A → α to M[A,$].
2. Set each undefined entry of M to error.

If some M[A,a] has multiple entries then G is not LL(1).
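The method above can be sketched for the expression grammar, with FIRST and FOLLOW supplied by hand for brevity (names are illustrative):

```python
GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", ["ε"]),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", ["ε"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", "ε"}, "T'": {"*", "ε"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def first_of(alpha):
    out = set()
    for s in alpha:
        f = FIRST.get(s, {s})
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")
    return out

def build_table():
    table = {}                      # M[A, a] -> production right-hand side
    for A, alpha in GRAMMAR:
        f = first_of([s for s in alpha if s != "ε"])
        targets = f - {"ε"}         # rule 1a
        if "ε" in f:                # rules 1b-I and 1b-II: use FOLLOW(A), incl. $
            targets |= FOLLOW[A]
        for a in targets:
            if (A, a) in table:     # a clash would mean G is not LL(1)
                raise ValueError(f"not LL(1): conflict at M[{A},{a}]")
            table[(A, a)] = alpha
    return table

M = build_table()
print(M[("E'", "+")])   # ['+', 'T', "E'"]
print(M[("T'", ")")])   # ['ε']
```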

Page 24: Cd2 [autosaved]

EXAMPLE OF PREDICTIVE PARSING LL(1) TABLE

The given grammar is as follows:
S → E
E → TE'
E' → +E | -E | ε
T → FT'
T' → *T | /T | ε
F → num | id

Page 25: Cd2 [autosaved]

BOTTOM UP PARSING

Bottom-up parsing starts from the leaf nodes of a tree and works in an upward direction until it reaches the root node.

We start from a sentence and then apply production rules in reverse in order to reach the start symbol.

Here the parser tries to identify the R.H.S. of a production rule and replace it with the corresponding L.H.S. This activity is known as reduction.

A bottom-up parser is also known as an LR parser, where L means tokens are read from left to right and R means that it constructs a rightmost derivation (in reverse).

Page 26: Cd2 [autosaved]

EXAMPLE OF BOTTOM-UP PARSER

E → T + E | T
T → int * T | int | (E)

Consider the string: int * int + int

int * int + int
int * T + int     (T → int)
T + int           (T → int * T)
T + T             (T → int)
T + E             (E → T)
E                 (E → T + E)

Page 27: Cd2 [autosaved]

SHIFT REDUCE PARSING

Bottom-up parsing uses two kinds of actions:
1. Shift
2. Reduce

Shift: moves | one place to the right, shifting a terminal onto the left string:
ABC|xyz ⇒ ABCx|yz

Reduce: applies an inverse production at the right end of the left string. If A → xy is a production, then:
Cbxy|ijk ⇒ CbA|ijk

Page 28: Cd2 [autosaved]

EXAMPLE OF SHIFT REDUCE PARSING

|int * int + int        shift
int | * int + int       shift
int * | int + int       shift
int * int | + int       reduce T → int
int * T | + int         reduce T → int * T
T | + int               shift
T + | int               shift
T + int |               reduce T → int
T + T |                 reduce E → T
T + E |                 reduce E → T + E
E |
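A real LR parser chooses between shift and reduce with a table; as a sketch, the trace above can also be found by brute-force search over shift/reduce choices (all names here are illustrative):

```python
# Grammar from the example: E -> T + E | T,  T -> int * T | int | (E)
PRODUCTIONS = [
    ("E", ["T", "+", "E"]), ("E", ["T"]),
    ("T", ["int", "*", "T"]), ("T", ["int"]), ("T", ["(", "E", ")"]),
]

def parse(tokens, stack=()):
    """Search shift/reduce sequences until the stack is exactly [E]."""
    if not tokens and list(stack) == ["E"]:
        return True
    # try every reduction at the right end of the stack
    for lhs, rhs in PRODUCTIONS:
        n = len(rhs)
        if len(stack) >= n and list(stack[-n:]) == rhs:
            if parse(tokens, stack[:-n] + (lhs,)):
                return True
    # otherwise shift the next token
    if tokens and parse(tokens[1:], stack + (tokens[0],)):
        return True
    return False

print(parse(["int", "*", "int", "+", "int"]))  # True
print(parse(["int", "+"]))                     # False
```

The search is exponential and only suitable for tiny inputs; the point is that shift and reduce are the only two moves needed.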

Page 29: Cd2 [autosaved]

OPERATOR PRECEDENCE PARSING

Operator grammars have the property that no production right side is empty or has two adjacent non-terminals. This property enables the implementation of efficient operator-precedence parsers. These parsers rely on the following three precedence relations:

Relation   Meaning
a <· b     a yields precedence to b
a =· b     a has the same precedence as b
a ·> b     a takes precedence over b

Page 30: Cd2 [autosaved]

These operator precedence relations allow us to delimit the handles in the right sentential forms: <· marks the left end, =· appears in the interior of the handle, and ·> marks the right end. Suppose that $ is the end of the string; then for all terminals b we can write:

$ <· b and b ·> $

If we remove all non-terminals and place the correct precedence relation (<·, =·, ·>) between the remaining terminals, there remain strings that can be analyzed by an easily developed parser.

Page 31: Cd2 [autosaved]

EXAMPLE OF OPERATOR PRECEDENCE PARSING

For example, the following operator precedence relations can be introduced for simple expressions:

      id    +    *    $
id          ·>   ·>   ·>
+     <·    ·>   <·   ·>
*     <·    ·>   ·>   ·>
$     <·    <·   <·

Example: the input string id1 + id2 * id3, after inserting precedence relations, becomes:

$ <· id1 ·> + <· id2 ·> * <· id3 ·> $

Page 32: Cd2 [autosaved]

UNIT-III: Syntax Directed Translations

We can associate information with a language construct by attaching attributes to the grammar symbols.

A syntax directed definition specifies the values of attributes by associating semantic rules with the grammar productions.

Production      Semantic Rule
E -> E1 + T     E.code = E1.code || T.code || '+'

• We may alternatively insert the semantic actions inside the grammar:
E -> E1 + T { print '+' }

Page 33: Cd2 [autosaved]

Syntax Directed Definitions
1. We associate information with the programming language constructs by attaching attributes to grammar symbols.
2. Values of these attributes are evaluated by the semantic rules associated with the production rules.
3. Evaluation of these semantic rules:
   - may generate intermediate codes
   - may put information into the symbol table
   - may perform type checking
   - may issue error messages
   - may perform some other activities
   In fact, they may perform almost any activity.
4. An attribute may hold almost anything: a string, a number, a memory location, a complex record.

Page 34: Cd2 [autosaved]

Syntax-Directed Definitions and Translation Schemes

When we associate semantic rules with productions, we use two notations: syntax-directed definitions and translation schemes.

A. Syntax-Directed Definitions:
   - give high-level specifications for translations
   - hide many implementation details, such as the order of evaluation of semantic actions
   - we associate a production rule with a set of semantic actions, and we do not say when they will be evaluated

B. Translation Schemes:
   - indicate the order of evaluation of semantic actions associated with a production rule
   - in other words, translation schemes give some information about implementation details

Page 35: Cd2 [autosaved]

Syntax-Directed Translation

Conceptually, with both syntax-directed definitions and translation schemes we:
- parse the input token stream
- build the parse tree
- traverse the tree to evaluate the semantic rules at the parse tree nodes

Input string → parse tree → dependency graph → evaluation order for semantic rules

Conceptual view of syntax directed translation.

Page 36: Cd2 [autosaved]

Syntax-Directed Definitions
1. A syntax-directed definition is a generalization of a context-free grammar in which:
   - Each grammar symbol is associated with a set of attributes.
   - This set of attributes for a grammar symbol is partitioned into two subsets, called the synthesized and inherited attributes of that grammar symbol.
   - Each production rule is associated with a set of semantic rules.
2. The value of an attribute at a parse tree node is defined by the semantic rule associated with the production at that node.
3. The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node of the parse tree.
4. The value of an inherited attribute at a node is computed from the values of attributes at the siblings and parent of that node of the parse tree.

Page 37: Cd2 [autosaved]

Syntax-Directed Definitions

Examples:
Synthesized attribute: E → E1 + E2 { E.val = E1.val + E2.val }
Inherited attribute:   A → XYZ { Y.val = 2 * A.val }

1. Semantic rules set up dependencies between attributes, which can be represented by a dependency graph.
2. This dependency graph determines the evaluation order of these semantic rules.
3. Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have side effects, such as printing a value.
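Synthesized-attribute evaluation like E.val = E1.val + E2.val is a bottom-up walk of the tree. A minimal sketch over an explicit parse tree (the tuple encoding is illustrative):

```python
def val(node):
    """Evaluate the synthesized attribute .val bottom-up."""
    if isinstance(node, int):          # leaf: a constant carries its own value
        return node
    op, left, right = node             # interior node: ('+' or '*', E1, E2)
    l, r = val(left), val(right)
    return l + r if op == "+" else l * r

# parse tree for 3 + 4 * 5
print(val(("+", 3, ("*", 4, 5))))   # 23
```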

Page 38: Cd2 [autosaved]

Syntax Trees

A syntax tree is an intermediate representation of the compiler's input — a condensed form of the parse tree.
- A syntax tree shows the syntactic structure of the program while omitting irrelevant details.
- Operators and keywords are associated with the interior nodes.
- Chains of simple productions are collapsed.

Syntax directed translation can be based on a syntax tree as well as a parse tree.

Page 39: Cd2 [autosaved]

Syntax Tree — Examples

Expression 5 + 3 * 4:

      +
     / \
    5   *
       / \
      3   4

Leaves: identifiers or constants.
Internal nodes: labelled with operations.
Children of a node are its operands.

Statement if B then S1 else S2:

    if-then-else
     /    |    \
    B     S1    S2

A node's label indicates what kind of statement it is; the children of a node correspond to the components of the statement.

Page 40: Cd2 [autosaved]

Intermediate representation and code generation

Two possibilities:

1. ..... → semantic routines → code generation → Machine code
   (+) no extra pass for code generation
   (+) allows simple 1-pass compilation

2. ..... → semantic routines → IR → code generation → Machine code
   (+) allows higher-level operations, e.g. open block, call procedures
   (+) better optimization, because the IR is at a higher level
   (+) machine dependence is isolated in code generation

Page 41: Cd2 [autosaved]

Intermediate representation and code generation

IR: good for optimization and portability.

Machine code: simple.

Page 42: Cd2 [autosaved]

Intermediate code

1. Postfix form

Example:
a + b             ab+
(a+b) * c         ab+c*
a + b*c           abc*+
a := b*c + b*d    abc*bd*+:=

(+) simple and concise
(+) good for driving an interpreter
(-) not good for optimization or code generation
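"Good for driving an interpreter" means a postfix string can be evaluated directly with a stack. A sketch with single-letter operands bound in an environment (names are illustrative):

```python
def eval_postfix(expr, env):
    """Evaluate a postfix string like 'ab+c*' with a value stack."""
    stack = []
    for ch in expr:
        if ch.isalpha():
            stack.append(env[ch])       # operand: push its value
        else:                           # binary operator: pop two, push result
            right = stack.pop()
            left = stack.pop()
            stack.append({"+": left + right,
                          "*": left * right,
                          "-": left - right}[ch])
    return stack.pop()

env = {"a": 2, "b": 3, "c": 4}
print(eval_postfix("ab+c*", env))   # (2+3)*4 = 20
print(eval_postfix("abc*+", env))   # 2+3*4 = 14
```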

Page 43: Cd2 [autosaved]

INTERMEDIATE CODE

2. Three-address code
Triple:     op arg1 arg2
Quadruple:  op arg1 arg2 result

Triples are more concise, but since they refer to other instructions by position, what happens if instructions are deleted, moved or added during optimization?

Triples and quadruples are more similar to machine code.

Page 44: Cd2 [autosaved]

More detailed 3-addr code: add type information.

Example: a := b*c + b*d, where b and c are integer type and d is float type.

(1) (I*    b    c)        (I*    b    c    t1)
(2) (FLOAT b    _)        (FLOAT b    t2   _)
(3) (F*    (2)  d)        (F*    t2   d    t3)
(4) (FLOAT (1)  _)        (FLOAT t1   t4   _)
(5) (F+    (4)  (3))      (F+    t4   t3   t5)
(6) (:=    (5)  a)        (:=    t5   a    _)

Page 45: Cd2 [autosaved]

PARSE TREES

Parsing: build the parse tree. Non-terminals for operator precedence and associativity are included.

[Figure: parse tree for an assignment <target> := <exp>, where <exp> derives <exp> + <term>, <term> derives <term> * <factor>, and each <factor> derives an id or a constant.]

Page 46: Cd2 [autosaved]

PARSE TREE

[Figure: the source program feeds the Lexical Analyzer, which supplies a token to the Parser on each getNextToken request; the Parser builds the parse tree for the Rest of the Front End, which produces the intermediate representation. Both phases consult the Symbol table.]

Page 47: Cd2 [autosaved]

BOOLEAN EXPRESSIONS

Control-flow translation of boolean expressions.
Basic idea: generate the jumping code without evaluating the whole boolean expression.

Example: let E = a < b. We generate the code as:
(1) if a < b goto E.true
(2) goto E.false

Grammar:
E -> E or E | E and E | not E | (E) | id relop id | true | false

Page 48: Cd2 [autosaved]

E -> E1 or E2      { E1.true = E.true; E1.false = newlabel;
                     E2.true = E.true; E2.false = E.false;
                     E.code = E1.code || gen(E1.false ':') || E2.code }
E -> E1 and E2     { E1.true = newlabel; E1.false = E.false;
                     E2.true = E.true; E2.false = E.false;
                     E.code = E1.code || gen(E1.true ':') || E2.code }
E -> not E1        { E1.true = E.false; E1.false = E.true; E.code = E1.code }
E -> (E1)          { E1.true = E.true; E1.false = E.false; E.code = E1.code }
E -> id1 relop id2 { E.code = gen('if' id1.place relop.op id2.place 'goto' E.true) ||
                     gen('goto' E.false) }
E -> true          { E.code = gen('goto' E.true) }
E -> false         { E.code = gen('goto' E.false) }

Page 49: Cd2 [autosaved]

Example: a < b or (c < d and e < f)
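The rules above can be sketched in Python for this first example, modelling newlabel and gen after the slide's pseudo-functions (the closure-based expression encoding is illustrative):

```python
count = 0
code = []

def newlabel():
    global count
    count += 1
    return f"L{count}"

def gen(instr):
    code.append(instr)

def relop(x, op, y, true, false):
    gen(f"if {x} {op} {y} goto {true}")
    gen(f"goto {false}")

def or_expr(left, right, true, false):
    lfalse = newlabel()                 # E1.false = newlabel
    left(true, lfalse)
    gen(f"{lfalse}:")
    right(true, false)

def and_expr(left, right, true, false):
    ltrue = newlabel()                  # E1.true = newlabel
    left(ltrue, false)
    gen(f"{ltrue}:")
    right(true, false)

# a < b or (c < d and e < f)
expr = lambda t, f: or_expr(
    lambda t1, f1: relop("a", "<", "b", t1, f1),
    lambda t2, f2: and_expr(
        lambda t3, f3: relop("c", "<", "d", t3, f3),
        lambda t4, f4: relop("e", "<", "f", t4, f4),
        t2, f2),
    t, f)
expr("E.true", "E.false")
print("\n".join(code))
# if a < b goto E.true
# goto L1
# L1:
# if c < d goto L2
# goto E.false
# L2:
# if e < f goto E.true
# goto E.false
```

Note that the whole expression is never evaluated to a value: control simply jumps to E.true or E.false.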

Example: while a < b do if c < d then x := y + z; else x := y - z;

Page 50: Cd2 [autosaved]

Three address code

In three-address code there is at most one operator on the right side of an instruction.

Example: a + a * (b - c) + (b - c) * d

[Figure: syntax tree for the expression, with + at the root and the subtree for b - c appearing twice.]

t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4

Page 51: Cd2 [autosaved]

Forms of three-address instructions:
- x = y op z
- x = op y
- x = y
- goto L
- if x goto L and ifFalse x goto L
- if x relop y goto L
- Procedure calls using:
  param x
  call p, n
  y = call p, n
- x = y[i] and x[i] = y
- x = &y and x = *y and *x = y

Page 52: Cd2 [autosaved]

Example: do i = i+1; while (a[i] < v);

Symbolic labels:
L: t1 = i + 1
   i = t1
   t2 = i * 8
   t3 = a[t2]
   if t3 < v goto L

Position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100

Page 53: Cd2 [autosaved]

Data structures for three-address code:
- Quadruples: have four fields: op, arg1, arg2 and result.
- Triples: temporaries are not used; instead, references to instructions are made.
- Indirect triples: in addition to the triples, we use a list of pointers to triples.

Page 54: Cd2 [autosaved]

Example: b * minus c + b * minus c

Three-address code:
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

Quadruples:
      op     arg1  arg2  result
(0)   minus  c           t1
(1)   *      b     t1    t2
(2)   minus  c           t3
(3)   *      b     t3    t4
(4)   +      t2    t4    t5
(5)   =      t5          a

Triples:
      op     arg1  arg2
(0)   minus  c
(1)   *      b     (0)
(2)   minus  c
(3)   *      b     (2)
(4)   +      (1)   (3)
(5)   =      a     (4)

Indirect triples (instruction list of pointers into the triples above):
35: (0)
36: (1)
37: (2)
38: (3)
39: (4)
40: (5)

Page 55: Cd2 [autosaved]

ASSIGNMENT STATEMENTS

The assignment statement mainly deals with expressions, which can be of integer, real, array or record type. Consider the following grammar:
S -> id := E
E -> E1 + E2
E -> E1 * E2
E -> -E1
E -> (E1)
E -> id

Page 56: Cd2 [autosaved]

The translation scheme of the above grammar is given below:

Production Rule    Semantic actions
S -> id := E       { p = look_up(id.name);
                     if p ≠ nil then emit(p '=' E.place)
                     else error; }
E -> E1 + E2       { E.place = newtemp();
                     emit(E.place '=' E1.place '+' E2.place); }
E -> E1 * E2       { E.place = newtemp();
                     emit(E.place '=' E1.place '*' E2.place); }
E -> -E1           { E.place = newtemp();
                     emit(E.place '=' 'uminus' E1.place); }
E -> (E1)          { E.place = E1.place; }
E -> id            { p = look_up(id.name);
                     if p ≠ nil then E.place = p
                     else error; }
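The scheme above can be sketched as a runnable translator working over an expression tree instead of during parsing; newtemp and emit mirror the slide's helpers, and the tuple encoding of the tree is illustrative:

```python
code = []
temps = 0

def newtemp():
    global temps
    temps += 1
    return f"t{temps}"

def emit(instr):
    code.append(instr)

def place(e):
    """Return E.place for expression tree e, emitting code as a side effect."""
    if isinstance(e, str):                 # E -> id
        return e
    op = e[0]
    if op == "uminus":                     # E -> -E1
        t = newtemp()
        emit(f"{t} = uminus {place(e[1])}")
        return t
    left, right = place(e[1]), place(e[2]) # E -> E1 + E2 | E1 * E2
    t = newtemp()
    emit(f"{t} = {left} {op} {right}")
    return t

def assign(target, e):                     # S -> id := E
    emit(f"{target} = {place(e)}")

# a := b * c + b * d
assign("a", ("+", ("*", "b", "c"), ("*", "b", "d")))
print("\n".join(code))
# t1 = b * c
# t2 = b * d
# t3 = t1 + t2
# a = t3
```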

Page 57: Cd2 [autosaved]

Boolean Functions:

Page 58: Cd2 [autosaved]

Control loop

Fig. A flow chart showing control flow.

Page 59: Cd2 [autosaved]

Control flow (or alternatively, flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.

A control flow statement is a statement whose execution results in a choice being made as to which of two or more paths should be followed.

A set of statements is in turn generally structured as a block, which in addition to grouping also defines a lexical scope.

Page 60: Cd2 [autosaved]

Postfix notation

The postfix notation for an expression E can be defined as follows:
1. If E is a variable or constant, then the postfix notation for E is E itself.
2. If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same as the postfix notation for E1.

Page 61: Cd2 [autosaved]

UNIT-IV: SYMBOL TABLE

Symbol table: a data structure used by a compiler to keep track of the semantics of variables:
- Data type.
- When it is used: scope — the effective context where a name is valid.
- Where it is stored: storage address.

Possible implementations:
- Unordered list: for a very small set of variables.
- Ordered linear list: insertion is expensive, but implementation is relatively easy.

Page 62: Cd2 [autosaved]

Data structure for symbol tables Possible entries in a symbol table: Name: a string. Attribute: Reserved word Variable name Type name Procedure name Constant name Data type. Scope information: where it can be used. Storage allocation, size