1 chapter 3 context-free grammars and parsing. 2 parsing: syntax analysis decides which part of the...

39
1 Chapter 3 Context-Free Grammars and Parsing

Post on 22-Dec-2015

231 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

1

Chapter 3

Context-Free Grammars and Parsing

Page 2: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

2

Parsing: Syntax Analysis

decides which part of the incoming token stream should be grouped together.

the output of parsing is some representation of a parse tree.

intermediate code generator transforms the parse tree into an intermediate language.

Page 3: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

3

Comparisons between r.e. (regular expressions) and c.f.g. (context-free grammars)

r.e. tokens F.A. to test a valid token

c.f.g. programming language constructs P.F.A. to test a valid program (sentence)

describes using

describes

using

Page 4: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

4

Features of programming languages

contents:

- declarations

- sequential statements

- iterative statements

- conditional statements

Page 5: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

5

features:

- declare/state recursively & repeatedly

- hierarchical specification

e.g., compound statement -> statement ->

expression -> id

- nested structures

- similarity

Page 6: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

6

Description of programming languages

Syntax Diagrams (See Sec. 3.5.2) Context Free Grammars (CFG)

Page 7: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together
Page 8: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

exp exp addop term | term

addop + | -

term term mulop factor | factor

mulop

factor ( exp ) | number

Contex Free Grammar (in BNF)

*

Page 9: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

9

History

- In 1956 BNF (Backus Naur Form) is used for

description of natural language. - Algol uses BNF to describe its language.- The Syntactic Specification of Programming

Languages - CFG ( a BNF description)

Page 10: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

10

Capabilities of Context-free grammars

give precise syntactic specification of programming languages

a parser can be constructed automatically by CFG

the syntax entity specified in CFG can be used for translating into object code.

useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else, etc.

Page 11: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

11

Def. of context free grammars

- A CFG is a 4-tuple (V,T,P,S), where V - a finite set of variables (non-terminals) T - a finite set of terminal symbols (tokens) P - a finite set of productions (or grammar rules) S - a start symbol and V T = S V Productions are of the form: A -> , where A V, (V+T)* - CFG generates CFL(Context Free Languages)

Page 12: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

12

An Example

G = ({E}, {+, *, (, ), id}, P, E)

P: { E -> E + E

E -> E * E

E -> ( E )

E -> id }

Page 13: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

13

Rules from F.A.(r.e.) to CFG

1. For each state there is a nonterminal symbol.

2. If state A has a transition to state B on symbol a, introduce A -> aB.

3. If A goes to B on input , introduce A -> B.4. If A is an accepting state, introduce A -> .5. Make the start state of the NFA be the start

symbol of the grammar.

Page 14: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

14

Examples

(1) r.e.: (a|b)(a|b|0|1)* c.f.g.: S -> aA|bA A -> aA|bA|0A|1A|

(2) r.e.: (a|b)*abb c.f.g.: S -> aS | bS | aA A -> bB B -> bC C ->

Page 15: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

15

Why don’t we use c.f.g. to replace r.e. ?

r.e. => easy & clear description for token. r.e. => efficient token recognizer modularizing the components

Page 16: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

16

Derivations (How does a CFG defines a

language?)

Definitions: directly derive derive in zero or more steps derive in one or more steps derive in i steps A => (V+T)* sentential form (V+T)* sentence T* language: { w | S => w , w T* } leftmost derivations rightmost derivations

+

=> (V+T)* *

=> (V+T)* +

i

+

Page 17: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

(number-number)*number

P : { exp exp op exp | ( exp ) | number op + | - | * }

G = ( {exp, op}, {+, *, (, ), number}, P, exp )

Page 18: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

18

Page 19: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

19

Parse trees

=> a graphical representation for derivations.

(Note the difference between parse tree and syntax tree.)

=> Often the parse tree is produced in only a figurative sense; in reality, the parse tree exists only as a sequence of actions made by stepping through the tree construction process.

Page 20: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

20

Ambiguity

Ambiguous Grammars

- Def.: A context-free grammar that can produce more than one parse tree for some sentence.

- The ways to disambiguate a grammar: (1) specifying the intention (e.g. associtivity and precedence for arithmetic operators, other) (2) rewrite a grammar to incorporate the intention into the grammar itself.

Page 21: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

For (1) Precedence: negate > exponent ( ) > * / > + - Associtivity: exponent ==> right associtivity others ==> left associtivity   In yacc, a “specification rule” is used to solve the problem

of (1), e.g., the alignment order, the special syntax, default value (refer to yacc manual for the disambiguating rules)

 

For (2) 1. introducing one nonterminal for each precedence level.

Page 22: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

22

Example 1

E -> E + E | E- E | E * E | E / E | E E | ( E )

| - E | id

is ambiguous ( is exponent operator with right associtivity.)

Page 23: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

E

E + E

E * Eid

id id

E

E * E

E + E

id id

id

More than one parse tree for the sentence id + id * id

Page 24: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

+

*id

id id

*

+

id id

id

More than one syntax tree for the sentence id + id * id

Page 25: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

25

The corresponding grammar shown below is unambiguous

element - > (expression) | id /*((expression) 括號內的最

優先做之故 ) */

primary - > -primary | element

factor - > primary factor | primary /*has right

associtivity */

term - > term * factor | term / factor | factor

expression -> expression + term | expression –

term | term

Page 26: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

expression

expression + term

term * factor

primaryelement

id

term

factor

primary

element

id

factor

primary

element

id

Ex: id + id * id

Page 27: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together
Page 28: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together
Page 29: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

29

Example 2

stat -> IF cond THEN stat

| IF cond THEN stat ELSE stat

| other stat

is an ambiguous grammar

Page 30: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

if c1 then

stat

IF cond THEN stat

IF cond THEN stat ELSE stat

if c2 then s2 else s3

stat

IF cond THEN stat ELSE stat

IF cond THEN stat if c1 then

if c2 then

else

s2

s3

If c1 then if c2 then s2 else s3

Dangling else problem

Page 31: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

31

The corresponding grammar shown below is

unambiguous.

  stat - > matched-stat | unmatched-stat

matched-stat - > if cond then matched-stat else matched-

stat | other-stat

unmatched-stat - > if cond then stat | if cond then

matched-stat else unmatched-stat

Page 32: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together
Page 33: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

33

Non-context free language constructs

L = {wcw | w is in (a|b)*} L = {anbmcndm | n 1 and m 1} L = {anbncn| n 0}

Page 34: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

34

Basic Parsing Techniques

1. How to check if an input string is a sentence of a given grammar?

(check the syntax -- not only used in the programming language)

 

2. How to construct a parse tree for the input string, if desired?

Page 35: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

35

Method classic approach modern approach

1. top-down recursive descent LL parsing

(produce leftmost derivation)

2. bottom-up operator precedence LR parsing (shift-

reduce parsing; produce rightmost derivation in reverse order)

Page 36: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

36

An Example (for LR Parsing)

S -> aABe A -> Abc | b B -> d w = abbcde S => aABe => aAde => aAbcde => abbcde

LR parsing: abbcde ==> aAbcde ==> aAde ==> aABe ==> S 

rm rmrm rm

Page 37: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

37

Page 38: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

38

Page 39: 1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together

39

Assignment #3a

1. Do exercises 3.3, 3.5, 3.24, 3.25

Using the grammar in BNF of the TINY

language in Fig. 3.6 to derive step by step the sequence of tokens of the program in Fig. 3.8. (for practice only)