2.2. context-free syntax analysis

19
Compilers and Language Processing Tools Summer Term 2011 Prof. Dr. Arnd Poetzsch-Heffter Software Technology Group TU Kaiserslautern c Prof. Dr. Arnd Poetzsch-Heffter 1 Content of Lecture 1. Introduction 2. Syntax and Type Analysis 2.1 Lexical Analysis 2.2 Context-Free Syntax Analysis 2.3 Context-Dependent Syntax Analysis 3. Translation to Target Language 3.1 Translation of Imperative Language Constructs 3.2 Translation of Object-Oriented Language Constructs 4. Selected Aspects of Compilers 4.1 Intermediate Languages 4.2 Optimization 4.3 Data Flow Analysis 4.4 Register Allocation 4.5 Code Generation 5. Garbage Collection 6. XML Processing (DOM, SAX, XSLT) c Prof. Dr. Arnd Poetzsch-Heffter 2 Context-Free Syntax Analysis 2.2. Context-Free Syntax Analysis c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 3 Context-Free Syntax Analysis Introduction Section outline 1. Specification of parsers 2. Implementation of parsers 2.1 Top-down syntax analysis - Recursive descent - LL(k) parsing theory - LL parser generation 2.2 Bottom-up syntax analysis - Principles of LR parsing - LR parsing theory - SLR, LALR, LR(k) parsing - LALR parser generation 3. Error handling 4. Concrete and abstract syntax c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 4 Context-Free Syntax Analysis Introduction Task of context-free syntax analysis Check if token stream (from scanner) matches context-free syntax of language if erroneous: error handling if correct: construct syntax tree Parser Token Stream Abstract / Concrete Syntax Tree c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 5 Context-Free Syntax Analysis Introduction Task of context-free syntax analysis (2) Remarks: Parsing can be interleaved with other actions processing the program (e.g. attributation). Syntax tree controls translation. We distinguish Concrete syntax tree corresponding to context-free grammar Abstract syntax tree providing a more compact representation tailored to subsequent phases c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 6 Context-Free Syntax Analysis Specification of Parsers 2.2.1 Specification of Parsers c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 7 Context-Free Syntax Analysis Specification of Parsers Specification of parsers 2 general specification techniques Syntax diagrams Context-free grammars (often in extended form) c Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 8

Upload: doandung

Post on 09-Dec-2016

240 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 2.2. Context-Free Syntax Analysis

Compilers and Language Processing ToolsSummer Term 2011

Prof. Dr. Arnd Poetzsch-Heffter

Software Technology GroupTU Kaiserslautern

c© Prof. Dr. Arnd Poetzsch-Heffter 1

Content of Lecture

1. Introduction2. Syntax and Type Analysis

2.1 Lexical Analysis2.2 Context-Free Syntax Analysis2.3 Context-Dependent Syntax Analysis

3. Translation to Target Language3.1 Translation of Imperative Language Constructs3.2 Translation of Object-Oriented Language Constructs

4. Selected Aspects of Compilers4.1 Intermediate Languages4.2 Optimization4.3 Data Flow Analysis4.4 Register Allocation4.5 Code Generation

5. Garbage Collection6. XML Processing (DOM, SAX, XSLT)

c© Prof. Dr. Arnd Poetzsch-Heffter 2

Context-Free Syntax Analysis

2.2. Context-Free Syntax Analysis

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 3

Context-Free Syntax Analysis Introduction

Section outline

1. Specification of parsers2. Implementation of parsers

2.1 Top-down syntax analysis- Recursive descent- LL(k) parsing theory- LL parser generation

2.2 Bottom-up syntax analysis- Principles of LR parsing- LR parsing theory- SLR, LALR, LR(k) parsing- LALR parser generation

3. Error handling4. Concrete and abstract syntax

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 4

Context-Free Syntax Analysis Introduction

Task of context-free syntax analysis

• Check if token stream (from scanner) matches context-free syntaxof language

I if erroneous: error handlingI if correct: construct syntax tree

Parser

Token Stream

Abstract / Concrete Syntax Tree

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 5

Context-Free Syntax Analysis Introduction

Task of context-free syntax analysis (2)

Remarks:• Parsing can be interleaved with other actions processing the

program (e.g. attributation).• Syntax tree controls translation. We distinguish

I Concrete syntax tree corresponding to context-free grammarI Abstract syntax tree providing a more compact representation

tailored to subsequent phases

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 6

Context-Free Syntax Analysis Specification of Parsers

2.2.1 Specification of Parsers

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 7

Context-Free Syntax Analysis Specification of Parsers

Specification of parsers

2 general specification techniques• Syntax diagrams• Context-free grammars (often in extended form)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 8

Page 2: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Specification of Parsers

Context-Free Grammars

DefinitionLet• N and T be two alphabets with N ∩ T = ∅• Π a finite subset of N × (N ∪ T )∗

• S ∈ NThen, Γ = (N,T ,Π,S) is a context-free grammar (CFG) where• N is the set of nonterminals• T is the set of terminals• Π is the set of productions rules• S is the start symbol (axiom)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 9

Context-Free Syntax Analysis Specification of Parsers

Context-Free Grammars (2)

Notations:• A,B,C, . . . denote nonterminals• a,b, c, . . . denote terminals• x , y , z, . . . denote strings of terminals, i.e. x ∈ T ∗

• α, β, γ, ψ, φ, σ, τ are strings of terminals and nonterminals, i.e.α ∈ (N ∪ T )∗

Productions are denoted by A→ α.

The notation A→ α | β | γ | . . . is an abbreviation forA→ α, A→ β, A→ γ, . . .

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 10

Context-Free Syntax Analysis Specification of Parsers

Derivation

Let Γ = (N,T ,Π,S) be a CFG:• ψ is directly derivable from φ in Γ and φ directly produces ψ,

written as φ⇒ ψ, if there are σ, τ withσAτ = φ and σατ = ψ and A→ α ∈ Π

• ψ is derivable from φ in Γ, written as φ⇒∗ ψ, if there existφ0, . . . , φn with φ = φ0 and ψ = φn and φi ⇒ φi+1 for alli ∈ {0, . . . ,n − 1}.

• φ0, . . . , φn is called a derivation of ψ from φ.

• ⇒∗ is the reflexive, transitive closure of⇒.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 11

Context-Free Syntax Analysis Specification of Parsers

Derivation (2)

• A derivation φ0, . . . , φn is a leftmost derivation (rightmost) if inevery derivation step φi ⇒ φi+1 the leftmost (rightmost)nonterminal in φi is replaced.

• Leftmost and rightmost derivation steps are denoted by φ⇒lm ψand φ⇒rm ψ resp.

• The tree representation of a derivation is a syntax tree.

• L(Γ) = {z ∈ T ∗ |S ⇒∗ z} is the language generated by Γ.

• x ∈ L(Γ) is a sentence of Γ (germ. Satz).

• φ ∈ (N ∪ T )∗ with S ⇒∗ φ is a sentential form of Γ (germ.Satzform).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 12

Context-Free Syntax Analysis Specification of Parsers

Derivation (3)

Remarks:• Each derivation corresponds to exactly one syntax tree. In

reverse, for each syntax tree, there can be several derivations.• For “syntax tree”, the term “derivation tree” is also used.• For each language, there can be several generating grammars,

i.e., the mapping L: Grammar→ Language is in general notinjective.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 13

Context-Free Syntax Analysis Specification of Parsers

Ambiguity in Grammars

• A sentence is unambiguous if it has exactly one syntax tree. Asentence is ambiguous if it has more than one syntax tree.

• For each syntax tree, there exists exactly on leftmost derivationand exactly one rightmost derivation.

• Thus: A sentence is unambiguous iff it has exactly one leftmost(rightmost) derivation.

• A grammar is ambiguous if it contains an ambiguous sentence.• For programming languages, unambiguous grammars are

essential, as the semantics and the translation are defined by thesyntactic structure.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 14

Context-Free Syntax Analysis Specification of Parsers

Ambiguity in Grammars (2)

Example 1: Grammar Γ0 for expressions:

• S → E• E → E + E• E → E ∗ E• E → (E)

• E → ID

Consider the input string

(av + av) ∗ bv + cv + dv

resulting in the following input for the context-free analysis

(ID + ID) ∗ ID + ID + ID

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 15

Context-Free Syntax Analysis Specification of Parsers

Ambiguity in Grammars (3)Syntax tree for (ID + ID) ∗ ID + ID + ID

54© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Beispiele: (Mehrdeutigkeit)

1. Beispiel einer Ausdrucksgrammatik:

!0: S E, E E + E, E E * E, E ( E ), E ID

Betrachte die Eingabe: (av+av) * bv + cv +dv)

Eingabe zur kf-Analyse: ( ID + ID ) * ID + ID + ID

S

"

" "

" "

E E E E E

( ID + ID ) * ID + ID + ID

- Syntaxbaum entspricht nicht den üblichen

Rechenregeln.

- Es gibt mehrere Syntaxbäume gemäß !0,

insbesondere ist die Grammatik mehrdeutig.

• Syntax tree does not match conventional rules of arithmetic.• There are several syntax trees according to Γ0 for this input,

hence Γ0 is ambiguous.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 16

Page 3: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Specification of Parsers

Ambiguity in Grammars (4)Example 2: Ambiguity in if-then-else construct

if B1 then if B2 then A:= 9 else A:= 7

First Derivation

55© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

2. Mehrdeutigkeit beim if-then-else-Konstrukt:

if B1 then if B2 then A:=8 else A:= 7

IFTHENELSE

ANW

IFTHEN

ANW ANW

ZW ZW

IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO

ZW ZW

ANW ANW

IFTHENELSE

ANW

IFTHEN

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 17

Context-Free Syntax Analysis Specification of Parsers

Ambiguity in Grammars (5)

Second Derivation

55© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

2. Mehrdeutigkeit beim if-then-else-Konstrukt:

if B1 then if B2 then A:=8 else A:= 7

IFTHENELSE

ANW

IFTHEN

ANW ANW

ZW ZW

IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO

ZW ZW

ANW ANW

IFTHENELSE

ANW

IFTHEN

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 18

Context-Free Syntax Analysis Specification of Parsers

Ambiguity as Grammar Property

Ambiguity is a grammar property. The grammar for expressions Γ0 isan example of an ambiguous grammar.

Γ0:• S → E• E → E + E• E → E ∗ E• E → (E)

• E → ID

57© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Beispiel: (Mehrdeutigkeit als Grammatikeig.)

Die obige Ausdrucksgrammatik

!0: S E, E E + E | E * E | E ( E ) | E ID

ist ein Beispiel für eine mehrdeutige Grammatik:

S

E

E E

E E

ID + ID * ID

E E

E E

E

S

Mehrdeutigkeit ist zunächst einmal eine Grammatik-

eigenschaft.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 19

Context-Free Syntax Analysis Specification of Parsers

Ambiguity as Grammar Property (2)

But there exists an unambiguous grammar for the same language:

Γ1:• S → E• E → T + E |T• T → F ∗ T |F• F → (E) |ID

58© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Aber es gibt eine eindeutige Grammatik für die Sprache:

!1: S E, E T + E | T, T F * T | F, F ( E ) | ID

S

E

E

E

E

F

F

F

F

T

T

T

T

( ID + ID ) * ID + ID

F

T

Lesen Sie zu Abschnitt 2.2.1:

Wilhelm, Maurer:

• aus Kap. 8, Syntaktische Analyse, die S. 271 - 283

Appel:

• aus Chap. 3, S. 40 - 47

(Es gibt aber auch kontextfreie Sprachen, die nur durch

mehrdeutige Grammatiken beschrieben werden.)c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 20

Context-Free Syntax Analysis Specification of Parsers

Ambiguity as Grammar Property (3)

Remark:

• A context-free language for which every grammar is ambiguous iscalled inherently ambiguous.

• There are inherently ambiguous CFLs.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 21

Context-Free Syntax Analysis Specification of Parsers

Literature

Recommended reading:• Wilhelm, Maurer: Chapter 8, pp. 271 - 283 (Syntactic Analysis)• Appel: Chapter 3, pp. 40-47

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 22

Context-Free Syntax Analysis Implementation of Parsers

2.2.2 Implementation of Parsers

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 23

Context-Free Syntax Analysis Implementation of Parsers

Implementation of parsers

Overview• Top-down parsing

I Recursive descentI LL parsingI LL parser generation

• Bottom-up parsingI LR parsingI LALR, SLR, LR(k) parsingI LALR parser generation

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 24

Page 4: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Methods for context-free analysis

• Manually developed, grammar-specific implementation(error-prone, inflexible)

• Backtracking (simple, but inefficient)• Cocke-Younger-Kasami-Algorithm (1967):

I for all CFGs in Chomsky normalformI based on idea of dynamic programmingI time complexity O(n3) (however linear complexity desired)

• Top-down methods: from axiom to word/token stream• Bottom-up methods: from word/token stream to axiom

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 25

Context-Free Syntax Analysis Implementation of Parsers

Example: Top-down analysis

Top-down analysis leads to leftmost derivation.Example derivation with Γ1:

60© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Beispiel: (Top-down-Analyse)

S

E =>

T + E =>

F * T + E =>

( E ) * T + E =>

( T + E ) * T + E =>

( F + E ) * T + E =>

( ID + E ) * T + E =>

( ID + T ) * T + E =>

( ID + F ) * T + E =>

( ID + ID ) * T + E =>

( ID + ID ) * F + E =>

( ID + ID ) * ID + E =>

( ID + ID ) * ID + T =>

( ID + ID ) * ID + F =>

( ID + ID ) * ID + ID

Ergebnis der td-Analyse ist eine Linksableitung.

Gemäß !1 :

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 26

Context-Free Syntax Analysis Implementation of Parsers

Example: Bottom-up analysis

Bottom-up analysis leads to rightmost derivation.Example derivation with Γ1:

61© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Beispiel: (Bottom-up-Analyse)

( ID + ID ) * ID + ID <=

( F + ID ) * ID + ID <=

( T + ID ) * ID + ID <=

( T + F ) * ID + ID <=

( T + T ) * ID + ID <=

( T + E ) * ID + ID <=

( E ) * ID + ID <=

F * ID + ID <=

F * F + ID <=

F * T + ID <=

T + ID <=

T + F <=

T + T <=

T + E <=

E <=

S <=

Ergebnis der bu-Analyse ist eine Rechtsableitung.

Gemäß !1 :

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 27

Context-Free Syntax Analysis Implementation of Parsers

Context-free analysis with linear complexity

• Restrictions on grammar (not every CFG has a linear parser)• Use of push-down automata or systems of recursive procedures• Usage of look ahead to remaining input in order to select next

production rule to be applied

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 28

Context-Free Syntax Analysis Implementation of Parsers

Syntax analysis methods and parser generators

• Basic knowledge of syntax analysis is essential for use of parsergenerators.

• Parser generators are not always applicable.• Often, error handling has to be done manually.• Methods underlying parser generation is a good example for a

generic technique (and a highlight of computer science!).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 29

Context-Free Syntax Analysis Implementation of Parsers

2.2.2.1 Top-down syntax analysis

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 30

Context-Free Syntax Analysis Implementation of Parsers

Top-down syntax analysis

Learning objectives• Understand the general principle of top-down syntax analysis• Be able to implement recursive descent parsing (by example)• Know expressiveness and limitations of top-down parsing• Understand the basic concepts of LL(k) parsing

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 31

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing

Basic idea

• Each nonterminal A is associated with a procedure. Thisprocedure accepts a partial sentence derived from A.

• The procedure implements a finite automaton constructed fromthe productions with A as left-hand side. This automaton is calledthe item automaton of A.

• The recursiveness of the grammar is mapped to mutual recursiveprocedures such that the stack of higher programing languages isused for handling the recursion.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 32

Page 5: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Construction of recursive descent parser

Let Γ′1 be an CFG accepting w# iff w ∈ L(Γ1), i.e.,# is used as a special character denoting the end of the input.

Γ′1:• S → E#

• E → T + E | T• T → F ∗ T | F• F → (E) | ID

Construct item automaton for each nonterminal.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 33

Context-Free Syntax Analysis Implementation of Parsers

Item automata

S → E#

64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Konstruktion eines Parsers mit der Methode

des rekursiven Abstiegs (exemplarisch):

Sei !‘ wie !1, aber mit Randzeichen #, d.h.

S E #, E T + E | T, T F * T | F, F ( E ) | ID

Konstruiere für jedes Nichtterminal A den sogenannten

Item-Automaten. Er beschreibt die Analyse derjenigen

Produktionen, deren linke Seite A ist:

1

[S .E #] [S E.# ] [S E#.]

[E .T+E]

[E .T ]

[T .F*T]

[ T .F ]

[F .(E)]

[F .ID ]

[E T+.E] [E T+E.][E T.+E]

[E T.]

[ T F.]

[T F.*T][T F*.T] [T F*T.]

[F ID.]

[F (.E)] [F (E.)] [F (E).]

E #

T + E

F*

T

(

ID

E )

E → T + E | T

64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Konstruktion eines Parsers mit der Methode

des rekursiven Abstiegs (exemplarisch):

Sei !‘ wie !1, aber mit Randzeichen #, d.h.

S E #, E T + E | T, T F * T | F, F ( E ) | ID

Konstruiere für jedes Nichtterminal A den sogenannten

Item-Automaten. Er beschreibt die Analyse derjenigen

Produktionen, deren linke Seite A ist:

1

[S .E #] [S E.# ] [S E#.]

[E .T+E]

[E .T ]

[T .F*T]

[ T .F ]

[F .(E)]

[F .ID ]

[E T+.E] [E T+E.][E T.+E]

[E T.]

[ T F.][T F.*T]

[T F*.T] [T F*T.]

[F ID.]

[F (.E)] [F (E.)] [F (E).]

E #

T + E

F*

T

(

ID

E )

T → F ∗ T | F

64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Konstruktion eines Parsers mit der Methode

des rekursiven Abstiegs (exemplarisch):

Sei !‘ wie !1, aber mit Randzeichen #, d.h.

S E #, E T + E | T, T F * T | F, F ( E ) | ID

Konstruiere für jedes Nichtterminal A den sogenannten

Item-Automaten. Er beschreibt die Analyse derjenigen

Produktionen, deren linke Seite A ist:

1

[S .E #] [S E.# ] [S E#.]

[E .T+E]

[E .T ]

[T .F*T]

[ T .F ]

[F .(E)]

[F .ID ]

[E T+.E] [E T+E.][E T.+E]

[E T.]

[ T F.]

[T F.*T][T F*.T] [T F*T.]

[F ID.]

[F (.E)] [F (E.)] [F (E).]

E #

T + E

F*

T

(

ID

E )c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 34

Context-Free Syntax Analysis Implementation of Parsers

Item automata (2)

F → (E) | ID

64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007

Konstruktion eines Parsers mit der Methode

des rekursiven Abstiegs (exemplarisch):

Sei !‘ wie !1, aber mit Randzeichen #, d.h.

S E #, E T + E | T, T F * T | F, F ( E ) | ID

Konstruiere für jedes Nichtterminal A den sogenannten

Item-Automaten. Er beschreibt die Analyse derjenigen

Produktionen, deren linke Seite A ist:

1

[S .E #] [S E.# ] [S E#.]

[E .T+E]

[E .T ]

[T .F*T]

[ T .F ]

[F .(E)]

[F .ID ]

[E T+.E] [E T+E.][E T.+E]

[E T.]

[ T F.][T F.*T]

[T F*.T] [T F*T.]

[F ID.]

[F (.E)] [F (E.)] [F (E).]

E #

T + E

F*

T

(

ID

E )

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 35

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing procedures

• The recursive procedures are constructed from the item automata.• The input is a token stream terminated by #.• The variable currToken contains one token look ahead, i.e., the

first symbol of the input rest.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 36

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing procedures (2)

Production: S → E#

void S() {E();if( currToken == ’#’ ) {

accept();} else {

error();}

}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 37

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing procedures (3)

Production: E → T + E | Tvoid E() {

T();if( currToken == ’+’ ) {

readToken();E();

}}

Production: T → F ∗ T | Fvoid T() {

F();if( currToken == ’*’ ){

readToken();T();

}}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 38

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing procedures (4)

Production: F → ( E ) | IDvoid F() {

if( currToken == ’(’ ) {readToken();E();if( currToken == ’)’ ) {

readToken();} else error();

} else if( currToken == ID ) {readToken();

} elseerror();

}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 39

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent parsing procedures (5)

Remarks:

• Recursive descentI is relatively easy to implementI can easily be used with other tasks (see following example)I is a typical example for syntax-directed methods (see also following

example)

• Example uses one token look ahead.• Error handling is not considered.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 40

Page 6: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent and evaluation

Example: Interpreter for expressions using recursive descent

int env(Ident); // Ident -> int

// local variables imr store intermediate results

int S() {int imr = E();if (currToken == ’#’) {

return imr;} else {

error();return err_result;

}}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 41

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent and evaluation (2)

int E() {int imr = T();if( currToken == ’+’ ) {

readToken();return imr + E();

}}

int T() {int imr := F();if (currToken == ’*’){

readToken();return imr * T();

}}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 42

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent and evaluation (3)

int F() {int imr;if (currToken == ’(’){

readToken();imr := E();if (currToken == ’)’){

readToken(); return imr;} else {

error(); return err_result;}

} else if (currToken == ID) {readToken(); return env(code(ID));

} else {error(); return err_result;

}}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 43

Context-Free Syntax Analysis Implementation of Parsers

Recursive descent and evaluation (4)

• Extension of parser with actions/computations can easily beimplemented, but mixes conceptually different phases/tasks andcauses programs hard to maintain.

• Question: For which grammars does the recursive descenttechnique work?→ LL(k) parsing theory

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 44

Context-Free Syntax Analysis Implementation of Parsers

LL parsing

• Basis for town-down syntax analysis• First “L” refers to reading input from left to right• Second “L” refers to search for leftmost derivations

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 45

Context-Free Syntax Analysis Implementation of Parsers

LL(k) grammars

Definition (LL(k) grammar)Let Γ = (N,T ,Π,S) be a CFG and k ∈ N.

Γ is an LL(k) grammar if for any two leftmost derivations

S ⇒∗lm uAα ⇒lm uβα⇒∗lm ux

and

S ⇒∗lm uAα ⇒lm uγα⇒∗lm uy

the following holds:

if prefix(k , x) = prefix(k , y), then β = γ

where prefix(k , x) yields the longest prefix of x with length ≤ k .

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 46

Context-Free Syntax Analysis Implementation of Parsers

LL(k) grammars (2)

Remarks:• A grammar is an LL(k) grammar if for a leftmost derivation with k

token look ahead the correct production for the next derivationstep can be found.

• A language Lk ⊆ Σ∗ is LL(k) if there exists an LL(k) grammar Γwith L(Γ) = Lk .

• The definition of LL(k) grammars provides no method to test if agrammar has the LL(k) property.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 47

Context-Free Syntax Analysis Implementation of Parsers

Non LL(k) grammars

Example 1: Grammar with left recursion Γ2:• S → E#

• E → E + T | T• T → T ∗ F | F• F → (E) | ID

Elimination of left recursion:Replace productions of form A→ Aα | β where β does not start with Aby A→ βA′ and A′ → αA′ | ε.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 48

Page 7: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Non LL(k) grammars (2)

Elimination of left recursion: From Γ2 we obtain Γ3.

Γ2:• S → E#

• E → E + T | T• T → T ∗ F | F• F → (E) | ID

Γ3

• S → E#

• E → TE ′

• E ′ → +TE ′ | ε• T → FT ′

• T ′ → ∗FT | ε• F → (E) | ID

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 49

Context-Free Syntax Analysis Implementation of Parsers

Non LL(k) grammars (3)

Example 2: Grammar Γ4 with unlimited look ahead

• STM → VAR := VAR | ID(IDLIST )

• VAR → ID | ID(IDLIST )

• IDLIST → ID | ID, IDLIST

Γ4 is not an LL(k) grammar for any k.(Proof: cf. Wilhelm, Maurer, Example 8.3.4, p. 319)

Transformation to LL(2) grammar Γ′4:• STM → ASS_CALL | ID := VAR• ASS_CALL→ ID(IDLIST )ASS_CALL_REST• ASS_CALL_REST →:= VAR | ε

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 50

Context-Free Syntax Analysis Implementation of Parsers

Non LL(k) grammars (4)

Remarks:

• The transformed grammars accept the same language, butgenerate other syntax trees:

I From a theoretical point of view, this is acceptable.I From a programming language implementation perspective, this is

in general not acceptable.

• There are languages L for which no LL(k) grammar Γ exists thatgenerates the language, i.e. L(Γ) = L. (Example: grammar Γ5)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 51

Context-Free Syntax Analysis Implementation of Parsers

Non LL(k) grammars (5)

Example 3:For the following grammar, there is no k such that Γ5 is an LL(k).

• S → A | B• A→ aAb | 0• B → aBbb | 1

Remark:For L(Γ5), there exists noLL(k) grammar.

Proof.Let k be arbitrary, but fixed.Choose two derivations according to the LL(k) definition and show that,despite of equal prefixes of length k, β and γ are not equal:

S ⇒∗lm S ⇒lm A⇒∗lm ak0bk

S ⇒∗lm S ⇒lm B ⇒∗lm ak1b2k

Then: prefix(k ,ak0bk ) = ak = prefix(k ,ak1b2k ), but β = A 6= B = γ.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 52

Context-Free Syntax Analysis Implementation of Parsers

FIRST and FOLLOW sets

DefinitionLet Γ = (N,T ,Π,S) be a CFG, k ∈ N;

T≤k = {u ∈ T ∗ | length(u) ≤ k}

denotes the set of all prefixes of length at least k .

We define:• FIRSTk : (N ∪ T )∗ → P(T≤k )

FIRSTk (α) = {prefix(k ,u) |α⇒∗ u}where prefix(n,u) = u for all u with length(u) ≤ n.

• FOLLOWk : (N ∪ T )∗ → P(T≤k ) ßFOLLOWk (α) = {w |S ⇒∗ βαγ ∧ w ∈ FIRSTk (γ)}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 53

Context-Free Syntax Analysis Implementation of Parsers

FIRST and FOLLOW sets in parse trees

X

S

FIRSTk(X) FOLLOWk(X)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 54

Context-Free Syntax Analysis Implementation of Parsers

Characterization of LL(1) grammars

Definition (reduced CFG)A CFG Γ = (N,T ,Π,S) is reduced if each nonterminal occurs in aderivation and each nonterminal derives at least one word.

LemmaA reduced CFG is LL(1) iff for any two productions A→ β and A→ γthe following holds:

(FIRST1(β)⊕1 FOLLOW1(A)) ∩ (FIRST1(γ)⊕1 FOLLOW1(A)) = ∅

where L1 ⊕1 L2 = {prefix(1, vw) | v ∈ L1,w ∈ L2}

Remark: FIRST and FOLLOW sets are computable, so this criterioncan be checked automatically.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 55

Context-Free Syntax Analysis Implementation of Parsers

Example: FIRSTk and FOLLOWk

Check that the modified expression grammar Γ3 is LL(1).• S → E#

• E → TE ′

• E ′ → +TE ′ | ε• T → FT ′

• T ′ → ∗FT | ε• F → (E) | ID

Compute FIRST1 and FOLLOW1 for each nonterminal.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 56

Page 8: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Example: FIRSTk and FOLLOWk (2)

• F → (E) | ID:FIRST1((E))⊕1 FOLLOW1(F ) ∩ FIRST1(ID)⊕1 FOLLOW1(F )= {(} ⊕1 FOLLOW1(F ) ∩ {ID} ⊕1 FOLLOW1(F )= ∅

• E ′ → +TE ′ | ε:FIRST1(+TE ′)⊕1 FOLLOW1(E ′) ∩ FIRST1(ε)⊕1 FOLLOW1(E ′)= {+} ⊕1 FOLLOW1(E ′) ∩ {ε} ⊕1 FOLLOW1(E ′)= {+} ∩ {#, )}= ∅

• T ′ → ∗FT | ε :FIRST1(∗FT ′)⊕1 FOLLOW1(T ′) ∩ FIRST1(ε)⊕ FOLLOW1(T ′)= {∗} ⊕1 FOLLOW1(T ′) ∩ {ε} ⊕1 FOLLOW1(T ′)= {∗} ∩ {+,#, )}= ∅

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 57

Context-Free Syntax Analysis Implementation of Parsers

Proof of LL characterization lemma

• Direction from left to right:Γ is LL(1) implies FIRST-FOLLOW disjointness.

Proof by contradiction:(“FIRST-FOLLOW intersection non empty” implies “not LL(1)” )Let A→ β and A→ γ be two distinct productions of Gamma(β 6= γ) such that the FIRST-FOLLOW intersection is non empty.

Case distinction. We consider three cases:

Case 1: β ⇒∗ ε and γ ⇒∗ εIn this case, the LL(1) property does not hold for A→ β, A→ γ.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 58

Context-Free Syntax Analysis Implementation of Parsers

Proof of LL characterization lemma (2)Case 2: β 6⇒∗ εThen, there is a z with length(z) = 1 and

z ∈ ((FIRST1(β)⊕1FOLLOW1(A))∩ (FIRST1(γ)⊕1FOLLOW1(A)))

Because Γ is reduced, there are two derivations:S ⇒∗ ψAα⇒ ψβα⇒∗ ψzxS ⇒∗ ψAα⇒ ψγα⇒∗ ψzy

and there is a u such that ψ ⇒∗ u, i.e., there are leftmostderivations

S ⇒∗lm uAα⇒lm uβα⇒∗lm uzxS ⇒∗lm uAα⇒lm uγα⇒∗lm uzy

But, prefix(1, zx) = z = prefix(1, zy) contradicts the LL(1)property of Γ.

Case 3: γ 6⇒∗ ε: similar to Case 2.c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 59

Context-Free Syntax Analysis Implementation of Parsers

Proof of LL characterization lemma (3)

• Direction from right to left:FIRST-FOLLOW disjointness implies Γ is LL(1):

Proof:Consider any two derivations with β 6= γ:

S ⇒∗lm uAα⇒lm uβα⇒∗lm uxS ⇒∗lm uAα⇒lm uγα⇒∗lm uy

that is, prefix(1, x) ∈ (FIRST1(β)⊕1 FOLLOW1(A)) andprefix(1, y) ∈ (FIRST1(γ)⊕1 FOLLOW1(A)). Because ofFIRST-FOLLOW disjointness, prefix(1, x) 6= prefix(1, y)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 60

Context-Free Syntax Analysis Implementation of Parsers

Parser generation for LL(k) languages

LL(k) Parser Generator

Grammar

Table for Push-Down Automaton/Parser Program

Error: Grammar is not LL(k)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 61

Context-Free Syntax Analysis Implementation of Parsers

Parser generation for LL(k) languages (2)

Remarks:• Use of push-down automata with look ahead• Select production from tables• Advantages over bottom-up techniques in error analysis and error

handling

Example system: ANTLR (http://www.antlr.org/)

Recommended reading for top-down analysis:• Wilhelm, Maurer: Chapter 8, Sections 8.3.1. to Sections 8.3.4, pp.

312 - 329

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 62

Context-Free Syntax Analysis Implementation of Parsers

2.2.2.2 Bottom-up syntax analysis

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 63

Context-Free Syntax Analysis Implementation of Parsers

Bottom-up syntax anaysis

Learning objectives:• General principles of bottom-up syntax analysis• LR(k) analysis• Resolving conflicts in parser generation• Connection between CFGs and push-down automata

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 64

Page 9: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Basic ideas: bottom-up syntax analysis

• Bottom-up analysis is more powerful than top-down analysis,since production is chosen at the end of the analysis while intop-down analysis the production is selected up front.

• LR: read input from left (L)and search for rightmost derivations (R)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 65

Context-Free Syntax Analysis Implementation of Parsers

Principles of LR parsing

1. Reduce from sentence to axiom according to productions of Γ

2. Reduction yields sentential forms αx with α ∈ (N ∪ T )∗ andx ∈ T ∗ where x is the input rest

3. α has to be a prefix of a right sentential form of Γ. Such prefixesare called viable prefixes. This prefix property has to holdinvariantly during LR parsing to avoid dead ends.

4. Reductions are always made at the leftmost possible position.

More precisely:

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 66

Context-Free Syntax Analysis Implementation of Parsers

Viable prefix

DefinitionLet S ⇒∗rm βAu ⇒rm βαu be a right sentential form of Γ.

Then α is called a handle or redex of the right sentential form βαu .

Each prefix of βα is a viable prefix of Γ.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 67

Context-Free Syntax Analysis Implementation of Parsers

Regularity of viable prefixes

TheoremThe language of viable prefixes of a grammar Γ is regular.

Proof.Cf. Wilhelm, Maurer Thm. 8.4.1 and Corrollary 8.4.2.1. (pp. 361, 362).Essential proof steps are illustrated in the following by the constructionof the LR-DFA(Γ).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 68

Context-Free Syntax Analysis Implementation of Parsers

Examples: towards LR parsing

• Consider Γ1I S → aCDI C → bI D → a | b

Analysis of aba can lead to a dead end (cf. lecture).

Considering viable prefixes can avoid this.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 69

Context-Free Syntax Analysis Implementation of Parsers

Examples: towards LR parsing (2)

• Consider Γ2I S → E#I E → a | (E) | EE

Analysis of ((a))(a)# (cf. lecture)

Stack can manage prefixes already read.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 70

Context-Free Syntax Analysis Implementation of Parsers

Examples: towards LR parsing (3)

• Consider Γ3I S → E#I E → E + T | TI T → ID

Analysis of ID + ID + ID # (cf. lecture)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 71

Context-Free Syntax Analysis Implementation of Parsers

LR parsing: shift and reduce actions

Schematic syntax tree for input xay withα ∈ (N ∪ T )∗, a ∈ T , x , y ∈ T ∗ and start symbol S:

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

Read Pointer

Read Pointer

Read Pointerc© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 72

Page 10: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

LR parsing: shift and reduce actions (2)

Shift step:

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

Read Pointer

Read Pointer

Read Pointer

Reduce step:

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

80© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

x a y

!a

Lesezeiger

Schematischer Syntaxbaum zur Eingabe xay mit

a in T, x,y in T* und Startsymbol S:

x a y

! = "#

Lesezeiger

x a y

!

Lesezeiger

Schiebe Schritt (shift): Reduktionsschritt (reduce):

"$=>

Read Pointer

Read Pointer

Read Pointer

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 73

Context-Free Syntax Analysis Implementation of Parsers

LR parsing: shift and reduce actions (3)

Problems:• Make sure that all reductions guarantee that the resulting prefix

remains a viable prefix.• When to shift? When to reduce? Which production to use?

Solution:For each grammar Γ construct LR-DFA(Γ) automaton (also calledLR(0) automaton), that describes the viable prefixes.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 74

Context-Free Syntax Analysis Implementation of Parsers

Construction of LR-DFA

Let Γ = (T ,N,Π,S) be a CFG.• For each nonterminal A ∈ N, construct item automaton• Build union of item automata: Start state is the start state of item

automaton for S, final states are final states of item automata• Add ε transitions from each state which contains the dot in front of

a nonterminal A to the starting state of the item automaton of A

TheoremThe automatonobtained from LR-DFA(Γ) by declaring all states to be final statesexactly accepts the language of viable prefixes of Γ.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 75

Context-Free Syntax Analysis Implementation of Parsers

Example: Construction of LR-DFA

Γ3: S → E#, E → E + T | T , T → ID

82© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

!5 : S E # , E E + T | T , T ID

Beispiel: (Konstruktion eines LR-DEA)

Konstruktion des LR-DEA für

[S .E #] [S E.# ] [S E#.]

[E .E+T]

[E .T ]

[T .ID ]

[E E+.T] [E E+T.]

[E T.]

[T ID.]

E #

E + T

ID

[E E.+T]

T

"

" "

"

Deterministisch machen liefert folgenden Automaten:c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 76

Context-Free Syntax Analysis Implementation of Parsers

Example: Construction of LR-DFA (2)

Power set construction:

83© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

[S .E #]

[S E.# ][S E#.]

[E .E+T]

[E .T ]

[T .ID ]

[E E+.T]

[E E+T.][E T.] [T ID.]

E #

+

T

IDFehlerT

[E E.+T]

bezeichnet Fehlerkanten

q0

q1 q2

q3

q4q5

q6

Die zuverlässigen Präfixe maximaler Länge:

E# , T , ID , E+ID , E+T

[T .ID ]

ID

Bemerkungen:

• Im Beispiel enthält jeder Endzustand genau eine

vollständig gelesene Produktion. Dies ist im Allg.

nicht so.

• Enthält ein Endzustand mehrere vollständig gelesene

Produktionen spricht man von einem reduce/reduce-

Konflikt.

• Enthält ein Endzustand eine vollständig gelesene

und eine unvollständig gelesene Produktion

mit einem Terminal nach dem Positionspunkt,

spricht man von einem shift/reduce-Konflikt.

q7

Error

Error Transitions

Viable prefixes of maximal length: E#, T , ID, E + ID, E + T

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 77

Context-Free Syntax Analysis Implementation of Parsers

Example: Construction of LR-DFA (3)

Remarks:• In the example, each final state contains one completely read

production, this is in general not the case.• If a final state contains more than one completely read

productions, we have a reduce/reduce conflict.• If a final state contains a completely read and an uncompletely

read production with a terminal after the dot, we have ashift/reduce conflict.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 78

Context-Free Syntax Analysis Implementation of Parsers

Analysis with LR-DFA

Analysis of ID + ID + ID # with LR-DFA(the viable prefix is underlined)

84© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Analyse von ID + ID + ID # mit dem LR-DEA,

unterstrichen ist jeweils der zuverlässige Präfix:

ID + ID + ID # <=

T + ID + ID # <=

E + ID + ID # <=

E + T + ID # <=

E + ID # <=

E + T # <=

E # <=

S

Beispiel: (Analyse mit LR-DEA)

Beachte:

• Die Satzformen bestehen immer aus einem

zuverlässigen Präfix und der Resteingabe.

• Verwendet man nur den LR-DEA

zur Analyse muss man nach jeder Reduktion

die Satzform von Anfang an lesen.

deshalb: verwende Kellerautomaten zur Analyse

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 79

Context-Free Syntax Analysis Implementation of Parsers

Analysis with LR-DFA (2)

Note:• The sentential forms always consist of a viable prefix and an input

rest.

• If an LR-DFA is used, after each reduction the sentential form hasto be read from the beginning.

Thus: Use pushdown automaton for analysis.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 80

Page 11: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

LR pushdown automaton

DefinitionLet Γ = (N,T ,Π,S) be a CFG. The LR-DFA pushdown automaton for Γcontains:• a finite set of states Q (the states of the LR-DFA(Γ))• a set of actions Act = {shift ,accept ,error} ∪ red(Π), where

red(Π) contains an action reduce(A→ α) for each A→ α .• an action table at : Q → Act .• a successor table succ : P × (N ∪ T )→ Q with

P = {q ∈ Q | at(q) = shift}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 81

Context-Free Syntax Analysis Implementation of Parsers

LR pushdown automaton (2)

Remarks:• The LR-DFA pushdown automaton is a variant of pushdown

automata particularly designed for LR parsing.• States encode the read left context.• If there are no conflicts, the action table can be directly

constructed from the LR-DFA:I accept: final state of item automaton of start symbolI reduce: all other final statesI error: error stateI shift: all other states

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 82

Context-Free Syntax Analysis Implementation of Parsers

Execution of Pushdown Automaton

• Configuration: Q∗ × T ∗ where variable stack denotes thesequence of states and variable inr denotes the input rest

• Start configuration: (q0, input), where q0 is the start state of theLR-DFA

• Interpretation Procedure:

(stack, inr) := (q0,input);do {

step(stack,inr);} while( at( top(stack) ) != accept

&& at( top(stack) ) != error );if( at( top(stack) ) == error ) return error;

with

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 83

Context-Free Syntax Analysis Implementation of Parsers

Execution of Push-Down Automaton (2)

void step ( var StateSeq stack, var TokenSeq inr) {State tk: = top(stack);switch( at(tk) ) {case shift:

stack := push ( succ(tk,top(inr)), stack );inr := tail(inr);break;

case reduce A -> a:stack := mpop( length(a), stack );stack := push( succ(top(stack),A), stack);break;

}}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 84

Context-Free Syntax Analysis Implementation of Parsers

Example: LR push down automaton

LR-DFA with states q0, . . . ,q7 for grammar Γ3

Action table

87© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Beispiel: (LR-Kellerautomat zu !5 )

Aktionstabelle:

q0 schieben

q1 schieben

q2 akzeptieren

q3 schieben

q4 reduzieren E E+T

q5 reduzieren E T

q6 reduzieren T ID

q7 fehler

Nachfolgertabelle:

ID + # E T

q0 q6 q7 q7 q1 q5

q1 q7 q3 q2 q7 q7

q2

q3 q6 q7 q7 q7 q4

q4

q5

q6

q7

LR-DEA mit Zuständen q0 – q7 (siehe Beipiel oben)

Rechnung zu Eingabe ID + ID + ID # :

Keller Eingaberest Aktion

q0

q0 q6

q0 q5

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q2

ID + ID + ID # schieben

+ ID + ID # reduzieren T ID

+ ID + ID # reduzieren E T

+ ID + ID # schieben

ID + ID # schieben

+ ID # reduzieren T ID

+ ID # reduzieren E E+T

+ ID # schieben

ID # schieben

# reduzieren T ID

# reduzieren E E+T

# schieben

akzeptieren

shift

accept

error

reduce

shift

shift

reduce

reduce

Successor table

87© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Beispiel: (LR-Kellerautomat zu !5 )

Aktionstabelle:

q0 schieben

q1 schieben

q2 akzeptieren

q3 schieben

q4 reduzieren E E+T

q5 reduzieren E T

q6 reduzieren T ID

q7 fehler

Nachfolgertabelle:

ID + # E T

q0 q6 q7 q7 q1 q5

q1 q7 q3 q2 q7 q7

q2

q3 q6 q7 q7 q7 q4

q4

q5

q6

q7

LR-DEA mit Zuständen q0 – q7 (siehe Beipiel oben)

Rechnung zu Eingabe ID + ID + ID # :

Keller Eingaberest Aktion

q0

q0 q6

q0 q5

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q2

ID + ID + ID # schieben

+ ID + ID # reduzieren T ID

+ ID + ID # reduzieren E T

+ ID + ID # schieben

ID + ID # schieben

+ ID # reduzieren T ID

+ ID # reduzieren E E+T

+ ID # schieben

ID # schieben

# reduzieren T ID

# reduzieren E E+T

# schieben

akzeptieren

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 85

Context-Free Syntax Analysis Implementation of Parsers

Example: LR push down automaton (2)Computation for input ID + ID + ID #

87© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Beispiel: (LR-Kellerautomat zu !5 )

Aktionstabelle:

q0 schieben

q1 schieben

q2 akzeptieren

q3 schieben

q4 reduzieren E E+T

q5 reduzieren E T

q6 reduzieren T ID

q7 fehler

Nachfolgertabelle:

ID + # E T

q0 q6 q7 q7 q1 q5

q1 q7 q3 q2 q7 q7

q2

q3 q6 q7 q7 q7 q4

q4

q5

q6

q7

LR-DEA mit Zuständen q0 – q7 (siehe Beipiel oben)

Rechnung zu Eingabe ID + ID + ID # :

Keller Eingaberest Aktion

q0

q0 q6

q0 q5

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q3

q0 q1 q3 q6

q0 q1 q3 q4

q0 q1

q0 q1 q2

ID + ID + ID # schieben

+ ID + ID # reduzieren T ID

+ ID + ID # reduzieren E T

+ ID + ID # schieben

ID + ID # schieben

+ ID # reduzieren T ID

+ ID # reduzieren E E+T

+ ID # schieben

ID # schieben

# reduzieren T ID

# reduzieren E E+T

# schieben

akzeptieren

Stack Input Rest Action

shiftshift

shift

shiftshift

shiftaccept

reduce

reduce

reducereduce

reducereduce

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 86

Context-Free Syntax Analysis Implementation of Parsers

LR-DFA construction

Questions:• Does LR-DFA construction work for all unambiguous grammars?• For which grammars does the construction work?• How can the construction be generalized / made more

expressive?

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 87

Context-Free Syntax Analysis Implementation of Parsers

Example: LR-DFA with conflicts

LR-DFA for Γ6: S → E#, E → T + E | T , T → ID | N(), N → ID

88© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Fragen:

• Funktioniert die obige Konstruktion für alle

eindeutigen Grammatiken?

• Für welche Grammatiken funktioniert sie?

• Wie kann man sie verallgemeinern/mächtiger machen?

Beispiel:

LR-DEA für !6 :

S E # , E T+E | T , T ID | N( ) , N ID

[S .E #]

[S E.# ] [S E#.]

[E .T+E]

[E .T ]

[T .ID ]

[ T N(.) ]

E

#

+

T

ID

Fehler

T

bezeichnet Fehlerkanten

q0

q1 q2 q3

q4

q5

q6

ID[T .N( ) ]

[N .ID ]

[E T.+E]

[E T. ][E T+.E]

[E .T ]

[E .T+E]

[T .N( ) ]

[N .ID ]

[T .ID ]

[E T+E.]

E

[T ID.]

q10

[N ID.]

N

[ T N.( ) ] [ T N( ). ]

N

( )q8q7 q9

error

Error Transitions

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 88

Page 12: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

LR parsing conflicts

2 kinds of conflicts:• shift/reduce conflicts (q4 in example)• reduce/reduce conflicts (q6 in example)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 89

Context-Free Syntax Analysis Implementation of Parsers

LR parsing theory

DefinitionLet Γ = (N,T ,Π,S) be a CFG and k ∈ N.Γ is an LR(k) grammar if for any two rightmost derivations

S ⇒∗rm αAu ⇒rm αβuS ⇒∗rm γBv ⇒rm αβw

it holds that:If prefix(k ,u) = prefix(k ,w) then α = γ, A = B and v = w

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 90

Context-Free Syntax Analysis Implementation of Parsers

LR parsing theory (2)

Remarks:

• While for LL grammars the selection of the production depends onthe nonterminal to be derived, for LR grammars it depends on thecomplete left context.

• For LL grammars, the look ahead considers the language to begenerated from the nonterminal. For LR grammars, the lookahead considers the language generated from not yet readnonterminals.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 91

Context-Free Syntax Analysis Implementation of Parsers

Characterization of LR(0)

TheoremLet Γ be a reduced CFG;Γ is LR(0) if-and-only-if LR-DFA(Γ) contains no conflicts.

We first present some properties of LR-DFA construction that areneeded for the proof.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 92

Context-Free Syntax Analysis Implementation of Parsers

Properties of LR-DFA Construction

Lemma 1

1. Each path leading to a state q with item [A→ α.β] ends with α.2. If q is a state with items [A→ α.δ] and [B → β.γ],

then α is a postfix of β or β a postfix of α.3. If there is a transition marked with H from state p to state q,

then p contains an item of form [C → γ.Hδ].

Proof.The properties are• obvious for LR-NFA (by construction) and• remain valid for LR-DFA.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 93

Context-Free Syntax Analysis Implementation of Parsers

Properties of LR-DFA construction (2)

Lemma 2

For any A, ψ, α, β:Item [A→ α.β] is contained in a state that is reached by ψαiff there exists a rightmost derivation

S ⇒∗rm ψAu ⇒rm ψαβu

Proof.not worked out (cf. Wilhelm, Maurer, Thm. 8.4.3, p. 363.)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 94

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization

• Left-to-Right-Direction:LR(0) property implies that LR-DFA has no conflicts.

Let q be a state of LR-DFA(Γ) with two items [A→ α.] and[B → β.γ].We show that these items do not cause a conflict.

By Lemma 1(b), there are µ, ν with µα = νβ and with µ = ε orν = ε.Let ψ be a path leading to q, then according to Lemma 1(a),there exists ϕ with ψ = ϕµα = ϕνβ.By Lemma 2, there are the following rightmost derivations

S ⇒∗rm ϕµAu ⇒rm ϕµαu

S ⇒∗rm ϕνBv ⇒rm ϕνβγv

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 95

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization (2)

Case 1: Suppose, the items [A→ α.] and [B → β.γ] are differentand cause a reduce/reduce-conflict, i.e. γ = ε.

We show that then it has to holds that A = B and α = β,i.e. both items are identical which is a contradiction to theassumption.

Since γ = ε and ϕµα = ϕνβ, it holds that ϕµα = ϕνβγ.By the LR(0) property, it holds ϕµ = ϕν and A = B (and v = v ).From ϕν = ϕµ and ϕµα = ϕνβ, it follows that α = β,i.e. both items are identical.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 96

Page 13: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization (3)

Case 2: Suppose the items [A→ α.] and [B → β.γ] cause ashift/reduce-conflict, i.e. γ 6= ε and γ starts with a terminal symbol.

We consider the cases γ ∈ T+ and γ = cDρ, where in the firstcase γ contains no non-terminal, while in the second it contains atleast one non-terminal.

If γ ∈ T+, for w = γ, we obtain the derivation

S ⇒∗rm ϕνBv ⇒rm ϕνβwv = ϕµαwv

The LR(0) property yields ϕν = ϕµ, A = B and v = wv .Thus, it has to hold that w = ε which contradicts the assumptionγ 6= ε.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 97

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization (4)

If γ = cDρ, we can extend the above rightmost derivation

S ⇒∗rm ϕνBv ⇒rm ϕνβγv = ϕµαcDρv⇒∗rm ϕµαcxEyv ⇒rm ϕµαcxwyv

Then we have the rightmost derivations

S ⇒∗rm ϕµAu ⇒rm ϕµαu

S ⇒∗rm ϕµαcxEyv ⇒rm ϕµαcxwyv

The LR(0) property yields in particular that ϕµαcx = ϕµ.Thus, it has to hold that αcx = ε which is not possible; thus, theassumption yieds to a contradiction.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 98

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization (5)

• Right-to-Left Direction: Conflict-freedom implies LR(0) property.

According to the LR(0) definition, we consider two rightmostderivations

S ⇒∗rm ψAu ⇒rm ψαu

S ⇒∗rm ϕBx ⇒rm ψαy

In derivation 2, we assume a production B → δsuch that ϕδx = ψαy .By Lemma 2, [A→ α.] belongs to a state reached by ψαand [B → δ.] to a state reached by ϕδ.Since ϕδx = ψαy , it holds ϕδ is prefix of ψα or vice versa.

Case distinction on the relationship between ϕδ und ψα:

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 99

Context-Free Syntax Analysis Implementation of Parsers

Proof of LR(0) characterization (6)

1. ϕδ = ψα:

[A→ α.] and [B → δ.] belong to the same state.Since the LR-DFA has no conflicts, it holds that α = δ and A = B,thus also ϕ = ψ and x = y . This yields the LR(0) property.

2. ϕδ is a proper prefix of ψα:

Since ϕδx = ψαy , there exists c ∈ T und z ∈ T ∗

such that x = czy and hence ϕδcz = ψα.By Lemma 1(c), the state reached by ϕδ has to contain a transitionmarked with c and an item [C → µ.cν].Furthermore, by Lemma 2, the state reached by ϕδ has to contain[B → δ.] But this would cause a conflict with [B → δ.] whichcontradicts the conflict-freeness of LR-DFA such that this casecannot occur.

3. ψα is a proper prefix of ϕδ: analog to case (b).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 100

Context-Free Syntax Analysis Implementation of Parsers

Example: Application of LR(0) chracterization

Show (using the above theorem) that Γ5 is LR(0).Γ5:

• S → A | B• A→ aAb | 0• B → aBbb | 1

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 101

Context-Free Syntax Analysis Implementation of Parsers

Expressiveness of LR(k)

• For each context-free language L with the prefix property(i.e. ∀v ,w ∈ L: v is no prefix of w), there exists an LR(0) grammar.

• Grammar Γ5 is not LL(k), but LR(0).• Methods for LR(1) can be generalized to LR(k), SLR(k) and

LALR(k).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 102

Context-Free Syntax Analysis Implementation of Parsers

Resolving conflicts by look ahead

• Compute look ahead sets from (N ∪ T )≤k for items. The lookahead set of an item approximates the set of prefixes of length kwith which the input rest at this item can start.

• If the look ahead sets at an item are disjoint, then the action to beexecuted (shift, reduce) can be determined by k symbols lookahead.

• For an item, select the action whose look ahead set contains theprefix of the input rest. Action table has to be extended.

• For computation of look ahead sets, there are different methods.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 103

Context-Free Syntax Analysis Implementation of Parsers

Common methods for look ahead computation

• SLR(k) uses LR-DFA and FOLLOWk of conflicting items for lookahead

• LALR(k) - look ahead LR - uses LR-DFA with state-dependentlook ahead sets

• LR(k) integrates computation of look ahead sets in automataconstruction (LR(k) automaton)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 104

Page 14: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

SLR grammars

Definition (SLR(1) grammar)Let Γ = (N,T ,Π,S) be a CFG and LA([A→ α.]) = FOLLOW1(A).

A state LR-DFA(Γ) has an SLR(1) conflict if there exists• two different reduce items with LA([A→ α.]) ∩ LA([B → β.]) 6= ∅ or• two items [A→ α.] and [B → α.aβ] with a ∈ LA([A→ a]).

Γ is SLR(1) if there is no SLR(1) conflict.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 105

Context-Free Syntax Analysis Implementation of Parsers

SLR grammars (2)

Example: Γ6 is an SLR(1) grammar• S → E#

• E → T + E | T• T → ID | N()

• N → ID

Consider the conflicts between [E → T .] and [E → T .+ E ] andbetween [T → ID.] and [N → ID.]

FOLLOW1(E) ∩ {+} = {#} ∩ {+} = ∅FOLLOW1(T ) ∩ FOLLOW1(N) = {#,+} ∩ {(} = ∅

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 106

Context-Free Syntax Analysis Implementation of Parsers

SLR grammars (3)

Example: Γ7 (simplifed C expressions) is not an SLR(1) grammar

• S → E#

• E → L = R | R• L→ ∗R | ID• R → L

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 107

Context-Free Syntax Analysis Implementation of Parsers

SLR grammars (4)LR-DFA for Γ7

93© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Beispiel: (nicht SLR(1)-Sprache)

Betrachte folgende Grammatik für vereinfachte

C-Ausdrücke:

!7 : S E # , E L = R | R , L *R | ID , R L

Der zugehörige LR-DEA:

[S .E# ]

[S E.# ] [S E#. ]

[E .L=R]

[E .R]

[E L .=R]

[E L= .R]

[E L=R.]

[E R.]

[R .L]

[R L .]

[L .*R]

[L .ID]

[L * .R][L *R.]

[L ID.]

[R .L]

[L .*R]

[L .ID]

[R .L]

[L .*R]

[L .ID]

[R L .]

ER

L

*

=

ID

#

R

* LID

ID

R

L

*

Der einzige Zustand mit einem Konflikt enthält die

Items [E L .=R] und [R L .] mit

FOLLOW1(R) { = } = { =, # } { = } = { =} = { }

U U

/

Only conflict in items [E → L. = R] and [R → L.] with

FOLLOW1(R) ∩ {=} = {=,#} ∩ {=} 6= ∅

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 108

Context-Free Syntax Analysis Implementation of Parsers

Construction of LR(1) automata

LR(1) automaton contains items [A→ α.β,V ] with V ⊆ T where

• α is on top of the stack

• the input rest is derivable from βc with c ∈ V , i.e.V ⊆ FOLLOW1(A).

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 109

Context-Free Syntax Analysis Implementation of Parsers

Construction of LR(1) automata (2)

LR(1) automaton for Γ7. Conflict is resolved, as {=} ∩ {#} = ∅.

94© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Konstruktion von LR(1)-Automaten:

Items der Form [ A !.", V ] mit V T

und der Bedeutung, dass ! auf dem Keller liegt und

der Anfang des Eingaberests aus "c ableitbar ist mitc in V. D.h. V FOLLOW1(A) .

U

U

S .E#

S E.#

S E#.

E .L=R #

E .R #

E L .=R # E L= .R #

E L=R. #

E R. #

R .L #

R L . #

L .*R #,=

L .ID #,=

L * .R #,=

L *R. #,=

L ID. #

R .L #

L .*R #

L .ID #

R .L #,=

L .*R #,=

L .ID #,=

R L . #

E R

L

*

=

ID

#

R

*

L

ID

ID

R

L

*

L * .R #

R .L #

L .*R #

L .ID #

L *R. #

*L ID. #,=

R L . #,=

R

L

Konflikt kann behoben werden, da {=} {#} = {}U

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 110

Context-Free Syntax Analysis Implementation of Parsers

LALR(1) automata

Informal description of LALR(1) automata:• LALR(1) automata can be constructed from LR(1) automata by

merging states in which items only differ in look ahead sets.• In the merged state, the items has the union of the look ahead

sets as look ahead set.

Remarks:

1. The LALR(1) automaton has the same states as the LR-DFA.

2. The LALR(1) automata can be constructed more efficiently.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 111

Context-Free Syntax Analysis Implementation of Parsers

Example: LALR(1) automaton

LALR(1) automaton for Γ7.

95© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

LALR(1)-Automaten:

Aus dem LR(1)-Automaten erhält man den LALR(1)-

Automaten durch Zusammenlegen der Zustände, in

denen sich die Items nur in der Vorausschaumenge

unterscheiden. Die Vorausschaumengen zu gleichen

Items werden dabei vereinigt. Der resultierende Automat

hat die gleichen Zustände wie der LR-DEA..

S .E#

S E.#

S E#.

E .L=R #

E .R #

E L .=R # E L= .R #

E L=R. #

E R. #

R .L #

R L . #

L .*R #,=

L .ID #,=

L * .R #,=

L *R. #,=

R .L #

L .*R #

L .ID #

R .L #,=

L .*R #,=

L .ID #,=

E R

L

*

=

ID

#

*

ID

ID

R

L

*

L ID. #,=

R L . #,=

R

L

Der LALR(1)-Automat lässt sich allerdings effizienter

direkt konstruieren.

q0

q1

q2

q3

q4

q5

q6

q7

q8

q9

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 112

Page 15: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Implementation of Parsers

Grammar classes

96© A. Poetzsch-Heffter, TU Kaiserslautern26.04.2007

Zusammenhang der Grammatikklassen:

Lesen Sie zu Unterabschnitt 2.2.2.2:

Wilhelm, Maurer:

• aus Kap. 8, Abschnitt 8.4.1 bis einschl. 8.4.5,

S. 353 – 383.

mehrdeutige Grammatiken

eindeutige Grammatiken

LR(k)

LR(1)

LALR(1)

SLR(1)

LR(0)

LL(k)

LL(1)

LL(0)

unambiguous grammars

ambiguous grammars

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 113

Context-Free Syntax Analysis Implementation of Parsers

Literature

Recommended reading for bottom-up analysis:• Wilhelm, Maurer: Chapter 8, Sections 8.4.1 - 8.4.5, pp. 353 - 383

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 114

Context-Free Syntax Analysis Implementation of Parsers

Parser generators

Learning objectives

• Usage of parser generators• Characteristics of parser generators

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 115

Context-Free Syntax Analysis Implementation of Parsers

JavaCUP parser generator

• CUP - Constructor of useful parsershttp://www2.cs.tum.edu/projects/cup/

• Java-based generator for LALR-parsers

• JFlex can be used to generate cooperating scanners.

• Running JavaCUP:

java -jar java-cup-11a.jar options inputfile

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 116

Context-Free Syntax Analysis Implementation of Parsers

Structure of JavaCUP specification

package JavaPackageName;import java_cup.runtime.*;

/* User supplied code for scanner, actions, ... */

/* Terminals (tokens returned by the scanner). */terminal TerminalDecls;

/* Nonterminals */non terminal NonTerminalDecls;

/* Precedences */precedence [left | right | nonassoc ] TerminalList;

/* Grammar */start with nonterminalName;

non_terminalName :: = prod_1 | ... | prod_n ;c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 117

Context-Free Syntax Analysis Implementation of Parsers

Example: JavaCUP specification for Γ7

import java_cup.runtime.*;

/* Terminals (tokens returned by the scanner). */terminal ID, EQ, MULT;

/* Non terminals */non terminal S, E, L, R;

/* The grammar */

start with S;

S ::= E;E ::= L EQ R | R;L ::= MULT R | ID;R ::= L;

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 118

Context-Free Syntax Analysis Implementation of Parsers

Structure of generated parser code

• Output files parser.java and sym.java

• Tables for LALR automatonI Production table: provides the symbol number of the left hand side

nonterminal, along with the length of the right hand side, for eachproduction in the grammar,

I Action table: indicates what action (shift, reduce, or error) is to betaken on each lookahead symbol when encountered in each state

I Reduce-goto table: indicates which state to shift to after reduction

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 119

Context-Free Syntax Analysis Implementation of Parsers

Usage of generated parser

• Parser calls scanner with scan() method when a new terminaltoken is needed

• Initialising parser with new scanner

parser parser_obj = new parser(new my_scanner());

• Usage of parser:

Token parse_tree = parser_obj.parse();

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 120

Page 16: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Error Handling

2.2.3 Error Handling

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 121

Context-Free Syntax Analysis Error Handling

Error Handling

Learning objectives:

• Problems and principles of error handling• Techniques of error handling for context-free analysis

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 122

Context-Free Syntax Analysis Error Handling

Principles of error handling

Error handling is required in all analysis phases and at runtime. Onedistinguishes• lexical errors• parse errors (in context-free analysis)• errors in name and type analysis• runtime errors (cannot be avoided in most cases)• logical errors (behavioural errors)

Remarks:

1. The first 2 (3) kinds of errors are syntactic errors. In the following,we only consider error handling in context-free analysis.

2. The specification of what an error is, is defined by the languagespecification.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 123

Context-Free Syntax Analysis Error Handling

Requirements for error handling

• Errors should be localized as precisely as possible.Problem: Errors are often not detected at the position that causedthe error.

• As many errors at possible should be detected at once/in onephase/in one run. Problem: Avoid to report consecutive errors

• Errors are not always unique, i.e., it is not clear in general how tocorrect an error: class int { Int a; .... } or int a = 1-;

• Error handling should not slow down analysis of correct programs.

Therefore, error handling is nontrivial and depends on the sourcelanguage to be analysed.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 124

Context-Free Syntax Analysis Error Handling

Error handling in context-free analysis

1. Panic error handlingMark synchronizing terminal symbols, e.g. “end” or “;”

If parser reaches error state, all symbols up to next synchronizingsymbol are skipped and the stack is corrected as if the productionwith the synchronizing symbol was read correctly.

I Pros: easy to implement, termination guaranteedI Cons: large parts of the program can be skipped or misinterpretedI Example: Incorrect input a : = b *** c;

Read until “;” correct stack and continue as if statement has beenaccepted

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 125

Context-Free Syntax Analysis Error Handling

Error handling in context-free analysis (2)

2. Error productionsExtend grammar with productions describing typical errorsituations, so called error productions.Error messages can be directly associated with error productions.

I Pros: easy to implement, termination guaranteedI Cons: extended grammar can belong to more general grammar

class, knowledge of typical error situations is necessaryI Example: Typical error in PASCAL

if ... then A := E; else ...Error Production:Stmt → if Expr then Stmt∗ ; else Stmt∗

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 126

Context-Free Syntax Analysis Error Handling

Error handling in context-free analysis (3)

3. Production-local error correctionGoal is local correction of input such that analysis can beresumed.Local means that it is tried to correct the input for the currentproduction.

I Pros: flexible and powerful techniqueI Cons: problematic if errors occur earlier than they can be detected,

operations for corrections can lead to nonterminating analysis

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 127

Context-Free Syntax Analysis Error Handling

Error handling in context-free analysis (4)

4. Global error correctionAttempt to get a correction that is as good as possible by alteringthe read input or the look ahead input.

Idea: Define distance or quality measure on inputs. For eachincorrect input, look for a syntactically correct input that is bestaccording to the used measure.

I Pros: very powerful techniqueI Cons: analysis effort can be rather high, implementation is complex

and poses risk of nontermination.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 128

Page 17: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Error Handling

Error handling in context-free analysis (5)

5. Interactive error correctionIn modern programming languages, syntactic analysis is oftenalready supported by editors. In this case, editor marks errorpositions.

I Pros: quick feedback, possible error positions are shown directly,interaction with programmer possible

I Cons: editing can be disturbed, analysis must be able to handleincomplete programs

The presented techniques can be combined.Selection criteria depend on programming language syntax.Error handling also depends on grammar class and implementationtechniques used for parser.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 129

Context-Free Syntax Analysis Error Handling

Burke-Fischer error handling

Example of global error correction technique

• Procedure: Use correction window of n symbols before symbol atwhich error was detected. Check all possible variations of symbolsequence in correction window that can be obtained by insertion,exchange or modification of a symbol at any position.

• Quality measure: Choose variation that allows longestcontinuation of parsing procedure

• Implementation: Work with two stack automata, one representsthe configuration at the beginning of the correction window, theother one the configuration at the end of the correction window.In an error case, the automaton running behind can be used toresume at the old position and to test the computed variations.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 130

Context-Free Syntax Analysis Error Handling

Literature

Recommended reading: Wilhelm, Maurer: Chapter 8,Sections 8.3.6 and 8.4.6 (general understanding sufficient)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 131

Context-Free Syntax Analysis Concrete and Abstract Syntax

2.2.4 Concrete and Abstract Syntax

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 132

Context-Free Syntax Analysis Concrete and Abstract Syntax

Concrete and Abstract Syntax

Learning objectives

• Connection of parsing to other phases of language processingand translation

• Differences between abstract and concrete syntax• Language concepts for describing syntax trees• Syntax tree construction

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 133

Context-Free Syntax Analysis Concrete and Abstract Syntax

Connection of parsing to other phases

1. Parsing directly controls subsequent phases2. Concrete syntax tree as interface3. Abstract syntax tree as interface

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 134

Context-Free Syntax Analysis Concrete and Abstract Syntax

Direct control by parser

• Parser calls other actions after each derivation/reduction step(Example: recursive descent parsing):

• Pros:I simple (if realizable)I flexibleI efficient (especially memory efficient)

• Cons:I non-modular, no clear interfacesI not suitable for global aspects of translationI subsequent phases depend on parsingI cannot be used with every parser generator

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 135

Context-Free Syntax Analysis Concrete and Abstract Syntax

Abstract syntax vs. concrete Syntax

Let PL be some (programming) language with CFG Γ andp ∈ PL be a program.

Definition (Concrete syntax)The concrete syntax of PL determines the actual text representation ofprograms (incl. key words, separators, etc.).The syntax tree of p according to Γ is the concrete syntax tree of p.

Definition (Abstract syntax)The abstract syntax of PL describes the tree structure of programs in aform that is sufficient and suitable for further processing.A tree for representing a program p according to the abstract syntax iscalled the abstract syntax tree of p.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 136

Page 18: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Concrete and Abstract Syntax

Abstract syntax

• abstraction from keywords and separators• operator precedences are represented in tree structure (different

nonterminals are not necessary)• better incorporation of symbol information• simplifying transformations from concrete to abstract syntax might

be applied

Remarks:• The abstract syntax of a language is often not specified in the

language report.• The abstract syntax usually also comprises information about

source code positions.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 137

Context-Free Syntax Analysis Concrete and Abstract Syntax

Example: Concrete vs. abstract syntax

Concrete syntax: Γ2

• S → E#

• E → T + E | T• T → F ∗ T | F• F → (E) | ID

Abstract syntax• Exp = Add | Mult Ident• Add (Exp left, Exp right)• Mult (Exp left, Exp right)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 138

Context-Free Syntax Analysis Concrete and Abstract Syntax

Example: Concrete vs. abstract syntax (2)Text: (a + b) ∗ c

Concrete syntax tree

Textrepräsentation: ( a + b ) * c

Konkreter Syntaxbaum: Abstrakter Syntaxbaum:

S Mult

T

E #

Mult

Add c

F

F

T

a b

E

E T

FF

T

( ID ID ) * ID( ID + ID ) * IDa b c

113© A. Poetzsch-Heffter, TU Kaiserslautern07.05.2007

Abstract syntax tree

Textrepräsentation: ( a + b ) * c

Konkreter Syntaxbaum: Abstrakter Syntaxbaum:

S Mult

T

E #

Mult

Add c

F

F

T

a b

E

E T

FF

T

( ID ID ) * ID( ID + ID ) * IDa b c

113© A. Poetzsch-Heffter, TU Kaiserslautern07.05.2007

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 139

Context-Free Syntax Analysis Concrete and Abstract Syntax

Concrete syntax tree as interface

Token Stream

Parser (with Tree Construction)

Concrete Syntax Tree

Further LanguageProcessing

• Resolves disadvantages of direct control by parser• Advantages over abstract syntax

I No additional specification of abstract syntax requiredI Tree construction does not have to be described.I Tree construction can be done automatically by parser generators.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 140

Context-Free Syntax Analysis Concrete and Abstract Syntax

Abstract syntax tree as interface

Token Stream

Parser (with Transforming Tree Construction)

Abstract Syntax Tree

Further LanguageProcessing

• Advantages over concrete syntaxI Simpler, more compact tree representationI Simplifies later phasesI Often implemented by programming or specification language as

mutable data structure

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 141

Context-Free Syntax Analysis Concrete and Abstract Syntax

Abstract syntax: Specification and tree construction

• For representing abstract syntax trees, we use order-sorted terms.• The sets and types of these terms are described by type

declarations.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 142

Context-Free Syntax Analysis Concrete and Abstract Syntax

Order-sorted data types

DefinitionOrder-sorted data types are specified by declarations of the followingform:• Variant type declarations V = V0 | V1 | . . . | Vm

• Tuple type declaration T (T1sel1, . . . ,Tnseln)

• List type declarations L ∗ S

Example:• Exp = Add | Mult | Ident• Add (Exp left, Exp right)

Mult (Exp left, Exp right)• ExpList * Exp

where Ident is a predefined type.

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 143

Context-Free Syntax Analysis Concrete and Abstract Syntax

Order-sorted data types (2)

Definition (Order-sorted types - contd.)Order-sorted terms are recursively defined as• If t is a term of type Vi , then it is also of type V.• If ti is a term of type Ti for each i, then T (t1, . . . , tn) is of type T ,

T is also the constructor.• If s1, . . . , sk are terms of type S, then L(s1, . . . , sn) is of type L,

L is also the list constructor.Additional operators• the selectors selk : T → Tk returns the k-th subterm• the usual list operations (rest, append, conc, ...)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 144

Page 19: 2.2. Context-Free Syntax Analysis

Context-Free Syntax Analysis Concrete and Abstract Syntax

Order-sorted data types (3)

Remarks: Order-sorted data types

• generalize data types of functional languages by subtyping, a termcan belong to serveral types

• are used in specification languages, e.g. OBJ3, Katja, ...

• are a very compact form for type declaration

• have a cannonical implementation in OO languages

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 145

Context-Free Syntax Analysis Concrete and Abstract Syntax

Example: Order-sorted data types and OO types

Declaration of order-sorted data types:• Exp = Add | Mult | Neg | Ident• Add (Exp left, Exp right)• Mult (Exp left, Exp right)• Neg (Exp val)

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 146

Context-Free Syntax Analysis Concrete and Abstract Syntax

Implementation in Java

interface Exp {Exp left() throws IllegalSelectException;Exp right() throws IllegalSelectException;Exp val() throws IllegalSelectException;

}

class Add implements Exp {private Exp left;private Exp right;

Add( Exp l, Exp r ) {left = l; right = r; }

Exp left() { return left; }Exp right(){ return right; }Exp val() throws IllegalSelectException {

throw new IllegalSelectException(); }}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 147

Context-Free Syntax Analysis Concrete and Abstract Syntax

Implementation in Java (2)

class Mult implements Exp {// analog zu Add }

class Neg implements Exp {private Exp val;

Neg( Exp v ) { val = v; }

Exp left() throws IllegalSelectException {throw new IllegalSelectException(); }

Exp right() throws IllegalSelectException {throw new IllegalSelectException(); }

Exp val() { return val; }}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 148

Context-Free Syntax Analysis Concrete and Abstract Syntax

Implementation in Java (3)

class Ident implements Exp extends PredefIdent {

Exp left() throws IllegalSelectException {throw new IllegalSelectException(); }

Exp right() throws IllegalSelectException {throw new IllegalSelectException();}

Exp val() throws IllegalSelectException {throw new IllegalSelectException(); }

}

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 149

Context-Free Syntax Analysis Concrete and Abstract Syntax

Transformation of concrete to abstract syntaxTransformation konkreter in abstrakte Syntax:

S

E #

T

Mult(_,_)

F

F

T

E

Add(_,_)

E T

T

FF

( ID + ID ) * ID

120© A. Poetzsch-Heffter, TU Kaiserslautern07.05.2007

a b c

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 150

Context-Free Syntax Analysis Concrete and Abstract Syntax

Transformation to abstract syntax with JavaCUP

S ::= E:e #{: RESULT = e; :} ;

E ::= E:e ’+’ T:t{: RESULT = Add(e,t); :} |T:t{: RESULT = t; :} ;

T ::= T:t ’*’ F:f{: RESULT = Mult(t,f); :} |F:f{: RESULT = f; :} ;

F ::= ’(’ E:e ’)’{: RESULT = e ; :} |ID:i{: RESULT = i; :} ;

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 151

Context-Free Syntax Analysis Concrete and Abstract Syntax

Recommended reading

• Wilhelm, Maurer: Section 9.1, pp. 406 + 407• Appel: Chapter 4, pp. 89 – 105

c© Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 152