topic: syntactic analysis · 2020. 5. 7. · syntactic analysis token-streamsyntaxtreeparser...

135
Topic: Syntactic Analysis 1 / 55

Upload: others

Post on 06-Jun-2021

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topic:

Syntactic Analysis

1 / 55

Page 2: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Syntactic Analysis

ParserToken-Stream Syntaxtree

Syntactic analysis tries to integrate Tokens into larger program units.

Such units may possibly be:

→ Expressions;

→ Statements;

→ Conditional branches;

→ loops; ...

2 / 55

Page 3: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Syntactic Analysis

ParserToken-Stream Syntaxtree

Syntactic analysis tries to integrate Tokens into larger program units.

Such units may possibly be:

→ Expressions;

→ Statements;

→ Conditional branches;

→ loops; ...

2 / 55

Page 4: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Discussion:

In general, parsers are not developed by hand, but generated from a specification:

ParserSpecification Generator

Specification of the hierarchical structure: contextfree grammarsGenerated implementation: Pushdown automata + X

3 / 55

Page 5: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Discussion:

In general, parsers are not developed by hand, but generated from a specification:

E→E{op}E Generator

Specification of the hierarchical structure: contextfree grammarsGenerated implementation: Pushdown automata + X

3 / 55

Page 6: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Chapter 1:

Basics of Contextfree Grammars

4 / 55

Syntactic Analysis

Page 7: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Basics: Context-free Grammars

Programs of programming languages can have arbitrary numbers of tokens, but onlyfinitely many Token-classes.This is why we choose the set of Token-classes to be the finite alphabet of terminals T .The nested structure of program components can be described elegantly viacontext-free grammars...

Definition: Context-Free GrammarA context-free grammar (CFG) is a4-tuple G = (N,T , P , S) with:

N the set of nonterminals,T the set of terminals,P the set of productions or rules, andS ∈ N the start symbol

5 / 55

Page 8: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Basics: Context-free Grammars

Programs of programming languages can have arbitrary numbers of tokens, but onlyfinitely many Token-classes.This is why we choose the set of Token-classes to be the finite alphabet of terminals T .The nested structure of program components can be described elegantly viacontext-free grammars...

Definition: Context-Free GrammarA context-free grammar (CFG) is a4-tuple G = (N,T , P , S) with:

N the set of nonterminals,T the set of terminals,P the set of productions or rules, andS ∈ N the start symbol

5 / 55

Noam Chomsky John Backus

Page 9: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Conventions

The rules of context-free grammars take the following form:

A→ α with A ∈ N , α ∈ (N ∪ T )∗

... for example:S → aS bS → ε

Specified language: {anbn | n ≥ 0}

Conventions:In examples, we specify nonterminals and terminals in general implicitely:

nonterminals are: A,B,C, ..., 〈exp〉, 〈stmt〉, ...;terminals are: a, b, c, ..., int, name, ...;

6 / 55

Page 10: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Conventions

The rules of context-free grammars take the following form:

A→ α with A ∈ N , α ∈ (N ∪ T )∗

... for example:S → aS bS → ε

Specified language: {anbn | n ≥ 0}

Conventions:In examples, we specify nonterminals and terminals in general implicitely:

nonterminals are: A,B,C, ..., 〈exp〉, 〈stmt〉, ...;terminals are: a, b, c, ..., int, name, ...;

6 / 55

Page 11: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Conventions

The rules of context-free grammars take the following form:

A→ α with A ∈ N , α ∈ (N ∪ T )∗

... for example:S → aS bS → ε

Specified language: {anbn | n ≥ 0}

Conventions:In examples, we specify nonterminals and terminals in general implicitely:

nonterminals are: A,B,C, ..., 〈exp〉, 〈stmt〉, ...;terminals are: a, b, c, ..., int, name, ...;

6 / 55

Page 12: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... a practical example:

S → 〈stmt〉〈stmt〉 → 〈if〉 | 〈while〉 | 〈rexp〉;〈if〉 → if ( 〈rexp〉 ) 〈stmt〉 else 〈stmt〉〈while〉 → while ( 〈rexp〉 ) 〈stmt〉〈rexp〉 → int | 〈lexp〉 | 〈lexp〉 = 〈rexp〉 | ...〈lexp〉 → name | ...

More conventions:For every nonterminal, we collect the right hand sides of rules and list them together.The j-th rule for A can be identified via the pair (A, j)( with j ≥ 0).

7 / 55

Page 13: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... a practical example:

S → 〈stmt〉〈stmt〉 → 〈if〉 | 〈while〉 | 〈rexp〉;〈if〉 → if ( 〈rexp〉 ) 〈stmt〉 else 〈stmt〉〈while〉 → while ( 〈rexp〉 ) 〈stmt〉〈rexp〉 → int | 〈lexp〉 | 〈lexp〉 = 〈rexp〉 | ...〈lexp〉 → name | ...

More conventions:For every nonterminal, we collect the right hand sides of rules and list them together.The j-th rule for A can be identified via the pair (A, j)( with j ≥ 0).

7 / 55

Page 14: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Pair of grammars:

E → E+E 0 | E∗E 1 | ( E ) 2 | name 3 | int 4

E → E+T 0 | T 1

T → T∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

Both grammars describe the same language

8 / 55

Page 15: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Pair of grammars:

E → E+E 0 | E∗E 1 | ( E ) 2 | name 3 | int 4

E → E+T 0 | T 1

T → T∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

Both grammars describe the same language

8 / 55

Page 16: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E

→ E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 17: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T

→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 18: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T

→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 19: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T

→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 20: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T

→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 21: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T

→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 22: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T

→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 23: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F

→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 24: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 25: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

9 / 55

... for example:

Page 26: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

DerivationGrammars are term rewriting systems. The rules offer feasible rewriting steps. A sequenceof such rewriting steps α0 → . . . → αm is called derivation.

E → E + T→ T + T→ T ∗ F + T→ T ∗ int + T→ F ∗ int + T→ name ∗ int + T→ name ∗ int + F→ name ∗ int + int

DefinitionThe rewriting relation→ is a relation on words over N ∪ T , with

α→ α′ iff α = α1 A α2 ∧ α′ = α1 β α2 for an A→ β ∈ P

The reflexive and transitive closure of → is denoted as: →∗9 / 55

... for example:

Page 27: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Derivation

Remarks:The relation → depends on the grammarIn each step of a derivation, we may choose:

∗ a spot, determining where we will rewrite.

∗ a rule, determining how we will rewrite.

The language, specified by G is:

L(G) = {w ∈ T ∗ | S →∗ w}

Attention:The order, in which disjunct fragments are rewritten is not relevant.

10 / 55

Page 28: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Derivation

Remarks:The relation → depends on the grammarIn each step of a derivation, we may choose:

∗ a spot, determining where we will rewrite.

∗ a rule, determining how we will rewrite.

The language, specified by G is:

L(G) = {w ∈ T ∗ | S →∗ w}

Attention:The order, in which disjunct fragments are rewritten is not relevant.

10 / 55

Page 29: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Derivation Tree

Derivations of a symbol are represented as derivation trees:

... for example:

E → 0 E + T→ 1 T + T→ 0 T ∗ F + T→ 2 T ∗ int + T→ 1 F ∗ int + T→ 1 name ∗ int + T→ 1 name ∗ int + F→ 2 name ∗ int + int

A derivation tree for A ∈ N :inner nodes: rule applications

root: rule application for Aleaves: terminals or ε

The successors of (B, i) correspond to right hand sides of the rule11 / 55

E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

Page 30: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Special Derivations

Attention:In contrast to arbitrary derivations, we find special ones, always rewriting the leftmost (orrather rightmost) occurance of a nonterminal.

These are called leftmost (or rather rightmost) derivations and are denoted with theindex L (or R respectively).Leftmost (or rightmost) derivations correspondt to a left-to-right (or right-to-left)preorder-DFS-traversal of the derivation tree.Reverse rightmost derivations correspond to a left-to-right postorder-DFS-traversal ofthe derivation tree

12 / 55

Page 31: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Special Derivations

... for example:E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

Leftmost derivation: (E, 0) (E, 1) (T , 0) (T , 1) (F , 1) (F , 2) (T , 1) (F , 2)Rightmost derivation: (E, 0) (T , 1) (F , 2) (E, 1) (T , 0) (F , 2) (T , 1) (F , 1)Reverse rightmost derivation: (F , 1) (T , 1) (F , 2) (T , 0) (E, 1) (F , 2) (T , 1) (E, 0)

13 / 55

Page 32: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Special Derivations

... for example:E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

Leftmost derivation: (E, 0) (E, 1) (T , 0) (T , 1) (F , 1) (F , 2) (T , 1) (F , 2)

Rightmost derivation: (E, 0) (T , 1) (F , 2) (E, 1) (T , 0) (F , 2) (T , 1) (F , 1)Reverse rightmost derivation: (F , 1) (T , 1) (F , 2) (T , 0) (E, 1) (F , 2) (T , 1) (E, 0)

13 / 55

Page 33: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Special Derivations

... for example:E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

Leftmost derivation: (E, 0) (E, 1) (T , 0) (T , 1) (F , 1) (F , 2) (T , 1) (F , 2)Rightmost derivation: (E, 0) (T , 1) (F , 2) (E, 1) (T , 0) (F , 2) (T , 1) (F , 1)

Reverse rightmost derivation: (F , 1) (T , 1) (F , 2) (T , 0) (E, 1) (F , 2) (T , 1) (E, 0)

13 / 55

Page 34: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Special Derivations

... for example:E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

Leftmost derivation: (E, 0) (E, 1) (T , 0) (T , 1) (F , 1) (F , 2) (T , 1) (F , 2)Rightmost derivation: (E, 0) (T , 1) (F , 2) (E, 1) (T , 0) (F , 2) (T , 1) (F , 1)Reverse rightmost derivation: (F , 1) (T , 1) (F , 2) (T , 0) (E, 1) (F , 2) (T , 1) (E, 0)

13 / 55

Page 35: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Unique Grammars

The concatenation of leaves of a derivation tree t are often called yield(t) .

... for example:E 0

+E 1

T 0

T 1

F 1

F 2

F 2

T 1

name

int

int∗

gives rise to the concatenation: name ∗ int + int .14 / 55

Page 36: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Unique Grammars

Definition:Grammar G is called unique, if for every w ∈ T ∗ there is maximally one derivationtree t of S with yield(t) = w.

... in our example:

E → E+E 0 | E∗E 1 | ( E ) 2 | name 3 | int 4

E → E+T 0 | T 1

T → T∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

The first one is ambiguous, the second one is unique

15 / 55

Page 37: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Conclusion:

A derivation tree represents a possible hierarchical structure of a word.For programming languages, only those grammars with a unique structure are ofinterest.Derivation trees are one-to-one corresponding with leftmost derivations as well as(reverse) rightmost derivations.

Leftmost derivations correspond to a top-down reconstruction of the syntax tree.Reverse rightmost derivations correspond to a bottom-up reconstruction of the syntaxtree.

16 / 55

Page 38: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Conclusion:

A derivation tree represents a possible hierarchical structure of a word.For programming languages, only those grammars with a unique structure are ofinterest.Derivation trees are one-to-one corresponding with leftmost derivations as well as(reverse) rightmost derivations.

Leftmost derivations correspond to a top-down reconstruction of the syntax tree.Reverse rightmost derivations correspond to a bottom-up reconstruction of the syntaxtree.

16 / 55

Page 39: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Chapter 2:

Basics of Pushdown Automata

17 / 55

Syntactic Analysis

Page 40: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Basics of Pushdown Automata

Languages, specified by context free grammars are accepted by Pushdown Automata:

The pushdown is used e.g. to verify correct nesting of braces.

18 / 55

Page 41: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

Conventions:We do not differentiate between pushdown symbols and statesThe rightmost / upper pushdown symbol represents the stateEvery transition consumes / modifies the upper part of the pushdown

19 / 55

Page 42: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

Conventions:We do not differentiate between pushdown symbols and statesThe rightmost / upper pushdown symbol represents the stateEvery transition consumes / modifies the upper part of the pushdown

19 / 55

Page 43: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Definition: Pushdown AutomatonA pushdown automaton (PDA) is a tupleM = (Q,T , δ, q0, F ) with:

Q a finite set of states;T an input alphabet;q0 ∈ Q the start state;F ⊆ Q the set of final states andδ ⊆ Q+ × (T ∪ {ε})×Q∗ a finite set of transitions

We define computations of pushdown automata with the help of transitions; a particularcomputation state (the current configuration) is a pair:

(γ,w) ∈ Q∗ × T ∗

consisting of the pushdown content and the remaining input.

20 / 55

Friedrich Bauer Klaus Samelson

Page 44: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Definition: Pushdown AutomatonA pushdown automaton (PDA) is a tupleM = (Q,T , δ, q0, F ) with:

Q a finite set of states;T an input alphabet;q0 ∈ Q the start state;F ⊆ Q the set of final states andδ ⊆ Q+ × (T ∪ {ε})×Q∗ a finite set of transitions

We define computations of pushdown automata with the help of transitions; a particularcomputation state (the current configuration) is a pair:

(γ,w) ∈ Q∗ × T ∗

consisting of the pushdown content and the remaining input.

20 / 55

Friedrich Bauer Klaus Samelson

Page 45: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 46: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b)

` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 47: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)

` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 48: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)

` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 49: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)

` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 50: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)

` (1 2 , b)` (2 , ε)

21 / 55

Page 51: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)

` (2 , ε)

21 / 55

Page 52: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

... for example:

States: 0, 1, 2Start state: 0Final states: 0, 2

0 a 111 a 1111 b 212 b 2

(0 , a a a b b b) ` (1 1 , a a b b b)` (1 1 1 , a b b b)` (1 1 1 1 , b b b)` (1 1 2 , b b)` (1 2 , b)` (2 , ε)

21 / 55

Page 53: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

A computation step is characterized by the relation ` ⊆ (Q∗ × T ∗)2 with

(αγ, xw) ` (αγ′, w) for (γ, x, γ′) ∈ δ

Remarks:

The relation ` depends on the pushdown automaton MThe reflexive and transitive closure of ` is denoted by `∗

Then, the language accepted by M is

L(M) = {w ∈ T ∗ | ∃ f ∈ F : (q0, w)`∗ (f, ε)}

We accept with a final state together with empty input.

22 / 55

Page 54: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

A computation step is characterized by the relation ` ⊆ (Q∗ × T ∗)2 with

(αγ, xw) ` (αγ′, w) for (γ, x, γ′) ∈ δ

Remarks:

The relation ` depends on the pushdown automaton MThe reflexive and transitive closure of ` is denoted by `∗

Then, the language accepted by M is

L(M) = {w ∈ T ∗ | ∃ f ∈ F : (q0, w)`∗ (f, ε)}

We accept with a final state together with empty input.

22 / 55

Page 55: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

A computation step is characterized by the relation ` ⊆ (Q∗ × T ∗)2 with

(αγ, xw) ` (αγ′, w) for (γ, x, γ′) ∈ δ

Remarks:

The relation ` depends on the pushdown automaton MThe reflexive and transitive closure of ` is denoted by `∗

Then, the language accepted by M is

L(M) = {w ∈ T ∗ | ∃ f ∈ F : (q0, w)`∗ (f, ε)}

We accept with a final state together with empty input.

22 / 55

Page 56: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Definition: Deterministic Pushdown AutomatonThe pushdown automaton M is deterministic, if every configuration has maximally onesuccessor configuration.

This is exactly the case if for distinct transitions (γ1, x, γ2) , (γ′1, x′, γ′2) ∈ δ we can

assume:Is γ1 a suffix of γ′1, then x 6= x′ ∧ x 6= ε 6= x′ is valid.

... for example:

0 a 111 a 1111 b 212 b 2

... this obviously holds

23 / 55

Page 57: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Definition: Deterministic Pushdown AutomatonThe pushdown automaton M is deterministic, if every configuration has maximally onesuccessor configuration.

This is exactly the case if for distinct transitions (γ1, x, γ2) , (γ′1, x′, γ′2) ∈ δ we can

assume:Is γ1 a suffix of γ′1, then x 6= x′ ∧ x 6= ε 6= x′ is valid.

... for example:

0 a 111 a 1111 b 212 b 2

... this obviously holds

23 / 55

Page 58: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Pushdown Automata

Theorem:For each context free grammar G = (N,T , P , S)a pushdown automaton M with L(G) = L(M) can be built.

The theorem is so important for us, that we take a look at two constructions for automata,motivated by both of the special derivations:

MLG to build Leftmost derivations

MRG to build reverse Rightmost derivations

24 / 55

M. Schützenberger A. Öttinger

Page 59: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Chapter 3:

Top-down Parsing

25 / 55

Syntactic Analysis

Page 60: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton

Construction: Item Pushdown Automaton MLG

Reconstruct a Leftmost derivation.Expand nonterminals using a rule.Verify successively, that the chosen rule matches the input.

==⇒ The states are now Items (= rules with a bullet):

[A→α • β] , A→ αβ ∈ P

The bullet marks the spot, how far the rule is already processed

26 / 55

Page 61: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 62: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 63: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 64: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 65: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 66: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 67: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 68: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 69: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 70: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

S

A B0 0

0

27 / 55

Page 71: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

Our example:

S → AB0 A → a0 B → b0

a b

0S

A B0 0

27 / 55

Page 72: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton – Example

We add another rule S′ → S $ for initialising the construction:

Start state: [S′→ • S $]End state: [S′→S • $]Transition relations:

[S′→ • S $] ε [S′→ • S $] [S→ • AB][S→ • AB] ε [S→ • AB] [A→ • a][A→ • a] a [A→ a •][S→ • AB] [A→ a •] ε [S→A • B][S→A • B] ε [S→A • B] [B→ • b][B→ • b] b [B→ b •][S→A • B] [B→ b •] ε [S→AB •][S′→ • S $] [S→AB •] ε [S′→S • $]

28 / 55

Page 73: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton

The item pushdown automaton MLG has three kinds of transitions:

Expansions: ([A→α •B β], ε, [A→α •B β] [B→ • γ]) forA → αB β, B→ γ ∈ P

Shifts: ([A→α • a β], a, [A→αa • β]) for A→αaβ ∈ PReduces: ([A→α •B β] [B→ γ•], ε, [A→αB • β]) for

A→αB β, B→ γ ∈ P

Items of the form: [A→α •] are also called completeThe item pushdown automaton shifts the bullet around the derivation tree ...

29 / 55

Page 74: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton

Discussion:

The expansions of a computation form a leftmost derivationUnfortunately, the expansions are chosen nondeterministically

For proving correctness of the construction, we show that for every Item [A→α •B β]the following holds:

([A→α •B β], w) `∗ ([A→αB • β], ε) iff B →∗ w

LL-Parsing is based on the item pushdown automaton and tries to make theexpansions deterministic ...

30 / 55

Page 75: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton

Example: S′ → S $ S → ε | aS b

The transitions of the according Item Pushdown Automaton:

0 [S′→ • S $] ε [S′→ • S $] [S→•]1 [S′→ • S $] ε [S′→ • S $] [S→ • aS b]2 [S→ • aS b] a [S→ a • S b]3 [S→ a • S b] ε [S→ a • S b] [S→•]4 [S→ a • S b] ε [S→ a • S b] [S→ • aS b]5 [S→ a • S b] [S→•] ε [S→ aS • b]6 [S→ a • S b] [S→ aS b•] ε [S→ aS • b]7 [S→ aS • b] b [S→ aS b•]8 [S′→ • S $] [S→•] ε [S′→S • $]9 [S′→ • S $] [S→ aS b•] ε [S′→S • $]

Conflicts arise between the transitions (0, 1) and (3, 4), resp..

31 / 55

Page 76: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton

Example: S′ → S $ S → ε | aS b

The transitions of the according Item Pushdown Automaton:

0 [S′→ • S $] ε [S′→ • S $] [S→•]1 [S′→ • S $] ε [S′→ • S $] [S→ • aS b]2 [S→ • aS b] a [S→ a • S b]3 [S→ a • S b] ε [S→ a • S b] [S→•]4 [S→ a • S b] ε [S→ a • S b] [S→ • aS b]5 [S→ a • S b] [S→•] ε [S→ aS • b]6 [S→ a • S b] [S→ aS b•] ε [S→ aS • b]7 [S→ aS • b] b [S→ aS b•]8 [S′→ • S $] [S→•] ε [S′→S • $]9 [S′→ • S $] [S→ aS b•] ε [S′→S • $]

Conflicts arise between the transitions (0, 1) and (3, 4), resp..

31 / 55

Page 77: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Problem:Conflicts between the transitions prohibit an implementation of the item pushdownautomaton as deterministic pushdown automaton.

Idea 1: GLL ParsingFor each conflict, we create a virtual copy of the complete configuration and continuecomputing in parallel.

Idea 2: Recursive Descent & BacktrackingDepth-first search for an appropriate derivation.

Idea 3: Recursive Descent & LookaheadConflicts are resolved by considering a lookup of the next input symbols.

32 / 55

Page 78: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Problem:Conflicts between the transitions prohibit an implementation of the item pushdownautomaton as deterministic pushdown automaton.

Idea 1: GLL ParsingFor each conflict, we create a virtual copy of the complete configuration and continuecomputing in parallel.

Idea 2: Recursive Descent & BacktrackingDepth-first search for an appropriate derivation.

Idea 3: Recursive Descent & LookaheadConflicts are resolved by considering a lookup of the next input symbols.

32 / 55

Page 79: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Problem:Conflicts between the transitions prohibit an implementation of the item pushdownautomaton as deterministic pushdown automaton.

Idea 1: GLL ParsingFor each conflict, we create a virtual copy of the complete configuration and continuecomputing in parallel.

Idea 2: Recursive Descent & BacktrackingDepth-first search for an appropriate derivation.

Idea 3: Recursive Descent & LookaheadConflicts are resolved by considering a lookup of the next input symbols.

32 / 55

Page 80: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Problem:Conflicts between the transitions prohibit an implementation of the item pushdownautomaton as deterministic pushdown automaton.

Idea 1: GLL ParsingFor each conflict, we create a virtual copy of the complete configuration and continuecomputing in parallel.

Idea 2: Recursive Descent & BacktrackingDepth-first search for an appropriate derivation.

Idea 3: Recursive Descent & LookaheadConflicts are resolved by considering a lookup of the next input symbols.

32 / 55

Page 81: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Structure of the LL(1)-Parser:

δ

MOutput

The parser accesses a frame of length 1 of the input;it corresponds to an item pushdown automaton, essentially;table M [q, w] contains the rule of choice.

33 / 55

Page 82: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Idea:Emanate from the item pushdown automatonConsider the next input symbol to determine the appropriate rule for the next expansionA grammar is called LL(1) if a unique choice is always possible

Definition:A reduced grammar is called LL(1), if for each two distinctrules A→α , A→α′ ∈ P and each derivationS →∗L uAβ with u ∈ T ∗ the following is valid:

First1(αβ) ∩ First1(α′ β) = ∅

34 / 55

Page 83: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Idea:Emanate from the item pushdown automatonConsider the next input symbol to determine the appropriate rule for the next expansionA grammar is called LL(1) if a unique choice is always possible

Definition:A reduced grammar is called LL(1), if for each two distinctrules A→α , A→α′ ∈ P and each derivationS →∗L uAβ with u ∈ T ∗ the following is valid:

First1(αβ) ∩ First1(α′ β) = ∅

34 / 55

Philip Lewis Richard Stearns

Page 84: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Example 1:

S → if ( E ) S else S |while ( E ) S |E ;

E → id

is LL(1), since First1(E) = {id}

Example 2:

S → if ( E ) S else S |if ( E ) S |while ( E ) S |E ;

E → id

... is not LL(k) for any k > 0.

35 / 55

Page 85: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown Parsing

Example 1:

S → if ( E ) S else S |while ( E ) S |E ;

E → id

is LL(1), since First1(E) = {id}Example 2:

S → if ( E ) S else S |if ( E ) S |while ( E ) S |E ;

E → id

... is not LL(k) for any k > 0.35 / 55

Page 86: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

Definition: First1-SetsFor a set L ⊆ T ∗ we define:

First1(L) = {ε | ε ∈ L} ∪ {u ∈ T | ∃ v ∈ T ∗ : uv ∈ L}

Example: S → ε | aS bFirst1([[S]])εa ba a b ba a a b b b. . .

≡ the yield’s prefix of length 1

36 / 55

Page 87: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

Definition: First1-SetsFor a set L ⊆ T ∗ we define:

First1(L) = {ε | ε ∈ L} ∪ {u ∈ T | ∃ v ∈ T ∗ : uv ∈ L}

Example: S → ε | aS bFirst1([[S]])εa ba a b ba a a b b b. . .

≡ the yield’s prefix of length 136 / 55

Page 88: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

Arithmetics:First1(_) is distributive with union and concatenation:

First1(∅) = ∅First1(L1 ∪ L2) = First1(L1) ∪ First1(L2)First1(L1 · L2) = First1(First1(L1) · First1(L2))

:= First1(L1) �1 First1(L2)

�1 being 1− concatenation

Definition: 1-concatenationLet L1, L2 ⊆ T ∪ {ε} with L1 6= ∅ 6= L2. Then:

L1 �1 L2 =

{L1 if ε 6∈ L1

(L1\{ε}) ∪ L2 otherwise

If all rules of G are productive, then all sets First1(A) are non-empty.

37 / 55

Page 89: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

Arithmetics:First1(_) is distributive with union and concatenation:

First1(∅) = ∅First1(L1 ∪ L2) = First1(L1) ∪ First1(L2)First1(L1 · L2) = First1(First1(L1) · First1(L2))

:= First1(L1) �1 First1(L2)

�1 being 1− concatenation

Definition: 1-concatenationLet L1, L2 ⊆ T ∪ {ε} with L1 6= ∅ 6= L2. Then:

L1 �1 L2 =

{L1 if ε 6∈ L1

(L1\{ε}) ∪ L2 otherwise

If all rules of G are productive, then all sets First1(A) are non-empty.37 / 55

Page 90: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

For α ∈ (N ∪ T )∗ we are interested in the set:

First1(α) = First1({w ∈ T ∗ | α→∗ w})

Idea: Treat ε separately: First1(A) = F ε(A) ∪ {ε | A→∗ε}Let empty(X) = true iff X→∗ ε .

We characterize the ε-free First1-sets with an inequality system:

F ε(a) = {a} if a ∈ TF ε(A) ⊇ F ε(Xj) if A→X1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)

38 / 55

Page 91: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

For α ∈ (N ∪ T )∗ we are interested in the set:

First1(α) = First1({w ∈ T ∗ | α→∗ w})

Idea: Treat ε separately: First1(A) = F ε(A) ∪ {ε | A→∗ε}Let empty(X) = true iff X→∗ ε .

F ε(X1 . . . Xm) = F ε(X1) ∪ . . . ∪ F ε(Xj) if ¬empty(Xj) ∧∧j−1i=1 empty(Xi)

We characterize the ε-free First1-sets with an inequality system:

F ε(a) = {a} if a ∈ TF ε(A) ⊇ F ε(Xj) if A→X1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)

38 / 55

Page 92: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

For α ∈ (N ∪ T )∗ we are interested in the set:

First1(α) = First1({w ∈ T ∗ | α→∗ w})

Idea: Treat ε separately: First1(A) = F ε(A) ∪ {ε | A→∗ε}Let empty(X) = true iff X→∗ ε .

F ε(X1 . . . Xm) =⋃ji=1 F ε(Xi) if ¬empty(Xj) ∧

∧j−1i=1 empty(Xi)

We characterize the ε-free First1-sets with an inequality system:

F ε(a) = {a} if a ∈ TF ε(A) ⊇ F ε(Xj) if A→X1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)

38 / 55

Page 93: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

For α ∈ (N ∪ T )∗ we are interested in the set:

First1(α) = First1({w ∈ T ∗ | α→∗ w})

Idea: Treat ε separately: First1(A) = F ε(A) ∪ {ε | A→∗ε}Let empty(X) = true iff X→∗ ε .

F ε(X1 . . . Xm) =⋃ji=1 F ε(Xi) if ¬empty(Xj) ∧

∧j−1i=1 empty(Xi)

We characterize the ε-free First1-sets with an inequality system:

F ε(a) = {a} if a ∈ TF ε(A) ⊇ F ε(Xj) if A→X1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)

38 / 55

Page 94: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

for example...

E → E+T 0 | T 1

T → T ∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

with empty(E) = empty(T ) = empty(F ) = false

... we obtain:

F ε(S′) ⊇ F ε(E) F ε(E) ⊇ F ε(E)

F ε(E) ⊇ F ε(T ) F ε(T ) ⊇ F ε(T )F ε(T ) ⊇ F ε(F ) F ε(F ) ⊇ { ( , name, int}

39 / 55

Page 95: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Lookahead Sets

for example...

E → E+T 0 | T 1

T → T ∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

with empty(E) = empty(T ) = empty(F ) = false

... we obtain:

F ε(S′) ⊇ F ε(E) F ε(E) ⊇ F ε(E)

F ε(E) ⊇ F ε(T ) F ε(T ) ⊇ F ε(T )F ε(T ) ⊇ F ε(F ) F ε(F ) ⊇ { ( , name, int}

39 / 55

Page 96: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

Observation:The form of each inequality of these systems is:

x w y resp. x w d

for variables x, y und d ∈ D.Such systems are called pure unification problemsSuch problems can be solved in linear space/time.

for example: D = 2{a,b,c}

x0 ⊇ {a}x1 ⊇ {b} x1 ⊇ x0 x1 ⊇ x3

x2 ⊇ {c} x2 ⊇ x1

x3 ⊇ {c} x3 ⊇ x2 x3 ⊇ x3

a b

c

c

0 1

3

2

40 / 55

Page 97: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

a b

c

c

0 1

3

2

Proceeding:Create the Variable Dependency Graph for the inequality system.

Whithin a Strongly Connected Component (→ Tarjan) all variables have the same valueIs there no ingoing edge for an SCC, its value is computed via the smallest upperbound of all values within the SCCIn case of ingoing edges, their values are also to be considered for the upper bound

41 / 55

Frank DeRemer& Tom Pennello

Page 98: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

a b

c

c

0 1

3

2

Proceeding:Create the Variable Dependency Graph for the inequality system.Whithin a Strongly Connected Component (→ Tarjan) all variables have the same value

Is there no ingoing edge for an SCC, its value is computed via the smallest upperbound of all values within the SCCIn case of ingoing edges, their values are also to be considered for the upper bound

41 / 55

Frank DeRemer& Tom Pennello

Page 99: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

a

b c

0 1

3

2

Proceeding:Create the Variable Dependency Graph for the inequality system.Whithin a Strongly Connected Component (→ Tarjan) all variables have the same valueIs there no ingoing edge for an SCC, its value is computed via the smallest upperbound of all values within the SCC

In case of ingoing edges, their values are also to be considered for the upper bound

41 / 55

Frank DeRemer& Tom Pennello

Page 100: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

a

a b c

0 1

3

2

Proceeding:Create the Variable Dependency Graph for the inequality system.Whithin a Strongly Connected Component (→ Tarjan) all variables have the same valueIs there no ingoing edge for an SCC, its value is computed via the smallest upperbound of all values within the SCCIn case of ingoing edges, their values are also to be considered for the upper bound

41 / 55

Frank DeRemer& Tom Pennello

Page 101: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Fast Computation of Lookahead Sets

... for our example grammar:

First1 :

E T FS’

(, int, name

42 / 55

Page 102: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

context is relevant too: S′ → S $ S → ε 0 | aS b 1

First1(input) $ a b

S ? ? ?

43 / 55

S′i0

A1i1

in

β

β1

β0

w ∈ First1( )

S

?

Page 103: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

context is relevant too: S′ → S $ S → ε 0 | aS b 1

First1(input) $ a b

S ? ? ?

43 / 55

γ

S′i0

A1i1

in

β

β1

β0

w ∈ First1( )

S

aBb1

S′i0

A1i1

in

β

β0

w ∈ First1( )

S

0 ε

β1

Page 104: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Inequality system for Follow1(B) = First1(β)�1 . . .�1 First1(β0)

Follow1(S) ⊇ {$}Follow1(B) ⊇ F ε(Xj) if A→αBX1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)Follow1(B) ⊇ Follow1(A) if A→αBX1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xm)

44 / 55

γ

S′i0

A1i1

in

Bi β

β1

β0

An

)w ∈ First1(

w ∈ First1(First1(γ)�1 First1(β)�1 . . .�1 First1(β0))w ∈ First1(γ)�1 Follow1(B)

Page 105: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Inequality system for Follow1(B) = First1(β)�1 . . .�1 First1(β0)

Follow1(S) ⊇ {$}Follow1(B) ⊇ F ε(Xj) if A→αBX1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xj−1)Follow1(B) ⊇ Follow1(A) if A→αBX1 . . . Xm ∈ P , empty(X1) ∧ . . . ∧ empty(Xm)

44 / 55

γ

S′i0

A1i1

in

Bi β

β1

β0

An

)w ∈ First1(

w ∈ First1(First1(γ)�1 First1(β)�1 . . .�1 First1(β0))w ∈ First1(γ)�1 Follow1(B)

Page 106: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Is G an LL(1)-grammar, we can index a lookahead-table with items and nonterminals:

LL(1)-Lookahead TableWe set M [B, w] = i with B→ γ i if w ∈ First1(γ) �1 Follow1(B)

... for example: S′ → S $ S → ε 0 | aS b 1

First1(S) = {ε, a} Follow1(S) = {b, $}

S-rule 0 : First1(ε) �1 Follow1(S) = {b, $}S-rule 1 : First1(aSb) �1 Follow1(S) = {a}

$ a b

S 0 1 0

45 / 55

Page 107: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Is G an LL(1)-grammar, we can index a lookahead-table with items and nonterminals:

LL(1)-Lookahead TableWe set M [B, w] = i with B→ γ i if w ∈ First1(γ) �1 Follow1(B)

... for example: S′ → S $ S → ε 0 | aS b 1

First1(S) = {ε, a}

Follow1(S) = {b, $}

S-rule 0 : First1(ε) �1 Follow1(S) = {b, $}S-rule 1 : First1(aSb) �1 Follow1(S) = {a}

$ a b

S 0 1 0

45 / 55

Page 108: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Is G an LL(1)-grammar, we can index a lookahead-table with items and nonterminals:

LL(1)-Lookahead TableWe set M [B, w] = i with B→ γ i if w ∈ First1(γ) �1 Follow1(B)

... for example: S′ → S $ S → ε 0 | aS b 1

First1(S) = {ε, a} Follow1(S) = {b, $}

S-rule 0 : First1(ε) �1 Follow1(S) = {b, $}S-rule 1 : First1(aSb) �1 Follow1(S) = {a}

$ a b

S 0 1 0

45 / 55

Page 109: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Is G an LL(1)-grammar, we can index a lookahead-table with items and nonterminals:

LL(1)-Lookahead TableWe set M [B, w] = i with B→ γ i if w ∈ First1(γ) �1 Follow1(B)

... for example: S′ → S $ S → ε 0 | aS b 1

First1(S) = {ε, a} Follow1(S) = {b, $}

S-rule 0 : First1(ε) �1 Follow1(S) = {b, $}S-rule 1 : First1(aSb) �1 Follow1(S) = {a}

$ a b

S 0 1 0

45 / 55

Page 110: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

Is G an LL(1)-grammar, we can index a lookahead-table with items and nonterminals:

LL(1)-Lookahead TableWe set M [B, w] = i with B→ γ i if w ∈ First1(γ) �1 Follow1(B)

... for example: S′ → S $ S → ε 0 | aS b 1

First1(S) = {ε, a} Follow1(S) = {b, $}

S-rule 0 : First1(ε) �1 Follow1(S) = {b, $}S-rule 1 : First1(aSb) �1 Follow1(S) = {a}

$ a b

S 0 1 0

45 / 55

Page 111: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Item Pushdown Automaton as LL(1)-Parser

For example: S′ → S $ S → ε 0 | aS b 1

The transitions of the according Item Pushdown Automaton:

0 [S′→ • S $] ε [S′→ • S $] [S→•]1 [S′→ • S $] ε [S′→ • S $] [S→ • aS b]2 [S→ • aS b] a [S→ a • S b]3 [S→ a • S b] ε [S→ a • S b] [S→•]4 [S→ a • S b] ε [S→ a • S b] [S→ • aS b]5 [S→ a • S b] [S→•] ε [S→ aS • b]6 [S→ a • S b] [S→ aS b•] ε [S→ aS • b]7 [S→ aS • b] b [S→ aS b•]8 [S′→ • S $] [S→•] ε [S′→S • $]9 [S′→ • S $] [S→ aS b•] ε [S′→S • $]

Lookahead table:$ a b

S 0 1 0

46 / 55

Page 112: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Attention:Many grammars are not LL(k) !

A reason for that is:

DefinitionGrammar G is called left-recursive, if

A→+Aβ for an A ∈ N , β ∈ (T ∪N)∗

Example:E → E+T 0 | T 1

T → T ∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

... is left-recursive

47 / 55

Page 113: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Attention:Many grammars are not LL(k) !

A reason for that is:

DefinitionGrammar G is called left-recursive, if

A→+Aβ for an A ∈ N , β ∈ (T ∪N)∗

Example:E → E+T 0 | T 1

T → T ∗F 0 | F 1

F → ( E ) 0 | name 1 | int 2

... is left-recursive47 / 55

Page 114: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

Page 115: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

Page 116: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

S

A

A

A β

n

Firstk(

βn γ

)

Page 117: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

α

S

A

A

A β

n

Firstk(

βn γ

)

Page 118: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

α

A

S

A

n

βn

A

βA

β

Firstk(

γ

)

Page 119: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Left Recursion

Theorem:Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

Proof:Let wlog. A→Aβ |α ∈ Pand A be reachable from S

Assumption: G is LL(k)

⇒Firstk(αβn γ) ∩

Firstk(αβn+1 γ) = ∅

Case 1: β→∗ ε — Contradiction !!!Case 2: β→∗ w 6= ε ==⇒ Firstk(αw

k γ) ∩ Firstk(αwk+1 γ) 6= ∅

48 / 55

Page 120: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Right-Regular Context-Free ParsingRecurring scheme in programming languages: Lists of sth...S → b | S a bAlternative idea: Regular ExpressionsS → ( b a )∗ b

Definition: Right-Regular Context-Free GrammarA right-regular context-free grammar (RR-CFG) is a4-tuple G = (N,T , P , S) with:

N the set of nonterminals,T the set of terminals,P the set of rules with regular expressions of symbols as rhs,S ∈ N the start symbol

Example: Arithmetic ExpressionsS → EE → T ( +T )∗

T → F ( ∗F )∗

F → ( E ) | name | int

49 / 55

Page 121: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Right-Regular Context-Free ParsingRecurring scheme in programming languages: Lists of sth...S → b | S a bAlternative idea: Regular ExpressionsS → ( b a )∗ b

Definition: Right-Regular Context-Free GrammarA right-regular context-free grammar (RR-CFG) is a4-tuple G = (N,T , P , S) with:

N the set of nonterminals,T the set of terminals,P the set of rules with regular expressions of symbols as rhs,S ∈ N the start symbol

Example: Arithmetic ExpressionsS → EE → T ( +T )∗

T → F ( ∗F )∗

F → ( E ) | name | int

49 / 55

Page 122: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Right-Regular Context-Free ParsingRecurring scheme in programming languages: Lists of sth...S → b | S a bAlternative idea: Regular ExpressionsS → ( b a )∗ b

Definition: Right-Regular Context-Free GrammarA right-regular context-free grammar (RR-CFG) is a4-tuple G = (N,T , P , S) with:

N the set of nonterminals,T the set of terminals,P the set of rules with regular expressions of symbols as rhs,S ∈ N the start symbol

Example: Arithmetic ExpressionsS → EE → T ( +T )∗

T → F ( ∗F )∗

F → ( E ) | name | int49 / 55

Page 123: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE →T → F ( ∗F )∗

F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 124: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → T ( +T )∗

T → F ( ∗F )∗

F → ( E ) | name | int

〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 125: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → 〈T ( +T )∗〉T → F ( ∗F )∗

F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉

〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 126: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → 〈T ( +T )∗〉T → F ( ∗F )∗

F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉

〈 +T 〉 → +T〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 127: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → 〈T ( +T )∗〉T → F ( ∗F )∗

F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T

〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 128: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → 〈T ( +T )∗〉T → 〈F ( ∗F )∗〉F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T

〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F

50 / 55

Page 129: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 1: Rewrite the rules from G to 〈G〉:

A → 〈α〉 if A→ α ∈ P〈α〉 → α if α ∈ N ∪ T〈ε〉 → ε〈α∗〉 → ε | 〈α〉〈α∗〉 if α ∈ RegexT,N〈α1 . . . αn〉 → 〈α1〉 . . . 〈αn〉 if αi ∈ RegexT,N〈α1 | . . . | αn〉 → 〈α1〉 | . . . | 〈αn〉 if αi ∈ RegexT,N

. . . and generate the according LL(k)-Parser ML〈G〉

Example: Arithmetic Expressions cont’dS → EE → 〈T ( +T )∗〉T → 〈F ( ∗F )∗〉F → ( E ) | name | int〈T ( +T )∗〉 → T 〈( +T )∗〉〈( +T )∗〉 → ε | 〈 +T 〉〈( +T )∗〉〈 +T 〉 → +T〈F ( ∗F )∗〉 → F 〈( ∗F )∗〉〈( ∗F )∗〉 → ε | 〈∗F 〉 〈( ∗F )∗〉〈 ∗F 〉 → ∗F 50 / 55

Page 130: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Definition:An RR−CFG G is called RLL(1),if the corresponding CFG 〈G〉 is an LL(1) grammar.

Discussiondirectly yields the table driven parser ML

〈G〉 for RLL(1) grammarshowever: mapping regular expressions to recursive productions unnessessarily strainsthe stack→ instead directly construct automaton in the style of Berry-Sethi

51 / 55

Reinhold Heckmann

Page 131: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 2: Recursive Descent RLL Parsers:

Recursive descent RLL(1)-parsers are an alternative to table-driven parsers; apart fromthe usual function scan(), we generate a program frame with the lookahead functionexpect()and the main parsing method parse():

int next;void expect(Set E){

if ({ε, next} ∩ E = ∅){cerr << ”Expected” << E << ”found” << next;exit(0);

}return ;

}void parse(){

next = scan();expect(First1(S)) ;S();expect({EOF}) ;

}52 / 55

Page 132: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 2: Recursive Descent RLL Parsers:

For each A→ α ∈ P , we introduce:

void A(){generate(α)

}

with the meta-program generate being defined by structural decomposition of α:

generate(r1 . . . rk) = generate(r1)expect(First1(r2 )) ;generate(r2)...expect(First1(rk )) ;generate(rk)

generate(ε) = ;generate(a) = next = scan();generate(A) = A();

53 / 55

Page 133: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Idea 2: Recursive Descent RLL Parsers:

generate(r∗) = while ( next ∈ Fε(r)) {generate(r)}

generate(r1 | . . . | rk) = switch(next) {labels(First1(r1 )) generate(r1) break ;...labels(First1(rk )) generate(rk) break ;}

labels({α1, . . . , αm}) = label(α1): . . . label(αm):label(α) = case αlabel(ε) = default

54 / 55

Page 134: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown-Parsing

DiscussionA practical implementation of an RLL(1)-parser via recursive descent is astraight-forward ideaHowever, only a subset of the deterministic contextfree languages can be parsed thisway.As soon as First1(_) sets are not disjoint any more,

Solution 1: For many accessibly written grammars, the alternation between right hand sides happenstoo early. Keeping the common prefixes of right hand sides joined and introducing a new productionfor the actual diverging sentence forms often helps.Solution 2: Introduce ranked grammars, and decide conflicting lookahead always in favour of thehigher ranked alternative→ relation to LL parsing not so clear any more→ not so clear for _∗ operator how to decideSolution 3: Going from LL(1) to LL(k)The size of the occuring sets is rapidly increasing with larger kUnfortunately, even LL(k) parsers are not sufficient to accept all deterministic contextfreelanguages. (regular lookahead→ LL(∗))

In practical systems, this often motivates the implementation of k = 1 only ...

55 / 55

Page 135: Topic: Syntactic Analysis · 2020. 5. 7. · Syntactic Analysis Token-StreamSyntaxtreeParser Syntactic analysis tries to integrate Tokens into larger program units. Such units may

Topdown-Parsing

DiscussionA practical implementation of an RLL(1)-parser via recursive descent is astraight-forward ideaHowever, only a subset of the deterministic contextfree languages can be parsed thisway.As soon as First1(_) sets are not disjoint any more,

Solution 1: For many accessibly written grammars, the alternation between right hand sides happenstoo early. Keeping the common prefixes of right hand sides joined and introducing a new productionfor the actual diverging sentence forms often helps.Solution 2: Introduce ranked grammars, and decide conflicting lookahead always in favour of thehigher ranked alternative→ relation to LL parsing not so clear any more→ not so clear for _∗ operator how to decideSolution 3: Going from LL(1) to LL(k)The size of the occuring sets is rapidly increasing with larger kUnfortunately, even LL(k) parsers are not sufficient to accept all deterministic contextfreelanguages. (regular lookahead→ LL(∗))

In practical systems, this often motivates the implementation of k = 1 only ...55 / 55