lecture compiler construction - graz university of technology · concept of compilation –...
TRANSCRIPT
Lecture Compiler Construction
Franz [email protected]
Institute for Software TechnologyTechnische Universitat Graz
Inffeldgasse 16b/2, A-8010 Graz, Austria
Summer term 2016
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 1 / 309
Compilers are everywhere
Programming languages like Java, C#, C, C++, Pascal, Modula,SML, Lisp, VHDL, Basic,..Graphical languages also need compilers (HTML, LATEX,...)Means for communication between computers like XMLNatural languages
Compilers translate a sentence written in one language to anotherlanguage.
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 2 / 309
Example – HTML
<!DOCTYPE HTML PUBLIC "...- > <html> <head> <metacontent=text/html; ..."http-equiv=Content-Type- ><title>Hello World</title> </head> <body> <h1>HelloWorld!</h1> </body> </html>
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 3 / 309
Another example
(Source: en.wikipedia.org/wiki/Java bytecode)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 4 / 309
Compiler vs. Interpreter
Compiler: Translates from one into another language whereprograms are directly executed (C,C++,Pascal, ...).
Interpreter: Executes a program directly without converting it (BASIC,batch languages,..).
Exact boundary difficult to define nowadays (Byte codeinterpreter vs. CPU, which executes machine codestatements.Front end (analysis phase) of compilers is always needed!
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 5 / 309
Organizational issues
Lecture (Vorlesung) 2hPractical exercises (Ubung) 1h
Examples from the Compiler Construction theoryHands on part: Development of a compilerMore information soon (via email and/or the webpage)
Office hours:Franz Wotawa: Tuesday, 13:00–14:00Roxane Koitz: Monday, 13:00–14:00
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 6 / 309
Lecture dates
Monday, 29.2.2016, 16:00 - 19:00 (preliminary discussion, lexical analysis)
Monday, 7.3.2016, 16:00 - 19:00 (lexical analysis, parsing)
Tuesday, 8.3.2016, 16:00 - 17:30 (parsing, attributed grammars)
Monday, 14.3.2016, 16:00 - 19:00 (attributed grammars, type checking)
Tuesday, 15.3.2016, 16:00 - 17:30 (type checking, runtime environment)
Monday, 11.4.2016, 16:00 - 19:00 (code generation)
Monday, 18.4.2016, 16:00 - 19:00 (code optimization)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 7 / 309
Exam
Written form only; no papers, books etc. allowed!Content: DFA, NFA, LL(1), SLR(1), Attributed Grammars, TypeChecking, Code Generation, Code Optimization, etc.)1. date: Monday, 9.5.2016, 18:00-20:00, i13, (2 groups, each 1hour)2. date: Monday, 13.6.2016, 18:00-20:00, i13, (2 groups, each 1hour)3. date: in October 2016, will be announced soon
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 8 / 309
Literature, etc.
Aho, Seti, Ullman, Compilers Principles, Techniques, andTools, Addison-Wesley, 1985, ISBN 0-201-10088-6.Tremblay, Sorenson, The Theory and Practice of Compiler Writing,McGraw Hill, 1985, ISBN 0-07-065161-2.Copy of slides but no lecture notesCompiler construction webpage:http://www.ist.tugraz.at/cb16.html
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 9 / 309
PART 1 - INTRODUCTION
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 10 / 309
What is a compiler?
ProgramSource language⇒ Target languageError reports if source code contains errors
Source program −→ Compiler −→ Target program↓
Error messages
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 11 / 309
Implications of definition
There are (very) many compilersThousands of source languages (e.g. Pascal, Modula, C, C++,Java, . . . )Thousands of target languages (other high-level programminglanguages, machine code)Classification: single-pass, multi-pass, debugging, or optimizingBut: basics of creating a compiler are mostly consistent
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 12 / 309
Concept of compilation
Analysis phaseSplit source program into tokensGenerate intermediate code (intermediate representation of sourceprogram)
Synthesis phaseGenerate target program based on intermediate code
Enables reuse of program parts for different source and targetlanguages
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 13 / 309
Concept of compilation – Analysis
Operations used within program are stored in hierarchicalstructureSyntax TreeOther tools which perform an analysis:
Structure editorsPretty printersStatic checkersInterpretersWeb browser
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 14 / 309
Language-processing systems
skeletal source program↓
preprocessor↓
source program↓
compiler↓
target assembly language↓
assembler↓
relocatable machine code
↓relocatable machine code
↓
loader/link-editor ←−library,relocatableobject files
↓absolute machine code
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 15 / 309
Components of a compiler
lexical analyser
semantic analyser
syntax analyser
intermediate code
generator
code optimizer
code generator
error handler
manager
symbol-table
source code
target program
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 16 / 309
Lexical analysis, scanning
Example:pos := init + rate * 60
Tokens:1 Identifier pos2 Assignment symbol :=3 Identifier init4 Plusoperator5 Identifier rate6 Multiplication operator7 Constant (number) 60
Syntax tree
:=
pos+
init *
rate 60
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 17 / 309
Syntax analysis, parsing
Grammatical analysisToken↔ rules of grammarRules of grammar (example)
1 Each identifier is an expression2 Each number is an expression3 Assuming that ex1 and ex2 are expressions, then ex1 + ex2 andex1 ∗ ex2 are expressions as well
4 A term: identifier := Expression is a statement
Parse tree
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 18 / 309
Example - Parse tree
assignment
statement
:=identifier expression
pos
expression +
identifier
init
expression
expression
identifier
rate
*
60
expression
number
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 19 / 309
Semantic analysis
Check for semantic errorsType checkingIdentification of operators and operandsNecessary for code generationExample: conversion from int to real
statementpos := init + rate * 60
becomespos := init + rate * int2real(60)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 20 / 309
Intermediate code generation
E.g.: three-address codeExample:1. tmp1 := int2real(60)2. tmp2 := id3 * tmp13. tmp3 := id2 + tmp24. id1 := tmp3
using the symbol table
pos (id1) real . . .init (id2) real . . .rate (id3) real . . .
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 21 / 309
Code optimizer
Goal: faster machine codeExample:
1. tmp1 := int2real(60)2. tmp2 := id3 * tmp13. tmp3 := id2 + tmp24. id1 := tmp3
⇓1. tmp1 := id3 * 60.02. id1 := id2 + tmp1
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 22 / 309
Code generator
Generation of actual target code (Machine code, Assembler)Example:
1. MOVF id3, R22. MULF #60.0, R23. MOVF id2, R14. ADDF R2, R15. MOVF R1, id1
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 23 / 309
PART 2 - LEXICAL ANALYSIS
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 24 / 309
Goals of this section
Specification and implementation of lexical analysersEasiest method:
1 Create diagram which describes structure of tokens2 Manually translate diagram into a program
Regular expressions in finite state machines
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 25 / 309
Tasks
get next token
token
source program analyser
lexicalparser
symbol
table
Filter comments, white spaces, ..Establish relations between error messages of compiler andsource code (e.g. via line number)Preprocessor functions (e.g. in C)Create copy of source program containing error messages
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 26 / 309
Why separate lexical analysis and parsing?
1 Simpler design: e.g. removal of comments and white spacesenables use of simpler parser design
2 Improve efficiency: large portion of time is invested into reading ofprograms and conversion into tokens
3 Enhance portability of compilers4 ( There are tools for both phases )
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 27 / 309
Tokens, patterns, lexemes
Pattern: Rule describing a set of strings (a token)Token: Set of strings described by a pattern
Lexem: Sequence of characters which are matched by a patternof a token
Token Lexem Patterns (verbal)if if ifrelation >, <, . . . < or > or . . .id pi, test cases letter followed by letters, digits, or ’ ’num 2.7, 0 , 12e-4 any numeric constantliteral “segmentation fault “ any characters between “ and “
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 28 / 309
Tokens
Terminals in the grammar (checked by the parser)In programming languages:keywords, operators, identifiers, constants, strings, brackets
Language conventions:1 Tokens have to occur in specific places within the code2 Tokens are separated by white spaces
Example (Fortran): DO 5 I = 1.25 is interpreted as DO5I =1.25
3 Keywords are reserved and must not be used as identifiers1. is (currently) not used2. is accepted3. is reasonable (and simplyfies compiler construction)
IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 29 / 309
Token attributes
Store lexeme for further processingStore pointers to entries in symbol tableLine number or position of token within source code
U = 2 * R * PI
< id, pointer to U >< assign op >< num, integer value 2 >< mult op >< id, pointer to R >< mult op >< id, pointer to PI >
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 30 / 309
Lexical errors
fi ( a == 1) ...⇒ fi is interpreted as undefined identifierString sequences which cannot be matched to a tokene.g.: 2.3e*4 instead of 2.3e+4.Error-correcting:
1 Panic Mode: Removal of faulty characters until a token can bematched
2 Remove a faulty character3 Include a new character4 Replace a character5 Change order of neighboring characters
Some errors (like the first example) may be only recognizable bythe parser
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 31 / 309
Specification of tokens
Strings over an alphabet:Alphabet . . . finite set of symbols example: {0, 1} Binary alphabetString (word, sentence) . . . finite sequence of symbols of the alphabetε . . . empty string
Language: Set of all strings over an alphabetOperators:
Concatenation xy is a string whereby y is attached to the end of xxi is defined as: x0 = ε, i > 0→ xi = xi−1xPrefix: computer→ compSuffix: computer→ uterSubstring: computer→ putSubsequence: computer→ opt
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 32 / 309
Operations on languages
Union L ∪M = {x|x ∈ L∨x ∈M}Concatenation LM = {xy|x ∈ L∧ y ∈M}Kleen Closure L∗ =
⋃∞i=0 L
i
Positive Closure L+ =⋃∞i=1 L
i
Example: L = {A, . . . , Z},M = {0, 1, . . . , 9}L ∪M Set of letters and numbersLM Strings containing a letter followed by a numberL4 Strings which are made up of four lettersL∗ Strings made up of letters including εM+ Number strings containing at least one number
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 33 / 309
Regular expressions
Example: Pascal identifier letter ( letter | digit )∗
Regular expression r defines a language over an alphabet Σusing the following rules:
1 The regular expression ε describes {ε}2 a ∈ Σ: The regular Expression a describes {a}3 r, s are regular expressions (languages L(r), L(s)):
1 (r)|(s) describes L(r) ∪ L(s)2 (r)(s) describes L(r)L(s)3 (r)∗ describes (L(r))∗
4 (r)+ describes (L(r))+
Regular sets
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 34 / 309
Examples
Assuming Σ = {a, b}1 a|b describes the language {a, b}2 (a|b)(a|b) describes {aa, ab, ba, bb}3 a∗ describes {ε, a, aa, aaa, aaaa, . . .}4 (a|b)∗ describes all strings which contain no or multiple a’s and b’s
Not all languages can be described by regular expressionse.g. {wcw|w is a string made up of a’s and b’s}Regular expressions may only describe words containing either afixed amount of repetitions or an infinite amount of repetitions
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 35 / 309
Regular definitions
A regular definition is a sequence of the form:
d1 → r1. . .dn → rn
whereby di is a distinct name and ri is a regular expression overΣ ∪ {d1, . . . , dn}Example:
letter → A| . . . |Z|a| . . . |zdigit → 0|1| . . . |9
id → letter(letter|digit)∗
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 36 / 309
Token detection
Goal: Identify lexeme of token in input buffer
Output: Token-attribute-pair
Regular Expression Token Attribute Valuewhite space - -
if if -then then -
id id pointer to table entrynum num pointer to table entry< relop LT> relop GT. . . . . . . . .
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 37 / 309
Transition diagrams
Actions performed by lexical analyser (after get next token ofthe parser)States connected by directed edgesEdges are labeled:
Labels contain symbols expected in input buffer in order to toprogress to next stateLabel other denotes all symbols not used by any other edgeoriginating from the same node
One start stateEnd states after matching a tokenDeterministic (not necessarily)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 38 / 309
Example
Regular expression letter (letter | digit)∗ can be described byfollowing transition diagram:
start letter
letter or digit
otherreturn(gettoken(),install_id())1 2 3
Assume the input buffer contains:Pos 1 2 3 4 . . .
P I = . . .and the current-symbol pointer is pointing to position 1
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 39 / 309
Tokenmatching - Example
1 Current symbol = P, position = 1⇒ Transition from state 1 to state 2 and read new symbol
2 Current symbol = I, position = 2⇒ Remain in state 2 and read new symbol
3 Current symbol = SPACE, position = 3⇒ Transition from state 2 to state 3
4 State 3 is an end state
start letter
letter or digit
otherreturn(gettoken(),install_id())1 2 3
P
I
’ ’
1
2
3
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 40 / 309
Example – Implementing a Lexer directly
Pseudo code1 lexem = ““;2 if (next char ∈ letter) then3 lexem = lexem + next char;4 get next char();5 while (next char ∈ letter ∪ digit) do6 lexem = lexem + next char;7 get next char();8 return IDENTIFIER(lexem);
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 41 / 309
Generalization – Finite automata
Nondeterministic finite automaton (NFA) (S,Σ,move, s0, F )1 Set of states S2 Set of input symbols Σ3 Transition relation move : S × (Σ ∪ {ε}) 7→ 2S , relating
state-symbol-pairs to states4 Initial state s0 ∈ S5 Set of end states F ⊆ S
NFA can be visualized in form of directed graph (transition graph)Distinctions from transition diagram:
Nondeterministicε-moves allowed
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 42 / 309
NFA – Example
NFA for aa∗|bb∗ (language, accepting NFA):({0, 1, 2, 3, 4}, {a, b},move, 0, {2, 4}) withmove:
state symbol new state0 ε {1,3}1 a {2}2 a {2}3 b {4}4 b {4}
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 43 / 309
Deterministic finite automaton
Deterministic finite automaton (DFA) is a special case of a NFAwhereby:
1 No state has an ε-move2 For each state s there is at most one edge which is
marked with an input symbol a
DFAs contain exactly one transition for each inputEach entry in transition table contains exactly one state
State Input Symbolsa b
0 1 2
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 44 / 309
DFA simulation
Input: Input string x, DFA DOutput: ’Yes’ if D accepts x, ’No’ otherwise
s := s0c := nextcharwhile c 6= eof do
s := move(s, c)if s ==⊥ then
return ’No’c := nextchar
endif s ∈ F then
return ’Yes’else
return ’No’end
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 45 / 309
DFA – Example
DFA for aa∗|bb∗:({0, 1, 2}, {a, b},move, 0, {1, 2}) whereby
move:
state symbol new state0 a 10 b 21 a 12 b 2
0 b
a
2
1
a
b
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 46 / 309
Conversion NFA→ DFA
Subset construction algorithmIdea:
1 Each DFA state corresponds to a set of NFA states2 After reading a1a2 . . . an the DFA is in a state which ist equivalent to
a subset of the NFA. This subset corresponds to the path of theNFA when reading a1a2 . . . an.
Number of DFA states is exponential in the nunber of NFA states(worst case)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 47 / 309
Subset construction
Input: NFA N = (S,Σ,move, s0, F )Output: DFA D, accepting the same language as N
Initially, ε-closure(s0) is the only state in Dstates and it is unmarkedwhile there is an unmarked state T in Dstates do
mark Tfor each input symbol a do
U := ε-closure(move(T ,a))if U 6∈ Dstates then
Add U as unmarked state to DstatesendDtran[T, a] := U
endend
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 48 / 309
ε-closure, move
ε-closure(s): Set of NFA states which are accessible from a state svia ε-movesε-closure(T ): Set of NFA states which are accessible from a states ∈ T via ε-movesmove(T , a): Set of NFA states which are accessible from a states ∈ T via a move relation of the input symbol a
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 49 / 309
Calculation of ε-closure
Input: NFA N , set of states TOutput: ε-closure
Push all states in T onto stackInitialize ε-closure(T ) to Twhile stack 6= ∅ do
Pop t, the top element of stackfor each state u with an edge from t to u labeled ε do
if u 6∈ ε-closure(T ) thenAdd u to ε-closure(T )Push u onto stack
endend
end
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 50 / 309
Example NFA→ DFA
0 1
2 3
4 5
6 7 8 9 10
ε
ε
ε
ε
ε
ε
ε
ε
a
b
a b b
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 51 / 309
Thompson’s ConstructionInput:
Regular expression r over an alphabet ΣOutput: NFA N , which accepts L(r)
ε ε
i f st fi
N(s)
N(t)
a ∈ Σ i fa s∗ i
N(s)
f
ε
ε
ε
ε
s|t i f
N(s)
N(t)
ε
ε
ε
ε
[s]
whereby [s] = (s | ε)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 52 / 309
NFA simulation
Input: NFA N , input string xOutput: ’Yes’ if N accepts x, ’No’ otherwise
S := ε-closure({s0})a := nextCharwhile a 6= eof do
S := ε-closure(move(S,a))a := nextChar
endif S ∩ F 6= ∅ then
return ’Yes’else
return ’No’end
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 53 / 309
Summary
Check whether an input string x is contained in a languagedefined by the regular expression r:
1 Construct NFA (Thompson’s construction) and apply simulationalgorithm for NFAs
2 Construct NFA (Thompson’s construction), convert NFA to DFA andapply simulation algorithm for DFAs
Time-space tradeoffs
Automaton Space TimeNFA O(|r|) O(|r| · |x|)DFA O(2|r|) O(|x|)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 54 / 309
Regular expression→ DFA
Direct conversionNotation:
NFA state s is important↔ s has at least one non-ε-move to asuccessorExtended regular expression (Augmented regular expression) (r)#Illustration of extended expressions via syntax trees (cat-node,or-node, star-node)Each leaf node is labeled either with a symbol (of the alphabet) orwith εEach leaf node (6= ε) has a designated number (position)
Remark: NFA-DFA-conversion only via important states =positions
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 55 / 309
Syntax tree – Example
*
|
a b
b
#
b
a
1 2
3
4
5
6
Syntax tree for (a|b)∗abb#
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 56 / 309
Algorithm - Idea
Approach:1 Create syntax tree for extended regular expression2 Calculate four functions: nullable, firstpos, lastpos, followpos3 Construct DFA using followpos
DFA states correspond to sets of positions (important NFA states)Position i, followpos(i) are positions j for which there exists aninput string . . . cd . . . where i corresponds to c and j correspondsto d
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 57 / 309
followpos – Example
Syntax tree for (a|b)∗abb#followpos(1) = {1, 2, 3}Explanation: Suppose we see an a. This symbol could belongeither to (a|b)∗ or to the following a. If we see a b then this symbolhas to belong to (a|b)∗. Thus positions 1,2,3 are contained infollowpos(1).Informal definition of functions (node n, string s)
firstpos . . . positions, which match the first symbol of slastpos . . . positions, which match the last symbol of snullable . . . True, if node n can create a language containing theempty string, false otherwise
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 58 / 309
Rules for nullable, firstpos, lastpos
Node n nullable firstpos [lastpos]Leaf n labeledε
true ∅ [∅]Leaf n labeledposition i false {i} [{i}]
c1|c2 nullable(c1)∨nullable(c2) firstpos(c1) ∪ firstpos(c2)[lastpos(c1) ∪ lastpos(c2)]
c1 • c2 nullable(c1)∧nullable(c2)
if nullable(c1) thenfirstpos(c1) ∪firstpos(c2)else firstpos(c1)if nullable(c2) thenlastpos(c1) ∪ lastpos(c2)else lastpos(c2)
c∗1 true firstpos(c1) [lastpos(c1)]
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 59 / 309
firstpos, followpos – Example
*
|
a b
b
#
b
a
1 2
3
4
5
6
{1} {2}
{3}
{4}
{5}
{6}
{1,2,3}
{1,2}
{1} {2}
{3}
{4}
{5}
{6}
{6}
{5}
{4}
{3}
{1,2,3}
{1,2,3}
{1,2,3}
{1,2}
{1,2} {1,2}
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 60 / 309
Calculation followpos
1 If n is a cat-node (•), c1 is the left, c2 is the right child and i is aposition in lastpos(c1), then all positions of firstpos(c2) arecontained in followpos(i)
2 If n is a star-node (∗) and i is contained in lastpos(n), then allpositions of firstpos(c) are contained in followpos(i)
Node followpos1 {1,2,3}2 {1,2,3}3 {4}4 {5}5 {6}6 -
1
2
3 4 5 6
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 61 / 309
followpos-graph→ NFA
A NFA without ε-moves can be created based on afollowpos-graph:
1 Mark all positions in firstpos of the root node of the syntax tree asinitial states
2 Label each directed edge (i, j) with the symbol of position i3 Mark position which belongs to # as end state
⇒ followpos-graph can be converted to DFA (using subsetconstruction)
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 62 / 309
DFA-algorithmInput:
Regular expression rOutput: DFA D, accepting the language L(r)
Initially, the only unmarked state in Dstates is firstpos(root)while there is an unmarked state T ∈ Dstates do
Mark Tfor each input symbol a do
Let U be the set of positions that are in followpos(p)for some position p ∈ T where the symbol of p is a.
if U 6= ∅ and U 6∈ Dstates thenAdd U as unmarked state to Dstates
endDTran(T, a) = U
endend
F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 63 / 309