![Page 1: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/1.jpg)
G22.2130-001
Compiler Construction
Lecture 4: Lexical Analysis
Mohamed Zahran (aka Z)
![Page 2: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/2.jpg)
Role of the Lexical Analyzer
• Remove comments and white space (aka scanning)
• Macro expansion
• Read input characters from the source program
• Group them into lexemes
• Produce as output a sequence of tokens
• Interact with the symbol table
• Correlate error messages generated by the compiler with the source program
![Page 3: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/3.jpg)
Scanner-Parser Interaction
![Page 4: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/4.jpg)
Why Separate Lexical and Syntactic Analysis?
• Simplicity of design
• Improved compiler efficiency
– Allows us to use specialized techniques for the lexer that are not suitable for the parser
• Higher portability
– Input-device-specific peculiarities are restricted to the lexer
![Page 5: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/5.jpg)
Some Definitions
• Token: a pair consisting of
– Token name: abstract symbol representing a kind of lexical unit [affects parsing decisions]
– Optional attribute value [influences translation after parsing]
• Pattern: a description of the form that different lexemes take
• Lexeme: sequence of characters in source program matching a pattern
![Page 6: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/6.jpg)
Pattern
Token classes
• One token per keyword
• Tokens for the operators
• One token representing all identifiers
• Tokens representing constants (e.g. numbers)
• Tokens for punctuation symbols
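The token classes above can be sketched as a small regex-driven scanner. This is only an illustration, not the lecture's implementation: the toy language, the token names (`IF`, `RELOP`, `PUNCT`, ...) and the `tokenize` helper are all assumptions made for the example.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    name: str                 # abstract symbol seen by the parser
    attribute: Optional[str]  # optional attribute value (here: the lexeme)


# Hypothetical token classes for a toy language. Order matters: Python tries
# alternatives left to right, so keywords come before the identifier pattern.
TOKEN_SPEC = [
    ("IF",     r"if\b"),                    # one token per keyword
    ("ELSE",   r"else\b"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),  # one token for all identifiers
    ("NUMBER", r"[0-9]+"),                  # constants
    ("RELOP",  r"<=|>=|==|!=|<|>"),         # operators
    ("ASSIGN", r"="),
    ("PUNCT",  r"[();{}]"),                 # punctuation symbols
    ("WS",     r"\s+"),                     # white space, discarded below
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
WITH_ATTRIBUTE = {"ID", "NUMBER", "RELOP", "PUNCT"}


def tokenize(source: str) -> Iterator[Token]:
    """Yield (name, attribute) tokens; raise if no pattern matches."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue
        yield Token(kind, m.group() if kind in WITH_ATTRIBUTE else None)
```

For instance, `list(tokenize("if (x <= 10) y = 1;"))` begins with `Token('IF', None)`, `Token('PUNCT', '(')`, `Token('ID', 'x')`, `Token('RELOP', '<=')`. Keywords carry no attribute because the token name alone identifies them.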
![Page 7: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/7.jpg)
Example
![Page 8: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/8.jpg)
Dealing With Errors
Lexical analyzer unable to proceed: no pattern matches
• Panic mode recovery: delete successive characters from the remaining input until a well-formed token is found
• Insert a missing character
• Delete a character
• Replace a character by another
• Transpose two adjacent characters
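Panic mode recovery, the first strategy above, might look roughly like this. The three patterns and the function name `scan_with_panic_mode` are assumptions for a toy language; a real scanner would also report line and column.

```python
import re

# Hypothetical patterns for a toy language (assumed for this sketch).
TOKEN_RE = re.compile(r"(?P<ID>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[-+*/=])")


def scan_with_panic_mode(source):
    """Return (tokens, errors); errors are (offset, deleted_text) pairs."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(source, pos)
        if m:
            tokens.append((m.lastgroup, m.group()))
            pos = m.end()
            continue
        # Panic mode: delete successive characters until some pattern
        # matches again (or white space / end of input is reached).
        start = pos
        while (pos < len(source)
               and not source[pos].isspace()
               and not TOKEN_RE.match(source, pos)):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors
```

On the input `"x = 3 @# y"` this sketch skips `@#`, records it as an error at offset 6, and resumes scanning at `y`.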
![Page 9: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/9.jpg)
Example
What tokens will be generated from the above C++ program?
![Page 10: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/10.jpg)
Buffering Issue
• The lexical analyzer may need to look at least one character ahead to make a token decision.
• Buffering: reduces the overhead required to process a single input character
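Both points can be illustrated with a reader that fetches input one block at a time and offers one character of lookahead. This is a simplified sketch: the class `LookaheadReader`, its `block_size` parameter, and `read_relop` are made up for the example, and it does not model any particular buffer scheme from the slides.

```python
import io


class LookaheadReader:
    """Reads a text stream in fixed-size blocks with one-character lookahead."""

    def __init__(self, stream, block_size=4096):
        self.stream = stream
        self.block_size = block_size
        self.buffer = ""
        self.pos = 0

    def _fill(self):
        # Reload only when the current block is exhausted, so the stream is
        # touched once per block instead of once per character.
        if self.pos >= len(self.buffer):
            self.buffer = self.stream.read(self.block_size)
            self.pos = 0

    def peek(self):
        """Return the next character without consuming it ('' at end of input)."""
        self._fill()
        return self.buffer[self.pos] if self.buffer else ""

    def advance(self):
        """Consume and return the next character ('' at end of input)."""
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch


def read_relop(reader):
    """Decide between '<' and '<=' using one character of lookahead."""
    first = reader.advance()
    if first == "<" and reader.peek() == "=":
        reader.advance()
        return "<="
    return first
```

With `LookaheadReader(io.StringIO("<=<x"), block_size=2)`, two calls to `read_relop` return `"<="` and then `"<"`; the second decision needs a peek at `x`, which stays unconsumed.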
![Page 11: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/11.jpg)
![Page 12: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/12.jpg)
Tokens Specification
• We need a formal way to specify patterns: regular expressions
• Alphabet: any finite set of symbols
• String over alphabet: finite sequence of symbols drawn from that alphabet
• Language: countable set of strings over some fixed alphabet
![Page 13: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/13.jpg)
![Page 14: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/14.jpg)
Operations
Zero or one instance: r? is equivalent to r|ε
Character class: a|b|c|…|z can be replaced by [a-z]; a|c|d|h can be replaced by [acdh]
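These shorthands can be spot-checked with Python's `re` module, whose syntax for `?`, `|` and character classes agrees with the notation here. The helper `same_language` is an assumption for this sketch, and comparing on a finite sample is a sanity check, not a proof of equivalence.

```python
import re
import string


def same_language(p1, p2, samples):
    """True if both patterns accept exactly the same strings among `samples`."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s)) for s in samples
    )


# r? versus r|ε (an empty alternative plays the role of ε in Python's syntax)
optional_ok = same_language(r"a?", r"a|", ["", "a", "aa", "b", "ab"])

# a|b|c|...|z versus [a-z], and a|c|d|h versus [acdh]
samples = list(string.ascii_lowercase) + ["A", "1", "ab", ""]
range_ok = same_language("|".join(string.ascii_lowercase), r"[a-z]", samples)
set_ok = same_language(r"a|c|d|h", r"[acdh]", samples)
```

All three flags evaluate to `True` on these samples.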
![Page 15: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/15.jpg)
Examples
Which language is generated by:
• (a|b)(a|b)
• a*
• (a|b)*
• a|a*b
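One way to explore these exercises is to enumerate the short strings each expression accepts. The helper `language` below is hypothetical; it assumes the alphabet {a, b} and only lists strings up to a chosen length, so it shows a finite slice of each language.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=3):
    """All strings over `alphabet` of length <= max_len matched in full."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found
```

For example, `language("(a|b)(a|b)")` gives `['aa', 'ab', 'ba', 'bb']`, and `language("a|a*b", max_len=2)` gives `['a', 'b', 'ab']`: the single string `a`, or any number of `a`s followed by one `b`.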
![Page 16: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/16.jpg)
Example
How can we represent a number that can be an integer with optional fractional and exponent parts?
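One possible answer, sketched as a regular definition translated into Python: `digits` is one or more digits, and a number is `digits` followed by an optional fraction and an optional exponent. The exponent marker `E` and the names `DIGITS`, `NUMBER`, `is_number` are assumptions for this sketch.

```python
import re

# digits -> digit+
DIGITS = r"[0-9]+"
# number -> digits (. digits)? (E [+-]? digits)?
NUMBER = re.compile(rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?")


def is_number(s):
    """True if the whole string is a number under the definition above."""
    return NUMBER.fullmatch(s) is not None
```

Under this definition `42`, `3.14`, `6.02E23` and `1E-5` are accepted, while `1.`, `.5` and `1.5E` are rejected because a fraction or exponent, when present, needs at least one digit.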
![Page 17: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/17.jpg)
Examples
Write a regular definition for all strings of lowercase letters in which the letters are in ascending order
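A common answer is `a* b* c* … z*`: each letter may appear zero or more times, and letters appear in alphabetical order. The sketch below assumes repeated letters are allowed (non-decreasing order) and that the empty string counts; the names `ASCENDING` and `is_ascending` are made up for the example.

```python
import re
import string

# Builds the pattern a*b*c*...z* from the 26 lowercase letters.
ASCENDING = re.compile("".join(f"{c}*" for c in string.ascii_lowercase))


def is_ascending(s):
    """True if s consists of lowercase letters in non-decreasing order."""
    return ASCENDING.fullmatch(s) is not None
```

Here `is_ascending("aabbz")` holds, while `is_ascending("abca")` does not, since an `a` may not follow a `c`.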
![Page 18: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/18.jpg)
Tokens Recognition
![Page 19: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/19.jpg)
Implementation: Transition Diagrams
• Intermediate step in constructing lexical analyzer
• Convert patterns into flowcharts called transition diagrams
– Nodes or circles: called states
– Edges: directed from one state to another, labeled by symbols
![Page 20: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/20.jpg)
![Page 21: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/21.jpg)
Initial state
Accepting or final state
Actions associated with final state
![Page 22: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/22.jpg)
Means retract the forward pointer
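A transition diagram with a retracting final state can be simulated by a small state machine. The sketch below assumes a relational-operator diagram with the operators `<`, `<=`, `<>`, `=`, `>`, `>=`; the state numbers, the attribute names (`LT`, `LE`, ...) and the function `get_relop` are illustrative choices, not taken from the slide images.

```python
def get_relop(text, start=0):
    """Simulate a relop transition diagram starting at text[start].

    Returns ((token_name, attribute), next_position), or None if no
    relational operator begins at `start`.
    """
    state = 0
    forward = start
    while True:
        # "" stands for end of input; it never labels an edge.
        ch = text[forward] if forward < len(text) else ""
        forward += 1
        if state == 0:                      # initial state
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("relop", "EQ"), forward
            elif ch == ">":
                state = 6
            else:
                return None                 # not a relop at all
        elif state == 1:                    # seen '<'
            if ch == "=":
                return ("relop", "LE"), forward
            if ch == ">":
                return ("relop", "NE"), forward
            # Any other character: accept '<' and retract the forward
            # pointer, because that character belongs to the next lexeme.
            return ("relop", "LT"), forward - 1
        else:                               # state 6: seen '>'
            if ch == "=":
                return ("relop", "GE"), forward
            return ("relop", "GT"), forward - 1   # retract here as well
```

For `get_relop("<5")` the machine reads `5` to learn that the operator is just `<`, then retracts, so the returned position is 1 and `5` is left for the next token.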
![Page 23: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/23.jpg)
![Page 24: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/24.jpg)
• Places ID in symbol table if not there.
• Returns a pointer to symbol table entry
• Examine symbol table for the lexeme found
• Returns whatever token name is there
![Page 25: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/25.jpg)
![Page 26: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/26.jpg)
Reserved Words and Identifiers
• Install reserved words in symbol table initially
OR
• Create transition diagram for each keyword, then prioritize the tokens so that keywords have higher preference
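The first option, installing reserved words in the symbol table up front, can be sketched as follows. The class `SymbolTable`, the method names `install_id` and `get_token`, and the sample keywords are assumptions for this example.

```python
class SymbolTable:
    """Maps lexemes to entries; reserved words are pre-installed."""

    def __init__(self, reserved_words=("if", "then", "else")):
        self.entries = {}
        # Install reserved words initially, each with its own token name.
        for word in reserved_words:
            self.entries[word] = {"lexeme": word, "token": word.upper()}

    def install_id(self, lexeme):
        """Place the lexeme in the table if not there; return its entry."""
        return self.entries.setdefault(lexeme, {"lexeme": lexeme, "token": "ID"})

    def get_token(self, lexeme):
        """Return whatever token name the table holds for the lexeme."""
        return self.entries[lexeme]["token"]


def classify(table, lexeme):
    """What a scanner might do after matching the identifier pattern."""
    entry = table.install_id(lexeme)
    return table.get_token(lexeme), entry
```

With this arrangement a single identifier diagram suffices: `classify(table, "if")` yields the token name `IF` because the entry was already there, while `classify(table, "count")` installs a new entry and yields `ID`.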
![Page 27: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/27.jpg)
Implementation of Transition Diagram
![Page 28: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/28.jpg)
Using All Transition Diagrams: The Big Picture
• Arrange for the transition diagrams for each token to be tried sequentially
• Run transition diagrams in parallel
• Combine all transition diagrams into one
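The difference between trying recognizers one after another and running them side by side shows up when one lexeme is a prefix of another. In this sketch, regular expressions stand in for transition diagrams, and the token set and function names are made up for illustration.

```python
import re

# Hypothetical recognizers, listed in priority order (keyword first).
RECOGNIZERS = [
    ("THEN",   re.compile(r"then")),
    ("ID",     re.compile(r"[a-z][a-z0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
]


def try_sequentially(text, pos=0):
    """Return the first recognizer that succeeds, in list order."""
    for name, rx in RECOGNIZERS:
        m = rx.match(text, pos)
        if m:
            return name, m.group()
    return None


def longest_match(text, pos=0):
    """Run every recognizer and keep the longest lexeme; ties go to the
    earlier entry, which gives keywords priority over identifiers."""
    best = None
    for name, rx in RECOGNIZERS:
        m = rx.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

On `"thenext"`, `try_sequentially` stops at the keyword and returns `('THEN', 'then')`, whereas `longest_match` returns `('ID', 'thenext')`. On `"then"` both agree on `THEN`.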
![Page 29: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/29.jpg)
The First Part of the Project
![Page 30: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/30.jpg)
The First Part of the Project
declarations
%%
translation rules
%%
auxiliary functions
![Page 31: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/31.jpg)
declarations
translation rules
auxiliary functions
![Page 32: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/32.jpg)
Anything between these 2 marks is copied as is into lex.yy.c
Braces mean the pattern is defined elsewhere
pattern
Actions
![Page 33: G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis](https://reader031.vdocuments.net/reader031/viewer/2022022417/5891b28c1a28ab61108b72b0/html5/thumbnails/33.jpg)
Today's Lecture
• Sections 3.1 to 3.5
• First part of the project assigned