CH1.1
CSE244
Chapter 1: Introduction to Compiling
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Dr. Robert LaBarre
United Technologies Research Center
411 Silver Lane
E. Hartford, CT [email protected]
[email protected]
CH1.2
Introduction to Compilers
As a Discipline, Involves Multiple CSE Areas
Programming Languages and Algorithms
Software Engineering & Theory / Foundations
Computer Architecture & Operating Systems
But, Has a Surprisingly Simple Intent:
Source Program → Compiler → Target Program (plus Error Messages)
Source and Target Programs: Diverse & Varied
CH1.3
Classifications of Compilers
Compilers Viewed from Many Perspectives:
Single Pass
Multiple Pass
Load & Go
Construction
Debugging
Optimizing
Functional
However, All Utilize the Same Basic Tasks to Accomplish Their Actions
CH1.4
Classifications of Compilers
Also, Broadly Categorized as:
Analysis: Decompose Source into an Intermediate Representation
Synthesis: Target Program Generation from Representation
We Will Discuss Each Category in This Class
CH1.5
Important Notes
In Today’s Technology, Analysis Is Often Performed by Software Tools - This Wasn’t the Case in Early CSE Days
Structure / Syntax-Directed Editors: Force “syntactically” correct code to be entered
Pretty Printers: Standardized version of program structure (i.e., blank space, indenting, etc.)
Static Checkers: A “quick” compilation to detect rudimentary errors
Interpreters: “Real-time” execution of code, a “line-at-a-time”
CH1.6
Important Notes
Compilation Is Not Limited to Programming Language Applications
Text Formatters
LaTeX & TROFF Are Languages Whose Commands Format Text
Silicon Compilers
Textual / Graphical: Take Input and Generate Circuit Design
Database Query Processors
Database Query Languages Are Also a Form of Programming Language
Input Is “Compiled” Into a Set of Operations for Accessing the Database
CH1.7
The Many Phases of a Compiler
Source Program
→ 1. Lexical Analyzer
→ 2. Syntax Analyzer
→ 3. Semantic Analyzer
→ 4. Intermediate Code Generator
→ 5. Code Optimizer
→ 6. Code Generator
→ Target Program
All Phases Interact with the Symbol-Table Manager and the Error Handler
1, 2, 3: Analysis - Our Focus
4, 5, 6: Synthesis
CH1.8
The Analysis Task For Compilation
Three Phases:
Linear / Lexical Analysis: Left-to-Right Scan to Identify Tokens
Hierarchical Analysis: Grouping of Tokens Into Meaningful Collections
Semantic Analysis: Checking to Ensure Correctness of Components
CH1.9
Phase 1. Lexical Analysis
Easiest Analysis - Identify Tokens, Which Are the Building Blocks
For Example:
position := initial + rate * 60 ;
All Are Tokens: position, :=, initial, +, rate, *, 60, ;
Blanks, Line Breaks, etc. Are Scanned Out
CH1.10
Phase 2. Hierarchical Analysis, aka Parsing or Syntax Analysis
For the previous example, we would have the Parse Tree:
assignment statement
    identifier
        position
    :=
    expression
        expression
            identifier
                initial
        +
        expression
            expression
                identifier
                    rate
            *
            expression
                number
                    60
Nodes of tree are constructed using a grammar for the language
CH1.11
What is a Grammar?
A Grammar Is a Set of Rules Which Govern the Interdependencies & Structure Among the Tokens
statement is an assignment statement, or while statement, or if statement, or ...
assignment statement is an identifier := expression ;
expression is an (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
CH1.12
Why Have We Divided Analysis in This Manner?
Lexical Analysis - Scans Input, and Its Linear Actions Are Not Recursive
Identifies Only the Individual “words” That Are the Tokens of the Language
Recursion Is Required to Identify the Structure of an Expression, As Indicated in the Parse Tree
Verifies That the “words” Are Correctly Assembled into “sentences”
What Is the Third Phase?
Determine Whether the Sentences have One and Only One Unambiguous Interpretation
“John Took Picture of Mary Out on the Patio”
CH1.13
Phase 3. Semantic Analysis
Find More Complicated Semantic Errors and Support Code Generation
Parse Tree Is Augmented With Semantic Actions
Compressed Tree:
:=
    position
    +
        initial
        *
            rate
            60
Conversion Action:
:=
    position
    +
        initial
        *
            rate
            inttoreal
                60
CH1.14
Phase 3. Semantic Analysis
Most Important Activity in This Phase: Type Checking - Legality of Operands
Many Different Situations:
Real := int + char ;
A[int] := A[real] + int ;
while char <> int do
…. Etc.
CH1.15
Analysis in Text Formatting
Simple Commands: LaTeX
\begin{single}
\end{single}
\noindent
\section{Introduction}
$A_i$
$A_{i_j}$
Embedded in a Stream of Text, i.e., a FILE
\ and $ Serve as Signals to LaTeX
Language Commands: begin, single, noindent, section
What Are the Tokens?
What Is the Hierarchical Structure?
What Kind of Semantic Analysis Is Required?
CH1.16
Supporting Phases / Activities for Analysis
Symbol Table Creation / Maintenance
Contains Info on Each “Meaningful” Token, Typically Identifiers
Data Structure Created / Initialized During Lexical Analysis
Utilized / Updated During Later Analysis & Synthesis
Error Handling
Detection of Different Errors Which Correspond to All Phases
What Kinds of Errors Are Found During the Analysis Phase?
What Happens When an Error Is Found?
CH1.18
The Synthesis Task For Compilation
Intermediate Code Generation
Abstract Machine Version of Code - Independent of Architecture
Easy to Produce, and Easy to Translate into Final, Machine-Dependent Code
Code Optimization
Find More Efficient Ways to Execute Code
Replace Code With More Efficient Statements
2 Approaches: High-Level Language & “Peephole” Optimization
Final Code Generation
Generate Relocatable, Machine-Dependent Code
CH1.19
Reviewing the Entire Process
position := initial + rate * 60
↓ lexical analyzer
id1 := id2 + id3 * 60
↓ syntax analyzer
:=
    id1
    +
        id2
        *
            id3
            60
↓ semantic analyzer
:=
    id1
    +
        id2
        *
            id3
            inttoreal
                60
↓ intermediate code generator
Symbol Table:
position ....
initial ....
rate ....
(All phases consult the Symbol Table and report Errors.)
CH1.20
Reviewing the Entire Process
↓ intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
↓ code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
↓ final code generator
movf id3, r2
mulf #60.0, r2
movf id2, r1
addf r2, r1
movf r1, id1
Symbol Table:
position ....
initial ....
rate ....
(All phases consult the Symbol Table and report Errors.)
CH1.21
Compiler Cousins: Preprocessors
Preprocessors Provide Input to Compilers
1. Macro Processing
#define in C: Does Text Substitution Before Compiling
#define X 3
#define Y A*B+C
#define Z getchar()
CH1.22
2. File Inclusion
#include in C: Bring In Another File Before Compiling
Figure: main.c contains the line #include “defs.h” among its own code; before compiling, the preprocessor replaces that line with the contents of defs.h.
CH1.23
3. Rational Preprocessors
Augment “Old” Languages With Modern Constructs
Add Macros for If - Then, While, Etc.
#define Can Make C Code More Pascal-like
#define begin {
#define end }
#define then
CH1.24
4. Language Extensions for a Database System
EQUEL - Database query language embedded in C
## Retrieve (DN=Department.Dnum) where
## Department.Dname = ‘Research’
is Preprocessed into:
ingres_system(“Retr…..Research’”,____,____);
a procedure call in a programming language.
CH1.25
The Grouping of Phases
Front End: Analysis + Intermediate Code Generation
vs.
Back End: Code Generation + Optimization
Number of Passes:
Single - Preferred
Multiple - Easier, but Less Efficient
Tradeoffs ……..
CH1.26
Compiler Construction Tools
Parser Generators : Produce Syntax Analyzers
Scanner Generators : Produce Lexical Analyzers
Syntax-directed Translation Engines : Generate Intermediate Code
Automatic Code Generators : Generate Actual Code
Data-Flow Engines : Support Optimization