Chapter 1: Introduction to Compiling


Page 1: Chapter 1: Introduction to Compiling

RIT

08/11/47 Chapter 1 1

Chapter 1: Introduction to Compiling

Dr. Winai Wichaipanitch
Rajamangala Institute of Technology
Klong 6 Thanyaburi Pathumthani 12110
Tel: 06-999-2974
[email protected]
http://www.en.rit.ac.th/winai

Page 2: Chapter 1: Introduction to Compiling

Purpose of Compiler

Compilers translate a program written in one language (source) into another (target).

    Source program -> Compiler -> Target program
                         |
                         v
                   Error messages

Diverse & Varied

Page 3: Chapter 1: Introduction to Compiling

Introduction to Compilers

As a Discipline, Involves Multiple CS&E Areas:
  - Programming Languages and Algorithms
  - Theory of Computing & Software Engineering
  - Computer Architecture & Operating Systems

Page 4: Chapter 1: Introduction to Compiling

Translation Mechanisms

Compilation
  - To translate a source program in one language into an executable program in another language and produce results while executing the new program
  - Examples: C, C++, FORTRAN

Interpretation
  - To read a source program and produce the results while understanding that program
  - Examples: BASIC, LISP

Case Study: JAVA
  - First, translate to Java bytecode
  - Second, execute by interpretation (JVM)
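The two-step scheme of the Java case study can be tried out in Python, whose standard implementation also translates source text to bytecode and then interprets that bytecode. This is only an illustrative sketch; the source string and the variable name `result` are made up for the example.

```python
import dis

source = "result = (3 + 4) * 5"

# Step 1: translate the source text into bytecode (a code object).
code = compile(source, "<example>", "exec")

# Step 2: the virtual machine interprets the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])

# Optional: collect the names of the bytecode instructions that were produced.
instructions = [ins.opname for ins in dis.get_instructions(code)]
```

Printing `instructions` shows the intermediate form that sits between the source program and its execution, which plays the role that Java bytecode plays for the JVM.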

Page 5: Chapter 1: Introduction to Compiling

Comparison of Compiler/Interpreter

               Compiler                          Interpreter
Overview       Source code -> compiler ->        Source code + data ->
               object code;                      interpreter -> results
               data + object code -> results
Advantages     Fast program execution;           Easy to debug;
               Exploit architecture features     Flexible to modify;
                                                 Machine independent
Disadvantages  Pre-processing of program;        Execution overhead;
               Complexity                        Space overhead

Page 6: Chapter 1: Introduction to Compiling

Classifications of Compilers

Compilers Viewed from Many Perspectives:
  - Construction: Single Pass, Multiple Pass, Load & Go
  - Functional: Debugging, Optimizing

However, all utilize the same basic tasks to accomplish their actions.

Page 7: Chapter 1: Introduction to Compiling

We do not yet know the address values, so the source code must be read twice.

Page 8: Chapter 1: Introduction to Compiling

The Model

The TWO Fundamental Parts:
  - Analysis: Decompose Source into an intermediate representation
  - Synthesis: Target program generation from representation

We Will Discuss Both in This Class, and FOCUS on analysis.

Page 9: Chapter 1: Introduction to Compiling

Important Notes

Today: There are many Software Tools for helping with the Analysis Part. This Wasn’t the Case in Early Days.

(Some) analysis is also important in:
  - Structure / Syntax directed editors: Force “syntactically” correct code to be entered
  - Pretty Printers: Standardized version for program structure (i.e., blank space, indenting, etc.)
  - Static Checkers: A “quick” compilation to detect rudimentary errors
  - Interpreters: “real” time execution of code a “line-at-a-time”

Page 10: Chapter 1: Introduction to Compiling

Important Notes

Compilation Is Not Limited to Programming Language Applications
  - Text Formatters
      LATEX & TROFF Are Languages Whose Commands Format Text
  - Silicon Compilers
      Textual / Graphical: Take Input and Generate Circuit Design
  - Database Query Processors
      Database Query Languages Are Also a Programming Language
      Input is compiled Into a Set of Operations for Accessing the Database

Page 11: Chapter 1: Introduction to Compiling

The Many Phases of a Compiler

Source Program
  1. Lexical Analyzer
  2. Syntax Analyzer
  3. Semantic Analyzer
  4. Intermediate Code Generator
  5. Code Optimizer
  6. Code Generator
Target Program

Symbol-table Manager and Error Handler: connected to all six phases

1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis

Page 12: Chapter 1: Introduction to Compiling

Phases of A Modern Compiler

Source Program:
    IF (a<b) THEN c=1*d;

  -> Lexical Analyzer

Token Sequence:
    IF  (  ID “a”  <  ID “b”  THEN  ID “c”  =  CONST “1”  *  ID “d”

  -> Syntax Analyzer

Syntax Tree:
    IF_stmt
      cond_expr: <
        a
        b
      list
        assign_stmt
          lhs: c
          rhs: *
            1
            d

  -> Semantic Analyzer

3-Address Code:
    GE a, b, L1
    MULT 1, d, c
    L1:

  -> Code Optimizer

Optimized 3-Addr. Code:
    GE a, b, L1
    MOV d, c
    L1:

  -> Code Generation

Assembly Code:
    loadi R1,a
    cmpi R1,b
    jge L1
    loadi R1,d
    storei R1,c
    L1:
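The optimizer step on this slide (turning the multiplication by 1 into a plain move) can be sketched as a small rewrite pass over the 3-address code. The tuple representation, the `LABEL` pseudo-instruction and the function name `simplify` are assumptions made for this illustration, not part of the slide.

```python
def simplify(instructions):
    """Rewrite multiplications by the constant 1 as plain moves."""
    optimized = []
    for ins in instructions:
        op, *args = ins
        if op == "MULT" and "1" in args[:2]:
            # x * 1 == 1 * x == x, so copy the other operand to the target.
            other = args[1] if args[0] == "1" else args[0]
            optimized.append(("MOV", other, args[2]))
        else:
            optimized.append(ins)
    return optimized


# 3-address code for: IF (a<b) THEN c=1*d;
code = [("GE", "a", "b", "L1"), ("MULT", "1", "d", "c"), ("LABEL", "L1")]
optimized = simplify(code)
for ins in optimized:
    print(ins)
```

A real optimizer applies many such rules; this sketch handles only the single algebraic identity used in the example.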

Page 13: Chapter 1: Introduction to Compiling

Language-Processing System

Source Program
  1. Pre-Processor
  2. Compiler
  3. Assembler
  4. Relocatable Machine Code
  5. Loader / Link-Editor  <- Library, relocatable object files
Executable

Page 14: Chapter 1: Introduction to Compiling

The Analysis Task For Compilation

Three Phases:
  - Linear / Lexical Analysis:
      L-to-r Scan to Identify Tokens
      token: sequence of chars having a collective meaning
  - Hierarchical Analysis:
      Grouping of Tokens Into Meaningful Collection
  - Semantic Analysis:
      Checking to ensure Correctness of Components

Page 15: Chapter 1: Introduction to Compiling

Phase 1. Lexical Analysis

Easiest Analysis - Identify tokens which are the basic building blocks

For Example:

    position := initial + rate * 60 ;
    ________ __ _______ _ ____ _ __ _

All are tokens

Blanks, Line breaks, etc. are scanned out
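A minimal scanner for this example line might look like the sketch below. The token class names (`ID`, `ASSIGN`, and so on) and the regular expressions are assumptions chosen for this one statement, not a definition of any particular language.

```python
import re

# Token classes for the example statement only (assumed names).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),          # blanks and line breaks are scanned out
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(text):
    """Left-to-right scan that groups characters into (class, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("position := initial + rate * 60 ;")
for token in tokens:
    print(token)
```

The scan is purely linear: no token depends on nested structure, which is why a regular-expression matcher is enough at this phase.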

Page 16: Chapter 1: Introduction to Compiling

Phase 2. Hierarchical Analysis, aka Parsing or Syntax Analysis

For previous example, we would have Parse Tree:

    assignment statement
      identifier: position
      :=
      expression
        expression
          identifier: initial
        +
        expression
          expression
            identifier: rate
          *
          expression
            number: 60

Nodes of tree are constructed using a grammar for the language

Page 17: Chapter 1: Introduction to Compiling

What is a Grammar?

Grammar is a Set of Rules Which Govern the Interdependencies & Structure Among the Tokens

  statement             is an   assignment statement, or while statement, or if statement, or ...
  assignment statement  is an   identifier := expression ;
  expression            is an   (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
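The assignment and expression rules above can be turned into a small recursive-descent parser. The grammar as written does not say whether `+` or `*` binds tighter, so this sketch assumes the usual convention (`*` before `+`, both left-associative) by splitting expressions into terms and factors. The function names and the tuple form of the tree are choices made for the illustration.

```python
def parse_assignment(tokens):
    """Parse: assignment -> identifier := expression ;  (tokens are strings)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def factor():
        # factor -> ( expression ) | number | identifier
        tok = advance()
        if tok == "(":
            node = expression()
            advance(")")
            return node
        if tok.isdigit() or tok.isidentifier():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expression():
        # expression -> term { + term }   (recursion enters via factor)
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if not target.isidentifier():
        raise SyntaxError(f"expected identifier, found {target!r}")
    advance(":=")
    value = expression()
    advance(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return (":=", target, value)


tree = parse_assignment("position := initial + rate * 60 ;".split())
print(tree)
```

The nested tuple mirrors the shape of the parse tree for the running example, with the `*` node sitting below the `+` node.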

Page 18: Chapter 1: Introduction to Compiling

Syntax Tree

    if statement -> if expression then statement else statement ;
      expression -> id relop id                              (num = 0)
      statement (then) -> assign statement -> id := expression   (avg := 0)
      statement (else) -> assign statement -> id := expression
        expression -> id mulop id                            (avg := ... / num)

Page 19: Chapter 1: Introduction to Compiling

Why Have We Divided Analysis in This Manner?

  - Lexical Analysis - Scans Input, Its Linear Actions Are Not Recursive
      Identify Only Individual “words” that are the Tokens of the Language
  - Recursion Is Required to Identify Structure of an Expression, As Indicated in Parse Tree
      Verify that the “words” are Correctly Assembled into “sentences”

Page 20: Chapter 1: Introduction to Compiling

Phase 3. Semantic Analysis

Find More Complicated Semantic Errors and Support Code Generation

Parse Tree Is Augmented With Semantic Actions

Compressed Tree:

    :=
      position
      +
        initial
        *
          rate
          60

Conversion Action:

    :=
      position
      +
        initial
        *
          rate
          inttoreal
            60

Page 21: Chapter 1: Introduction to Compiling

Phase 3. Semantic Analysis

Most Important Activity in This Phase:

Type Checking - Legality of Operands
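As a rough sketch of type checking, the pass below walks the tree for `position := initial + rate * 60` and inserts the `inttoreal` conversion where an integer operand meets a real one. The `TYPES` dictionary stands in for declarations that the slides do not show (all three identifiers are assumed to be real), and the rule that a real value may not be assigned to an integer is likewise an assumption.

```python
# Assumed declarations; the slides do not list them explicitly.
TYPES = {"position": "real", "initial": "real", "rate": "real"}


def check(node):
    """Return (typed_tree, type), inserting inttoreal where int meets real."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"
        if node not in TYPES:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, TYPES[node]

    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)

    if op == ":=":
        if ltype == "int" and rtype == "real":
            raise TypeError("cannot assign a real value to an int target")
        if ltype == "real" and rtype == "int":
            right = ("inttoreal", right)
        return (op, left, right), ltype

    if ltype != rtype:
        # Mixed arithmetic: convert the integer operand to real.
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        ltype = "real"
    return (op, left, right), ltype


tree = (":=", "position", ("+", "initial", ("*", "rate", "60")))
typed_tree, result_type = check(tree)
print(typed_tree)
```

The output tree has the same shape as the "Conversion Action" tree from the previous slide, with `inttoreal` wrapped around the constant 60.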

Page 22: Chapter 1: Introduction to Compiling

Supporting Phases / Activities for Analysis

Symbol Table Creation / Maintenance
  - Contains Info (storage, type, scope, args) on Each “Meaningful” Token, Typically Identifiers
  - Data Structure Created / Initialized During Lexical Analysis
  - Utilized / Updated During Later Analysis & Synthesis

Page 23: Chapter 1: Introduction to Compiling

Symbol Table for Example

    Name   Type      Def/UDef   other
    avg    realid    D          . . .
    if     keyword              . . .
    num    intid     D          . . .
    sum    realid    D          . . .
    then   keyword              . . .
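One simple way to hold such a table is a dictionary keyed by name. In this sketch the class name `SymbolTable`, its methods `intern` and `declare`, and the field names are all hypothetical; the entries reproduce the example table, with the scanner creating entries and later phases filling in the type.

```python
class SymbolTable:
    """Dictionary-backed symbol table (illustrative field names)."""

    def __init__(self, keywords):
        # Keywords are pre-loaded so the scanner can tell them from identifiers.
        self.entries = {kw: {"type": "keyword", "defined": None} for kw in keywords}

    def intern(self, name):
        """Called by the scanner: create the entry the first time a name is seen."""
        return self.entries.setdefault(name, {"type": None, "defined": False})

    def declare(self, name, type_):
        """Called by later phases once the declaration has been analysed."""
        entry = self.intern(name)
        if entry["type"] == "keyword":
            raise ValueError(f"{name!r} is a keyword and cannot be declared")
        entry.update(type=type_, defined=True)


table = SymbolTable(["if", "then"])
for name in ["avg", "num", "sum"]:
    table.intern(name)                 # lexical analysis creates the entries
table.declare("avg", "realid")         # later phases record the types
table.declare("num", "intid")
table.declare("sum", "realid")

for name in sorted(table.entries):
    print(name, table.entries[name])
```

Real compilers usually add scope handling (for example, a stack of such dictionaries); that is left out here to keep the sketch short.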

Page 24: Chapter 1: Introduction to Compiling

Error Handling

  - Detection of Different Errors Which Correspond to All Phases
  - What Kinds of Errors Are Found During the Analysis Phase?
  - What Happens When an Error Is Found?


Page 26: Chapter 1: Introduction to Compiling

The Synthesis Task For Compilation

Intermediate Code Generation
  - Abstract Machine Version of Code - Independent of Architecture
  - Easy to Produce and Do Final, Machine Dependent Code Generation

Code Optimization
  - Find More Efficient Ways to Execute Code
  - Replace Code With More Optimal Statements
  - Two approaches: High-level Language & “Peephole” Optimization

Final Code Generation
  - Generate Relocatable Machine Dependent Code

Page 27: Chapter 1: Introduction to Compiling

Reviewing the Entire Process

    position := initial + rate * 60

  -> lexical analyzer

    id1 := id2 + id3 * 60

  -> syntax analyzer

    :=
      id1
      +
        id2
        *
          id3
          60

  -> semantic analyzer

    :=
      id1
      +
        id2
        *
          id3
          inttoreal
            60

  -> intermediate code generator

Errors (reported by each phase)

Symbol Table:
    position ....
    initial ....
    rate ....

Page 28: Chapter 1: Introduction to Compiling

Reviewing the Entire Process

  -> intermediate code generator

    3 address code:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

  -> code optimizer

    temp1 := id3 * 60.0
    id1 := id2 + temp1

  -> final code generator

    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1

Errors (reported by each phase)

Symbol Table:
    position ....
    initial ....
    rate ....
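The unoptimized 3-address code on this slide can be reproduced by a post-order walk over the tree from the semantic analyzer, creating one temporary per interior node. The tuple tree format and the function name `generate` are assumptions for this sketch.

```python
def generate(tree):
    """Emit 3-address code for an assignment tree of nested tuples."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves (identifiers and constants) are used directly as operands.
        if isinstance(node, str):
            return node
        if node[0] == "inttoreal":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    result = walk(value)
    code.append(f"{target} := {result}")
    return code


tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
three_address = generate(tree)
print("\n".join(three_address))
```

Each emitted statement has at most one operator and up to three addresses, which is where the name "3 address code" comes from.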

Page 29: Chapter 1: Introduction to Compiling

Assemblers

Assembly code: names are used for instructions, and names are used for memory addresses.

Two-pass Assembly:
  - First Pass: all identifiers are assigned to memory addresses (0-offset)
      e.g. substitute 0 for a, and 4 for b
  - Second Pass: produce relocatable machine code:

    MOV a, R1       0001 01 00 00000000 *
    ADD #2, R1      0011 01 10 00000010
    MOV R1, b       0010 01 00 00000100 *

    * = relocation bit
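The two passes can be sketched as follows. The instruction layout is inferred from the bit patterns on the slide and should be read as an assumption: a 4-bit opcode (0001 load, 0010 store, 0011 add), a 2-bit register number, a 2-bit mode (00 for a memory address, 10 for an immediate value) and an 8-bit operand, with `*` marking a relocatable address. Only the three instruction forms of the example are handled.

```python
def is_register(operand):
    return operand[0] == "R" and operand[1:].isdigit()


def assemble(lines, word_size=4):
    """Two-pass assembly of 'MOV a, R1' style lines (illustrative encoding)."""
    # Pass 1: assign each identifier a 0-based offset in order of first use.
    symbols = {}
    parsed = []
    for line in lines:
        mnemonic, rest = line.split(None, 1)
        src, dst = [part.strip() for part in rest.split(",")]
        parsed.append((mnemonic, src, dst))
        for operand in (src, dst):
            is_name = not operand.startswith("#") and not is_register(operand)
            if is_name and operand not in symbols:
                symbols[operand] = len(symbols) * word_size

    # Pass 2: emit relocatable machine code using the symbol offsets.
    output = []
    for mnemonic, src, dst in parsed:
        if mnemonic == "MOV" and is_register(dst):      # load memory -> register
            opcode, reg, operand = "0001", dst, src
        elif mnemonic == "MOV" and is_register(src):    # store register -> memory
            opcode, reg, operand = "0010", src, dst
        elif mnemonic == "ADD":
            opcode, reg, operand = "0011", dst, src
        else:
            raise ValueError(f"unsupported instruction: {mnemonic} {src}, {dst}")
        if operand.startswith("#"):
            mode, value, mark = "10", int(operand[1:]), ""
        else:
            mode, value, mark = "00", symbols[operand], " *"
        output.append(f"{opcode} {int(reg[1:]):02b} {mode} {value:08b}{mark}")
    return symbols, output


symbols, machine_code = assemble(["MOV a, R1", "ADD #2, R1", "MOV R1, b"])
print(symbols)
print("\n".join(machine_code))
```

The first pass is needed because an instruction may refer to a name whose address is not known until the whole program has been seen.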

Page 30: Chapter 1: Introduction to Compiling

Loaders and Link-Editors

Loader: takes relocatable machine code, alters the addresses, and places the altered instructions into memory.

Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file.
  - Needs to keep track of the correspondence between variable names and corresponding addresses in each piece of code.
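The address-altering step of a loader can be illustrated with a few lines of code. The input below is the relocatable code from the assembler slide, written as tuples with a relocation flag; the tuple layout, the function name `load` and the load address 15 are all hypothetical choices for the example.

```python
def load(relocatable_code, load_address):
    """Add load_address to every address whose relocation flag is set."""
    loaded = []
    for opcode, reg, mode, address, relocatable in relocatable_code:
        if relocatable:
            address += load_address
        loaded.append(f"{opcode} {reg} {mode} {address:08b}")
    return loaded


# (opcode, register, mode, address or immediate value, relocation flag)
program = [
    ("0001", "01", "00", 0, True),    # MOV a, R1
    ("0011", "01", "10", 2, False),   # ADD #2, R1  (immediate, not relocated)
    ("0010", "01", "00", 4, True),    # MOV R1, b
]
loaded = load(program, 15)
print("\n".join(loaded))
```

With a load address of 15, `a` ends up at address 15 and `b` at address 19, while the immediate operand 2 is left untouched because its relocation flag is not set.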

Page 31: Chapter 1: Introduction to Compiling

Compiler Cousins: Preprocessors

Provide Input to Compilers

1. Macro Processing

#define in C: does text substitution before compiling

#define X 3

#define Y A*B+C

#define Z getchar()
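To make "text substitution before compiling" concrete, here is a deliberately simplified sketch of object-like macro expansion. It is not how the real C preprocessor is implemented: it makes a single substitution pass, does not rescan expanded text, and ignores string literals, comments and function-like macros. The function name `expand` is made up for the example.

```python
import re


def expand(source):
    """Expand object-like #define macros by whole-identifier text substitution."""
    macros = {}
    output = []
    for line in source.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            # Record the macro body (possibly empty) and drop the directive line.
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue

        def substitute(match):
            return macros.get(match.group(), match.group())

        output.append(re.sub(r"[A-Za-z_]\w*", substitute, line))
    return "\n".join(output)


program = """#define X 3
#define Y A*B+C
#define Z getchar()
n = X + 2*Y;
c = Z;"""
expanded = expand(program)
print(expanded)
```

Because the substitution is purely textual, `2*Y` becomes `2*A*B+C` rather than `2*(A*B+C)`, which is the usual reason for wrapping macro bodies in parentheses.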

Page 32: Chapter 1: Introduction to Compiling

2. File Inclusion

#include in C - bring in another file before compiling

defs.h:
    //////
    //////
    //////

main.c (before preprocessing):
    #include "defs.h"
    …---…---…---…---…---…---…---…---…---

main.c (after preprocessing):
    //////
    //////
    //////
    …---…---…---…---…---…---…---…---…---

Page 33: Chapter 1: Introduction to Compiling

3. Rational Preprocessors

Augment “Old” Languages With Modern Constructs

Add Macros for If - Then, While, Etc.

#define Can Make C Code More Pascal-like

#define begin {

#define end }

#define then

Page 34: Chapter 1: Introduction to Compiling


4. Language Extensions for a Database System

EQUEL - Database query language embedded in C

## Retrieve (DN=Department.Dnum) where

## Department.Dname = ‘Research’

is Preprocessed into:

ingres_system(“Retr…..Research’”,____,____);

a procedure call in a programming language.

Page 35: Chapter 1: Introduction to Compiling

The Grouping of Phases

Front End : Analysis + Intermediate Code Generation
    vs.
Back End : Code Generation + Optimization

Number of Passes:
  - A pass requires reading and writing intermediate files.
  - Fewer passes: more efficiency.
  - However: fewer passes require more sophisticated memory management and compiler phase interaction.
  - Tradeoffs ...

Page 36: Chapter 1: Introduction to Compiling


Compiler Construction Tools

Parser Generators : Produce Syntax Analyzers

Scanner Generators : Produce Lexical Analyzers

Syntax-directed Translation Engines : Generate Intermediate Code

Automatic Code Generators : Generate Actual Code

Data-Flow Engines : Support Optimization

Page 37: Chapter 1: Introduction to Compiling

Tools

Tools exist to help in the development of some stages of the compiler
  - Lex (Flex) - lexical analysis generator
  - Yacc (Bison) - parser generator