Lecture 1: Introduction to language processors


Upload: rebaz-najeeb

Post on 16-Apr-2017


Page 1: Lecture 1  introduction to language processors

Lectured by Rebaz Najeeb

Compilers (CPL5316)

Software Engineering, Koya University, 2015-2016

e-mail: rebaz.najeeb at koyauniversity dot org

Lecture 1 : Introduction to language processors

Page 2: Lecture 1  introduction to language processors

Outline

• Overview

• Introduction to compiler

• The Architecture of a Compiler

• Lexical analysis (Scanning)

• Syntax Analysis (Parsing)

• Semantic Analysis

• Intermediate Code Generation

• Code Optimization

• Code Generation

Page 3: Lecture 1  introduction to language processors

What to know?

Essential knowledge of computation theory: regular expressions (RE), finite-state automata (FSA), and context-free grammars (CFG).

The essential skills of a Java developer.

The Dragon book

Page 4: Lecture 1  introduction to language processors

Why compilers?

• Increase productivity.

• Low-level programming languages are harder to write, less portable, more error-prone, and harder to maintain.

• Hardware synthesis:

Verilog and VHDL are high-level hardware description languages used to describe hardware.

Register Transfer Language (RTL) → gates → transistors → physical layout.

• Reverse engineering, e.g. executable applications (.exe):

runtime (0100101100) → assembly code → high-level language.

Page 5: Lecture 1  introduction to language processors

Understanding binary language is hard

• 01000101 01110110 01100101 01110010 01111001 01101111 01101110

01100101 00100000 01100011 01100001 01101110 00100000 01110000 01100001 01110011 01110011 00101100 00100000 01110101 01101110 01101100 01100101 01110011 01110011 00100000 01010011 00101111 01101000 01100101 00100000 01100100 01101111 01100101 01110011 01101110 00100111 01110100 00100000 01110111 01100001 01101110 01110100 00100000 01110100 01101111 00101110

• What does that mean ?
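One way to answer the question is to treat each 8-bit group as an ASCII character code. The sketch below assumes that encoding; the function name `decode_ascii_bits` is made up for this illustration. Applied to the whole message on the slide, it yields "Everyone can pass, unless S/he doesn't want to."

```python
def decode_ascii_bits(bits: str) -> str:
    """Decode space-separated 8-bit binary groups into ASCII text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


# The first eight bytes of the message shown on the slide.
first_word = decode_ascii_bits(
    "01000101 01110110 01100101 01110010 "
    "01111001 01101111 01101110 01100101"
)
print(first_word)  # Everyone
```

Even with a decoder, reading or writing programs in this form by hand is impractical, which is the motivation for higher-level languages and the tools that translate them.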

Page 6: Lecture 1  introduction to language processors

Programming language classification (Levels)

• Low-level language: Assembly language, Machine language

• High-level language: C, C++, Java, Pascal, Prolog, Scheme

• Natural language: English, Kurdish.

* There is also a classification by programming language generations.

Page 7: Lecture 1  introduction to language processors

Who is that girl? Tell me what is her name?

Page 8: Lecture 1  introduction to language processors

History of compilers

• Grace Hopper, 1952: A-0 System.

• Alick Glennie, 1952: Autocode.

• John W. Backus, 1953: Speedcoding.
  - More productive, but 10-20 times slower in execution.
  - Took 300 bytes of memory (30% of memory).

• Fortran, the first complete compiler, 1954-1957 (18 staff-years of effort).

50% of code was written in Fortran.

• 1960: COBOL, Lisp

• 1970: Pascal, C

• 1980: OOP; Ada, Smalltalk, C++

• 1990: Java, script, Perl

• 2000: language specifications

Page 9: Lecture 1  introduction to language processors

Compilers

• A compiler translates code written in one language (a high-level language, HLL) into another language (a low-level language, LLL) without changing the meaning of the program.

• A compiler is also expected to make the target code efficient and optimized in terms of time and space.

• Compiler design covers the basic translation mechanism as well as error detection and recovery.

• Examples of compiled languages: Fortran, Ada, C, C++, C#, COBOL.

[Diagram: Source program → Compiler → Executable program, with errors/warnings reported by the compiler. Input (data) → Executable program → Output (result).]

Page 10: Lecture 1  introduction to language processors

Interpreters

• An interpreter is a type of language processor that directly executes the operations specified in the source program on inputs supplied by the user.

• Examples of interpreted languages: Python, Perl, BASIC, Ruby, AWK.

[Diagram: Source program and Input (data) → Interpreter → Output (result), with errors/warnings reported by the interpreter.]

Page 11: Lecture 1  introduction to language processors

Compilers vs interpreters

No | Compiler | Interpreter
1 | Takes the entire program as input | Takes a single instruction as input
2 | Intermediate object code is generated | No intermediate object code is generated
3 | Conditional control statements execute faster | Conditional control statements execute slower
4 | Memory requirement is higher (since object code is generated) | Memory requirement is lower
5 | The program need not be compiled every time | The high-level program is converted into a lower-level program every time it runs
6 | Errors are displayed after the entire program is checked | Errors are displayed for every instruction interpreted (if any)
7 | Example: C compiler | Example: BASIC

Source: http://www.c4learn.com/c-programming/compiler-vs-interpreter/
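The difference can be sketched on a toy expression language. Everything here is an assumption made for illustration: expressions are nested tuples such as `("+", 2, ("*", 3, 4))`, and the "target language" is a made-up stack machine with `PUSH`, `ADD` and `MUL` instructions. The interpreter computes the value while walking the expression; the compiler translates once, and the resulting code can then be run many times without re-translation.

```python
def interpret(expr):
    """Interpreter: walk the expression and compute its value directly."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_expr(expr):
    """Compiler: translate the expression once into stack-machine code."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    instruction = ("ADD",) if op == "+" else ("MUL",)
    return compile_expr(left) + compile_expr(right) + [instruction]


def run(code):
    """Execute previously compiled instructions on a stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


expr = ("+", 2, ("*", 3, 4))
code = compile_expr(expr)      # translation happens once
print(interpret(expr))         # 14
print(run(code))               # 14
```

Both paths give the same result; what differs is when the translation work is done and whether a separate translated program exists afterwards.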

Page 12: Lecture 1  introduction to language processors

Java Virtual Machine

• Why is Java machine independent?

• Tools to view and edit bytecode:

- ASM (http://asm.ow2.org)

- Jasmin (http://jasmin.sourceforge.net)

• Show demo by CMD

Page 13: Lecture 1  introduction to language processors

Java compilers

List of compilers: https://en.wikipedia.org/wiki/List_of_compilers

Page 14: Lecture 1  introduction to language processors

Language processing phases?

Source program
→ Preprocessor → modified source program
→ Compiler → target assembly language
→ Assembler → relocatable machine code
→ Linker/Loader (also takes library files and relocatable object files) → target machine code
→ Memory

Page 15: Lecture 1  introduction to language processors

Language processing phases?

Source program → Preprocessor → modified source program → Compiler (Analysis, then Synthesis) → target assembly language → Assembler → relocatable machine code → Linker/Loader (also takes library files and relocatable object files) → target machine code → Memory

Page 16: Lecture 1  introduction to language processors

Analysis and Synthesis

Compiler

• Analysis (Front-end): Lexical analysis, Syntax analysis, Semantic analysis, Intermediate code generation

• Synthesis (Back-end): Code optimization, Code generation

Page 17: Lecture 1  introduction to language processors

Analysis vs Synthesis

Analysis part

Breaks up the source program into its constituent pieces and creates an intermediate representation of the source program.

It is often called the front end of the compiler.

The analysis part can be divided into the following phases:

Lexical analysis, syntax analysis, semantic analysis, and intermediate code generation (optional).

Synthesis part

Constructs the desired target program from the intermediate representation and the information in the symbol table. It is often called the back end.

The synthesis part can be divided into the following phases:

Intermediate code generator (if not already counted in the front end), code optimizer, code generator.

Page 18: Lecture 1  introduction to language processors

Compiler phases

Character stream
→ Lexical analysis → token stream
→ Syntax analysis → syntax tree
→ Semantic analysis → syntax tree
→ Intermediate code generation → intermediate code representation
→ Machine-independent code optimization → intermediate code representation
→ Code generator → target machine code
→ Machine-dependent code optimization → target machine code

The symbol table is shared by all phases.

Page 19: Lecture 1  introduction to language processors

Compiler phases

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Code optimization

5. Code generation

Output: target code

Image source: http://www.tutorialspoint.com/compiler_design/images/compiler_phases.jpg

Page 20: Lecture 1  introduction to language processors

Lexical analysis (Scanning)

• How do we understand English?

• Break up the sentence and recognize the words:

Iamasmartstudent. → I am a smart student.

[Figure labels: Separator, Noun]

Page 21: Lecture 1  introduction to language processors

Lexical analysis (Scanning)

Lexical analysis is the process of converting a sequence of characters (such as a computer program or web page) into a sequence of tokens (strings with an identified "meaning").
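A minimal scanner can be sketched with regular expressions. The token names (`NUMBER`, `ID`, `OP`) and the sample statement are assumptions chosen for this illustration, not part of the lecture; a production scanner would also record line and column positions.

```python
import re

# Each token class is described by a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),          # whitespace separates tokens
    ("ERROR",  r"."),            # anything else is a lexical error
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Convert a character sequence into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


tokens = tokenize("position = initial + rate * 60")
print(tokens)
```

Each pair in the result couples a lexeme with its identified category, which is the "meaning" the definition above refers to.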

Page 22: Lecture 1  introduction to language processors

Syntax analysis

• How do we understand English?

The smart students never ever give up.

[Figure labels: S, Adv, V; root: Sentence]

Page 23: Lecture 1  introduction to language processors

Syntax analysis (Parsing)

A parser reads a stream of tokens from the scanner and determines whether the syntax of the program is correct according to the context-free grammar (CFG) of the source language.

The tokens are then grouped into grammatical phrases represented by a parse tree or an abstract syntax tree, which gives a hierarchical structure to the source program.
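As a rough sketch, a recursive-descent parser for a small expression grammar could look like the following. The grammar, the nested-tuple tree format and the function names are assumptions for this example; tokens are given as plain strings to keep the block self-contained.

```python
# Assumed grammar:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NAME | NUMBER | '(' expr ')'

def parse(tokens):
    """Parse a list of token strings into an AST of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isalnum():
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["initial", "+", "rate", "*", "60"])
print(tree)  # ('+', 'initial', ('*', 'rate', '60'))
```

The multiplication ends up deeper in the tree than the addition, so the hierarchy encodes operator precedence. Input that does not fit the grammar is rejected with a `SyntaxError`.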

Page 24: Lecture 1  introduction to language processors

Syntax analysis example

Page 25: Lecture 1  introduction to language processors

Semantic analysis

For example: "A woman without her man is nothing." vs. "A woman: without her, man is nothing."

Page 26: Lecture 1  introduction to language processors

Semantic analysis

The semantic analysis phase checks the meaning of the source program for semantic errors (e.g. through type checking) and gathers type information for the subsequent phases.

Semantic analysis is the heart of the compiler, and type checking is an important part of this phase.

It checks language requirements such as proper declarations.

Semantic analysis catches inconsistencies, for instance mismatched data types.
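A small type checker illustrates both checks. This is only a sketch under assumed conventions: the tree is made of nested tuples, `symbols` maps declared names to `"int"` or `"float"`, and a mixed int/float operation is repaired by inserting an `int_to_float` conversion node rather than being rejected.

```python
def analyze(node, symbols):
    """Return (annotated_node, type); raise NameError for undeclared names."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        if node not in symbols:
            raise NameError(f"undeclared identifier {node!r}")
        return node, symbols[node]
    op, left, right = node
    left, left_type = analyze(left, symbols)
    right, right_type = analyze(right, symbols)
    if left_type != right_type:
        # One side is int and the other float: widen the int operand.
        if left_type == "int":
            left = ("int_to_float", left)
        else:
            right = ("int_to_float", right)
        return (op, left, right), "float"
    return (op, left, right), left_type


symbols = {"initial": "float", "rate": "float"}
typed, result_type = analyze(("+", "initial", ("*", "rate", 60)), symbols)
print(typed)
print(result_type)
```

The integer constant 60 is wrapped in a conversion because `rate` is declared as a float. A name missing from `symbols` raises an error, which corresponds to the "proper declarations" requirement.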

Page 27: Lecture 1  introduction to language processors

Intermediate code generation

After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate code.

In some compilers, the source program is first translated into intermediate code and then into the target language. In other compilers, it is translated directly into the target language.

One popular form of intermediate code is three-address code. Example:

temp1 = int_to_float(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
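Code of this shape can be produced by a post-order walk over the syntax tree, creating one temporary per operation. The sketch below assumes a nested-tuple tree and a hypothetical helper `to_three_address`; it is one simple way to do it, not the only one.

```python
def to_three_address(target, expr):
    """Flatten an expression tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        if not isinstance(node, tuple):
            return str(node)                      # identifier or constant
        if node[0] == "int_to_float":
            arg = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = int_to_float({arg})")
            return temp
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} = {result}")
    return code


tree = ("+", "id2", ("*", "id3", ("int_to_float", 60)))
lines = to_three_address("id1", tree)
print("\n".join(lines))
```

For this tree the output is the same four instructions as in the example above. Each instruction has at most one operator on its right-hand side, which is what makes later optimization and code generation straightforward.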

Page 28: Lecture 1  introduction to language processors

Optimization

This phase attempts to improve the intermediate code so that better target code results, typically code that runs faster or takes less space.

The optimized code MUST be CORRECT

Run Faster (time)

Minimize power consumption (Mobile devices)

Use less memory

Shorter is better

Consider network, database access.
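Two simple improvements can be sketched on three-address code: evaluating a conversion of a constant at compile time, and removing a copy that immediately follows the computation of a temporary. The tuple format `(dest, op, arg1, arg2)` and the `copy` operation name are assumptions for this example, and the copy removal is only safe under the stated assumption that each temporary is used exactly once.

```python
def optimize(code):
    """Fold int_to_float on constants and remove a trailing copy."""
    constants = {}      # temporaries known to hold a constant value
    result = []
    for dest, op, a, b in code:
        # Substitute operands whose constant value is already known.
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "int_to_float" and isinstance(a, int):
            constants[dest] = float(a)   # done at compile time, not run time
            continue
        if op == "copy" and result and result[-1][0] == a:
            # "dest = temp" right after "temp = ...": write into dest directly.
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
            continue
        result.append((dest, op, a, b))
    return result


code = [
    ("temp1", "int_to_float", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
optimized = optimize(code)
for instruction in optimized:
    print(instruction)
```

Four instructions become two, and the program still assigns `id2 + id3 * 60.0` to `id1`, so the correctness requirement is preserved.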

Page 29: Lecture 1  introduction to language processors

Optimize this code

Page 30: Lecture 1  introduction to language processors

Code Generation

The code generator takes as input an intermediate code representation of the source program and maps it into the target language.

If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
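A very naive code generator might look like the sketch below. The instruction mnemonics (`LD`, `ST`, `ADD`, `MUL`), the register names and the `(dest, op, arg1, arg2)` input format are all invented for this illustration and do not correspond to a real machine. It assumes an unlimited supply of registers and that names starting with `temp` are compiler temporaries, each used once.

```python
OPCODES = {"+": "ADD", "*": "MUL"}


def generate(code):
    """Translate (dest, op, a, b) tuples into pseudo-assembly lines."""
    asm = []
    location = {}       # temporary name -> register currently holding it
    next_reg = 1

    def new_register():
        nonlocal next_reg
        reg = f"R{next_reg}"
        next_reg += 1
        return reg

    def operand(x):
        """Return an immediate or a register for x, loading variables."""
        if isinstance(x, (int, float)):
            return f"#{x}"
        if x in location:
            return location[x]
        reg = new_register()
        asm.append(f"LD {reg}, {x}")
        return reg

    for dest, op, a, b in code:
        left, right = operand(a), operand(b)
        # Reuse the left operand's register for the result when possible.
        reg = left if left.startswith("R") else new_register()
        asm.append(f"{OPCODES[op]} {reg}, {left}, {right}")
        if dest.startswith("temp"):
            location[dest] = reg          # temporaries stay in registers
        else:
            asm.append(f"ST {dest}, {reg}")   # variables live in memory
    return asm


asm = generate([
    ("temp2", "*", "id3", 60.0),
    ("id1", "+", "id2", "temp2"),
])
print("\n".join(asm))
```

The temporary `temp2` never touches memory, while `id1`, `id2` and `id3` are loaded from and stored to memory locations. Real code generators must additionally cope with a limited number of registers.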

Page 31: Lecture 1  introduction to language processors

Symbol table

• A symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.

• The symbol table should allow the compiler to find the record for each name quickly, and to store and retrieve data from that record quickly.

• Attributes may provide information about storage allocation, type, scope and, for procedure names, the number and types of arguments, the method of passing each argument, and the type returned.
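One common way to meet these requirements is a hash table per scope, chained to the enclosing scope. The class below is a minimal sketch under that assumption; the class name, method names and attribute keys are chosen for this example only.

```python
class SymbolTable:
    """One scope of names; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.records = {}       # dict gives fast average-case lookup

    def declare(self, name, **attributes):
        """Add a record for name with its attributes in this scope."""
        if name in self.records:
            raise NameError(f"{name!r} is already declared in this scope")
        self.records[name] = attributes

    def lookup(self, name):
        """Find the record for name, searching outward through scopes."""
        scope = self
        while scope is not None:
            if name in scope.records:
                return scope.records[name]
            scope = scope.parent
        raise NameError(f"{name!r} is not declared")


global_scope = SymbolTable()
global_scope.declare("rate", type="float", storage="static")
global_scope.declare("area", type="function", params=["float", "float"],
                     returns="float")

local_scope = SymbolTable(parent=global_scope)
local_scope.declare("i", type="int", storage="stack")

print(local_scope.lookup("rate")["type"])     # float
print(local_scope.lookup("area")["returns"])  # float
```

The inner scope sees names declared in the outer one, but not the other way round, and a record can carry different attributes depending on whether the name denotes a variable or a procedure.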

Page 32: Lecture 1  introduction to language processors

All phases in one example

Page 33: Lecture 1  introduction to language processors