Lecture 1: Introduction to language processors
Lectured by Rebaz Najeeb
Compilers (CPL5316)
Software Engineering, Koya University, 2015-2016
E-mail: rebaz.najeeb at koyauniversity dot org
Outline
• Overview
• Introduction to compilers
• The architecture of a compiler
• Lexical analysis (scanning)
• Syntax analysis (parsing)
• Semantic analysis
• Intermediate code generation
• Code optimization
• Code generation
What to know?
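Essential knowledge of computation theory: regular expressions (RE), finite-state automata (FSA), and context-free grammars (CFG).
The essential skills of a Java developer.
The Dragon Book (Aho, Lam, Sethi, and Ullman, "Compilers: Principles, Techniques, and Tools").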
Why compilers?
• Increase productivity
• Low-level programming languages are harder to write, less portable, more error-prone, and harder to maintain.
• Hardware synthesis:
Hardware description languages such as Verilog and VHDL describe hardware at a high level.
Register Transfer Language (RTL) → gates → transistors → physical layout.
• Reverse engineering, e.g. of executable applications (.exe):
executable binary (0100101100) → assembly code → high-level language.
Understanding binary language is hard
• 01000101 01110110 01100101 01110010 01111001 01101111 01101110 01100101 00100000 01100011 01100001 01101110 00100000 01110000 01100001 01110011 01110011 00101100 00100000 01110101 01101110 01101100 01100101 01110011 01110011 00100000 01010011 00101111 01101000 01100101 00100000 01100100 01101111 01100101 01110011 01101110 00100111 01110100 00100000 01110111 01100001 01101110 01110100 00100000 01110100 01101111 00101110
• What does that mean?
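The bit groups above appear to be 8-bit ASCII codes, one character per group. A minimal Python sketch, assuming that encoding, shows how a program can decode what a person struggles to read. The function name decode_binary_ascii is made up for this illustration, and only the first eight groups of the slide are decoded here.

```python
def decode_binary_ascii(bits: str) -> str:
    """Decode space-separated 8-bit groups into ASCII text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


# The first eight bit groups of the message on the slide.
first_word = decode_binary_ascii(
    "01000101 01110110 01100101 01110010 "
    "01111001 01101111 01101110 01100101"
)
print(first_word)  # prints "Everyone"
```

Passing the whole bit string from the slide to the same function decodes the full sentence.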
Programming language classification (levels)
• Low-level languages: assembly language, machine language
• High-level languages: C, C++, Java, Pascal, Prolog, Scheme
• Natural languages: English, Kurdish
* There is also a classification by programming language generations.
Who is that girl? Tell me what is her name?
History of compilers
• Grace Hopper, 1952: A-0 System.
• Alick Glennie, 1952: Autocode.
• John W. Backus, 1953: Speedcoding.
- More productive, but 10-20 times slower in execution.
- Took 300 bytes of memory (30% of the available memory).
• Fortran, the first complete compiler, 1954-1957 (18 person-years of effort).
50% of code was written in Fortran.
• 1960s: COBOL, Lisp
• 1970s: Pascal, C
• 1980s: OOP, Ada, Smalltalk, C++
• 1990s: Java, script, Perl
• 2000s: language specifications
Compilers
• A compiler translates code written in one language (a high-level language, HLL) into another language (a low-level language, LLL) without changing the meaning of the program.
• A compiler is also expected to make the target code efficient and optimized in terms of time and space.
• Compiler design covers the basic translation mechanism as well as error detection and recovery.
• Examples of compiled languages: Fortran, Ada, C, C++, C#, COBOL.
[Figure: Source program → Compiler → Executable program, with errors/warnings reported by the compiler; then Input (data) → Executable program → Output (result)]
Interpreters
• An interpreter is a type of language processor that directly executes the operations specified in the source program on inputs supplied by the user.
• Examples of interpreted languages: Python, Perl, BASIC, Ruby, AWK.
[Figure: Source program + Input (data) → Interpreter → Output (result), with errors/warnings reported by the interpreter]
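To make "directly executes" concrete, here is a small Python sketch of an interpreter for arithmetic expressions. It assumes the source program has already been turned into a nested tuple (operator, left, right). That representation and the name interpret are invented for this illustration.

```python
import operator

# Operators supported by this toy interpreter.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def interpret(tree, inputs):
    """Execute an expression tree on the user's inputs; no target program is produced."""
    if isinstance(tree, tuple):              # (operator, left, right)
        op, left, right = tree
        return OPS[op](interpret(left, inputs), interpret(right, inputs))
    if isinstance(tree, str):                # a variable: read its value from the inputs
        return inputs[tree]
    return tree                              # a numeric literal


# initial + rate * 60, evaluated with initial = 10 and rate = 2
print(interpret(("+", "initial", ("*", "rate", 60)), {"initial": 10, "rate": 2}))  # prints 130
```

The result is computed while the program is being walked, which is the key difference from a compiler, whose output is another program.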
Compilers vs interpreters
No. | Compiler | Interpreter
1 | Takes the entire program as input | Takes a single instruction as input
2 | Intermediate object code is generated | No intermediate object code is generated
3 | Conditional control statements execute faster | Conditional control statements execute slower
4 | Memory requirement is higher (since object code is generated) | Memory requirement is lower
5 | The program need not be compiled every time | The high-level program is converted into a lower-level program every time
6 | Errors are displayed after the entire program is checked | Errors (if any) are displayed for every instruction interpreted
7 | Example: C compiler | Example: BASIC
Source: http://www.c4learn.com/c-programming/compiler-vs-interpreter/
Java Virtual Machine
• Why is Java machine independent?
• Tools to view and edit bytecode:
- ASM (http://asm.ow2.org)
- Jasmin (http://jasmin.sourceforge.net)
• Demo from the command line (CMD)
Java compilers
List of compilers: https://en.wikipedia.org/wiki/List_of_compilers
Language processing phases
Source program → Preprocessor → modified source program → Compiler → target assembly language → Assembler → relocatable machine code → Linker/Loader (together with library files and relocatable object files) → target machine code → Memory
Language processing phases (continued)
In the same pipeline, the Compiler step consists of two parts: Analysis and Synthesis.
Analysis and Synthesis
Compiler
• Analysis (front-end): lexical analysis, syntax analysis, semantic analysis, intermediate code generation
• Synthesis (back-end): code optimization, code generation
Analysis vs synthesis
Analysis part
Breaks up the source program into its constituent pieces and creates an intermediate representation of the source program.
It is often called the front end of the compiler.
The analysis part can be divided into the following phases:
Lexical analysis, syntax analysis, semantic analysis, and (optionally) intermediate code generation.
Synthesis part
Constructs the desired target program from the intermediate representation and the information in the symbol table. It is often called the back end.
The synthesis part can be divided into the following phases:
Intermediate code generator, code optimizer, code generator.
Compiler phases
Character stream → Lexical analysis → token stream → Syntax analysis → syntax tree → Semantic analysis → syntax tree → Intermediate code generation → intermediate code representation → Machine-independent code optimization → intermediate code representation → Code generator → target machine code → Machine-dependent code optimization → target machine code
The symbol table is shared by all phases.
Compiler phases
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Code optimization
5. Code generation
Output: target code
Source: http://www.tutorialspoint.com/compiler_design/images/compiler_phases.jpg
Lexical analysis (Scanning)
• How do we understand English?
• Break up the sentence and recognize the words:
Iamasmartstudent. → I am a smart student.
[Figure labels: Separator, Noun]
Lexical analysis (Scanning)
Lexical analysis is the process of converting a sequence of characters (such as
a computer program or web page) into a sequence of tokens (strings with an
identified "meaning").
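A scanner along these lines can be sketched in Python with regular expressions. The token classes (NUMBER, ID, OP) and the name tokenize are assumptions made for this example, not part of the lecture. The input is a small assignment statement.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character stream into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":        # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("position = initial + rate * 60"))
```

Each token pairs a class with the matched text (the lexeme), for example ('ID', 'position') or ('NUMBER', '60').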
Syntax analysis
• How do we understand English?
The smart students never ever give up.
[Figure: parse of the sentence, with the labels S, Adv, and V under the root "Sentence"]
Syntax analysis (Parsing)
A parser reads a stream of tokens from the scanner and determines whether the syntax of the
program is correct according to the context-free grammar (CFG) of the source language.
The tokens are then grouped into grammatical phrases represented by a parse tree or an
abstract syntax tree, which gives a hierarchical structure to the source program.
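As an illustration, the following Python sketch is a recursive-descent parser for arithmetic expressions. The grammar, the nested-tuple tree format, and the name parse are assumptions chosen for this example. Recursive descent is only one of several parsing techniques.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    # A deliberately simple scanner: characters that match nothing are skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+\-*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():            # expr -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():            # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():          # factor -> NUMBER | ID | '(' expr ')'
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("initial + rate * 60"))  # prints ('+', 'initial', ('*', 'rate', '60'))
```

The tree places the multiplication below the addition, so the grammar itself encodes operator precedence.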
Syntax analysis example
Semantic analysis
For example: "A woman without her man is nothing." vs. "A woman: without her, man is nothing."
Semantic analysis
The semantic analysis phase checks the meaning of the source program for semantic errors (type checking) and gathers type information for the subsequent phases.
Semantic analysis is the heart of the compiler, and type checking is an important part of this phase.
It checks language requirements such as proper declarations.
It catches inconsistencies, for instance mismatched data types.
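The two checks mentioned here, declarations and type consistency, can be sketched in Python as follows. The tree format, the type names, the SemanticError class, and the rule that mixed int/float arithmetic yields float are all assumptions made for this example. Real languages define their own rules.

```python
class SemanticError(Exception):
    """Raised when a program is syntactically valid but breaks a language rule."""


def check_types(tree, symbols):
    """Return the type of an expression tree, or raise SemanticError.

    tree    -- a variable name, a numeric literal, or a tuple (op, left, right)
    symbols -- hypothetical symbol table mapping declared names to a type name
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        left_type = check_types(left, symbols)
        right_type = check_types(right, symbols)
        for operand_type in (left_type, right_type):
            if operand_type not in ("int", "float"):
                raise SemanticError(f"operator {op!r} cannot be applied to {operand_type}")
        # Assumed rule: mixed int/float arithmetic is widened to float.
        return "float" if "float" in (left_type, right_type) else "int"
    if isinstance(tree, int):
        return "int"
    if isinstance(tree, float):
        return "float"
    if tree not in symbols:
        raise SemanticError(f"undeclared variable {tree!r}")
    return symbols[tree]


symbols = {"initial": "float", "rate": "float", "name": "string"}
print(check_types(("+", "initial", ("*", "rate", 60)), symbols))  # prints float
```

With the same table, the tree ("+", "name", 1) is rejected because a string operand does not fit an arithmetic operator, and ("+", "x", 1) is rejected because x was never declared.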
Intermediate code generation
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate code.
In some compilers, the source program is first translated into intermediate code and then into the target language. In other compilers, it is translated directly into the target language.
One popular form of intermediate code is three-address code. Example:
temp1 = int_to_float(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
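Three-address code like the example above can be produced by walking the syntax tree bottom-up and giving every intermediate result a fresh temporary. The Python sketch below assumes a nested-tuple tree (operator, left, right). The function name to_three_address is made up, and the int_to_float conversion step is left out to keep the sketch short.

```python
import itertools


def to_three_address(target, tree):
    """Flatten an expression tree into three-address instructions for `target = tree`."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        if not isinstance(node, tuple):      # leaf: a name or a literal
            return str(node)
        op, left, right = node
        left_place = emit(left)
        right_place = emit(right)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    code.append(f"{target} = {emit(tree)}")
    return code


# id1 = id2 + id3 * 60
for line in to_three_address("id1", ("+", "id2", ("*", "id3", 60))):
    print(line)
# prints:
# temp1 = id3 * 60
# temp2 = id2 + temp1
# id1 = temp2
```

Each emitted instruction has at most one operator on its right-hand side, which is what makes the form easy to optimize and to map onto machine instructions.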
Optimization
This phase attempts to improve the intermediate code so that the resulting machine code is better in terms of time and space.
• The optimized code MUST be CORRECT
• Run faster (time)
• Minimize power consumption (mobile devices)
• Use less memory
• Shorter is better
• Consider network and database access
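One common machine-independent optimization is constant folding: operations whose operands are known at compile time are computed by the compiler instead of at run time. The Python sketch below applies it to straight-line code given as (dest, op, arg1, arg2) tuples. The tuple format and the name fold_constants are invented for this example. The sketch also assumes every folded name is a temporary that is not read outside this code.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Constant folding and propagation over (dest, op, arg1, arg2) instructions."""
    known = {}        # names whose value is a compile-time constant
    optimized = []
    for dest, op, arg1, arg2 in code:
        arg1 = known.get(arg1, arg1)      # replace known names by their constant value
        arg2 = known.get(arg2, arg2)
        if isinstance(arg1, (int, float)) and isinstance(arg2, (int, float)):
            known[dest] = OPS[op](arg1, arg2)     # computed now, no instruction emitted
        else:
            known.pop(dest, None)                 # dest is no longer a known constant
            optimized.append((dest, op, arg1, arg2))
    return optimized


code = [
    ("t1", "*", 60, 60),
    ("t2", "*", "t1", 24),
    ("secs", "*", "days", "t2"),
]
print(fold_constants(code))  # prints [('secs', '*', 'days', 86400)]
```

Three instructions shrink to one, and the result is unchanged, which is the correctness requirement stated in the list above.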
Optimize this code
Code Generation
The code generator takes as input an intermediate code representation of
the source program and maps it into the target language.
If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
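A deliberately naive code generator can be sketched in Python as follows. The instruction set (LD, ST, ADD, SUB, MUL, DIV with registers R1 and R2) is hypothetical and does not belong to any real machine. The sketch assumes every variable lives in memory and is loaded into a register before each operation. A real code generator would allocate registers far more carefully.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def operand(value):
    """Literals become immediates (#60); names refer to memory locations."""
    return f"#{value}" if isinstance(value, (int, float)) else value


def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly-like lines."""
    lines = []
    for dest, op, arg1, arg2 in code:
        lines.append(f"LD  R1, {operand(arg1)}")      # load the first operand
        lines.append(f"LD  R2, {operand(arg2)}")      # load the second operand
        lines.append(f"{MNEMONIC[op]} R1, R1, R2")    # R1 = R1 op R2
        lines.append(f"ST  {dest}, R1")               # store the result to memory
    return lines


for line in generate([("temp1", "*", "id3", 60), ("id1", "+", "id2", "temp1")]):
    print(line)
```

The output stores temp1 to memory and immediately reloads it, which is the kind of waste that register allocation removes.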
Symbol table
• A symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
• The symbol table should allow the compiler to find the record for each name quickly, and to store and retrieve data from that record quickly.
• Attributes may provide information about storage allocation, type, scope, and, for procedure names, the number and types of arguments, the method of passing each argument, and the type returned.
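A symbol table with these properties can be sketched in Python as a stack of dictionaries, one per scope, since dictionary lookup is fast on average. The class and method names below are invented for this example, and the attribute fields are whatever the caller chooses to store.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attribute record."""

    def __init__(self):
        self.scopes = [{}]                 # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Add a record for name to the current (innermost) scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        """Return the record for name, searching the innermost scope first."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", type="float", offset=0)
table.enter_scope()
table.declare("rate", type="int", offset=8)      # shadows the outer declaration
print(table.lookup("rate")["type"])               # prints int
table.exit_scope()
print(table.lookup("rate")["type"])               # prints float
```

Searching from the innermost scope outward is what lets an inner declaration shadow an outer one with the same name.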
All phases in one example