Compilers and Language Translation
Gordon College
What’s a compiler?
All computers understand only machine language
Therefore, high-level language instructions must be translated into machine language prior to execution
This is a program ---> 10000010010110100100101……
What’s a compiler?
Compiler
A piece of system software that translates high-level languages into machine language
Source file program.c:
while (c != 'x') {
    if (c == 'a' || c == 'e' || c == 'i')
        printf("Congrats!");
    else if (c != 'x')
        printf("You Loser!");
}
Compiler: gcc -o prog program.c
program.c ---> Compiler ---> prog (machine language: 10000010010110100100101……)
Sample output of prog: Congrats!
Assembler (a kind of compiler)
One-to-one translation
Assembly: LOAD X
Machine language: 0101 0000 0000 1001
The opcode bits (0101) come from the opcode table; the address bits (0000 0000 1001) come from the symbol table
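The one-to-one translation can be sketched in a few lines of C. This is only an illustration, assuming a 16-bit word made of a 4-bit opcode followed by a 12-bit address, as in the LOAD X example. The opcode values are the ones that appear in these slides; the function name `assemble` and the table layout are hypothetical, not taken from any real assembler.

```c
#include <string.h>

/* Opcode table: mnemonic -> 4-bit opcode (values as used in the slides). */
struct opcode_entry { const char *mnemonic; unsigned opcode; };
static const struct opcode_entry opcode_table[] = {
    {"STORE", 0x4}, {"LOAD", 0x5}, {"SUBTRACT", 0x6}, {"ADD", 0x7},
};

/* Symbol table: variable name -> 12-bit address. X at address 9 matches the
   LOAD X example; a real assembler would build this table from the program. */
struct symbol_entry { const char *name; unsigned address; };
static const struct symbol_entry symbol_table[] = { {"X", 9} };

/* Translate one instruction such as "LOAD X" into a 16-bit word.
   Returns -1 if the mnemonic or the symbol is unknown. */
long assemble(const char *mnemonic, const char *symbol)
{
    size_t i, j;
    for (i = 0; i < sizeof opcode_table / sizeof opcode_table[0]; i++) {
        if (strcmp(opcode_table[i].mnemonic, mnemonic) != 0)
            continue;
        for (j = 0; j < sizeof symbol_table / sizeof symbol_table[0]; j++) {
            if (strcmp(symbol_table[j].name, symbol) == 0)
                return (long)((opcode_table[i].opcode << 12) |
                              symbol_table[j].address);
        }
    }
    return -1;
}
```

Calling `assemble("LOAD", "X")` returns 0x5009, which is 0101 0000 0000 1001 in binary: one assembly instruction in, one machine word out.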
Compiler (high-level language translator)
One-to-many translation
a = b + c - d;
0101 00001110001    LOAD B
0111 00001110010    ADD C
0110 00001110011    SUBTRACT D
0100 00001110100    STORE A
Machine language: 0101 00001110001 0111 00001110010…….
Goals of a compiler
Code produced must be correct
A = (B+C)-(D+E);
Possible translation:
LOAD B
ADD C
STORE B
LOAD D
ADD E
STORE D
LOAD B
SUBTRACT D
STORE A
Is this correct?
No - STORE B and STORE D change the values of variables B and D, which the high-level statement does not intend
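The problem with that translation can be made concrete in C. The sketch below mirrors the instruction sequence step by step; the function names and the temporaries `temp1` and `temp2` are illustrative assumptions, showing one possible fix rather than what any particular compiler emits.

```c
/* Mirrors the incorrect translation: the intermediate sums overwrite B and D. */
double translate_wrong(double *b, double c, double *d, double e)
{
    *b = *b + c;      /* LOAD B, ADD C, STORE B  (clobbers B) */
    *d = *d + e;      /* LOAD D, ADD E, STORE D  (clobbers D) */
    return *b - *d;   /* LOAD B, SUBTRACT D, STORE A */
}

/* One possible fix: keep the intermediate sums in compiler-generated
   temporaries so that B and D are left untouched. */
double translate_correct(const double *b, double c, const double *d, double e)
{
    double temp1 = *b + c;   /* LOAD B, ADD C, STORE TEMP1 */
    double temp2 = *d + e;   /* LOAD D, ADD E, STORE TEMP2 */
    return temp1 - temp2;    /* LOAD TEMP1, SUBTRACT TEMP2, STORE A */
}
```

Both functions compute the same value for A, but only the first one changes B and D as a side effect, which is exactly the error described above.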
Goals of a compiler
Code produced should be reasonably efficient and concise
Compute the sum 2*x1 + 2*x2 + 2*x3 + 2*x4 + … + 2*x50000
Straightforward code:
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + (2.0 * x[i]);
}
Optimizing compiler:
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + x[i];
}
sum = sum * 2.0;
49,999 fewer multiplications
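The two versions can be compared side by side as C functions. This is a sketch with hypothetical function names; it takes the array length as a parameter instead of hard-coding 50000.

```c
#include <stddef.h>

/* Straightforward version: one multiplication per element. */
double sum_doubled_naive(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + (2.0 * x[i]);
    return sum;
}

/* Optimized version: the constant factor is moved out of the loop,
   leaving a single multiplication at the end. */
double sum_doubled_factored(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + x[i];
    return sum * 2.0;
}
```

For n elements the first version performs n multiplications and the second performs one. With factors other than a power of two, floating-point rounding can make the two results differ slightly, which is one reason a compiler may need permission to apply this kind of rewrite.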
General Structure of a Compiler
The Compilation Process
Phase I: Lexical analysis
Compiler examines the individual characters in the source program and groups them into syntactical units called tokens
Source code ---> Scanner ---> Groups of tokens
Phase II: Parsing
The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct
Groups of tokens ---> Parser ---> correct / not correct
The Compilation Process
Phase III: Semantic analysis and code generation
The compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions
Groups of tokens ---> Code Generator ---> Machine language
The Compilation Process
Phase IV: Code optimization
The compiler takes the generated code and sees whether it can be made more efficient
Machine language ---> Code Optimizer ---> Machine language
Overall Execution Sequence on a High-Level Language Program
The Compilation Process
Source program
Original high-level language program
Object program
Machine language translation of the source program
Phase I: Lexical Analysis
Lexical analyzer
The program that performs lexical analysis
More commonly called a scanner
Job of lexical analyzer
Group input characters into tokens
• Tokens: Syntactical units that are treated as single, indivisible entities for the purposes of translation
Classify tokens according to their type
Phase I: Lexical Analysis
Program statement:
sum = sum + a[i];
Digital perspective:
tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;
Tokenized:
sum,=,sum,+,a,[,i,],;
Phase I: Lexical Analysis
Typical Token Classifications
TOKEN TYPE    CLASSIFICATION NUMBER
Symbol        1
Number        2
=             3
+             4
-             5
;             6
==            7
if            8
else          9
(             10
)             11
[             12
]             13
…
Phase I: Lexical Analysis
Lexical Analysis Process
1. Discard blanks, tabs, etc. - look for beginning of token
2. Put characters together
3. Repeat step 2 until end of token
4. Classify and save token
5. Repeat steps 1-4 until end of statement
6. Repeat steps 1-5 until end of source code
Scanner input: sum=sum+a[i];
Scanner output (token, classification number):
sum 1
= 3
sum 1
+ 4
a 1
[ 12
i 1
] 13
; 6
Phase I: Lexical Analysis
Input to a scanner
- A high-level language statement from the source program
Scanner’s output
- A list of all the tokens in that statement
- The classification number of each token found
Scanner input: sum=sum+a[i];
Scanner output: sum 1, = 3, sum 1, + 4, a 1, [ 12, i 1, ] 13, ; 6
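The scanning steps and classification numbers from these slides can be sketched as a small C scanner. This is a minimal illustration, not a production lexer: the names `struct token`, `classify`, `scan`, and `MAX_TOKEN_LEN` are assumptions made for this example, and anything outside the slide's table is given the made-up class 0.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TOKEN_LEN 32

struct token {
    char text[MAX_TOKEN_LEN];
    int  class_number;   /* classification number from the slides' table */
};

/* Classify a completed token using the slides' classification numbers. */
static int classify(const char *text)
{
    static const struct { const char *text; int number; } fixed[] = {
        {"=", 3}, {"+", 4}, {"-", 5}, {";", 6}, {"==", 7}, {"if", 8},
        {"else", 9}, {"(", 10}, {")", 11}, {"[", 12}, {"]", 13},
    };
    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (strcmp(text, fixed[i].text) == 0)
            return fixed[i].number;
    if (isdigit((unsigned char)text[0])) return 2;  /* Number */
    if (isalpha((unsigned char)text[0])) return 1;  /* Symbol */
    return 0;                                       /* not in the table */
}

/* Scan one statement; store up to max tokens in out and return the count. */
size_t scan(const char *src, struct token *out, size_t max)
{
    size_t count = 0;
    while (*src != '\0' && count < max) {
        /* Step 1: discard blanks, tabs, etc. */
        if (isspace((unsigned char)*src)) { src++; continue; }

        /* Steps 2-3: put characters together until the end of the token. */
        size_t len = 1;
        if (isalpha((unsigned char)*src)) {
            while (isalnum((unsigned char)src[len])) len++;
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)src[len])) len++;
        } else if (src[0] == '=' && src[1] == '=') {
            len = 2;
        }
        /* Sketch limitation: an overly long token is split into pieces. */
        if (len >= MAX_TOKEN_LEN) len = MAX_TOKEN_LEN - 1;

        /* Step 4: classify and save the token. */
        memcpy(out[count].text, src, len);
        out[count].text[len] = '\0';
        out[count].class_number = classify(out[count].text);
        count++;
        src += len;
    }
    return count;   /* Step 5: the loop stops at the end of the statement. */
}
```

Scanning the statement `sum = sum + a[i];` with this sketch yields nine tokens whose classification numbers are 1, 3, 1, 4, 1, 12, 1, 13, 6. Step 6 of the process would simply call `scan` once per statement.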
Phase II: Parsing
Parsing phase
A compiler determines whether the tokens recognized by the scanner are a syntactically legal statement
Performed by a parser
Phase II: Parsing
Output of a parser
A parse tree, if such a tree exists
An error message, if a parse tree cannot be constructed
Successful construction of a parse tree is proof that the statement is correctly formed
Example
High-level language statement: a = b + c
Grammars, Languages, and BNF
Syntax
The grammatical structure of the language
The parser must be given the syntax of the language
BNF (Backus-Naur Form)
Most widely used notation for representing the syntax of a programming language
literal_expression ::= integer_literal | float_literal | string | character
Grammars, Languages, and BNF
In BNF
The syntax of a language is specified as a set of rules (also called productions)
A grammar
• The entire collection of rules for a language
Structure of an individual BNF rule
left-hand side ::= “definition”
Grammars, Languages, and BNF
BNF rules use two types of objects on the right-hand side of a production
Terminals
• The actual tokens of the language
• Never appear on the left-hand side of a BNF rule
Nonterminals
• Intermediate grammatical categories used to help explain and organize the language
• Must appear on the left-hand side of one or more rules
Grammars, Languages, and BNF
Goal symbol
The highest-level nonterminal
The nonterminal object that the parser is trying to produce as it builds the parse tree
All nonterminals are written inside angle brackets
Java BNF
BNF Example
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
Identify the following: goal symbol, terminals, nonterminals, an individual rule
Is this a legal postal address?
Steve Moses Sr.
215 Rose Ave.
Everywhere, NC 43563
Parsing Concepts and Techniques
Fundamental rule of parsing:
By repeated applications of the rules of the grammar -
if the parser can convert the sequence of input tokens into the goal symbol
    the sequence of tokens is a syntactically valid statement of the language
else
    the sequence of tokens is not a syntactically valid statement of the language
Parsing Example
<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]
<hostport> ::= <host> [ : <port> ]
<host> ::= <hostname> | <hostnumber>
<hostname> ::= <ialpha> [ . <hostname> ]
<hostnumber> ::= <digits> . <digits> . <digits> . <digits>
<port> ::= <digits>
<path> ::= <void> | <xpalphas> [ / <path> ]
<search> ::= <xalphas> [ + <search> ]
<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape>
<xalphas> ::= <xalpha> [ <xalphas> ]
<xpalpha> ::= <xalpha> | +
<xpalphas> ::= <xpalpha> [ <xpalphas> ]
<ialpha> ::= <alpha> [ <xalphas> ]
<alpha> ::= a | b | … | z | A | B | … | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<safe> ::= $ | - | _ | @ | . | & | ~
<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space>
<escape> ::= % <hex> <hex>
<hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F
<digits> ::= <digit> [ <digits> ]
<void> ::=
Is the following http address legal?
http://www.csm.astate.edu/~rossa/cs3543/bnf.html
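To show how grammar rules turn into parsing code, here is a small recognizer for a fragment of the grammar above: `<hostport>` with `<host>` restricted to `<hostnumber>`. It is a sketch using recursive descent, a top-down technique in which each rule becomes one function, rather than the bottom-up reduction toward the goal symbol described earlier. The function names are assumptions made for this example, and `<hostname>` is left out on purpose.

```c
#include <ctype.h>

/* <digits> ::= <digit> [ <digits> ] : consume one or more digits. */
static int parse_digits(const char **p)
{
    if (!isdigit((unsigned char)**p))
        return 0;
    while (isdigit((unsigned char)**p))
        (*p)++;
    return 1;
}

/* <hostnumber> ::= <digits> . <digits> . <digits> . <digits> */
static int parse_hostnumber(const char **p)
{
    for (int part = 0; part < 4; part++) {
        if (part > 0) {
            if (**p != '.')
                return 0;
            (*p)++;
        }
        if (!parse_digits(p))
            return 0;
    }
    return 1;
}

/* <hostport> ::= <host> [ : <port> ], with <host> limited to <hostnumber>.
   Returns 1 if the whole string matches, 0 otherwise. */
int is_valid_hostport(const char *s)
{
    const char *p = s;
    if (!parse_hostnumber(&p))
        return 0;
    if (*p == ':') {                 /* optional [ : <port> ] */
        p++;
        if (!parse_digits(&p))       /* <port> ::= <digits> */
            return 0;
    }
    return *p == '\0';               /* all input must be consumed */
}
```

With this sketch, `127.0.0.1` and `127.0.0.1:8080` are accepted, while `127.0.1` and `127.0.0.1:` are rejected because no sequence of rule applications covers them.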
Parsing Concepts and Techniques
Look-ahead parsing algorithms - intelligent parsers
One of the biggest problems in building a compiler is designing a grammar that:
Includes every valid statement that we want to be in the language
Excludes every invalid statement that we do not want to be in the language
Parsing Concepts and Techniques
Another problem in constructing a compiler: Designing a grammar that is not ambiguous
An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement
NOT GOOD - multiple interpretations
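A short C sketch shows why multiple parse trees are a problem. Assume a hypothetical ambiguous rule such as `<expression> ::= <expression> - <expression> | <variable>`, which is not taken from these slides. The statement `a - b - c` then has two parse trees, and each function below evaluates one of them.

```c
/* Parse tree 1: the left subtraction is grouped first, (a - b) - c. */
int eval_left_tree(int a, int b, int c)
{
    return (a - b) - c;
}

/* Parse tree 2: the right subtraction is grouped first, a - (b - c). */
int eval_right_tree(int a, int b, int c)
{
    return a - (b - c);
}
```

For a = 10, b = 4, c = 3 the first tree gives 3 and the second gives 9, so the same statement would have two different meanings depending on which tree the parser happened to build.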
Phase III: Semantics and Code Generation
Semantic analysis
The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid
• If they are valid
    the compiler can generate machine language instructions
  else
    there is a semantic error; machine language instructions are not generated
Phase III: Semantics and Code Generation
Semantic analysis
Syntactically correct, but semantically incorrect
example:
int a;
double sum;
char b;
sum = a + b;    (data type mismatch)
Semantic records
a      integer
sum    double
b      char
Phase III: Semantics and Code Generation
Semantic analysis
Parse tree: <expression> ---> <expression> + <expression>
Semantic records for the two operands: a integer, b char
Semantic record for the resulting <expression>: temp ?
Phase III: Semantics and Code Generation
Semantic analysis
Parse tree: <expression> ---> <expression> + <expression>
Semantic records for the two operands: a integer, b integer
Semantic record for the resulting <expression>: temp integer
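The type check shown on these two slides can be sketched with a small C structure for semantic records. This is an illustration under a stated assumption: both operands of + must have exactly the same type, as in the slides' example. Real C would instead convert the char operand to int and accept the statement. All names (`struct semantic_record`, `check_addition`, `TYPE_ERROR`) are hypothetical.

```c
enum data_type { TYPE_INTEGER, TYPE_DOUBLE, TYPE_CHAR, TYPE_ERROR };

/* A semantic record: what the compiler knows about a name or subexpression. */
struct semantic_record {
    const char    *name;
    enum data_type type;
};

/* Build the semantic record for <expression> + <expression>.
   Assumption: operand types must match exactly (no implicit conversion);
   a mismatch is reported by giving the result TYPE_ERROR. */
struct semantic_record check_addition(struct semantic_record left,
                                      struct semantic_record right)
{
    struct semantic_record temp = { "temp", TYPE_ERROR };
    if (left.type != TYPE_ERROR && left.type == right.type)
        temp.type = left.type;
    return temp;
}
```

With a as integer and b as char, the record for temp comes back as `TYPE_ERROR`, so no code would be generated. With both operands as integer, temp is integer and code generation can proceed.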
Phase III: Semantics and Code Generation
Code generation
Compiler makes a second pass over the parse tree to produce the translated code
Phase IV: Code Optimization
Two types of optimization
Local
Global
Local optimization
The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block
Relatively easy; included as part of most compilers
Phase IV: Code Optimization
Examples of possible local optimizations
Constant evaluation x = 1 + 1 ---> x = 2
Strength reduction x = x * 2 ---> x = x + x
Eliminating unnecessary operations
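Two of the local optimizations listed above can be sketched on a tiny expression tree in C. This is an illustration only: the node layout and the function name `optimize` are assumptions, integer overflow is ignored, and real compilers usually work on a lower-level intermediate form.

```c
#include <stddef.h>

enum node_kind { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    int            value;         /* used when kind == NODE_CONST */
    char           name;          /* used when kind == NODE_VAR   */
    struct node   *left, *right;  /* used by NODE_ADD and NODE_MUL */
};

/* Apply two local optimizations in place, bottom-up:
   constant evaluation (1 + 1 becomes 2) and
   strength reduction  (x * 2 becomes x + x). */
void optimize(struct node *n)
{
    if (n == NULL || n->kind == NODE_CONST || n->kind == NODE_VAR)
        return;
    optimize(n->left);
    optimize(n->right);

    if (n->left->kind == NODE_CONST && n->right->kind == NODE_CONST) {
        /* Constant evaluation: compute the result at compile time. */
        n->value = (n->kind == NODE_ADD) ? n->left->value + n->right->value
                                         : n->left->value * n->right->value;
        n->kind = NODE_CONST;
        n->left = n->right = NULL;
    } else if (n->kind == NODE_MUL && n->left->kind == NODE_VAR &&
               n->right->kind == NODE_CONST && n->right->value == 2) {
        /* Strength reduction: replace the multiplication with an addition. */
        n->kind = NODE_ADD;
        n->right = n->left;   /* both operands now refer to the same variable node */
    }
}
```

Given the tree for 1 + 1, `optimize` rewrites the node into the constant 2. Given the tree for x * 2, it rewrites the node into x + x, trading a multiplication for a cheaper addition.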
Phase IV: Code Optimization
Global optimization
The compiler looks at large segments of the program to decide how to improve performance
Much more difficult; usually omitted from all but the most sophisticated and expensive production-level “optimizing compilers”
Optimization cannot make an inefficient algorithm efficient - “only makes an efficient algorithm more efficient”
Summary
A compiler is a piece of system software that translates high-level languages into machine language
Goals of a compiler: Correctness and the production of efficient and concise code
Source program: High-level language program
Summary
Object program: The machine language translation of the source program
Phases of the compilation process
Phase I: Lexical analysis
Phase II: Parsing
Phase III: Semantic analysis and code generation
Phase IV: Code optimization