informatik iiuser.informatik.uni-goettingen.de/~fu/teaching/... · institut für informatik,...

30
Informatik II Languages, Compilers, and Theory Lecture 1: Introduction and Overview

Upload: others

Post on 06-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Informatik IILanguages, Compilers, and Theory

Lecture 1:Introduction and Overview

Page 2: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 2

IntroductionEight lectures covering:

Basic programming language conceptsOrganization of compilers for modern programming languagesIntroduction to the theory of formal languages and automata

Text:Programming Language Pragmatics by Michael L. Smith

Page 3: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 3

Abstractions…Eliminate detail unnecessary for solving a particular problem

Complexity is hiddenOften build upon one another

Allow us to solve increasingly complex problemsModern software’s complexity is without precedent

Abstractions are absolutely necessary to manage this complexity

Resource allocation (time, storage, etc.)Operating systemsPhysical networks, protocolsComputer communications

Machine languageAssembly languageDigital logicComputer architectureTransistorsDigital logicAbstractsAbstraction

Page 4: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 4

Languages as abstractionHuman languages are a tool for abstracting thought

“When I am warm I turn on the fan”Communicates a simple intention, whereas the cognitive and neurological conditions from which the intention arose are most likely too complex for anyone to understandMeaning of this statement is left to the understanding of the individual who utters it and the individuals who hear it

Programming languages are a tool for abstracting computation

if (temperature() > 30.0) { turn_on_fan(); }Involves a complex, but concrete sequence of actions:

read the thermostat; convert the reading to an IEEE floating point value on the centigrade scale; compare the value to 30.0; if greater then send a command to a PCI card which sends a signal to a relay which then turns on the fan

Meaning of this statement is fixed by the formal semantics of the programming language and by the implementations of the functionstemperature() and turn_on_fan()

Page 5: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 5

How do programming languages abstract computation?

1) Provide a notation for the expression of algorithms that:

Is (mostly) independent of the machine on which the algorithm will executeProvides high-level features that focus the programmer’s attention more on the algorithm and less on:

How the algorithm will be implemented in a machine languageImplementing auxiliary algorithms, e.g., hash tables

Allows the programmer to build her own abstractions (subroutines, modules, libraries, classes, etc.) thus allowing the continuation of the “complexity management by layering” principle

Page 6: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 6

How do programming languages abstract computation?

2) Hide low-level details of the target architectureAssembly language instruction names, register names, argument ordering, …Mapping of language elements to assembly language

Arithmetic expressions,Conditionals,Procedure calling conventions, …

if (a < b + 10) {do_1();

} else {do_2();

}

SPARC

MIPSaddi $t2,$t1,10bge $t0,$t2,L1call do_1b L2

L1: call do_2L2: …

add %l1,10,%l2cmp %l0,%l2bge .L1; nopcall do_1; nopba .L2; nop

.L1: call do_2; nop

.L2: …

SPA

RCM

IPS

Page 7: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 7

How do programming languages abstract computation?

3) Provide primitives, subroutines, and run-time support for common programming chores

ExamplesReading and writing filesCharacter string handling (comparison, substring detection, etc.)Dynamic memory allocation (new, malloc, etc.)Reclamation of unused memory (garbage collection)Sorting

Page 8: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 8

How do programming languages abstract computation?

4) Provide features that encourage or enforce a particular style of algorithm or software development

Object Oriented ProgrammingClassesInheritancePolymorphism

Structured programmingSubroutinesNested variable scopesLoopsRestricted forms of the “goto” statement

Page 9: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 9

Categories of programming languages

All languages fall into one of the following two categories:

Imperative languages require programmers to describe, step-by-step, how an algorithm should do its work

Real world analogy » A recipe is a type of imperative program that tells a cook how to prepare a dish

Declarative languages allow programmers to describe what an algorithm should do without having to specify exactly how it should do it

Real world analogy » The recent Pfand law is a declarative program that tells retailers that they must institute a recycling program for the einweg Flasche and Dosen that they sell, without telling them exactly how to do it

Page 10: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 10

Imperative LanguagesThe von Neumann languages

Include Fortran, Pascal, Basic, CAre a reflection of the von Neumann computer architectures on which their programs runExecute statements which alter the program’s state (variables/memory)

Sometimes called computing by side effectsExample: summing the first n integers in C

for(sum=0,i=1;i<=n;i++) { sum += i; }

Page 11: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 11

Imperative LanguagesThe object oriented languages

Include Smalltalk, Eiffel, C++, JavaAre similar to the von Neumann languages with the addition of objectsObjects:

Contain their own internal state (member variables) and functions which operate on that state (methods)Computation is organized as interactions between objects (one object calling another’s methods)

Most object oriented languages provide facilities based on objects that encourage object oriented programming

Encapsulation, inheritance, and polymorphismWe’ll talk about these more in a future lecture

Page 12: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 12

Declarative LanguagesThe functional languages

Include Lisp/Scheme, ML, HaskellAre a reflection of Church’s theory of recursive functions (lambda calculus)Computation carried out as functions return values based on the (possibly recursive) evaluation of other functions

Mechanism known as reductionNo side effects!

Permits equational reasoning, easier formal proofs of program correctness, etc.

Example: summing the first n integers in SMLfun sum (n) = if n <= 1 then n else n + sum(n-1)

Page 13: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 13

Declarative LanguagesThe logic languages

Include Prolog, SQL, and Microsoft ExcelAre a reflection of the theory of propositional logicComputation is an attempt to find values which satisfy a set of logical relationships

Mechanism most commonly used to find these values is known as resolution and unification

Example: summing the first n integers in Prologsum(1,1). sum(N,S) :- N1 is N-1, sum(N1,S1), S is S1+N.

Page 14: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 14

A historical perspective:Machine languages

First machines programmed directly in machine language or machine codeTedious, but machine time more expensive than programmer time

MIPS machine code for a program that computes the GCD of two integers

Page 15: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 15

A historical perspective:Assembly languages

Programs became more complexToo difficult, time-consuming, and expensive to write them in machine code

Assembly languages developedHuman readableOriginally provided a one-to-one correspondence between machine language instructions and assembly language instructionsEventually “macro” facilities were added to further speed software development by providing a primitive form of code-reuseThe assembler was the program which translated an assembly language program into machine code which the machine could run

Assembler

Page 16: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 16

A historical perspective:High-level languages

Programs became even more complexToo difficult, time-consuming, and expensive to write in assembly languageToo difficult to move from one machine to another with a different assembly language

High-level languages developedMid 1950’s Fortran was designed and implemented

Allowed numerical computations to be expressed in a form similar to mathematical formulaeThe compiler was the program which translated a high-level program into an assembly language or machine language program

Originally, good programmers could write faster assembly language programs than the compilerOther high-level languages followed Fortran in the late 50’s and early 60’s

Lisp: first functional language, based on recursive function theoryAlgol: first block-structured language

int gcd (int i, int j) {while (i != j) {

if (i > j)i = i – j;

elsej = j – i;

}printf(“%d\n”,i);

}

Compiler

Page 17: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 17

Running high-level language programs

CompilationProgram is translated into assembly language or directly into machine languageCompiled programs can be made to run relatively fastFortran, C, C++, …

InterpretationProgram is read by another program and executed one source language element at a timeSlower than compiled programsInterpreters are (usually) easier to implement than compilers, are more flexible, and can provide excellent debugging and diagnostic supportJava, Python, Perl,…

Page 18: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 18

Designing a CompilerCompilers are well-studied, but very complex programsComplexity is managed by dividing the compilers work into independent stages or phasesTypically a phase analyzes a representation of a program, and translates that representation into another that is more appropriate for the next phaseDesign of these intermediate program representations are critical to the successful implementation of a compiler

Page 19: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 19

The Compilation Process(Phases)

Scanner (lexical analysis)

Parser (syntax analysis)

Semantic analysis

Intermediate code generation

Optimization

Target code generation

Machine-level optimization

Read program and convert character stream into tokens.

Read stream of tokens and generate a parse tree.

Traverse parse tree, checking non-syntactic rules.

Traverse parse tree again, emitting intermediate code.

Examine intermediate code, attempting to improve it.

Translate intermediate code to assembly / machine code.

Examine machine code, attempting to improve it.

Page 20: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 20

Lexical analysisProgram file is just a sequence of charactersWrong level of detail for syntax analysisLexical analysis groups sequence of characters into tokensTokens are the smallest “units of meaning” in the compilation process and are the foundation for parsing (syntax analysis)The compiler component that performs lexical analysis is called the scanner and is often automatically generated from a high-level specification

More on scanners in the next lecture…

Page 21: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 21

Lexical analysis example

int gcd (int i, int j) {

while (i != j) {if (i > j)

i = i – j;else

j = j – i;

}printf(“%d\n”,i);

}

int gcd (int i ,int j ) { while (i != j) { if ( i > j ) i= i –j ; elsej = j – i ;} printf (“%d\n” , I

) ; }

A GCD program in C Tokens

Page 22: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 22

Syntactic analysisLexical analysis generates a stream of tokensWrong level of detail for semantic analysis and code generationSyntax analysis groups a string of tokens into parse trees guided by the context-free grammar that specifies the syntax of the language being compiled

conditional -> if ( expr ) block else blockParse trees represent the phrase structure of the program and are the foundation for semantic analysis and code generationThe compiler component that performs syntax analysis is called the parser and is also often generated from a high-level specification

More on context-free grammars and parsing in upcoming lectures…

Page 23: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 23

Syntax analysis example

if ( i > j )i = i – j ;else j = j – i

; }

Tokensconditional

if ( expr ) block else block

id comp id

i j>

statement

id = expr

i id op id

i j-

statement

id = expr

j id op id

j i-

Page 24: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 24

Semantic analysisDetermines the meaning of a program based on the a parse tree representationEnforces rules that are not governed by the program language syntax

Consistent use of types, e.g.,int a; char s[10]; s = s + a; illegal!

Every identifier is declared before usedSubroutine calls provide the correct number and type of argumentsEtc.

Updates the symbol table, notating among other things, the types of variables, their sizes, and the scope in which they were declared

Page 25: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 25

Intermediate code generation

Parse trees are the wrong level of detail for optimization and target code generationIntermediate code generation transforms the parse tree into a sequence of intermediate language statements which embody the semantics of the source programThe intermediate language is just as powerful, but simpler than the high-level source language

E.g., the intermediate language may only have one type of looping construct (goto) whereas the source language may have several (for, while, do, etc.)

A simple intermediate language makes subsequent phases of the compiler easier to implement

Page 26: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 26

Target code generationUltimate goal of the compilation process is to generate a program which the computer can run

This is the job of the target code generatorStep 1: traverse the symbol table, assigning variables to locations in memoryStep 2: traverse the parse tree or intermediate language program, emitting loads and stores for variable references, along with arithmetic operations, comparisons, branches, and subroutine calls

Page 27: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 27

OptimizationIntermediate code and/or target code is typically not as efficient as it could be

Limitation allows code generators to focus on code generation, not on code improvement

An optimizer can be called to improve the quality of intermediate and/or target code after each of these phasesThe compiler component which improves the quality of generated code is called the optimizerOptimizers are the most complicated part of the compilerOptimization algorithms are often very elaborate, require substantial amounts of memory and time to run, and produce only small improvement in program size and/or run-time performanceTwo important optimizations:

Register allocation – deciding which program variables can reside in registers at a particular point in the program’s executionDead code elimination – removing functions, blocks, etc., which won’t be executed by the program

Page 28: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 28

So why study programming languages and compilers?

According to Michael Scott:Understand obscure language featuresChoose among alternative ways to express thingsMake good use of debuggers, assemblers, linkers, and related toolsSimulate useful features in languages that lack them

According to me:Compilers are large, complex programs: studying them helps you better understand “big software”Lots of programs contain “little programming languages”

Unix shells, Microsoft Office applications, etc.Its useful to know a bit about language design and implementation so that you can incorporate little languages into your own software

Page 29: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 29

Roadmap of remaining lectures

Next two lectures (Chapter 2 from text)Lexical analysisSyntactic analysisAutomata theory and automatic generation of scanners and parsers

Final five lecturesNames, scopes and binding (Chapter 3)Control flow (Chapter 6)Subroutines and control abstraction (Chapter 8)Building a runnable program (Chapter 9)Object oriented programming (Chapter 10)

Page 30: Informatik IIuser.informatik.uni-goettingen.de/~fu/teaching/... · Institut für Informatik, University of Göttingen 3 Abstractions… Eliminate detail unnecessary for solving a

Institut für Informatik, University of Göttingen 30

Programming language and compiler resources

Website for our texthttp://www.cs.rochester.edu/u/scott/pragmatics/

Catalog of compiler construction toolshttp://www.first.gmd.de/cogent/catalog/

Conferences and journalsACM Transactions on Programming Languages and SystemsACM SIGPLAN Conference on Programming Language Design and ImplementationACM SIGPLAN Conference on Programming Language Principles