The LANCE V2.0 C compiler system

Rainer Leupers
phone: +49 (231) 755 6151
mobile: +49 (177) 2131146
leupers@icd.de

University of Dortmund, Informatik 12
44221 Dortmund, Germany
fax: +49 (231) 755 6116
http://ls12-www.cs.uni-dortmund


© 2000, R. Leupers

Overview

Functionality of LANCE
Software structure
C frontend
Intermediate representation (IR)
IR optimizations
Control and data flow analysis
Backend interface


The LANCE V2.0 compiler system

Tasks covered by LANCE:
  Source code analysis
  Generation of IR
  Machine-independent optimizations
  Data flow graph generation

Tasks not covered by LANCE:
  Assembly code generation (backend)
  Machine-specific optimizations
  Code assembly and linking

Purpose of LANCE:
  Facilitate C compiler development for new target processors
  Give insight into compiler structure


Key features

Full ANSI C coverage (C 89)

Modular tool and library structure

Simple three address code IR (C subset)

Plug & play IR optimizations

Backend interface compatible with OLIVE

Proven in numerous compiler projects


LANCE software structure

[Figure: the LANCE library (header file lance2.h, C++ library liblance2.a) is used by the LANCE tools: the C frontend, IR optimizations 1 to n, and the machine-specific backend, all of which work on a common IR]


ANSI C frontend

Functionality:
  Lexical, syntactic, and semantic analysis of C source
  Generation of three address code IR for a C file
  Emission of error messages if required (gcc style)
  Machine-specific constants (type bitwidth, alignment) stored in a configuration file

Implementation:
  Based on a context-free C grammar, according to K&R spec
  C source automatically generated with attribute grammar compiling system (OX, extension of lex & yacc)
  In total approx. 26,000 lines of C source code
  Validated with comprehensive test suite


Setup and IR generation

Environment variables:
  setenv LANCE2_CPP "gcc -E"
  setenv LANCE2_CONFIG "config.sparc"

Call C frontend by "compile" command:
  > compile test.c

[Figure: compile reads file test.c together with config.sparc and generates file test.ir.c]


General IR format

One IR file (*.ir.c) generated for each C source file (*.c)
External IR format: C subset (compilable!)
Internal IR format: accessible via LANCE library
IR contains a symbol table + three address code (3AC) for each C function defined in the source code
3AC is a sequence of IR statements
3AC = at most two operands, one result per statement
IR statements (mostly) consist of IR expressions
Blocks of 3AC augmented with source information (C code, source line no.) for debugging purposes
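To make the "at most two operands, one result" rule concrete, the sketch below places a nested C expression next to a hand-written three address code version of it. The function names and the temporaries t1 to t4 are made up for this illustration; the names the LANCE frontend actually generates may differ.

```cpp
#include <cassert>

// Source form: a single statement with a nested expression.
int eval_source(int a, int b, int c) {
    return a * b + c * (a - b);
}

// Hand-written 3AC form of the same computation: every statement has
// at most two operands and one result. t1..t4 are illustrative names.
int eval_3ac(int a, int b, int c) {
    int t1, t2, t3, t4;
    t1 = a * b;
    t2 = a - b;
    t3 = c * t2;
    t4 = t1 + t3;
    return t4;
}

// Compares both forms on a small grid of test inputs.
void check_equivalence() {
    for (int a = -3; a <= 3; ++a)
        for (int b = -3; b <= 3; ++b)
            for (int c = -3; c <= 3; ++c)
                assert(eval_source(a, b, c) == eval_3ac(a, b, c));
}
```

Because the 3AC form is still ordinary C, both versions can be compiled and compared directly, as check_equivalence() does.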


Classes of IR statements

Assignment: a = b + c; *p = !a; x = f(y,z); cond = *x;
Jump: goto lab;
Conditional jump: if (cond) goto lab;
Label: lab:
Return void: return;
Return value: return x;
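The statement classes above are enough to express structured control flow. As a sketch, the loop "for (i = 0; i < n; i++) s += v[i];" is written below by hand using only assignments, labels, jumps, and a return. The names cond, t1, p, L_head, and L_exit are illustrative and are not taken from LANCE output.

```cpp
#include <cassert>

// Source form, kept for comparison.
int sum_source(const int* v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

// The same loop restricted to the IR statement classes.
int sum_ir_style(const int* v, int n) {
    int s, i, cond, t1;
    const int* p;
    s = 0;                    // assignment
    i = 0;                    // assignment
L_head:                       // label
    cond = i >= n;            // assignment (binary expression)
    if (cond) goto L_exit;    // conditional jump
    p = v + i;                // assignment (address computation)
    t1 = *p;                  // assignment (unary expression)
    s = s + t1;               // assignment
    i = i + 1;                // assignment
    goto L_head;              // jump
L_exit:                       // label
    return s;                 // return value
}

// Checks the IR-style version against the source version.
void check_sum() {
    const int data[4] = {1, 2, 3, 4};
    assert(sum_ir_style(data, 4) == sum_source(data, 4));
    assert(sum_ir_style(data, 0) == 0);
}
```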


Classes of IR expressions

Symbol: "a", "b", "main", "count", ...
Binary expression: a * b, x / 2, 3 ^ v, f & 4, q % r, ...
Unary expression: !a, *p, ~x, -z, ...
Function call: f1(), f2(a,b), f3(*x, 1, y), ...
Type cast: (char)z, (int)a, (float*)b, ...
String constant: "compiler", "design", "is", "fun", ...
Integer constant: 1000, 3456, -234, -112, ...
Float constant: "3.1415926536", "2.718281828459", ...


Why is the LANCE IR a C subset ?

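Validation of frontend (or any IR optimization):
[Figure: the C source and the IR-C source produced by the frontend are each compiled with CC into exe 1 and exe 2; both run on the same test input, and output 1 is compared with output 2]

C-to-C optimization:
[Figure: the IR optimization tools turn C source into optimized C source, which is then compiled with CC]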


IR data structure overview

[Figure: IR data structure]

Global symbol table: int x1,x2,x3; double y1,y2,y3; ...

Function list: fun 1 "name1", ..., fun n "name n"
Each function holds a local symbol table (int a,b,c; ...) and an IR statement list (stm 1, stm 2, ..., stm m)

Stm info (examples):
  Class: assignment, ID: 4123, Left hand side: *p, Right hand side: a + b
  Class: cond. jump, ID: 4124, Target: "L1", Condition: c

Exp info for an IR expression (example):
  Class: binary, ID: 10034, Left arg: a, Right arg: b, Oper: +, Type: int


The IR type class

C++ class IRType stores type info for all symbols and expressions
  Primary type: void, char, short, int, array, pointer, struct, function, ...
  Secondary type: subtype of arrays and pointers
  Storage class: extern, static, register, ...
  Qualifiers: const, volatile

Example: const int* A[100];
  Type->Class() = IRTYPE_ARRAY  // primary type
  Type->IsConst() = true
  Type->Subtype()->Class() = IRTYPE_POINTER
  Type->Subtype()->Subtype()->Class() = IRTYPE_INT
  Type->ArrayDim() = 100
  Type->SizeOf() = 400  // in bytes, for 32-bit pointers
  Type->MemoryWords() = 200  // for a 16-bit word memory
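The size figures in the example follow from walking the type structure: 100 elements times 4 bytes per pointer gives 400 bytes, which occupy 200 words of 16 bits. The sketch below reproduces that arithmetic with a toy descriptor. ToyType, Kind, and make_example_type are hypothetical stand-ins and not the LANCE IRType class; the 4-byte int and pointer sizes are assumptions, and qualifiers and storage classes are left out.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for a type descriptor; NOT the LANCE IRType class.
enum class Kind { Int, Pointer, Array };

struct ToyType {
    Kind kind;
    std::shared_ptr<ToyType> subtype; // element type (array) or pointee (pointer)
    int arrayDim;                     // number of elements, used for arrays only

    // Size in bytes, assuming 4-byte int and 4-byte (32-bit) pointers.
    int SizeOf() const {
        switch (kind) {
            case Kind::Int:     return 4;
            case Kind::Pointer: return 4;
            case Kind::Array:   return arrayDim * subtype->SizeOf();
        }
        return 0;
    }

    // Number of memory words occupied for a given word width in bits.
    int MemoryWords(int wordBits) const {
        assert(wordBits > 0 && wordBits % 8 == 0);
        const int bits = SizeOf() * 8;
        return (bits + wordBits - 1) / wordBits; // round up to whole words
    }
};

// Builds the descriptor for "int* A[100]": array -> pointer -> int.
std::shared_ptr<ToyType> make_example_type() {
    auto intType = std::make_shared<ToyType>(ToyType{Kind::Int, nullptr, 0});
    auto ptrType = std::make_shared<ToyType>(ToyType{Kind::Pointer, intType, 0});
    return std::make_shared<ToyType>(ToyType{Kind::Array, ptrType, 100});
}
```

With these assumptions, make_example_type()->SizeOf() yields 400 and MemoryWords(16) yields 200, matching the values on the slide.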


The symbol table class

Symbol table stores all relevant information for symbols/identifiers
Two hierarchy levels:
  Global symbol table: IR->GlobalSymbolTable()
  One local symbol table per function: fun->LocalSymbolTable()

All local symbols get a unique numerical suffix, e.g.
  int f(int x) { int a,b; }
becomes
  int f(int x_1) { int a_2, b_3; }

Important access methods:
  ST->LookupSymbol(char* name)
  IRSymbol* ST->CreateSymbol(IRType* tp)
  Iterators: ST->FirstObject(), ST->NextObject()

Information stored in a table entry (class IRSymbol):
  Symbol type: IRType* sym->Type()
  Symbol name: char* sym->Name()
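The suffix renaming can be sketched with a counter that is appended to every newly created local symbol. ToySymbolTable below is a hypothetical miniature, not the LANCE symbol table class: its Create and Lookup methods take different arguments than the methods listed above, and how LANCE chooses the starting value of the counter is an assumption left to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy local symbol table; NOT the LANCE symbol table class.
class ToySymbolTable {
public:
    // firstSuffix is the number given to the first symbol created.
    explicit ToySymbolTable(int firstSuffix) : next_(firstSuffix) {
        assert(firstSuffix >= 0);
    }

    // Registers a source-level name and returns its unique IR name.
    std::string Create(const std::string& sourceName) {
        const std::string irName = sourceName + "_" + std::to_string(next_++);
        sourceNameOf_[irName] = sourceName;
        return irName;
    }

    // Returns true if a symbol with this IR name has been created.
    bool Lookup(const std::string& irName) const {
        return sourceNameOf_.count(irName) != 0;
    }

private:
    int next_;                                        // next free suffix
    std::map<std::string, std::string> sourceNameOf_; // IR name -> source name
};
```

Creating x, a, and b in that order on a table constructed with ToySymbolTable(1) returns x_1, a_2, and b_3, as in the example for function f.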


IR generation example

[Figure: a source file next to the generated IR file, annotated with: forward declaration, automatic conversion, auxiliary vars, debug info, suffix 3 for parameter i]


IR optimization tools

Purpose: perform machine-independent optimizations on IR
Identical IR format for all tools, "plug & play" concept
Currently available tools:
  Constant folding                    cfold tool
  Constant propagation                constprop tool
  Copy propagation                    copyprop tool
  Common subexpression elimination    cse tool
  Dead code elimination               dce tool
  Jump optimization                   jmpopt tool
  Loop invariant code motion          licm tool
  Induction variable elimination      ive tool

Automatic iteration of IR optimizations via "iropt" shell script
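Iterating the optimizations pays off because one pass often enables another: folding a constant expression produces a constant that can then be propagated, which in turn exposes more folding. The sketch below shows this interplay on a toy straight-line 3AC. It is not the implementation of cfold, constprop, or iropt; all types and function names are hypothetical, only +, - and * are folded, and integer overflow is not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy straight-line 3AC; NOT the LANCE IR classes.
struct Operand { bool isConst; int value; std::string name; };
// "dst = lhs op rhs", or the copy "dst = lhs" when op == 0.
struct Stm { std::string dst; Operand lhs; char op; Operand rhs; };

Operand constant(int v) { return Operand{true, v, ""}; }
Operand variable(const std::string& n) { return Operand{false, 0, n}; }

// Constant folding: evaluate binary statements with two constant operands.
bool fold_constants(std::vector<Stm>& code) {
    bool changed = false;
    for (Stm& s : code) {
        if (s.op == 0 || !s.lhs.isConst || !s.rhs.isConst) continue;
        const int a = s.lhs.value, b = s.rhs.value;
        int r = 0;
        switch (s.op) {
            case '+': r = a + b; break;
            case '-': r = a - b; break;
            case '*': r = a * b; break;
            default: continue; // unsupported operator: leave statement as is
        }
        s.lhs = constant(r);
        s.op = 0;
        changed = true;
    }
    return changed;
}

// Constant propagation inside one basic block: replace variable operands
// whose most recent definition is a constant copy.
bool propagate_constants(std::vector<Stm>& code) {
    bool changed = false;
    std::map<std::string, int> known;
    auto substitute = [&](Operand& o) {
        if (o.isConst) return;
        auto it = known.find(o.name);
        if (it == known.end()) return;
        o = constant(it->second);
        changed = true;
    };
    for (Stm& s : code) {
        substitute(s.lhs);
        if (s.op != 0) substitute(s.rhs);
        if (s.op == 0 && s.lhs.isConst) known[s.dst] = s.lhs.value;
        else known.erase(s.dst); // dst no longer holds a known constant
    }
    return changed;
}

// Repeats both passes until neither changes the code; returns the round count.
int optimize(std::vector<Stm>& code) {
    int rounds = 0;
    bool changed = true;
    while (changed) {
        const bool folded = fold_constants(code);
        const bool propagated = propagate_constants(code);
        changed = folded || propagated;
        ++rounds;
    }
    return rounds;
}

// a = 2 * 3; b = a + 4; c = b * x;
std::vector<Stm> make_fold_example() {
    return {
        Stm{"a", constant(2), '*', constant(3)},
        Stm{"b", variable("a"), '+', constant(4)},
        Stm{"c", variable("b"), '*', variable("x")},
    };
}

void check_fold_example() {
    std::vector<Stm> code = make_fold_example();
    optimize(code);
    assert(code[1].op == 0 && code[1].lhs.value == 10); // b = 10
}
```

On the example, the loop ends with a = 6, b = 10, and c = 10 * x; a single run of either pass alone would not get that far.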


IR optimization example

[Figure: C source code translated by compile into unoptimized IR]


Constant folding

cfold


Constant propagation

constprop


Copy propagation

copyprop


Common subexpression elimination

cse


Dead code elimination

dce


Jump optimization

jmpopt


Loop invariant code motion

licm


Induction variable elimination

ive


Control flow analysis

Purpose: identify basic block structure of a C function
Basic block (BB): IR statement sequence with unique entry and exit points
Control flow graph (CFG): one node per BB, edge (BB1, BB2) iff BB2 may be an immediate successor of BB1 during execution
Assembly code generation usually done BB after BB
Example:

  while (x) {
    BB1;
    if (x) then BB2; else BB3;
    BB4;
  }

[Figure: CFG with BB1 at the top, BB2 and BB3 side by side below it, and BB4 at the bottom]
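A common textbook way to find basic blocks in a 3AC statement list is to mark "leaders" (the first statement, every label, and every statement that follows a jump or return) and to cut the list at those points. The sketch below applies this to a toy statement type and then adds jump and fall-through edges. It is an illustration only and does not describe how the LANCE ControlFlowGraph class works internally; ToyStm, ToyCFG, and build_cfg are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy IR statement; NOT a LANCE class. For Label, `label` is the label's
// own name; for Goto and CondGoto it is the jump target.
enum class StmKind { Assign, Label, Goto, CondGoto, Return };
struct ToyStm { StmKind kind; std::string label; };

struct ToyCFG {
    std::vector<std::pair<int, int>> blocks; // first and last statement index
    std::set<std::pair<int, int>> edges;     // (from block, to block)
};

ToyCFG build_cfg(const std::vector<ToyStm>& code) {
    ToyCFG cfg;
    const int n = static_cast<int>(code.size());
    if (n == 0) return cfg;

    // Step 1: mark leaders.
    std::vector<bool> leader(n, false);
    leader[0] = true;
    for (int i = 0; i < n; ++i) {
        const StmKind k = code[i].kind;
        if (k == StmKind::Label) leader[i] = true;
        const bool transfers = k == StmKind::Goto || k == StmKind::CondGoto ||
                               k == StmKind::Return;
        if (transfers && i + 1 < n) leader[i + 1] = true;
    }

    // Step 2: cut the statement list at the leaders.
    std::map<std::string, int> blockOfLabel;
    for (int i = 0; i < n; ++i) {
        if (leader[i]) cfg.blocks.push_back(std::make_pair(i, i));
        else cfg.blocks.back().second = i;
        if (code[i].kind == StmKind::Label)
            blockOfLabel[code[i].label] = static_cast<int>(cfg.blocks.size()) - 1;
    }

    // Step 3: add edges for jumps and for fall-through.
    const int nb = static_cast<int>(cfg.blocks.size());
    for (int b = 0; b < nb; ++b) {
        const ToyStm& last = code[cfg.blocks[b].second];
        if (last.kind == StmKind::Goto || last.kind == StmKind::CondGoto) {
            auto it = blockOfLabel.find(last.label);
            assert(it != blockOfLabel.end()); // jump targets must exist
            cfg.edges.insert(std::make_pair(b, it->second));
        }
        const bool fallsThrough =
            last.kind != StmKind::Goto && last.kind != StmKind::Return;
        if (fallsThrough && b + 1 < nb) cfg.edges.insert(std::make_pair(b, b + 1));
    }
    return cfg;
}

// Hand-lowered form of: while (x) { BB1; if (x) BB2; else BB3; BB4; }
std::vector<ToyStm> make_loop_example() {
    return {
        {StmKind::Label, "L_head"}, {StmKind::Assign, ""},
        {StmKind::CondGoto, "L_exit"},                           // loop test
        {StmKind::Assign, ""}, {StmKind::CondGoto, "L_else"},    // BB1
        {StmKind::Assign, ""}, {StmKind::Goto, "L_join"},        // BB2
        {StmKind::Label, "L_else"}, {StmKind::Assign, ""},       // BB3
        {StmKind::Label, "L_join"}, {StmKind::Assign, ""},
        {StmKind::Goto, "L_head"},                               // BB4
        {StmKind::Label, "L_exit"}, {StmKind::Return, ""},
    };
}
```

For the hand-lowered loop this yields six blocks (the loop test, BB1 to BB4, and the exit) and seven edges, including the back edge from BB4 to the loop test.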


CFG generation by LANCE

Class ControlFlowGraph contained in LANCE library
Constructor ControlFlowGraph(Function* fun) generates CFG for any function fun
LANCE tool showcfg exports CFGs in the VCG text format
VCG can be used to visualize generated CFGs

[Figure: IR file -> showcfg -> VCG file -> xvcg -> CFG]


CFG visualization example

[Figure: CFG visualization produced by showcfg + VCG tool]


Data flow analysis

Goal: convert IR into data flow graph (DFG) representation for assembly code generation by tree pattern matching
Performed by def/use analysis between IR statements/expressions
LANCE lib class DataFlowAnalysis provides required methods
Constructor DataFlowAnalysis(Function* fun) constructs data flow information for any function fun
Example:

  x = 5;
  goto lab;
  ...
  x = 6;
lab:
  y = x + 1;
  ...
  z = 1 - y;
  u = y / 5;

x has two definitions: x = 5 and x = 6
y has two uses: z = 1 - y and u = y / 5
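The definitions and uses in the example can be collected by a single scan over the statements. The sketch below does only this collection step on a toy statement type; a full def/use analysis additionally has to follow the control flow to decide which definitions actually reach which uses, which is omitted here. ToyAssign, DefUse, and collect_def_use are hypothetical names and not part of the LANCE DataFlowAnalysis class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy assignment "def = f(uses...)"; NOT a LANCE class.
struct ToyAssign { std::string def; std::vector<std::string> uses; };

// Statement indices at which a symbol is defined and used.
struct DefUse { std::vector<int> defs; std::vector<int> uses; };

// Records, per symbol, every statement that defines it or reads it.
std::map<std::string, DefUse> collect_def_use(const std::vector<ToyAssign>& code) {
    std::map<std::string, DefUse> info;
    for (int i = 0; i < static_cast<int>(code.size()); ++i) {
        assert(!code[i].def.empty()); // every toy statement defines a symbol
        for (const std::string& u : code[i].uses) info[u].uses.push_back(i);
        info[code[i].def].defs.push_back(i);
    }
    return info;
}

// The assignments of the example; jumps and labels are left out.
std::vector<ToyAssign> make_defuse_example() {
    return {
        {"x", {}},      // 0: x = 5
        {"x", {}},      // 1: x = 6
        {"y", {"x"}},   // 2: y = x + 1
        {"z", {"y"}},   // 3: z = 1 - y
        {"u", {"y"}},   // 4: u = y / 5
    };
}
```

For the example, x ends up with definitions at statements 0 and 1, and y with uses at statements 3 and 4.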


DFG visualization example

[Figure: DFG visualization produced by showdfg + VCG tool]


Backend interface

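[Figure: a DFG in which the product a * b feeds two additions, one with c and one with 2, producing x and y; after splitting at the CSE, a * b is written to an auxiliary variable t that both resulting trees read]

LANCE lib classes LANCEDataFlowTree and DFTManager provide link between LANCE IR and tree pattern matching
OLIVE/IBURG accept only trees instead of general DFGs
Hence: split DFGs at the common subexpressions (CSEs)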
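Splitting at CSEs can be sketched as follows: every inner DFG node with more than one parent is given an auxiliary variable and becomes the root of its own tree, and all parents then read that variable instead of the shared node. The code below is a toy illustration, not the LANCEDataFlowTree or DFTManager implementation; DfgNode, Root, split_at_cses, and the names t1, t2, ... are hypothetical, and nodes are assumed to be stored with operands before their users.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy DFG node; NOT a LANCE class. Leaves carry a symbol or constant in
// `op`; inner nodes carry a binary operator and two operand indices.
struct DfgNode { std::string op; std::vector<int> kids; };
// A DFG root: `target` is assigned the value of node `node`.
struct Root { std::string target; int node; };

// Returns one assignment string per resulting tree.
std::vector<std::string> split_at_cses(const std::vector<DfgNode>& g,
                                       const std::vector<Root>& roots) {
    const int n = static_cast<int>(g.size());

    // Count how many parents each node has.
    std::vector<int> parents(n, 0);
    for (int i = 0; i < n; ++i)
        for (int k : g[i].kids) {
            assert(k >= 0 && k < i); // operands must precede their users
            ++parents[k];
        }

    std::vector<std::string> aux(n); // auxiliary variable per shared node
    std::function<std::string(int)> expr = [&](int i) -> std::string {
        if (g[i].kids.empty()) return g[i].op; // leaf
        if (!aux[i].empty()) return aux[i];    // shared node: read auxiliary
        assert(g[i].kids.size() == 2);
        return "(" + expr(g[i].kids[0]) + " " + g[i].op + " " +
               expr(g[i].kids[1]) + ")";
    };

    // Emit one tree per shared inner node, then one per root.
    std::vector<std::string> trees;
    int counter = 0;
    for (int i = 0; i < n; ++i) {
        if (g[i].kids.empty() || parents[i] < 2) continue;
        const std::string rhs = expr(i); // built before aux[i] is set
        aux[i] = "t" + std::to_string(++counter);
        trees.push_back(aux[i] + " = " + rhs);
    }
    for (const Root& r : roots)
        trees.push_back(r.target + " = " + expr(r.node));
    return trees;
}

// DFG for: x = a * b + c; y = a * b + 2; with a * b shared.
std::vector<std::string> split_example() {
    const std::vector<DfgNode> g = {
        {"a", {}}, {"b", {}}, {"*", {0, 1}}, {"c", {}}, {"2", {}},
        {"+", {2, 3}}, {"+", {2, 4}},
    };
    const std::vector<Root> roots = {{"x", 5}, {"y", 6}};
    return split_at_cses(g, roots);
}
```

For the example DFG the result is three trees: t1 = (a * b), x = (t1 + c), and y = (t1 + 2).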


Data structure overview

Constructor DFTManager(Function* fun) generates data flow tree (DFT) representation for an entire function fun
DFTManager contains internal list of basic blocks
Each BB in turn is a list of DFTs

[Figure: list BB 1, BB 2, ..., BB n; each BB points to its list DFT 1, DFT 2, ..., DFT m]


DFT covering with OLIVE

DFTs are directly in the format required by code generators produced by OLIVE
All DFTs consist of a fixed set of terminal symbols (e.g. cs_STORE), specified in file INCL/termlist.c
Example (only a single DFT):

[Figure: C file, IR file, and DFT representation of the example]


Example (cont.)

[Figure: simplified OLIVE spec, DFT in OLIVE format, and assembly code for hypothetical machine]


Summary

LANCE provides you with ...
  C frontend
  IR optimizations
  C++ library for IR access (+ important basic classes)
  interface to OLIVE data flow trees

Full C compiler additionally requires ...
  OLIVE based backend for the concrete target machine
  target-specific optimizations (e.g. scheduling, address gen.)