University of Delaware, Computer & Information Sciences Department
Optimizing Compilers, CISC 673, Spring 2009
Dynamic Compilation II
John Cavazos, University of Delaware


What is in a Dynamic Compiler?

Interpretation
• Popular approach for high-level languages
  − Examples: Python, APL, SNOBOL, BCPL, Perl, MATLAB
• Useful for memory-challenged environments
• Low startup time & space overhead, but much slower than native code execution

MMI (Mixed Mode Interpreter) [Suganuma’01]
• Fast interpreter implemented in assembler

What is in a Dynamic Compiler?

Quick compilation
• Reduced set of optimizations for fast compilation, little inlining

Full compilation
• Full optimizations only for selected hot methods

Classic just-in-time compilation
• Compile methods to native code on first invocation
  − Examples: ParcPlace Smalltalk-80, Self-91
• Initial high (time & space) overhead for each compilation
• Precludes use of sophisticated optimizations (e.g., SSA)
• Responsible for many of today’s myths

Interpretation vs JIT

[Figure: two bar charts comparing total time for Interpreter and Compiler, each bar split into Initial Overhead and Execution. Panels: "Execution: 20 time units" and "Execution: 2000 time units".]

Selective Optimization

Hypothesis: most execution time is spent in a small percentage of methods

Idea: use two execution strategies
1. Interpreter or non-optimizing compiler
2. Full-fledged optimizing compiler

Strategy:
• Use option 1 for initial execution of all methods
• Profile to find the “hot” subset of methods
• Use option 2 on this subset
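
The "profile to find the hot subset" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hot_subset`, the 90% coverage target, and the sample counts are invented for the example, and real systems usually rely on thresholds or a cost/benefit model rather than a fixed coverage target.

```python
def hot_subset(samples, coverage=0.9):
    """Return the most-sampled methods that together reach `coverage` of all samples.

    `samples` maps method name -> profile count (hypothetical data).
    """
    total = sum(samples.values())
    hot, covered = [], 0
    # Visit methods from most to least frequently sampled.
    for name, count in sorted(samples.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        hot.append(name)
        covered += count
    return hot

# Made-up profile: two methods account for 95% of the samples.
profile = {"parse": 700, "eval": 250, "log": 30, "init": 15, "exit": 5}
print(hot_subset(profile))
```

Only the methods returned here would be handed to the optimizing compiler (option 2); everything else keeps running under option 1.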

Selective Optimization

[Figure: two bar charts comparing total time for Interpreter, Compiler, and Selective, each bar split into Initial Overhead and Execution. Panels: "Execution: 20 time units" and "Execution: 2000 time units".]

Selective opt: compiles 20% of methods, representing 99% of execution time
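
A back-of-the-envelope model makes the comparison concrete. In the Python sketch below, only the 20% / 99% split comes from the slide; the compile cost, the speedup, and the run lengths are hypothetical. The model also assumes that compile cost is proportional to the fraction of methods compiled, and it ignores profiling overhead and the time hot methods spend interpreted before they are detected.

```python
def total_times(interp_exec, compile_all=100.0, speedup=10.0,
                hot_methods=0.20, hot_time=0.99):
    """Total time (overhead + execution) for the three strategies.

    interp_exec : time the program would take if fully interpreted
    compile_all : hypothetical cost of compiling every method
    speedup     : hypothetical speed of compiled code relative to interpretation
    hot_methods : fraction of methods compiled by selective optimization
    hot_time    : fraction of execution time those methods account for
    """
    interpreter = interp_exec
    compiler = compile_all + interp_exec / speedup
    selective = (hot_methods * compile_all               # compile only the hot subset
                 + hot_time * interp_exec / speedup      # hot code runs compiled
                 + (1 - hot_time) * interp_exec)         # the rest stays interpreted
    return {"interpreter": interpreter, "compiler": compiler, "selective": selective}

for run in (200.0, 20000.0):   # a short run and a long run (made-up lengths)
    print(run, total_times(run))
```

With these made-up parameters, selective optimization has the lowest total for the short run and stays within about 5% of compile-everything for the long run, while both beat pure interpretation by a wide margin on the long run.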

Designing an Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?

Basic Structure of a Dynamic Compiler

[Figure: pipeline from Program to Machine code through the compiler phases listed here.]
• Structural: inlining, unrolling, loop perm
• Scalar: cse, constants, expressions
• Memory: scalar repl, ptrs
• Reg. Alloc
• Scheduling, peephole

Still needs good core compiler - but more

Basic Structure of a Dynamic Compiler

[Figure: block diagram of a dynamic compilation system. Labeled components: Program; Interpreter or Simple Translation; Executing Program; Instrumented code; Raw Profile Data; Profile Processor; Processed Profile; Controller (compilation decisions); History (prior decisions, compile time); Compiler subsystem (Optimizations).]

Method Profiling

• Counters
• Call Stack Sampling
• Combinations

Method Profiling: Counters
• Insert method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: overhead for incrementing counter can be significant
  − Not present in optimized code

Method Profiling: Counters

foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}
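
A runnable counterpart of the pseudocode above, written in Python as a minimal sketch. `THRESHOLD`, `recompile`, and the `counted` decorator are hypothetical names chosen for this example. It counts method entries only (not loop back edges) and triggers recompilation once, the first time the counter exceeds the threshold, on the assumption that the recompiled version would then replace the instrumented one.

```python
from functools import wraps

THRESHOLD = 3      # hypothetical value; real thresholds are tuned per system
recompiled = []    # records which methods were handed to the "optimizer"

def recompile(name):
    """Stand-in for invoking the optimizing compiler on a method."""
    recompiled.append(name)

def counted(func):
    """Wrap func so every call increments a method-specific counter."""
    count = 0

    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal count
        count += 1
        if count == THRESHOLD + 1:   # first time the counter exceeds the threshold
            recompile(func.__name__)
        return func(*args, **kwargs)

    return wrapper

@counted
def foo(x):
    return x * 2

for i in range(5):
    foo(i)
print(recompiled)
```

The increment and comparison run on every call, which is the overhead the previous slide lists as an issue.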

Method Profiling: Call Stack Sampling

• Periodically record which method(s) are on the call stack
• Approximates amount of time spent in each method
• Can be compiled into the code
  − Jikes RVM, JRockit
• or use hardware sampling
• Issues: timer-based sampling is not deterministic

Method Profiling: Call Stack Sampling

[Figure: call stack over time, recorded at each sample point: ABC, AB, A, AB, ABC, ABC, ...]
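
How such samples turn into a per-method time estimate can be sketched in Python. The sketch assumes each sample is a string of single-letter method names with the outermost caller first and the executing method last; the sample list is read off the figure residue and `tally` is a hypothetical helper, not a real VM interface.

```python
from collections import Counter

def tally(samples, top_only=False):
    """Count how many samples each method appears in.

    top_only=True  charges a sample to the executing method (top of stack).
    top_only=False charges it to every method on the stack (inclusive time).
    """
    counts = Counter()
    for stack in samples:
        frames = stack[-1:] if top_only else stack
        counts.update(set(frames))   # set(): a recursive method counts once per sample
    return counts

samples = ["ABC", "AB", "A", "AB", "ABC", "ABC"]
print(tally(samples, top_only=True))   # time spent executing each method
print(tally(samples))                  # time each method was on the stack
```

A method's share of the samples approximates its share of execution time, which is what the recompilation policies on the later slides consume.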

Method Profiling: Mixed

Combinations
• Use counters initially and sampling later on
• IBM DK for Java

foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}

[Figure: call stack ABC]

Method Profiling: Mixed

Software Hardware Combination
• Use interrupts & sampling

foo ( … ) {
  if (flag is set) {
    sample( … );
  }
  . . .
}

[Figure: call stack ABC]

Recompilation Policies: Which Candidates to Optimize?

Problem: given optimization candidates, which should be optimized?

Counters:
1. Optimize method that surpasses threshold
   − Simple, but hard to tune, doesn’t consider context
2. Optimize method on the call stack based on inlining policies
   − Addresses context issue

Call Stack Sampling:
1. Optimize all methods that are sampled
   − Simple, but doesn’t consider frequency of sampled methods
2. Use Cost/benefit model
   − Seemingly complicated, but easy to engineer
   − Maintenance free
   − Naturally supports multiple optimization levels

Jikes RVM: Recompilation Policy – Cost/Benefit Model

Define
• cur, current opt level for method m
• Exe(j), expected future execution time at level j
• Comp(j), compilation cost at opt level j

Choose j > cur that minimizes Exe(j) + Comp(j)

If Exe(j) + Comp(j) < Exe(cur), recompile at level j

Assumptions
• Sample data determines how long a method has executed
• Method will execute as much in the future as it has in the past
• Compilation cost and speedup are offline averages
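
The decision rule above fits in a few lines of Python. This is a sketch of the model as stated on the slide, not the Jikes RVM code: `choose_level`, the `SPEED` table, and the `COMP_COST` table are hypothetical, with made-up numbers standing in for the offline averages.

```python
def choose_level(cur, past_time, speed, comp_cost):
    """Pick the level j > cur that minimizes Exe(j) + Comp(j).

    Returns None when no level beats Exe(cur), i.e. do not recompile.
    speed[j]     : relative speed of code at level j (offline average)
    comp_cost[j] : cost of compiling this method at level j (offline average)
    """
    exe_cur = past_time                      # assumption: future time equals past time
    best_level, best_total = None, exe_cur   # a candidate must beat Exe(cur)
    for j in range(cur + 1, len(speed)):
        exe_j = exe_cur * speed[cur] / speed[j]
        total = exe_j + comp_cost[j]
        if total < best_total:
            best_level, best_total = j, total
    return best_level

SPEED = [1.0, 4.0, 6.0, 7.0]        # made-up speedups for levels 0..3
COMP_COST = [0.0, 1.0, 5.0, 20.0]   # made-up compile costs, same time units

for observed in (1.0, 100.0, 10000.0):
    print(observed, choose_level(0, observed, SPEED, COMP_COST))
```

A method that has barely run is left alone, while methods with more accumulated time justify progressively higher levels, which is how the model supports multiple optimization levels without a hand-tuned threshold per level.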

Startup Programs: Jikes RVM [Hind et al.’04]

[Figure: bar chart of speedup over baseline (axis 0 to 5) for JIT 0, JIT 1, and JIT 2. Benchmarks: db/10, jack/10, ipsixql/short, jess/10, jbb/12000, mtrt/10, javac/10, xerces/short, mpeg/10, compress/10, daikon/short, soot/short, jack/100, xerces/long, javac/100, jess/100, mtrt/100, db/100, ipsixql/long, soot/long, jbb/200000, compress/100, mpeg/100, daikon/long, and the geometric mean.]

No FDO, Mar’04, AIX/PPC

Startup Programs: Jikes RVM

[Figure: bar chart of speedup over baseline (axis 0 to 5) for JIT 0, JIT 1, JIT 2, and the cost/benefit Model. Benchmarks: db/10, jack/10, ipsixql/short, jess/10, jbb/12000, mtrt/10, javac/10, xerces/short, mpeg/10, compress/10, daikon/short, soot/short, jack/100, xerces/long, javac/100, jess/100, mtrt/100, db/100, ipsixql/long, soot/long, jbb/200000, compress/100, mpeg/100, daikon/long, and the geometric mean.]

No FDO, Mar’04, AIX/PPC

Steady State: Jikes RVM

[Figure: bar chart of speedup over baseline (axis 0 to 7) for JIT 0, JIT 1, and JIT 2. Benchmarks: jbb-300, ipsixql, compress, jess, db, javac, mpegaudio, mtrt, jack, and the geometric mean.]

No FDO, Mar’04, AIX/PPC

Steady State: Jikes RVM

[Figure: bar chart of speedup over baseline (axis 0 to 7) for JIT 0, JIT 1, JIT 2, and the cost/benefit Model. Benchmarks: jbb-300, ipsixql, compress, jess, db, javac, mpegaudio, mtrt, jack, and the geometric mean.]

Feedback-Directed Optimization (FDO)

• Exploit information gathered at run-time to optimize execution
  − “selective optimization”: what to optimize
  − “FDO”: how to optimize

Advantages of FDO
• Can exploit dynamic information that cannot be inferred statically
• System can change and revert decisions when conditions change
• Runtime binding allows more flexible systems

Challenges for automatic online FDO

• Compensate for profiling overhead
• Compensate for runtime transformation overhead
• Account for partial profile available and changing conditions

Profiling for What to Do

Clients
• Inlining, unrolling, method dispatch
• Dispatch tables, synchronization services, GC
• Prefetching
  − Misses, hardware performance monitors [Adl-Tabatabai et al.’04]
• Code layout
• Values - loop counts, edges & paths

Profiling for What to Do

Myth: Sophisticated profiling is too expensive to perform online

Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead

Method Profiling: Timer Based

class Thread
scheduler (...) {
  ...
  flag = 1;
}

void handler(...) {
  // sample stack, perform GC, swap threads, etc.
  ....
  flag = 0;
}

foo ( … ) {
  // on method entry, exit, & all loop backedges
  if (flag) {
    handler( … );
  }
  . . .
}

• Useful for more than profiling
• Jikes RVM
  − Schedule garbage collection
  − Thread scheduling policies, etc.

[Figure: call stack ABC, with an "if (flag) handler();" check in each of the three methods]
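
The flag-and-check scheme above can be simulated deterministically in Python. This is a sketch, not the Jikes RVM mechanism: the names `timer_tick`, `yieldpoint`, and `work` are hypothetical, and the timer is simulated by an explicit call inside the loop, whereas a real tick arrives asynchronously from the thread scheduler.

```python
from collections import Counter

flag = False          # set by the timer, tested at every check
samples = Counter()   # how often each method was on top of the stack when sampled
call_stack = []       # explicit stand-in for the real call stack

def timer_tick():
    """Stand-in for the scheduler's timer: it only sets the flag."""
    global flag
    flag = True

def handler():
    """Runs at the first check after a tick: take a sample, clear the flag."""
    global flag
    if call_stack:
        samples[call_stack[-1]] += 1
    flag = False

def yieldpoint():
    """The cheap check compiled into method entries and loop back edges."""
    if flag:
        handler()

def work(n, tick_every):
    call_stack.append("work")
    yieldpoint()                  # check on method entry
    for i in range(n):
        if i % tick_every == 0:
            timer_tick()          # simulated tick (assumption of this sketch)
        yieldpoint()              # check on the loop back edge
    call_stack.pop()

work(10, 5)
print(samples)
```

The common case costs one flag test per check; the sampling work happens only when the timer has fired, and the same hook can carry garbage collection or thread switching.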

Arnold-Ryder [PLDI 01]: Full Duplication Profiling

[Figure: Full-Duplication Framework. A method is shown as Checking Code alongside Duplicated Code; check placement is on method entry and on backedges.]

• Generate two copies of a method
• Execute “fast path” most of the time
• Execute “slow path” with detailed profiling occasionally
• Adapted by J9 due to proven accuracy and low overhead
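
The fast-path / slow-path idea can be illustrated with a small Python sketch. It assumes counter-based checks: a global countdown is decremented at each check and, when it reaches zero, one pass through the instrumented copy is taken. `SAMPLE_INTERVAL`, `check`, and `sum_to` are hypothetical, and the two copies of the loop body are written inline here rather than generated by a compiler.

```python
from collections import Counter

SAMPLE_INTERVAL = 4           # hypothetical: take the slow path on every 4th check
countdown = SAMPLE_INTERVAL
edge_profile = Counter()      # detailed profile, collected only on the slow path

def check():
    """Check placed on method entry and backedges; True means sample now."""
    global countdown
    countdown -= 1
    if countdown == 0:
        countdown = SAMPLE_INTERVAL
        return True
    return False

def sum_to(n):
    total = 0
    if check():                       # check on method entry
        edge_profile["entry"] += 1
    i = 0
    while i < n:
        if check():                   # check on the backedge
            # duplicated code: same work plus instrumentation (slow path)
            edge_profile["loop"] += 1
            total += i
        else:
            # checking code: same work, no instrumentation (fast path)
            total += i
        i += 1
    return total

result = sum_to(10)
print(result, edge_profile)
```

Both copies compute the same result, so the sampling rate can be tuned to trade profile detail against overhead without affecting program behavior.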

Suggested Reading

Dynamic Compilation
• Adaptive optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.