Michael J. Voss and Rudolf Eigenmann, PPoPP '01 (Presented by Kanad Sinha)



TRANSCRIPT

Page 1: Michael J. Voss and Rudolf Eigenmann, PPoPP '01 (Presented by Kanad Sinha)

Page 2:

- Motivation
- General choices for adaptive optimization
- ADAPT
  - The Architecture
  - The Language
  - An example
- Results

Page 3:

There's only so much optimization that can be performed at compile time.

Compilers must generate code for generic system models, making compile-time assumptions that may be sensitive to input unknown until runtime.

Convergence of technologies makes it difficult to generate a common binary that exploits individual system characteristics.

Page 4:

Possible solution?

“Use of adaptive and dynamic optimization paradigms, where optimization is performed at runtime when complete system and input knowledge is available.”

Page 5:

Choose from statically generated code variants
+ Easy
- May not result in the maximum possible optimization
- Can result in code explosion

Parameterization
+ Single copy of source
- May still not result in the maximum possible optimization

Dynamic compilation
+ Complete input and system knowledge, so maximum optimization is possible
- Considerable runtime overhead
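The first option above, choosing from statically generated code variants, can be sketched as runtime dispatch (a hypothetical Python illustration; the paper targets compiled C/Fortran code, and all names here are invented):

```python
# Two statically generated variants of the same computation; a cheap
# runtime check picks one, replacing a compile-time assumption about
# input size. Purely illustrative, not from the paper.

def sum_small(xs):
    # Variant tuned for short inputs: a plain loop.
    total = 0
    for x in xs:
        total += x
    return total

def sum_large(xs):
    # Variant tuned for long inputs: the built-in path.
    return sum(xs)

def summed(xs, threshold=1000):
    # The runtime check selects a variant once the input size is known.
    variant = sum_small if len(xs) < threshold else sum_large
    return variant(xs)
```

The downside noted above shows here too: one variant per assumption, so the number of precompiled versions grows with the number of cases covered.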

Page 6:

Automated De-Coupled Adaptive Program Optimization

Generic framework, which leverages existing tools

Uses a domain-specific language, AL, by which adaptive techniques can be specified

Page 7:

Supports dynamic compilation and parameterization

Enables optimizations through “runtime sampling”

Facilitates an iterative modification and search approach

Page 8:

Three functions of a dynamic/adaptive optimization system:

1. Evaluate the effectiveness of a particular optimization for the current input and system information
2. Apply the optimization if profitable
3. Re-evaluate applied optimizations and tune them according to current runtime conditions
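The three functions can be sketched as a simple search loop (hypothetical Python; `candidates` and `measure` are invented names, and a real system would measure actual execution behavior):

```python
# Hypothetical sketch of the three functions: evaluate each candidate
# optimization under current conditions, keep the profitable one, and
# repeat so the choice can be re-tuned as conditions change.

def adapt(candidates, measure, rounds=3):
    """candidates: dict mapping name -> code version.
    measure: runs a version under current conditions, returns its cost."""
    best_name, best_cost = None, float("inf")
    for _ in range(rounds):                  # 3. re-evaluate over time
        for name, version in candidates.items():
            cost = measure(version)          # 1. evaluate effectiveness
            if cost < best_cost:             # 2. apply if profitable
                best_name, best_cost = name, cost
    return best_name
```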

Page 9:
Page 10:

The runtime system consists of:
- A modified version of the application
- A remote optimizer
  - has the source code, a description of the target machine, and stand-alone tools & compilers
- A local optimizer
  - agent of the remote optimizer on the target system
  - detects hot spots
  - tracks multiple interval contexts (here, loop bounds)
  - runs in a separate thread

Optimization and execution are truly asynchronous.

Page 11:

When a hotspot is detected, the local optimizer (LO) invokes the remote optimizer (RO)

The RO tunes the interval using the available tools, according to user-specified heuristics

The RPC returns

If new code is available, it is dynamically linked into the application as the new best or experimental version, depending on the RO's message
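The dynamic-linking step can be emulated in Python by loading freshly written code as a module (a loose analogy only; ADAPT links natively compiled object code, and all names here are invented):

```python
# Emulates "if new code is available, dynamically link it into the
# application": the optimizer writes out a new version, and the running
# program loads and calls it without restarting.
import importlib.util
import os
import tempfile

def link_new_version(source_text, name="generated_variant"):
    # Write the generated source to a temporary file...
    path = os.path.join(tempfile.mkdtemp(), name + ".py")
    with open(path, "w") as f:
        f.write(source_text)
    # ...then load it as a live module the application can call into.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# The "remote optimizer" would produce this source; here it is hard-coded.
variant = link_new_version("def kernel(n):\n    return n * 2\n")
```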

Page 12:
Page 13:

Candidate code sections have two control-flow paths:
- through the best known version
- through the experimental version

Each of these can be replaced dynamically

A flag indicates which version to execute

Experimental versions of each context are monitored:
- collected data is used as feedback
- if better, the experimental version is swapped in as the best known version
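The two-path structure can be sketched as a wrapper that dispatches on the flag, times experimental runs, and promotes a winner (hypothetical Python; the class and attribute names are invented):

```python
# Sketch of a candidate code section with two control-flow paths.
# A flag selects best-known vs. experimental; experimental runs are
# timed, and a faster experimental version becomes the new best.
import time

class AdaptiveSection:
    def __init__(self, best):
        self.best = best
        self.best_time = float("inf")   # no measurement recorded yet
        self.experimental = None
        self.use_experimental = False   # the dispatch flag

    def call(self, *args):
        if self.use_experimental and self.experimental is not None:
            start = time.perf_counter()
            result = self.experimental(*args)
            elapsed = time.perf_counter() - start
            if elapsed < self.best_time:          # feedback: swap if better
                self.best, self.best_time = self.experimental, elapsed
            return result
        return self.best(*args)
```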

Page 14:

The optimization process is outside the critical path, decoupled from execution

Page 15:

ADAPT Language (AL) *

Features:
- Uses an LL(1) grammar, so the parser is simple
- Domain-specific language with C-style syntax
- Defines reserved words that at runtime contain useful input data and system information

* "A full description of ADAPT language is beyond the scope of this paper", and by extension, this presentation.

Page 16:
Page 17:

- Initialize some variables
- Constraints
- Interface to the tool to be used
- This block defines the heuristic

Page 18:

Statement | Description
--- | ---
constraint(compile-time constraint) | Supplies a compile-time constraint
apply_spec(condition, type, syntax[, params]) | A description of a tool or flag
collect (event list) execute; | Initiates the monitoring of an experimental code version
mark_as_best | Specifies that the code variant that would be generated under the current runtime conditions is a new best known version
end_phase | Denotes the end of an optimization phase
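Pieced together from the statements above, a phase might look roughly like this (illustrative pseudocode only; the paper says a full description of AL is out of scope, so the ordering and arguments shown here are guesses, not real AL syntax):

```
constraint(is_innermost_loop)          /* compile-time constraint */
apply_spec(true, flag, "-O2")          /* describe a tool or flag to try */
collect (cycles) execute;              /* monitor an experimental version */
mark_as_best                           /* promote the winning variant */
end_phase                              /* end of this optimization phase */
```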

Page 19:

Test machines: a 6-core Sun Ultra Enterprise 4000 and a single-core Pentium II Linux workstation

Experiments and results:

Useless Copying – run a dynamically compiled version of the code without applying any optimization
Result: overhead of less than ~5%; some cases even show a speed-up

Specialization – loop bounds replaced as constants by their runtime values
Result: average improvement of 13.6% on the E4000 and 2.2% on the Pentium

Flag Selection – experiment with various combinations of compiler flags
Result: average improvement of 35% on the E4000 and 9.2% on the Pentium; identified some non-intuitive flag choices

Loop Unrolling – loops unrolled by factors that evenly divide the number of iterations of the innermost loop, up to a maximum factor of 10
Result: average improvement of 18% on the E4000 and 5% on the Pentium
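The unrolling search space above can be sketched directly (hypothetical Python; the function name is invented):

```python
# Candidate unroll factors: those that evenly divide the innermost
# loop's iteration count, up to a maximum factor of 10.
def unroll_factors(iterations, max_factor=10):
    return [f for f in range(1, max_factor + 1) if iterations % f == 0]
```

Restricting candidates to even divisors avoids generating remainder-handling code for a leftover partial iteration.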

Loop Tiling – loops deemed appropriate are tiled for 1/2, 1/4, ..., 1/16 of the L2 cache size
Result: average improvement of 13.5% on the E4000 and 9.8% on the Pentium
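The tiling candidates above, fixed fractions of the L2 cache size, can be sketched the same way (hypothetical Python; assumes the fractions are the powers of two from 1/2 down to 1/16):

```python
# Candidate tile sizes: 1/2, 1/4, 1/8, and 1/16 of the L2 cache size.
def tile_sizes(l2_bytes):
    return [l2_bytes // d for d in (2, 4, 8, 16)]
```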

Parallelization – loops deemed appropriate by Polaris are parallelized
Result: average improvement of 51.8% on the E4000

Page 20:

There is an advantage in doing runtime optimization

It can be applied to general-purpose programs as well

For full-blown runtime optimization, the optimization process needs to be moved outside the critical path

Page 21:

if (questions("?!") == 1)
    delay();
THANK_YOU("Have a great weekend!");