
Page 1: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Utsunomiya University, Seminar@UW-Madison, November 12, 2004

Profile-Based Dynamic Optimization Research for Future Computer Systems

Takanobu Baba

Department of Information Science

Utsunomiya University, Japan

http://aquila.is.utsunomiya-u.ac.jp

November 12, 2004

 

Page 2: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Brief history of 'my' research

• 1970s: The MPG System, a Machine-Independent Efficient Microprogram Generator
• 1980s: MUNAP, a Two-Level Microprogrammed Multiprocessor Computer
• 1990s: A-NET, a Language-Architecture Integrated Approach for Parallel Object-Oriented Computation

Page 3: Profile-Based Dynamic Optimization Research  for Future Computer Systems

A Two-Level Microprogrammed Multiprocessor Computer: MUNAP

A 28-bit vertical microinstruction activates up to 4 nanoprograms in 4 PUs every machine cycle.

[Figure: the MUNAP machine]

Page 4: Profile-Based Dynamic Optimization Research  for Future Computer Systems

A Parallel Object-Oriented Total Architecture: A-NET (Actors-NETwork)

• Massively parallel computation
• Each node consists of a PE and a router.
• The PE has a language-oriented, typical CISC architecture.
• The programmable router is topology-independent.

[Figure: the A-NET multicomputer]

Page 5: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Current dynamic optimization projects

• Computation-oriented:
  – YAWARA: A meta-level optimizing computer system
  – HAGANE: Binary-level multithreading
• Communication-oriented:
  – Spec-All: Aggressive Read/Write Access Speculation Method for DSM Systems
  – Cross-Line: Adaptive Router Using Dynamic Information

Page 6: Profile-Based Dynamic Optimization Research  for Future Computer Systems


YAWARA: A Meta-Level Optimizing Computer System

Page 7: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Background

• Moore's Law will be maintained by semiconductor technology.
• How can we utilize the huge number of transistors to speed up program execution?
• Our idea is to use some of the chip area to dynamically and autonomously tune the configuration of an on-chip multiprocessor.

Page 8: Profile-Based Dynamic Optimization Research  for Future Computer Systems

[Figure: base-level and meta-level organization. At the base level, a processor fetches instructions and data from memory and produces the results of computation. A meta-level processor observes a profile of control and data and feeds the results of optimization back to the base level.]

Page 9: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Design considerations

• HW vs. SW reconfiguration → SW reconfiguration
• Static vs. dynamic reconfiguration → both static and dynamic reconfiguration capability
• Homogeneous vs. heterogeneous architecture → a unified homogeneous structure

Page 10: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Basic concepts of thread-level reconfiguration

MT: Management Thread, PT: Profiling Thread, OT: Optimizing Thread, CT: Computing Thread

[Figure: at the meta-level, a Management Thread coordinates groups of Profiling Threads and Optimizing Threads; at the base level, Computing Threads execute the application. The PTs profile the running CTs, the OTs perform optimization, and the MT manages memory and the allocation of threads.]

Page 11: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Execution model

[Figure: the Management Thread (MT) activates Computing Threads (CT) and Profiling Threads (PT). A PT collects the profile and, when the optimization-initiation condition is satisfied, wakes up an Optimizing Thread (OT), which optimizes and then goes back to sleep. Two modes are shown: profiling-centric, where the PT runs continuously alongside the CT, and computing-centric, where the PT mostly sleeps and is woken up only to collect a profile.]

Page 12: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Change of configurations by meta-level optimization

[Figure: three snapshots of the thread-engine array. The balance between MT, OT, PT, and CT threads at the meta-level and base-level changes over time; as optimization proceeds, profiling and optimizing threads are replaced by additional computing threads.]

Page 13: Profile-Based Dynamic Optimization Research  for Future Computer Systems

The YAWARA System

• an implementation of the computation model
• the SW system consists of static and dynamic optimization systems
• the HW system includes uniformly structured thread engines (TEs); each TE can execute base- and meta-level threads

The spirit of YAWARA: "A flexible method prevails where a rigid one fails."

Page 14: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Software System

[Figure: source code (C/C++, Java, Fortran, …) is compiled by the SOS (Static Optimization System) using code analysis info and the execution profile as static feedback. The resulting executable image runs on the array of thread engines (TEs); the run-time profile drives the DOS (Dynamic Optimization System), which applies dynamic feedback to the running code and produces the execution results.]

Page 15: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Hardware System

[Figure: a 4×4 array of thread engines (TEs) connected by a network. Each TE contains network-in/out ports, a register file, 4 integer units and 1 floating-point unit, an I$ and a D$, a thread-data cache, a thread-code cache holding thread-0 … thread-N, a profiling buffer with a profiling controller, and execution control with feedback-directed resource control.]

Page 16: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Example application: compress

[Figure: control-flow graph of a hot loop in compress, with the hot path highlighted, and its phased behavior over time.]

• hot loop / hot path detection (PT, OT)
• speculative multithreading profiling (PT)
• speculative multithreading code generation, helper thread generation, and path predictor generation (OT)
• thread management (MT)

[Figure: speculative multithreading using the path prediction mechanism. Computing threads (CT) speculatively execute hot path #0 across iterations i-1, i, i+1; on a path-predictor hit, execution continues, and on a miss, execution switches to hot path #1.]

Page 17: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Conclusion: YAWARA

• We proposed an autonomous reconfiguration mechanism based on dynamic behavior.
• We also proposed a software and hardware system, called YAWARA, that implements the reconfiguration efficiently.
• We are now developing the software system and the simulator.

Page 18: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Prediction and Execution Methods of Frequently Executed Two Paths for Speculative Multithreading

(YAWARA@PDCS2004)

Page 19: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Occurrence ratios of the top-two paths

  benchmark/function    #1 path  #2 path  other paths
  compress/compress     54.5%    22.4%    23.1%
  ijpeg/forward_DCT     48.2%    42.1%     9.7%
  m88ksim/killtime      97.0%     3.0%     0.0%
  li/sweep              80.7%    19.3%     0.0%

The top two paths occupy roughly 80-100% of the execution.

Page 20: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Two-level path prediction

• Introducing two-level branch prediction:
  – a history register keeps the sequence of #1-path executions (1: #1 path, 0: the other paths)
  – a counter table counts #1-path executions per history pattern

[Figure: Single Path Predictor (SPP). A history register value of 1101 selects counter v13 in the counter table (v0 … v15); if v13 >= threshold X, predict path #1, otherwise predict path #2.]
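The SPP mechanism above can be sketched in a few lines. This is a minimal illustrative model, not the actual YAWARA hardware: the 4-bit history, 2-bit saturating counters, and threshold X = 2 are assumed sizes.

```python
class SinglePathPredictor:
    """Sketch of the Single Path Predictor (SPP): a history register of
    #1-path outcomes indexes a table of saturating counters."""

    def __init__(self, history_bits=4, threshold=2, counter_max=3):
        self.history_bits = history_bits
        self.history = 0                              # shift register of outcomes
        self.counters = [0] * (1 << history_bits)     # v0 .. v15 for 4-bit history
        self.threshold = threshold                    # the threshold X
        self.counter_max = counter_max

    def predict(self):
        # Predict #1 when the counter selected by the current history
        # pattern reaches the threshold X; otherwise predict #2.
        return 1 if self.counters[self.history] >= self.threshold else 2

    def update(self, taken_path):
        idx = self.history
        if taken_path == 1:
            self.counters[idx] = min(self.counters[idx] + 1, self.counter_max)
        else:
            self.counters[idx] = max(self.counters[idx] - 1, 0)
        # Shift the outcome (1 for #1 path, 0 for any other path) into history.
        bit = 1 if taken_path == 1 else 0
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | bit) & mask
```

After warming up on a stable path pattern, the predictor follows it; a loop that almost always takes the #1 path quickly saturates the selected counter and predicts #1.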

Page 21: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Another path predictor

[Figure: Dual Path Predictor (DPP). Separate history registers and counter tables are kept for the #1 path and the #2 path; e.g. a #1-path history of 1101 selects v13 in the #1-path counter table, while a #2-path history of 0010 selects v2 in the #2-path table. If v13 >= v2, predict path #1, otherwise predict path #2.]
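A corresponding sketch of the DPP, again with assumed sizes (4-bit histories, 2-bit saturating counters) rather than the real hardware parameters: two history/counter pairs track the #1 and #2 paths independently, and the path whose selected counter is larger wins.

```python
class DualPathPredictor:
    """Sketch of the Dual Path Predictor (DPP): one history register and
    counter table per path; predict the path with the larger counter."""

    def __init__(self, history_bits=4, counter_max=3):
        self.bits = history_bits
        self.h = [0, 0]                                    # histories for #1, #2
        self.tables = [[0] * (1 << history_bits) for _ in range(2)]
        self.counter_max = counter_max

    def predict(self):
        c1 = self.tables[0][self.h[0]]                     # e.g. v13 in #1 table
        c2 = self.tables[1][self.h[1]]                     # e.g. v2 in #2 table
        return 1 if c1 >= c2 else 2

    def update(self, taken_path):
        mask = (1 << self.bits) - 1
        for p in (0, 1):                                   # p=0 tracks #1, p=1 tracks #2
            hit = (taken_path == p + 1)
            idx = self.h[p]
            t = self.tables[p]
            t[idx] = min(t[idx] + 1, self.counter_max) if hit else max(t[idx] - 1, 0)
            self.h[p] = ((self.h[p] << 1) | (1 if hit else 0)) & mask
```

Comparing two counters instead of one counter against a fixed threshold lets the prediction adapt when the #2 path temporarily dominates.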

Page 22: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Single Speculation (SS)

When a thread fails:
• abort the succeeding threads
• run the recovery process
• execute the non-speculative thread
• then continue speculative execution

Speculation failure degrades performance.

[Figure: a pipeline of #1-path speculative threads; a misspeculation aborts the successors, triggers the recovery process and non-speculative execution, and then speculative execution resumes.]

Page 23: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Double Speculation (DS)

• Even when the first speculation fails, the secondary choice has a high hit probability, because the top two paths are dominant.

  compress/compress   (54.5% / 22.4%)  expected #2 hit = 49.2%
  ijpeg/forward_DCT   (48.2% / 42.1%)  expected #2 hit = 81.3%
  m88ksim/killtime    (97.0% /  3.0%)  expected #2 hit = 100%
  li/sweep            (80.7% / 19.3%)  expected #2 hit = 100%
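The expected #2 hit figures above follow directly from the occurrence ratios: given that the #1 path was not taken, the probability that the #2 path was taken is p2 / (1 - p1), assuming path outcomes are independent of predictor state. A quick check against the slide's numbers:

```python
def expected_secondary_hit(p1, p2):
    """Probability the #2 path is taken, given the #1 path was not:
    P(#2 | not #1) = p2 / (1 - p1)."""
    return p2 / (1.0 - p1)

# (#1 ratio, #2 ratio) per benchmark, from the occurrence-ratio slide
ratios = {
    "compress/compress": (0.545, 0.224),
    "ijpeg/forward_DCT": (0.482, 0.421),
    "m88ksim/killtime":  (0.970, 0.030),
    "li/sweep":          (0.807, 0.193),
}
for name, (p1, p2) in ratios.items():
    print(f"{name}: expected #2 hit = {expected_secondary_hit(p1, p2):.1%}")
```

This reproduces 49.2% for compress, 81.3% for forward_DCT, and 100% for killtime and sweep, where the top two paths cover all executions.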

Page 24: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Double Speculation (DS)

If the secondary speculation succeeds, the performance loss is not so large.

[Figure: when the #1-path speculation fails, the recovery process is followed by a secondary speculation on the #2 path instead of non-speculative execution, after which #1-path speculative execution continues.]

Page 25: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Evaluation flow

• path history acquisition (SIMCA) → path execution history
• hot-path detection (SIMCA)
• thread-code generation → thread codes:
  – #1-path speculative thread
  – #2-path speculative thread
  – non-speculative thread
• performance estimator (fed by the thread codes and the path execution history) → speculation hit ratio, speed-up ratio

Page 26: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Prediction success ratio

[Figure: prediction success ratio (%) of SPP vs. DPP as a function of history length (1-16), for compress and forward_DCT.]

Page 27: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Prediction success ratio

[Figure: prediction success ratio (%) of SPP vs. DPP as a function of history length (1-16), for killtime and sweep.]

Page 28: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Speed-up ratio

[Figure: speed-up ratio of SS vs. DS as a function of history length (1-16), for compress and forward_DCT, with the S100 and P1only configurations shown for comparison.]

Page 29: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Speed-up ratio

[Figure: speed-up ratio of SS vs. DS as a function of history length (1-16), for killtime and sweep, with the S100 and P1only configurations shown for comparison.]

Page 30: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Conclusions: Two-Path-Limited Speculative Multithreading

• We proposed path prediction methods and predictors, and speculation methods, for path-based speculative multithreading.
• Preliminary performance estimation results were shown.

Page 31: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Current and future work

• accurate and detailed evaluation on various applications: SPEC 2000, MediaBench, …
• integration into our dynamic optimization framework YAWARA

Page 32: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Current dynamic optimization projects

• Computation-oriented:
  – YAWARA: A meta-level optimizing computer system
  – HAGANE: Binary-level multithreading
• Communication-oriented:
  – Spec-All: Aggressive Read/Write Access Speculation Method for DSM Systems
  – Cross-Line: Adaptive Router Using Dynamic Information

Page 33: Profile-Based Dynamic Optimization Research  for Future Computer Systems

HAGANE: Binary-Level Multithreading

Page 34: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Background

• Multithreaded programming is not so easy.
  → automatic multithreading system

However…

• Source code is not always available.
  → multithreading at the binary level

Page 35: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Binary Translator & Optimizer System

[Figure: the source binary code is translated by the STO (Static Translator & Optimizer) into statically translated multithreaded binary code, together with analysis info. At run time, the DTO (Dynamic Translator & Optimizer) works on the process memory image, guided by the execution profile info, and produces dynamically translated multithreaded binary code for the multithread processor.]

Page 36: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Thread Pipelining Model

[Figure: threads i, i+1, and i+2 each execute the stages Continuation → TSAG → Computation → Write-back, with successive threads overlapped in a pipelined fashion. TSAG = Target Store Address Generation.]

• Loop iterations are mapped onto threads.
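The benefit of overlapping the stages can be sketched with a small timing model. The stage latencies below and the start rule (a thread may begin once its predecessor finishes the Continuation stage, which forwards loop-carried values) are illustrative assumptions, not measured HAGANE numbers:

```python
# Illustrative stage latencies (cycles) for one thread.
STAGES = {"continuation": 1, "tsag": 2, "computation": 6, "writeback": 1}

def pipelined_time(n_threads):
    """Total time when thread i+1 starts as soon as thread i has
    completed its Continuation stage (threads overlap)."""
    per_thread = sum(STAGES.values())
    return STAGES["continuation"] * (n_threads - 1) + per_thread

def sequential_time(n_threads):
    """Total time when loop iterations run back to back."""
    return sum(STAGES.values()) * n_threads

# With these latencies, 8 overlapped threads take 17 cycles
# instead of 80 when run sequentially.
print(pipelined_time(8), sequential_time(8))
```

The model also makes the later benchmark observations plausible: when the Computation stage (the thread body) is small relative to the fixed stages, the overlap buys little, which is why loop unrolling helps small loop bodies.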

Page 37: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Example translation

Source Binary Code:

    mtc1   $zero[0], $f4
    addu   $v1[3], $zero[0], $zero[0]
$BB1:
    l.s    $f0, 0($a0[4])
    l.s    $f2, 0($a1[5])
    mul.s  $f0, $f0, $f2
    addiu  $v1[3], $v1[3], 1
    add.s  $f4, $f4, $f0
    slti   $v0[2], $v1[3], 5000
    addiu  $a1[5], $a1[5], 4
    addiu  $a0[4], $a0[4], 4
    bne    $v0[2], $zero[0], $BB1
$BB2:
    mov.s  $f0, $f4
    jr     $ra[31]

Translated Code (Cont. / TSAG / Comp. / W.B. stages):

    mtc1   $zero[0], $f4
    addu   $v1[3], $zero[0], $zero[0]
    bstr
    slti   $v0[2], $v1[3], 5000
    beq    $v0[2], $zero[0], $ST_LL0
    addu   $t0[8], $a0[4], $zero[0]
    addu   $t1[9], $a1[5], $zero[0]
    addi   $v1[3], $v1[3], 1
    addi   $a0[4], $a0[4], 4
    addi   $a1[5], $a1[5], 4
    lfrk
    wtsagd
    addu   $t2[10], $sp[28], $zero[0]
    altsw  $t2[10]
    tsagd
    l.s    $f0, 0($t0[8])
    l.s    $f2, 0($t1[9])
    l.s    $f4, 0($t2[10])
    mul.s  $f0, $f0, $f2
    add.s  $f4, $f4, $f0
    sttsw  $t2[10], $f4
$ST_LL0:
    estr
    mov.s  $f0, $f4
    jr     $ra[31]

The translated code adds thread management instructions and overhead code for multithreading.

Page 38: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Superthreaded Architecture

[Figure: multiple Thread Processing Units, each containing an execution unit, a communication unit, a memory buffer, and a write-back unit, share an L1 instruction cache and an L1 data cache.]

Page 39: Profile-Based Dynamic Optimization Research  for Future Computer Systems

m88ksim (SPECint95)

[Figure: speedup ratio (0-3) for 4, 8, and 16 thread units, with no unrolling and unroll factors 4, 8, and 16.]

• poor speedup ratios
• loop unrolling does not affect the performance
• the number of iterations is quite small

Page 40: Profile-Based Dynamic Optimization Research  for Future Computer Systems

ijpeg (SPECint95)

[Figure: speedup ratio (0-5) for 4, 8, and 16 thread units, with no unrolling and unroll factors 4, 8, and 16.]

• the thread code size is too small to hide the thread management overhead
• loop unrolling is effective to achieve good speedup ratios
• excessive loop unrolling causes performance degradation
• the number of iterations is not so large

Page 41: Profile-Based Dynamic Optimization Research  for Future Computer Systems

swim (SPECfp95)

[Figure: speedup ratio (0-11) for 4, 8, and 16 thread units, with no unrolling and unroll factors 4, 8, and 16.]

• good speedup ratios
• loop unrolling is effective to achieve linear speedup
• the number of iterations is large

Page 42: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Conclusion: HAGANE

• We have evaluated binary-level multithreading using several SPEC95 benchmark programs.
• The performance evaluation results indicate:
  – the thread code size should be large enough to improve the performance
  – loop unrolling is effective for small loop bodies
  – excessive loop unrolling degrades performance

Page 43: Profile-Based Dynamic Optimization Research  for Future Computer Systems

A Methodology of Binary-Level Variable Analysis for Multithreading

(HAGANE@PDCS2004)

Page 44: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Background and Objective

Usually, loop iterations are interrelated through memory variables, such as induction variables. However, it is difficult to analyze this kind of dependency at the binary level. A binary-level variable analysis method is therefore strongly required for binary-level multithreading.

Page 45: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Example Binary Code

Source loop:

    for (i = 1; i < N; i++) {
        z = i * 2;
        x = a[i-1];
        y = x * 3;
        a[i] = z + y;
    }

Compiled binary:

    lw    $a1[5], 16($s8[30])
    lw    $v1[3], 16($s8[30])
    lw    $a0[4], 16($s8[30])
    sll   $v1[3], $v1[3], 0x2
    addu  $v1[3], $v1[3], $a2[6]
    lw    $v0[2], 16($s8[30])
    lw    $v1[3], -4($v1[3])
    addiu $v0[2], $v0[2], 1
    sw    $v0[2], 16($s8[30])
    lw    $v0[2], 16($s8[30])
    sll   $a1[5], $a1[5], 0x1
    sll   $a0[4], $a0[4], 0x2
    sll   $v0[2], $v1[3], 0x1
    addu  $v0[2], $v0[2], $v1[3]
    lw    $v1[3], 16($s8[30])
    addu  $a0[4], $a0[4], $a2[6]
    addu  $a1[5], $a1[5], $v0[2]
    sw    $a1[5], 0($a0[4])
    slt   $v1[3], $v1[3], $a3[7]

[Figure: threads j and j+1 each increment i, load a[i-1], and store a[i]; the store to 0($a0[4]) in thread j and the load from -4($v1[3]) in thread j+1 may reference the same memory location, creating an inter-thread dependency.]

Page 46: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Binary-Level Variable Analysis

(1) Register values are analyzed using dataflow trees.

(2) When register values used for memory references are judged to be the same, the memory location is regarded as a virtual register.

(3) Using the virtual registers, steps (1) and (2) are repeated.
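Step (2) can be sketched as follows. This is an illustrative model only: addresses are represented as (base-register version, offset) pairs standing in for the analyzed dataflow trees, and the function and field names are hypothetical, not part of the actual HAGANE implementation.

```python
def assign_virtual_registers(mem_refs):
    """Collapse memory references with identical address expressions
    into named virtual registers ($V1, $V2, ...)."""
    names = {}      # address expression -> virtual register name
    virtual = {}    # instruction -> virtual register name
    for ref in mem_refs:
        key = (ref["base"], ref["offset"])      # simplified address expression
        if key not in names:
            names[key] = f"$V{len(names) + 1}"
        virtual[ref["insn"]] = names[key]
    return virtual

# Modeled on the example binary above: the loads/stores of the loop
# counter all use 16($s8), so they map to one virtual register.
refs = [
    {"insn": "lw $v0, 16($s8#0)", "base": "$s8#0", "offset": 16},
    {"insn": "sw $v0, 16($s8#0)", "base": "$s8#0", "offset": 16},
    {"insn": "lw $v1, -4($v1#1)", "base": "$v1#1", "offset": -4},
]
print(assign_virtual_registers(refs))
```

Once the counter's memory slot is named as a virtual register, step (1) can be rerun with that register treated like any other, which is what makes step (3) converge on memory-resident variables.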

Page 47: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Construction of the Dataflow Tree

SSA-form code:

    addiu $29#1, $29#0, -8
    sw    $0, 0($29#1)
    addu  $5#1, $0, $0
    lw    $2#1, 0($29#1)
    addu  $3#1, $5#1, $4#0
    addiu $5#2, $5#1, 1
    addu  $2#2, $2#1, $3#1
    sw    $2#2, 0($29#1)
    slti  $2#3, $5#2, 100
    bne   $2#3, $0, L1

[Figure: the dataflow tree for $2#2: $2#2 = $2#1 + $3#1, where $3#1 = $5#1 + $4#0 and $5#1 = $0 + $0.]

Page 48: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Example Normalization

[Figure: a dataflow tree for $2#2 containing a shift (sll) and nested additions is normalized into a canonical multiply-add form over $4#0, so that equivalent address expressions can be compared structurally.]

Page 49: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Detection of Loop Induction Variables

A loop induction variable is a register that
– has an inter-iteration dependency, and
– increases by a fixed value between iterations.

[Figure: the dataflow tree $V2#2 = $V2#1 + 1.]

The concept of the virtual register makes it possible to detect induction variables on memory.
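The detection rule above can be sketched directly. This is an illustrative model, not the real analyzer: per-iteration register updates are given as already-normalized (base register, added constant) pairs, and virtual registers such as $V2 participate exactly like hardware registers.

```python
def find_induction_registers(iteration_updates):
    """iteration_updates maps each register to its value after one
    iteration, expressed as (base_register, added_constant).
    A register is an induction variable when it depends on its own
    previous value and advances by a fixed non-zero amount."""
    induction = {}
    for reg, (base, const) in iteration_updates.items():
        if base == reg and const != 0:
            induction[reg] = const
    return induction

# $V2 models the memory-resident counter ($V2#2 = $V2#1 + 1);
# $a0 is an array pointer stepped by 4; $f0 is a plain data value.
updates = {"$V2": ("$V2", 1), "$a0": ("$a0", 4), "$f0": ("$f2", 0)}
print(find_induction_registers(updates))
```

The stride values returned here are exactly the increment/decrement values the later conclusion refers to: they let the translator precompute each thread's starting register values.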

Page 50: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Application

• 101.tomcatv of the SPECfp95 benchmark
• Fortran-to-C translator ver. 19940927
• GCC cross compiler ver. 2.7.2.3 for SIMCA
• Data set: test
• The six innermost loops (#1-#6) are selected
• They have induction variables on memory

Page 51: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Speedup Ratios

  Loop     #1     #2     #3     #4     #5     #6     ALL
  Speedup  9.804  1.643  5.178  1.800  3.583  2.611  5.361

Page 52: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Conclusion: Binary-Level Variable Analysis

• We proposed a binary-level variable analysis method.
• The method makes it possible to detect induction variables and their increment/decrement values.
• The detected information allows us to multithread binary codes that could not be multithreaded without our algorithm.
• We attained up to 9.8× speedup by the multithreading.

Page 53: Profile-Based Dynamic Optimization Research  for Future Computer Systems

Summary

• We presented the dynamic optimization projects at our laboratory.
• The results quantitatively show the performance improvement in each project.

Page 54: Profile-Based Dynamic Optimization Research  for Future Computer Systems

What's the next step of computer architecture research?

• from performance to reliability, or to low power?
  e.g., dependable computing
• architectures for new device technologies?
  e.g., quantum computing

However, if we stick to conventional high-performance computing research, what's the promising way?