
ODES-4, 26 March 2006

Performance Optimization for Low-Leakage Caches based on Sleep-Line Access Density

Reiko Komiya†, Koji Inoue‡ and Kazuaki Murakami‡

†Fukuoka University, Japan; ‡Kyushu University, Japan


Outline

• Introduction
– Leakage energy of cache memory
– Conventional low-leakage cache: Cache decay

• Problem of cache decay approach

• Solution: Always-Active approach

• Evaluation

• Conclusions


Introduction

• Energy consumption = Dynamic energy + Static energy
– Dynamic energy: consumed by charging and discharging
– Static energy: consumed by leakage current

• Leakage energy increases as process technology advances

[Figure: the breakdown of energy consumption (dynamic vs. static power) in a processor family *1]

[Figure: power analysis of the ARM920T *2; cache energy is 44%]

Cache leakage reduction is very important!

*1 Fred Pollack (Intel Fellow), “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies,” MICRO-32
*2 Simon Segars, “Low Power Design Techniques for Microprocessors,” ISSCC 2001


Conventional Low-Leakage Cache

• A conventional cache does not support any leakage reduction technique

• Conventional low-leakage cache: Cache decay
– The mode of each line transitions according to the following state diagram
– Active mode (initial state): high leakage, to preserve the data
– Sleep mode: low leakage, the data is destroyed to reduce leakage
– Active mode → sleep mode: when the no-access time ≧ the decay interval
– Sleep mode → active mode: on an access (miss)

• Sleep-miss: an access that misses because the line's data was discarded in sleep mode; it degrades processor performance
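The cache decay state transitions above can be sketched per line as a small state machine. This is a minimal Python sketch under simplifying assumptions: time advances in whole cycles, and every access to a sleeping line is treated as a sleep-miss (that is, we assume the access targets the block that was resident before the line decayed). The class and method names (`DecayLine`, `tick`, `access`) are hypothetical.

```python
from dataclasses import dataclass

ACTIVE, SLEEP = "active", "sleep"


@dataclass
class DecayLine:
    """One cache line under cache decay (simplified, hypothetical model)."""
    decay_interval: int      # idle cycles before the line is put to sleep
    mode: str = ACTIVE       # initial state: active mode (data preserved)
    idle: int = 0            # cycles since the last access
    sleep_misses: int = 0    # accesses that found the line asleep

    def tick(self, cycles: int = 1) -> None:
        """Advance time; sleep once the no-access time reaches the decay interval."""
        if self.mode == ACTIVE:
            self.idle += cycles
            if self.idle >= self.decay_interval:
                self.mode = SLEEP  # data is discarded to cut leakage

    def access(self) -> bool:
        """Access the line and return True if the access was a sleep-miss."""
        was_asleep = self.mode == SLEEP
        if was_asleep:
            # Assumption: the discarded block is the one being requested,
            # so it must be refetched from the next level (a sleep-miss).
            self.sleep_misses += 1
        self.mode, self.idle = ACTIVE, 0  # any access wakes the line
        return was_asleep


line = DecayLine(decay_interval=4)
line.tick(3)
print(line.access())  # False: idle time 3 < decay interval 4
line.tick(4)
print(line.access())  # True: the line had decayed
```

A longer decay interval keeps lines awake longer (fewer sleep-misses, more leakage); a shorter one does the opposite, which is the tension the rest of the talk addresses.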


Performance Impact of Sleep-misses

[Figure: normalized execution time and normalized DL1 misses for 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and the average]

Many sleep-misses cause large performance degradation!


Our Goal

High-performance, low-leakage cache!

• Problem of conventional low-leakage cache
– Performance degradation caused by sleep-misses

• Our approach
– To improve performance, reduce sleep-misses
– Prohibit some cache lines from going to sleep mode


Analysis of Sleep-misses

• Sleep-Miss Density (SMD): shows the amount of sleep-misses in each line

$\mathrm{SMD}_i = \dfrac{\text{the number of sleep-misses at cache line } i}{\text{the average number of sleep-misses for all cache lines}}$

• Example: the number of sleep-misses at each cache line (lines 0 to 8, numbered row by row)

6 5 1
2 4 1
60 1 10

– The total number of sleep-misses: 90
– The number of lines: 9 ⇒ the average number of sleep-misses: 10
– SMD_6 = 60/10 = 6, SMD_7 = 1/10 = 0.1, SMD_8 = 10/10 = 1

Cache lines that often cause sleep-misses have a high SMD!
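The SMD computation in the example can be checked with a few lines of Python. This sketch assumes, as SMD_6 = 6 in the example implies, that the nine lines are numbered 0 to 8 row by row; the function name `sleep_miss_density` and the choice to report 0 when no sleep-misses have occurred are hypothetical.

```python
def sleep_miss_density(sleep_misses: list[int]) -> list[float]:
    """SMD_i = sleep-misses at line i / average sleep-misses over all lines."""
    average = sum(sleep_misses) / len(sleep_misses)
    if average == 0:
        return [0.0] * len(sleep_misses)  # no sleep-misses yet: report 0 for every line
    return [count / average for count in sleep_misses]


# Example from the slide: 9 lines, 90 sleep-misses in total, average 10.
counts = [6, 5, 1,
          2, 4, 1,
          60, 1, 10]
smd = sleep_miss_density(counts)
print(smd[6], smd[7], smd[8])  # 6.0 0.1 1.0
```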


Characteristics of Sleep-misses

[Figure: two stacked bar charts for f179.art, f183.equake, i164.gzip, and the average, with SMD classes SMD < 1, 1 ≦ SMD < 2, 2 ≦ SMD < 4, and 4 ≦ SMD. One chart shows the breakdown of cache lines in terms of SMD; the other shows the breakdown of sleep-misses in terms of SMD.]

A small number of high-SMD lines produce most of the sleep-misses: 3.1% of lines cause 94.4% of sleep-misses.
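The two breakdowns in the figure (share of lines and share of sleep-misses per SMD class) can be computed from per-line sleep-miss counts as sketched below. The bin edges follow the figure legend; the function name and the small 9-line input (reused from the SMD example) are illustrative only and do not reproduce the measured 3.1% / 94.4% result.

```python
SMD_BINS = [("SMD < 1", 0.0, 1.0),
            ("1 <= SMD < 2", 1.0, 2.0),
            ("2 <= SMD < 4", 2.0, 4.0),
            ("4 <= SMD", 4.0, float("inf"))]


def smd_breakdown(sleep_misses: list[int]) -> dict[str, tuple[float, float]]:
    """Map each SMD class to (share of lines, share of sleep-misses)."""
    total = sum(sleep_misses)
    if total == 0:
        raise ValueError("no sleep-misses recorded")
    average = total / len(sleep_misses)
    result = {}
    for name, low, high in SMD_BINS:
        # Lines whose SMD falls into [low, high)
        members = [c for c in sleep_misses if low <= c / average < high]
        result[name] = (len(members) / len(sleep_misses), sum(members) / total)
    return result


for name, (lines, misses) in smd_breakdown([6, 5, 1, 2, 4, 1, 60, 1, 10]).items():
    print(f"{name:>13}: {lines:6.1%} of lines, {misses:6.1%} of sleep-misses")
```

Even in this toy input, one line out of nine (the one with SMD = 6) accounts for two thirds of all sleep-misses, the same skew the figure reports at scale.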


Always-Active Approach

• Support “Always-Active mode (AA mode)”

• AA mode prohibits the corresponding line from going to sleep mode

• Cache lines that frequently cause sleep-misses should operate in AA mode

• Such lines are called “Always-Active lines (AA lines)”

How to Decide AA Lines

A line that frequently causes sleep-misses ⇒ AA line

The number of sleep-misses at each cache line:

6 5 1
2 4 1
60 1 10

SMD at each cache line:

0.6 0.5 0.1
0.2 0.4 0.1
6 0.1 1

• SMD > Threshold ⇒ always-active mode (the line never goes to sleep)

• SMD ≦ Threshold ⇒ normal cache decay transitions:
– Active mode (initial state) → sleep mode when the no-access time ≧ the decay interval
– Sleep mode → active mode on an access
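The AA-line decision can be sketched as a filter over SMD values. This assumes a static, offline view of the counts (the slides measure SMD at run time instead); the function name `always_active_lines` is hypothetical, and the threshold 0.5 is only for illustration (the evaluation uses 1, 2, and 4). The comparison is strict, so a line whose SMD exactly equals the threshold keeps following the decay policy.

```python
def always_active_lines(sleep_misses: list[int], threshold: float) -> list[int]:
    """Indices of lines with SMD > threshold; these operate in always-active mode."""
    average = sum(sleep_misses) / len(sleep_misses)
    if average == 0:
        return []  # no sleep-misses: every line keeps the normal decay behavior
    return [i for i, count in enumerate(sleep_misses) if count / average > threshold]


counts = [6, 5, 1, 2, 4, 1, 60, 1, 10]   # SMD: 0.6 0.5 0.1 0.2 0.4 0.1 6 0.1 1
print(always_active_lines(counts, threshold=1))    # [6]  (line 8 has SMD = 1, not > 1)
print(always_active_lines(counts, threshold=0.5))  # [0, 6, 8]
```

Lowering the threshold admits more AA lines, trading leakage for fewer sleep-misses.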


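How to Measure SMD Dynamically

$\mathrm{SMD}_i = \dfrac{\text{①}}{\text{②}} > \text{③} \iff \text{①} > \text{②} \times \text{③}$

– ①: the number of sleep-misses at cache line i
– ②: the average number of sleep-misses for all cache lines (the total number of sleep-misses divided by the number of cache lines)
– ③: Threshold

Example) The number of cache lines = 1024 (= 2^10), Threshold = 2 (= 2^1)
– ② = the total number of sleep-misses shifted right by 10 bits
– ② × ③ = ② shifted left by 1 bit
– ① > ② × ③? Yes ⇒ AA mode; No ⇒ active mode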
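The shift-based comparison can be sketched as follows, assuming 1024 lines and a threshold of 2 as in the example. Because a right shift truncates, the hardware value of ② can be smaller than the exact average, which slightly lowers the bar for AA mode; the exact version is included only for comparison. Both function names are hypothetical.

```python
def is_aa_line_shift(line_misses: int, total_misses: int,
                     log2_lines: int = 10, log2_threshold: int = 1) -> bool:
    """Shift-only test of SMD_i > Threshold, as in the 1024-line, Threshold = 2 example."""
    average = total_misses >> log2_lines   # (2): divide by 2**10 lines (truncating)
    limit = average << log2_threshold      # (2) x (3): multiply by 2**1
    return line_misses > limit             # (1) > (2) x (3) -> AA mode


def is_aa_line_exact(line_misses: int, total_misses: int,
                     n_lines: int = 1024, threshold: int = 2) -> bool:
    """Reference test with exact integer arithmetic: line_misses / (total / n) > threshold."""
    return line_misses * n_lines > threshold * total_misses


# 4096 total sleep-misses: the average is exactly 4, so both tests agree.
print(is_aa_line_shift(9, 4096), is_aa_line_exact(9, 4096))  # True True
# 3000 total: exact limit is about 5.86, but (3000 >> 10) << 1 == 4.
print(is_aa_line_shift(5, 3000), is_aa_line_exact(5, 3000))  # True False
```

The shift version needs no divider or multiplier, which is presumably why the slide restricts the line count and threshold to powers of two.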


Hardware Implementation

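[Figure: hardware organization. Each cache line (0 to 1023) has a sleep-miss counter, an always-active flag, a decay flag, a 2-bit local counter, the tag, and the data; voltage control supplies each line with gated Vdd or 0V. Shared logic: a global counter driven at ¼ decay interval, a total sleep-miss counter, a shifter, and comparators (>?, =?).]

• If a line is in sleep mode:
– Cache decay ⇒ its tag is also in sleep mode
– AA approach ⇒ its tag remains in active mode

• The line is in sleep mode && tag match ⇒ a sleep-miss occurs!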
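A behavioral sketch of this organization follows. It assumes the hierarchical counter scheme of cache decay [2] as we understand it: a global counter ticks every ¼ decay interval, and each line's 2-bit local counter is incremented on every tick, reset on access, and puts the line to sleep when it saturates. The exact saturation rule, the counter widths, and all class and method names are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    tag: Optional[int] = None    # tag array stays powered in the AA approach
    asleep: bool = False         # decay flag
    local: int = 0               # 2-bit local counter (0..3)
    always_active: bool = False  # always-active (AA) flag
    sleep_misses: int = 0        # per-line sleep-miss counter


class AADecayCache:
    """Hypothetical behavioral model of cache decay plus the AA approach."""

    def __init__(self, n_lines: int = 1024, decay_interval: int = 4096,
                 log2_threshold: int = 1) -> None:
        assert n_lines & (n_lines - 1) == 0, "shifter needs a power-of-two line count"
        assert decay_interval > 0 and decay_interval % 4 == 0
        self.lines = [Line() for _ in range(n_lines)]
        self.log2_lines = n_lines.bit_length() - 1
        self.log2_threshold = log2_threshold
        self.tick_period = decay_interval // 4   # global counter: 1/4 decay interval
        self.total_sleep_misses = 0              # total sleep-miss counter
        self.cycle = 0

    def advance(self, cycles: int) -> None:
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.tick_period == 0:
                self._global_tick()

    def _global_tick(self) -> None:
        for line in self.lines:
            if line.tag is None or line.asleep or line.always_active:
                continue                         # AA lines never go to sleep
            if line.local == 3:
                line.asleep = True               # gate Vdd: data is lost, tag kept
            else:
                line.local += 1

    def access(self, index: int, tag: int) -> str:
        """Return 'hit', 'sleep-miss', or 'miss' for an access to one line."""
        line = self.lines[index]
        if line.tag == tag and not line.asleep:
            result = "hit"
        elif line.tag == tag:                    # asleep && tag match
            result = "sleep-miss"
            line.sleep_misses += 1
            self.total_sleep_misses += 1
            average = self.total_sleep_misses >> self.log2_lines
            if line.sleep_misses > (average << self.log2_threshold):
                line.always_active = True
        else:
            result = "miss"
        line.tag, line.asleep, line.local = tag, False, 0  # (re)fill and wake
        return result


cache = AADecayCache(n_lines=4, decay_interval=8)
print(cache.access(0, 0xA))  # miss
cache.advance(8)             # four global ticks: the local counter saturates, line 0 sleeps
print(cache.access(0, 0xA))  # sleep-miss; line 0 is now marked AA
cache.advance(100)
print(cache.access(0, 0xA))  # hit: AA lines never decay
```

In this sketch, while few sleep-misses have been recorded the shifted average is 0, so the first sleep-miss on any line already marks it AA; the slides do not say how the original design handles this warm-up period.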


Experimental Setup

• Evaluation models
– Cache decay: conventional low-leakage cache
– AA1: Cache decay with the AA approach (threshold value = 1)

• Cache configuration: L1 data cache
– Cache size: 32KB
– Associativity: 2-way
– Hit latency: 1 clock cycle
– Miss penalty: 32 clock cycles

• Evaluation items
– Performance improvement
– Energy reduction
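The configuration above can be captured in a small record. The 32-byte line size is an assumption (the slides do not state it); it is chosen only because it yields 1024 lines, matching the line indices 0 to 1023 on the hardware slide. The stall estimate assumes a blocking cache in which every sleep-miss pays the full 32-cycle miss penalty with no overlap; it is a rough bound, not the authors' simulation method. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1DConfig:
    size_bytes: int = 32 * 1024   # cache size: 32KB
    ways: int = 2                 # associativity: 2-way
    line_bytes: int = 32          # ASSUMPTION: line size is not given on the slide
    hit_latency: int = 1          # clock cycles
    miss_penalty: int = 32        # clock cycles

    @property
    def n_lines(self) -> int:
        return self.size_bytes // self.line_bytes

    @property
    def n_sets(self) -> int:
        return self.n_lines // self.ways

    def sleep_miss_stall_cycles(self, sleep_misses: int) -> int:
        """Extra cycles if each sleep-miss pays the full miss penalty (no overlap)."""
        return sleep_misses * self.miss_penalty


cfg = L1DConfig()
print(cfg.n_lines, cfg.n_sets)              # 1024 512
print(cfg.sleep_miss_stall_cycles(10_000))  # 320000
```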


Results

[Figure: normalized execution time and normalized energy of Cache decay and AA1 for f183.equake, i164.gzip, i175.vpr, i197.parser, and the average]

The results fall into two cases:
• Higher performance and lower energy consumption
• Higher performance at the cost of increased energy consumption


Conclusions

• We have proposed a high-performance, low-leakage cache: the AA approach
– Detects lines that frequently cause sleep-misses at run time
– Improves performance by operating those lines in AA mode

• Evaluation results
– Higher performance and lower energy consumption
– The best case (f183.equake):
  • Performance degradation: 19% → 4.2%
  • Energy consumption: 20% reduction

• Future work
– Compare the AA approach with an adaptive decay technique (Kaxiras ISCA’00)


Thank you! ありがとう! (“thank you” in Japanese)


Impact of Threshold

[Figure: normalized execution time and normalized energy of Cache decay, AA4, AA2, and AA1 for f183.equake, i164.gzip, and the average]

The smaller the threshold, the higher the performance, because the number of AA lines increases!


Breakdown of Energy Consumption

[Figure: energy (J) of Cache decay and AA1 for f183.equake and i164.gzip, broken down into L1 leakage energy (LE L1), L1 dynamic energy (DE L1), and main-memory access energy (DE memory)]

Compared with cache decay, AA1:
• Increases leakage energy
• Reduces the dynamic energy of main-memory accesses, because the number of sleep-misses decreases

Energy reduction is a tradeoff between $DE_{memory}$ and $LE_{L1}$


Performance Impact of Decay Interval

[Figure: normalized execution time for 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and the average under Decay-1K, Decay-8K, Decay-64K, Decay-512K, AA1-4K, and AA2-8K]

• Cache decay: performance improves as the decay interval is lengthened
• AA approach: performance improves sufficiently even with a short decay interval


Energy Impact of Decay Interval

[Figure: breakdown of energy (J) into LE L1, DE L1, and DE memory for 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and the average under Decay-1K, Decay-8K, Decay-64K, Decay-512K, AA1-4K, and AA2-8K]

• Cache decay: leakage energy increases as the decay interval is lengthened
• AA approach: leakage reduction is larger than that of cache decay with a long decay interval


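Energy Model (1/3)

$E_{total} = LE_{L1} + DE_{L1} + DE_{memory}$

$LE_{L1} = \sum_{i=1}^{CC} LE_{bit} \times N_{active}(i)$

• $LE_{L1}$: leakage energy of the L1 cache
• $DE_{L1}$: dynamic energy of the L1 cache
• $DE_{memory}$: energy consumed by main-memory accesses
• $CC$: program execution time (clock cycles)
• $LE_{bit}$: average leakage energy of a 1-bit SRAM cell in one clock cycle
• $N_{active}(i)$: number of SRAM bits in the active state at clock cycle i

Qualitative comparison of the conventional low-leakage cache, the always-active block approach, and the conventional cache:
– CC: long (conventional low-leakage cache) to short (conventional cache)
– $N_{active}(i)$: few (conventional low-leakage cache) to many (conventional cache)
– ☺ ☺ ☹ (in the same column order)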
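The leakage term is a sum over clock cycles of the number of active SRAM bits. A minimal sketch, assuming the number of active bits per cycle is available as a sequence; the default $LE_{bit}$ of 0.13 pJ is the value in the Energy Model (3/3) parameter table, while the function name and the toy trace are hypothetical.

```python
from typing import Iterable

LE_BIT_PJ = 0.13  # average leakage energy of one active SRAM bit per clock cycle (pJ)


def l1_leakage_energy_pj(active_bits: Iterable[int], le_bit_pj: float = LE_BIT_PJ) -> float:
    """LE_L1 = sum_{i=1}^{CC} LE_bit * N_active(i), where active_bits[i-1] = N_active(i)."""
    return sum(le_bit_pj * n_active for n_active in active_bits)


# Toy 4-cycle trace for a 32KB data array (262,144 bits), ignoring tags and flags:
# all bits active for two cycles, then half of the lines asleep.
full = 32 * 1024 * 8
trace = [full, full, full // 2, full // 2]
print(f"{l1_leakage_energy_pj(trace) / 1e3:.1f} nJ")  # 102.2 nJ
```

Putting lines to sleep shrinks $N_{active}(i)$ directly, but any extra execution time lengthens CC, which is why the comparison above weighs both factors.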


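Energy Model (2/3)

$DE_{L1} = DE_{AA} + DE_{lowleak} + DE_{conv}$

• $DE_{AA}$ (DE 常活性): dynamic energy overhead due to applying the always-active block approach
• $DE_{lowleak}$ (DE 従来低): dynamic energy overhead due to applying the conventional low-leakage cache
• $DE_{conv}$ (DE 従来): access energy of a conventional cache

Overheads incurred (conventional low-leakage cache / always-active approach):
– $DE_{AA}$: none / overhead
– $DE_{lowleak}$: overhead / overhead
☹ ☹

[Figure: the hardware organization from the Hardware Implementation slide, labeled in Japanese: local sleep-miss counter, always-active flag, sleep-state flag, local counter, tag, data, supply-voltage control (Vdd / 0, state discarded), total sleep-miss counter, threshold setting, shifter, global counter, and comparators (>?, =?)]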
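Putting the three terms together, a rough estimator might look like the sketch below. It uses the per-event energies from the parameter table on the Energy Model (3/3) slide, and it assumes (the slides do not say) that $DE_{conv}$ equals the CACTI access energy $DE_{org}$ per L1 access, that the low-leakage overhead (0.1 pJ + 0.5 pJ) and $DE_{AA}$ each apply once per L1 access, and that every L1 miss, including sleep-misses, costs one $DE_{memory}$. All names and the event counts in the example are hypothetical.

```python
PJ, NJ = 1e-12, 1e-9

# Per-event energies (joules) from the parameter table.
LE_BIT = 0.13 * PJ             # leakage per active bit per clock cycle
DE_ORG = 1.90 * NJ             # conventional L1 access (CACTI 3.0)
DE_LOWLEAK = (0.1 + 0.5) * PJ  # decay-hardware overhead (ASSUMED per L1 access)
DE_AA = 4.20 * PJ              # AA-hardware overhead (ASSUMED per L1 access)
DE_MEMORY = 20 * DE_ORG        # 38.0 nJ per main-memory access


def total_energy(active_bit_cycles: int, l1_accesses: int, l1_misses: int,
                 low_leakage: bool = True, always_active: bool = True) -> float:
    """E_total = LE_L1 + DE_L1 + DE_memory, with DE_L1 = DE_AA + DE_lowleak + DE_conv."""
    le_l1 = LE_BIT * active_bit_cycles
    per_access = DE_ORG                 # DE_conv (ASSUMED equal to DE_org)
    if low_leakage:
        per_access += DE_LOWLEAK
    if always_active:
        per_access += DE_AA
    de_l1 = l1_accesses * per_access
    de_memory = l1_misses * DE_MEMORY
    return le_l1 + de_l1 + de_memory


# Made-up counts: AA keeps more bits active but avoids some sleep-misses.
decay = total_energy(10**11, 10**8, 2 * 10**6, always_active=False)
aa = total_energy(12 * 10**10, 10**8, 15 * 10**5)
print(f"cache decay: {decay:.3f} J, AA: {aa:.3f} J")  # cache decay: 0.279 J, AA: 0.263 J
```

With these invented counts, AA spends more on leakage but saves more on main-memory energy, which is the $DE_{memory}$ versus $LE_{L1}$ tradeoff described in the energy breakdown.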


Energy Model (3/3)

Parameters (average energy consumption per access, or per bit and clock cycle for $LE_{bit}$) and the basis of each estimate:
• $LE_{bit}$: 0.13 pJ (based on [1])
• $DE_{org}$: 1.90 nJ (measured with CACTI 3.0)
• $DE_{lowleak}$: 0.1 pJ + 0.5 pJ (based on [2])
• $DE_{AA}$: 4.20 pJ (estimated from the table size and $DE_{org}$)
• $DE_{memory}$: 38.0 nJ (estimated as $DE_{org}$ × 20)

[1] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, “Drowsy Caches: Simple Techniques for Reducing Leakage Power,” Proc. of the 29th Int. Symp. on Computer Architecture, pp. 148-157, May 2002.
[2] S. Kaxiras, Z. Hu, and M. Martonosi, “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. of the 28th Int. Symp. on Computer Architecture, pp. 240-251, June 2001.