improving energy efficiency of configurable caches via temperature-aware configuration selection...
TRANSCRIPT
Improving Energy Efficiency of Configurable Caches via
Temperature-Aware Configuration Selection
Hamid Noori†, Maziar Goudarzi‡, Koji Inoue‡, and Kazuaki Murakami‡
Speaker: Tohru Ishihara ‡
†Institute of Systems & Information Technologies/KYUSHU, Japan ‡Kyushu University, Japan
2/26 ISVLSI2008@Montpellier, FranceKyushu University
Outline
Background
Motivation
Problem Definition
Proposed Approach
  Architecture
  Reconfiguration Flow
Experimental Results
Conclusions
Background (1/2)
[Figure: The dynamic energy per cache access. Dynamic energy (nJ) vs. technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K. Vdd: 180nm = 1.66 V, 100nm = 1.125 V, 70nm = 0.9 V. Dynamic energy is temperature independent.]
[Figure: The leakage power of a cache memory. Leakage power (mW) vs. technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K. Vdd: 180nm = 1.66 V, 100nm = 1.125 V, 70nm = 0.9 V. Temperature: 100°C.]
Background (2/2)
[Figure: Leakage power for a 32KB cache (mW) vs. temperature (0°C, 20°C, 40°C, 60°C, 80°C, 100°C) for 180nm, 100nm, and 70nm technologies. Vdd: 180nm = 1.66 V, 100nm = 1.125 V, 70nm = 0.9 V. Cache size: 32KB.]
Motivational Example (1/3)
Execution time is technology and temperature independent
[Figure: Number of execution clock cycles (K) vs. instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort.]
Motivational Example (2/3)
[Figure: Total static energy for executing a program. Static energy (mJ) vs. cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C. Technology: 70nm, Vdd: 0.9 V.]
[Figure: Total dynamic energy for executing a program. Dynamic energy (mJ) vs. cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K). Technology: 70nm, Vdd: 0.9 V. Dynamic energy is temperature independent.]
Motivational Example (3/3)
[Figure: Total energy (mJ) vs. instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C. Technology: 70nm, Vdd: 0.9 V. Annotation: minimum-energy cache size.]
Problem Definition (1/3)
Objective function: total memory energy
  Cache dynamic energy
  Cache static energy
  Off-chip memory access energy
  Energy consumption during processor stall
[Diagram: CPU, I-$, D-$, main memory]
Problem Definition (2/3)
energy_memory(C, Temp, Tech) = energy_dynamic(C, Tech) + energy_static(C, Temp, Tech)   (1)
energy_dynamic(C, Tech) = cache_accesses(C) * energy_cache_access(C, Tech) + cache_misses(C) * energy_miss(C, Tech)   (2)
energy_miss(C, Tech) = energy_off_chip_stall + energy_cache_block_refill(C, Tech)   (3)
energy_static(C, Temp, Tech) = executed_clock_cycles(C) * clock_period * leakage_power(C, Temp, Tech)   (4)
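Equations (1) to (4) can be sketched in Python as follows. This is only an illustration: the CacheProfile container, its field names, and the way the per-configuration quantities are passed in are assumptions made for the example, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class CacheProfile:
    """Inputs for one cache configuration C (hypothetical container)."""
    cache_accesses: int               # cache_accesses(C)
    cache_misses: int                 # cache_misses(C)
    executed_clock_cycles: int        # executed_clock_cycles(C)
    energy_cache_access: float        # J per access, for (C, Tech)
    energy_cache_block_refill: float  # J per block refill, for (C, Tech)


def energy_miss(p, energy_off_chip_stall):
    # Equation (3): energy cost of a single cache miss.
    return energy_off_chip_stall + p.energy_cache_block_refill


def energy_dynamic(p, energy_off_chip_stall):
    # Equation (2): access energy plus miss energy.
    return (p.cache_accesses * p.energy_cache_access
            + p.cache_misses * energy_miss(p, energy_off_chip_stall))


def energy_static(p, clock_period, leakage_power):
    # Equation (4): leakage_power is the value for (C, Temp, Tech), in watts.
    return p.executed_clock_cycles * clock_period * leakage_power


def energy_memory(p, energy_off_chip_stall, clock_period, leakage_power):
    # Equation (1): total memory energy for one configuration and temperature.
    return (energy_dynamic(p, energy_off_chip_stall)
            + energy_static(p, clock_period, leakage_power))
```

Only the leakage_power argument changes with temperature, which is why the static term alone shifts the minimum-energy configuration as the chip heats up.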
Problem Definition (3/3)
“For a given application, processor architecture, technology, and set of valid configurations of the configurable cache, find a valid cache configuration that results in minimum energy consumption at a specific temperature over the entire execution of the given application.”
Architecture
TACC
  BCC (proposed by Zhang et al. [1])
    Cache size (way shutdown)
    Number of ways (way concatenation)
    Line size
  Thermal sensor
  Accessible port for reading the thermal sensor
[1] C. Zhang, F. Vahid, and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.
Reconfiguration Flow
Evaluation phase (offline)
  Input: static and dynamic power for different cache configurations and temperatures for the target technology
  Input: execution time and number of hits and misses for different cache configurations, obtained by running the application on an ISS
  Determine the lowest-energy cache configuration for different target temperatures
  Fill the lookup table of the configurable cache with the proper configuration for each temperature
Reconfiguration phase (online)
  Detect the current temperature
  Use the lookup table and load the proper configuration for the current temperature
  Execute the application
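The two phases of the flow can be sketched as a table build followed by a table lookup. The function names, the energy_of callable, and the toy numbers below are hypothetical. Nearest-temperature matching in the online step is also an assumption, since the slides do not say how a sensor reading is mapped to a table entry.

```python
def build_lookup_table(configs, temperatures, energy_of):
    """Offline phase: map each target temperature to its lowest-energy config.

    energy_of(config, temp) is a hypothetical callable returning the total
    energy of running the application with `config` at temperature `temp`.
    """
    return {t: min(configs, key=lambda c: energy_of(c, t)) for t in temperatures}


def select_configuration(table, current_temp):
    """Online phase: pick the entry for the nearest tabulated temperature.

    Nearest-temperature matching is an assumption of this sketch.
    """
    nearest = min(table, key=lambda t: abs(t - current_temp))
    return table[nearest]


# Toy usage with made-up numbers (not measured data): larger caches have
# lower dynamic energy but leak more as temperature rises.
TOY_DYNAMIC = {"16K": 1.0, "8K": 1.5, "4K": 2.5}
TOY_LEAK_PER_DEG = {"16K": 0.04, "8K": 0.02, "4K": 0.008}


def toy_energy(config, temp):
    return TOY_DYNAMIC[config] + TOY_LEAK_PER_DEG[config] * temp


table = build_lookup_table(["16K", "8K", "4K"], [0, 20, 40, 60, 80, 100], toy_energy)
config_now = select_configuration(table, current_temp=93)
```

In this toy model the selected cache shrinks as temperature rises, which mirrors the trend the slides report for the real benchmarks.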
Experimental Setup (1/2)
MiBench
SimpleScalar
  Cache hit: one clock cycle
  Cache miss: 100 clock cycles
  Clock frequency of the base processor: 200 MHz
CACTI 4.2
  Target technology: 70nm (Vdd = 0.9 V)
BCC (16KB)
  16KB (4-, 2-, 1-way)
  8KB (2- and 1-way)
  4KB (1-way)
  The line size for each of the configurations can be 8, 16, or 32 bytes.
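The configuration space listed for the 16KB BCC can be enumerated with a short sketch. The constant and function names are illustrative only. With six size/way combinations and three line sizes, the enumeration yields 18 configurations.

```python
# Cache size in KB -> allowed associativities, as listed on the slide.
WAYS_BY_SIZE_KB = {16: (4, 2, 1), 8: (2, 1), 4: (1,)}
# Line sizes in bytes, as listed on the slide.
LINE_SIZES_BYTES = (8, 16, 32)


def valid_configurations():
    """Return every (size_kb, line_bytes, ways) tuple in the listed space."""
    return [(size_kb, line_bytes, ways)
            for size_kb, ways_options in WAYS_BY_SIZE_KB.items()
            for ways in ways_options
            for line_bytes in LINE_SIZES_BYTES]


configs = valid_configurations()
```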
Experimental Setup (2/2)
Base Configurable Cache (BCC)
  It has the same architecture as proposed by Zhang et al. [1]
  It supports a limited set of configurations
  It is configured for each application for the corner case (i.e., leakage at 100°C)
Temperature-Aware Configurable Cache (TACC)
  TACC is configured for each execution of an application, considering the chip temperature at that time
[1] C. Zhang, F. Vahid, and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.
Energy & Performance Evaluation
Energy Saving = (energy_BCC_temp100 - energy_TACC) / energy_BCC_temp100 × 100
Performance Enhancement = (exec_time_BCC - exec_time_TACC) / exec_time_BCC × 100
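A minimal sketch of the two metrics on this slide, assuming both are relative differences against the BCC baseline expressed as percentages. The function and argument names are illustrative.

```python
def energy_saving_percent(energy_bcc_temp100, energy_tacc):
    """Energy saved by TACC relative to BCC configured for 100°C, in percent."""
    return (energy_bcc_temp100 - energy_tacc) / energy_bcc_temp100 * 100


def performance_enhancement_percent(exec_time_bcc, exec_time_tacc):
    """Execution-time reduction of TACC relative to BCC, in percent."""
    return (exec_time_bcc - exec_time_tacc) / exec_time_bcc * 100
```

Both functions return a positive value when TACC is better than the baseline and a negative value when it is worse.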
Data and Instruction Cache
Each entry: cache size, line size (bytes), number of ways

D$    | qsort      | djpeg      | lame       | dijkstra   | patricia   | sha        | adpcm      | crc        | fft
0°C   | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
20°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
40°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
60°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 8K, 32, 2  | 8K, 32, 2
80°C  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 4K, 32, 1  | 8K, 32, 2
100°C | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 32, 1  | 4K, 32, 1  | 8K, 32, 2

I$    | basicmath  | qsort      | djpeg      | lame       | dijkstra   | blowfish   | rijndael   | gsm        | fft
0°C   | 16K, 8, 4  | 16K, 8, 4  | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 16, 4 | 8K, 32, 1
20°C  | 16K, 16, 4 | 16K, 16, 4 | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
40°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
60°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 8K, 32, 2  | 8K, 32, 1
80°C  | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 1 | 4K, 32, 1  | 8K, 32, 1
100°C | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 1
Energy Saving
[Figure: Energy saving (%) per benchmark (basicmath, qsort, susan, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, rijndael, sha, gsm, adpcm, crc, fft) with summary bars (average-DC, max-IC, average-IC) at operation temperatures of 0°C, 20°C, and 60°C. Technology: 70nm, Vdd: 0.9 V. BCC and TACC maximum size = 16KB.]
Performance Enhancement
[Figure: Performance enhancement (%) per benchmark (basicmath, qsort, susan, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, rijndael, sha, gsm, adpcm, crc, fft) with summary bars (average-DC, max-IC, average-IC) at operation temperatures of 0°C, 20°C, and 60°C. Technology: 70nm, Vdd: 0.9 V. BCC and TACC maximum size = 16KB.]
Conclusions
1. Importance of a temperature-aware configurable cache for finer technologies. Up to 61% (17% on average) energy saving in 70nm technology for the instruction cache.
2. The data cache is more easily affected by temperature than the instruction cache. Using a configurable data cache, up to 77% (36% on average) of the energy can be saved in 70nm technology.
3. The TACC improves performance by up to 28% (5% on average) for the instruction cache and by up to 17% (8.1% on average) for the data cache.
Thank you for your attention
Please ask any questions to [email protected]
Backup slides
[Figure: Total energy (mJ) vs. instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C. Technology: 180nm, Vdd: 1.66 V.]
Technology | Metric            | ARM7TDMI | ARM966E-S
130nm      | Power consumption | 7.98 mW  | 62.5 mW
130nm      | Frequency         | 133 MHz  | 250 MHz
90nm       | Power consumption | 7.08 mW  | 51.7 mW
90nm       | Frequency         | 236 MHz  | 470 MHz