Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design



Page 1: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design

Tony Givargis University of California, Riverside & NEC USA


Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design

Tony D. Givargis & Frank Vahid
Department of Computer Science
University of California
Riverside, CA 92521
{givargis,vahid}@cs.ucr.edu

Jörg Henkel
C&C Research Laboratories, NEC USA
4 Independence Way, Princeton, NJ 08540
[email protected]

This research was supported in part by a DAC scholarship and an NSF grant.

Page 2: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Introduction

• Systems-on-a-chip (SOC) era
– increased chip capacity
– parameterizable core-based system design

• Large power/performance tradeoffs possible just by varying bus/cache parameter values [givargis99]

• But simulation-based cache/bus power evaluation is slow

Page 3: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Introduction

• We present a two-step approach for fast cache power evaluation
– collect intermediate data using simulation
– use equations to rapidly predict power
– couple with a fast bus estimation approach

• Our approach
– is orders of magnitude faster than simulation
– yields good accuracy

Page 4: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Target Architecture

[Figure: block diagram of the target architecture, showing a CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, a Bridge, and a Peripheral Bus with Peripheral 1, Peripheral 2, ..., Peripheral n]

Page 5: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Focus on Cache/Bus Parameters

[Figure: the target architecture block diagram, repeated]

Power dissipation breakdown in a Digital Camera example:
• Cache: 40%
• CPU: 30%
• Bus: 20%
• Others: 10%

Page 6: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Cache Parameters

[Figure: the target architecture block diagram, repeated]

Page 7: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Cache Parameters

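[Figure: cache organization. The address is split into Tag, Index, and Offset fields; two ways are drawn, each with V, T, and D fields; two comparators (==) feed a Mux that outputs Data]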

• Associativity

• Cache Size

• Line Size
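
How the three parameters shape the Tag/Index/Offset split can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` is hypothetical, and it assumes byte addresses and power-of-two values for cache size, line size, and associativity.

```python
def split_address(addr, cache_size, line_size, assoc):
    """Split a byte address into (tag, index, offset).

    Assumes cache_size, line_size and assoc are powers of two and that
    cache_size >= line_size * assoc (at least one set).
    """
    num_sets = cache_size // (line_size * assoc)
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)

    offset = addr & (line_size - 1)                  # byte within the line
    index = (addr >> offset_bits) & (num_sets - 1)   # which set to look in
    tag = addr >> (offset_bits + index_bits)         # compared against stored tags
    return tag, index, offset


# 1 KB cache, 16-byte lines, 2-way: 32 sets, 4 offset bits, 5 index bits
print(split_address(0x1234, 1024, 16, 2))
```

Doubling the associativity at a fixed cache size halves the number of sets, so one bit moves from the index to the tag; doubling the line size moves one bit from the index to the offset.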

Page 8: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Bus Parameters

[Figure: the target architecture block diagram, repeated]

Page 9: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Bus Parameters

• Change Bus Width [givargis98]

[Figure: Bus A/B with Mux/Demux blocks, shown in two variants annotated C1 and C2, with C1 < C2]

Page 10: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Bus Parameters

• Change Data Representation (Bus Invert) [Stan95]

• Reduce Bus Switching

[Figure: Bus A/B with Encoder/Decoder blocks and an invert_ctr line]

Page 11: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Bus Parameters

Binary Encoding
• 01001011 followed by 10010110
• Hamming Dist = 6

Bus-Invert Encoding
• 01001011 (inverted_ctr = 0) followed by 01101001 (inverted_ctr = 1)
• Hamming Dist = 3
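
The numbers in this example can be reproduced with a short Python sketch of the bus-invert idea [Stan95]: send either the data word or its complement, whichever toggles fewer wires, counting the invert line itself as a wire. The function names are hypothetical, and ties are resolved in favor of not inverting, which is an assumption.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert(prev_bus, prev_inv, data, width):
    """Return (bus_value, invert_line, toggles) for the next transfer.

    prev_bus / prev_inv are the values currently on the data wires and on
    the invert line; data is the next word to send.
    """
    mask = (1 << width) - 1
    inverted = ~data & mask
    # Toggles if the word is sent as-is (invert line driven to 0).
    keep = hamming(prev_bus, data) + (1 if prev_inv == 1 else 0)
    # Toggles if the complement is sent (invert line driven to 1).
    flip = hamming(prev_bus, inverted) + (1 if prev_inv == 0 else 0)
    if flip < keep:
        return inverted, 1, flip
    return data, 0, keep


prev, nxt = 0b01001011, 0b10010110
print(hamming(prev, nxt))                       # binary encoding: 6 toggles
bus, inv, toggles = bus_invert(prev, 0, nxt, 8)
print(format(bus, "08b"), inv, toggles)         # 01101001 1 3
```

The receiver recovers the original word by complementing the bus value whenever the invert line is 1.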

Page 12: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Related Work

• Important to explore various cache and bus parameters for best performance and power [Wilton96][Li98][givargis99]
– large number of cache/bus configurations
– need to estimate power/performance in constant time

• Trace stripping [Wolf99], configuration ordering, single-pass simulation [Kirovski]

Page 13: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Approach Overview

• Given a trace of memory refs
• Cache parameters
– Size (S)
– Line/block-size (L)
– Associativity (A)
• Compute # of misses (N)

$f(S_{Min}, L_{Min}, A_{Min}) = N_1$
$f(S_{Max}, L_{Min}, A_{Min}) = N_2$
$f(S_{Min}, L_{Max}, A_{Min}) = N_3$
$f(S_{Min}, L_{Min}, A_{Max}) = N_4$
$f(S_{Max}, L_{Max}, A_{Min}) = N_5$
$f(S_{Max}, L_{Min}, A_{Max}) = N_6$

[Figure: # of misses (N), 0 to 120, plotted against Size (S), 0.5 to 64]
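
Computing the number of misses N for one configuration (S, L, A) from a trace is the simulation step. A minimal trace-driven sketch in Python follows; it is not the simulator used in this work. The function name `count_misses` is hypothetical, the trace is assumed to be a list of byte addresses, and LRU replacement is an assumption, since the slides do not name a replacement policy.

```python
from collections import OrderedDict


def count_misses(trace, cache_size, line_size, assoc):
    """Count misses of a set-associative cache over a list of byte addresses.

    Assumes LRU replacement and cache_size >= line_size * assoc.
    """
    num_sets = cache_size // (line_size * assoc)
    # One ordered dict per set: keys are tags, ordered oldest to newest use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)   # evict the least recently used tag
            ways[tag] = True
    return misses


trace = [0, 4, 32, 64, 0, 32]
print(count_misses(trace, 64, 16, 2))    # small, 2-way cache
print(count_misses(trace, 128, 16, 4))   # larger, 4-way cache
```

Running a simulation like this only at a handful of Min/Max corner configurations, rather than at every configuration, is what keeps the data-collection step cheap.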

Page 14: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Approach Overview

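• Capture improvements obtainable by:
– changing line-size at small/large values of cache-size
– changing associativity at small/large values of cache-size

[Equations: normalized parameters s, l, a are computed from (S_i, S_Min, S_Max), (L_j, L_Min, L_Max), and (A_k, A_Min, A_Max); intermediate terms t_1, t_2, t_3 are computed from (s, N_1, N_2), (l, R_1, R_3), and (a, R_2, R_4); equation (1) combines t_1, t_2, t_3 into f(S_i, L_j, A_k)]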

Page 15: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Approach Overview

• Bus equation:

$P_{bus} = \frac{1}{2} \, C_{bus} \left\lceil \frac{n}{k} \right\rceil k \, m$

• m items/second (denotes the traffic N on the bus)
• n bits/item
• k bit wide bus
• binary encoding
• random data assumption
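
The reasoning behind a binary-encoding estimate under the random data assumption can be sketched in Python. The assumptions here are mine: an n-bit item needs ceil(n/k) transfers on a k-bit bus, each wire toggles with probability 1/2 per transfer, and `c_bus` stands for the cost of one wire transition (a hypothetical parameter that would fold in wire capacitance and supply voltage). The function name is illustrative.

```python
import math


def bus_power_binary(c_bus, m, n, k):
    """Estimated bus power for binary encoding with random data.

    c_bus: cost per wire transition (assumed unit)
    m:     items per second
    n:     bits per item
    k:     bus width in wires
    """
    transfers_per_item = math.ceil(n / k)   # an item is split into k-bit pieces
    toggles_per_transfer = k / 2            # half the wires toggle on average
    return c_bus * m * transfers_per_item * toggles_per_transfer


# 32-bit items at 1000 items/second on an 8-bit bus
print(bus_power_binary(1.0, 1000, 32, 8))
# 12-bit items do not divide evenly into 8-bit transfers: 2 transfers each
print(bus_power_binary(1.0, 1, 12, 8))
```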

Page 16: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Approach Overview

• Bus equation:

[Equation: P_bus for bus-invert encoding, a closed-form expression in C_bus, m, n, and k]

• m items/second (denotes the traffic N on the bus)
• n bits/item
• k bit wide bus
• bus-invert encoding
• random data assumption
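
To get a feel for how much bus-invert encoding saves under random data, the expected number of wire toggles per transfer can be computed directly. The sketch below is not the closed form from the slide. It assumes the k data wires plus the invert line behave as k+1 independent random bits, so that the encoder pays min(d, k+1-d) toggles when the unencoded word would differ from the current bus state in d positions. The function name is hypothetical.

```python
from math import comb


def expected_toggles_bus_invert(k):
    """Expected toggles per transfer on a k-bit bus with one invert line."""
    lines = k + 1
    # d = number of lines that would toggle if the word were sent as-is;
    # the encoder sends the complement whenever that is cheaper.
    total = sum(comb(lines, d) * min(d, lines - d) for d in range(lines + 1))
    return total / 2 ** lines


for k in (4, 8, 16, 32):
    print(k, k / 2, expected_toggles_bus_invert(k))
```

For an 8-bit bus this gives about 3.27 toggles per transfer, compared with 4 for binary encoding. The relative saving shrinks as the bus gets wider.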

Page 17: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiments

[Figure: the target architecture block diagram, repeated]

• Cache parameters
– size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
– assoc: 2, 4, 8
– line: 8, 16, 32

• Bus parameters
– width: 4, 8, 16, 32
– code: binary/bus-invert

• Analyzed 45K sets exhaustively
– 3d-Image
– MPEG
– CKey
– Diesel

• 5kB to 230kB of C code

Page 18: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Setup

[Figure: experiment flow. Blocks shown: C Program, ISS, Trace Generator, Cache Simulator, Bus Simulator, CPU Power, I/D Cache Power, Memory Power, Performance + Power]

• Dinero [Edler, Hill]

• CPU power [Tiwari96]

Page 19: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Results

• Diesel application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations

[Figure: bar chart of Execution Time (sec), 0 to 0.3, for Conf 0 through Conf 9]

4% error, 320x faster

Page 20: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Results

• Diesel application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations

[Figure: bar chart of energy in micro-Joules, 0 to 3000, for Conf 0 through Conf 9]

2% error, 420x faster

Page 21: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Results

• CKey application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations

[Figure: bar chart of Execution Time (sec), 0 to 25, for Conf 0 through Conf 9]

8% error, 125x faster

Page 22: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Results

• CKey application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations

[Figure: bar chart of energy in milli-Joules, 0 to 300, for Conf 0 through Conf 9]

3% error, 125x faster

Page 23: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Experiment Results

• 125 - 400x speedup

• 1-18% absolute error (power & performance)

• 2% average power error

[Figure: Time (hours), 0 to 200, for Sim vs. Eq. on 3d-image, mpeg, ckey, diesel]

[Figure: Power Error (%), 0 to 3, for 3d-image, mpeg, ckey, diesel]

Page 24: Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design


Conclusion

• Presented a technique for rapidly estimating the power and performance of cache and bus sub-systems
– orders of magnitude faster than exhaustive simulation
– yields good accuracy

• Enables exploration of parameters in parameterized system-on-a-chip architectures