
1

Single-ISA Heterogeneous Multi-Core Architectures:

The Potential for Processor Power Reduction

Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi,

Parthasarathy Ranganathan, Dean M. Tullsen

Presenter: Borys Bradel

2

Introduction

Different programs have different requirements (e.g., instruction-level parallelism, ILP); this extends to phases of a single program

Heterogeneous cores: use the core that matches the current requirements

Reuse existing cores: use multiple generations of the same family of processors

3

Outline

Methodology: hardware, assumptions, power

Experiments: optimal (energy, energy-delay product); heuristic-based (static, dynamic)

Related Work

Conclusion

4

Single-ISA Multi-Core Benefits

Small area overhead: core sizes grow between generations, so the older cores are small next to the newest one

Clock frequencies of older cores would scale with technology

Caveat: a 1 GHz P3 performs about as well as a 1.4 GHz P4, and the P4 increased pipeline depth precisely because the P3 design could not scale
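The P3/P4 caveat can be made concrete with the first-order relation performance = IPC × frequency. The sketch below reads the slide's "P3 1 GHz = P4 1.4 GHz" as equal performance; that reading, and the helper name `relative_ipc`, are assumptions for illustration. Under it, the deeper-pipelined P4 gives up roughly 29% of its IPC to reach the higher clock.

```python
def relative_ipc(freq_a_ghz: float, freq_b_ghz: float) -> float:
    """IPC of core B relative to core A when both deliver the same performance.

    With performance = IPC * frequency, equal performance implies
    IPC_b / IPC_a = freq_a / freq_b.
    """
    return freq_a_ghz / freq_b_ghz


# Assumption: "P3 1 GHz = P4 1.4 GHz" is read as equal performance.
ratio = relative_ipc(1.0, 1.4)
print(f"P4 IPC relative to P3: {ratio:.2f}")  # P4 IPC relative to P3: 0.71
```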

5

Hardware – Alpha Family

Two in-order cores: EV4 (21064) and EV5 (21164)

Two out-of-order cores: EV6 (21264) and EV8- (21464 with multithreading support removed)

6

Hardware Size

About 15% more area than a chip with only the 21464 (EV8-) core

7

Assumptions

Cores can be switched dynamically

Private L1 cache per core; common L2 cache shared by all cores

All cores use 0.10 micron technology

A single process executes on a single core at any one time

2.1 GHz clock (the 21264's 600 MHz at 0.35 micron, scaled to 0.10 micron)

Input voltage 1.2 V

Cores are shut down when idle

1000-cycle restart cost (staged power-up; the phase-locked loop is left running)

150 ns memory access latency

Cache stall cycles derived from CACTI
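Several of the numbers above are related by simple arithmetic, which this sketch checks. It assumes clock frequency scales in inverse proportion to feature size, a first-order rule rather than anything taken from the paper. It also treats the memory access latency as 150 ns. The function names are hypothetical.

```python
def scaled_frequency_mhz(freq_mhz: float, old_node_um: float, new_node_um: float) -> float:
    """First-order scaling: frequency grows in proportion to the shrink in feature size."""
    return freq_mhz * old_node_um / new_node_um


def ns_to_cycles(latency_ns: float, freq_ghz: float) -> float:
    """Convert a latency in nanoseconds into clock cycles at freq_ghz."""
    return latency_ns * freq_ghz


clock_mhz = scaled_frequency_mhz(600, 0.35, 0.10)  # 21264: 600 MHz at 0.35 micron
clock_ghz = clock_mhz / 1000
memory_cycles = ns_to_cycles(150, clock_ghz)        # memory access, taken as 150 ns
restart_ns = 1000 / clock_ghz                       # 1000-cycle core restart cost

print(f"clock: {clock_mhz:.0f} MHz")                  # clock: 2100 MHz
print(f"memory access: {memory_cycles:.0f} cycles")  # memory access: 315 cycles
print(f"core restart: {restart_ns:.0f} ns")          # core restart: 476 ns
```

The restart cost comes to under half a microsecond. That is small compared with intervals measured in millions of instructions, which is why switching cores at interval granularity is plausible.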

8

Core Configurations

9

Power Model

Use Wattch to account for activity-based dissipation

Apply scaling and offset factors to account for other effects

This hybrid model matches manufacturers' data points more closely

Peak power: from data sheets, excluding the L2 cache and output pins

Typical power: scaled based on Intel chips
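A scale-and-offset correction is a linear map applied to Wattch's output. The sketch shows one plausible way to pick the two factors: make the model reproduce two reference points, such as a peak and a typical figure. The calibration procedure, the class and function names, and all numbers are hypothetical illustrations, not the paper's method or data.

```python
from dataclasses import dataclass


@dataclass
class HybridPowerModel:
    """Hypothetical hybrid model: power = scale * wattch_power + offset (watts)."""
    scale: float
    offset: float

    def power(self, wattch_watts: float) -> float:
        return self.scale * wattch_watts + self.offset


def calibrate(wattch_peak: float, wattch_typical: float,
              target_peak: float, target_typical: float) -> HybridPowerModel:
    """Pick scale and offset so the model reproduces both target data points."""
    scale = (target_peak - target_typical) / (wattch_peak - wattch_typical)
    offset = target_peak - scale * wattch_peak
    return HybridPowerModel(scale, offset)


# Illustrative numbers only; they are not taken from the paper.
model = calibrate(wattch_peak=40.0, wattch_typical=16.0,
                  target_peak=60.0, target_typical=30.0)
print(model.scale, model.offset)  # 1.25 10.0
print(model.power(20.0))          # 35.0
```

The offset term lets the model carry power that Wattch's activity counts do not see. The scale term corrects the per-activity cost.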

10

Power and Area Statistics

11

Performance Modeling

Use SMTSIM, a cycle-accurate simulator

SimPoint identifies a representative section of each program and how many instructions must be fast-forwarded to reach it
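The idea behind a simulation point can be sketched in a few steps:

- Split execution into fixed-size intervals.
- Summarize each interval by its basic block vector (how often each basic block ran).
- Simulate the interval that looks most like the whole program.

This is a heavily simplified single-point variant using Manhattan distance. The real SimPoint tool adds random projection and k-means clustering, and its interface looks nothing like this. The function names and counts are hypothetical.

```python
def normalize(vector):
    """Scale a basic block vector so its entries sum to 1."""
    total = sum(vector)
    return [count / total for count in vector]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pick_simulation_point(bbvs, interval_size):
    """Pick the interval whose basic block vector best matches the whole program.

    bbvs[i][j] = executions of basic block j during interval i.
    Returns (interval index, instructions to fast-forward before simulating).
    """
    whole_program = normalize([sum(column) for column in zip(*bbvs)])
    best = min(range(len(bbvs)),
               key=lambda i: manhattan(normalize(bbvs[i]), whole_program))
    return best, best * interval_size


# Four intervals over three basic blocks (made-up counts).
bbvs = [[10, 0, 0], [5, 5, 0], [4, 4, 2], [0, 10, 0]]
print(pick_simulation_point(bbvs, interval_size=100_000_000))  # (1, 100000000)
```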

12

Varying Performance Ratio

13

Varying Energy Efficiency Ratio

14

Oracle Switching for Energy

Performance always within 10% of EV8-
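An oracle can look at every core's behavior in every interval before choosing. For each interval, it picks the lowest-energy core among those whose performance stays within 10% of EV8-. The sketch below assumes:

- every interval executes the same number of instructions, so energies compare directly;
- the "within 10%" constraint is applied per interval.

The trace values and function name are hypothetical.

```python
# trace[t][core] = (performance, energy) for interval t; hypothetical values.
# Performance is throughput (higher is better); every interval runs the same
# number of instructions, so energy values are directly comparable.

def oracle_min_energy(trace, reference="EV8-", max_slowdown=0.10):
    """Per interval, choose the lowest-energy core within max_slowdown of the reference."""
    schedule = []
    for stats in trace:
        floor = (1 - max_slowdown) * stats[reference][0]
        eligible = [core for core, (perf, _) in stats.items() if perf >= floor]
        schedule.append(min(eligible, key=lambda core: stats[core][1]))
    return schedule


trace = [
    {"EV8-": (2.0, 100), "EV6": (1.9, 40), "EV5": (1.2, 20), "EV4": (0.8, 15)},
    {"EV8-": (3.0, 120), "EV6": (2.0, 45), "EV5": (1.5, 25), "EV4": (1.0, 18)},
    {"EV8-": (0.5, 90), "EV6": (0.5, 35), "EV5": (0.48, 15), "EV4": (0.46, 10)},
]
print(oracle_min_energy(trace))  # ['EV6', 'EV8-', 'EV4']
```

The reference core always satisfies its own constraint, so every interval has at least one eligible core.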

15

Oracle Switching for Energy

16

Oracle Switching for Energy-Delay Product

Performance always within 50% of EV8-
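For energy-delay product, the oracle's objective becomes energy × time, where time is the interval's instruction count divided by performance. The looser 50% constraint lets slower, frugal cores win more often. This sketch makes the same assumptions as a per-interval energy oracle: equal-length intervals, a per-interval constraint, and hypothetical numbers and names.

```python
def energy_delay(perf, energy, instructions=1.0):
    """Energy-delay product for one interval: energy * (instructions / performance)."""
    return energy * (instructions / perf)


def oracle_min_edp(trace, reference="EV8-", max_slowdown=0.50):
    """Per interval, choose the lowest-EDP core within max_slowdown of the reference."""
    schedule = []
    for stats in trace:
        floor = (1 - max_slowdown) * stats[reference][0]
        eligible = [core for core, (perf, _) in stats.items() if perf >= floor]
        schedule.append(min(eligible, key=lambda core: energy_delay(*stats[core])))
    return schedule


# trace[t][core] = (performance, energy); hypothetical values.
trace = [
    {"EV8-": (2.0, 100), "EV6": (1.9, 40), "EV5": (1.2, 20), "EV4": (0.8, 15)},
    {"EV8-": (3.0, 60), "EV6": (2.5, 40), "EV5": (1.6, 30), "EV4": (1.0, 18)},
]
print(oracle_min_edp(trace))  # ['EV5', 'EV6']
```

In the first interval, EV4 would have a lower EDP than EV6 but is excluded by the performance floor.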

17

Oracle Switching for Energy-Delay Product

18

Others

Voltage/frequency scaling: less effective than switching cores

Static core selection: only EV6 and EV8- are ever selected

Dynamic heuristic: keep running-average performance within 10% of EV8-

Every 100 time intervals (100 million instructions), the cores are sampled for 5 intervals

Select the best core based on the samples
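A sampling-based heuristic cannot see the future the way the oracle does, so it must measure the cores before committing to one. The following sketch rests on several assumptions that the slide does not specify:

- During the 5-interval sampling window, the cores are tried round-robin, one interval each.
- The winner is the lowest-energy core whose average sampled performance is within 10% of EV8-'s sampled performance.
- The winner runs until the next sampling window.
- The slide's running-average bookkeeping is simplified away.

Names and data are hypothetical.

```python
from statistics import mean


def sampling_heuristic(trace, cores, period=100, sample_len=5, max_slowdown=0.10):
    """Return the core used in each interval of `trace`.

    trace[t][core] = (performance, energy) that `core` would achieve in interval t.
    cores[0] is the reference core (e.g. "EV8-"); sample_len must not exceed period.
    """
    reference = cores[0]
    schedule = []
    t = 0
    while t < len(trace):
        # Sampling phase: run the cores round-robin, one interval each.
        samples = {core: [] for core in cores}
        window_end = min(t + sample_len, len(trace))
        for k, step in enumerate(range(t, window_end)):
            core = cores[k % len(cores)]
            samples[core].append(trace[step][core])
            schedule.append(core)
        # Average what each sampled core achieved.
        averages = {core: (mean(p for p, _ in s), mean(e for _, e in s))
                    for core, s in samples.items() if s}
        floor = (1 - max_slowdown) * averages[reference][0]
        best = min((core for core, (perf, _) in averages.items() if perf >= floor),
                   key=lambda core: averages[core][1])
        # Run the chosen core until the next sampling phase.
        period_end = min(t + period, len(trace))
        schedule.extend([best] * (period_end - window_end))
        t = period_end
    return schedule


# Hypothetical steady-state program: every interval behaves the same.
stats = {"EV8-": (2.0, 100), "EV6": (1.9, 40), "EV5": (1.2, 20), "EV4": (0.8, 15)}
schedule = sampling_heuristic([stats] * 12, ["EV8-", "EV6", "EV5", "EV4"], period=10)
print(schedule)
```

Sampling costs something: the intervals spent on unsuitable cores, such as EV4 here, are paid for even when the final choice is good. That cost is one reason to sample only periodically.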

19

Results for Heuristics

20

Results for Heuristics/Static Core

21

Related Work

Gating-based power optimization: cannot gate at a fine enough granularity, and gated units may still leak

Heterogeneous cores can be thought of as gating that reduces the capabilities of different units

Voltage and frequency scaling: chip-wide scaling is one size fits all; fine-grained scaling has granularity problems

22

Conclusions

Heterogeneous multi-core architectures reduce the energy-delay product and adapt at a finer grain than other approaches

Using several cores from the same family is good: it reduces development and testing costs. Is it scalable?

Just use EV6??