instruction level power analysis

Upload: radhegovind

Post on 14-Jan-2015

Page 1: Instruction level power analysis

Instruction Level Power Analysis

Page 2: Instruction level power analysis

Layout

- Introduction
- Components of Power Consumption
- Power Characterization
- Instruction Level Power Analysis for RISC processors
- Extensions for VLIW/EPIC processors
- Register Files
- Caches

Page 3: Instruction level power analysis

Introduction

- Why has the power consumption of nano-electronics become so important?
  - Because Moore's law still holds true through complex applications
  - Mobile systems: the battery is the "bottleneck"
  - High-performance computation: heat extraction
  - Operating cost and reliability
    - Example: the data warehouse of an ISP with 8000 servers needs 2 MW

Page 4: Instruction level power analysis

Introduction

- Power or energy? Don't they go hand in hand?
  - Power varies significantly with time!
  - A given battery has a fixed amount of energy
- Average power consumption = Energy / Execution time
  - Decides the average chip and junction temperature
  - Decides battery life (if peak current < rated current)
- Peak power and current
  - Voltage drops, hot spots, rate of battery discharge
- Power-efficient, energy-efficient, and battery-efficient design paradigms do exist!
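The distinction between energy, average power, and peak power can be made concrete with a small sketch. It assumes a uniformly sampled power trace; the function name `summarize_power` and the sample values are made up for illustration.

```python
def summarize_power(samples_w, dt_s):
    """Return (energy_J, avg_power_W, peak_power_W) for a uniformly sampled trace.

    samples_w: power samples in watts (hypothetical data)
    dt_s:      sampling interval in seconds
    """
    # Energy is the time integral of power (rectangle rule here).
    energy_j = sum(p * dt_s for p in samples_w)
    exec_time_s = len(samples_w) * dt_s
    # Average power = Energy / Execution time, as on the slide.
    avg_power_w = energy_j / exec_time_s
    # Peak power is what drives voltage drops and hot spots.
    peak_power_w = max(samples_w)
    return energy_j, avg_power_w, peak_power_w


trace = [0.5, 2.0, 0.5, 1.0]  # made-up samples, in watts
energy, avg, peak = summarize_power(trace, dt_s=0.5)
print(energy, avg, peak)
```

Two traces with the same energy can have very different peaks, which is why battery-oriented and temperature-oriented optimizations do not always coincide.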

Page 5: Instruction level power analysis

Components of Power Consumption

- System = hardware platform + software (system and application)
  - Software impacts hardware power consumption
- Static power
  - Sub-threshold leakage and reverse-biased junction leakage
  - Quiescent biasing power (in the case of non-CMOS circuits)
- Dynamic power
  - Charging and discharging of capacitance (switching activity)
  - Short-circuit power during transitions (rate of change, delay)
- Alternative grouping (used at component/cell level)
  - Switching power at the boundaries of cells
  - Internal cell power
    - Short-circuit power
    - Switching power at internal nodes
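The slides name the components without giving formulas. The sketch below uses the standard first-order CMOS expressions (switching power = activity x capacitance x Vdd^2 x frequency, static power = Vdd x leakage current) and leaves short-circuit power out. All parameter values are hypothetical.

```python
def switching_power(alpha, c_farads, vdd, f_hz):
    """First-order dynamic (switching) power: alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz


def static_power(vdd, i_leak_amps):
    """Static power drawn by leakage current at supply voltage Vdd."""
    return vdd * i_leak_amps


# Hypothetical block: 10% activity, 1 nF switched capacitance, 1.0 V, 1 GHz,
# and 10 mA of total leakage. Short-circuit power is not modeled here.
p_dyn = switching_power(alpha=0.1, c_farads=1e-9, vdd=1.0, f_hz=1e9)
p_stat = static_power(vdd=1.0, i_leak_amps=0.01)
print(f"dynamic = {p_dyn:.3f} W, static = {p_stat:.3f} W")
```

The activity factor `alpha` is where software enters the picture: the same hardware switches more or less capacitance depending on the instructions and data it processes.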

Page 6: Instruction level power analysis

System Abstractions - Power

[Figure: design abstraction levels, listed top to bottom: Functional Specifications and Constraints; System Level Netlist; Register Transfer Level (RTL) Netlist; Component/Cell Level Netlist; Layout or Configuration-bits; Chip. Side axes: Time complexity; Accuracy of power characterization; Opportunities for optimization.]

Page 7: Instruction level power analysis

Power Characterization

- Measurement (chip/board level)
  - Most accurate
  - Perhaps the fastest, if setup and tools exist
  - Too late to change hardware details
  - Software/load control is still possible
  - Typically used for software optimizations

Page 8: Instruction level power analysis

Power Characterization (cont.)

- Transistor level (estimation)
  - SPICE simulation of the transistor-level netlist
  - Most accurate in the simulation world
  - Requires complete implementation details
  - Unmanageable time complexity even for simpler designs
  - Typically used for cell/component characterization
  - Synopsys PowerMill (said to provide SPICE-like accuracy)

Page 9: Instruction level power analysis

Power Characterization (cont.)

- Cell level (estimation)
  - After logic synthesis
  - Requires RTL implementation
  - Simulation to capture switching activity
    - Requires delay simulation if glitches need to be accounted for
  - Characterized cells: empirical formulas or table look-up
  - Interconnect power
    - Either unaccounted for, or
    - Estimated using wire load models (typically based on experience), or
    - Taken from the extracted layout (if done after physical synthesis)
  - Still unmanageable time complexity, especially for use in design space exploration
  - Synopsys PrimePower
    - Netlist, interconnect capacitance, VCD traces, cell power library

Page 10: Instruction level power analysis

Power Characterization (cont.)

- Register transfer level (estimation)
  - Requires a conceptual RTL description (detailed micro-architecture)
  - Data-path is modeled as a netlist of macro cells, which are characterized offline
  - Control path and glue logic
    - Either unaccounted for or estimated based on I/O
  - Simulation to capture switching activity
    - Typically glitches are not considered, but methods do exist
  - Interconnect power
    - Typically unaccounted for, but possible to estimate through floor-planning
  - Typically used in design space exploration (DSE), mostly with in-house tools

Page 11: Instruction level power analysis

System Level Power Estimation

- For design space exploration
  - Least accurate, but the uncertainty of exploration results can be reduced if the models have good fidelity
- Purpose, target architecture, and available system details govern the system-level estimation models
  - Selecting an algorithm, or designing hardware for a given algorithm?
  - ASIC based or processor based?
  - Is the ISA fixed or extensible?
- Typically, system-level power estimation models are specific to a macro-architecture template
- Major constituents of power consumption
  - Computation, communication, storage units, and peripherals

Page 12: Instruction level power analysis

Power Estimation Models

- Activity Based Models
- Instruction Level Energy Models

Page 13: Instruction level power analysis

Activity Based Models

- Fixed Activity Model
- N-Transition Model
- Dual Bit Type Model

Page 14: Instruction level power analysis

Fixed Activity Model

$P = \sum_i k_i G_i f_i$

where:
- $k_i$ = PFA (Power Factor Approximation) proportionality constant, extracted empirically from past designs
- $G_i$ = measure of hardware complexity
- $f_i$ = activation frequency

Disadvantage: does not model the influence of data activity on power consumption.
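A minimal sketch of the fixed activity formula, P = sum over blocks of k x G x f. The function name `pfa_power`, the block list, and every number are illustrative assumptions, not values from past designs.

```python
def pfa_power(blocks):
    """Fixed activity (PFA) estimate: sum of k_i * G_i * f_i over all blocks.

    blocks: iterable of (k, g, f) tuples, where
      k = empirical proportionality constant for the block type,
      g = hardware complexity measure (e.g. gate count or bit width),
      f = activation frequency of the block.
    """
    return sum(k * g * f for k, g, f in blocks)


# Hypothetical design with two blocks (arbitrary units).
design = [
    (2.0, 10, 3.0),  # e.g. a multiplier-like block
    (1.0, 4, 5.0),   # e.g. a small control block
]
print(pfa_power(design))
```

Nothing in the inputs describes the data being processed, which is the disadvantage the slide points out: two workloads with the same activation frequencies get the same estimate.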

Page 15: Instruction level power analysis

N-Transition Model

$P = P_{const} + n \cdot P_{change}$

Disadvantage: it does not differentiate between transitions on different inputs.
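The slide does not define n. The sketch below assumes n is the number of input bits that toggle between two consecutive input words; `n_transition_power` and the constants are hypothetical.

```python
def n_transition_power(prev_word, curr_word, p_const, p_change):
    """N-transition estimate: P = P_const + n * P_change.

    Assumption: n is the number of input bits that differ between the
    previous and the current input word.
    """
    n = bin(prev_word ^ curr_word).count("1")
    return p_const + n * p_change


# Two different input changes that both flip exactly two bits.
low_bits = n_transition_power(0b0000, 0b0011, p_const=1.0, p_change=0.5)
high_bits = n_transition_power(0b0000, 0b1100, p_const=1.0, p_change=0.5)
print(low_bits, high_bits)
```

Both calls return the same value even though different inputs toggled, which is exactly the stated disadvantage.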

Page 16: Instruction level power analysis

Dual Bit Type Model

- Drawbacks of the previous approaches
  - Less accurate
  - Characterize the module on the basis of Uniform White Noise (UWN) inputs
  - Lead to high error if the input dynamic range does not fully occupy the word length

Page 17: Instruction level power analysis

Dual Bit Type Model: The Approach

- Combines the reduced complexity of the architecture level with the accuracy of the gate and circuit levels
- Black-box model of the capacitance switched in each module for various types of inputs
- Capacitance models are easy to parameterize to take into account size, etc.

Page 18: Instruction level power analysis

Dual Bit Type Model: Modeling Complexity

- Power consumed by a module is a function of its complexity, as larger modules contain more circuitry
- Example: capacitance of an N-bit ripple-carry subtracter: $C_T = C_{eff} \cdot N$
- Not restricted to linear models; even more complex models can be specified

Page 19: Instruction level power analysis

Dual Bit Type Model: Capacitive Data Coefficients

- Describe the average amount of capacitance switched within a module during an input transition
- The LSB region sees random transitions and can hence be characterized by a single capacitive coefficient $C_{UU}$
- The MSB region experiences sign transitions and so is characterized by capacitive sign coefficients $C_{+-}$, $C_{++}$, etc.
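One way to read the coefficient scheme is sketched below. It is a simplified illustration under stated assumptions: a single breakpoint separates the random LSB region from the sign MSB region, the coefficients are expressed per bit, and the module scales linearly with word length. The function name, the breakpoint, and all coefficient values are hypothetical.

```python
def dbt_capacitance(n_bits, breakpoint, sign_transition, c_uu, c_sign):
    """Switched capacitance for one input transition in a dual-bit-type style model.

    n_bits:          word length of the module
    breakpoint:      number of LSBs treated as uniform white noise (assumption)
    sign_transition: one of "++", "+-", "-+", "--" (sign before, sign after)
    c_uu:            per-bit coefficient for the random LSB region
    c_sign:          dict mapping a sign transition to a per-bit coefficient
    """
    n_u = breakpoint           # bits behaving like random data
    n_s = n_bits - breakpoint  # bits that follow the sign
    return n_u * c_uu + n_s * c_sign[sign_transition]


# Hypothetical coefficients (arbitrary capacitance units per bit).
coeffs = {"++": 0.5, "+-": 3.0, "-+": 3.0, "--": 0.5}
print(dbt_capacitance(16, 10, "+-", c_uu=2.0, c_sign=coeffs))  # sign flips
print(dbt_capacitance(16, 10, "++", c_uu=2.0, c_sign=coeffs))  # sign stable
```

A pure UWN characterization would apply `c_uu` to all 16 bits and so miss the difference between the two cases.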

Page 20: Instruction level power analysis

Instruction Level Power Estimation

- First introduced to characterize processor power consumption in order to drive software optimizations
- Each instruction is associated with a certain current draw
- Inter-instruction effects are included for better accuracy

Page 21: Instruction level power analysis

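Instruction Level Power Estimation

$E = \sum_i (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_k E_k$

- $B_i$: base energy cost of instruction $i$
- $N_i$: number of times instruction $i$ is executed
- $O_{i,j}$: inter-instruction effect energy cost for instruction $i$ followed by $j$
- $N_{i,j}$: number of times the pair $(i, j)$ is executed consecutively
- $E_k$: additional energy penalties due to resource constraints
- Requires a cost associated with every pair of instructions: $O(N^2)$, where N = number of instructions in the ISA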
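The energy formula can be evaluated directly from an instruction trace. The sketch below is a minimal illustration: the function name `program_energy`, the three-entry cost tables, and the default of zero overhead for pairs missing from the table are all assumptions.

```python
from collections import Counter


def program_energy(trace, base_cost, pair_overhead, penalties=()):
    """E = sum(B_i * N_i) + sum(O_ij * N_ij) + sum(E_k) for an instruction trace.

    trace:         executed instruction mnemonics, in order
    base_cost:     B_i, base energy cost per instruction
    pair_overhead: O_ij, keyed by (previous, current); missing pairs count
                   as zero overhead (an assumption of this sketch)
    penalties:     E_k terms, e.g. stall or miss penalties
    """
    n_i = Counter(trace)                   # N_i: executions per instruction
    n_ij = Counter(zip(trace, trace[1:]))  # N_ij: adjacent-pair counts
    base = sum(base_cost[i] * n for i, n in n_i.items())
    inter = sum(pair_overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return base + inter + sum(penalties)


# Hypothetical costs in arbitrary energy units.
base = {"add": 1.0, "mul": 3.0}
overhead = {("add", "mul"): 0.5, ("mul", "add"): 0.5}
print(program_energy(["add", "mul", "add", "add"], base, overhead, penalties=[2.0]))
```

The `pair_overhead` table is the part that grows as O(N^2) with the size of the ISA.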

Page 22: Instruction level power analysis

JouleTrack

- Experiments on StrongARM by Amit Sinha and A. P. Chandrakasan
  - Current per instruction ~ 0.2 A (averaged over all instructions)
  - Min-max variation is 38% of the average current
  - Addressing-mode and data-dependent variation is smaller
  - But the maximum current variation across benchmarks is < 8%!
- Concluded that the first-order energy model of a given processor is $E = V \cdot I(V, f) \cdot T$
- Second-order effects can be significant for data-path dominated processors such as DSP and VLIW
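As a rough illustration of the first-order model E = V x I(V, f) x T, the sketch below treats the current as a single constant for a given voltage and frequency and derives T from a cycle count. Only the 0.2 A figure comes from the slide; the supply voltage, clock frequency, and cycle count are hypothetical.

```python
def first_order_energy(v_volts, i_amps, cycles, f_hz):
    """First-order program energy: E = V * I * T, with T = cycles / f.

    i_amps stands for I(V, f): one average current for the chosen
    operating point, independent of the instruction mix.
    """
    t_seconds = cycles / f_hz
    return v_volts * i_amps * t_seconds


# 0.2 A is the average quoted on the slide; the rest is made up.
energy_j = first_order_energy(v_volts=1.5, i_amps=0.2, cycles=400_000_000, f_hz=200e6)
print(f"{energy_j:.3f} J")
```

With this model, reducing energy at a fixed operating point comes down to reducing execution time.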

Page 23: Instruction level power analysis

Instruction Level Power Estimation

- Impractical for CISC processors with very large instruction sets
  - Higher average instruction energy
  - Low energy-per-instruction variance
  - Do not consider inter-instruction effects
  - Cluster similar instructions into a single class
- Exponential storage problem for VLIW architectures
  - With N operations in a K-wide VLIW, the number of long-instruction pairs to characterize is $N^{2K}$

Page 24: Instruction level power analysis

Modified Energy Model for VLIW

- Assume independent energy dissipation for the different execution slots
- Consider NOP as the base energy
- $E(W) = \sum_n U(w_n|w_{n-1}) + m \times p \times S + l \times q \times M$
- $U(w_n|w_{n-1}) = U(0|0) + \sum_k v(w_n^k|w_{n-1}^k)$
  - $w_n^k$ = operation issued on lane $k$ by instruction $w_n$
- Example
  - $w_n$ = [ALU NOP NOP NOP], $w_{n-1}$ = [LS NOP ALU NOP]
  - $U(w_n|w_{n-1}) = U(0|0) + v(ALU|LS) + v(NOP|ALU)$
- Memory requirement: $O(K \cdot N^2)$
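The per-lane decomposition from the slide example can be sketched as follows. The function name `bundle_energy`, the energy values, and the treatment of a NOP-after-NOP lane as contributing nothing beyond U(0|0) are assumptions. The stall and miss terms of E(W) are left out because the slide does not define their symbols.

```python
NOP = "NOP"


def bundle_energy(curr, prev, u_base, v):
    """U(w_n | w_{n-1}) = U(0|0) + sum over lanes k of v(w_n^k | w_{n-1}^k).

    curr, prev: lists of operations, one per lane
    u_base:     U(0|0), energy of a NOP bundle following a NOP bundle
    v:          per-lane table keyed by (current op, previous op)
    """
    if len(curr) != len(prev):
        raise ValueError("bundles must have the same number of lanes")
    total = u_base
    for op, prev_op in zip(curr, prev):
        if op == NOP and prev_op == NOP:
            continue  # assumption: this case is already covered by U(0|0)
        total += v[(op, prev_op)]
    return total


# The example from the slide, with hypothetical energy values.
v_table = {("ALU", "LS"): 4.0, (NOP, "ALU"): 1.5}
w_n = ["ALU", NOP, NOP, NOP]
w_prev = ["LS", NOP, "ALU", NOP]
print(bundle_energy(w_n, w_prev, u_base=10.0, v=v_table))

# Table sizes for N = 3 operation types and K = 4 lanes:
N, K = 3, 4
print(N ** (2 * K), "entries for all pairs of long instructions")
print(K * N ** 2, "entries with independent lanes")
```

The lane-independence assumption is what turns an exponential table into one that grows linearly with the issue width.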

Page 25: Instruction level power analysis

Modified Energy Model for VLIW

- Cluster similar instructions based on cost
  - $\Theta = \{e_1, e_2, \ldots, e_t\}$, where $e_t$ = energy consumption of instruction $t$
  - Partition $\Theta$ into K clusters $(C_1, C_2, \ldots, C_K)$ such that $\sum_j \sum_i (x_{i,j} - c_j)^2$ is minimum
- Large number of clusters
  - Good accuracy
  - Huge number of experiments
- Small number of clusters
  - Small number of experiments
  - High variance within clusters
  - Reduced accuracy
- Memory requirement: $O(C \cdot N^2)$
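The slide states the clustering objective but not the algorithm used to minimize it. The sketch below uses a plain one-dimensional k-means (Lloyd's iteration) as one possible way to do it; the function name, the deterministic initialization, and the energy values are assumptions.

```python
def kmeans_1d(energies, k, iters=50):
    """Partition scalar energies into k clusters, reducing the sum of squared
    distances to the cluster centroids (Lloyd's algorithm, local optimum only).

    Returns (centroids, clusters).
    """
    xs = sorted(energies)
    # Deterministic initialization: k evenly spaced picks from the sorted data.
    centroids = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each energy goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centroid becomes the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical per-instruction energies: cheap ops near 1, costly ops near 5.
centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.5, 4.5], k=2)
print(centroids)
print(clusters)
```

Each cluster is then characterized once, so the number of measurement experiments depends on the number of clusters rather than on the number of instructions.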

Page 26: Instruction level power analysis

Limitations of ILPA

- Does not provide any insight into the causes of power consumption within the processor core
- Does not account for the power consumed in the memory system, which is often dominant
- To address the second limitation, power estimation frameworks that integrate processor and memory models are built around instruction set simulators

Page 27: Instruction level power analysis

MicroArchitecture ILPA

- Pipeline-aware instruction-level energy model
- Divide the design into smaller architectural blocks
  - Usually the processor's pipeline stages: Fetch, Decode, RF, Execute, WB
- $E(w_n|w_{n-1}) = \sum_s A_s(w_n|w_{n-1}) + I(w_n|w_{n-1})$
  - $A_s$ = energy consumed per stage $s$ when executing $w_n$ after $w_{n-1}$
  - $I(w_n|w_{n-1})$ = inter-stage connection energy (pipeline registers + buses)
- Provides better insight into power bottlenecks
- Smoother energy behaviour than the black-box model
- Requires a pipeline-structure-aware ISS
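A minimal sketch of the per-stage sum, assuming one look-up table per pipeline stage keyed by the (current, previous) instruction pair. The table layout, the function name `pipeline_energy`, and the energy values are hypothetical.

```python
STAGES = ("fetch", "decode", "rf", "execute", "wb")


def pipeline_energy(curr, prev, stage_tables, interconnect):
    """E(w_n | w_{n-1}) = sum over stages s of A_s(w_n | w_{n-1}) + I(w_n | w_{n-1}).

    stage_tables: {stage: {(curr, prev): energy}}, the A_s terms
    interconnect: {(curr, prev): energy}, pipeline registers and buses
    Returns (total energy, per-stage breakdown).
    """
    per_stage = {s: stage_tables[s][(curr, prev)] for s in STAGES}
    total = sum(per_stage.values()) + interconnect[(curr, prev)]
    return total, per_stage


# Hypothetical characterization for an "add" issued right after a "load".
pair = ("add", "load")
tables = {s: {pair: e} for s, e in zip(STAGES, (2.0, 1.0, 1.5, 3.0, 0.5))}
total, breakdown = pipeline_energy("add", "load", tables, {pair: 1.0})
print(total)
print(max(breakdown, key=breakdown.get))  # stage with the largest share
```

Returning the breakdown alongside the total is what gives the extra insight into bottlenecks that a single black-box number cannot.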

Page 28: Instruction level power analysis

Energy Models for Register File

- Assume linear power behaviour for accesses across different ports
- $P_{RF} = P_i + \frac{1}{T} \sum_n (E_{r,n} + E_{w,n})$
- $E_{r,n} = \sum_i H(RR_{i,n}, RR_{i,n-1}) \cdot E_{rb}$
- $E_{w,n} = \sum_i H(RW_{i,n}, old_{i,n}) \cdot E_{wb}$
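The slide leaves its symbols undefined, so the sketch below rests on several assumptions: H is the Hamming distance, RR is the value on a read port, RW is the value being written, old is the value it overwrites, E_rb and E_wb are per-bit read and write energies, P_i is an idle power term, and T is the total observation time. The data layout and numbers are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def rf_power(p_idle, reads, writes, e_rb, e_wb, period_s):
    """P_RF = P_i + (1/T) * sum over cycles n of (E_r,n + E_w,n).

    reads[n][i]: value seen on read port i in cycle n
    writes[n]:   list of (new_value, old_value) pairs, one per write in cycle n
    """
    energy = 0.0
    # Read energy: bits that toggle on each read port between cycles.
    for n in range(1, len(reads)):
        flips = sum(hamming(cur, prv) for cur, prv in zip(reads[n], reads[n - 1]))
        energy += flips * e_rb
    # Write energy: bits that change in the overwritten register.
    for cycle in writes:
        flips = sum(hamming(new, old) for new, old in cycle)
        energy += flips * e_wb
    return p_idle + energy / period_s


# Three cycles, two read ports, one write in the second cycle (made-up data).
reads = [[0b0000, 0b1111], [0b0011, 0b1111], [0b0011, 0b0000]]
writes = [[], [(0b1010, 0b0000)], []]
print(rf_power(p_idle=0.5, reads=reads, writes=writes, e_rb=1.0, e_wb=2.0, period_s=5.0))
```

Summing port by port with the same per-bit energy is the "linear behaviour across ports" assumption from the slide.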

Page 29: Instruction level power analysis

Energy Model for Caches

- Power consumption depends on the mode of operation (read, write, idle)
- The energy consumed in a given clock cycle is a function of the node transitions between the previous and the current cycle
- Characterize energy as a function of state transitions (read-read, read-write, etc.)
- For a given state transition, the energy also depends on the transitions on the address lines
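A sketch of a state-transition cache model along these lines. The slide only says that energy depends on the mode transition and on address-line transitions; the linear per-bit address term, the function name `cache_energy`, and all energy values are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def cache_energy(accesses, transition_energy, e_addr_bit):
    """Energy over a sequence of cache cycles.

    accesses:          list of (mode, address), mode in {"read", "write", "idle"}
    transition_energy: {(previous mode, current mode): energy}
    e_addr_bit:        energy per toggled address line (assumed linear)
    """
    total = 0.0
    for (prev_mode, prev_addr), (mode, addr) in zip(accesses, accesses[1:]):
        total += transition_energy[(prev_mode, mode)]     # state transition
        total += hamming(prev_addr, addr) * e_addr_bit    # address lines
    return total


# Hypothetical characterization and access sequence.
table = {("read", "read"): 2.0, ("read", "write"): 3.0}
seq = [("read", 0x10), ("read", 0x11), ("write", 0x13)]
print(cache_energy(seq, table, e_addr_bit=0.5))
```

A full table would hold one entry per ordered pair of modes, nine in total for read, write, and idle.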

Page 30: Instruction level power analysis

Thank You