
Page 1: Power, Energy and Performance - Computer Science and ...cseweb.ucsd.edu/classes/wi15/cse237A-a/handouts/3_power.pdf · Successful Low Power S/W Techniques 1. ... Multithreaded workload


Power, Energy and Performance

Tajana Simunic Rosing

Department of Computer Science and Engineering

UCSD


Motivation for Power Management

Power consumption is a critical issue in system design today

Mobile systems: maximize battery life

High performance systems: minimize operational costs

1.2% of total electrical use in the US is devoted to powering and cooling data centers

$1 operation => $1 cooling


Power and Energy Relationship

$E = \int P(t)\,dt$

[Figure: power P plotted against time t; the energy E is the area under the curve]
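As a rough illustration of the relationship (energy is the integral of power over time), the sketch below integrates a sampled power trace with the trapezoidal rule. The function name and the sample values are made up for this example.

```python
def energy_joules(times_s, power_w):
    """Approximate E = integral of P dt with the trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


# Hypothetical trace: 2 W for one second, then 0.5 W for three seconds.
times = [0.0, 1.0, 1.0, 4.0]
power = [2.0, 2.0, 0.5, 0.5]
print(energy_joules(times, power))  # 2*1 + 0.5*3 = 3.5 J
```

The same average power can hide very different peaks, which is why low power and low energy are treated as separate goals.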


Low Power vs. Low Energy

Minimizing the power consumption is important for the design of the power supply

the design of voltage regulators

the dimensioning of interconnect

short term cooling

Minimizing the energy consumption is important due to restricted availability of energy (mobile systems)

limited battery capacities (only slowly improving)

very high costs of energy (solar panels, in space)

cooling: high costs, limited space

dependability

long lifetimes, low temperatures


Intuition

Systems and components are:

Designed to deliver peak performance, but

Not in need of peak performance most of the time

[Figure: activity timeline alternating between working and idle periods]

Dynamic Power Management (DPM): shut down components during idle times

Dynamic Voltage Frequency Scaling (DVFS): reduce voltage and frequency of components

System Level Power Management Policies: manage devices with different power management capabilities

Understand the tradeoff between DPM and DVFS


DPM

DPM is an effective way of reducing CPU energy consumption by shutting components down (completely or partially) when they are not in use.


Idle States


http://impact.asu.edu/cse591sp11/Nahelempm.pdf


Dynamic Voltage Scaling (DVS)

Power consumption of CMOS circuits (ignoring leakage):

$P = \alpha \, C_L \, V_{dd}^2 \, f$

with

$\alpha$: switching activity

$C_L$: load capacitance

$V_{dd}$: supply voltage

$f$: clock frequency

Delay for CMOS circuits:

$\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^2}$

with $V_t$: threshold voltage ($V_t < V_{dd}$)

DVFS vs. power down

[Figure: power versus time with a deadline at 25 and time-axis marks at 10 and 25, in three panels: (a) no power-down, (b) power-down, (c) dynamic voltage scaling]
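A small numeric sketch of why voltage scaling can pay off, using the CMOS relations on this slide: dynamic power P = alpha * C_L * Vdd^2 * f, and gate delay proportional to Vdd / (Vdd - Vt)^2. All constants are illustrative assumptions rather than measurements, and the maximum clock frequency is assumed to be the inverse of the gate delay.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic CMOS power, ignoring leakage: P = alpha * C_L * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def gate_delay(k, c_load, vdd, vt):
    """Gate delay: tau = k * C_L * Vdd / (Vdd - Vt)^2."""
    return k * c_load * vdd / (vdd - vt) ** 2


# Illustrative constants (assumptions, not taken from the slides).
ALPHA, C_LOAD, K, VT = 0.5, 1e-9, 1.0, 0.4
CYCLES = 1e9  # amount of work, in clock cycles


def run(vdd, cycles=CYCLES):
    """Return (frequency, power, runtime, energy) for a fixed amount of work."""
    freq = 1.0 / gate_delay(K, C_LOAD, vdd, VT)  # assumed: f_max = 1 / tau
    power = dynamic_power(ALPHA, C_LOAD, vdd, freq)
    runtime = cycles / freq
    return freq, power, runtime, power * runtime


for vdd in (1.2, 1.0, 0.8):
    freq, power, runtime, energy = run(vdd)
    print(f"Vdd={vdd:.1f} V  f={freq / 1e6:6.1f} MHz  "
          f"P={power:.3f} W  t={runtime:.2f} s  E={energy:.3f} J")
```

For a fixed number of cycles the energy scales with Vdd squared while the runtime grows, so the lowest voltage that still meets the deadline is the interesting operating point.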


Android Frequency Governors

[Figure: CPU frequency versus time under each governor. Performance: highest frequency. Powersave: lowest frequency. Ondemand: in-between frequency range]

More info at:

http://docs.fedoraproject.org/en-US/Fedora/15/html/Power_Management_Guide/index.html

http://doc.opensuse.org/documentation/html/openSUSE/opensuse-tuning/cha.tuning.power.html


Energy Savings with DVFS

Reduction in CPU power

Extra system power


Effectiveness of DVFS

For energy savings

ER > EE

Factors in modern systems affecting this equation:

Performance delay (tdelay)

Idle CPU power consumption

Power consumption of other devices (PE)
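One plausible reading of the condition ER > EE, sketched under loud assumptions: ER is taken to be the CPU energy saved by running slower, EE the extra energy the other devices burn during the added delay, and the fast run is charged idle CPU power for the time it finishes early. The function name, parameters, and numbers are hypothetical.

```python
def dvfs_tradeoff(p_cpu_high, p_cpu_low, p_cpu_idle, p_other, t_high, t_delay):
    """Return (E_R, E_E, saves_energy) for lowering the CPU frequency.

    t_high  : runtime at the high frequency (s)
    t_delay : extra runtime caused by the lower frequency (s)
    p_other : power of the rest of the system while the task runs (W)
    """
    # Compare both settings over the same horizon t_high + t_delay.
    e_cpu_high = p_cpu_high * t_high + p_cpu_idle * t_delay
    e_cpu_low = p_cpu_low * (t_high + t_delay)
    e_reduction = e_cpu_high - e_cpu_low   # E_R: CPU energy saved
    e_extra = p_other * t_delay            # E_E: extra system energy
    return e_reduction, e_extra, e_reduction > e_extra


# CPU-bound job: halving the frequency doubles the runtime.
print(dvfs_tradeoff(2.0, 1.0, 0.2, 1.5, t_high=10.0, t_delay=10.0))
# Memory-bound job: the same frequency change adds little delay.
print(dvfs_tradeoff(2.0, 1.0, 0.2, 1.5, t_high=10.0, t_delay=1.0))
```

In this toy model the CPU-bound case loses energy overall and the memory-bound case saves it, which is the effect the three listed factors are meant to capture.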


DVFS: Mem vs. CPU

[Figure: two plots with axes labeled En (vertical) and fn (horizontal, 40% to 100%), each with curves for the mem, combo, and burn_loop workloads. Panel titles: Static power ratio = 30% and Static power ratio = 50%]


Successful Low Power S/W Techniques

1. Understand workload variations

2. Devise efficient ways to detect them

3. Utilize the detected workload variations with available H/W

Relatively easy for real-time jobs

Hard for non real-time jobs

No timing constraints, no periodic execution

Unknown execution time

It is hard to predict the future workload!!


PAST

Look a fixed window into the past

Assume the next window will be like the previous one

If the past window was

mostly busy: increase speed

mostly idle: decrease speed


Example: PAST

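[Figure: timeline split into PAST and FUTURE. Two past windows show low utilization, each followed by "Decrease speed"; the future window is marked "?"]

$\text{Utilization} = \frac{\text{busy time}}{\text{window size}}$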
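A minimal sketch of the PAST rule (utilization = busy time / window size; speed up after a mostly busy window, slow down after a mostly idle one). The thresholds, step size, and speed limits are assumptions chosen for illustration; the slides do not specify them.

```python
def past_next_speed(busy_time, window_size, speed,
                    step=0.2, high=0.7, low=0.5,
                    min_speed=0.2, max_speed=1.0):
    """Pick the speed (fraction of f_max) for the next window.

    All keyword defaults are hypothetical tuning values.
    """
    utilization = busy_time / window_size
    if utilization > high:      # mostly busy: increase speed
        speed += step
    elif utilization < low:     # mostly idle: decrease speed
        speed -= step
    return min(max_speed, max(min_speed, speed))


speed = 0.6
for busy in (2.0, 1.0, 9.0):    # busy time observed in three 10 ms windows
    speed = past_next_speed(busy, 10.0, speed)
    print(round(speed, 2))
```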


FLAT

Try to smooth speed to a global average

Make the utilization of next window=<const>

Set speed fast enough to complete the predicted new work being pushed into the coming window


Example: FLAT

[Figure: utilization over time, with the next window marked "?"]

<Const> = 0.7

Increase or decrease the speed so that the next utilization is 0.7
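A sketch of FLAT with the target utilization 0.7 from this example. It assumes, for simplicity, that the work arriving in the next window equals the work completed in the previous one; the function and parameter names are hypothetical.

```python
def flat_next_speed(prev_utilization, prev_speed, target=0.7,
                    min_speed=0.1, max_speed=1.0):
    """Choose a speed (fraction of f_max) so the next window runs at `target` utilization."""
    # Work done in the previous window, expressed as a fraction of a full-speed window.
    predicted_work = prev_utilization * prev_speed
    wanted = predicted_work / target
    return min(max_speed, max(min_speed, wanted))


print(flat_next_speed(0.35, 1.0))  # lightly loaded: slow down to about half speed
print(flat_next_speed(0.9, 1.0))   # overloaded relative to the target: clamp at full speed
```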


LONG-SHORT

Look up the last 12 windows

Short-term past : 3 most recent windows

Long-term past : the remaining windows

Workload Prediction

The utilization of the next window will be a weighted average of these 12 windows’ utilizations


Example: LONG-SHORT

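time: 0 1 2 3 4 5 6 7 8 9 10 11 12 (current time)

utilization = # cycles of busy interval / window size

utilization per window: 0 .3 .5 1 1 1 .8 .5 .3 .1 0 0

$\frac{0 + .3 + .5 + 1 + 1 + 1 + .8 + .5 + .3 + 4 \times (.1 + 0 + 0)}{9 + 4 \times 3} = 0.276$

$f_{clk} = 0.276 \times f_{max}$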
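The LONG-SHORT figure can be checked with a few lines of code. The sketch assumes the weighting implied by the worked numbers on this slide: the 3 most recent windows count four times as much as each of the 9 older ones. The function name and defaults are illustrative.

```python
def long_short_prediction(utilizations, short_count=3, short_weight=4):
    """Weighted average of past utilizations, listed oldest first."""
    long_term = utilizations[:-short_count]
    short_term = utilizations[-short_count:]
    weighted_sum = sum(long_term) + short_weight * sum(short_term)
    total_weight = len(long_term) + short_weight * len(short_term)
    return weighted_sum / total_weight


history = [0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0]
prediction = long_short_prediction(history)
print(round(prediction, 3))  # 0.276, so f_clk = 0.276 * f_max
```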


AGED-AVERAGE

Employs an exponential-smoothing method

Workload Prediction

The utilization of the next window will be a weighted average of all previous windows’ utilizations

Geometrically reduce the weight of older windows


Example: AGED-AVERAGE

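time: 0 1 2 3 4 5 6 7 8 9 10 11 12 (current time)

utilization = # cycles of busy interval / window size

utilization per window: 0 .3 .5 1 1 1 .8 .5 .3 .1 0 0

$\text{average} = \frac{1}{3} \times 0 + \frac{2}{9} \times 0 + \frac{4}{27} \times (0.1) + \frac{8}{81} \times (0.3) + \cdots$

$f_{clk} = \text{average} \times f_{max}$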
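A sketch of AGED-AVERAGE using the weights that appear in this example (1/3, 2/9, 4/27, 8/81, ...), that is, a window k steps back gets weight (1/3) * (2/3)^k. Treating 2/3 as the decay factor is an inference from those fractions; the function name is hypothetical.

```python
def aged_average(utilizations, decay=2 / 3):
    """Exponentially smoothed utilization; `utilizations` is listed oldest first."""
    average = 0.0
    weight = 1.0 - decay            # weight of the most recent window
    for u in reversed(utilizations):
        average += weight * u
        weight *= decay             # geometrically reduce the weight with age
    return average


history = [0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0]
average = aged_average(history)
print(f"f_clk = {average:.2f} * f_max")
```

Because the two most recent windows were idle, the prediction sits well below the plain mean of the twelve windows.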


CYCLE

Workload Prediction

Examine the last 16 windows

Does there exist a cycle of length X?

If so, predict by extending this cycle

Otherwise, use the FLAT algorithm


Example: CYCLE

time: 0 1 2 3 4 5 6 7 8 (current time)

utilization = # cycles of busy interval / window size

utilization per window: 0 .4 .8 .1 .3 .5 .7 .0

Predict: the next utilization will be .3
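A sketch of the CYCLE idea on the numbers from this example. The slides do not say how a cycle is detected, so the test used here (mean absolute difference between the last X windows and the X windows before them, with a 0.2 tolerance) is an assumption, as are the names.

```python
def cycle_prediction(utilizations, max_error=0.2):
    """Return (predicted utilization, cycle length), or (None, None) if no cycle fits.

    `utilizations` is listed oldest first. A caller would fall back to FLAT on None.
    """
    n = len(utilizations)
    for length in range(1, n // 2 + 1):
        recent = utilizations[n - length:]
        previous = utilizations[n - 2 * length:n - length]
        error = sum(abs(a - b) for a, b in zip(recent, previous)) / length
        if error <= max_error:
            # Extend the cycle: the next window repeats the one `length` steps back.
            return utilizations[n - length], length
    return None, None


history = [0, .4, .8, .1, .3, .5, .7, .0]
print(cycle_prediction(history))  # (0.3, 4)
```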


PATTERN

A generalized version of CYCLE

Workload Prediction

Convert the n most recent windows’ utilizations into a pattern in alphabet {A, B, C, D}.

Find the same pattern in the past


Example: PATTERN

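time: 1 2 3 4 5 ... 8 9 10 11 12 (current time)

Earlier utilizations: 0 .3 .5 1 1, giving pattern ABCDD

Most recent utilizations: .1 .35 .6 .9, giving pattern ABCD

Quantization: A = 0 to 0.25, B = 0.25 to 0.5, C = 0.5 to 0.75, D = 0.75 to 1.0

Predict: the next utilization will be D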
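A sketch of PATTERN on this example: quantize utilizations into {A, B, C, D} using quarter-wide bins, look for the most recent earlier occurrence of the latest n-symbol pattern, and predict the symbol that followed it. The bin edges at 0.25, 0.5, and 0.75 and the choice of the most recent match are assumptions; the windows missing from the slide's timeline are simply left out.

```python
def quantize(utilization):
    """Map a utilization in [0, 1] to a symbol; bin edges are assumed."""
    if utilization < 0.25:
        return "A"
    if utilization < 0.5:
        return "B"
    if utilization < 0.75:
        return "C"
    return "D"


def pattern_prediction(utilizations, n=4):
    """Predict the next symbol, or None when the recent pattern never occurred before."""
    symbols = "".join(quantize(u) for u in utilizations)
    recent = symbols[-n:]
    # Search only where a match is followed by at least one known symbol.
    position = symbols.rfind(recent, 0, len(symbols) - 1)
    if position == -1:
        return None
    return symbols[position + n]


history = [0, .3, .5, 1, 1, .1, .35, .6, .9]   # ABCDD ... ABCD
print(pattern_prediction(history))             # D
```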


Dynamic Power Management

Components with several internal states

Corresponding to power and service levels

Abstracted as power state machines

State diagram with:

Power and service annotation on states

Power and delay annotation on edges

Example: SA-1100

RUN: operational

IDLE: a software routine may stop the CPU when not in use, while monitoring interrupts

SLEEP: shutdown of on-chip activity

[Figure: power state machine with states RUN (400 mW), IDLE (50 mW), and SLEEP (160 uW); the edges carry the transition delays 10 us, 10 us, 90 us, 90 us, and 160 ms]


Example: Hard Disk Drive

[Figure: power state machine of the Fujitsu MHF 2043 AT hard disk]

Working: 2.2 W (spinning + IO)

Idle: 0.95 W (spinning)

Sleeping: 0.13 W (stop spinning)

Transitions:

read / write

IO complete

shut down: 0.36 J, 0.67 sec

spin up: 4.4 J, 1.6 sec


Waking Up Hard Disk

[Figure: power (W, 0 to 2.5) versus time (sec, 0 to 6) as the disk goes from sleeping, through waking up, to working]

Measurements done on Fujitsu MHF 2043 AT hard disk


The Applicability of DPM

State transition power (Ptr) and delay (Ttr)

If Ttr = 0 and Ptr = 0, the policy is trivial

Stop a component when it is not needed

If Ttr != 0 or Ptr != 0 (always…)

Shut down only when idleness is long enough to amortize the cost

What if T and P fluctuate?

[Figure: two states, ON and OFF, with each transition between them annotated with Ptr and Ttr]


Break Even Time

Minimum idle time for amortizing the cost of component shutdown

[Figure: workload timeline showing two requests, and the device state timeline (Active, Off, Active) between them, with the intervals Tbe and Toff marked]
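A worked sketch of the break-even time, using a common formulation that is assumed here rather than taken from the slides: staying on for an idle period T costs P_on * T, while shutting down costs E_tr + P_off * (T - T_tr), and Tbe is where the two are equal (but never shorter than the transition time itself). The numbers are the Fujitsu MHF 2043 AT figures quoted on the hard disk slide.

```python
def break_even_time(p_on, p_off, e_transition, t_transition):
    """Minimum idle time for which shutting down saves energy.

    p_on, p_off  : power while idling on / while off (W)
    e_transition : total energy to go down and come back up (J)
    t_transition : total time to go down and come back up (s)
    """
    if p_on <= p_off:
        raise ValueError("the off state must draw less power than the on state")
    # Solve p_on * T = e_transition + p_off * (T - t_transition) for T.
    t_energy = (e_transition - p_off * t_transition) / (p_on - p_off)
    return max(t_transition, t_energy)


# Idle 0.95 W, sleeping 0.13 W, shut down 0.36 J / 0.67 s, spin up 4.4 J / 1.6 s.
t_be = break_even_time(0.95, 0.13, 0.36 + 4.4, 0.67 + 1.6)
print(round(t_be, 3))  # about 5.445 s
```

Under these assumptions the disk has to stay idle for roughly five and a half seconds before spinning down is worthwhile.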


DPM Applicability

If idle period length < Tbe

Not possible to save energy by turning off

Need accurate estimation of upcoming idle periods:

Underestimation: potential energy savings lost

Overestimation: performance delay + possibly negative energy savings

Challenge: Manage energy/performance tradeoff

DPM Policy Goal: Maximize energy savings by utilizing sleep states while minimizing performance delay


Classification of DPM Policies

Timeout Policies

Predictive Policies

Stochastic Policies

Hybrid Policies

Online learning DPM


Timeout Policies

Use elapsed idle time to predict the total idle period duration

[Figure: workload timeline showing two requests, and the device state timeline (Active, Off, Active); the timeout To and the idle period Tidle are marked]

Assume that if Tidle > To, then P(Tidle > To + Tbe) ≈ 1

To is referred to as the timeout

Can be fixed or adaptive
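A small simulation of a fixed-timeout policy, to make the energy accounting concrete. The device numbers are the hard disk figures quoted earlier in this deck; the idle-period trace, the function name, and the simplification that the device wakes exactly when the request arrives are all assumptions.

```python
def timeout_policy_energy(idle_periods, timeout, p_idle, p_sleep,
                          e_transition, t_transition):
    """Total energy (J) spent over a list of idle periods (s) with a fixed timeout."""
    total = 0.0
    for t_idle in idle_periods:
        if t_idle <= timeout:
            total += p_idle * t_idle           # request arrived before the timeout expired
        else:
            asleep = max(0.0, t_idle - timeout - t_transition)
            total += p_idle * timeout          # energy wasted while waiting for To
            total += e_transition + p_sleep * asleep
    return total


DISK = dict(p_idle=0.95, p_sleep=0.13, e_transition=4.76, t_transition=2.27)
trace = [1.0, 3.0, 20.0, 60.0]                 # hypothetical idle periods in seconds

with_timeout = timeout_policy_energy(trace, timeout=5.0, **DISK)
always_on = timeout_policy_energy(trace, timeout=float("inf"), **DISK)
print(round(with_timeout, 2), round(always_on, 2))
```

Raising the timeout makes the policy safer for performance but increases the energy spent waiting, which is the tradeoff an adaptive timeout tries to manage.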


Timeout Policies

Advantages

Extremely simple to implement

Safety (in terms of performance delay) can be easily improved by increasing To

Disadvantages

Waste energy while waiting for To to expire

Heuristic in nature: no guarantee on energy savings or performance delay

Always incur performance penalty on wakeup: no mechanism to wake before request arrival


Predictive Policies

Description:

Use statistical analysis to predict the length of the idle period

Shut down if predicted idle period is longer than Tbe

Advantages

More energy efficient than timeout policies

Disadvantages

Depend heavily on the correlation between past and future events

Tend to be aggressive in shutting down, and hence incur higher performance latency

Heuristic with no performance guarantees
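A sketch of one simple predictive policy. The slides do not name a specific predictor, so an exponential average of past idle periods is used here purely as an assumption; the class name, the smoothing factor, and the example value of Tbe are hypothetical.

```python
class IdlePredictor:
    """Predict the next idle period as an exponential average of past ones."""

    def __init__(self, t_be, alpha=0.5, initial=0.0):
        self.t_be = t_be            # break-even time of the device (s)
        self.alpha = alpha          # weight given to the latest observation
        self.prediction = initial   # predicted length of the next idle period (s)

    def should_shut_down(self):
        """Shut down at the start of an idle period if it is predicted to exceed Tbe."""
        return self.prediction > self.t_be

    def observe(self, actual_idle):
        """Update the prediction once the idle period has ended."""
        self.prediction = (self.alpha * actual_idle
                           + (1.0 - self.alpha) * self.prediction)


predictor = IdlePredictor(t_be=5.0)
for idle in (20.0, 30.0, 0.5, 0.5):
    print(predictor.should_shut_down(), round(predictor.prediction, 2))
    predictor.observe(idle)
```

The trace shows the weakness listed above: right after the long periods end, the predictor still shuts down for a short one and pays the wake-up latency.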


Stochastic Policy Example: Time-indexed Semi-Markov Model

[Figure: system model with a Service Requester (SR), a Service Queue (SQ), and a Service Provider (SP)]

Globally optimal solution with performance constraints met

Valid for a stationary class of workloads

Implemented and measured large savings in mobile systems

Simunic et al., TCAD’01


Hybrid: Online Learning for Power Management

Selects the best performing expert for managing power

Selected expert manages power for the operative period

Evaluates performance of all the experts

[Figure: a controller sits between the device and the experts EXP 1, EXP 2, EXP 3, ..., EXP n; one expert is operational and the others are dormant]

Get up to 70% energy savings per device

Experts can be:

DPM: a state of the art DPM policy

DVFS: v-f setting

Performance converges to that of the best performing expert with successive idle periods at rate TNO /)(ln

Dhiman et al., TCAD’09
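A simplified sketch of the expert-selection loop described on this slide. It is not the published algorithm: the experts are plain fixed timeouts, the loss only accounts for energy, and the multiplicative weight update with factor beta is a generic choice made for illustration. Device numbers are the hard disk figures from earlier in the deck.

```python
import math


def idle_energy(timeout, t_idle, p_idle, p_sleep, e_tr, t_tr):
    """Energy (J) a fixed-timeout expert spends over one idle period."""
    if t_idle <= timeout:
        return p_idle * t_idle
    asleep = max(0.0, t_idle - timeout - t_tr)
    return p_idle * timeout + e_tr + p_sleep * asleep


def run_controller(idle_periods, timeouts, device, beta=0.5):
    """Return the timeout chosen for each idle period and the final expert weights."""
    weights = [1.0] * len(timeouts)
    chosen = []
    for t_idle in idle_periods:
        # Select the currently best performing expert for this period.
        best = max(range(len(timeouts)), key=lambda i: weights[i])
        chosen.append(timeouts[best])
        # Evaluate every expert, including the dormant ones, on what actually happened.
        always_on = device["p_idle"] * t_idle
        for i, timeout in enumerate(timeouts):
            energy = idle_energy(timeout, t_idle, **device)
            loss = energy / (energy + always_on)   # in (0, 1); 0.5 means no saving
            weights[i] *= beta ** loss
    return chosen, weights


DISK = dict(p_idle=0.95, p_sleep=0.13, e_tr=4.76, t_tr=2.27)
chosen, weights = run_controller([60.0] * 5, [math.inf, 10.0, 1.0], DISK)
print(chosen)
```

With uniformly long idle periods the controller starts with the never-sleep expert and then settles on the shortest timeout, since that expert accumulates the smallest loss.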


Summary

Simple power management policies provide great energy-performance tradeoffs

Lower v-f settings offer worse e/p tradeoffs due to high performance delay

Research topics:

Multithreaded workload scheduling and power management

Speed up a core instead of slowing it down

Fast sleep times

Interaction with the rest of the system


Sources and References

Jihong Kim, ES Class, SNU

Peter Marwedel, “Embedded Systems Design,” 2004.

Frank Vahid, Tony Givargis, “Embedded System Design,” Wiley, 2002.

Wayne Wolf, “Computers as Components,” Morgan Kaufmann, 2001.

Nikil Dutt @ UCI


SMALL PROJECT PART 2


Motivation

With limited battery power and a low number of cores, DVFS is still the primary method of runtime power saving in smartphones

Linux frequency governors

CPU: ondemand, performance, powersave

GPU: trustzone, msm

Given policies are static, or depend only on the respective utilizations (or are proprietary)

CPU and GPU frequencies are scaled independently


Development phones

Qualcomm mobile development platform (MDP)

Snapdragon MSM8660 / MSM8960

Android 4.0.3 / 4.1.2, rooted

Checkout procedure

CSE 2148 (2nd floor, facing south)

Friday 1/16 10-12p, 1-4:30p; Tuesday 1/20 1-1:45p

Bring phone release form (in project 2 directory)

Signature required from both team members

Return at demo day of project part 2 (or final project)


Runtime controls & statistics

Power and utilization statistics available through OS

CPU frequencies

CPU utilizations

GPU frequencies

GPU utilizations

Battery voltage

Battery power

Battery capacity

Use these statistics to analyze workloads and design your DVFS policies
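A sketch of how such statistics might be read. The handout does not list file locations, so the code assumes the standard Linux cpufreq sysfs directory and the `cpu` line format of `/proc/stat`; the paths on the development phones, and the locations of the GPU and battery statistics, may differ. Function names are hypothetical.

```python
from pathlib import Path

# Standard Linux cpufreq location for core 0 (assumed to apply to the phone).
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_cpufreq(base=CPUFREQ_DIR):
    """Return the current governor and frequency in kHz (None where a file is missing)."""
    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return None

    freq = read("scaling_cur_freq")
    return {"governor": read("scaling_governor"),
            "cur_freq_khz": int(freq) if freq is not None else None}


def _parse_cpu_line(line):
    """Split a /proc/stat 'cpu ...' line into (total ticks, idle ticks)."""
    fields = [int(x) for x in line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    return sum(fields[:8]), idle   # guest fields are already counted in user/nice


def cpu_utilization(stat_before, stat_after):
    """CPU utilization in [0, 1] between two samples of the /proc/stat 'cpu' line."""
    total0, idle0 = _parse_cpu_line(stat_before)
    total1, idle1 = _parse_cpu_line(stat_after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / elapsed


print(read_cpufreq())
print(cpu_utilization("cpu 100 0 50 800 50 0 0 0", "cpu 150 0 70 860 70 0 0 0"))
```

Sampling both functions once per window gives the utilization and frequency history that the workload predictors earlier in the deck take as input.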


Tasks

Observe the applications’ CPU/GPU power and performance profiles, observe the tradeoffs

SPEC benchmarks for CPU only governor

Interactive apps for CPU/GPU governor

Design two governors:

CPU-only policy that minimizes energy-delay product of SPEC benchmarks

CPU/GPU policy that optimizes energy efficiency while maximizing user satisfaction
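For reference, the energy-delay product is simply energy multiplied by runtime. The sketch below computes it from evenly spaced power samples; the function name and the two example runs are hypothetical and not measurements from the course platform.

```python
def energy_delay_product(power_samples_w, sample_period_s):
    """EDP (J*s) of a run described by evenly spaced power samples."""
    runtime = len(power_samples_w) * sample_period_s
    energy = sum(power_samples_w) * sample_period_s
    return energy * runtime


# Hypothetical runs of the same benchmark under two governors, sampled once per second.
fast = [2.0] * 10   # 2.0 W for 10 s: 20 J
slow = [1.2] * 15   # 1.2 W for 15 s: 18 J

print(energy_delay_product(fast, 1.0))
print(energy_delay_product(slow, 1.0))
```

The slower run uses less energy yet has the worse EDP, so a governor tuned for this metric cannot simply pick the lowest frequency.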


Materials provided

Information on where to find CPU, GPU, battery statistics

Trepn utility to get power information

C skeleton for reading statistics

$ arm-linux-gnueabi-gcc -static -o m.out m.c

$ adb push m.out /data/local

$ adb shell

# ./data/local/m.out


Project deliverables

Code

Governor for CPU-only that optimizes energy-delay product

DVFS policy for CPU/GPU

Report

Governor policy descriptions and justifications

Results for both governors

In-class demo

CPU-only governor: energy-delay product for our benchmark combination (similar to SPEC)

CPU/GPU: demo with interactive apps

Reminders:

Enroll in Piazza http://piazza.com/class#winter2015/cse237a

Email TA if you cannot access the class on Ted.ucsd.edu
