
EECS 388: Embedded Systems

12. Power and Energy

Heechul Yun

1

Agenda

• Background

• How to measure?

• How to save energy/power?

2

3

H. Sutter, “The Free Lunch Is Over,” Dr. Dobb's Journal, 2005 (updated in 2009)

4

Power Consumption (Server)

• Memory consumes significant power

– E.g., Intel Haswell-ULT: 15 W; 2 x 4 GB DDR3 DRAM: 10 W

Figure source: Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2009

Power Consumption (Smart Phone)

• Audio playback with backlight off on a smartphone

5

DVFS and DPM

• Dynamic Voltage/Frequency Scaling (DVFS)

– $\text{Power} \sim f V^2$

– Reduce frequency & voltage

• Dynamic Power Management (DPM)

– Multiple power states

• CPU C-states (standby, sleep, deep sleep, …)

• DDR3 power states (standby, powerdown, self-refresh, …)

• Goal: Making a “Good” Tradeoff

– Minimize performance hit, maximize power reduction

6

Background

$\text{Power} = P_{dynamic} + P_{static} = \frac{1}{2} C f V^2 + P_{static}$

- f: clock frequency

- V: voltage

7

Background

8

$\text{Power} \sim f V^2$

$\text{Time} \sim \frac{1}{f}$

$\text{Energy} = \text{Power} \times \text{Time}$

• Frequency doesn’t matter. Is that right?

(Let’s ignore $P_{static}$ for now.)

Background

9

$\text{Power} \sim f V^2$

$\text{Time} \sim \frac{1}{f}$

$\text{Energy} = \text{Power} \times \text{Time}$

• If you reduce frequency, you can also reduce voltage

$V \sim f$

Background

10

$\text{Power} \sim f^3$

$\text{Time} \sim \frac{1}{f}$

$\text{Energy} = \text{Power} \times \text{Time}$

• Is reducing frequency always good?
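The scaling argument built up over the last three Background slides can be written out in one place. This is only a sketch of the reasoning, and it ignores $P_{static}$ as the slides do at this point:

```latex
\begin{align*}
\text{Energy} &= \text{Power} \times \text{Time} \\
              &\sim f V^2 \cdot \frac{1}{f} = V^2
                && \text{(voltage fixed: frequency cancels out)} \\
              &\sim f^3 \cdot \frac{1}{f} = f^2
                && \text{(voltage scaled with frequency, } V \sim f\text{)}
\end{align*}
```

Under these assumptions, lowering frequency alone saves power but not energy; the energy saving comes from the voltage reduction that a lower frequency permits.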

Background

11

$\text{Energy} = \underbrace{P_{dynamic} \times \text{Time}}_{\sim f^2} + \underbrace{P_{static} \times \text{Time}}_{\sim 1/f}$

• Is reducing frequency always good?
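To see why the answer can be "no", the two terms can be evaluated numerically. The sketch below assumes $V \sim f$ (so dynamic power scales as $f^3$) and uses normalized, made-up values for the constant `k`, the cycle count, and `p_static`; none of these numbers come from the slides.

```python
def energy(f, cycles=1.0, k=1.0, p_static=0.0):
    """Energy to execute `cycles` cycles at normalized frequency f, assuming V ~ f."""
    p_dynamic = k * f ** 3      # Power ~ f * V^2 with V ~ f
    time = cycles / f           # Time ~ 1/f
    return (p_dynamic + p_static) * time


def best_frequency(p_static, k=1.0):
    """Frequency minimizing k*f^2 + p_static/f (set the derivative to zero)."""
    return (p_static / (2.0 * k)) ** (1.0 / 3.0)


freqs = [0.2, 0.4, 0.6, 0.8, 1.0]   # hypothetical normalized operating points
for p_static in (0.0, 0.25):        # hypothetical static power levels
    row = ", ".join(f"{energy(f, p_static=p_static):.3f}" for f in freqs)
    print(f"P_static={p_static}: {row}")
print("energy-optimal f for P_static=0.25:", best_frequency(0.25))
```

With no static power the lowest frequency always wins; once static power is non-zero, energy rises again below some frequency because the static term grows as $1/f$.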

PowerTop

12

Intel’s Recent Processors

• RAPL (Running Average Power Limit)

13

Source: http://web.eece.maine.edu/~vweaver/projects/rapl/

Source: http://software.intel.com/en-us/articles/intel-power-governor

14

Platform level monitoring

Odroid-XU-E board

Processor: Exynos 5 Octa

Source: http://hardkernel.com/main/products/prdt_info.php?g_code=G137463363079

External Measurement

15

Source: http://www.hardkernel.com/main/products/prdt_info.php?g_code=G137361754360

Source: http://www.rakuten.com/prod/p3-kill-a-watt-ps-10-10-outlets-power-strip-receptacle-10/220012603.html?listingId=284206025&scid=pla_google_3KingsAudio&adid=18172&gclid=CIvs97jTq70CFa5DMgodcEkAHg

Source: http://www.amazon.com/P3-International-P4460-Electricity-Monitor/dp/B000RGF29Q/ref=sr_1_3?ie=UTF8&qid=1395680823&sr=8-3&keywords=power+meter

How to save Power/Energy?

• Techniques for perf/energy tradeoffs

– DVFS

– Turbo boost

– Power gating

– Core heterogeneity

• Considerations

– Sensitive to time (performance)

– Sensitive to energy consumption

16

A Measurement Study

• An Analysis of Power Consumption in a Smartphone, USENIX ATC’10

17

Impact

A Smartphone

• (very old) 2.5G GPRS phone

– Battery: 1200 mAh, 3.7 V Li-ion (4.4 Wh)
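The 4.4 Wh figure follows directly from capacity times nominal voltage. A small sketch of that arithmetic, plus the battery life it implies for a given average draw; the 300 mW draw is a made-up value for illustration, not a number from the study.

```python
def battery_energy_wh(capacity_mah, voltage_v):
    """Stored energy in watt-hours: (mAh / 1000) * V."""
    return capacity_mah / 1000.0 * voltage_v


def runtime_hours(energy_wh, avg_power_mw):
    """Hours of operation at a constant average power draw."""
    return energy_wh / (avg_power_mw / 1000.0)


energy_wh = battery_energy_wh(1200, 3.7)          # battery from the slide
print(f"battery energy: {energy_wh:.2f} Wh")
print(f"runtime at a hypothetical 300 mW: {runtime_hours(energy_wh, 300):.1f} h")
```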

18

What to Know?

• Where does the energy go?

– Detailed component-level power breakdown

– On various usage scenarios

• How to save energy?

– The efficacy of DVFS (dynamic voltage-frequency scaling) schemes

19

Methodology

• Hardware

– A development board, configured to measure individual component (CPU, memory, Modem, …) power consumption

– Using a DAQ (data acquisition) system

• Read the paper. You can find very detailed descriptions

• Software

– On Android 1.5, using a set of micro-benchmarks as well as real applications

20

Idle

• System is awake, but no applications are active

• CPU and RAM are not top power consumers

21

Audio Playback

• Backlight off

• Comparable to idle state

22

Video Playback

• Backlight is a dominant factor

23

Backlight

• User controllable (~255 levels)

24

CPU and Memory

• 100 MHz (low perf) vs. 400 MHz (max perf)

• equake’s power consumption increases significantly

• mcf’s power consumption doesn’t increase much

25

Internal Flash and SD Card

• Benchmark: flash read/write (dd)

• Why are they (internal and SD) different?

26

Findings

• Where does the energy go?

– GSM, display, backlight

– Not CPU and DRAM

• Is DVFS useful?

– Reduce power but not necessarily energy

– Only memory bound applications get energy savings

27

Two Additional Smartphones

28

Quiz

• On which phone would you want to use DVFS?

29

Is DVFS useful?

• Yes: Nexus One, Freerunner (weak)

• No: G1

30

Further reading: E. Le Sueur and G. Heiser, “Dynamic voltage and frequency scaling: the laws of diminishing returns,” HotPower’10

Challenge: How To Configure?

• Too many possible configurations

– low or high freq?

– More cores or fewer cores?

– Little core vs. big core?

• Platform variation

– A policy that works well on a platform does not necessarily work on another platform

31

Energy Saving Strategies

• Model-based approach

– Offline: build an energy/performance model

– Online: compute an “optimal” assignment

• Heuristic approach

– Race to idle

– Never idle

– Adaptive control

33

System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks

Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha

University of Illinois at Urbana-Champaign

Euromicro Conference on Real-Time Systems (ECRTS), 2010

34

CPU-only DVFS

• “DVFS is increasingly ineffective” [Le Sueur, HotPower’10]

– Increased importance of static power

– Small voltage margin for DVFS to be effective

– Reduced frequency means increased runtime, which often means increased energy

35

$P = P_{dynamic} + P_{static} = k f V^2 + P_{static}$

- f: clock frequency

- V: voltage

- k: constant

CPU-only DVFS

36

[Figure: energy (mJ, 0 to 700) vs. CPU frequency fc (MHz, 40 to 400) for a task with cache stall ratio = 0%. Annotations: “Valid range (~200 MHz)” and “Not effective, But…”]

Motivation

37

CPU (MHz)   Mem (MHz)   Time (s)   Energy (mJ)
200         100         3.46       1690
100         100         3.55       1182

Memxfer5b: memory benchmark program

Half of CPU clock

Energy saved 30%

Exec. time increased only 3%

Motivation

38

CPU (MHz)   Mem (MHz)   Time (s)   Energy (mJ)
200         100         4.26       2364
200         50          4.28       2106

Dhrystone: CPU benchmark program

Half of Mem clock

Energy saved about 11%

Exec. time increased only about 0.5%
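The percentages on the two Motivation slides can be recomputed from the table rows. A minimal sketch using only the time and energy values shown in the tables; the helper name `percent_change` is just for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (time in s, energy in mJ) before and after each frequency change, from the tables
runs = {
    "Memxfer5b, CPU 200 -> 100 MHz": ((3.46, 1690), (3.55, 1182)),
    "Dhrystone, Mem 100 -> 50 MHz": ((4.26, 2364), (4.28, 2106)),
}

for name, ((t0, e0), (t1, e1)) in runs.items():
    print(f"{name}: time {percent_change(t0, t1):+.2f}%, "
          f"energy {percent_change(e0, e1):+.2f}%")
```

In both cases the component that is not the bottleneck is slowed down, so the runtime cost is small compared with the energy saved.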

Task Model

• Task = Computation + Memory fetch

39

[Figure: power vs. time for one task, drawn as a computation block followed by a memory fetch (cache stall) block]

Task Model (2)

40

C : computation

M : off-chip memory fetch (cache-stall cycles)

[Figure: three power vs. time diagrams of the C and M blocks: the baseline, a lower MEM frequency, and a lower CPU frequency]

Task Model (3)

• Execution time of a task

$e = \frac{C}{f_c} + \frac{M}{f_m}$

– C : CPU cycles of a given task (excluding memory stalls)

– M : memory cycles of a given task (memory stall cycles)

– $f_c$ : CPU clock frequency

– $f_m$ : Memory clock frequency

41
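The execution-time model, CPU cycles over CPU frequency plus memory cycles over memory frequency, is easy to try out. In this sketch the cycle counts are hypothetical values chosen to represent a memory-bound task; they are not measurements from the paper.

```python
def exec_time(c_cycles, m_cycles, f_c_hz, f_m_hz):
    """Task execution time: computation time plus memory-stall time."""
    return c_cycles / f_c_hz + m_cycles / f_m_hz


# Hypothetical memory-bound task: few CPU cycles, many memory stall cycles.
C, M = 20e6, 340e6

fast = exec_time(C, M, f_c_hz=200e6, f_m_hz=100e6)
slow_cpu = exec_time(C, M, f_c_hz=100e6, f_m_hz=100e6)
print(f"200 MHz CPU: {fast:.2f} s, 100 MHz CPU: {slow_cpu:.2f} s "
      f"({(slow_cpu / fast - 1) * 100:.1f}% longer)")
```

Because the memory term dominates here, halving the CPU clock lengthens the run by only a few percent, the same qualitative behaviour as the Memxfer5b measurement.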

Power Model

• Power of a component (e.g., CPU)

$P = k f V^2 + R$

– k : capacitance constant

– f : frequency of the component

– V : supply voltage

– R : leakage power

42

Different k for different modes:

– $k_{active}$ : active mode capacitance

– $k_{standby}$ : standby mode capacitance

Energy Model

43

$E = e \cdot P$

[Figure: power vs. time over one task period, stacked on top of system static power. Pure exec block ($E_{cpu}$): CPU active; bus, mem standby. MEM fetch block ($E_{mem}$): CPU standby; bus, mem active. Idle block ($E_{idle}$): CPU, bus, mem idle. The area above system static power is dynamic power.]

• System wide energy model

– Considers CPU, bus, and memory power consumption

– Considers active, standby and idle modes

– Other components are assumed to be static (included in R)

Energy Equation and Validation

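$E = \frac{C}{f_c}\left(k_{ca} V_c^2 f_c + k_{ms}^{*} V_m^2 f_m + R\right) + \frac{M}{f_m}\left(k_{cs} V_c^2 f_c + k_{ma}^{*} V_m^2 f_m + R\right) + (I + R)(P - e)$

(Subscripts c and m denote CPU and memory; a and s denote active and standby. Here P is the task period and e its execution time.)

44

Obtained coefficients in the energy equation:

Capacitance (nF)                        Power (mW)
k_ca     k_cs     k_ma*    k_ms*        I        R
0.505    0.224    0.540    0.210        6.570    67.434

• Validated on an ARM926EJ-S based platform via regression analysis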

Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha. “System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks,” ECRTS, 2010
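The coefficients can be plugged into a small calculator. This is a sketch under stated assumptions, not the paper's implementation: it assumes the energy of one period is (computation-block power × C/fc) + (memory-fetch-block power × M/fm) + (I + R) × idle time, with each block's power following the k·f·V² + R form of the Power Model slide. The supply voltages and the task parameters below are hypothetical; only the six coefficients come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    k_ca: float = 0.505     # CPU active capacitance (nF), from the table
    k_cs: float = 0.224     # CPU standby capacitance (nF)
    k_ma: float = 0.540     # memory active capacitance (nF)
    k_ms: float = 0.210     # memory standby capacitance (nF)
    idle: float = 6.570     # I: idle power (mW)
    static: float = 67.434  # R: static power (mW)


def energy_mj(c_mcycles, m_mcycles, f_c_mhz, f_m_mhz, v_c, v_m, period_s,
              k=Coefficients()):
    """Energy of one task period in mJ (assumed equation form, see lead-in)."""
    t_comp = c_mcycles / f_c_mhz            # Mcycles / MHz = seconds
    t_mem = m_mcycles / f_m_mhz
    e = t_comp + t_mem
    if e > period_s:
        raise ValueError("task does not fit in its period")
    # nF * MHz * V^2 = mW, so block powers are in mW and energies in mJ.
    p_comp = k.k_ca * v_c**2 * f_c_mhz + k.k_ms * v_m**2 * f_m_mhz + k.static
    p_mem = k.k_cs * v_c**2 * f_c_mhz + k.k_ma * v_m**2 * f_m_mhz + k.static
    p_idle = k.idle + k.static
    return p_comp * t_comp + p_mem * t_mem + p_idle * (period_s - e)


# Hypothetical task and voltages: 50 Mcycles compute, 100 Mcycles memory, 2 s period.
for f_c, v_c in ((200, 1.2), (100, 1.0)):
    print(f_c, "MHz:", round(energy_mj(50, 100, f_c, 100, v_c, 1.0, 2.0), 1), "mJ")
```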

Static MultiDVFS Problem

• Given a set of periodic real-time tasks (T1, …, Tn), where each task invocation requires at most Ci CPU cycles and at most Mi memory cycles in the worst case.

• Find the energy optimal static frequencies for multiple DVFS capable components (CPU and memory)

45

Problem Formulation

Minimize

$\sum_{i=1}^{n} \frac{H}{P_i}\left(E_{comp,i} + E_{mem,i}\right) + E_{idle}$

Subject to

$\sum_{i=1}^{n} \frac{e_i}{P_i} \le 1$

where

H : hyper period

$P_i$ : period of task i

$e_i$ : execution time of task i

$E_{comp,i}$ : computation block energy of task i

$E_{mem,i}$ : cache stall block energy of task i

$E_{idle}$ : idle block energy

46
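With only a handful of discrete operating points, the formulation can be solved by exhaustive search. The sketch below is not the paper's algorithm; it enumerates every CPU/memory frequency pair, discards those whose utilization exceeds 1, and keeps the pair with the lowest energy per hyperperiod. The operating points, voltages, and task set are hypothetical, and the block-power form (k·f·V² terms plus R) is the same assumption as in the power model; only the six coefficients are taken from the slides.

```python
from functools import reduce
from itertools import product
from math import gcd

# Hypothetical operating points: (frequency in MHz, supply voltage in V).
CPU_POINTS = [(100, 1.0), (150, 1.1), (200, 1.2)]
MEM_POINTS = [(50, 1.0), (100, 1.0)]
K_CA, K_CS, K_MA, K_MS = 0.505, 0.224, 0.540, 0.210   # nF, from the slides' table
IDLE_MW, STATIC_MW = 6.570, 67.434                     # I and R in mW


def block_power(k_cpu, k_mem, cpu, mem):
    """Power (mW) of one block: CPU term + memory term + static power."""
    (f_c, v_c), (f_m, v_m) = cpu, mem
    return k_cpu * v_c**2 * f_c + k_mem * v_m**2 * f_m + STATIC_MW


def evaluate(tasks, cpu, mem):
    """Return (utilization, energy per hyperperiod in mJ) for one setting.

    tasks: list of (C in Mcycles, M in Mcycles, period in integer ms).
    """
    h_ms = reduce(lambda a, b: a * b // gcd(a, b), (p for _, _, p in tasks))
    p_comp = block_power(K_CA, K_MS, cpu, mem)   # CPU active, memory standby
    p_mem = block_power(K_CS, K_MA, cpu, mem)    # CPU standby, memory active
    util = busy_s = energy = 0.0
    for c, m, period_ms in tasks:
        t_comp, t_mem = c / cpu[0], m / mem[0]   # Mcycles / MHz = seconds
        jobs = h_ms // period_ms                 # H / P_i invocations
        util += (t_comp + t_mem) / (period_ms / 1000.0)
        busy_s += jobs * (t_comp + t_mem)
        energy += jobs * (p_comp * t_comp + p_mem * t_mem)
    energy += (IDLE_MW + STATIC_MW) * (h_ms / 1000.0 - busy_s)   # idle block
    return util, energy


def static_multidvfs(tasks):
    """Exhaustive search; returns (energy, cpu_point, mem_point) or None."""
    best = None
    for cpu, mem in product(CPU_POINTS, MEM_POINTS):
        util, energy = evaluate(tasks, cpu, mem)
        if util <= 1.0 and (best is None or energy < best[0]):
            best = (energy, cpu, mem)
    return best


tasks = [(10, 5, 500), (20, 40, 2000)]   # hypothetical task set
print(static_multidvfs(tasks))
```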

Energy vs. Utilization

47

[Figure: normalized average power consumption (0.5 to 1) vs. utilization (0.1 to 0.9) for MAX, CPU-only, and Static MultiDVFS. Task set cache stall ratio (MH/(CH+MH)): 0.3]

Summary

• Memory-aware time/energy model

– Consider CPU and memory frequencies/voltages

– Validated on a real hardware platform

• MultiDVFS

– Joint optimization of CPU and memory frequencies/voltages

– Minimize energy consumption of periodic real-time tasks

48

Recap: First Attempt

• 1000 samples (minus the first sample. Why?)

49

CFS (nice=0)

Mean 23.8

Max 47.9

99pct 47.4

Min 20.7

Median 20.9

Stdev. 7.7

Why?
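The statistics in the table can be reproduced from raw samples with a few lines. This is a sketch only: the sample values are made up, the 99th percentile uses the nearest-rank definition (the slides do not say which definition was used), and the first sample is dropped as on the slide.

```python
import math
import statistics


def summarize(samples, skip_first=True):
    """Summary statistics in the same layout as the slide's table."""
    data = sorted(samples[1:] if skip_first else samples)
    rank99 = math.ceil(0.99 * len(data))      # nearest-rank percentile
    return {
        "Mean": statistics.mean(data),
        "Max": data[-1],
        "99pct": data[rank99 - 1],
        "Min": data[0],
        "Median": statistics.median(data),
        "Stdev.": statistics.stdev(data),     # sample standard deviation
    }


# Hypothetical measurements; the first value is the discarded sample.
samples = [95.0, 20.7, 20.9, 20.8, 47.9, 21.0, 20.9]
for name, value in summarize(samples).items():
    print(f"{name:8s}{value:6.1f}")
```

A median close to the minimum together with a much larger maximum, as in the table, indicates a distribution where most runs are fast and a few are slow.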

Recap: DVFS

• Dynamic voltage and frequency scaling (DVFS)

• Lower frequency/voltage saves power

• Vary clock speed depending on the load

• Causes timing variations

• Disabling DVFS

50

# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

# echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor

# echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
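The same setting can be applied to however many cores a board has without listing them by hand. A minimal sketch, assuming the standard Linux cpufreq sysfs layout used in the commands above; the function name `set_governor` and its `cpu_root` parameter are introduced here for illustration, and writing to the real sysfs files requires root.

```python
import glob
import os


def set_governor(governor="performance", cpu_root="/sys/devices/system/cpu"):
    """Write `governor` to every cpuN/cpufreq/scaling_governor under cpu_root.

    Returns the list of files that were written.
    """
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        changed.append(path)
    return changed


# Example (run as root on a Linux system with cpufreq enabled):
#   set_governor("performance")
```

The `cpu_root` parameter exists only so the function can be tried against a scratch directory instead of the live system.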

Recap: Energy Saving Strategies

• Model-based approach

– Offline: build an energy/performance model

– Online: compute an “optimal” assignment

• Heuristic approach

– Race to idle

– Never idle

– Adaptive control

51

POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints

Connor Imes, David H. K. Kim, Martina Maggio, and Henry Hoffmann

University of Chicago and Lund University

IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015

52

Systems

53

Configurations

• Per-application/per-platform, off-line profiling

54

Platform Variation

55

Connor Imes, David H. K. Kim, Martina Maggio, and Henry Hoffmann, “POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints,” RTAS, 2015

POET Approach

• Control theory based

– (1) observe error (2) compute control (3) apply control

56


Controller

• Goal: meet the speed target

• Observe error

• Compute control signal
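The observe/compute loop can be illustrated with a generic integral controller. This is a sketch of the idea only and is not claimed to be POET's exact control law: it assumes the control signal is a speedup relative to a base rate, and that the measured rate equals the true base rate times the applied speedup. All names and numbers are hypothetical.

```python
def control_step(speedup, target_rate, measured_rate, base_rate_estimate,
                 min_speedup=1.0, max_speedup=8.0):
    """One iteration of an integral controller on the speedup signal."""
    error = target_rate - measured_rate                 # observe error
    speedup = speedup + error / base_rate_estimate      # compute control signal
    return min(max_speedup, max(min_speedup, speedup))  # clamp to achievable range


def simulate(target_rate, true_base_rate, base_rate_estimate, steps=20):
    """Toy system: measured rate = true base rate x applied speedup."""
    speedup, history = 1.0, []
    for _ in range(steps):
        measured = true_base_rate * speedup             # apply control, then measure
        history.append(measured)
        speedup = control_step(speedup, target_rate, measured, base_rate_estimate)
    return history


# The controller's base-rate estimate (12.5) is deliberately wrong (true value 10).
history = simulate(target_rate=25.0, true_base_rate=10.0, base_rate_estimate=12.5)
print([round(r, 2) for r in history[:6]])
```

Even with an inaccurate base-rate estimate, the feedback loop drives the measured rate to the target; with an exact estimate it gets there in one step.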

57

Optimizer

• Given

– C configurations,

– measured speed s(t),

– time window tau

• Goal: minimize energy

– Subject to

• Meeting performance (number of jobs in a given time window tau)

• Sum of time spent on each setting = tau
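The optimizer's problem is a small linear program: choose how long to spend in each configuration so that the work done matches the target and the times add up to tau. With two equality constraints, an optimal solution needs at most two configurations, so a sketch can simply enumerate pairs. This is an illustration rather than POET's code, and the configuration table (speedup, power in W) is hypothetical.

```python
def schedule(configs, target_speedup, tau):
    """Minimum-energy time split over at most two configurations.

    configs: list of (speedup, power_w). Returns (energy_j, [(index, seconds), ...])
    or None if no pair of configurations can meet the target.
    """
    best = None
    for i, (s_lo, p_lo) in enumerate(configs):
        for j, (s_hi, p_hi) in enumerate(configs):
            if not (s_lo <= target_speedup <= s_hi):
                continue
            if s_hi == s_lo:
                t_hi = 0.0
            else:
                # Solve s_lo*t_lo + s_hi*t_hi = target*tau with t_lo + t_hi = tau.
                t_hi = tau * (target_speedup - s_lo) / (s_hi - s_lo)
            t_lo = tau - t_hi
            energy = p_lo * t_lo + p_hi * t_hi
            if best is None or energy < best[0]:
                best = (energy, [(i, t_lo), (j, t_hi)])
    return best


# Hypothetical configurations: (speedup relative to slowest, power in W).
configs = [(1.0, 1.0), (2.0, 2.5), (4.0, 4.0)]
print(schedule(configs, target_speedup=3.0, tau=1.0))
```

For this made-up table, mixing the slowest and fastest settings costs less energy than mixing the two fastest, which is the kind of platform-dependent outcome that makes a fixed heuristic unreliable.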

58

Example Usage

• Apply to periodic tasks

– One control per job

• Heartbeat API

– Measure rate

59


Results

60


Summary

• Power/energy/speed relationship

– Model vs. practice

• Control options

– DVFS

– On/off

– Core heterogeneity

• Management approaches

– Model based

– Heuristic based

– Control theory based

61
