Post on 21-Dec-2015
System-Wide Energy Minimization for Real-Time Tasks: Lower Bound and Approximation
Xiliang Zhong and Cheng-Zhong Xu
Dept. of Electrical & Computer Engg., Wayne State University
Detroit, Michigan
http://www.cic.eng.wayne.edu
Outline
• Introduction
  - Processor and system energy model
  - Related work
• System-wide energy optimization for periodic tasks
  - The optimal algorithm
  - A fully polynomial time approximation scheme
  - Performance evaluation
• System-wide energy optimization for sporadic tasks
  - Solution and evaluation
• Conclusions
Introduction
• Mobile/embedded devices are power-critical, with limited battery capacity
• Software-assisted power management
  - Dynamic power management (DPM): shut a resource down after a timeout
  - Dynamic voltage/frequency scaling (DVS): the processing speed is designed for peak performance, so the processor voltage/speed is lowered when the processor is not fully utilized
Dynamic voltage scaling (DVS)
• The dynamic CPU power is P ∝ v²f
• Reducing v also reduces the maximum processor frequency
• Approximately, energy per cycle ∝ f² (energy per cycle is P/f ∝ v², and v scales roughly linearly with f)
• Processor slowdown therefore yields super-linear energy savings at the cost of only a linear increase in execution time
[Figure: Energy per cycle of PXA processor vs. normalized CPU speed (0.2 to 1), DVS vs. No-DVS]
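The scaling argument can be checked numerically. The sketch below assumes an idealized model (not the measured PXA curve): a task of `cycles` cycles at normalized speed `speed` takes time proportional to `cycles / speed` and consumes energy proportional to `cycles * speed**2`. The function name `dvs_cost` and the constant `k` are hypothetical.

```python
def dvs_cost(cycles: float, speed: float, k: float = 1.0) -> tuple[float, float]:
    """Return (energy, time) for running `cycles` cycles at normalized `speed`.

    Idealized model (an assumption, not PXA measurements): energy per cycle
    is k * speed**2, and execution time is cycles / speed (arbitrary units).
    """
    if not 0.0 < speed <= 1.0:
        raise ValueError("normalized speed must be in (0, 1]")
    energy = k * cycles * speed ** 2
    time = cycles / speed
    return energy, time


full_energy, full_time = dvs_cost(1e6, 1.0)
half_energy, half_time = dvs_cost(1e6, 0.5)
# Halving the speed cuts energy to a quarter while only doubling the time.
print(half_energy / full_energy, half_time / full_time)  # 0.25 2.0
```

This CPU-only model ignores leakage and the standby power of other components, which is exactly what makes very low speeds unattractive in the system-wide setting.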
System-Wide Energy
• The processor also has leakage power
• Applications may use other components, such as memory and peripheral devices, which can be in active, standby, sleep, or shutdown states
• The system-wide energy consumed in running a task includes the CPU energy plus the standby and active energy of the resources it uses
• Lowering the CPU frequency can increase overall energy expenditure because it prolongs the standby time of other components
System-Wide Energy (cont.)
• Critical speed: the speed with the minimum energy per cycle; running below it is not energy efficient
• Execute a task at a speed no lower than its critical speed, then put the devices into a low-power state: a combined use of slowdown and shutdown
[Figure: Energy per cycle of PXA processor with different standby power; energy per cycle (×10⁻⁹) vs. normalized speed (0.1 to 1) for processor only and standby power 0.2 W, 0.6 W, and 1.2 W]
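A minimal sketch of how a critical speed arises, assuming a simplified energy-per-cycle model e(s) = a·s² + b/s (not the measured PXA data): the a·s² term is dynamic CPU energy, and b/s is standby or leakage power integrated over a cycle whose duration grows as 1/s. The coefficients `a`, `b` and all function names are hypothetical; only the five speed levels come from the example system in these slides.

```python
def energy_per_cycle(s: float, a: float, b: float) -> float:
    """Simplified system energy per cycle at normalized speed s (assumed model).

    a * s**2 : dynamic CPU energy per cycle (grows with speed)
    b / s    : standby/leakage power integrated over one cycle time (~ 1/s)
    """
    return a * s * s + b / s


def critical_speed_continuous(a: float, b: float) -> float:
    # d/ds (a*s**2 + b/s) = 2*a*s - b/s**2 = 0  gives  s* = (b / (2a))**(1/3)
    return (b / (2.0 * a)) ** (1.0 / 3.0)


def critical_speed_discrete(levels: list[float], a: float, b: float) -> float:
    # With discrete levels, pick the level with the lowest energy per cycle.
    return min(levels, key=lambda s: energy_per_cycle(s, a, b))


PXA_LEVELS = [0.15, 0.4, 0.6, 0.8, 1.0]  # normalized speeds of the example system
print(round(critical_speed_continuous(1.0, 0.25), 3))  # 0.5
print(critical_speed_discrete(PXA_LEVELS, 1.0, 0.25))  # 0.6
```

Raising the standby term moves the minimum to the right (for example, b = 1.0 gives s* ≈ 0.79), which matches the trend of the higher-standby-power curves in the figure.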
Related Work
• CPU energy minimization for periodic tasks: heuristics [Mejia-Alvarez’04], approximations [Chen and Kuo’05]
• Few studies on system-wide energy minimization
  - Applications without deadlines, subject to a performance loss [Choi et al.’04]
  - Real-time periodic tasks on CPUs with continuous speed levels: heuristics [Zhuo and Chakrabarti’05]
  - Real-time periodic tasks on CPUs with discrete speed levels: heuristics [Jejurikar and Gupta’04]
• This work
  - A pseudo-polynomial algorithm for optimal solutions and polynomial-time approximation schemes
  - Applicable to both offline periodic tasks and online sporadic tasks on processors with practical discrete speed levels
System-wide energy optimization
• Periodic tasks (offline): each task has a worst-case execution time under the maximum speed, a period that is also its deadline, and a normalized speed to be chosen
• Sporadic tasks (online): task releases have irregular intervals; scheduling is done online over the uncompleted tasks, with no assumption about future task releases
• The objective is to minimize the overall energy consumption, including the CPU and all other system components, while meeting the deadline constraints of all tasks
Energy Minimization for Periodic Tasks
• Minimize the energy consumption of n periodic tasks in a hyper-period, subject to
  - a feasibility constraint under EDF (the total utilization at the chosen speeds must not exceed 1)
  - a boundary constraint: practical processors have discrete speed levels
• This minimization is an NP-hard Multiple-Choice Knapsack Problem (MCKP)
• Pseudo-polynomial solutions exist for MCKP with integer coefficients, but they are not applicable to this problem
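One way to write this formulation, using symbols assumed here rather than taken from the slides: task i has worst-case execution time c_i at the maximum speed, period p_i (equal to its deadline), normalized speed s_i drawn from the set S of discrete levels, and energy E_i(s_i) per instance; H is the hyper-period.

```latex
\min_{s_1,\dots,s_n} \; \sum_{i=1}^{n} \frac{H}{p_i}\, E_i(s_i)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \frac{c_i}{p_i\, s_i} \le 1,
\qquad s_i \in S, \; i = 1,\dots,n
```

Read as an MCKP, each task is a class, each speed level is an item with weight c_i/(p_i s) and cost (H/p_i)E_i(s), and exactly one item is chosen per class.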
An Example
• Basic idea: first solve subproblems with fewer tasks
• A system with a PXA processor with 5 normalized speeds [0.15 0.4 0.6 0.8 1]
• The system also has memory, flash, and a WNIC
• An example real-time workload with 4 periodic tasks:

Task | Execution time | Period | Utilization | Required resources | Critical speed
1    | 6.4            | 16     | 0.4         | cpu                | 0.4
2    | 1.6            | 20     | 0.08        | cpu, memory        | 0.4
3    | 1.2            | 12     | 0.1         | cpu, mem, flash    | 0.6
4    | 1.08           | 9      | 0.12        | cpu, mem, WNIC     | 0.6
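The candidate states of the search can be derived directly from the table. This sketch assumes a task at normalized speed s has utilization (execution time)/(period·s), and it keeps only levels at or above each task's critical speed, as the example does when it branches task 1 on [0.4 0.6 0.8 1]. The names `TASKS`, `SPEEDS`, and `speed_options` are illustrative.

```python
import math

# Table data: task -> (execution time at max speed, period, critical speed)
TASKS = {
    1: (6.4, 16, 0.4),
    2: (1.6, 20, 0.4),
    3: (1.2, 12, 0.6),
    4: (1.08, 9, 0.6),
}
SPEEDS = [0.15, 0.4, 0.6, 0.8, 1.0]  # normalized PXA speed levels


def speed_options(exec_time, period, crit_speed, speeds=SPEEDS):
    """Utilization exec_time / (period * s) for each level s >= the critical speed."""
    return {s: exec_time / (period * s) for s in speeds if s >= crit_speed}


options = {tid: speed_options(*params) for tid, params in TASKS.items()}
for tid, opts in options.items():
    print(tid, {s: round(u, 3) for s, u in opts.items()})
print("speed combinations:", math.prod(len(o) for o in options.values()))
```

Task 1 yields utilizations 1, 0.667, 0.5, and 0.4, matching the first level of the search tree, and the four tasks together have 4 × 4 × 3 × 3 = 144 speed combinations for exhaustive search.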
Solution to task 1
• Task 1: execution time 6.4, deadline 16, utilization 0.4
• Branch on the four normalized speeds [0.4 0.6 0.8 1]
[Figure: first level of the search tree, nodes labeled (utilization, energy): root (0, 0); task 1 at speeds 0.4, 0.6, 0.8, 1 gives (1, 2.72), (0.667, 4.267), (0.5, 7.2), (0.4, 10.24). f: pruned by feasibility condition (first node); e: pruned by energy condition (third and fourth nodes)]
State pruning
• Feasibility condition: the first node (speed 0.4) is removed because its utilization is already 1
• Energy condition: running task 1 at its smallest feasible speed (the 2nd, 0.6) and tasks 2-4 at the maximum speed gives an upper bound on the total energy; tasks 2-4 contribute 7.6 of it
• With task 1 at the 3rd or 4th speed (0.8 or 1), even running tasks 2-4 at their minimum-energy speeds requires more energy than this bound, so the two states can be removed
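The two pruning tests can be written as predicates on a partial state (utilization, energy). This is a sketch: the feasibility test here uses a lookahead (remaining tasks at maximum speed, i.e. their smallest utilization), which is one way to implement the condition; the function names are hypothetical. The energy test needs the remaining tasks' minimum energies and an upper bound from a known feasible solution; the values used with it in the check are hypothetical.

```python
def prune_by_feasibility(util: float, remaining_min_util: float) -> bool:
    """True if the state cannot lead to an EDF-feasible schedule: even with all
    remaining tasks at maximum speed (their smallest utilization), the total
    utilization would exceed 1."""
    return util + remaining_min_util > 1.0


def prune_by_energy(energy: float, remaining_min_energy: float, upper_bound: float) -> bool:
    """True if even the cheapest completion of the state (remaining tasks at
    their minimum-energy speeds) costs more than a known feasible solution."""
    return energy + remaining_min_energy > upper_bound


# Task 1 states (utilization, energy) at speeds 0.4, 0.6, 0.8, 1 from the example.
task1_states = {0.4: (1.0, 2.72), 0.6: (0.667, 4.267), 0.8: (0.5, 7.2), 1.0: (0.4, 10.24)}
# Tasks 2-4 at maximum speed use their table utilizations 0.08, 0.1, and 0.12.
remaining_min_util = 0.08 + 0.1 + 0.12

for speed, (util, _energy) in task1_states.items():
    verdict = "pruned (f)" if prune_by_feasibility(util, remaining_min_util) else "kept"
    print(speed, verdict)
```

Only the speed-0.4 state fails the feasibility test, in line with the slide.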
Solution to the first three tasks
[Figure: search tree for tasks 1-3, nodes labeled (utilization, energy). Task 1: (1, 2.72), (0.667, 4.267), (0.5, 7.2), (0.4, 10.24); task 2: (0.867, 5.75), (0.767, 6.467), (0.747, 7.147); task 3: (0.93, 8.47), (0.87, 9.40), (0.867, 9.107), (0.847, 9.786). f: pruned by feasibility condition; e: pruned by energy condition; d: pruned by dominance]
Dominance condition
• Compare the states (0.867, 9.107) and (0.87, 9.4) of task 3
• The first has a smaller utilization, so any completion that is feasible from the second is also feasible from the first
• The first also uses less energy, so the second can be removed
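The dominance condition amounts to keeping only the Pareto-optimal (utilization, energy) states. A minimal sketch (the name `remove_dominated` is illustrative): after sorting by utilization, a state survives only if its energy is strictly lower than that of every state with smaller or equal utilization.

```python
def remove_dominated(states):
    """Keep only Pareto-optimal (utilization, energy) states.

    A state is dominated if another state has utilization <= and energy <=,
    so every feasible completion of it is also feasible, and no more costly,
    from the dominating state.
    """
    kept = []
    # Sorted by utilization (then energy), kept energies strictly decrease, so
    # comparing with the last kept state compares with the best one so far.
    for u, e in sorted(states):
        if not kept or e < kept[-1][1]:
            kept.append((u, e))
    return kept


task3_states = [(0.93, 8.47), (0.87, 9.40), (0.867, 9.107), (0.847, 9.786)]
print(remove_dominated(task3_states))
# [(0.847, 9.786), (0.867, 9.107), (0.93, 8.47)]
```

As on the slide, (0.87, 9.40) is dropped because (0.867, 9.107) is better in both coordinates.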
[Figure: complete search tree for tasks 1-4, nodes labeled (utilization, energy); task 1-3 levels as in the previous tree, task 4 states (1.07, 10.37), (0.987, 11.159), and (0.967, 11.84). f: pruned by feasibility condition; e: pruned by energy condition; d: pruned by dominance. The optimal state is (0.987, 11.159)]
• The maximum number of states is reduced to 6/(4×4×3×3) = 6/144 ≈ 4.2% of all speed combinations
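Putting the pieces together, the search extends states one task at a time, discarding infeasible and dominated states. This is a sketch, not the authors' implementation: it applies the feasibility lookahead and dominance but omits the energy-bound pruning for brevity, and it is checked against exhaustive enumeration on a small hypothetical instance (`EXAMPLE` is made-up data, not from the slides).

```python
import itertools


def optimal_speeds(tasks):
    """Exact DP over tasks with feasibility (lookahead) pruning and dominance.

    tasks: one list per task of (utilization, energy) options, one per speed.
    Returns (energy, utilization, choices) or None if nothing is EDF-feasible.
    """
    n = len(tasks)
    # suffix_min_u[i]: smallest utilization tasks i..n-1 can still add.
    suffix_min_u = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min_u[i] = suffix_min_u[i + 1] + min(u for u, _ in tasks[i])

    states = [(0.0, 0.0, ())]  # (utilization, energy, chosen option indices)
    for i, opts in enumerate(tasks):
        grown = []
        for u, e, ch in states:
            for j, (uj, ej) in enumerate(opts):
                nu = u + uj
                # Feasibility condition: leave room for the remaining tasks.
                if nu + suffix_min_u[i + 1] <= 1.0:
                    grown.append((nu, e + ej, ch + (j,)))
        # Dominance condition: after sorting by utilization, keep a state only
        # if it is strictly cheaper than every state with lower utilization.
        grown.sort()
        states = []
        for state in grown:
            if not states or state[1] < states[-1][1]:
                states.append(state)
    if not states:
        return None
    u, e, ch = min(states, key=lambda s: s[1])
    return e, u, ch


def brute_force(tasks):
    """Enumerate every speed combination (for checking small instances only)."""
    best = None
    for combo in itertools.product(*(range(len(opts)) for opts in tasks)):
        u = sum(tasks[i][j][0] for i, j in enumerate(combo))
        e = sum(tasks[i][j][1] for i, j in enumerate(combo))
        if u <= 1.0 and (best is None or e < best[0]):
            best = (e, u, combo)
    return best


# Hypothetical 3-task instance: (utilization, energy) per speed level.
EXAMPLE = [
    [(0.52, 2.0), (0.36, 3.0), (0.27, 4.5)],
    [(0.41, 1.5), (0.31, 2.2), (0.21, 3.1)],
    [(0.34, 1.0), (0.22, 1.6), (0.16, 2.4)],
]
print(optimal_speeds(EXAMPLE))
print(brute_force(EXAMPLE))
```

Both calls should select option indices (1, 0, 1) with energy about 6.1; the DP reaches it while keeping only the non-dominated states at each level.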
A fully polynomial approximation scheme (FPTAS)
• The number of states is pseudo-polynomial in the number of tasks; it can be reduced by accepting approximate solutions
• Approximation with a worst-case performance guarantee: an algorithm is said to be an approximation scheme if, for a given ε in (0, 1), its solution energy E satisfies E ≤ (1 + ε)E*, where E* is the optimal energy
• A more desirable approximation scheme (FPTAS) has a running time polynomial in both the number of tasks and 1/ε
A fully polynomial approximation scheme (cont.)
• Divide the energy values into groups, each of size r: each value is scaled by 1/r and rounded to an integer, and energy values in the same group are treated as equal
• Find the group size r that satisfies a given performance bound ε
  - Rounding the energy value of each task introduces an error no larger than the group size r, so the accumulated error over n tasks is no larger than n·r
  - A lower bound on E* is obtained when all tasks run at their critical speeds (Emin), i.e., E* ≥ Emin
  - Solving n·r ≤ ε·Emin ≤ ε·E* gives the group size r = ε·Emin/n
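A sketch of one rounding-based scheme in this spirit, under stated assumptions: all energies are positive, Emin is the sum of each task's minimum energy, and r = ε·Emin/n, so the n rounding errors (each below r) add up to less than ε·Emin ≤ ε·E*. States are keyed by rounded energy, and only the lowest-utilization state per group is kept. The names `fptas_speeds`, `brute_force_energy`, and the `EXAMPLE` data are hypothetical.

```python
import itertools
import math


def fptas_speeds(tasks, eps):
    """Rounding-based approximation sketch for choosing one speed per task.

    tasks: one list per task of (utilization, energy) options, energies > 0.
    Returns (energy, utilization, choices) with energy <= (1 + eps) * optimum,
    or None if no choice keeps the EDF utilization at or below 1.
    """
    n = len(tasks)
    e_min = sum(min(e for _, e in opts) for opts in tasks)  # lower bound: E* >= Emin
    r = eps * e_min / n  # group size, so n rounding errors stay below eps * Emin
    # Rounded-energy group -> (smallest utilization, true energy, choices).
    states = {0: (0.0, 0.0, ())}
    for opts in tasks:
        nxt = {}
        for group, (u, e, ch) in states.items():
            for j, (uj, ej) in enumerate(opts):
                nu = u + uj
                if nu > 1.0:  # utilizations only grow, so this branch is dead
                    continue
                ng = group + math.floor(ej / r)
                if ng not in nxt or nu < nxt[ng][0]:
                    nxt[ng] = (nu, e + ej, ch + (j,))
        states = nxt
    if not states:
        return None
    u, e, ch = states[min(states)]  # lowest rounded energy among feasible states
    return e, u, ch


def brute_force_energy(tasks):
    """Exact optimum by enumeration (small instances only)."""
    best = math.inf
    for combo in itertools.product(*tasks):
        if sum(u for u, _ in combo) <= 1.0:
            best = min(best, sum(e for _, e in combo))
    return best


EXAMPLE = [  # hypothetical (utilization, energy) options per task
    [(0.52, 2.0), (0.36, 3.0), (0.27, 4.5)],
    [(0.41, 1.5), (0.31, 2.2), (0.21, 3.1)],
    [(0.34, 1.0), (0.22, 1.6), (0.16, 2.4)],
]
opt = brute_force_energy(EXAMPLE)
for eps in (0.5, 0.1, 0.01):
    energy, util, choices = fptas_speeds(EXAMPLE, eps)
    print(eps, round(energy, 3), choices, energy <= (1 + eps) * opt)
```

The state count depends on the number of energy groups rather than on the raw real-valued energies; a smaller ε means finer groups, more states, and a tighter bound.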
Performance Evaluation: Simulation Settings
• A system with a PXA processor and
  - memory: standby power 0.2 W, standby time 20%~60% of task execution time
  - flash drive: 0.4 W and 10%~25%
  - wireless interface: 1 W and 5%~20%
• Periodic tasks: randomly generated deadlines, with utilization from 0.1 to 1; each task randomly chooses a subset of resources
• Algorithms implemented
  - CPU-DVS: speed control for CPU energy consumption
  - CS-DVS: a heuristic algorithm for system-wide energy savings [Jejurikar and Gupta ISLPED2004]
  - OPT-P: the proposed optimal solution
  - Approximation schemes with performance bounds 0.01, 0.1, and 0.5
Performance Evaluation (Periodic tasks)
[Figure: energy consumption (1 to 1.4) vs. processor utilization (0.1 to 1) for CS-DVS, 0.5-APPROX, 0.1-APPROX, OPT-P, and CPU-DVS; annotated gaps of 16%, 23%, and 8%]
• Energy consumption up to 16% more efficient than CS-DVS
• The proposed algorithms use 23% less energy than CPU-only solutions
• The approximation algorithms effectively bound the performance errors
Energy Minimization for Sporadic Tasks
• Online energy minimization over all uncompleted tasks, subject to
  - n feasibility constraints under EDF
  - a boundary constraint: the processor has discrete speed levels
• We prove the problem is an instance of the multi-dimensional MCKP, which is NP-hard in the strong sense (no pseudo-polynomial algorithm exists for the general problem unless P = NP)
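One way to write the online problem, with symbols assumed here rather than taken from the slides: at the current time t, the n uncompleted jobs are indexed in deadline order d_1 ≤ … ≤ d_n, job k has remaining worst-case execution time c_k at the maximum speed, speed s_k ∈ S, and energy E_k(s_k). Under EDF, job j finishes after jobs 1 through j, which gives one constraint per job.

```latex
\min_{s_1,\dots,s_n} \; \sum_{k=1}^{n} E_k(s_k)
\quad \text{s.t.} \quad
t + \sum_{k=1}^{j} \frac{c_k}{s_k} \le d_j, \quad j = 1,\dots,n,
\qquad s_k \in S
```

Each constraint is one dimension of the multi-dimensional MCKP, and constraint j involves only jobs 1 through j, which is the structure the J1-J3 example exploits.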
Sporadic Tasks (cont.)
[Figure: timeline with jobs J1, J2, and J3 released at time 0 with deadlines 3, 5, and 7]
• Consider three tasks released at time 0 with deadlines 3, 5, and 7
• The feasibility of a task (e.g., J2) is not affected by tasks that finish later (tasks run in non-decreasing order of deadlines)
• Hence one constraint (e.g., that of J3) can be satisfied at each iteration
• The problem can therefore be solved by a pseudo-polynomial algorithm for the optimal solution and by an approximation scheme (FPTAS)
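The "one constraint per iteration" idea can be sketched as a DP that adds jobs in deadline order and checks only the newest job's deadline, keeping non-dominated (finish time, energy) states. Everything here is an illustrative assumption: the energy model in `options` (dynamic work·s² plus standby power 0.3 over the run time), the work amounts, and all function names. Only the deadlines 3, 5, 7 follow the example.

```python
import itertools


def options(work, speeds=(0.4, 0.6, 0.8, 1.0), standby=0.3):
    """Hypothetical (execution time, energy) per speed for `work` units at max speed:
    dynamic energy work * s**2 plus standby power over the run time work / s."""
    return [(work / s, work * s * s + standby * work / s) for s in speeds]


def online_optimal(now, jobs):
    """Exact DP for uncompleted jobs, processed in EDF (deadline) order.

    jobs: list of (deadline, [(exec_time, energy), ...]).
    Returns (energy, choices in deadline order) or None if some deadline is missed.
    """
    jobs = sorted(jobs, key=lambda job: job[0])
    states = [(now, 0.0, ())]  # (finish time so far, energy, choices)
    for deadline, opts in jobs:
        grown = []
        for t, e, ch in states:
            for j, (dt, de) in enumerate(opts):
                finish = t + dt
                # Only this job's constraint is checked here: jobs with later
                # deadlines run after it under EDF and cannot delay it.
                if finish <= deadline:
                    grown.append((finish, e + de, ch + (j,)))
        # Dominance: keep a state only if it is cheaper than every state
        # that finishes no later.
        grown.sort()
        states = []
        for state in grown:
            if not states or state[1] < states[-1][1]:
                states.append(state)
    if not states:
        return None
    _, e, ch = min(states, key=lambda s: s[1])
    return e, ch


def brute_force(now, jobs):
    """Check every speed combination against all n deadline constraints."""
    jobs = sorted(jobs, key=lambda job: job[0])
    best = None
    for combo in itertools.product(*(range(len(opts)) for _, opts in jobs)):
        t, e, ok = now, 0.0, True
        for (deadline, opts), j in zip(jobs, combo):
            t += opts[j][0]
            e += opts[j][1]
            ok = ok and t <= deadline
        if ok and (best is None or e < best[0]):
            best = (e, combo)
    return best


# Three jobs released at time 0 with deadlines 3, 5, 7 (hypothetical work amounts).
JOBS = [(3, options(1.5)), (5, options(1.2)), (7, options(1.6))]
print(online_optimal(0.0, JOBS))
print(brute_force(0.0, JOBS))
```

In this instance every job's individually cheapest level is 0.6, but running all three at 0.6 would finish J3 at about 7.17, past its deadline of 7, so the DP has to trade energy between jobs; its answer matches exhaustive search.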
Performance Evaluation (Sporadic tasks)
• Experimental settings: varied number of tasks; task inter-release times drawn from an exponential distribution
• Algorithms implemented
  - TV-DVS: adaptive speed scaling for CPU energy consumption on processors with continuous levels [Zhong and Xu RTSS2005]
  - DVSST: CPU energy consumption with only frequency scaling available (continuous levels) [Qadi et al. RTSS2003]
  - OPT-S: the proposed optimal solution
  - 0.1- and 0.5-approximation: approximate solutions with different performance settings
Energy consumption (Sporadic tasks)
[Figure: energy consumption (0.9 to 2.9) vs. number of tasks (10 to 100) for TV-DVS, 0.5-APPROX, OPT-S, and DVSST; annotated gaps of 56% and 23%]
• Small task number: energy consumption up to 56% more efficient than TV-DVS and DVSST
• Large task number: 23% more efficient
Conclusion
• System-wide energy minimization for periodic tasks
  - A pseudo-polynomial algorithm for the optimal solution
  - An approximate solution with moderate running time and bounded performance degradation (FPTAS)
• Minimization for online sporadic tasks
  - A pseudo-polynomial algorithm and an FPTAS, obtained by exploiting inherent properties of online task scheduling
• On-going work
  - Implementation of the policies in an embedded system with a PXA270 processor
  - Energy/time overhead of voltage and speed switches, and the overhead of putting a resource into a low-power state
Thank you!
Algorithm running time
• Running time measured on a Pentium 4 machine with a 2 GHz processor
• OPT-P has a higher complexity than CS-DVS
• Below 90 ms for systems with up to 50 tasks
• All approximation algorithms require no more than 0.4 s to finish
[Figure: algorithm running time (s, log scale from 0.01 to 10000) vs. number of tasks (0 to 100) for OPT-P, 0.01-APPROX, 0.1-APPROX, 0.5-APPROX, and CS-DVS]
[Figure: complexity in CPU time (s, 0 to 6) vs. number of tasks (10 to 100) for OPT-S, TV-DVS, 0.1-APPROX, and 0.5-APPROX]
• Algorithm running time for schedules in a 10-minute run
• OPT-S has a higher running time, but it stays below 1% of the task execution time
• The approximation algorithms take time comparable to TV-DVS