Feedback Performance Control of Distributed Computing Systems: A Real-time Perspective


Page 1: Feedback Performance Control of Distributed Computing Systems: A

- 1 -

http://www.artist-embedded.org/

ARTIST2 Summer School 2008 in Europe, Autrans (near Grenoble), France

September 8-12, 2008

Invited Speaker: Tarek F. Abdelzaher

Department of Computer Science

University of Illinois

Feedback Performance Control of Distributed Computing Systems:

A Real-time Perspective

Page 2: Feedback Performance Control of Distributed Computing Systems: A

- 2 -

Feedback Performance Control of Distributed Computing Systems

● Feedback control has been a great success story in performance management of engineering and physical artifacts.

● It is time to advance a branch of theory that addresses feedback control of distributed software systems!

2

Page 3: Feedback Performance Control of Distributed Computing Systems: A

- 3 -

Why Feedback Control of Software?

● Claim 1: In 10 years, most computing innovation will be focused on distributed systems that interact with the physical world

● Claim 2: They will operate under increased uncertainty

● Claim 3: They will be burdened with increased autonomy

3

Page 4: Feedback Performance Control of Distributed Computing Systems: A

- 4 -

Where is Computer Science Research Going?

The beginning: centralized machines (Languages, Operating Systems, Theory, Core Architecture)

4

Page 5: Feedback Performance Control of Distributed Computing Systems: A

- 5 -

Where is Computer Science Research Going?

Towards distribution: centralized machines, parallel machines, LANs (Languages, Operating Systems, Theory, Core Architecture)

Towards applications: Graphics, Databases, MANET, Security, High-performance Computing

5

Page 6: Feedback Performance Control of Distributed Computing Systems: A

- 6 -

Where is Computer Science Research Going?

Towards distribution: centralized machines, parallel machines, LANs (Languages, Operating Systems, Theory, Core Architecture)

Towards applications: Graphics, Databases, MANET, Security, High-performance Computing

Interdisciplinary application research: Grid Computing, Bio-informatics, Quantum Computing, Brain Interfaces, Internet, WWW, GIS, Infosphere

6

Page 7: Feedback Performance Control of Distributed Computing Systems: A

- 7 -

Where is Computer Science Research Going?

Towards distribution: centralized machines, parallel machines, LANs (Languages, Operating Systems, Theory, Core Architecture)

Towards applications: Graphics, Databases, MANET, Security, High-performance Computing

Interdisciplinary application research: Grid Computing, Bio-informatics, Quantum Computing, Brain Interfaces, Internet, WWW, GIS, Infosphere

Cyber-Physical Computing

7

Page 8: Feedback Performance Control of Distributed Computing Systems: A

- 8 -

Where is Computer Science Research Going?

Towards distribution: centralized machines, parallel machines, LANs (Languages, Operating Systems, Theory, Core Architecture)

Towards applications: Graphics, Databases, MANET, Security, High-performance Computing

Interdisciplinary application research: Grid Computing, Bio-informatics, Quantum Computing, Brain Interfaces, Internet, WWW, GIS, Infosphere

Cyber-Physical (Distributed) Computing

8

Page 9: Feedback Performance Control of Distributed Computing Systems: A

- 9 -

Where is Computer Science Research Going?

Cyber-Physical Distributed Systems

For example:

In the US, the President's Council of Advisors on Science and Technology named systems that interact with the physical world the

#1 Research Priority in the US

Claim 1: In 10 years, most computing innovation will be focused on distributed systems that interact with the physical world

9

Page 10: Feedback Performance Control of Distributed Computing Systems: A

- 10 -

Why Feedback Control of Software?

● Claim 1: In 10 years, most computing innovation will be focused on distributed systems that interact with the physical world

● Claim 2: They will operate under increased uncertainty

● Claim 3: They will be burdened with increased autonomy

10

Page 11: Feedback Performance Control of Distributed Computing Systems: A

- 11 -

The Mounting Uncertainty

● Larger (distributed) systems

● Higher connectivity and interactive complexity

● Increasingly data-centric nature – the time it takes to execute is driven by the data

– The worst case is too pessimistic or unbounded

● More complex sensing at higher levels of abstraction – data mining engines convert data to actionable information

● Increasingly hybrid nature – fusion of digital, analog, social, and biological models to understand overall behavior

11

Page 12: Feedback Performance Control of Distributed Computing Systems: A

- 12 -

Evidence by Example: The Real-Time Scheduling Theory Roadmap

Roadmap: 60s: Cyclic Executive → 70s: RMA → 80s: Spring → 90s: QoS → 2K: Control

Fully specified: (1) task set, (2) arrivals, (3) resources

Trend: relaxing workload assumptions!

12

Page 13: Feedback Performance Control of Distributed Computing Systems: A

- 13 -

Evidence by Example: The Real-Time Scheduling Theory Roadmap

Roadmap: 60s: Cyclic Executive → 70s: RMA → 80s: Spring → 90s: QoS → 2K: Control

Fully specified: (1) task set, (2) arrivals, (3) resources

Trend: relaxing workload assumptions!

Claim 2: A significant challenge will be one of providing guarantees under uncertainty

13

Page 14: Feedback Performance Control of Distributed Computing Systems: A

- 14 -

Why Feedback Control of Software?

● Claim 1: In 10 years, most computing innovation will be focused on distributed systems that interact with the physical world

● Claim 2: They will operate under increased uncertainty

● Claim 3: They will be burdened with increased autonomy

14

Page 15: Feedback Performance Control of Distributed Computing Systems: A

- 15 -

Factor #1: Device Proliferation (by Moore’s Law)

Embedded devices and applications: “Sensor Networks” (unattended multihop ad hoc wireless), RFIDs, industrial (cargo, machinery, factory floor, …), smart spaces, assisted living, medical

15

Page 16: Feedback Performance Control of Distributed Computing Systems: A

- 16 -

Factor #2: Integration at Scale (Isolation costs!)

Total Ship Computing Environment (TSCE)

High end: complex systems with global integration

Examples: Global Information Grid, Total Ship Computing Environment

Low end: ubiquitous embedded devices; large-scale networked embedded systems; seamless integration with a physical environment

Integration and scaling challenges, from low end to high end: World Wide Sensor Web (Feng Zhao), Future Combat System (Rob Gold), Global Information Grid

Picture courtesy of Patrick Lardieri

16

Page 17: Feedback Performance Control of Distributed Computing Systems: A

- 17 -

Factor #3: Biological Evolution

17

Page 18: Feedback Performance Control of Distributed Computing Systems: A

- 18 -

● It’s too slow!

– The exponential proliferation of data sources (afforded by Moore’s Law) is not matched by a corresponding increase in human ability to consume information!

Factor #3: Biological Evolution

18

Page 19: Feedback Performance Control of Distributed Computing Systems: A

- 19 -

Confluence of Trends: The Overarching Challenge

Trend 1: Device Proliferation (by Moore’s Law)

Trend 2: Integration at Scale (isolation has cost)

Trend 3: Humans are not getting faster

19

Page 20: Feedback Performance Control of Distributed Computing Systems: A

- 20 -

Confluence of Trends: The Overarching Challenge

Trend 1: Device Proliferation (by Moore’s Law)

Trend 2: Integration at Scale (isolation has cost)

Core Challenge: Autonomous Distributed Systems of Embedded Devices

Trend 3: Humans are not getting faster

20

Page 21: Feedback Performance Control of Distributed Computing Systems: A

- 21 -

Confluence of Trends: The Overarching Challenge

Trend 1: Device Proliferation (by Moore’s Law)

Trend 2: Integration at Scale (isolation has cost)

Core Challenge: Autonomous Distributed Systems of Embedded Devices – self-regulating, self-tuning, self-healing, self-…, and interacting with the physical world

Trend 3: Humans are not getting faster

21

Page 22: Feedback Performance Control of Distributed Computing Systems: A

- 22 -

Why Feedback Control of Software?

● Claim 1: In 10 years, most computing innovation will be focused on distributed systems that interact with the physical world

● Claim 2: They will operate under increased uncertainty

● Claim 3: They will be burdened with increased autonomy

22

Page 23: Feedback Performance Control of Distributed Computing Systems: A

- 23 -

Examples

● Distributed vehicular traffic control algorithms for mass (e.g., city-scale) evacuation in disaster scenarios

– Hundreds of thousands of vehicles

– Poor communication infrastructure (mostly vehicle to vehicle communication)

– Complex time-varying underlying topology (accidents, gridlock, obstructions, …)

– Need for personalized directions to balance load, optimize throughput, etc.

● Large energy-optimal data centers

– Tens of thousands of machines

– Hundreds of performance management knobs, including computing and cooling

– Time-varying demand on multiple time scales

– Need to minimize energy (cost) while meeting service-level agreements

● Utility optimizing wireless ad hoc networks

23

Page 24: Feedback Performance Control of Distributed Computing Systems: A

- 24 -

Feedback Performance Control of Distributed Software Systems

● Software performance optimization is a resource management problem

– Resource allocation and scheduling are adjusted dynamically in response to external stimuli to optimize an objective while meeting constraints

● A key challenge: bridge the levels of abstraction

– Software: tasks, priorities, deadlines, queues, arrival times, precedence constraints, …

– Control: state variables, deviations, dynamic models, stability conditions, …

● A key challenge: formulate and solve a control problem

24

Page 25: Feedback Performance Control of Distributed Computing Systems: A

- 25 -

Bridging the Levels of Abstraction: An Example

Diagram: computing tasks pass through a (task) flow valve; feedback control compares the actual utilization of completed work against the utilization bound.

Using feedback control to remove deadline misses while maximizing throughput:

• Compute an aggregate workload or utilization bound such that all tasks are schedulable if the bound is not exceeded.

• Control the aggregate workload not to exceed the bound (a form of level control).
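To make the level-control idea concrete, here is a minimal Python sketch (an illustration, not the controller from the talk): an integral controller samples the aggregate utilization each period and adjusts the admission rate of the task flow valve so that utilization settles at the bound. The hooks sample_utilization() and set_admission_rate() are hypothetical placeholders.

# Minimal sketch of utilization level control with an integral controller.
# sample_utilization() and set_admission_rate() are assumed, hypothetical hooks.
UTILIZATION_BOUND = 0.58   # set point (e.g., the aperiodic utilization bound)
KI = 0.5                   # integral gain, chosen for a stable, well-damped loop

def control_step(rate, sample_utilization, set_admission_rate):
    error = UTILIZATION_BOUND - sample_utilization()   # positive error: admit more work
    rate = max(0.0, rate + KI * error)                 # integral action on the valve
    set_admission_rate(rate)
    return rate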

25

Page 26: Feedback Performance Control of Distributed Computing Systems: A

- 26 -

26

Software Queues: A Unifying Construct in Software Performance Control

Networks of software queues tie together three bodies of theory: queueing theory (predict statistical properties), real-time scheduling theory (analyze timing as a function of queueing policy), and control theory (design feedback loops to control dynamic behavior).

Page 27: Feedback Performance Control of Distributed Computing Systems: A

- 27 -

Example: A Web Server Model

Web server model: input HTTP requests → IP packet queue → TCP listen queue → CPU ready queue (thread execution, with blocking/unblocking to and from the I/O queue) → send reply → TCP output packet queue → done.

Why web servers can be modeled by difference equations!
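As an illustration of that remark (a sketch under a fluid-flow assumption, not the talk's exact model), each queue's backlog can be written as a first-order difference equation, q(k+1) = max(0, q(k) + (λ(k) − μ(k))·T), and the queues of the pipeline can be chained:

# Fluid-flow difference-equation sketch of the web server's queue pipeline.
def queue_step(q, arrival_rate, service_rate, T=1.0):
    # q(k+1) = max(0, q(k) + (arrival - service) * T)
    return max(0.0, q + (arrival_rate - service_rate) * T)

def pipeline_step(backlogs, input_rate, service_rates, T=1.0):
    """Advance a chain of queues by one sampling period T."""
    new_backlogs, rate_in = [], input_rate
    for q, mu in zip(backlogs, service_rates):
        new_backlogs.append(queue_step(q, rate_in, mu, T))
        rate_in = min(mu, rate_in + q / T)   # departures of this stage feed the next
    return new_backlogs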

27

Page 28: Feedback Performance Control of Distributed Computing Systems: A

- 28 -

Distributed Software Performance Control

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

28

Page 29: Feedback Performance Control of Distributed Computing Systems: A

- 29 -

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Part I: Bridge the abstractions: low-level metrics → queue state

Distributed Software Performance Control

29

Page 30: Feedback Performance Control of Distributed Computing Systems: A

- 30 -

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Part II: Formulate and solve a (queue state) control problem

Distributed Software Performance Control

30

Page 31: Feedback Performance Control of Distributed Computing Systems: A

- 31 -

Bridging the Abstractions: A Schedulability Study

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Requirements on meeting deadlines → utilization (queue state) → performance control

31

Page 32: Feedback Performance Control of Distributed Computing Systems: A

- 32 -

Bridging the Abstractions: A Schedulability Study

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Requirements on meeting deadlines → utilization (queue state) → performance control

Part I

32

Page 33: Feedback Performance Control of Distributed Computing Systems: A

- 33 -

Bridging the Abstractions: A Schedulability Study

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Part I: Relax the periodicity assumption; extend to distributed systems

Requirements on meeting deadlines → utilization (queue state) → performance control

33

Page 34: Feedback Performance Control of Distributed Computing Systems: A

- 34 -

Relaxing the Periodicity Assumption

● Is there a utilization bound such that a system of aperiodic tasks arriving at arbitrary time instants meets all deadlines as long as the bound is not exceeded?

● If so, this bound would make a good control set point.

34

Page 35: Feedback Performance Control of Distributed Computing Systems: A

- 35 -

Aperiodic Tasks and Instantaneous Utilization

● Instantaneous utilization U(t) is a function of time, t

● U(t) is defined over the current invocations

U(t) = Σi Ci /Di

Figure: current task invocations with relative deadlines D1, D2, D3 along a timeline.

35

Page 36: Feedback Performance Control of Distributed Computing Systems: A

- 36 -

Aperiodic Tasks and Instantaneous Utilization

● Instantaneous utilization U(t) is a function of time, t

● U(t) is defined over the current invocations

U(t) = Σi Ci /Di

Figure: current task invocations with relative deadlines D1, D2, D3 — arrived but deadline has not expired.
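A minimal bookkeeping sketch of this definition (illustrative names, not from the talk): add Ci/Di when invocation i arrives and subtract it again once its absolute deadline (arrival time + Di) has passed, so U(t) always sums over current invocations only.

import heapq

class InstantaneousUtilization:
    """Track U(t) = sum of Ci/Di over invocations whose deadlines have not expired."""
    def __init__(self):
        self.u = 0.0
        self.expiry = []                       # min-heap of (absolute deadline, Ci/Di)

    def arrive(self, now, C, D):
        self._expire(now)
        self.u += C / D
        heapq.heappush(self.expiry, (now + D, C / D))

    def value(self, now):
        self._expire(now)
        return self.u

    def _expire(self, now):
        while self.expiry and self.expiry[0][0] <= now:
            self.u -= heapq.heappop(self.expiry)[1]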

36

Page 37: Feedback Performance Control of Distributed Computing Systems: A

- 37 -

Fixed versus Dynamic Priority Scheduling

● Fixed-priority scheduling:

– All invocations of a task have the same priority

● Dynamic-priority scheduling:

– Invocation priorities may not be the same

● What about aperiodic tasks?

– What is the equivalent of fixed-priority scheduling?

Figure: the classes of fixed-priority and dynamic-priority policies.

37

Page 38: Feedback Performance Control of Distributed Computing Systems: A

- 38 -

Arrival-Time-Independent Scheduling

● Fixed-priority scheduling:

– All invocations of a task have the same priority

● Dynamic-priority scheduling:

– Invocation priorities may not be the same

● Arrival-time-independent scheduling:

– Invocation priorities are not a function of invocation arrival times

Figure: the classes of fixed-priority, arrival-time-independent, and dynamic-priority policies.

38

Page 39: Feedback Performance Control of Distributed Computing Systems: A

- 39 -

Why Arrival-Time Independent Scheduling?

● Easy to implement on current non-real-time operating systems with fixed-priority support (e.g., UNIX, the #1 OS for web servers)

– Requires a finite number of priority levels

– Priorities are statically assigned to threads

39

Page 40: Feedback Performance Control of Distributed Computing Systems: A

- 40 -

A Sense of Optimality

● A scheduling policy is optimal in a class if it maximizes the schedulable utilization bound among all policies in the class

● “Backward Compatibility”:

– Rate monotonic is the optimal fixed-priority policy (for periodic tasks)

– EDF is the optimal dynamic-priority policy

– New: deadline monotonic is the optimal arrival-time-independent policy

40

Page 41: Feedback Performance Control of Distributed Computing Systems: A

- 41 -

Deriving a Utilization Bound for Aperiodic Tasks

Main idea:

● Minimize, over all arrival patterns ζ, the maximum Uζ(t) that precedes a deadline violation

Figure: Uζ(t) = Σi Ci/Di plotted against time t, showing the maximum Uζ(t) reached before a deadline violation.

41

Page 42: Feedback Performance Control of Distributed Computing Systems: A

- 42 -

Quick-and-Dirty Derivation

● Observe that each task i contributes Ci to the area under the Uζ (t) curve – see figure below.

Figure: each invocation i appears under the Uζ(t) curve as a rectangle of width Di and height Ci/Di (area Ci), up to the point of deadline violation.

42

Page 43: Feedback Performance Control of Distributed Computing Systems: A

- 43 -

Corollary

● The total area under the Uζ(t) curve is Σi Ci, carried over all arrived tasks

Figure: the Uζ(t) curve versus time t, with rectangles of width Di and height Ci/Di, the maximum Uζ(t), and the deadline violation point.

43

Page 44: Feedback Performance Control of Distributed Computing Systems: A

- 44 -

A Geometric Interpretation

● Minimize the sum Σi Ci across all unschedulable arrival patterns; call the minimum Cmin

● Minimize the curve height subject to area = Cmin

Figure: the Uζ(t) curve versus time t, with the resulting bound Ubound, the maximum Uζ(t), and the deadline violation point.

44

Page 45: Feedback Performance Control of Distributed Computing Systems: A

- 45 -

A Geometric Interpretation

● Minimize the sum Σi Ci across all unschedulable arrival patterns; call the minimum Cmin

● Minimize the curve height subject to area = Cmin

Figure: the Uζ(t) curve versus time t, with the resulting bound Ubound, the maximum Uζ(t), and the deadline violation point.

45

Page 46: Feedback Performance Control of Distributed Computing Systems: A

- 46 -

Main Result

● A set of aperiodic tasks is schedulable using an optimal fixed-priority policy if:

U(t) ≤ 1 / (1 + √(1/2))

46

Page 47: Feedback Performance Control of Distributed Computing Systems: A

- 47 -

Main Result

● A set of aperiodic tasks is schedulable using an optimal fixed-priority policy if:

U(t) ≤ 1 / (1 + √(1/2)) ≈ 58.6%

47

Page 48: Feedback Performance Control of Distributed Computing Systems: A

- 48 -


More Main Results

● A set of n recurrent tasks is schedulable using an optimal arrival-time-independent policy if:

● This bound is tight

U(t) ≤ 1 / (1 + √((1/2)(1 − 1/(n − 1))))   for n ≥ 3

U(t) ≤ 1/2 + 1/(2n)   for n < 3
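A quick numeric check of these bounds as reconstructed above: the n = 2 case gives 0.75, matching the tightness example on the next slide, and the bound decreases toward 1/(1 + √(1/2)) ≈ 58.6% for large n.

from math import sqrt

def ati_bound(n):
    """Utilization bound for n tasks under an optimal arrival-time-independent
    policy, as reconstructed from this slide (illustrative helper name)."""
    if n < 3:
        return 0.5 + 1.0 / (2 * n)
    return 1.0 / (1.0 + sqrt(0.5 * (1.0 - 1.0 / (n - 1))))

print(ati_bound(2))       # 0.75
print(ati_bound(1000))    # about 0.586, i.e. roughly 58.6%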

48

Page 49: Feedback Performance Control of Distributed Computing Systems: A

- 49 -

Example of Tightness

● Consider n = 2

Figure: two tasks with relative deadlines D and D/2.

U(t) = 0.75

49

Page 50: Feedback Performance Control of Distributed Computing Systems: A

- 50 -

Utilization Control

● Controller scales CPU speed, w, to the slowest value that keeps measured instantaneous utilization below the bound

Control loop: requests enter the server; a synthetic utilization counter Ucount and the speed w give the measured utilization

U(t) = Ucount / w

which the controller compares against Ubound to drive the DVS actuator.
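A minimal sketch of that actuation rule (assuming a small set of normalized DVS speed levels, which is an assumption, not a detail from the talk): pick the slowest available speed w for which Ucount / w stays at or below Ubound.

def choose_speed(u_count, u_bound, levels=(0.25, 0.5, 0.75, 1.0)):
    """Return the slowest speed w with U(t) = u_count / w <= u_bound."""
    for w in sorted(levels):          # try the slowest speeds first
        if u_count / w <= u_bound:
            return w
    return max(levels)                # otherwise saturate at full speed

# Example: with u_count = 0.4 and u_bound = 0.586, this selects w = 0.75.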

50

Page 51: Feedback Performance Control of Distributed Computing Systems: A

- 51 -

Bridging the Abstractions

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Transform fine-grained performance requirements into aggregate state variables that are easy to control

Part I: Relax the periodicity assumption; extend to distributed systems

51

Page 52: Feedback Performance Control of Distributed Computing Systems: A

- 52 -

Main Results in Schedulability of Multistage Execution

● Let U1, U2, …, Un be the utilization values of n machines

● The end-to-end deadline of a task T traversing the n-machine path is met if:

Machine 1 Machine nMachine 2

U1 U2 Un

Task T

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1
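A small helper (illustrative, using the condition as reconstructed above) that checks the end-to-end test for a vector of per-machine utilizations:

def multistage_schedulable(utilizations):
    """End-to-end deadline test: sum_i Ui*(1 - Ui/2)/(1 - Ui) <= 1, with each Ui < 1."""
    total = sum(u * (1.0 - u / 2.0) / (1.0 - u) for u in utilizations)
    return total <= 1.0

print(multistage_schedulable([0.58]))            # True: the single-stage 58.6% case
print(multistage_schedulable([0.3, 0.3, 0.3]))   # False: three stages at 0.3 exceed the bound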

52

Page 53: Feedback Performance Control of Distributed Computing Systems: A

- 53 -

53

Main Results in Schedulability of Multistage Execution

● A Multidimensional utilization bound

Machine 1 Machine nMachine 2

U1 U2 Un

Task T

Figure: the multidimensional utilization bound, shown as the feasible region bounded by the surface

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1

Page 54: Feedback Performance Control of Distributed Computing Systems: A

- 54 -

54

Main Results in Schedulability of Multistage Execution

● A Multidimensional utilization bound

Machine 1 Machine nMachine 2

U1 U2 Un

Task T

Figure: the multidimensional utilization bound, shown as the feasible region bounded by the surface

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1

which reduces to

U(t) ≤ 1 / (1 + √(1/2))

for a single stage.

Page 55: Feedback Performance Control of Distributed Computing Systems: A

- 55 -

Optimization Subject to Schedulability Constraints

● Intersect the schedulability surface of the system with objective function

● Find optimal point on intersection curve

● Use run-time feedback control to measure and correct performance dynamically to reach the optimal point.

Figure: the schedulability surface Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1 intersected with the objective function; the set point lies on the intersection curve and is tracked by control.
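One simple way to realize this numerically (a sketch under assumptions, not the talk's algorithm): climb the gradient of the objective and, whenever the schedulability condition is violated, scale the utilization vector back inside the surface; the resulting point serves as the set point that the run-time feedback loop then tracks.

def surface(u):
    """Schedulability surface g(u) = sum_i Ui*(1 - Ui/2)/(1 - Ui) - 1 <= 0."""
    return sum(x * (1.0 - x / 2.0) / (1.0 - x) for x in u) - 1.0

def find_set_point(grad, u, step=0.01, iters=2000):
    """Crude projected-gradient search; grad(u) is an assumed hook returning the
    gradient of the objective to be maximized."""
    for _ in range(iters):
        u = [min(max(x + step * g, 0.0), 0.99) for x, g in zip(u, grad(u))]
        while surface(u) > 0.0:            # pull back onto the feasible side
            u = [0.98 * x for x in u]
    return u

# Example objective: maximize total utilization sum_i Ui across three machines.
set_point = find_set_point(lambda u: [1.0, 1.0, 1.0], [0.1, 0.1, 0.1])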

55

Page 56: Feedback Performance Control of Distributed Computing Systems: A

- 56 -

56

Main Results in Schedulability of Multistage Execution

Machine 1 Machine nMachine 2

U1 U2 Un

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1

Task T

● Tight bound! Worst-case scenario: cross traffic

Page 57: Feedback Performance Control of Distributed Computing Systems: A

- 57 -

57

Main Results in Schedulability of Multistage Execution

Machine 1 Machine nMachine 2

U1 U2 Un

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1

Task T

● Worst-case scenario: cross traffic

What if all tasks follow the same multistage path?

Machine 1 Machine nMachine 2

U1 U2 Un

Page 58: Feedback Performance Control of Distributed Computing Systems: A

- 58 -

58

Main Results in Schedulability of Multistage Execution

Machine 1 Machine nMachine 2

U1 U2 Un

Σi=1..n Ui(1 − Ui/2)/(1 − Ui) ≤ 1

Task T

● Worst-case scenario: cross traffic

What if all tasks follow the same multistage path?

Machine 1 Machine nMachine 2

U1 U2 Un

Better schedulability! The adversary has fewer degrees of freedom!

Task T and others

Page 59: Feedback Performance Control of Distributed Computing Systems: A

- 59 -

59

Main Results in Schedulability of Multistage Execution

● A fundamental question:

– How does the delay of a low-priority task Tm depend on the execution times of higher-priority tasks on the multiple stages of its path?

Machine 1 Machine nMachine 2

U1 U2 Un

Task Tm and others

Page 60: Feedback Performance Control of Distributed Computing Systems: A

- 60 -

60

Delay Composition in Pipelined Execution

Many jobs, one stage: Delay = Σ(i ∈ jobs) Ci,1

Many stages, one job: Delay = Σ(j ∈ stages) C1,j

Many jobs, many stages: Is Delay = Σ(i ∈ jobs) Σ(j ∈ stages) Ci,j? Or is Delay < Σ(i ∈ jobs) Σ(j ∈ stages) Ci,j?

Figure: timelines of jobs J1-J3 executing on stages S1-S3.

Page 61: Feedback Performance Control of Distributed Computing Systems: A

- 61 -

61

Delay Composition in Pipelined Execution

Delay ≤ Σ(higher-priority jobs i) max(all stages j) Ci,j + Σ(all stages j) max(higher-priority jobs i) Ci,j

Figure: task Tm and others, with stage execution times Ci,j, flowing through Machine 1, Machine 2, …, Machine n.
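A small helper that evaluates this composition bound (illustrative; it assumes the matrix already lists only the job under analysis and the higher-priority jobs that share its path):

def delay_composition_bound(C):
    """C[i][j] = execution time of job i on stage j.
    Bound = sum over jobs of max over stages + sum over stages of max over jobs."""
    per_job = sum(max(row) for row in C)             # one max term per delaying job
    per_stage = sum(max(col) for col in zip(*C))     # one max term per stage
    return per_job + per_stage

# Example: three jobs over two stages -> (2 + 3 + 2) + (2 + 3) = 12.
print(delay_composition_bound([[2, 1], [1, 3], [2, 2]]))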

Page 62: Feedback Performance Control of Distributed Computing Systems: A

- 62 -

62

Delay Composition in Pipelined Execution

Delay ≤ Σ(jobs i) max(all stages j) Ci,j + Σ(all stages j) max(jobs i) Ci,j

Figure: the distributed execution of Tm and the higher-priority tasks T1, …, Tm-1 over Machine 1, …, Machine n reduces to a single stage in which each task Ti has virtual execution time max(j) Ci,j and the term max(i) Ci,1 + max(i) Ci,2 + … + max(i) Ci,n acts as a virtual queueing delay.

Page 63: Feedback Performance Control of Distributed Computing Systems: A

- 63 -

63

Properties

● Results applicable to both periodic and aperiodic tasks

● Results applicable under any priority-based scheduling policy where prioritization is consistent across stages

Delay ≤ Σ(jobs i) max(all stages j) Ci,j + Σ(all stages j) max(jobs i) Ci,j

Figure: the reduced single-stage view over Machine 1, …, Machine n, with virtual execution times max(j) Ci,j per task and a virtual queueing delay of Σ(stages j) max(i) Ci,j.

Page 64: Feedback Performance Control of Distributed Computing Systems: A

- 64 -

64

General Observation

● For each segment of the path that a higher-priority task shares with a lower-priority one in a distributed system, the former delays the latter by at most its maximum computation time over the segment.

Delay ≤ Σ(jobs i) max(all stages j) Ci,j + Σ(all stages j) max(jobs i) Ci,j

Figure: the reduced single-stage view over Machine 1, …, Machine n, with virtual execution times max(j) Ci,j per task and a virtual queueing delay of Σ(stages j) max(i) Ci,j.

Page 65: Feedback Performance Control of Distributed Computing Systems: A

- 65 -

65

● Control theory

● Circuit theory – Kirchhoff’s laws – analyze currents and voltages

● Reduce the schedulability problem on a distributed system to a problem on a uniprocessor?

Figure: block diagram reduction (control theory), circuit reduction (circuit theory), and reduction of an application flow graph in a distributed system.

A Reduction-based Approach to Distributed System Analysis?

Page 66: Feedback Performance Control of Distributed Computing Systems: A

- 66 -

66

A Recent Result: Delay Composition Algebra

● A compositional algebraic framework to analyze timing issues in distributed systems

– Operands represent workload on composed sub-systems

– A set of operators systematically transforms distributed real-time task systems into uniprocessor task systems

– Traditional uniprocessor analysis can then be applied to infer schedulability of distributed tasks

● Less pessimistic than traditional analysis as system scale increases

Page 67: Feedback Performance Control of Distributed Computing Systems: A

- 67 -

67

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Page 68: Feedback Performance Control of Distributed Computing Systems: A

- 68 -

68

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

Page 69: Feedback Performance Control of Distributed Computing Systems: A

- 69 -

69

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

PIPE

Page 70: Feedback Performance Control of Distributed Computing Systems: A

- 70 -

70

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

Page 71: Feedback Performance Control of Distributed Computing Systems: A

- 71 -

71

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

SPLIT

Page 72: Feedback Performance Control of Distributed Computing Systems: A

- 72 -

72

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

Page 73: Feedback Performance Control of Distributed Computing Systems: A

- 73 -

73

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example: PIPE

PIPE

Page 74: Feedback Performance Control of Distributed Computing Systems: A

- 74 -

74

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

Page 75: Feedback Performance Control of Distributed Computing Systems: A

- 75 -

75

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example: PIPE

PIPE

Page 76: Feedback Performance Control of Distributed Computing Systems: A

- 76 -

76

Delay Composition Algebra

Two operators that apply to node workloads while simplifying the resource graph

● W = PIPE (W1, W2)

● W = SPLIT (W1)

Example:

Page 77: Feedback Performance Control of Distributed Computing Systems: A

- 77 -

77

The Workload Matrix

● Each cell i,j holds the delay that job i imposes on job j in the subsystem that the matrix represents.

● Example: System consisting of four jobs (J1-J4) of which only J1, J2, and J4 execute on the stage
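A minimal sketch of this operand (illustrative only): a matrix keyed by (i, j) whose cell stores the delay that job i imposes on job j on the stage; the construction rule below, where a higher-priority job contributes at most one execution time, is an assumption consistent with the delay-composition result rather than the algebra's full definition.

def stage_workload_matrix(jobs_on_stage, exec_time, priority):
    """jobs_on_stage: e.g. ['J1', 'J2', 'J4']; exec_time[i] and priority[i] per job
    (smaller priority value = higher priority). Absent jobs contribute nothing."""
    W = {}
    for j in jobs_on_stage:               # job being delayed
        for i in jobs_on_stage:           # job imposing the delay
            delays = (i != j) and (priority[i] < priority[j])
            W[(i, j)] = exec_time[i] if delays else 0.0
    return W

# Four jobs J1-J4, of which only J1, J2, and J4 execute on this stage.
W = stage_workload_matrix(['J1', 'J2', 'J4'],
                          {'J1': 2.0, 'J2': 1.0, 'J4': 3.0},
                          {'J1': 1, 'J2': 2, 'J4': 4})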

Page 78: Feedback Performance Control of Distributed Computing Systems: A

- 78 -

78

The Operators

Page 79: Feedback Performance Control of Distributed Computing Systems: A

- 79 -

Schedulability Analysis Results

79

Page 80: Feedback Performance Control of Distributed Computing Systems: A

- 80 -

Impact of Delay Composition Algebra

● A simple approach to transform arbitrary task graphs into equivalent uniprocessor workloads amenable to existing analysis techniques

● Transforms schedulability conditions of a distributed task set into a utilization bound (queue state) amenable to control

● Opportunities for future research:

– Needs extension to resource blocking (mutual exclusion constraints), spatially partitioned resources (as opposed to prioritized ones), and transactions (multiple resources acquired in an all-or-nothing fashion), …

– Advantages: simplicity, scalability.

80

Page 81: Feedback Performance Control of Distributed Computing Systems: A

- 81 -

Distributed Software Performance Control

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Part I, Part II

Requirements on meeting deadlines → utilization (queue state) → performance control

81

Page 82: Feedback Performance Control of Distributed Computing Systems: A

- 82 -

Distributed Software Performance Control

Diagram: computing tasks, resource scheduling, and distributed resource queues; schedulability analysis (scheduling theory) yields feasible regions and a feasible region boundary for feedback control (control theory).

Part I, Part II

Requirements on meeting deadlines → utilization (queue state) → performance control

82

Page 83: Feedback Performance Control of Distributed Computing Systems: A

- 83 -

Control of Performance Tradeoffs in Distributed Systems

Diagram: a distributed system takes raw inputs from the physical world and maps resources (R) and time (T) to utility (U) of the global effect.

83

Page 84: Feedback Performance Control of Distributed Computing Systems: A

- 84 -

Control of Performance Tradeoffs in Distributed Systems

Diagram: a distributed system takes raw inputs from the physical world and maps resources (R) and time (T) to utility (U) of the global effect, now with a time constraint and an optimum operating point.

Feedback control solutions for constrained performance optimization?

84

Page 85: Feedback Performance Control of Distributed Computing Systems: A

- 85 -

A Methodology for Applying Feedback Control to Software

● Mapping: performance management problem → feedback problem

– Determine the controlled performance metric (output) and its desired value (set point)

– Determine the available actuators (input)

– Map the performance control problem into a set of control loops

● Modeling:

– Model the input-output relation as a difference equation: Outputk = Σi ai Outputk-i + Σi bi Inputk-i

● Controller design:

– Use control theory to find a function f such that Input = f(set point − output) makes output → set point

85
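A minimal end-to-end sketch of the three steps on a first-order model (the coefficients and gain below are illustrative, not identified from a real system): the plant is Outputk = a·Outputk-1 + b·Inputk-1, and closing the loop with an integral controller Inputk = Inputk-1 + KI·(set point − Outputk) drives the output to the set point when the gain keeps the closed loop stable.

A, B = 0.7, 0.5          # assumed model: output[k] = A*output[k-1] + B*input[k-1]
KI = 0.5                 # integral gain chosen so the closed loop is stable
SET_POINT = 0.58

def simulate(steps=50):
    output, inp = 0.0, 0.0
    for _ in range(steps):
        output = A * output + B * inp               # the (modeled) software system
        inp = inp + KI * (SET_POINT - output)       # input = f(set point - output)
    return output                                   # settles at SET_POINT

print(round(simulate(), 3))                         # prints 0.58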

Page 86: Feedback Performance Control of Distributed Computing Systems: A

- 86 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

86

Page 87: Feedback Performance Control of Distributed Computing Systems: A

- 87 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

87

Page 88: Feedback Performance Control of Distributed Computing Systems: A

- 88 -

Experimental Testbed and Evaluation

● A real-time computing cluster was developed under Linux

● 4 worker Linux PCs connected by a 100 Mbps LAN to a front-end load distributor

Architecture: incoming aperiodic tasks (HTTP/TCP) → load distributor with admission control → real-time servers (Apache: web interface, CGI: request processing) on the LAN. Measurements of throughput, utilization, and miss ratio feed the sensing path, modeled as a delay element (sampling) plus a lag (moving average); a PI controller drives the admission control actuator (static gain).

88

Page 89: Feedback Performance Control of Distributed Computing Systems: A

- 89 -

Server Utilization

Figure: cluster utilization (%) versus request rate (40-90 req/s), with and without admission control (AC).

Utilization-based admission control does not underutilize server

89

Page 90: Feedback Performance Control of Distributed Computing Systems: A

- 90 -

Miss Ratio Profile

Figure: percentage of missed (no AC) or rejected (with AC) requests versus request rate (req/s), for rates from 40 to 90 req/s.

Utilization-based admission control improves task success ratio

90

Page 91: Feedback Performance Control of Distributed Computing Systems: A

- 91 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

91

Page 92: Feedback Performance Control of Distributed Computing Systems: A

- 92 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

92

Page 93: Feedback Performance Control of Distributed Computing Systems: A

- 93 -

Energy Management in Server Farms

Energy expended on:

● Computing (powering up racks of machines)

– Sensors: machine utilization, delay, throughput, …

– Actuators: DVS, turning machines on/off

● Cooling

– Sensors: temperature, air flow, …

– Actuators: air-conditioning units, fans, …

● Energy bill is 40-50% of total profit

93

Page 94: Feedback Performance Control of Distributed Computing Systems: A

- 94 -

Problem Formulation

Control knobs xi

Step 1: Formulate objective and constraints

Step 2: Determine optimality conditions (KKT)

Step 3: Set up control loops

Set point: Γx1 = Γx2 = … = Γaverage

94

Page 95: Feedback Performance Control of Distributed Computing Systems: A

- 95 -

A Server Farm Case study

Tier 1 Tier 2 Tier 3

requests

On/Off DVS

Composed distributed middleware

Knobs per tier: machine counts (m1, m2, m3) and frequencies (f1, f2, f3). Find knob settings (m1, m2, m3, f1, f2, f3) such that energy consumption is reduced.

95

Page 96: Feedback Performance Control of Distributed Computing Systems: A

- 96 -

Formulation

Power estimation function of a machine at tier i

Queueing equation using the number of machines and the arrival rate

Formulate a constrained optimization: find the best composition of (m1, m2, m3, U1, U2, U3), hence (m1, m2, m3, f1, f2, f3)

96

Page 97: Feedback Performance Control of Distributed Computing Systems: A

- 97 -

Optimality Conditions

● Derive a necessary condition for optimality

– Karush-Kuhn-Tucker (KKT) condition

● Errori = Γavg − Γ(mi, Ui), where Γavg is the average of Γ(mi, Ui)
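A minimal sketch of the resulting decentralized loop (illustrative; Γ stands for whatever per-tier marginal quantity the KKT condition equalizes): each tier integrates its own error Errori = Γavg − Γ(mi, Ui) and nudges its local knob, so the Γ values drift toward a common value.

def equalize_step(gammas, knobs, gain=0.1):
    """One control step per tier: drive each measured Gamma_i toward the average.
    gammas[i]: measured Gamma(m_i, U_i); knobs[i]: the tier's local actuator setting."""
    g_avg = sum(gammas) / len(gammas)
    return [k + gain * (g_avg - g) for k, g in zip(knobs, gammas)]   # integral action

# Example: three tiers whose knobs are nudged until their Gammas equalize.
knobs = equalize_step([2.0, 1.0, 1.5], [1.0, 1.0, 1.0])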

97

Page 98: Feedback Performance Control of Distributed Computing Systems: A

- 98 -

Evaluation

Figure: energy consumption of DVS + On/Off compared with DVS alone, On/Off alone, and Optimal.

98

Page 99: Feedback Performance Control of Distributed Computing Systems: A

- 99 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

99

Page 100: Feedback Performance Control of Distributed Computing Systems: A

- 100 -

Control Examples in Distributed Systems

● Case 1: Centralized control, centralized actuation

● Case 2: Centralized control, distributed actuation

● Case 3: Distributed (localized) control and actuation

100

Page 101: Feedback Performance Control of Distributed Computing Systems: A

- 101 -

101

Problem: Allocate rates to elastic flows in a wireless network so as to maximize network utility, while meeting end-to-end delay requirements

Example: Flow Rate Control in a Wireless Network subject to Delay Constraints

Page 102: Feedback Performance Control of Distributed Computing Systems: A

- 102 -

Problem Formulation

Control knobs xi

Step 1: Formulate objective and constraints

Step 2: Decentralize the constraints:

Σi=1..n Termi ≤ 0   becomes   Sumi = Sumi+1 + Termi, with Sum1 ≤ 0

Step 3: Determine optimality conditions (KKT)

Step 4: Set up control loops, with iterative updates:

xi(k) = xi(k−1) + υ hi(C, x1, …, xm)

υj(k) = υj(k−1) + ψ Gj(x1, …, xn)

102

Page 103: Feedback Performance Control of Distributed Computing Systems: A

- 103 -

103

Decentralizing the Delay Constraint

For each flow s and hop i, define additional variables to hold the value of the left-hand side of the constraint for hops i and up to the destination

Each node only requires value from immediate downstream node

Source constraint compares end-to-end delay to deadline

Ratio of delay on all downstream nodes to deadline

Figure: each hop passes a message carrying its partial delay sum to the node upstream.
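A minimal sketch of this propagation (names and message format are illustrative): each hop adds its own term to the partial sum received from its downstream neighbor and forwards the result upstream; only the source compares the accumulated value against the flow's deadline.

def propagate_partial_sums(local_terms):
    """local_terms[i]: hop i's contribution to the delay constraint, ordered from
    the source (index 0) to the destination. Returns Sum_i = Term_i + Sum_{i+1}."""
    sums, downstream = [0.0] * len(local_terms), 0.0
    for i in reversed(range(len(local_terms))):
        sums[i] = local_terms[i] + downstream   # value carried in the upstream message
        downstream = sums[i]
    return sums

def source_constraint_ok(local_terms, deadline):
    """Only the source checks the end-to-end constraint against the deadline."""
    return propagate_partial_sums(local_terms)[0] <= deadline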

Page 104: Feedback Performance Control of Distributed Computing Systems: A

- 104 -

Simulation Setup

Implemented on ns2

50 nodes placed uniformly at random

802.11 with prioritized scheduling; DSDV routing

5 elastic flows, 3 priorities with the number of flows in each in the ratio 1:2:4 (high:medium:low); end-to-end deadlines of 2, 4, and 7 s

Algorithms:

No rate control

NUM w/o delay constraints: performs only rate control based on capacity constraints

NUM with delay constraints: performs rate control based on capacity as well as delay constraints

Utility function:

Importance proportional to urgency

Importance the same for all flows regardless of urgency (‘Eq. Util. Flows’)

104

Page 105: Feedback Performance Control of Distributed Computing Systems: A

- 105 -

Simulation Results

High Priority Flows

Low Priority Flows

105

Page 106: Feedback Performance Control of Distributed Computing Systems: A

- 106 -

Conclusions

● Emerging distributed systems will feature increased scale, interaction with a physical environment, uncertainty, and autonomy

● We need analytic tools for predicting and controlling the temporal behavior of such systems

● Theoretical foundations are needed to bridge the gap between software and feedback control abstractions

● A theory was developed for translating schedulability constraints in a class of distributed systems into constraints on utilization (or virtual queue) metrics

● These constraints can be decentralized, leading to localized algorithms that collectively converge to global optima

● The feedback control problem derives from an optimization problem: The distributed system as an iterative optimizer

106

Page 107: Feedback Performance Control of Distributed Computing Systems: A

- 107 -

A Summary of Challenges

● What are other general categories of fine-grained constraints that can be converted to aggregate state variables amenable to control?

● How to translate desired global properties into localized protocol interactions?

● How to model nonlinearities specific to software?

● How to incorporate complex information extraction algorithms (e.g., data mining) in control loops?

● How to achieve convergence in poorly structured evolving systems? (e.g., mobile ad hoc networks, changing connectivity, non-uniform resource density, etc.)

107

Page 108: Feedback Performance Control of Distributed Computing Systems: A

- 108 -

Backup: Instantaneous versus Real Utilization

● An important property is that:

avg. instantaneous utilization < avg. real utilization.

Hence, the server is not underutilized!

● Proof (by example, see figure): the real utilization is 1 while the synthetic utilization stays below 0.58.

108

Page 109: Feedback Performance Control of Distributed Computing Systems: A

- 109 -

109

Simulation Results (2/3)

High Priority Flows

Low Priority Flows

Page 110: Feedback Performance Control of Distributed Computing Systems: A

- 110 -

110

Simulation Results (3/3)

High Priority Flows

Low Priority Flows