stability and asymptotic optimality - university of florida · stability and asymptotic optimality...


- A. Rybko, 2006

Stability and Asymptotic Optimality of h-MaxWeight Policies

Sean Meyn

Department of Electrical and Computer Engineering University of Illinois & the Coordinated Science Laboratory

NSF support: ECS 05-23620 and DARPA ITMANET

Page 2

Control Techniques for Complex Networks (draft copy, April 22, 2007), selected contents:

I Modeling & Control
4 Scheduling
4.1 Controlled random-walk model
4.2 Fluid model
4.3 Control techniques for the fluid model
4.8 MaxWeight and MinDrift
4.9 Perturbed value function
4.10 Notes
4.11 Exercises

II Workload
5 Workload & Scheduling
5.1 Single server queue
5.2 Workload for the CRW scheduling model
5.3 Relaxations for the fluid model

III Stability & Performance
8 Foster-Lyapunov Techniques
8.1 Lyapunov functions
8.4 MaxWeight
8.5 MaxWeight and the average-cost optimality equation
9 Optimization
9.4 Optimality equations
9.6 Optimization in networks
10 ODE methods
10.5 Safety stocks and trajectory tracking
10.6 Fluid-scale asymptotic optimality
11 Simulation & Learning
11.4 Control variates and shadow functions
11.5 Estimating a value function
11.6 Notes
11.7 Exercises

A Markov Models
A.1 Every process is (almost) Markov
A.2 Generators and value functions
A.3 Equilibrium equations
A.4 Criteria for stability
A.5 Ergodic theorems and coupling
A.6 Converse theorems

List of Figures


Outline

I Models & Background
II h-MaxWeight Policies
III Heavy Traffic
Conclusions

Page 3

I Models & Background

Control Techniques for Complex Networks (draft copy, April 22, 2007), selected contents:

I Modeling & Control
4 Scheduling
4.1 Controlled random-walk model
4.2 Fluid model
4.3 Control techniques for the fluid model

III Stability & Performance
9 Optimization
9.4 Optimality equations
9.6 Optimization in networks

Page 4

Controlled Random-Walk Model

- Lippman 1975
- Henderson & M. 1997

$$Q(k+1) = Q(k) + B(k+1)U(k) + A(k+1), \qquad Q(0) = x$$

Statistics & topology:

$$A(k) = \begin{bmatrix} A_1(k) \\ 0 \\ A_3(k) \\ 0 \end{bmatrix}, \qquad B(k) = \begin{bmatrix} -S_1(k) & 0 & 0 & 0 \\ S_1(k) & -S_2(k) & 0 & 0 \\ 0 & 0 & -S_3(k) & 0 \\ 0 & 0 & S_3(k) & -S_4(k) \end{bmatrix}$$

Constituency constraints:

$$C = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \qquad C\,U(k) \le \mathbf{1}, \qquad U(k) \ge 0$$

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]
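The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the Bernoulli service and arrival indicators, the function names `crw_step` and `feasible`, and the rule that an empty buffer cannot be served are all assumptions made for the example.

```python
import random

def feasible(q, u):
    """Check U(k) in {0,1}^4, C U(k) <= 1, and no service at an empty buffer.

    Assumption: buffers 1 and 4 share Station 1, buffers 2 and 3 share
    Station 2, as encoded by C = [[1,0,0,1],[0,1,1,0]].
    """
    return (all(ui in (0, 1) for ui in u)
            and u[0] + u[3] <= 1
            and u[1] + u[2] <= 1
            and all(ui <= qi for ui, qi in zip(u, q)))

def crw_step(q, u, mu, alpha, rng):
    """One step of Q(k+1) = Q(k) + B(k+1) U(k) + A(k+1).

    Assumption: S_i(k+1) ~ Bernoulli(mu[i]) and A_1, A_3 ~ Bernoulli(alpha[0]),
    Bernoulli(alpha[1]), drawn independently.
    """
    s = [1 if rng.random() < m else 0 for m in mu]
    a1 = 1 if rng.random() < alpha[0] else 0
    a3 = 1 if rng.random() < alpha[1] else 0
    return [
        q[0] - s[0] * u[0] + a1,             # row 1 of B: -S1
        q[1] + s[0] * u[0] - s[1] * u[1],    # row 2 of B: S1, -S2
        q[2] - s[2] * u[2] + a3,             # row 3 of B: -S3
        q[3] + s[2] * u[2] - s[3] * u[3],    # row 4 of B: S3, -S4
    ]
```

Because `feasible` forbids serving an empty buffer, every queue length stays nonnegative along any trajectory generated with feasible actions.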

Page 5

Fluid Model & Workload

- Newell 1982
- Vandergraft 1983
- Perkins & Kumar 1989
- Chen & Mandelbaum 1991
- Cruz 1991

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Fluid model captures mean flow:

$$q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x$$

$$B = E[B(k)] = \begin{bmatrix} -\mu_1 & 0 & 0 & 0 \\ \mu_1 & -\mu_2 & 0 & 0 \\ 0 & 0 & -\mu_3 & 0 \\ 0 & 0 & \mu_3 & -\mu_4 \end{bmatrix}, \qquad \alpha = E[A(k)] = \begin{bmatrix} \alpha_1 \\ 0 \\ \alpha_3 \\ 0 \end{bmatrix}$$

Workload and load parameters:

$$\xi^1 = \begin{bmatrix} m_1 \\ 0 \\ m_4 \\ m_4 \end{bmatrix}, \qquad \xi^2 = \begin{bmatrix} m_2 \\ m_2 \\ m_3 \\ 0 \end{bmatrix}$$

$$\rho_1 = m_1\alpha_1 + m_4\alpha_3, \qquad \rho_2 = m_2\alpha_1 + m_3\alpha_3, \qquad \text{with } m_i = \mu_i^{-1}$$
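As a quick numerical check of the load formulas, the sketch below computes $\rho_1$ and $\rho_2$ as inner products of the workload vectors with $\alpha$. The function name `loads` and the sample rates are illustrative assumptions.

```python
def loads(mu, alpha1, alpha3):
    """Return (rho1, rho2) for the two-station network.

    rho_s = <xi^s, alpha> with xi^1 = (m1, 0, m4, m4), xi^2 = (m2, m2, m3, 0)
    and m_i = 1 / mu_i.
    """
    m = [1.0 / rate for rate in mu]
    xi1 = [m[0], 0.0, m[3], m[3]]
    xi2 = [m[1], m[1], m[2], 0.0]
    alpha = [alpha1, 0.0, alpha3, 0.0]
    rho1 = sum(x * a for x, a in zip(xi1, alpha))
    rho2 = sum(x * a for x, a in zip(xi2, alpha))
    return rho1, rho2

# Hypothetical rates: mu = (4, 2, 4, 2), alpha1 = alpha3 = 1
rho1, rho2 = loads([4.0, 2.0, 4.0, 2.0], 1.0, 1.0)
```

With these sample rates both loads are below one, so neither station is overloaded on average.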

Page 6

Value Functions

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Fluid model: $q(t) = x + Bz(t) + \alpha t$

CRW model: $Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$

Value function:

$$J(x) = \int_0^\infty c(q(t; x))\, dt$$

Relative value function:

$$h(x) = \int_0^\infty E[c(Q(t; x)) - \eta]\, dt, \qquad \eta = \int c(x)\,\pi(dx) = \text{average cost}$$

Page 7

Value Functions

- M 96, 01, ... following Dai 95 and Dai & M 95

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Fluid model: $q(t) = x + Bz(t) + \alpha t$

CRW model: $Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$

Value function:

$$J(x) = \int_0^\infty c(q(t; x))\, dt$$

Relative value function:

$$h(x) = \int_0^\infty E[c(Q(t; x)) - \eta]\, dt, \qquad \eta = \int c(x)\,\pi(dx)$$

Large-state solidarity:

$$\lim_{\|x\| \to \infty} \frac{J(x)}{h(x)} = 1$$

Holds for a wide class of stabilizing policies, including the average-cost optimal policy

Page 8

Myopic Policy: Fluid Model

$$q(t) = x + Bz(t) + \alpha t$$

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, a subset of $\mathbb{R}^\ell_+$; $U(x)$, the feasible values of $\zeta(t)$ when $x = q(t) \in X$, where

$$\frac{d^+}{dt} q(t) = B\zeta(t) + \alpha$$

Page 9

Myopic Policy: Fluid Model

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, a subset of $\mathbb{R}^\ell_+$; $U(x)$, the feasible values of $\zeta(t)$ when $x = q(t) \in X$, where

$$\frac{d^+}{dt} q(t) = B\zeta(t) + \alpha$$

Myopic policy:

$$\arg\min_{u \in U(x)} \frac{d^+}{dt} c(q(t)) = \arg\min_{u \in U(x)} \langle \nabla c(x), Bu + \alpha \rangle$$
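The minimization over $U(x)$ can be illustrated by brute-force enumeration. The sketch below is an assumption-laden example rather than the lecture's method: it restricts attention to the 0/1 vertices of the constituency constraints for the two-station network, assumes every buffer is nonempty, and uses the hypothetical helper names `myopic_action`, `network_candidates`, and `mean_B`.

```python
from itertools import product

def myopic_action(grad_c, B, alpha, candidates):
    """Return the u in candidates that minimizes <grad_c, B u + alpha>."""
    def cost_drift(u):
        velocity = [sum(B[i][j] * u[j] for j in range(len(u))) + alpha[i]
                    for i in range(len(alpha))]
        return sum(g * v for g, v in zip(grad_c, velocity))
    return min(candidates, key=cost_drift)

def network_candidates():
    """0/1 allocations with u1 + u4 <= 1 (Station 1) and u2 + u3 <= 1 (Station 2)."""
    return [u for u in product((0, 1), repeat=4)
            if u[0] + u[3] <= 1 and u[1] + u[2] <= 1]

def mean_B(mu):
    """Mean matrix B = E[B(k)] for the two-station network."""
    m1, m2, m3, m4 = mu
    return [[-m1, 0, 0, 0],
            [m1, -m2, 0, 0],
            [0, 0, -m3, 0],
            [0, 0, m3, -m4]]

B = mean_B((1.0, 1.0, 1.0, 1.0))
alpha = [0.3, 0.0, 0.3, 0.0]
x = [10.0, 1.0, 2.0, 3.0]
# Quadratic cost c(x) = 0.5 * |x|^2 has gradient x
u_quadratic = myopic_action(x, B, alpha, network_candidates())
# Linear cost c(x) = x1 + x2 + x3 + x4 has gradient (1, 1, 1, 1)
u_linear = myopic_action([1.0, 1.0, 1.0, 1.0], B, alpha, network_candidates())
```

In this example the quadratic gradient selects buffers 1 and 2, while the all-ones gradient selects buffers 2 and 4, since moving a job between buffers does not change a linear total and only the exit buffers reduce it.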

Page 10

Myopic Policy: CRW Model

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$$

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, a subset of $\mathbb{R}^\ell_+$ (lattice constraints, etc.); $U(x)$, the feasible values of $U(k)$ when $x = Q(k) \in X$

Page 11

Myopic Policy: CRW Model

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$$

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, a subset of $\mathbb{R}^\ell_+$ (lattice constraints, etc.); $U(x)$, the feasible values of $U(k)$ when $x = Q(k) \in X$

Myopic policy:

$$\arg\min_{u \in U(x)} E[c(Q(k+1)) \mid Q(k) = x,\ U(k) = u]$$

Page 12

Myopic Policy: CRW Model

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$$

Motivation: Average cost optimal policy is h-myopic, where $h : \mathbb{R}^\ell_+ \to \mathbb{R}_+$ is the relative value function,

$$h(x) = \inf_U \int_0^\infty E[c(Q(t; x)) - \eta^*]\, dt$$

Page 13

Myopic Policy: CRW Model

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$$

Motivation: Average cost optimal policy is h-myopic, where $h : \mathbb{R}^\ell_+ \to \mathbb{R}_+$ is the relative value function,

$$h(x) = \inf_U \int_0^\infty E[c(Q(t; x)) - \eta^*]\, dt$$

Dynamic programming equation:

$$\min_{u \in U(x)} E[h(Q(k+1)) \mid Q(k) = x,\ U(k) = u] = h(x) - c(x) + \eta^*$$

Page 14

Fluid Model & Myopia

- Chen & Yao 93
- M' 01

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

$$q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x$$

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Myopic policy for fluid model is stabilizing:

$$q(t) = 0, \quad t \ge T_0$$

Page 15

Myopia & Instability

- Kumar & Seidman 89
- Rybko & Stolyar 93

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Myopic policy may or may not be stabilizing

Example: Two station model above with linear cost, $c(x) = x_1 + x_2 + x_3 + x_4$

Myopic policy for CRW model: Priority to exit buffers

Page 16

Myopia & Instability

- Kumar & Seidman 89
- Rybko & Stolyar 93

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Myopic policy may or may not be stabilizing

Example: Two station model above with linear cost, $c(x) = x_1 + x_2 + x_3 + x_4$

Myopic policy for CRW model: Priority to exit buffers

Periodic starvation creates instability

[Figure: two simulation plots, titled "Poisson Arrivals and Service" and "Fluid Arrivals and Service", each over the time range 0 to 1000]

Page 17

Myopia & Instability: Quadratic Cost

- Tassiulas & Ephremides 92

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Myopic policy stabilizing for diagonal quadratic

Example: Two station model above with $c(x) = \tfrac{1}{2}[x_1^2 + x_2^2 + x_3^2 + x_4^2]$

Myopic policy: Approximated by linear switching curves

Page 18

Myopia & Instability: Quadratic Cost

- Tassiulas & Ephremides 92

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Myopic policy stabilizing for diagonal quadratic

Example: Two station model above with $c(x) = \tfrac{1}{2}[x_1^2 + x_2^2 + x_3^2 + x_4^2]$

Myopic policy: Approximated by linear switching curves

Condition (V3) holds with Lyapunov function $V = c$: for positive constants $\varepsilon$ and $\eta$,

$$PV(x) := E[V(Q(k+1)) \mid Q(k) = x] \le V(x) - \varepsilon\|x\| + \eta$$
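To make the shape of the (V3) bound concrete, the sketch below evaluates the drift exactly for a much simpler system than the two-station network: a single non-idling queue with $V(x) = \tfrac{1}{2}x^2$. Everything here is an assumption chosen for illustration: at most one event per time slot (an arrival with probability `alpha`, a service completion with probability `mu`, with `alpha + mu <= 1`), and the constants $\varepsilon = \mu - \alpha$ and $\eta = \tfrac{1}{2}(\alpha + \mu)$.

```python
def V(x):
    """Quadratic Lyapunov function for the single-queue sketch."""
    return 0.5 * x * x

def PV(x, alpha, mu):
    """E[V(Q(k+1)) | Q(k) = x] when at most one event occurs per slot."""
    serve = mu if x >= 1 else 0.0           # no service at an empty queue
    stay = 1.0 - alpha - serve
    down = V(x - 1) if x >= 1 else 0.0
    return alpha * V(x + 1) + serve * down + stay * V(x)

def v3_holds(alpha, mu, xmax=200):
    """Check PV(x) <= V(x) - eps * x + eta for x = 0, ..., xmax."""
    if not (0.0 < alpha < mu and alpha + mu <= 1.0):
        return False                        # outside the assumed regime
    eps = mu - alpha
    eta = 0.5 * (alpha + mu)
    return all(PV(x, alpha, mu) <= V(x) - eps * x + eta + 1e-9
               for x in range(xmax + 1))
```

For $x \ge 1$ the bound holds with equality in this sketch, since $PV(x) - V(x) = -(\mu - \alpha)x + \tfrac{1}{2}(\alpha + \mu)$.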

Page 19

MaxWeight Policy

- Tassiulas & Ephremides 92

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Tassiulas considers myopic policy for fluid model subject to lattice constraints,

$$\arg\min_{u \in U(x)} \langle \nabla c(x), Bu + \alpha \rangle$$

where $c(x) = \tfrac{1}{2} x^T D x$, $D = \mathrm{diag}(d_1, \dots, d_\ell)$

Page 20

MaxWeight Policy

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Tassiulas considers myopic policy for fluid model subject to lattice constraints,

$$\arg\min_{u \in U(x)} \langle \nabla c(x), Bu + \alpha \rangle$$

Obtains negative drift: For non-zero x,

$$\langle \nabla c(x), Bu + \alpha \rangle \le -\varepsilon\|x\|$$

Implies (V3) for MaxWeight policy

Page 21

MaxWeight Policy

[Figure: two-station network; arrival rates α1, α3; service rates µ1, µ2, µ3, µ4]

Tassiulas considers myopic policy for fluid model subject to lattice constraints,

$$\arg\min_{u \in U(x)} \langle \nabla c(x), Bu + \alpha \rangle$$

Obtains negative drift: For non-zero x,

$$\langle \nabla c(x), Bu + \alpha \rangle \le -\varepsilon\|x\|$$

Implies (V3) for MaxWeight policy

Implies (V3) for myopic policy, since myopic has minimum drift

Page 22

Questions Since 1996

$$\lim_{\|x\| \to \infty} \frac{J(x)}{h(x)} = 1$$

Value functions for fluid and stochastic models: Quadratic growth for linear cost with similar asymptotes; Policies are similar for large state-values

Page 23

Questions Since 1996

$$\lim_{\|x\| \to \infty} \frac{J(x)}{h(x)} = 1$$

Value functions for fluid and stochastic models: Quadratic growth for linear cost with similar asymptotes; Policies are similar for large state-values

What is the gap between policies?

What is the gap between value functions?

How to translate policy for fluid model to cope with volatility?

Connections with heavy traffic theory?

Page 24

Questions Since 1996

$$\lim_{\|x\| \to \infty} \frac{J(x)}{h(x)} = 1$$

Value functions for fluid and stochastic models: Quadratic growth for linear cost with similar asymptotes; Policies are similar for large state-values

What is the gap between policies?

What is the gap between value functions?

How to translate policy for fluid model to cope with volatility?

Connections with heavy traffic theory?

Many positive answers in new monograph, as well as new applications for value function approximation

Today's lecture focuses on the third and fourth topics

Page 25

II h-MaxWeight Policies

Control Techniques for Complex Networks (draft copy, April 22, 2007), selected contents:

I Modeling & Control
4.8 MaxWeight and MinDrift
4.9 Perturbed value function
4.10 Notes
4.11 Exercises

III Stability & Performance
8 Foster-Lyapunov Techniques
8.1 Lyapunov functions
8.4 MaxWeight
8.5 MaxWeight and the average-cost optimality equation

Page 26

Why Does MW Work?

Geometric explanation

Define drift vector field (for given policy):

$$\Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x] = Bu + \alpha$$

MaxWeight policy, with c diagonal quadratic:

$$\arg\min_{u \in U(x)} \langle \nabla c(x), \Delta(x) \rangle$$

Page 27

Why Does MW Work?

$$\Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x]$$

Example: Queues in tandem (arrival rate α1, service rates µ1, µ2)

[Figure: drift vector field in the (x1, x2) plane, with the region labeled "MaxWeight policy: serve buffer 1". Drift values shown: $\Delta(x) = (-\mu_1 + \alpha_1,\ \mu_1)^T$, $\Delta(x) = (-\mu_1 + \alpha_1,\ -\mu_2 + \mu_1)^T$, $\Delta(x) = (\alpha_1,\ -\mu_2)^T$]

Page 28

Why Does MW Work?

$$\Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x]$$

Example: Queues in tandem (arrival rate α1, service rates µ1, µ2)

Key observation: Boundaries of the state space are repelling

[Figure: drift vector field in the (x1, x2) plane, with the region labeled "MaxWeight policy: serve buffer 1". Drift values shown: $\Delta(x) = (-\mu_1 + \alpha_1,\ \mu_1)^T$, $\Delta(x) = (-\mu_1 + \alpha_1,\ -\mu_2 + \mu_1)^T$, $\Delta(x) = (\alpha_1,\ -\mu_2)^T$]

Page 29

Why Does MW Work?

$$\Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x]$$

Example: Queues in tandem (arrival rate α1, service rates µ1, µ2)

Key observation: Boundaries of the state space are repelling

Consequence of vanishing partial derivatives on boundary

[Figure: drift vector field in the (x1, x2) plane over the level sets of c (diagonal quadratic), with the region labeled "MaxWeight policy: serve buffer 1". Drift values shown: $\Delta(x) = (-\mu_1 + \alpha_1,\ \mu_1)^T$, $\Delta(x) = (-\mu_1 + \alpha_1,\ -\mu_2 + \mu_1)^T$, $\Delta(x) = (\alpha_1,\ -\mu_2)^T$]
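The repelling-boundary observation can be checked numerically for the tandem queues. The sketch below is illustrative only: it assumes $c(x) = \tfrac{1}{2}(x_1^2 + x_2^2)$, the mean dynamics $B = \begin{bmatrix} -\mu_1 & 0 \\ \mu_1 & -\mu_2 \end{bmatrix}$, that an empty buffer cannot be served, and a tie-breaking rule (serve buffer 1 when $x_1 = x_2$) that is a choice made here, not something stated in the slides.

```python
def maxweight_tandem(x, mu1, mu2, alpha1):
    """MaxWeight action and drift for two queues in tandem.

    Minimizes <grad c(x), B u + alpha> with grad c(x) = x, which reduces to
    -mu1 * u1 * (x1 - x2) - mu2 * u2 * x2 over u in {0,1}^2.
    """
    x1, x2 = x
    u1 = 1 if (x1 >= 1 and x1 >= x2) else 0   # tie broken toward serving
    u2 = 1 if x2 >= 1 else 0
    drift = (-mu1 * u1 + alpha1, mu1 * u1 - mu2 * u2)
    return (u1, u2), drift

# Hypothetical rates, chosen only for the example
_, drift_on_x2_axis = maxweight_tandem((5, 0), 0.5, 0.5, 0.25)  # x2 = 0
_, drift_on_x1_axis = maxweight_tandem((0, 5), 0.5, 0.5, 0.25)  # x1 = 0
```

On the boundary $x_2 = 0$ the second drift component is $\mu_1 > 0$, and on $x_1 = 0$ the first component is $\alpha_1 > 0$, so in both cases the mean motion points back into the interior.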

Page 30

h-MaxWeight Policy

Given: Convex monotone function h

Boundary conditions:

$$\frac{\partial}{\partial x_j} h(x) = 0 \quad \text{when } x_j = 0.$$

[Figure: tandem-queue drift vector field]

Page 31

h-MaxWeight Policy

Given: Convex monotone function h

Boundary conditions:

$$\frac{\partial}{\partial x_j} h(x) = 0 \quad \text{when } x_j = 0.$$

Economic interpretation: Marginal disutility vanishes for vanishingly small inventory

[Figure: tandem-queue drift vector field]

Page 32

h-MaxWeight Policy

Given: Convex monotone function h

Boundary conditions:

$$\frac{\partial}{\partial x_j} h(x) = 0 \quad \text{when } x_j = 0.$$

Economic interpretation: Marginal disutility vanishes for vanishingly small inventory

Condition rarely holds, but we can fix that ...

[Figure: tandem-queue drift vector field]

Page 33

h-MaxWeight Policy

Given: Convex monotone function $h_0$ (perhaps violating ∂ condition)

Introduce perturbation: For fixed $\theta \ge 1$ and any $x \in \mathbb{R}^\ell_+$,

$$\tilde{x}_i := x_i + \theta(e^{-x_i/\theta} - 1), \qquad \text{and} \quad \tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_\ell)^T \in \mathbb{R}^\ell_+$$

[Figure: tandem-queue drift vector field]

Page 34

h-MaxWeight Policy

Given: Convex monotone function $h_0$ (perhaps violating ∂ condition)

Introduce perturbation: For fixed $\theta \ge 1$ and any $x \in \mathbb{R}^\ell_+$,

$$\tilde{x}_i := x_i + \theta(e^{-x_i/\theta} - 1), \qquad \text{and} \quad \tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_\ell)^T \in \mathbb{R}^\ell_+$$

Perturbed function:

$$h(x) = h_0(\tilde{x}), \quad x \in \mathbb{R}^\ell_+$$

Convex, monotone, and boundary conditions are satisfied

[Figure: tandem-queue drift vector field]
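A short numerical sketch shows why the perturbation enforces the boundary condition. By the chain rule, $\partial h / \partial x_i = \partial_i h_0(\tilde{x}) \cdot (1 - e^{-x_i/\theta})$, and the second factor is zero at $x_i = 0$. The function names and the linear choice of $h_0$ in the example are assumptions made for illustration.

```python
import math

def perturb(x, theta):
    """Map x to x_tilde with x_tilde_i = x_i + theta * (exp(-x_i / theta) - 1)."""
    return [xi + theta * (math.exp(-xi / theta) - 1.0) for xi in x]

def grad_h(grad_h0, x, theta):
    """Gradient of h(x) = h0(x_tilde) via the chain rule.

    grad_h0 is a callable returning the gradient of h0 at a given point.
    """
    g0 = grad_h0(perturb(x, theta))
    return [g * (1.0 - math.exp(-xi / theta)) for g, xi in zip(g0, x)]

# Hypothetical example: linear h0(x) = 1 * x1 + 3 * x2, constant gradient (1, 3)
linear_gradient = lambda _x: [1.0, 3.0]
g = grad_h(linear_gradient, [0.0, 5.0], theta=2.0)   # first entry vanishes
```

For large $x_i$ the map behaves like $x_i - \theta$, so the perturbation changes $h_0$ mainly near the boundary.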

Page 35

h-MaxWeight Policy: Perturbed linear function

[Figure: tandem queues with arrival rate α1 and service rates µ1, µ2]

$h_0$ linear: never satisfies ∂ condition

h-myopic and h-MaxWeight policies stabilizing provided $\theta \ge 1$ is sufficiently large

Page 36

h-MaxWeight Policy: Perturbed linear function

[Figure: tandem queues with arrival rate α1 and service rates µ1, µ2]

$h_0$ linear: never satisfies ∂ condition

h-myopic and h-MaxWeight policies stabilizing provided $\theta \ge 1$ is sufficiently large

Example: Tandem queues

[Figure: level sets of h, with the region labeled "h-MaxWeight policy: serve buffer 1"]

q2 = θ log 1 −− c1

c2( )

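[Figure: drift vector field in the (x1, x2) plane. Drift values shown: $\Delta(x) = (-\mu_1 + \alpha_1,\ \mu_1)^T$, $\Delta(x) = (-\mu_1 + \alpha_1,\ -\mu_2 + \mu_1)^T$, $\Delta(x) = (\alpha_1,\ -\mu_2)^T$; the threshold q2 is marked on the x2 axis]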
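The tandem example can be explored with a small decision rule. This sketch rests on several assumptions that are not in the slides: $h_0(x) = c_1 x_1 + c_2 x_2$ with $0 < c_1 < c_2$, the perturbation $\tilde{x}_i = x_i + \theta(e^{-x_i/\theta} - 1)$, ties broken toward serving, and an empty buffer is never served. Under these assumptions, minimizing $\langle \nabla h(x), Bu + \alpha \rangle$ over $u_1$ means serving buffer 1 exactly when $\partial_1 h(x) \ge \partial_2 h(x)$. The helper `x2_threshold` is the large-$x_1$ limit of that rule as derived here, not a formula quoted from the lecture.

```python
import math

def serve_buffer_1(x, c, theta):
    """h-MaxWeight decision at Station 1 for the perturbed linear function."""
    x1, x2 = x
    c1, c2 = c
    d1 = c1 * (1.0 - math.exp(-x1 / theta))   # partial derivative of h in x1
    d2 = c2 * (1.0 - math.exp(-x2 / theta))   # partial derivative of h in x2
    return x1 >= 1 and d1 >= d2

def x2_threshold(c, theta):
    """Level of x2 below which buffer 1 is served as x1 grows large.

    Solves c1 = c2 * (1 - exp(-x2 / theta)) for x2; requires 0 < c1 < c2.
    """
    c1, c2 = c
    return theta * math.log(c2 / (c2 - c1))
```

With `c = (1.0, 2.0)` and `theta = 10.0` the threshold is $10 \log 2 \approx 6.93$: for large $x_1$ the rule serves buffer 1 at $x_2 = 6$ but not at $x_2 = 8$. The threshold scales linearly with $\theta$, so a larger $\theta$ keeps more inventory in buffer 2.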

Page 37

h-MaxWeight Policy: Perturbed value function

[Figure: tandem queues with arrival rate α1 and service rates µ1, µ2]

$h_0$: minimal fluid value function,

$$J(x) = \inf \int_0^\infty c(q(t; x))\, dt$$

h-myopic and h-MaxWeight policies stabilizing provided $\theta \ge 1$ is sufficiently large

Page 38

h-MaxWeight Policy: Perturbed value function

[Figure: tandem queues with arrival rate α1 and service rates µ1, µ2]

$h_0$: minimal fluid value function,

$$J(x) = \inf \int_0^\infty c(q(t; x))\, dt$$

h-myopic and h-MaxWeight policies stabilizing provided $\theta \ge 1$ is sufficiently large

Resulting policy very similar to average-cost optimal policy:

[Figure: policy regions in the (x1, x2) plane, with the region labeled "Optimal policy: serve buffer 1"; x1 axis from 10 to 100, x2 axis from 10 to 30]

Page 39

III Heavy Traffic

Control Techniques for Complex Networks (draft copy, April 22, 2007), selected contents:

II Workload
5 Workload & Scheduling
5.1 Single server queue
5.2 Workload for the CRW scheduling model
5.3 Relaxations for the fluid model

III Stability & Performance
9 Optimization
10 ODE methods
10.5 Safety stocks and trajectory tracking
10.6 Fluid-scale asymptotic optimality

Page 40

Relaxations & Asymptotic Optimality

Single example for sake of illustration: Model of Dai & Wang

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 41

Relaxations & Asymptotic Optimality

Single example for sake of illustration:

Assume: Homogeneous model. Service rate at Station i is $\mu_i$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 42

Relaxations & Asymptotic Optimality

Homogeneous CRW model:

$$\begin{aligned}
Q_1(k+1) - Q_1(k) &= -S_1(k+1)U_1(k) + A_1(k+1) \\
Q_2(k+1) - Q_2(k) &= -S_1(k+1)U_2(k) + S_1(k+1)U_1(k) \\
Q_3(k+1) - Q_3(k) &= -S_2(k+1)U_3(k) + S_1(k+1)U_2(k) \\
Q_4(k+1) - Q_4(k) &= -S_2(k+1)U_4(k) + S_2(k+1)U_3(k) \\
Q_5(k+1) - Q_5(k) &= -S_1(k+1)U_5(k) + S_2(k+1)U_4(k)
\end{aligned}$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 43

Relaxations & Asymptotic Optimality

Homogeneous CRW model:

$$\begin{aligned}
Q_1(k+1) - Q_1(k) &= -S_1(k+1)U_1(k) + A_1(k+1) \\
Q_2(k+1) - Q_2(k) &= -S_1(k+1)U_2(k) + S_1(k+1)U_1(k) \\
Q_3(k+1) - Q_3(k) &= -S_2(k+1)U_3(k) + S_1(k+1)U_2(k) \\
Q_4(k+1) - Q_4(k) &= -S_2(k+1)U_4(k) + S_2(k+1)U_3(k) \\
Q_5(k+1) - Q_5(k) &= -S_1(k+1)U_5(k) + S_2(k+1)U_4(k)
\end{aligned}$$

Constituency constraints:

$$U_1(k) + U_2(k) + U_5(k) \le 1, \qquad U_3(k) + U_4(k) \le 1, \qquad U_i(k) \in \{0, 1\}$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 44

Relaxations & Asymptotic Optimality

Workload (units of inventory):

$$Y_1(k) = 3Q_1(k) + 2Q_2(k) + Q_3(k) + Q_4(k) + Q_5(k)$$

$$Y_2(k) = 2(Q_1(k) + Q_2(k) + Q_3(k)) + Q_4(k)$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 45

Relaxations & Asymptotic Optimality

Workload (units of inventory):

$$Y_1(k) = 3Q_1(k) + 2Q_2(k) + Q_3(k) + Q_4(k) + Q_5(k)$$

$$Y_2(k) = 2(Q_1(k) + Q_2(k) + Q_3(k)) + Q_4(k)$$

Idleness processes:

$$\iota_1(k) = 1 - (U_1(k) + U_2(k) + U_5(k))$$

$$\iota_2(k) = 1 - (U_3(k) + U_4(k))$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 46

Relaxations & Asymptotic Optimality

Workload (units of inventory):

$$Y_1(k) = 3Q_1(k) + 2Q_2(k) + Q_3(k) + Q_4(k) + Q_5(k)$$

$$Y_2(k) = 2(Q_1(k) + Q_2(k) + Q_3(k)) + Q_4(k)$$

Idleness processes:

$$\iota_1(k) = 1 - (U_1(k) + U_2(k) + U_5(k))$$

$$\iota_2(k) = 1 - (U_3(k) + U_4(k))$$

Dynamics:

$$Y_1(k+1) - Y_1(k) = -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\iota_1(k)$$

$$Y_2(k+1) - Y_2(k) = -S_2(k+1) + 2A_1(k+1) + S_2(k+1)\iota_2(k)$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]
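The workload dynamics follow from the buffer-level recursion by summing with the weights in $Y_1$ and $Y_2$, and a randomized check makes this concrete. The sketch below assumes that buffers 1, 2 and 5 are served at Station 1 (indicator $S_1$) and buffers 3 and 4 at Station 2 (indicator $S_2$), so that the output of buffer 2 enters buffer 3 with indicator $S_1$. The function names are illustrative.

```python
import random

def crw_increment(u, s1, s2, a1):
    """Q(k+1) - Q(k) for the five-buffer homogeneous model."""
    u1, u2, u3, u4, u5 = u
    return [
        -s1 * u1 + a1,
        -s1 * u2 + s1 * u1,
        -s2 * u3 + s1 * u2,   # buffer 2 is served at Station 1
        -s2 * u4 + s2 * u3,
        -s1 * u5 + s2 * u4,
    ]

def workload_increment(dq):
    """Increments of Y1 = 3Q1 + 2Q2 + Q3 + Q4 + Q5 and Y2 = 2(Q1+Q2+Q3) + Q4."""
    dy1 = 3 * dq[0] + 2 * dq[1] + dq[2] + dq[3] + dq[4]
    dy2 = 2 * (dq[0] + dq[1] + dq[2]) + dq[3]
    return dy1, dy2

def identity_holds(trials=2000, seed=0):
    """Compare both sides of the workload dynamics on random feasible inputs."""
    rng = random.Random(seed)
    station1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]   # (u1, u2, u5)
    station2 = [(0, 0), (1, 0), (0, 1)]                       # (u3, u4)
    for _ in range(trials):
        u1, u2, u5 = rng.choice(station1)
        u3, u4 = rng.choice(station2)
        s1, s2, a1 = (rng.randint(0, 1) for _ in range(3))
        iota1 = 1 - (u1 + u2 + u5)
        iota2 = 1 - (u3 + u4)
        expected = (-s1 + 3 * a1 + s1 * iota1, -s2 + 2 * a1 + s2 * iota2)
        dq = crw_increment((u1, u2, u3, u4, u5), s1, s2, a1)
        if workload_increment(dq) != expected:
            return False
    return True
```

The check passes because each job at buffer 1 needs three services at Station 1 and two at Station 2, which is what the weights 3, 2, 1, 1, 1 and 2, 2, 2, 1, 0 count.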

Page 47

Relaxations & Asymptotic Optimality

Workload Relaxation of N. Laws

- Laws 90
- Kelly & Laws 93

$$Y_1(k+1) - Y_1(k) = -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\iota_1(k)$$

with constraints on idleness process relaxed,

$$\iota_1(k) \in \{0, 1, 2, \dots\}$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Page 48

Relaxations & Asymptotic Optimality

Workload Relaxation of N. Laws

- Laws 90
- Kelly & Laws 93
- Harrison, Kushner, Reiman, Williams, Dai, Bramson, ...

$$Y_1(k+1) - Y_1(k) = -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\iota_1(k)$$

with constraints on idleness process relaxed,

$$\iota_1(k) \in \{0, 1, 2, \dots\}$$

Optimization based on the effective cost,

$$\bar{c}(y) = \min\ c(x) \quad \text{s.t.} \quad 3x_1 + 2x_2 + x_3 + x_4 + x_5 = y, \quad x \in \mathbb{Z}^5_+ \ (\text{+ buffer constraints})$$

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]
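For small workload values the effective cost can be computed by brute force. The sketch below assumes a linear cost $c(x) = \sum_i c_i x_i$, ignores the additional buffer constraints mentioned on the slide, and uses the illustrative name `effective_cost`; it is meant only to show what the minimization looks like.

```python
from itertools import product

XI = (3, 2, 1, 1, 1)   # workload weights from Y1 = 3Q1 + 2Q2 + Q3 + Q4 + Q5

def effective_cost(y, c):
    """Return (min cost, minimizer) over x in Z_+^5 with <XI, x> = y.

    Returns None if no lattice point satisfies the constraint.
    """
    best = None
    ranges = [range(y // w + 1) for w in XI]
    for x in product(*ranges):
        if sum(w * xi for w, xi in zip(XI, x)) != y:
            continue
        value = sum(ci * xi for ci, xi in zip(c, x))
        if best is None or value < best[0]:
            best = (value, x)
    return best

# Hypothetical cost: one unit per job in every buffer
value, x_star = effective_cost(6, (1, 1, 1, 1, 1))
```

With unit costs and $y = 6$ the minimizer places all inventory in buffer 1, where each job carries the most workload per unit of cost.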

Page 49

Asymptotic Optimality

Optimal policy is non-idling for one-dimensional relaxation

Dynamic programming equation solved via Pollaczek-Khintchine formula

Page 50

Asymptotic Optimality: Heavy traffic assumptions

Load is unity for nominal model

Single bottleneck to define relaxation

Cost is linear, and effective cost has a unique optimizer

Model sequence:

$$A^{(n)}(k) = \begin{cases} A(k) & \text{with probability } 1 - n^{-1} \\ 0 & \text{with probability } n^{-1} \end{cases}$$

Load less than unity for each n
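The model sequence thins the nominal arrival stream, which scales the mean arrival rate, and hence the load, by $1 - n^{-1}$. The sketch below illustrates this with a scalar arrival; the independence of the thinning coin from $A(k)$ and the function names are assumptions made for the example.

```python
import random

def thinned_arrival(a, n, rng):
    """Return A^(n)(k): the nominal arrival a with probability 1 - 1/n, else 0."""
    return a if rng.random() < 1.0 - 1.0 / n else 0

def load(n, nominal_load=1.0):
    """Load of the n-th model when the thinning is independent of A(k)."""
    return (1.0 - 1.0 / n) * nominal_load

rng = random.Random(0)
samples = [thinned_arrival(1, 10, rng) for _ in range(20000)]
empirical_rate = sum(samples) / len(samples)   # close to load(10) = 0.9
```

As $n$ grows, `load(n)` approaches the nominal value of one from below, which is the heavy-traffic regime considered on this slide.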

Page 51

Asymptotic Optimality

h-MaxWeight policy asymptotically optimal, with logarithmic regret

$$h_0(x) = h^*(y) + \frac{b}{2}\bigl(c(x) - \bar{c}(y)\bigr)^2$$

Page 52

Asymptotic Optimality

h-MaxWeight policy asymptotically optimal, with logarithmic regret

$$h_0(x) = h^*(y) + \frac{b}{2}\bigl(c(x) - \bar{c}(y)\bigr)^2$$

$\eta$: average cost under h-MW policy

$\eta^* = O(n)$: optimal average cost for relaxation

Page 53: Stability and Asymptotic Optimality - University of Florida · Stability and Asymptotic Optimality ... 408. Controlled Random-Walk Model ... Today’s lecture focuses on third and

Asymptotic Optimality

h-MaxWeight policy asymptotically optimal, with logarithmic regret

h0(x) = h∗(y) + (b/2) (c(x) − c(y))²

η: average cost under the h-MW policy

η∗ = O(n): optimal average cost for the relaxation

η∗ ≤ η ≤ η∗ + O(log(n))
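A small sketch of how the function h0 could be evaluated for a linear cost. Everything beyond the formula itself is an assumption for illustration: a single workload vector `xi` with y = ξ·x, the effective cost taken as the minimal cost over nonnegative states with workload y, and a placeholder quadratic `h_star` standing in for the relaxation's value function.

```python
def workload(x, xi):
    """Workload y = xi . x for the single-bottleneck relaxation."""
    return sum(xi_i * x_i for xi_i, x_i in zip(xi, x))


def linear_cost(x, c):
    """Linear cost c(x) = c . x."""
    return sum(c_i * x_i for c_i, x_i in zip(c, x))


def effective_cost(y, c, xi):
    """min{ c.x : xi.x = y, x >= 0 } for c, xi > 0 and y >= 0.

    The minimum places all workload in the buffer with the
    smallest ratio c_i / xi_i.
    """
    return y * min(c_i / xi_i for c_i, xi_i in zip(c, xi))


def h0(x, c, xi, b, h_star):
    """h0(x) = h_star(y) + (b/2) * (c(x) - effective_cost(y))**2."""
    y = workload(x, xi)
    gap = linear_cost(x, c) - effective_cost(y, c, xi)
    return h_star(y) + 0.5 * b * gap ** 2


if __name__ == "__main__":
    c, xi = [1.0, 2.0], [1.0, 1.0]
    h_star = lambda y: 0.5 * y ** 2  # placeholder, not from the slides
    print(h0([3.0, 0.0], c, xi, b=1.0, h_star=h_star))  # 4.5
    print(h0([0.0, 3.0], c, xi, b=1.0, h_star=h_star))  # 9.0
```

The quadratic penalty vanishes when the state attains the effective cost for its workload, so in this sketch h0 agrees with `h_star` there and is strictly larger elsewhere.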


Conclusions

Control Techniques for Complex Networks Draft copy April 22 2007

III Stability & Performance 318

10 ODE methods 436

10.5 Safety stocks and trajectory tracking 462
10.6 Fluid-scale asymptotic optimality 467

11 Simulation & Learning 485

11.4 Control variates and shadow functions 503
11.5 Estimating a value function 516
11.6 Notes 532
11.7 Exercises 534

A Markov Models 538

A.1 Every process is (almost) Markov 538
A.2 Generators and value functions 540
A.3 Equilibrium equations 543
A.4 Criteria for stability 552
A.5 Ergodic theorems and coupling 560
A.6 Converse theorems 568

List of Figures 572


Conclusions

The h-MaxWeight policy is stabilizing under very general conditions.

General approach to policy translation. The resulting policy mirrors the optimal policy in examples.

Asymptotically optimal, with logarithmic regret, for models with a single bottleneck.

Future work

Models with multiple bottlenecks?

On-line learning for policy improvement?


Control Techniques for Complex Networks Draft copy April 22 2007

I Modeling & Control 34

4 Scheduling 99

4.1 Controlled random-walk model 101
4.2 Fluid model 109
4.3 Control techniques for the fluid model 116
4.8 MaxWeight and MinDrift 145
4.9 Perturbed value function 148
4.10 Notes 152
4.11 Exercises 154

III Stability & Performance 318

8 Foster-Lyapunov Techniques 319

8.1 Lyapunov functions 324
8.4 MaxWeight 342
8.5 MaxWeight and the average-cost optimality equation 348

9 Optimization 374

9.4 Optimality equations 392
9.6 Optimization in networks 408


References

N. Laws. Dynamic routing in queueing networks. PhD thesis, Cambridge University, Cambridge, UK, 1990.

L. Tassiulas. Adaptive back-pressure congestion control based on local information. 40(2):236–250, 1995.

L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. 1992.

S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. 2003.

S. P. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. Submitted for publication, 2006.

S. P. Meyn. Control techniques for complex networks. To appear, Cambridge University Press, 2007.