- A. Rybko, 2006

Stability and Asymptotic Optimality of h-MaxWeight Policies

Sean Meyn

Department of Electrical and Computer Engineering University of Illinois & the Coordinated Science Laboratory

NSF support: ECS 05-23620 and DARPA ITMANET

Control Techniques for Complex Networks Draft copy April 22 2007

I Modeling & Control

4 Scheduling
4.1 Controlled random-walk model
4.2 Fluid model
4.3 Control techniques for the fluid model
4.8 MaxWeight and MinDrift
4.9 Perturbed value function
4.10 Notes
4.11 Exercises

II Workload

5 Workload & Scheduling
5.1 Single server queue
5.2 Workload for the CRW scheduling model
5.3 Relaxations for the fluid model

III Stability & Performance

8 Foster-Lyapunov Techniques
8.1 Lyapunov functions
8.4 MaxWeight
8.5 MaxWeight and the average-cost optimality equation

9 Optimization
9.4 Optimality equations
9.6 Optimization in networks

10 ODE methods
10.5 Safety stocks and trajectory tracking
10.6 Fluid-scale asymptotic optimality

11 Simulation & Learning
11.4 Control variates and shadow functions
11.5 Estimating a value function
11.6 Notes
11.7 Exercises

A Markov Models
A.1 Every process is (almost) Markov
A.2 Generators and value functions
A.3 Equilibrium equations
A.4 Criteria for stability
A.5 Ergodic theorems and coupling
A.6 Converse theorems

List of Figures


Outline

I Models & Background
II h-MaxWeight Policies
III Heavy Traffic
Conclusions

I Models & Background

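Controlled Random-Walk Model

- Lippman 1975
- Henderson & M. 1997

[Figure: two-station network (Station 1, Station 2) with arrival rates α1, α3 and service rates µ1, µ2, µ3, µ4]

$$Q(k+1) = Q(k) + B(k+1)U(k) + A(k+1), \qquad Q(0) = x$$

Statistics & topology:

$$A(k) = \begin{bmatrix} A_1(k) \\ 0 \\ A_3(k) \\ 0 \end{bmatrix}, \qquad
B(k) = \begin{bmatrix} -S_1(k) & 0 & 0 & 0 \\ S_1(k) & -S_2(k) & 0 & 0 \\ 0 & 0 & -S_3(k) & 0 \\ 0 & 0 & S_3(k) & -S_4(k) \end{bmatrix}$$

Constituency constraints:

$$C = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \qquad C\,U(k) \le \mathbf{1}, \qquad U(k) \ge 0$$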

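Fluid Model & Workload

- Newell 1982
- Vandergraft 1983
- Perkins & Kumar 1989
- Chen & Mandelbaum 1991
- Cruz 1991

Fluid model captures mean-flow:

$$q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x$$

$$B = E[B(k)] = \begin{bmatrix} -\mu_1 & 0 & 0 & 0 \\ \mu_1 & -\mu_2 & 0 & 0 \\ 0 & 0 & -\mu_3 & 0 \\ 0 & 0 & \mu_3 & -\mu_4 \end{bmatrix}, \qquad
\alpha = E[A(k)] = \begin{bmatrix} \alpha_1 \\ 0 \\ \alpha_3 \\ 0 \end{bmatrix}$$

Workload and load parameters:

$$\xi^1 = \begin{bmatrix} m_1 \\ 0 \\ m_4 \\ m_4 \end{bmatrix}, \qquad
\xi^2 = \begin{bmatrix} m_2 \\ m_2 \\ m_3 \\ 0 \end{bmatrix}, \qquad
\rho_1 = m_1\alpha_1 + m_4\alpha_3, \qquad
\rho_2 = m_2\alpha_1 + m_3\alpha_3$$

with $m_i = \mu_i^{-1}$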
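The workload vectors and load parameters for the two-station model can be checked numerically. The sketch below assumes hypothetical rate values (they are not taken from the slides) and uses the identity that each workload vector ξ^s satisfies (ξ^s)ᵀB = −C_s, where C_s is the s-th row of the constituency matrix. That identity can be verified by hand for this example.

```python
import numpy as np

# Hypothetical rates, chosen only for illustration (not from the slides).
mu = np.array([1.0, 0.5, 1.0, 0.5])      # service rates mu_1..mu_4
alpha = np.array([0.3, 0.0, 0.3, 0.0])   # arrival rates (alpha_1, 0, alpha_3, 0)

# B = E[B(k)]: buffer 1 feeds buffer 2, buffer 3 feeds buffer 4.
B = np.array([
    [-mu[0],  0.0,    0.0,    0.0],
    [ mu[0], -mu[1],  0.0,    0.0],
    [ 0.0,    0.0,   -mu[2],  0.0],
    [ 0.0,    0.0,    mu[2], -mu[3]],
])

# Constituency matrix: Station 1 serves buffers 1 and 4, Station 2 serves 2 and 3.
C = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])


def workload_vectors(B, C):
    """Return the matrix whose rows are xi^1, xi^2, solving Xi B = -C."""
    return -C @ np.linalg.inv(B)


Xi = workload_vectors(B, C)   # rows: (m1, 0, m4, m4) and (m2, m2, m3, 0)
rho = Xi @ alpha              # load at each station

print(Xi)
print(rho)
```

With these illustrative rates both stations have load below one, so the fluid model can be drained.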

Value Functions

- M 96, 01, ... following
- Dai 95
- Dai & M 95

Fluid model and value function:

$$q(t) = x + Bz(t) + \alpha t, \qquad J(x) = \int_0^\infty c(q(t;x))\,dt$$

CRW model and relative value function:

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1), \qquad h(x) = \int_0^\infty E[c(Q(t;x)) - \eta]\,dt$$

where $\eta = \int c(x)\,\pi(dx)$ is the average cost.

Large-state solidarity:

$$\lim_{\|x\|\to\infty} \frac{J(x)}{h(x)} = 1$$

Holds for wide class of stabilizing policies, including average-cost optimal policy

Myopic Policy: Fluid Model

$$q(t) = x + Bz(t) + \alpha t, \qquad \frac{d^+}{dt} q(t) = B\zeta(t) + \alpha$$

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, subset of $\mathbb{R}^\ell_+$; $U(x)$, feasible values of $\zeta(t)$ when $x = q(t) \in X$

$$\arg\min_{u \in U(x)} \frac{d^+}{dt} c(q(t)) = \arg\min_{u \in U(x)} \langle \nabla c(x),\, Bu + \alpha \rangle$$
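The equality between the two minimizations is the chain rule applied along the fluid trajectory. A brief sketch, assuming $c$ is differentiable at $x = q(t)$ (an assumption made here only to keep the step short):

```latex
\frac{d^+}{dt}\, c(q(t))
  = \Bigl\langle \nabla c(q(t)),\ \frac{d^+}{dt} q(t) \Bigr\rangle
  = \langle \nabla c(x),\, B\zeta(t) + \alpha \rangle ,
  \qquad x = q(t).
```

Since $\alpha$ does not depend on the allocation, the minimizer over $u = \zeta(t) \in U(x)$ is the same as the minimizer of $\langle \nabla c(x), Bu \rangle$.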

Myopic Policy: CRW Model

$$Q(k+1) - Q(k) = B(k+1)U(k) + A(k+1)$$

Cost function: $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $X$, subset of $\mathbb{R}^\ell_+$ (lattice constraints, etc.); $U(x)$, feasible values of $U(k)$ when $x = Q(k) \in X$

Myopic policy:

$$\arg\min_{u \in U(x)} E[c(Q(k+1)) \mid Q(k) = x,\ U(k) = u]$$

Motivation: Average cost optimal policy is h-myopic, where $h^* : \mathbb{R}^\ell_+ \to \mathbb{R}_+$ is the relative value function,

$$h^*(x) = \inf_U \int_0^\infty E[c(Q(t;x)) - \eta^*]\,dt$$

Dynamic programming equation:

$$\min_{u \in U(x)} E[h^*(Q(k+1)) \mid Q(k) = x,\ U(k) = u] = h^*(x) - c(x) + \eta^*$$

Fluid Model & Myopia

- Chen & Yao 93
- M’ 01

$$q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x$$

Myopic policy for fluid model is stabilizing, with cost $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$:

$$q(t) = 0, \qquad t \ge T_0$$

Myopia & Instability

- Kumar & Seidman 89
- Rybko & Stolyar 93

Myopic policy may or may not be stabilizing

Example: Two station model above with linear cost, c(x) = x1 + x2 + x3 + x4

Myopic policy for CRW model: Priority to exit buffers

Periodic starvation creates instability

[Figure: two panels, "Poisson Arrivals and Service" and "Fluid Arrivals and Service"; horizontal axis 0 to 1000, vertical axis 0 to 800]
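The claim that the myopic policy gives priority to exit buffers under linear cost can be checked by enumeration. The sketch below is illustrative only: the rates are hypothetical, the feasible set is taken to be binary allocations with at most one buffer served per station and no service at an empty buffer, and ties are broken in favor of serving more buffers. None of these choices are specified on the slides.

```python
from itertools import product

import numpy as np

# Hypothetical rates for illustration only (not taken from the slides).
mu = np.array([1.0, 0.5, 1.0, 0.5])
alpha = np.array([0.3, 0.0, 0.3, 0.0])

B = np.array([
    [-mu[0],  0.0,    0.0,    0.0],
    [ mu[0], -mu[1],  0.0,    0.0],
    [ 0.0,    0.0,   -mu[2],  0.0],
    [ 0.0,    0.0,    mu[2], -mu[3]],
])
C = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])


def feasible_allocations(x):
    """Yield all u in {0,1}^4 with C u <= 1 and no service at an empty buffer."""
    for u in product((0, 1), repeat=4):
        u = np.array(u)
        if np.all(C @ u <= 1) and np.all(u[x == 0] == 0):
            yield u


def myopic_linear(x):
    """Minimize E[c(Q(k+1)) | Q(k)=x, U(k)=u] for c(x) = x1 + x2 + x3 + x4.

    For linear c this equals c(x) + 1'(B u + alpha), so only the drift
    term matters. Ties are broken in favor of serving more buffers
    (a non-idling tie-break, which is an assumption of this sketch).
    """
    x = np.asarray(x)
    ones = np.ones(4)
    return min(
        feasible_allocations(x),
        key=lambda u: (round(float(ones @ (B @ u + alpha)), 12), -int(u.sum())),
    )


print(myopic_linear([5, 3, 5, 3]))   # both exit buffers (2 and 4) are served
print(myopic_linear([5, 0, 5, 3]))   # buffer 4 still blocks buffer 1 at Station 1
```

Only the exit buffers have a negative coefficient in 1ᵀBu, which is why they are always preferred. While buffer 4 is nonempty, Station 1 never serves buffer 1, so buffer 2 receives no new work.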

Myopia & Instability: Quadratic Cost

- Tassiulas & Ephremides 92

Example: Two station model above with

$$c(x) = \tfrac{1}{2}\,[x_1^2 + x_2^2 + x_3^2 + x_4^2]$$

Myopic policy: Approximated by linear switching curves

Myopic policy stabilizing for diagonal quadratic

Condition (V3) holds with Lyapunov function $V = c$: For positive constants $\varepsilon$ and $\eta$,

$$PV(x) := E[V(Q(k+1)) \mid Q(k) = x] \le V(x) - \varepsilon\|x\| + \eta$$

MaxWeight Policy

Tassiulas considers myopic policy for fluid model subject to lattice constraints,

$$\arg\min_{u \in U(x)} \langle \nabla c(x),\, Bu + \alpha \rangle$$

where $c(x) = \tfrac{1}{2} x^T D x$, $D = \mathrm{diag}(d_1, \dots, d_\ell)$

Obtains negative drift: For non-zero x,

$$\langle \nabla c(x),\, Bu + \alpha \rangle \le -\varepsilon\|x\|$$

Implies (V3) for MaxWeight policy

Implies (V3) for myopic policy since myopic has minimum drift
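A minimal sketch of the MaxWeight rule for the two-station model, assuming hypothetical rates, D equal to the identity by default, binary allocations with no service at empty buffers, and a non-idling tie-break. These are illustration choices, not details from the slides.

```python
from itertools import product

import numpy as np

# Hypothetical rates for illustration only (not taken from the slides).
mu = np.array([1.0, 0.5, 1.0, 0.5])
alpha = np.array([0.3, 0.0, 0.3, 0.0])

B = np.array([
    [-mu[0],  0.0,    0.0,    0.0],
    [ mu[0], -mu[1],  0.0,    0.0],
    [ 0.0,    0.0,   -mu[2],  0.0],
    [ 0.0,    0.0,    mu[2], -mu[3]],
])
C = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])


def feasible_allocations(x):
    """Yield all u in {0,1}^4 with C u <= 1 and no service at an empty buffer."""
    for u in product((0, 1), repeat=4):
        u = np.array(u)
        if np.all(C @ u <= 1) and np.all(u[x == 0] == 0):
            yield u


def drift(x, u, D):
    """<grad c(x), B u + alpha> for c(x) = x' D x / 2, so grad c(x) = D x."""
    return float((D @ x) @ (B @ u + alpha))


def maxweight(x, D=None):
    """Return the allocation minimizing the drift of the quadratic cost."""
    x = np.asarray(x, dtype=float)
    D = np.eye(4) if D is None else D
    return min(
        feasible_allocations(x),
        key=lambda u: (round(drift(x, u, D), 12), -int(u.sum())),
    )


x = np.array([10.0, 1.0, 1.0, 10.0])
u = maxweight(x)
print(u, drift(x, u, np.eye(4)))
```

At this state the rule serves buffer 1 rather than the exit buffer 4 at Station 1, because the weight µ1(x1 − x2) exceeds µ4·x4. This differs from the exit-priority behavior of the myopic policy under linear cost.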

Questions Since 1996

$$\lim_{\|x\|\to\infty} \frac{J(x)}{h(x)} = 1$$

Value functions for fluid and stochastic models: Quadratic growth for linear cost with similar asymptotes; Policies are similar for large state-values

- What is the gap between policies?
- What is the gap between value functions?
- How to translate policy for fluid model to cope with volatility?
- Connections with heavy traffic theory?

Many positive answers in new monograph, as well as new applications for value function approximation

Today’s lecture focuses on third and fourth topics

II h-MaxWeight Policies

Why Does MW Work?

Geometric explanation

Define drift vector field (for given policy):

$$\Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x] = Bu + \alpha$$

MaxWeight policy, with c diagonal quadratic:

$$\arg\min_{u \in U(x)} \langle \nabla c(x),\, \Delta(x) \rangle$$

Example: Queues in tandem (arrival rate α1, service rates µ1, µ2)

[Figure: drift vectors in the (x1, x2) plane, with level sets of c (diag quadratic) and the region labeled "MaxWeight policy: serve buffer 1". The three drift vectors shown are $\Delta(x) = (-\mu_1+\alpha_1,\ \mu_1)^T$, $\Delta(x) = (-\mu_1+\alpha_1,\ -\mu_2+\mu_1)^T$, and $\Delta(x) = (\alpha_1,\ -\mu_2)^T$]

Key observation: Boundaries of the state space are repelling

Consequence of vanishing partial derivatives on boundary

h-MaxWeight Policy

Given: Convex monotone function h

Boundary conditions:

$$\frac{\partial}{\partial x_j} h(x) = 0 \quad \text{when } x_j = 0.$$

Economic interpretation: Marginal disutility vanishes for vanishingly small inventory

Condition rarely holds, but we can fix that ...

Given: Convex monotone function $h_0$ (perhaps violating ∂ condition)

Introduce perturbation: For fixed $\theta \ge 1$ and any $x \in \mathbb{R}^\ell_+$,

$$\tilde{x}_i := x_i + \theta\,(e^{-x_i/\theta} - 1), \qquad \tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_\ell)^T \in \mathbb{R}^\ell_+$$

Perturbed function:

$$h(x) = h_0(\tilde{x}), \qquad x \in \mathbb{R}^\ell_+$$

Convex, monotone, and boundary conditions are satisfied
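The boundary condition for the perturbed function follows from the chain rule: the derivative of the perturbed coordinate with respect to x_i is 1 − exp(−x_i/θ), which vanishes at x_i = 0. A small numerical sketch, with a hypothetical linear h0 and illustrative weights (the function names are introduced here for illustration):

```python
import numpy as np


def perturb(x, theta):
    """Componentwise perturbation x_i + theta * (exp(-x_i / theta) - 1)."""
    x = np.asarray(x, dtype=float)
    return x + theta * (np.exp(-x / theta) - 1.0)


def grad_h(x, theta, grad_h0):
    """Gradient of h(x) = h0(perturb(x)) by the chain rule.

    The factor 1 - exp(-x_i / theta) is zero at x_i = 0, which gives
    the boundary condition dh/dx_i = 0 there.
    """
    x = np.asarray(x, dtype=float)
    return grad_h0(perturb(x, theta)) * (1.0 - np.exp(-x / theta))


# Hypothetical linear h0(x) = c'x with illustrative weights.
c = np.array([1.0, 3.0])


def grad_h0(x_tilde):
    """Gradient of the linear h0; constant in its argument."""
    return c


g = grad_h([4.0, 0.0], theta=2.0, grad_h0=grad_h0)
print(g)   # second component is exactly zero because x_2 = 0
```

The unperturbed linear function has gradient c everywhere, so it can never satisfy the boundary condition; the perturbation repairs this without changing the growth of h for large x.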

h-MaxWeight Policy: Perturbed linear function

$h_0$ linear: never satisfies ∂ condition

h-myopic and h-MaxWeight policies stabilizing provided $\theta \ge 1$ is sufficiently large

Example: Tandem queues (arrival rate α1, service rates µ1, µ2)

[Figure: level sets of h in the (x1, x2) plane, with the region labeled "h-MaxWeight policy: serve buffer 1" and the drift vectors $\Delta(x)$ of the tandem queues. The switching curve approaches the level]

$$\bar{q}_2 = \theta \log\Bigl(\frac{1}{1 - c_1/c_2}\Bigr)$$
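The threshold on buffer 2 can be recovered from the definition of the policy. The sketch below assumes the tandem queues have one server per buffer, a linear function $h_0(x) = c_1 x_1 + c_2 x_2$ with $c_1 < c_2$, and $x_1 > 0$. These assumptions are made here for the derivation and are not stated on the slide.

```latex
\begin{aligned}
\langle \nabla h(x),\, Bu + \alpha \rangle
  &= \mu_1 u_1 \Bigl( \tfrac{\partial h}{\partial x_2} - \tfrac{\partial h}{\partial x_1} \Bigr)
     - \mu_2 u_2\, \tfrac{\partial h}{\partial x_2}
     + \alpha_1 \tfrac{\partial h}{\partial x_1},
  \qquad
  \tfrac{\partial h}{\partial x_i} = c_i \bigl( 1 - e^{-x_i/\theta} \bigr), \\[4pt]
\text{serve buffer 1}
  &\iff c_1 \bigl( 1 - e^{-x_1/\theta} \bigr) > c_2 \bigl( 1 - e^{-x_2/\theta} \bigr), \\[4pt]
x_1 \to \infty:\quad
  &\iff e^{-x_2/\theta} > 1 - \frac{c_1}{c_2}
  \iff x_2 < -\theta \log\Bigl( 1 - \frac{c_1}{c_2} \Bigr).
\end{aligned}
```

Under these assumptions the switching curve flattens at a height proportional to $\theta$, so a larger $\theta$ keeps more inventory in buffer 2 before the upstream server is turned off.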

h-MaxWeight Policy: Perturbed value function

$h_0$: minimal fluid value function,

$$J(x) = \inf \int_0^\infty c(q(t;x))\,dt$$

with $\theta \ge 1$

Resulting policy very similar to average-cost optimal policy:

[Figure: tandem queues (arrival rate α1, service rates µ1, µ2); region labeled "Optimal policy: serve buffer 1" in the (x1, x2) plane, x1 axis from 10 to 100, x2 axis from 10 to 30]

III Heavy Traffic


Relaxations & Asymptotic Optimality

Single example for sake of illustration: Model of Dai & Wang

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Assume: Homogeneous model. Service rate at Station i is $\mu_i$

Homogeneous CRW model:

$$\begin{aligned}
Q_1(k+1) - Q_1(k) &= -S_1(k+1)U_1(k) + A_1(k+1) \\
Q_2(k+1) - Q_2(k) &= -S_1(k+1)U_2(k) + S_1(k+1)U_1(k) \\
Q_3(k+1) - Q_3(k) &= -S_2(k+1)U_3(k) + S_1(k+1)U_2(k) \\
Q_4(k+1) - Q_4(k) &= -S_2(k+1)U_4(k) + S_2(k+1)U_3(k) \\
Q_5(k+1) - Q_5(k) &= -S_1(k+1)U_5(k) + S_2(k+1)U_4(k)
\end{aligned}$$

Constituency constraints:

$$U_1(k) + U_2(k) + U_5(k) \le 1, \qquad U_3(k) + U_4(k) \le 1, \qquad U_i(k) \in \{0, 1\}$$

Workload (units of inventory):

$$\begin{aligned}
Y_1(k) &= 3Q_1(k) + 2Q_2(k) + Q_3(k) + Q_4(k) + Q_5(k) \\
Y_2(k) &= 2\,(Q_1(k) + Q_2(k) + Q_3(k)) + Q_4(k)
\end{aligned}$$

Idleness processes:

$$\begin{aligned}
\iota_1(k) &= 1 - (U_1(k) + U_2(k) + U_5(k)) \\
\iota_2(k) &= 1 - (U_3(k) + U_4(k))
\end{aligned}$$

Dynamics:

$$\begin{aligned}
Y_1(k+1) - Y_1(k) &= -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\,\iota_1(k) \\
Y_2(k+1) - Y_2(k) &= -S_2(k+1) + 2A_1(k+1) + S_2(k+1)\,\iota_2(k)
\end{aligned}$$
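The workload dynamics can be checked against the queue dynamics by simulation. The sketch below assumes Bernoulli service and arrival variables with hypothetical probabilities and a randomly chosen feasible allocation. It also takes the arrival term for buffer 3 to be S1(k+1)U2(k), since buffer 2 is served at Station 1; this is the form under which both workload identities hold exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

XI1 = np.array([3, 2, 1, 1, 1])   # Y1 = 3Q1 + 2Q2 + Q3 + Q4 + Q5
XI2 = np.array([2, 2, 2, 1, 0])   # Y2 = 2(Q1 + Q2 + Q3) + Q4


def random_allocation(q):
    """Pick a random feasible U: one nonempty buffer (or none) per station."""
    u = np.zeros(5, dtype=int)
    for buffers in ([0, 1, 4], [2, 3]):          # Station 1, Station 2
        nonempty = [i for i in buffers if q[i] > 0]
        choice = rng.choice(nonempty + [-1])     # -1 means the station idles
        if choice >= 0:
            u[choice] = 1
    return u


def step(q, u, s1, s2, a1):
    """One step of the homogeneous CRW model for the five-buffer example."""
    dq = np.array([
        -s1 * u[0] + a1,
        -s1 * u[1] + s1 * u[0],
        -s2 * u[2] + s1 * u[1],   # buffer 3 is fed by buffer 2 (Station 1)
        -s2 * u[3] + s2 * u[2],
        -s1 * u[4] + s2 * u[3],
    ])
    return q + dq


q = np.array([3, 0, 2, 0, 1])
max_err = 0
for _ in range(1000):
    u = random_allocation(q)
    # Hypothetical Bernoulli probabilities, for illustration only.
    s1, s2, a1 = int(rng.random() < 0.45), int(rng.random() < 0.45), int(rng.random() < 0.1)
    iota1 = 1 - (u[0] + u[1] + u[4])
    iota2 = 1 - (u[2] + u[3])
    q_next = step(q, u, s1, s2, a1)
    err1 = XI1 @ (q_next - q) - (-s1 + 3 * a1 + s1 * iota1)
    err2 = XI2 @ (q_next - q) - (-s2 + 2 * a1 + s2 * iota2)
    max_err = max(max_err, abs(err1), abs(err2))
    q = q_next

print(max_err)
```

The identities are algebraic, so the discrepancy is zero on every step regardless of the policy or the random draws.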


Workload Relaxation of N. Laws

- Laws 90
- Kelly & Laws 93
- Harrison, Kushner, Reiman, Williams, Dai, Bramson, ...

$$Y_1(k+1) - Y_1(k) = -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\,\iota_1(k)$$

with constraints on idleness process relaxed,

$$\iota_1(k) \in \{0, 1, 2, \dots\}$$

Optimization based on the effective cost,

$$\bar{c}(y) = \min\ c(x) \quad \text{s.t.}\quad 3x_1 + 2x_2 + x_3 + x_4 + x_5 = y, \qquad x \in \mathbb{Z}^5_+ \ (\text{+ buffer constraints})$$
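For a linear cost, the effective cost can be computed by a small dynamic program over the workload value. This is a sketch under stated assumptions: the cost is linear with nonnegative weights, the buffer constraints mentioned on the slide are ignored, and the function name `effective_cost` is introduced here for illustration.

```python
def effective_cost(y_max, c, xi=(3, 2, 1, 1, 1)):
    """Return [cbar(0), ..., cbar(y_max)] for a linear cost c'x.

    cbar(y) = min { c'x : xi'x = y, x a nonnegative integer vector }.
    Recursion: an optimal x for y > 0 holds at least one customer in some
    buffer i; removing it lowers the workload by xi[i] and the cost by c[i].
    Buffer (capacity) constraints are ignored in this sketch.
    """
    cbar = [0.0] + [float("inf")] * y_max
    for y in range(1, y_max + 1):
        cbar[y] = min(cbar[y - w] + ci for w, ci in zip(xi, c) if w <= y)
    return cbar


# Illustrative cost: one unit per customer in every buffer.
print(effective_cost(7, c=(1, 1, 1, 1, 1)))
```

With unit costs the cheapest way to hold a given workload is to place inventory in buffer 1, where each customer carries three units of work, so the effective cost grows like y/3 rounded up.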

Asymptotic Optimality

Optimal policy is non-idling for one-dimensional relaxation

Dynamic programming equation solved via Pollaczek-Khintchine formula

Heavy traffic assumptions:

- Load is unity for nominal model
- Single bottleneck to define relaxation
- Cost is linear, and effective cost has a unique optimizer

Model sequence:

$$A^{(n)}(k) = \begin{cases} A(k) & \text{with probability } 1 - n^{-1} \\ 0 & \text{with probability } n^{-1} \end{cases}$$

Load less than unity for each n

h-MaxWeight policy asymptotically optimal, with logarithmic regret, using

$$h_0(x) = h^*(y) + \frac{b}{2}\,\bigl(c(x) - \bar{c}(y)\bigr)^2$$

- $\eta$: average cost under h-MW policy
- $\eta^* = O(n)$: optimal average cost for relaxation

$$\eta^* \le \eta \le \eta^* + O(\log(n))$$

Conclusions

h-MaxWeight policy stabilizing under very general conditions

General approach to policy translation. Resulting policy mirrors optimal policy in examples

Asymptotically optimal, with logarithmic regret for model with single bottleneck

Future work

Models with multiple bottlenecks?

On-line learning for policy improvement?



References

N. Laws. Dynamic routing in queueing networks. PhD thesis, Cambridge University, Cambridge, UK, 1990.

L. Tassiulas. Adaptive back-pressure congestion control based on local information. 40(2):236–250, 1995.

L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. 1992.

S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. 2003.

S. P. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. Submitted for publication, 2006.

S. P. Meyn. Control techniques for complex networks. To appear, Cambridge University Press, 2007.
