- A. Rybko, 2006
Stability and Asymptotic Optimality of h-MaxWeight Policies
Sean Meyn
Department of Electrical and Computer Engineering University of Illinois & the Coordinated Science Laboratory
NSF support: ECS 05-23620 and DARPA ITMANET
Control Techniques for Complex Networks Draft copy April 22 2007
I Modeling & Control
4 Scheduling
4.1 Controlled random-walk model
4.2 Fluid model
4.3 Control techniques for the fluid model
4.8 MaxWeight and MinDrift
4.9 Perturbed value function
4.10 Notes
4.11 Exercises
II Workload
5 Workload & Scheduling
5.1 Single server queue
5.2 Workload for the CRW scheduling model
5.3 Relaxations for the fluid model
III Stability & Performance
8 Foster-Lyapunov Techniques
8.1 Lyapunov functions
8.4 MaxWeight
8.5 MaxWeight and the average-cost optimality equation
9 Optimization
9.4 Optimality equations
9.6 Optimization in networks
10 ODE methods
10.5 Safety stocks and trajectory tracking
10.6 Fluid-scale asymptotic optimality
11 Simulation & Learning
11.4 Control variates and shadow functions
11.5 Estimating a value function
11.6 Notes
11.7 Exercises
A Markov Models
A.1 Every process is (almost) Markov
A.2 Generators and value functions
A.3 Equilibrium equations
A.4 Criteria for stability
A.5 Ergodic theorems and coupling
A.6 Converse theorems
List of Figures
Outline
I Models & Background
II h-MaxWeight Policies
III Heavy Traffic
Conclusions

I Models & Background
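Controlled Random-Walk Model

- Lippman 1975
- Henderson & M. 1997

[Figure: two-station network (Station 1, Station 2) with arrival rates α1, α3 and service rates µ1, µ2, µ3, µ4]

\[ Q(k+1) = Q(k) + B(k+1)U(k) + A(k+1), \qquad Q(0) = x \]

Statistics & topology:
\[ A(k) = \begin{bmatrix} A_1(k) \\ 0 \\ A_3(k) \\ 0 \end{bmatrix}, \qquad
B(k) = \begin{bmatrix} -S_1(k) & 0 & 0 & 0 \\ S_1(k) & -S_2(k) & 0 & 0 \\ 0 & 0 & -S_3(k) & 0 \\ 0 & 0 & S_3(k) & -S_4(k) \end{bmatrix} \]

Constituency constraints:
\[ C\,U(k) \le \mathbf{1}, \qquad U(k) \ge 0, \qquad C = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix} \]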
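The recursion and the constituency constraints on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `is_feasible` and `crw_step`, the 0/1 encoding of the service indicators, and the requirement that allocations are binary are all assumptions made for the example.

```python
# Constituency matrix from the slide: station 1 serves buffers 1 and 4,
# station 2 serves buffers 2 and 3.
C = [[1, 0, 0, 1],
     [0, 1, 1, 0]]


def is_feasible(u):
    """Check U(k) >= 0 and C U(k) <= 1 componentwise."""
    if any(ui < 0 for ui in u):
        return False
    return all(sum(c * ui for c, ui in zip(row, u)) <= 1 for row in C)


def crw_step(q, u, s, a):
    """One step of Q(k+1) = Q(k) + B(k+1) U(k) + A(k+1).

    q: queue lengths for the 4 buffers
    u: allocation U(k), assumed here to be in {0, 1}^4
    s: service indicators (S_1, ..., S_4) at time k+1, each 0 or 1
    a: exogenous arrivals (A_1, A_3) at time k+1
    """
    if not is_feasible(u):
        raise ValueError("allocation violates the constituency constraints")
    done = [s[i] * u[i] for i in range(4)]  # completed services per buffer
    return [
        q[0] - done[0] + a[0],     # buffer 1: exogenous arrivals A_1
        q[1] - done[1] + done[0],  # buffer 2: fed by buffer 1
        q[2] - done[2] + a[1],     # buffer 3: exogenous arrivals A_3
        q[3] - done[3] + done[2],  # buffer 4: fed by buffer 3
    ]
```

The sketch does not stop an allocation from serving an empty buffer; a policy built on it would need to enforce that so that queue lengths stay nonnegative.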
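Fluid Model & Workload

- Newell 1982
- Vandergraft 1983
- Perkins & Kumar 1989
- Chen & Mandelbaum 1991
- Cruz 1991

Fluid model captures mean flow:
\[ q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x \]
\[ B = E[B(k)] = \begin{bmatrix} -\mu_1 & 0 & 0 & 0 \\ \mu_1 & -\mu_2 & 0 & 0 \\ 0 & 0 & -\mu_3 & 0 \\ 0 & 0 & \mu_3 & -\mu_4 \end{bmatrix}, \qquad
\alpha = E[A(k)] = \begin{bmatrix} \alpha_1 \\ 0 \\ \alpha_3 \\ 0 \end{bmatrix} \]

Workload and load parameters:
\[ \xi^1 = \begin{bmatrix} m_1 \\ 0 \\ m_4 \\ m_4 \end{bmatrix}, \qquad
\xi^2 = \begin{bmatrix} m_2 \\ m_2 \\ m_3 \\ 0 \end{bmatrix} \]
\[ \rho_1 = m_1\alpha_1 + m_4\alpha_3, \qquad \rho_2 = m_2\alpha_1 + m_3\alpha_3, \qquad \text{with } m_i = \mu_i^{-1} \]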
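As a small numerical check of the load and workload formulas, the sketch below evaluates ρ1, ρ2 and the inner products of the workload vectors with a queue-length vector x. The function names and the numerical rates in the usage note are hypothetical and chosen only for illustration.

```python
def loads(alpha1, alpha3, mu):
    """Return (rho_1, rho_2) for the two-station model, with m_i = 1 / mu_i."""
    m = [1.0 / rate for rate in mu]
    rho1 = m[0] * alpha1 + m[3] * alpha3  # station 1 serves buffers 1 and 4
    rho2 = m[1] * alpha1 + m[2] * alpha3  # station 2 serves buffers 2 and 3
    return rho1, rho2


def workloads(x, mu):
    """Return the inner products of the two workload vectors with x."""
    m = [1.0 / rate for rate in mu]
    xi1 = [m[0], 0.0, m[3], m[3]]
    xi2 = [m[1], m[1], m[2], 0.0]
    w1 = sum(a * b for a, b in zip(xi1, x))
    w2 = sum(a * b for a, b in zip(xi2, x))
    return w1, w2
```

For example, with the made-up rates µ = (4, 4, 4, 4) and α1 = α3 = 1, `loads(1, 1, [4, 4, 4, 4])` gives a load of 0.5 at each station.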
Value Functions

- M 96, 01, ... following Dai 95, Dai & M 95

Fluid model: q(t) = x + Bz(t) + αt
CRW model: Q(k+1) − Q(k) = B(k+1)U(k) + A(k+1)

Value function (fluid model):
\[ J(x) = \int_0^\infty c(q(t;x))\,dt \]

Relative value function (CRW model):
\[ h(x) = \int_0^\infty E[c(Q(t;x)) - \eta]\,dt, \qquad \eta = \int c(x)\,\pi(dx) = \text{average cost} \]

Large-state solidarity:
\[ \lim_{\|x\|\to\infty} \frac{J(x)}{h(x)} = 1 \]
Holds for a wide class of stabilizing policies, including the average-cost optimal policy.
Myopic Policy: Fluid Model

\[ q(t) = x + Bz(t) + \alpha t, \qquad \frac{d^+}{dt} q(t) = B\zeta(t) + \alpha \]

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $\mathsf{X}$, a subset of $\mathbb{R}^\ell_+$; $\mathsf{U}(x)$, the feasible values of $\zeta(t)$ when $x = q(t) \in \mathsf{X}$

Myopic policy:
\[ \arg\min_{u \in \mathsf{U}(x)} \frac{d^+}{dt} c(q(t)) = \arg\min_{u \in \mathsf{U}(x)} \langle \nabla c(x), Bu + \alpha \rangle \]
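The myopic rule can be sketched by brute force over a finite candidate set. The code below is an illustration under stated assumptions rather than the lecture's method: it searches only 0/1 allocations, it models the feasible set U(x) as "C u ≤ 1 and never serve an empty buffer", and the names `drift_rate` and `myopic_allocation` and the example rates are hypothetical.

```python
from itertools import product


def drift_rate(grad, B, alpha, u):
    """Evaluate <grad, B u + alpha> with B given as a list of rows."""
    v = [sum(B[i][j] * u[j] for j in range(len(u))) + alpha[i]
         for i in range(len(grad))]
    return sum(g * vi for g, vi in zip(grad, v))


def myopic_allocation(grad, B, alpha, C, x):
    """Minimize <grad, B u + alpha> over 0/1 allocations with C u <= 1."""
    best_u, best_val = None, float("inf")
    for u in product((0, 1), repeat=len(x)):
        if any(uj == 1 and xj == 0 for uj, xj in zip(u, x)):
            continue  # simplification: never serve an empty buffer
        if any(sum(c * uj for c, uj in zip(row, u)) > 1 for row in C):
            continue  # constituency constraint violated
        val = drift_rate(grad, B, alpha, u)
        if val < best_val:  # ties keep the first candidate found
            best_u, best_val = u, val
    return best_u, best_val


# Two-station example with made-up unit service rates and linear cost.
B_EXAMPLE = [[-1, 0, 0, 0], [1, -1, 0, 0], [0, 0, -1, 0], [0, 0, 1, -1]]
ALPHA_EXAMPLE = [0.25, 0.0, 0.25, 0.0]
C_EXAMPLE = [[1, 0, 0, 1], [0, 1, 1, 0]]
GRAD_LINEAR = [1, 1, 1, 1]  # gradient of c(x) = x1 + x2 + x3 + x4
```

With all four buffers nonempty, `myopic_allocation(GRAD_LINEAR, B_EXAMPLE, ALPHA_EXAMPLE, C_EXAMPLE, [1, 1, 1, 1])` selects buffers 2 and 4, since only service at an exit buffer reduces a linear cost.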
Myopic Policy: CRW Model

Q(k+1) − Q(k) = B(k+1)U(k) + A(k+1)

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Constraints: $\mathsf{X}$, a subset of $\mathbb{R}^\ell_+$ (lattice constraints, etc.); $\mathsf{U}(x)$, the feasible values of $U(k)$ when $x = Q(k) \in \mathsf{X}$

Myopic policy:
\[ \arg\min_{u \in \mathsf{U}(x)} E[c(Q(k+1)) \mid Q(k) = x,\ U(k) = u] \]
Myopic Policy: CRW Model

Q(k+1) − Q(k) = B(k+1)U(k) + A(k+1)

Motivation: The average-cost optimal policy is h-myopic, where $h : \mathbb{R}^\ell_+ \to \mathbb{R}_+$ is the relative value function,
\[ h(x) = \inf_U \int_0^\infty E[c(Q(t;x)) - \eta^*]\,dt \]

Dynamic programming equation:
\[ \min_{u \in \mathsf{U}(x)} E[h(Q(k+1)) \mid Q(k) = x,\ U(k) = u] = h(x) - c(x) + \eta^* \]
Fluid Model & Myopia

- Chen & Yao 93
- M' 01

\[ q(t) = x + Bz(t) + \alpha t, \quad t \ge 0, \qquad q(0) = x \]

Given: Convex monotone cost function, $c : \mathbb{R}^\ell_+ \to \mathbb{R}_+$

Myopic policy for fluid model is stabilizing:
\[ q(t) = 0, \qquad t \ge T_0 \]
Myopia & Instability

- Kumar & Seidman 89
- Rybko & Stolyar 93

Myopic policy may or may not be stabilizing

Example: Two station model above with linear cost,
\[ c(x) = x_1 + x_2 + x_3 + x_4 \]

Myopic policy for CRW model: Priority to exit buffers

Periodic starvation creates instability

[Figure: two panels, "Poisson Arrivals and Service" and "Fluid Arrivals and Service"; horizontal axis 0 to 1000, vertical axis 0 to 800]
Myopia & Instability: Quadratic Cost

- Tassiulas & Ephremides 92

Example: Two station model above with
\[ c(x) = \tfrac{1}{2}\,[x_1^2 + x_2^2 + x_3^2 + x_4^2] \]

Myopic policy: Approximated by linear switching curves

Myopic policy stabilizing for diagonal quadratic

Condition (V3) holds with Lyapunov function V = c: For positive constants ε and η,
\[ PV(x) := E[V(Q(k+1)) \mid Q(k) = x] \le V(x) - \varepsilon\|x\| + \eta \]
MaxWeight Policy

- Tassiulas & Ephremides 92

Tassiulas considers myopic policy for fluid model subject to lattice constraints,
\[ \arg\min_{u \in \mathsf{U}(x)} \langle \nabla c(x), Bu + \alpha \rangle \]
where
\[ c(x) = \tfrac{1}{2}\, x^T D x, \qquad D = \mathrm{diag}(d_1, \dots, d_\ell) \]

Obtains negative drift: For non-zero x,
\[ \langle \nabla c(x), Bu + \alpha \rangle \le -\varepsilon\|x\| \]

Implies (V3) for MaxWeight policy

Implies (V3) for myopic policy, since myopic has minimum drift
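For the two-station model, the minimization of ⟨∇c(x), Bu + α⟩ with c(x) = ½ xᵀDx separates by station, because α does not depend on u and each column of B involves one buffer's service rate. The sketch below uses that separation. It is a hedged illustration: the function name `maxweight`, the default D = I, and the tie-breaking toward idling are assumptions, not part of the slide.

```python
def maxweight(x, mu, d=(1, 1, 1, 1)):
    """MaxWeight allocation for the two-station model, c(x) = 0.5 * x^T D x."""
    g = [di * xi for di, xi in zip(d, x)]  # gradient of c at x, equal to D x
    # Weight of serving buffer j: minus the j-th entry of B^T grad c(x).
    w = [
        mu[0] * (g[0] - g[1]),  # buffer 1 drains into buffer 2
        mu[1] * g[1],           # buffer 2 exits the network
        mu[2] * (g[2] - g[3]),  # buffer 3 drains into buffer 4
        mu[3] * g[3],           # buffer 4 exits the network
    ]
    stations = [(0, 3), (1, 2)]  # buffer indices served at stations 1 and 2
    u = [0, 0, 0, 0]
    for buffers in stations:
        j = max(buffers, key=lambda i: w[i])
        if w[j] > 0:  # serve only when it strictly reduces the drift of c
            u[j] = 1
    return u
```

No explicit check for empty buffers is needed in this sketch: with positive diagonal entries, a strictly positive weight for buffer j forces x_j > 0, because the partial derivative d_j x_j vanishes when x_j = 0.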
Questions Since 1996

\[ \lim_{\|x\|\to\infty} \frac{J(x)}{h(x)} = 1 \]

Value functions for fluid and stochastic models: Quadratic growth for linear cost, with similar asymptotes. Policies are similar for large state-values.

What is the gap between policies?
What is the gap between value functions?
How to translate policy for fluid model to cope with volatility?
Connections with heavy traffic theory?

Many positive answers in new monograph, as well as new applications for value function approximation

Today's lecture focuses on the third and fourth topics
II h-MaxWeight Policies
Why Does MW Work?

Geometric explanation

Define drift vector field (for given policy):
\[ \Delta(x) = E[Q(k+1) - Q(k) \mid Q(k) = x] = Bu + \alpha \]

MaxWeight policy, with c diagonal quadratic:
\[ \arg\min_{u \in \mathsf{U}(x)} \langle \nabla c(x), \Delta(x) \rangle \]

Example: Queues in tandem, with arrival rate α1 and service rates µ1, µ2

[Figure: drift vectors in the (x1, x2) plane, drawn over level sets of c (diag quadratic), with the region "MaxWeight policy: serve buffer 1" marked. The three drift vectors shown are
\[ \Delta(x) = \begin{bmatrix} -\mu_1 + \alpha_1 \\ \mu_1 \end{bmatrix}, \qquad
\Delta(x) = \begin{bmatrix} -\mu_1 + \alpha_1 \\ -\mu_2 + \mu_1 \end{bmatrix}, \qquad
\Delta(x) = \begin{bmatrix} \alpha_1 \\ -\mu_2 \end{bmatrix} \]
]

Key observation: Boundaries of the state space are repelling

Consequence of vanishing partial derivatives on boundary
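The three drift vectors in the figure, and the repelling behavior at the boundary, can be reproduced with a short sketch. It assumes c(x) = ½(x1² + x2²) for the tandem queues and breaks ties toward idling; the function names and the numerical rates in the usage note are hypothetical.

```python
def tandem_drift(u, alpha1, mu1, mu2):
    """Drift B u + alpha for two queues in tandem."""
    u1, u2 = u
    return (-mu1 * u1 + alpha1, mu1 * u1 - mu2 * u2)


def maxweight_tandem(x):
    """MaxWeight with c(x) = 0.5 * (x1^2 + x2^2).

    Serving buffer 1 changes <grad c, B u> by mu1 * (x2 - x1), and serving
    buffer 2 changes it by -mu2 * x2, so the positive rates drop out of
    the comparison.
    """
    x1, x2 = x
    u1 = 1 if x1 > x2 else 0
    u2 = 1 if x2 > 0 else 0
    return (u1, u2)
```

With made-up rates α1 = 0.25 and µ1 = µ2 = 0.5, the state x = (3, 0) gives u = (1, 0) and drift (−0.25, 0.5), which points away from the boundary x2 = 0. The state x = (0, 3) gives u = (0, 1) and drift (0.25, −0.5), which points away from the boundary x1 = 0.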
h-MaxWeight Policy

Given: Convex monotone function h

Boundary conditions:
\[ \frac{\partial}{\partial x_j} h(x) = 0 \quad \text{when } x_j = 0 \]

Economic interpretation: Marginal disutility vanishes for vanishingly small inventory

Condition rarely holds, but we can fix that ...
h-MaxWeight Policy

Given: Convex monotone function $h_0$ (perhaps violating ∂ condition)

Introduce perturbation: For fixed $\theta \ge 1$ and any $x \in \mathbb{R}^\ell_+$,
\[ \tilde x_i := x_i + \theta\,(e^{-x_i/\theta} - 1), \qquad \tilde x = (\tilde x_1, \dots, \tilde x_\ell)^T \in \mathbb{R}^\ell_+ \]

Perturbed function:
\[ h(x) = h_0(\tilde x), \qquad x \in \mathbb{R}^\ell_+ \]

Convex, monotone, and boundary conditions are satisfied
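The boundary condition for the perturbed function follows from the chain rule: the derivative of the perturbed coordinate with respect to x_i is 1 − exp(−x_i/θ), which is zero at x_i = 0. The sketch below makes this concrete. The function names are hypothetical, and `grad_h0` stands for any user-supplied gradient of h0.

```python
import math


def perturb(x, theta):
    """Map x to the perturbed vector, x_i + theta * (exp(-x_i / theta) - 1)."""
    return [xi + theta * (math.exp(-xi / theta) - 1.0) for xi in x]


def perturb_slope(xi, theta):
    """Derivative of the perturbed coordinate with respect to x_i."""
    return 1.0 - math.exp(-xi / theta)


def grad_h(grad_h0, x, theta):
    """Gradient of h(x) = h0(perturb(x)) via the chain rule."""
    g0 = grad_h0(perturb(x, theta))
    return [g * perturb_slope(xi, theta) for g, xi in zip(g0, x)]
```

For a linear h0 with gradient (1, 3), `grad_h(lambda z: [1.0, 3.0], [0.0, 2.0], 5.0)` has first component exactly 0, so the partial derivative vanishes on the boundary x1 = 0 even though the gradient of h0 itself never does.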
h-MaxWeight Policy: Perturbed linear function

$h_0$ linear: never satisfies ∂ condition

h-myopic and h-MaxWeight policies stabilizing, provided $\theta \ge 1$ is sufficiently large

Example: Tandem queues, with arrival rate α1 and service rates µ1, µ2

[Figure: level sets of h in the (x1, x2) plane, with the region "h-MaxWeight policy: serve buffer 1" and the level q2 marked on the x2 axis,
\[ q_2 = \theta \log\Bigl(1 - \frac{c_1}{c_2}\Bigr) \]
together with the drift vectors
\[ \Delta(x) = \begin{bmatrix} -\mu_1 + \alpha_1 \\ \mu_1 \end{bmatrix}, \qquad
\Delta(x) = \begin{bmatrix} -\mu_1 + \alpha_1 \\ -\mu_2 + \mu_1 \end{bmatrix}, \qquad
\Delta(x) = \begin{bmatrix} \alpha_1 \\ -\mu_2 \end{bmatrix} \]
]
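A minimal sketch of the h-MaxWeight decision for the tandem queues, assuming h0(x) = c1 x1 + c2 x2 and the exponential perturbation with parameter θ. The function name, the tie-breaking toward idling, and the numerical values in the usage note are assumptions made for illustration.

```python
import math


def h_maxweight_tandem(x, c, theta):
    """h-MaxWeight allocation for tandem queues with perturbed linear h0.

    The gradient of h has components c_i * (1 - exp(-x_i / theta)).
    Buffer 1 is served when its partial derivative exceeds that of
    buffer 2; buffer 2 is served when its partial derivative is positive.
    The positive service rates drop out of both comparisons.
    """
    x1, x2 = x
    c1, c2 = c
    g1 = c1 * (1.0 - math.exp(-x1 / theta))
    g2 = c2 * (1.0 - math.exp(-x2 / theta))
    u1 = 1 if g1 > g2 else 0
    u2 = 1 if g2 > 0 else 0
    return (u1, u2)
```

With the made-up costs c = (1, 3) and θ = 10, the unperturbed linear function would never serve buffer 1, since c1 < c2. The perturbed policy serves buffer 1 at x = (50, 1), where buffer 2 is nearly empty, and stops serving it at x = (50, 20).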
h-MaxWeight Policy: Perturbed value function

$h_0$: minimal fluid value function,
\[ J(x) = \inf \int_0^\infty c(q(t;x))\,dt \]

h-myopic and h-MaxWeight policies stabilizing, provided $\theta \ge 1$ is sufficiently large

Resulting policy very similar to average-cost optimal policy:

[Figure: region "Optimal policy: serve buffer 1" in the (x1, x2) plane; x1 axis from 10 to 100, x2 axis from 10 to 30]
III Heavy Traffic
Relaxations & Asymptotic Optimality

Single example for sake of illustration: Model of Dai & Wang

[Figure: two-station network (Station 1, Station 2) with arrival rate α1]

Assume: Homogeneous model. Service rate at Station i is $\mu_i$.

Homogeneous CRW model (buffers 1, 2 and 5 are served at Station 1; buffers 3 and 4 at Station 2):
\[ \begin{aligned}
Q_1(k+1) - Q_1(k) &= -S_1(k+1)U_1(k) + A_1(k+1) \\
Q_2(k+1) - Q_2(k) &= -S_1(k+1)U_2(k) + S_1(k+1)U_1(k) \\
Q_3(k+1) - Q_3(k) &= -S_2(k+1)U_3(k) + S_1(k+1)U_2(k) \\
Q_4(k+1) - Q_4(k) &= -S_2(k+1)U_4(k) + S_2(k+1)U_3(k) \\
Q_5(k+1) - Q_5(k) &= -S_1(k+1)U_5(k) + S_2(k+1)U_4(k)
\end{aligned} \]

Constituency constraints:
\[ U_1(k) + U_2(k) + U_5(k) \le 1, \qquad U_3(k) + U_4(k) \le 1, \qquad U_i(k) \in \{0, 1\} \]
Relaxations & Asymptotic Optimality

Workload (units of inventory):
\[ \begin{aligned}
Y_1(k) &= 3Q_1(k) + 2Q_2(k) + Q_3(k) + Q_4(k) + Q_5(k) \\
Y_2(k) &= 2\bigl(Q_1(k) + Q_2(k) + Q_3(k)\bigr) + Q_4(k)
\end{aligned} \]

Idleness processes:
\[ \begin{aligned}
\iota_1(k) &= 1 - \bigl(U_1(k) + U_2(k) + U_5(k)\bigr) \\
\iota_2(k) &= 1 - \bigl(U_3(k) + U_4(k)\bigr)
\end{aligned} \]

Dynamics:
\[ \begin{aligned}
Y_1(k+1) - Y_1(k) &= -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\,\iota_1(k) \\
Y_2(k+1) - Y_2(k) &= -S_2(k+1) + 2A_1(k+1) + S_2(k+1)\,\iota_2(k)
\end{aligned} \]
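The workload dynamics can be checked mechanically by enumerating every feasible allocation and every 0/1 realization of the service and arrival variables. The sketch below does this under one stated assumption: buffer 3 is fed by Station 1's service of buffer 2, so its inflow term is S1(k+1)U2(k), which is what the Y1 and Y2 recursions require. The function names are hypothetical.

```python
from itertools import product

XI1 = [3, 2, 1, 1, 1]  # coefficients of Y_1
XI2 = [2, 2, 2, 1, 0]  # coefficients of Y_2


def queue_increment(u, s1, s2, a1):
    """Q(k+1) - Q(k) for the homogeneous five-buffer CRW model."""
    u1, u2, u3, u4, u5 = u
    return [
        -s1 * u1 + a1,
        -s1 * u2 + s1 * u1,
        -s2 * u3 + s1 * u2,  # assumption: buffer 3 is fed by station 1
        -s2 * u4 + s2 * u3,
        -s1 * u5 + s2 * u4,
    ]


def check_workload_dynamics():
    """Verify both workload recursions over all feasible 0/1 cases."""
    for u in product((0, 1), repeat=5):
        if u[0] + u[1] + u[4] > 1 or u[2] + u[3] > 1:
            continue  # violates a constituency constraint
        iota1 = 1 - (u[0] + u[1] + u[4])
        iota2 = 1 - (u[2] + u[3])
        for s1, s2, a1 in product((0, 1), repeat=3):
            dq = queue_increment(u, s1, s2, a1)
            dy1 = sum(w * d for w, d in zip(XI1, dq))
            dy2 = sum(w * d for w, d in zip(XI2, dq))
            if dy1 != -s1 + 3 * a1 + s1 * iota1:
                return False
            if dy2 != -s2 + 2 * a1 + s2 * iota2:
                return False
    return True
```

The internal transfers between buffers cancel in both sums, which is why each workload process depends on the allocation only through the idleness at its own station.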
Relaxations & Asymptotic Optimality

- Laws 90
- Kelly & Laws 93
- Harrison, Kushner, Reiman, Williams, Dai, Bramson, ...

Workload Relaxation of N. Laws:
\[ Y_1(k+1) - Y_1(k) = -S_1(k+1) + 3A_1(k+1) + S_1(k+1)\,\iota_1(k) \]
with constraints on idleness process relaxed,
\[ \iota_1(k) \in \{0, 1, 2, \dots\} \]

Optimization based on the effective cost,
\[ \bar c(y) = \min\ c(x) \quad \text{s.t. } 3x_1 + 2x_2 + x_3 + x_4 + x_5 = y, \quad x \in \mathbb{Z}^5_+ \ \text{(+ buffer constraints)} \]
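For small workload values the effective cost can be computed by direct enumeration. The sketch below is a brute-force illustration that ignores the additional buffer constraints mentioned on the slide; the function name `effective_cost` and the choice of cost function in the usage note are assumptions.

```python
from itertools import product

XI = (3, 2, 1, 1, 1)  # workload coefficients for station 1


def effective_cost(y, cost, xi=XI):
    """Minimize cost(x) over nonnegative integer x with <xi, x> = y.

    Returns (minimal value, one minimizer). Buffer constraints are
    ignored, and the search is exhaustive, so y should be small.
    """
    best_val, best_x = None, None
    ranges = [range(y // w + 1) for w in xi]
    for x in product(*ranges):
        if sum(w * xj for w, xj in zip(xi, x)) != y:
            continue
        val = cost(x)
        if best_val is None or val < best_val:
            best_val, best_x = val, x
    return best_val, best_x
```

With the linear cost c(x) = x1 + ... + x5, passed as Python's built-in `sum`, `effective_cost(6, sum)` places both units in buffer 1, which carries the most workload per unit of cost.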
Asymptotic Optimality
Optimal policy is non-idling for one-dimensional relaxation
Dynamic programming equation solved via Pollaczek-Khintchine formula
Asymptotic Optimality: Heavy traffic assumptions

Load is unity for nominal model

Single bottleneck to define relaxation

Cost is linear, and effective cost has a unique optimizer

Model sequence:
\[ A^{(n)}(k) = \begin{cases} A(k) & \text{with probability } 1 - n^{-1} \\ 0 & \text{with probability } n^{-1} \end{cases} \]

Load less than unity for each n
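The model sequence thins the nominal arrival process, so the mean arrival rate, and with it the load, is scaled by 1 − 1/n. A minimal sketch, assuming the thinning is independent across time steps; the function names are hypothetical.

```python
import random


def thinned_arrival(a, n, rng=random):
    """Return A(k) with probability 1 - 1/n, and 0 with probability 1/n."""
    return a if rng.random() < 1.0 - 1.0 / n else 0


def thinned_load(n, nominal_load=1.0):
    """Load of the n-th model: the mean arrival rate scales by 1 - 1/n."""
    return (1.0 - 1.0 / n) * nominal_load
```

With a nominal load of unity, `thinned_load(n)` equals 1 − 1/n, which is below one for every n and approaches one as n grows.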
Asymptotic Optimality

h-MaxWeight policy asymptotically optimal, with logarithmic regret:
\[ h_0(x) = h^*(y) + \frac{b}{2}\bigl(c(x) - \bar c(y)\bigr)^2 \]

η: average cost under h-MW policy

η* = O(n): optimal average cost for relaxation

\[ \eta^* \le \eta \le \eta^* + O(\log(n)) \]
Conclusions

h-MaxWeight policy stabilizing under very general conditions

General approach to policy translation. Resulting policy mirrors optimal policy in examples

Asymptotically optimal, with logarithmic regret for model with single bottleneck

Future work

Models with multiple bottlenecks?

On-line learning for policy improvement?
References

N. Laws. Dynamic routing in queueing networks. PhD thesis, Cambridge University, Cambridge, UK, 1990.
L. Tassiulas. Adaptive back-pressure congestion control based on local information. 40(2):236–250, 1995.
L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. 1992.
S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. 2003.
S. P. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. Submitted for publication, 2006.
S. P. Meyn. Control techniques for complex networks. To appear, Cambridge University Press, 2007.