DYNAMIC OPTIMIZATION
WITH RANDOM AND SMOOTHED CONSTRAINTS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL
ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Ekine Omoni Akuiyibo
August 2011
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/dc800dx8184
© 2011 by Ekine Akuiyibo. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Stephen Boyd, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
John Gill, III, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Daniel O'Neill
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Stochastic control theory is a well-known mathematical framework that is widely applied because of its robustness. This thesis describes the use of stochastic control, convex optimization, and time smoothing for resource allocation in radio communication systems.
Radio systems are inherently complex due to randomness in the propagation medium, i.e., wireless channels, caused by fading, interference from participating devices, device mobility, etc. As the demand for heterogeneous multimedia services grows, it has become increasingly important to devise more efficient resource allocation strategies. These resource allocation problems can be represented as network flow control problems, where the goal is to design flow controllers, which are equivalent to network protocols. Over the years, adaptive controllers have been proposed and implemented to replace less efficient static controllers. Unfortunately, most current adaptive schemes are complicated and largely heuristic.
We present a stochastic control framework that integrates adaptive modulation and network utility maximization, for designing simple, implementable adaptive flow rate controllers for efficient and fair resource allocation. For each problem presented, we first characterize the exact optimal controller, and in a few special cases we show how to compute it. For cases where computing the optimal controller is intractable, we design efficient suboptimal controllers using approximate dynamic programming.
In the first problem, we look at a resource allocation problem with random affine resource constraints. With quadratic stage costs, this problem reduces to a variation on (affine) linear quadratic regulation in which the resulting flow rate controller is an affine function of the system state and the random resources. In the second problem, we design flow rate controllers to optimally trade off average transmit power and average utility of smoothed flows on a single time-varying wireless link. The smoothing takes into account the heterogeneous time profiles of different applications in the dynamic environment, producing significantly different control policies. Lastly, in the third problem, we design multi-link flow rate controllers that optimally trade off average flow utility, average transmit power, and average queue maintenance costs. We derive interesting queue-length-based metrics associated with different utility and cost functions.
Acknowledgments
This thesis is the culmination of works made possible by the help and support of
several people.
First, I would like to thank my advisor, Prof. Stephen Boyd, for his unwavering
support and encouragement throughout my time in his research group. His seemingly
effortless brilliance, clarity, and simplicity, I know now to be a result of his strong
passion for teaching. I continue to incorporate these tenets in my own work.
I would like to thank Prof. Daniel O’Neill for the several discussions we had about
application of my work to wireless communications. His suggestions were central to
the problems we investigated.
I would like to thank Prof. Kunle Olukotun for chairing my orals committee and
Prof. John Gill for being a member of my orals and reading committees. Many thanks
to Prof. Brad Osgood and Dr. Noe Lozano for enabling me to secure initial funding to
come to Stanford — their support over the years encompasses much more. Thanks
to Prof. Abbas El Gamal for introducing me to Dr. Olivier Lévêque. The work
with Olivier and Prof. Emre Telatar at École Polytechnique Fédérale de Lausanne
reinvigorated my research curiosity.
I would like to thank current and former members of my research group, Arezou,
Argyrios, Borja, Brendan, (cvx, Michael), Eric, Ivan, Joëlle, Joseph, Kwangmoo,
Matt, and Neal. It has been a pleasure to learn from, and work with you, these last
few years.
Special thanks to Jacob, Tosin, and Yang, for the numerous extra sessions — your
checks are in the mail!
Finally, I have been blessed with the constant love and support from my incredible
family including my wife Heather, my parents, Deaconess (Dr.) Ibiere Akuiyibo and
Dr. Dimabo Akuiyibo, my siblings, Owualata, Kuraye, Dumo, Preye, my parents-in-
law, Helen and Scott, and my nephews, Carl and Daniel. This accomplishment is ours
to share. I am especially grateful to Heather for her patience, her daily encouragement
and cheerful disposition, and for having the kindest heart imaginable.
Mom and Dad permitted me to aspire — this thesis is dedicated to them.
Contents

Abstract
Acknowledgments
1 Overview
  1.1 Outline
  1.2 Notation
2 Stochastic Control with Random Resource Constraints
  2.1 Problem statement
  2.2 Results
  2.3 Optimal policy via dynamic programming
  2.4 Evaluating the Bellman operator
  2.5 Numerical example
  2.6 Conclusion
3 Adaptive Modulation with Smoothed Flow Utility
  3.1 Background
  3.2 Problem setup
  3.3 Optimal policy characterization
  3.4 Single flow case
  3.5 A suboptimal policy for the multiple flow case
  3.6 Upper bound policies
  3.7 Other concave utilities
  3.8 Alternative evaluation of ADP policy
  3.9 Conclusion
4 Queue-Length-Based 2-Hop Rate Control
  4.1 Background
  4.2 Network description
  4.3 Optimal policy
  4.4 One flow
  4.5 Numerical examples
  4.6 Queue metrics
  4.7 Multiple flows
  4.8 Conclusion
Bibliography
List of Figures

2.1 Sample flow allocation trajectories, optimal policy (red), equal sharing policy (blue).
2.2 Sample utility state trajectories, optimal policy (red), equal sharing policy (blue).
2.3 Error histograms for flow 1. Top. Absolute difference between the optimal policy utility state and target utility state. Bottom. Absolute difference between the equal sharing utility state and target utility state.
2.4 Error histograms for flow 2. Top. Absolute difference between the optimal policy utility state and target utility state. Bottom. Absolute difference between the equal sharing utility state and target utility state.
2.5 Error histograms for flow 3. Top. Absolute difference between the optimal policy utility state and target utility state. Bottom. Absolute difference between the equal sharing utility state and target utility state.
3.1 Value functions for T = 1, 5, 10, 20, 25, 50.
3.2 Value functions for α = 1/10, 1/3, 1/2, 2/3, 3/4.
3.3 Top. Comparing V (blue) with V (red, dashed) for the lightly smoothed flow. Bottom. Comparing V (blue) with V (red, dashed) for the heavily smoothed flow.
3.4 Top. Optimal policy ϕ⋆(s, g) for smoothing time T = 1 (θ = 0.37). Bottom. Optimal policy for T = 50 (θ = 0.98).
3.5 Average utility versus average power; heavily smoothed flow (top, dashed), lightly smoothed flow (bottom).
3.6 Sample power, flow, and smoothed flow trajectories; lightly smoothed flow (left), heavily smoothed flow (right).
3.7 ADP controllers for 200 values in λ ∈ [0.5, 10].
3.8 Prescient bound (blue), ADP controller (red), for λ = 0.5, infeasible controller region (shaded).
3.9 Prescient bound (blue), ADP controllers (red), for λ = {0.5, 1}, infeasible controller region (shaded).
3.10 Prescient bound (blue), ADP controllers (red), for λ = {0.5, 1, 1.5}, infeasible controller region (shaded).
3.11 Prescient bound (blue), ADP controllers (red), for 200 values of λ ∈ [0.5, 10], infeasible controller region (shaded).
3.12 Top. Comparing V (blue) with V (red, dashed) for the lightly smoothed flow. Bottom. Comparing V (blue) with V (red, dashed) for the heavily smoothed flow.
3.13 Top. Optimal policy ϕ⋆(s, g) for smoothing time T = 1 (θ = 0.37). Bottom. Optimal policy for T = 100 (θ = 0.99).
3.14 Top. Optimal policy level curves for T = 1 (θ = 0.37). Bottom. Optimal policy level curves for T = 100 (θ = 0.99).
3.15 Top. Log utility average utility versus average power. Bottom. Log utility average utility U = −1.5 versus average power.
3.16 Top. Log utility sample power trajectory for lightly smoothed flow. Bottom. Log utility sample power trajectory for heavily smoothed flow.
3.17 Saturated linear utility functions: U_{0.5,0.375}, U_{1,1}, U_{2,0.5}.
3.18 Top. Value function and power-law approximate for U_{1,1}(s) and T = 1. Bottom. Value function and power-law approximate for U_{1,1}(s) and T = 50.
3.19 Top. Power-law policy ϕ⋆(s, g) for U_{1,1}(s) and smoothing time T = 1 (θ = 0.37). Bottom. Power-law policy for U_{1,1}(s) and T = 50 (θ = 0.98).
3.20 Top. Saturated linear U_{1,1} tradeoff curves for T = 1, 50. Bottom. Power, flow, and smoothed flow trajectories.
4.1 Single-hop network.
4.2 Single hop, single flow.
4.3 Comparing optimal value function (blue) with approximate value function (red, dashed). χ(q) = 0.5q^2. V_max = 0.32, q_max = 0.31, q_zero = 0.73.
4.4 Top. Sample flow 1 trajectory. Middle. Sample queue length trajectory. Bottom. Sample flow 2 trajectory. χ(q) = 0.5q^2.
4.5 Comparing optimal value function (blue) with approximate value function (red, dashed). χ(q) = 0.5q^2. V_max = 0.17, q_max = 0.13, q_zero = 0.43.
4.6 Top. Sample flow 1 trajectory. Middle. Sample queue length trajectory. Bottom. Sample flow 2 trajectory. χ(q) = 0.5q^2.
4.7 Comparing objective versus weighted queue maintenance cost. Quadratic queue maintenance cost (red), linear queue maintenance cost (blue).
4.8 Average service rate versus average queueing delay. Quadratic costs (red, dashed), linear costs (blue).
4.9 Average service rate versus overall objective J. Quadratic costs (red, dashed), linear costs (blue).
4.10 Average arrival rate versus average queue length. Quadratic costs (red, dashed), linear costs (blue).
4.11 Average service rate versus average queue length. Quadratic costs (red, dashed), linear costs (blue).
Chapter 1
Overview
This work develops a mathematical framework for designing flow rate controllers for
resource allocation problems in wireless communication systems. In each chapter
we formulate the problem as a discrete-time stochastic control problem with convex
objective. In all problems, the information pattern (information available at decision
time) includes a random component. We compute an optimal feedback control law
using stochastic dynamic programming for cases with quadratic objective and/or
small system state dimension. In cases with large system state dimension we propose
simple controllers based on approximate dynamic programming.
We outline the thesis in §1.1, giving for each chapter a brief description of the problem and our results. In §1.2 we describe some basic notation used throughout the thesis.
1.1 Outline
In chapter 2 we consider an infinite-horizon stochastic control problem with quadratic stage costs and affine constraints. The constraints are functions of random resources revealed to the controller only in the time period that the constraints apply. We compute the optimal control policy using a Riccati-like recursion and show that the optimal feedback control policy is affine in the state and random resources. We illustrate policy performance using a simple example where the random resource is the time-varying link capacity of a wireless communications channel.
In chapter 3 we consider the problem of choosing the data flow rate on a wireless
link with randomly varying channel gain, to optimally trade off average transmit
power and the average utility of the smoothed data flow rate. The smoothing allows
us to model the demands of an application that can tolerate variations in flow over
a certain time interval. Our setup involves both adaptive modulation and utility
maximization, but is nonstandard in several respects: We consider the utility of the
smoothed flows, and we consider multiple flows over the same wireless link. For
the case of a single flow, we compute the optimal data flow rate policy numerically
using stochastic dynamic programming and gridding. For the case of multiple flows
on a single link, where dynamic programming becomes intractable due to the curse
of dimensionality, we propose an approximate dynamic programming approach to
obtain suboptimal data flow rate policies. We show via numerical examples that
these suboptimal policies can work very well.
In chapter 4 we consider a variation on the flow rate control problem from chapter 3, in which flows traverse a relay node (buffer) from source (transmitter) to destination (receiver). We consider the problem of choosing the data
flow rates on the (two) wireless links with randomly varying channel gains to optimally
trade off average utility of the instantaneous data flow rate at the receiver and the
average system costs. Cost consists of the average transmit power on the wireless
links and average costs incurred at the relay node. The relay allows us to define
queue-length-based performance metrics, which we show are correlated to choice of
utility and cost functions via numerical examples.
1.2 Notation
We use R to denote the set of real numbers. The set of real n-dimensional vectors is denoted R^n, and the set of real m × n matrices is denoted R^{m×n}. We delimit vectors and matrices with square brackets, with the components separated by spaces. We use parentheses to construct column vectors from comma-separated lists. For example, if a, b, c ∈ R, we have

(a, b, c) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} = [\, a \ b \ c \,]^T,
which is an element of R^3. The symbol 1 denotes a vector all of whose components are one (with dimension determined from context). We use diag(x_1, . . . , x_n), or diag(x), x ∈ R^n, to denote an n × n diagonal matrix with diagonal entries x_1, . . . , x_n.

The inequality symbol ≥ (and its strict form >) is used to denote elementwise vector (or matrix) inequality. The curled inequality symbol ⪰ (and its strict form ≻) is used to denote matrix inequality. For example, for A, B ∈ R^{n×n}, A ⪰ B means that the matrix A − B is positive semidefinite.
Chapter 2
Stochastic Control with Random
Resource Constraints
We consider a discrete-time stochastic control problem with convex costs and affine constraints over an infinite time horizon. The constraints are functions of random resources revealed to the controller only in the time period that the constraints apply. For the case with quadratic stage costs, we show that we can compute the optimal control policy (and optimal value function) using dynamic programming and a Riccati-like recursion, knowing only the first and second moments of the random resources. We show that the optimal control law is a feedback policy that is affine in the state and random resources. We illustrate our policy's performance using an example where the random resource is the time-varying capacity of a wireless link.
2.1 Problem statement
System dynamics. Consider the discrete-time linear dynamical system

x_{t+1} = A x_t + B u_t + w_t, \quad t = 0, 1, \ldots,

where x_t ∈ R^n denotes the state vector, u_t ∈ R^m the input vector, A ∈ R^{n×n} the dynamics matrix, B ∈ R^{n×m} the input matrix, and w_t ∈ R^n a process noise or disturbance. We will assume that the w_t are independent and identically distributed (IID) with mean and second moment

\bar{w} = \mathbf{E}\, w_t, \qquad W = \mathbf{E}\, w_t w_t^T, \qquad t = 0, 1, \ldots.
Control policies. Our goal is to find a stationary feedback control policy, i.e., a function ϕ : R^n → R^m, with

u_t = ϕ(x_t), \quad t = 0, 1, \ldots.

At each time period, the control policy must satisfy (u_t, x_t) ∈ C_t(g_t), where g_t is random and so the set C_t is random as well. In other words, the control policy must be chosen so that at each time t, (u_t, x_t) is contained in the random set C_t(g_t). We give precise descriptions of g_t and C_t(g_t) below. We will assume that C_t(g_t), t = 0, 1, \ldots, is nonempty, i.e., the control law is always feasible. We also assume that the random resource is available in each time period, so we augment the state information to obtain a conditional control policy

u_t = ϕ(x_t, g \mid g = g_t), \quad t = 0, 1, \ldots,

where the distribution of g_t is known. Informally, our goal is to find a function ϕ, with

u_t = ϕ(x_t, g_t), \quad t = 0, 1, \ldots,

that satisfies the constraints (u_t, x_t) ∈ C_t(g_t). A simple interpretation of this policy is that it is informed by the random resource, so we have more information available for the decision process. This new information is reflected in a new state, consisting of the original system state vector augmented with the random resource. From now on we drop the random set argument and write C_t to mean C_t(g_t).
Random linear equality constraints. We restrict the random sets to consist of affine (linear) combinations of the augmented state. We require that at each time t

F u_t + H x_t = g_t, \quad t = 0, 1, \ldots, \qquad (2.1)

where g_t ∈ R^p is a random vector with first and second moments

\bar{g}_t = \mathbf{E}\, g_t, \qquad G_t = \Sigma_t + \bar{g}_t \bar{g}_t^T, \qquad t = 0, 1, \ldots,

where \Sigma_t = \mathbf{E}\, g_t g_t^T − \bar{g}_t \bar{g}_t^T is the covariance. We will assume that the exogenous input w_t and the random resource g_t are independent. The matrices F ∈ R^{p×m} and H ∈ R^{p×n} are known system parameters. The constraint set is thus

C_t = \{(u_t, x_t) \mid F u_t + H x_t − g_t = 0\}, \quad t = 0, 1, \ldots.

As an example, suppose g_t ∈ R is a resource we want to allocate. We take F = 1^T (where 1 is the vector with all components one) and H = 0. (Note the feasibility requirement implicit in this example that \mathcal{R}(F) = R^p.) For fixed x_t (and known g_t), each C_t is affine in the input u_t. Thus the solution set is also an affine (convex) set [BV04, §2].
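For the allocation example above, the following small numerical sketch (the sizes, state, and resource value are hypothetical choices of ours) constructs one feasible point of C_t as the least-norm solution of the equality constraint:

```python
import numpy as np

# Hypothetical instance of the resource-allocation constraint:
# m = 3 inputs must sum to a scalar resource g_t (F = 1^T, H = 0).
m, n, p = 3, 3, 1
F = np.ones((p, m))
H = np.zeros((p, n))

x = np.array([0.2, 0.5, 0.1])   # current state (arbitrary)
g = np.array([1.5])             # realized resource g_t

# Least-norm feasible input: u = F^+ (g - H x). Since F = 1^T, this
# is just equal sharing, u = (g/3, g/3, g/3).
u = np.linalg.pinv(F) @ (g - H @ x)

assert np.allclose(F @ u + H @ x, g)   # (u, x) lies in C_t
```

For F = 1^T the pseudoinverse solution coincides with the equal-sharing scheme used as a baseline in §2.5; for general F and H it is simply one point of the affine solution set.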
Objective. Our objective J is the total average stage cost over an infinite time horizon,

J = \lim_{N \to \infty} \mathbf{E}\, \frac{1}{N} \sum_{\tau=0}^{N} (1/2)\, \ell_\tau(x_\tau, u_\tau), \qquad (2.2)
where ℓ_t is a convex quadratic function of the form

\ell(x, u) = \begin{bmatrix} x \\ u \\ 1 \end{bmatrix}^T \begin{bmatrix} Q & M & q \\ M^T & R & r \\ q^T & r^T & c \end{bmatrix} \begin{bmatrix} x \\ u \\ 1 \end{bmatrix},
where Q ∈ R^{n×n} is the state quadratic cost matrix, R ∈ R^{m×m} is the input quadratic cost matrix, M ∈ R^{n×m} is the state-input cross cost matrix, q ∈ R^n is the state linear cost vector, r ∈ R^m is the input linear cost vector, and c is a constant cost incurred in every time period. We require the quadratic costs to be nonnegative, so we take the quadratic cost matrices to be symmetric positive semidefinite, i.e., Q = Q^T ⪰ 0 and R = R^T ⪰ 0. The cost parameters are known problem data.

The stochastic control problem is to find the control policy ϕ that minimizes the objective (2.2) and satisfies the random constraints (2.1). Our formulation is nonstandard because of the random affine equality constraint sets.
2.2 Results
Using dynamic programming and a Riccati-like recursion, we show that the optimal control policy is an affine function of the state and the random constraint vector, taking the form

ϕ_t(x_t, g_t) = K_t x_t + L_t g_t + s_t, \qquad (2.3)

where K_t ∈ R^{m×n} is the state feedback matrix, L_t ∈ R^{m×p} is the feedback matrix associated with the random resource, and s_t ∈ R^m is an offset. We show that the optimal law (the optimal feedback matrices and the offset) can be computed ahead of time, knowing only the first and second moments of the random resource.
2.3 Optimal policy via dynamic programming
We let V_t(z) denote the optimal objective value (minimum possible value of J) of the truncated problem started in state x_t = z at time period t. We will show that V_t(z) is a convex quadratic function of the form

V_t(z) = (1/2) z^T P_t z + q_t^T z + r_t, \qquad (2.4)

where P_t, q_t, and r_t are found by backward recursion. In the course of evaluating V_t(z), an explicit expression for the optimal control policy appears as an intermediate result.

Expressions for V_t and ϕ⋆. We now describe the derivation of (2.4), obtained by applying the dynamic programming algorithm recursively from some t = N down to t = 0. Suppose at the finite time t = N the final state cost is given by

V_N(z) = (1/2) z^T Q_N z + q_N^T z + c,

so that P_N = Q_N, q_N = q_N, and r_N = c. Now suppose that the cost-to-go from time t + 1 is given by

V_{t+1}(z) = (1/2) z^T P_{t+1} z + q_{t+1}^T z + r_{t+1}.

At time t, we choose u_t = v to minimize the current cost ℓ_t(z, v) plus the optimal cost-to-go from the next state, V_{t+1}(x_{t+1}), while satisfying the (linear equality) constraints revealed in the current time period t. (We assume that g_t, and hence C_t, is known at time t.) For this choice of v, the current cost plus the cost-to-go from x_{t+1} is

(1/2) ℓ_t(z, v) + (1/2)(Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1}^T (Az + Bv + w_t) + r_{t+1}.
Rewriting using the Bellman equation [Bel57], the minimum expected cost-to-go V_t(z) is given by

V_t(z) = \mathbf{E}_{g_t} \left\{ \min_{(v,z) \in C_t} \left\{ (1/2) \ell_t(z, v) + \mathbf{E}_{w_t} V_{t+1}(Az + Bv + w_t) \right\} \right\}, \qquad (2.5)

where

ϕ⋆ = \argmin_{(v,z) \in C_t} \left\{ (1/2) \ell_t(z, v) + \mathbf{E}_{w_t} V_{t+1}(Az + Bv + w_t) \right\}, \qquad (2.6)

the minimizer over v, is the optimal control action. Since the constraint is affine, C_t is a convex set, and so (2.6) is an equality constrained convex quadratic minimization problem, also called a (constrained) quadratic program (QP). To find the coefficients of the value function we must therefore solve (2.6) and then (2.5), in that order. We let T_t be the Bellman operator, with

(T_t V)(z) = \mathbf{E}_{g_t} \left\{ \min_{(v,z) \in C_t} \left\{ (1/2) \ell_t(z, v) + \mathbf{E}_{w_t} V(Az + Bv + w_t) \right\} \right\},

so that

V_t = T_t V_{t+1}.

The operator represents the procedure of solving the QP and then taking the expectation over g_t.
Convergence of P_t depends on the problem dynamics, constraints, and stage costs, i.e., the parameters A, B, Q, R, F, and H. With our problem formulation, i.e., linear dynamics and quadratic stage costs, we have found that P_t always converges to a steady-state value, and that convergence occurs rapidly as t decreases below N. Let P_ss be the steady-state value of P_t and let J_ss be the objective cost using P_ss. Comparing J_ss to the optimal objective value, we find that the steady-state coefficient is optimal (up to very small numerical errors) for a wide range of problems.
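This rapid convergence is easy to observe numerically. As an illustrative analogue (not the constrained recursion derived in §2.4), the following sketch iterates the standard unconstrained LQR Riccati recursion backward until P_t stops changing; A and B are borrowed from the example in §2.5, while Q, R, and the terminal cost are hypothetical choices of ours:

```python
import numpy as np

# Unconstrained LQR Riccati recursion, iterated backward from P_N
# until the coefficient P_t reaches its steady-state value P_ss.
A = np.diag([0.368, 0.905, 0.980])
B = np.diag([0.632, 0.095, 0.020])
Q = np.eye(3)
R = 0.1 * np.eye(3)

P = 10 * np.eye(3)                 # terminal coefficient P_N
for _ in range(500):
    P_prev = P
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(P - P_prev, 'fro') < 1e-10:
        break

P_ss = P   # steady-state coefficient, reached well before 500 steps
```

The constrained recursion of §2.4 is computed the same way, with the KKT-based update for P_t replacing the standard Riccati update.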
2.4 Evaluating the Bellman operator
In this section we describe an explicit evaluation of the Bellman operator, i.e., of solving the QP and evaluating the expectation over the random resource.
We solve the (equality constrained) QP

ϕ⋆ = \argmin_{(v,z) \in C_t} \left\{ (1/2) \ell_t(z, v) + \mathbf{E}_{w_t} V_{t+1}(Az + Bv + w_t) \right\},

using Lagrange multipliers. Let L(v, λ) be the partial Lagrangian

L(v, λ) = (1/2) \ell_t(z, v) + \mathbf{E}_{w_t} V_{t+1}(Az + Bv + w_t) + λ^T (g_t − Fv − Hz),

where λ_i, i = 1, \ldots, p, are Lagrange multipliers. For v optimal, we must have

∇_v L = (1/2) ∇_v \ell_t(z, v) + \mathbf{E}_{w_t} ∇_v V_{t+1}(Az + Bv + w_t) − F^T λ = 0,

and

∇_λ L = g_t − Fv − Hz = 0.

From here we work out the KKT conditions to be

\begin{bmatrix} R + B^T P_{t+1} B & F^T \\ F & 0 \end{bmatrix} \begin{bmatrix} v \\ λ \end{bmatrix} = \begin{bmatrix} -(B^T P_{t+1} A + M^T) z - (B^T q_{t+1} + r) \\ g_t - Hz \end{bmatrix}.
If we let

Z = \begin{bmatrix} R + B^T P_{t+1} B & F^T \\ F & 0 \end{bmatrix}^{-1},

we have

\begin{bmatrix} v \\ λ \end{bmatrix} = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} \begin{bmatrix} -(B^T P_{t+1} A + M^T) z - (B^T q_{t+1} + r) \\ g_t - Hz \end{bmatrix},

where Z_{11}, Z_{12}, Z_{21}, Z_{22} are the block entries of Z. We can now write the optimal input ϕ⋆ in terms of the entries of Z as

ϕ⋆ = v = -Z_{11}(B^T P_{t+1} A + M^T) z - Z_{12} H z + Z_{12} g_t - Z_{11}(B^T q_{t+1} + r).

Comparing with (2.3), the feedback matrices and offset are

K_t = -(Z_{11}(B^T P_{t+1} A + M^T) + Z_{12} H),
L_t = Z_{12},
s_t = -Z_{11}(B^T q_{t+1} + r).
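This construction can be checked numerically. The sketch below uses hypothetical problem data (n = m = 3, p = 1, F = 1^T as in the allocation example, H chosen at random): it forms the KKT matrix, inverts it, reads off the blocks, and verifies that the resulting policy satisfies the constraint identically in the state and resource.

```python
import numpy as np

# Numerical check of the policy components K_t, L_t, s_t.
rng = np.random.default_rng(0)
n, m, p = 3, 3, 1
A = np.diag([0.368, 0.905, 0.980])
B = np.diag([0.632, 0.095, 0.020])
F = np.ones((p, m))
H = rng.standard_normal((p, n))
R = 0.1 * np.eye(m)
M = np.zeros((n, m))
r = np.zeros(m)
G0 = rng.standard_normal((n, n))
P_next = G0 @ G0.T + np.eye(n)          # some P_{t+1} > 0
q_next = rng.standard_normal(n)

# Invert the KKT matrix and read off its upper blocks.
KKT = np.block([[R + B.T @ P_next @ B, F.T],
                [F, np.zeros((p, p))]])
Z = np.linalg.inv(KKT)
Z11, Z12 = Z[:m, :m], Z[:m, m:]

K = -(Z11 @ (B.T @ P_next @ A + M.T) + Z12 @ H)
L = Z12
s = -Z11 @ (B.T @ q_next + r)

# The policy u = K x + L g + s satisfies F u + H x = g for all (x, g),
# because the inverse identity gives F Z11 = 0 and F Z12 = I.
assert np.allclose(F @ K + H, 0)
assert np.allclose(F @ L, np.eye(p))
assert np.allclose(F @ s, 0)
```

The three assertions are exactly the feasibility identities noted at the end of this section.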
The above are the components of the optimal control law. To evaluate the coefficients of our value function recursion, we must now substitute the optimal input ϕ⋆ into (2.5). We omit the cumbersome algebra and rewrite the resulting expression in quadratic form:

T_t V_{t+1} = \mathbf{E}_{g_t}\, \frac{1}{2} \begin{bmatrix} z \\ g_t \\ 1 \end{bmatrix}^T \begin{bmatrix} Y_{11} & Y_{12} & Y_{13} \\ Y_{12}^T & Y_{22} & Y_{23} \\ Y_{13}^T & Y_{23}^T & Y_{33} \end{bmatrix} \begin{bmatrix} z \\ g_t \\ 1 \end{bmatrix},
where the symmetrized entries of Y are as follows:

Y_{11} = Q + 2 M K_t + K_t^T R K_t + (A + B K_t)^T P_{t+1} (A + B K_t),
Y_{12} = M L_t + K_t^T R L_t + (A + B K_t)^T P_{t+1} B L_t,
Y_{13} = M s_t + K_t^T R s_t + (A + B K_t)^T (P_{t+1} B s_t + q_{t+1}) + K_t^T r + q,
Y_{22} = L_t^T (R + B^T P_{t+1} B) L_t,
Y_{23} = L_t^T R s_t + (B L_t)^T (P_{t+1} B s_t + q_{t+1}) + L_t^T r,
Y_{33} = s_t^T (R + B^T P_{t+1} B) s_t + 2 (r^T + q_{t+1}^T B) s_t + 2 r_{t+1} + \mathbf{Tr}(W P_{t+1}) + c.
Finally, the coefficients of (2.4) are given by

P_t = Y_{11}, \qquad (2.7)
q_t = Y_{12} \bar{g}_t + Y_{13}, \qquad (2.8)
r_t = (1/2) \mathbf{Tr}(G_t Y_{22}) + Y_{23}^T \bar{g}_t + (1/2) Y_{33}. \qquad (2.9)
Therefore, for

V_t(z) = (1/2) z^T P_t z + q_t^T z + r_t,

we get

P_t = Q + 2 M K_t + K_t^T R K_t + (A + B K_t)^T P_{t+1} (A + B K_t),
q_t = (M L_t + K_t^T R L_t + (A + B K_t)^T P_{t+1} B L_t) \bar{g}_t + M s_t + K_t^T R s_t + (A + B K_t)^T (P_{t+1} B s_t + q_{t+1}) + K_t^T r + q,
r_t = (1/2) \mathbf{Tr}(G_t L_t^T (R + B^T P_{t+1} B) L_t) + (L_t^T R s_t + (B L_t)^T (P_{t+1} B s_t + q_{t+1}) + L_t^T r)^T \bar{g}_t + (1/2) s_t^T (R + B^T P_{t+1} B) s_t + (r^T + q_{t+1}^T B) s_t + r_{t+1} + (1/2) \mathbf{Tr}(W P_{t+1}) + (1/2) c,

where we have substituted the optimal control

ϕ⋆ = K_t z + L_t g_t + s_t.

Note that feasibility of the policy for every (z, g_t) requires

F K_t + H = 0, \qquad F L_t = I, \qquad F s_t = 0.
2.4.1 Remarks
In general there is no closed-form analytical solution for a constrained QP. There are, however, very efficient methods for solving QPs with linear equality constraints, including the Lagrangian duality method we employed above. For more on quadratic optimization problems and efficient solution methods see, e.g., [BL00, BNO03, BTN01, Tuy98, BV04].
The existence of an analytic solution follows from our problem formulation. To see this, note that the solution set of any consistent system of linear equations can be written as an affine set [BV04, §2]. This applies to our problem, since we have assumed that the equality constraints are consistent, i.e., a feasible point (u_t, x_t) always exists. Suppose we parameterize the constraint set C_t explicitly using a free parameter y ∈ R^k. We can write the set elements (for fixed x) as

\{(u, x) \mid u = G y + \bar{u}\},

where \bar{u} is any particular solution of F \bar{u} + H x_t = g_t, and the columns of G ∈ R^{m×k} lie in the nullspace of F, \mathcal{N}(F). This says that the solution set is an affine set whose offset is determined by the random resource.
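The parameterization is easy to realize numerically via the SVD. In the sketch below (sizes and data are hypothetical, with p = 1), the trailing right singular vectors of F give a basis G of the nullspace, and every choice of the free parameter y yields a feasible input:

```python
import numpy as np

# Nullspace parameterization {u | F u + H x = g} = {G y + u_bar | y free}.
rng = np.random.default_rng(0)
m, n, p = 3, 3, 1
F = np.ones((p, m))
H = rng.standard_normal((p, n))
x = rng.standard_normal(n)
g = np.array([1.5])

u_bar = np.linalg.pinv(F) @ (g - H @ x)   # one particular solution
_, _, Vt = np.linalg.svd(F)
G = Vt[p:].T                               # columns span N(F); k = m - p

# Any choice of the free parameter y in R^k gives a feasible input.
for y in rng.standard_normal((5, m - p)):
    u = G @ y + u_bar
    assert np.allclose(F @ u + H @ x, g)
```

(The rows of Vt beyond the rank of F span its nullspace; this assumes F has full row rank p, which is exactly the feasibility requirement noted in §2.1.)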
2.5 Numerical example
We consider an example where at each time period t, a random resource g_t ∈ R is to be divided into n parts. For example, the resource could be capital to be distributed among n different assets, or the capacity of a wireless communications link to be shared by n underlying applications (flows). (Note that F = 1^T and H = 0 define the constraint region accordingly.)
2.5.1 Stochastic flow control on a wireless link
Let u_t be the flow rate allocation for flows supported on a wireless link. We let x_{t+1} = A x_t + B u_t be the utility state of the flows. At each time t, the wireless link capacity is given by the random variable g_t (we must have 1^T u_t = g_t). System performance is measured via a concave quadratic utility function of the utility state,

\ell(u, x) = -(x - x^{tar})^T Q (x - x^{tar}),

where x^{tar} is a specified target utility state, and there is no cost (utility) associated with the choice of input, i.e., R = 0.

Our numerical instance has n = 3 flows sharing one link, whose capacity g_t is exponentially distributed with mean E g_t = 1.5. We associate with each flow a concave quadratic utility U_j(x_j) = -(x_j - 1)^2, i.e., the target utility state is x^{tar} = 1.
We take Q_N = 10 I. The dynamics and input matrices are

A = \begin{bmatrix} 0.368 & 0 & 0 \\ 0 & 0.905 & 0 \\ 0 & 0 & 0.980 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.632 & 0 & 0 \\ 0 & 0.095 & 0 \\ 0 & 0 & 0.020 \end{bmatrix}.
Results. We compute the optimal control law using the convex quadratic value function (2.4) and the coefficients given in (2.7)–(2.9). The optimal control law is

u_t = \begin{bmatrix} -0.576 & 0.103 & 0.534 \\ 0.054 & -8.627 & 4.586 \\ 0.522 & 8.525 & -5.120 \end{bmatrix} x_t + \begin{bmatrix} 0.010 \\ 0.093 \\ 0.897 \end{bmatrix} g_t + \begin{bmatrix} 0.453 \\ 2.355 \\ -2.808 \end{bmatrix}.
We compare our policy to a simple equal sharing scheme: at each time period t, the available (random) resource is divided equally among the flows on the wireless link, i.e.,

u_t^s = (1/3, 1/3, 1/3)\, g_t.
Figure 2.1 shows sample flow trajectories obtained using the optimal control law (red) along with flow trajectories of the simple sharing scheme (blue). Figure 2.2 shows the resulting state trajectories. We plot the target trajectory (dashed) along with trajectories corresponding to the optimal policy (red) and the equal sharing policy (blue).
2.6 Conclusion
In this chapter we computed the optimal control law for a nonstandard linear stochastic
control problem with convex quadratic costs (to be minimized) and random constraints
on the system inputs. Using stochastic dynamic programming, a Riccati-like
recursion, and Lagrange multipliers, we show that the optimal controller is a
simple affine function of the augmented system state, i.e., the system state concatenated
with the random resource constraint (known at decision time). We show also
Figure 2.1: Sample flow allocation trajectories, optimal policy (red), equal sharing policy (blue).
Figure 2.2: Sample utility state trajectories, optimal policy (red), equal sharing policy (blue).
Figure 2.3: Error histograms for flow 1. Top. Absolute difference between the optimal policy utility state and the target utility state. Bottom. Absolute difference between the equal sharing utility state and the target utility state.
Figure 2.4: Error histograms for flow 2. Top. Absolute difference between the optimal policy utility state and the target utility state. Bottom. Absolute difference between the equal sharing utility state and the target utility state.
Figure 2.5: Error histograms for flow 3. Top. Absolute difference between the optimal policy utility state and the target utility state. Bottom. Absolute difference between the equal sharing utility state and the target utility state.
that we can compute the optimal policy knowing only the first and second moments
of the random resource constraints.
We illustrate the optimal controller's operation on a simple wireless link example
in which the link capacity (set by a random channel gain) is the random resource
to be allocated. The optimal controller gives significantly better results than an
equal allocation controller.
Chapter 3
Adaptive Modulation with Smoothed
Flow Utility
We consider the problem of choosing the data flow rate on a wireless link with ran-
domly varying channel gain, to optimally trade off average transmit power and the
average utility of the smoothed data flow rate. The smoothing allows us to model the
demands of an application that can tolerate variations in flow over a certain time in-
terval; we will see that this smoothing leads to a substantially different optimal data
flow rate policy than without smoothing. We pose the problem as a discrete-time
infinite-horizon stochastic control problem with linear dynamics, and convex objec-
tive. For the case of a single flow, the optimal data flow rate policy can be numerically
computed using stochastic dynamic programming and gridding. For the case of mul-
tiple flows on a single link, where dynamic programming becomes intractable due to
the curse of dimensionality, we propose an approximate dynamic programming ap-
proach to obtain suboptimal data flow rate policies. We illustrate, through numerical
examples, that these approximate policies can perform very well.
22
CHAPTER 3. SMOOTHING 23
3.1 Background
In the wireless communications literature, varying a link’s transmit rate (and power)
depending on channel conditions is called adaptive modulation; see, e.g., [Hay68,
Cav72, Hen74, WS95, GC97]. Adaptive modulation (and coding) schemes work with
the premise that channel state information is available at the transmitter, either via
feedback from the receiver, or using channel estimation techniques at the transmit-
ter. One drawback of adaptive modulation is that it is a physical layer optimization
technique with no knowledge of optimization protocols and techniques in the upper
layers.
Maximizing a total utility function is also very common in various communica-
tions and networking problem formulations, where it is referred to as network utility
maximization (NUM); see, e.g., [KMT97, LL99, CLCD07, NML08, CXH+10]. In the
NUM framework, performance of an upper layer protocol (e.g., TCP) is determined
by utility of flow attributes, e.g., utility of link flow rate.
Our setup involves both adaptive modulation and utility maximization, but is
nonstandard in several respects: We consider the utility of the smoothed flows, and
we consider multiple flows over the same wireless link [OABG10, AB10, ABO10].
3.2 Problem setup
(Diagram: a transmitter TX sends to a receiver RX over a fading channel; channel state information is available at the transmitter.)
3.2.1 Average smoothed flow utility
A wireless communication link supports n data flows in a channel that varies with
time, which we model using discrete-time intervals t = 0, 1, 2, . . .. We let ft ∈ Rn+ be
the data flow rate vector on the link, where (ft)j , j = 1, . . . , n, is the jth flow’s data
rate at time t and R+ denotes the set of nonnegative numbers. We let Ft = 1Tft
denote the total flow rate over all flows, where 1 is the vector with all entries one.
The flows, and the total flow rate, will depend on the random channel gain (through
the flow policy, described below) and so are random variables.
We will work with a smoothed version of the flow rates, which is meant to capture
the tolerance of the applications using the data flows to time variations in data rate.
This was introduced in [TZB08] using delivery contracts, in which the utility is a
function of the total flow over a given time interval; here, we use instead a very
simple first order linear smoothing. At each time t, the smoothed data flow rate
vector st ∈ Rn+ is given by
st+1 = Θst + (I −Θ)ft, t = 0, 1, . . . ,
where Θ = diag(θ), and θj ∈ [0, 1), j = 1, . . . , n, is the smoothing parameter for the
jth flow, and we take s0 = 0. Thus we have

(st)j = Σ_{τ=0}^{t−1} (1 − θj) θj^{t−1−τ} (fτ)j ,

where at time t, each smoothed flow rate (st)j is the exponentially weighted average
of the previous flow rates.
The smoothing parameter θj determines the level of smoothing on flow j. Small
smoothing parameter values (θj close to zero) correspond to light smoothing; large
values (θj close to one) correspond to heavy smoothing. (Note that θj = 0 means that
flow j is not smoothed; we have (st+1)j = (ft)j.) The level of smoothing can be related
to the time scale over which the smoothing occurs. We define Tj = 1/ log(1/θj) to
be the smoothing time associated with flow j. Roughly speaking, the smoothing time
is the time interval over which the effect of a flow on the smoothed flow decays by
a factor 1/e. Light smoothing corresponds to short smoothing times, while heavy
smoothing corresponds to longer smoothing times.
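The recursion and the 1/e interpretation of the smoothing time are easy to check numerically. A minimal sketch (single flow; the flow data, a unit burst at t = 0, is purely illustrative):

```python
import math

# Sketch of the first-order smoothing recursion and the 1/e interpretation
# of the smoothing time T = 1/log(1/theta); the flow data is illustrative.
theta = 0.9
T = 1/math.log(1/theta)            # smoothing time, about 9.5 periods here

f = [1.0] + [0.0]*50               # a unit burst of flow at t = 0, then none
s, traj = 0.0, []
for ft in f:                       # s_{t+1} = theta*s_t + (1 - theta)*f_t
    s = theta*s + (1 - theta)*ft
    traj.append(s)

# The influence of the burst decays geometrically; over roughly T further
# periods it shrinks by a factor of about 1/e.
ratio = traj[5 + round(T)]/traj[5]
```

Here `ratio` comes out near 1/e ≈ 0.37, consistent with the definition of T.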
We associate with each smoothed flow rate (st)j a strictly concave nondecreasing
differentiable utility function Uj : R+ → R, where the utility of (st)j is Uj((st)j).
The average utility derived over all flows, over all time, is

U = lim_{N→∞} E (1/N) Σ_{t=0}^{N−1} U(st),
where U(st) = U1((st)1)+ · · ·+Un((st)n). Here, the expectation is over the smoothed
flows st, and we are assuming that the expectations and limit above exist.
While most of our results will hold for more general utilities, we will focus on the
family of power utility functions, defined for x ≥ 0 as
U(x) = βxα, (3.1)
parameterized by α ∈ (0, 1) and β > 0. The parameter α sets the curvature (or risk
aversion), while β sets the overall weight of the utility. (For small values of α, U
approaches a log utility.)
Before proceeding, we make some general comments on our use of smoothed flows.
The smoothing can be considered a type of time averaging; then we apply a concave
utility function; and finally, we average this utility. The time averaging and utility
function operations do not commute, except in the case when the utility is linear (or
affine). Jensen's inequality tells us that the average smoothed utility is greater than or
equal to the average utility applied directly to the flow rates, i.e.,

U((st)j) ≥ Σ_{τ=0}^{t−1} (1 − θj) θj^{t−1−τ} U((fτ)j).

So the time smoothing step does affect our average utility; we will see later that it
has a dramatic effect on the optimal flow policy.
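The inequality above is easy to verify numerically for a concave power utility with U(0) = 0; a small sketch (the flow data and smoothing parameter are illustrative), using the smoothing weights (1 − θ)θ^{t−1−τ}:

```python
import math

# Numerical check of Jensen's inequality for the smoothed flow:
# utility of the smoothed flow >= smoothed utility of the flows,
# for concave U with U(0) = 0. Flow data is illustrative.
theta, t = 0.8, 12
U = lambda x: math.sqrt(x)
f = [0.5, 2.0, 0.1, 1.5, 0.0, 3.0, 0.7, 0.2, 1.1, 0.9, 2.5, 0.4]

w = [(1 - theta)*theta**(t - 1 - tau) for tau in range(t)]  # smoothing weights
s_t = sum(wi*fi for wi, fi in zip(w, f))     # smoothed flow at time t
lhs = U(s_t)                                 # utility of the smoothed flow
rhs = sum(wi*U(fi) for wi, fi in zip(w, f))  # smoothed utility of the flows
```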
3.2.2 Average power
We model the wireless channel with time-varying positive gain parameters gt, t =
0, 1, . . ., which we assume are independent identically distributed (IID), with known
distribution. At each time t, the gain parameter affects the power Pt required to
support the total data flow rate Ft. The power Pt is given by
Pt = φ(Ft, gt),
where φ : R+ × R++ → R+ is increasing and strictly convex in Ft for each value of
gt (R++ is the set of positive numbers).
While our results will hold for the more general case, we will focus on the more
specific power function described here. We suppose that the signal-to-interference-
and-noise (SINR) of the channel is given by gtPt. (Here gt includes the effect of
time-varying channel gain, noise, and interference.) The channel capacity is then
µ log(1 + gtPt), where µ is a constant; this must equal at least the total flow rate Ft,
so we obtain
Pt = φ(Ft, gt) = (e^{Ft/µ} − 1)/gt.   (3.2)
The total average power is

P = lim_{N→∞} E (1/N) Σ_{t=0}^{N−1} Pt,
where again, we are assuming that the expectations and limit exist.
3.2.3 Flow rate control problem
The overall objective is to maximize a weighted difference between average utility and
average power,
J = U − λP , (3.3)
where λ ∈ R++ is used to trade off average utility and power.
We require that the flow policy is causal, i.e., when ft is chosen, we know the
previous and current values of the flows, smoothed flows, and channel gains. Standard
arguments in stochastic control (see, e.g., [Ber05, Ber07, Åst70, Whi82, BS96]) can
be used to conclude that, without loss of generality, we can assume that the flow
control policy has the form
ft = ϕ(st, gt), (3.4)
where ϕ : Rn+ × R++ → Rn+. In other words, the policy depends only on the current
smoothed flows, and the current channel gain value.
The flow rate control problem is to choose the flow rate policy ϕ to maximize the
overall objective in (3.3). This is a standard convex stochastic control problem, with
linear dynamics.
3.2.4 Our results
We let J⋆ be the optimal overall objective value, and ϕ⋆ be an optimal policy. We
will show that in the general (multiple flow) case, the optimal policy includes a "no-transmit"
zone, i.e., a region in the (st, gt) space in which the optimal flow rate is
zero. Not surprisingly, the optimal flow policy can be roughly described as waiting
until the channel gain is large, or until the smoothed flow has fallen to a low level, at
which point we transmit (i.e., choose nonzero ft). Roughly speaking, the higher the
level of smoothing, the longer we can afford to wait for a large channel gain before
transmitting. The average power required to support a given utility level decreases,
sometimes dramatically, as the level of smoothing increases.
We show that the optimal policy for the case of a single flow is readily computed
numerically, working from Bellman’s [Bel57] characterization of the optimal policy,
and is not particularly sensitive to the details of the utility functions, smoothing
levels, or power functions.
For the case of multiple flows, we cannot easily compute (or even represent) the
optimal policy. For this case we propose an approximate policy, based on approximate
dynamic programming [BT96, Pow07]. By computing an upper bound on J⋆, by
allowing the flow control policy to use future values of channel gain (i.e., relaxing the
causality requirement [BSS10]), we show in numerical experiments that such policies
are nearly optimal.
3.3 Optimal policy characterization
3.3.1 No smoothing
We first consider the special case Θ = 0, in which case there is no smoothing. Then
we have st = ft−1, so the average smoothed utility is the same as the average
utility, i.e.,

U = lim_{N→∞} E (1/N) Σ_{t=0}^{N−1} U(ft).

In this case the optimal policy is trivial, since the stochastic control problem reduces
to a simple optimization problem at each time step. At time t, we simply choose ft
to maximize U(ft)− λPt. Thus we have
ϕ(st, gt) = argmax_{ft ≥ 0} (U(ft) − λPt),
which does not depend on st. A simple and effective approach is to pre-solve this
problem for a suitably large set of values of the channel gain gt, and store the resulting
tables of individual flow rates (ft)i versus gt; online we can interpolate between points
in the table to find the (nearly) optimal policy. Another option is to fit a simple
function to the optimal flow rate data, and use this function as our (nearly) optimal
policy.
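The tabulate-and-interpolate idea can be sketched as follows (Python; the utility and power parameters, the gain grid, and the brute-force inner solver are illustrative choices, not the thesis's implementation):

```python
import math

# Tabulate-and-interpolate sketch for the no-smoothing policy, with the
# power-law utility (3.1) and power function (3.2); parameter values
# (alpha, beta, lambda, mu, the gain grid) are illustrative.
alpha, beta, lam, mu = 0.5, 1.0, 1.0, 1.0
obj = lambda f, g: beta*f**alpha - lam*(math.exp(f/mu) - 1)/g

def solve(g, f_hi=5.0, n=4000):
    # Offline: brute-force maximize U(f) - lam*phi(f, g) on a fine grid.
    return max((obj(i*f_hi/n, g), i*f_hi/n) for i in range(n + 1))[1]

gains = [0.1*i for i in range(1, 51)]     # pre-solve for g in [0.1, 5.0]
table = [solve(g) for g in gains]

def policy(g):
    # Online: linear interpolation between tabulated channel gains.
    g = min(max(g, gains[0]), gains[-1])
    i = min(int((g - gains[0])/0.1), len(gains) - 2)
    w = (g - gains[i])/0.1
    return (1 - w)*table[i] + w*table[i + 1]
```

The interpolated policy agrees closely with a direct solve at off-grid gains, since the optimal flow rate varies smoothly with g.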
For future reference, we note that the problem can also be solved using a water-
filling method (see, e.g., [BV04, §5.5]). Dropping the time index t, and using j to
denote the flow index, we must solve the problem
maximize   Σ_{j=1}^{n} Uj(fj) − λφ(F, g)
subject to F = 1T f,  f ≥ 0,                 (3.5)
with variables fj and F . Introducing a Lagrange multiplier ν for the equality con-
straint (which we can show must be nonnegative, using monotonicity of φ with F ),
we are to maximize

Σ_{j=1}^{n} Uj(fj) − λφ(F, g) + ν(F − 1T f)

over fj ≥ 0. This problem is separable in fj and F, so we can maximize over fj and
F separately. We find that

fj = argmax_{w ≥ 0} (Uj(w) − νw),   j = 1, . . . , n,

and

F = argmax_{y ≥ 0} (νy − λφ(y, g)).
(Each of these can be expressed in terms of conjugate functions; see, e.g., [BV04,
§3.3].) We then adjust ν (say, using bisection) so that 1Tf = F . An alternative is
to carry out bisection on ν, defining fj in terms of ν as above, until λφ′(1Tf, g) = ν,
where φ′ refers to the derivative with respect to y.
For our particular power law utility functions (3.1) we can give an explicit formula
for fj in terms of ν:
fj = (αjβj/ν)^{1/(1−αj)}.
For our particular power function (3.2), we use bisection to find the value of ν that
yields
1T f = µ log(νµg/λ),
where the flow values come from the equation above. (The left-hand side is decreasing
in ν, while the right-hand side is increasing.)
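A minimal sketch of this water-filling scheme for the power-law case, with bisection on ν (Python; the parameter values in the usage line at the end are hypothetical):

```python
import math

# Water-filling sketch for problem (3.5) with power-law utilities
# U_j(f) = beta_j*f^alpha_j and power function phi(F, g) = (e^(F/mu) - 1)/g.
# We bisect on the multiplier nu until 1^T f = mu*log(nu*mu*g/lam).
def waterfill(alpha, beta, lam, g, mu=1.0, iters=60):
    def flows(nu):
        return [(a*b/nu)**(1.0/(1.0 - a)) for a, b in zip(alpha, beta)]
    def gap(nu):
        r = nu*mu*g/lam
        F = mu*math.log(r) if r > 1 else 0.0   # link's optimal total rate
        return sum(flows(nu)) - F              # decreasing in nu
    lo = lam/(mu*g)           # here the right-hand side is zero, so gap >= 0
    hi = lo
    while gap(hi) > 0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    nu = 0.5*(lo + hi)
    return flows(nu), nu

f, nu = waterfill([0.5, 0.6], [1.0, 2.0], 1.0, 1.5)   # illustrative values
```

At the returned ν, the individual flow rates satisfy Uj'(fj) = ν and the total flow matches the link's optimal rate, as required by the separable maximization above.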
3.3.2 General case
We now consider the more general case, with smoothing. We can characterize the
optimal flow rate policy ϕ⋆ using stochastic dynamic programming [Put94, Ros83a,
Den82, WB09] and a form of Bellman’s equation [Bel57]. The optimal flow rate policy
has the form
ϕ⋆(z, g) = argmax_{w ≥ 0} ( V(Θz + (I − Θ)w) − λφ(1Tw, g) ),   (3.6)
where V : Rn+ → R is the Bellman (relative) value function. The value function (and
optimal value) is characterized via the fixed point equation
J⋆ + V = T V, (3.7)
where for any function W : Rn+ → R, the Bellman operator T is given by

(TW)(z) = U(z) + E ( max_{w ≥ 0} ( W(Θz + (I − Θ)w) − λφ(1Tw, g) ) ),
where the expectation is over g. The fixed point equation and Bellman operator are
invariant under adding a constant; that is, we have T (W + a) = TW + a, for any
constant (function) a, and similarly, V satisfies the fixed point equation if and only
if V + a does. So without loss of generality we can assume that V (0) = 0.
3.3.3 Value iteration
In this section we describe the numerical value iteration algorithm we use for our
average (cost) utility problem. For more technical details on value iteration (and
policy iteration) we refer the reader to [Bla62, Der66, Der70, ABFG+93, Sen96, Ber98,
Ber05, Ber07].
System space. We describe an algorithm for the bounded, finite-cardinality,
one dimensional case. The same procedure extends to multiple dimensions.
We let X ⊂ R be a finite discretization of the smoothed flow rate space, with
cardinality |X| = N. We let U ⊂ R be a finite discretization of the flow rate space,
with cardinality |U| = M. We let G ⊂ R be a finite discretization of the channel gain
space, with cardinality |G| = L; the set G is constructed so that the channel gains
are IID and each element of G is equally probable. For notational convenience we
write xi, i = 1, . . . , N, for the elements of the smoothed flow rate space, uj,
j = 1, . . . , M, for the elements of the flow rate space, and gl, l = 1, . . . , L, for the
elements of the channel gain space, i.e., xi ∈ X, uj ∈ U, gl ∈ G.
Smoothed flow rate (state) transitions. A smoothed flow rate transition (moving
from one smoothed rate to another) occurs when the smoothed flow rate is updated
by the flow rate in the current time period. In this setting, the smoothed flow rate
is the state of the system and the flow rate is the current input. During
operation, each state xi can be acted upon by any input uj. We let the matrix
P ∈ Z^{N×M}, with entries Pij ∈ {1, . . . , N}, represent all possible state transitions
for the system space described above. So here, Pij is the index of the next state (the
state you transition to) after the current state xi is acted upon by input uj (rounded
to the nearest discrete state space value). The matrix P is called the state transition
matrix, and is probabilistic in most dynamic systems. In our problem the transitions
are deterministic, because at any time

xk = θxi + (1 − θ)uj,
where xk ∈ X, i.e., given the current state and the current input, the next state is
fixed. Thus P is time invariant (Pij gives the index k of the next state xk), and can
be computed ahead of time.
Bellman equation. We wish to find a constant J⋆ and a function V that satisfy
the equation
J⋆ + V = T V,
where T V denotes the Bellman operator T applied to V; for any function
W : Rn+ → R,

(TW)(z) = U(z) + E ( max_{w ≥ 0} ( W(Θz + (I − Θ)w) − λφ(1Tw, g) ) ).
We let Vxi = V(xi) denote the value of being in state xi.
Algorithm. Without loss of generality, we assume that Vx1 = 0. Note that as a
result Vxi denotes the value (offset) of state xi relative to Vx1 (the value of state x1).
We let V(k)xi denote the kth iterate of Vxi, and Ṽ(k)xi the kth iterate of (T V)xi.
The kth iterate of J we denote J(k). The value iteration algorithm is as follows:

initialize V(0)xi := 0, i = 1, . . . , N.
repeat
1. Apply Bellman operator. Ṽ(k)xi := (T V(k))xi, i = 1, . . . , N.
2. Update. J(k) := Ṽ(k)x1.
3. Normalize. V(k+1)xi := Ṽ(k)xi − Ṽ(k)x1, i = 1, . . . , N.
4. Stopping criterion. quit when J(k) converges.
Remarks. For technical conditions under which the value function exists, and can
be obtained via value iteration, see, e.g., [Der70, Bla62, ABFG+93]. We will simply
assume here that the value function exists, and J (k) and V (k) converge to J⋆ and V ,
respectively. (The algorithm converged for all the simulations we performed using
several different values for the smoothing parameter and multiple utility function
parameters.)
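The value iteration algorithm can be sketched on a toy single-flow instance (pure Python; the grids, gains, and parameters here are illustrative, not the instances used later in this chapter):

```python
import math

# Toy single-flow instance of the value iteration algorithm above:
# U(s) = sqrt(s), phi(f, g) = (e^f - 1)/g, theta = 0.9, lambda = 1.
theta, lam = 0.9, 1.0
X = [i/20 for i in range(21)]        # smoothed flow rate grid, N = 21
Ug = [i/20 for i in range(41)]       # flow rate grid, f in [0, 2]
G = [0.5, 1.0, 1.5, 2.0]             # equiprobable channel gains

nearest = lambda x: min(range(len(X)), key=lambda i: abs(X[i] - x))

# Deterministic transition table: P[i][j] = index of the grid point
# nearest to theta*x_i + (1 - theta)*u_j.
P = [[nearest(theta*X[i] + (1 - theta)*u) for u in Ug] for i in range(len(X))]
cost = [[(math.exp(u) - 1)/g for g in G] for u in Ug]

V, J = [0.0]*len(X), float("inf")
for k in range(500):
    TV = []
    for i in range(len(X)):
        # Bellman operator: current utility + expected best continuation.
        e = sum(max(V[P[i][j]] - lam*cost[j][l] for j in range(len(Ug)))
                for l in range(len(G)))/len(G)
        TV.append(math.sqrt(X[i]) + e)
    Jnew = TV[0]
    V = [v - TV[0] for v in TV]      # normalize so V(x_1) = 0
    if abs(Jnew - J) < 1e-9:
        break                        # J(k) has converged
    J = Jnew
```

On this instance the iterates behave as claimed in the next paragraph: V is monotone increasing across the grid, and J settles at a positive average utility.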
The iterations above preserve several attributes of the iterates, which we can then
conclude hold for V. First of all, concavity is preserved, i.e., if V(k) is concave,
so is V(k+1). It is clear that normalization does not affect concavity, since we simply
add a constant to the function. The Bellman operator T preserves concavity, since
partial maximization of a function concave in two sets of variables results in a concave
function (see, e.g., [BV04, §3.2]), and expectation over a family of concave functions
yields a concave function; finally, addition (of U) preserves concavity. So we can
conclude that V is concave.
Another attribute that is preserved in value iteration is monotonicity; if V (k) is
monotone increasing (in each component of its argument), then so is V (k+1). We
conclude that V is monotone increasing.
3.3.4 Example value functions
Figures 3.1 and 3.2 show example value functions we obtained using the value iteration
algorithm described above. In the top plot, we keep the utility function constant and
vary the smoothing times. We use α = 1/2, β = 1, and average smoothing times
T = 1, 5, 10, 20, 25, 50.
In the bottom plot, we vary the risk aversion parameter with fixed smoothing param-
eter (fixed average smoothing time). We use
α = 1/10, 1/3, 1/2, 2/3, 3/4,
Figure 3.1: Value functions V(s) for T = 1, 5, 10, 20, 25, 50.
and T = 1, β = 1. In the top figure we see that as T increases, the value functions
have less and less curvature, i.e., the value of being in a state increases from one
value function to the next with increasing T. Roughly, this says that there is a value
reward for waiting. The curves in the bottom plot are harder to interpret by looking
only at the value functions; we will see later that a difference in the risk aversion
parameter has a quantifiable effect on the optimal policy.
3.3.5 No-transmit region
For some values of z and g the optimal action is to not transmit, i.e., ϕ⋆(z, g) = 0.
From the form of the optimal policy, we see that ϕ(z, g) = 0 if and only if w = 0 is
Figure 3.2: Value functions V(s) for α = 1/10, 1/3, 1/2, 2/3, 3/4.
optimal for the (convex) problem
maximize V (Θz + (I −Θ)w)− λφ(1Tw, g)
subject to w ≥ 0,
with variable w ∈ Rn. This is the case if and only if
−λ∇φ(0, g) ∈ ∂(−V (Θz)),
where ∂ denotes the subdifferential.
Differentiable V . For V differentiable, the optimality conditions simplify to give
(I −Θ)∇V (Θz) + λφ′(0, g)1 ≤ 0
(see, e.g., [BV04, p142]). We can rewrite this as
∂V
∂zi(Θz) ≤
λφ′(0, g)
1− θi, i = 1, . . . , n.
Using the specific power function (3.2) associated with the log capacity formula, we
obtain

∇V(Θz) ≤ (λ/(µg)) ( 1/(1 − θ1), . . . , 1/(1 − θn) )   (3.8)
as the necessary and sufficient condition under which ϕ(z, g) = 0. Since ∇V is
decreasing (by concavity of V ), we can interpret (3.8) roughly as follows: Do not
transmit if the channel is bad (g small) or if the smoothed flows are large (z large).
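In the single flow case, condition (3.8) gives an explicit channel-gain threshold below which the optimal action is to not transmit. A sketch with a brute-force cross-check, assuming a power-law approximation V(s) = βs^α of the value function (of the kind fitted in the next section; all parameter values here are illustrative):

```python
import math

# No-transmit threshold implied by (3.8) in the single flow case, for a
# power-law approximation V(s) = beta*s^alpha (illustrative parameters).
theta, alpha, beta, lam, mu, z = 0.37, 0.6, 1.7, 1.0, 1.0, 0.5

Vp = alpha*beta*(theta*z)**(alpha - 1)    # V'(theta*z)
g_thresh = lam/(mu*(1 - theta)*Vp)        # no transmission iff g <= g_thresh

def best_w(g, w_hi=3.0, n=3000):
    # Brute-force maximizer of V(theta*z + (1-theta)*w) - lam*phi(w, g).
    obj = lambda w: (beta*(theta*z + (1 - theta)*w)**alpha
                     - lam*(math.exp(w/mu) - 1)/g)
    return max((obj(i*w_hi/n), i*w_hi/n) for i in range(n + 1))[1]
```

For gains below the threshold the brute-force maximizer sits at w = 0; above it, a strictly positive flow rate is optimal.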
3.4 Single flow case
3.4.1 Optimal policy
In the case of a single flow (i.e., n = 1) we can easily carry out value iteration numer-
ically, by discretizing the argument z and values of g, and computing the expectation
and maximization numerically. For the single flow case, then, we can compute the
optimal policy and optimal performance (up to small numerical integration errors).
3.4.2 Power law suboptimal policy
We replace the optimal value function (in the above optimal flow policy expression)
with a simple analytic approximation of the value function to get the approximate
policy
ϕ(z, g) = argmaxw≥0
(
V (θz + (1− θ)w)− λφ(w, g))
, (3.9)
where V (z) is an approximation of the value function.
Since V is increasing, concave, and satisfies V(0) = 0, it is reasonable to fit it with
a power law function as well, say V̂(z) = βz^α, with β > 0, α ∈ (0, 1). For example,
we can find the minimax (Chebyshev) fit by varying α; for each α we choose β to
minimize

max_i | Vi − β zi^α |,

where zi are the discretized values of z, with associated value function values Vi. We
do this by bisection on β.
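The fitting step above can be sketched as follows (Python; here the inner minimization over β uses ternary search, which exploits convexity of the max error in β and plays the same role as the bisection mentioned in the text):

```python
# Chebyshev (minimax) power-law fit of a sampled value function:
# for each candidate alpha, choose beta minimizing max_i |V_i - beta*z_i^alpha|.
# The inner objective is convex piecewise-linear in beta, so ternary search
# (equivalently, bisection on its derivative) finds the minimizer.
def cheb_beta(z, V, a, lo=0.0, hi=100.0, iters=200):
    err = lambda b: max(abs(v - b*x**a) for x, v in zip(z, V))
    for _ in range(iters):
        m1, m2 = lo + (hi - lo)/3, hi - (hi - lo)/3
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return 0.5*(lo + hi)

def powerlaw_fit(z, V, alphas):
    fits = [(cheb_beta(z, V, a), a) for a in alphas]
    err = lambda b, a: max(abs(v - b*x**a) for x, v in zip(z, V))
    return min(fits, key=lambda t: err(*t))   # returns (beta, alpha)
```

On data generated from an exact power law the fit recovers the parameters, which is the easy sanity check; on real value-function samples it returns the best minimax power-law approximation over the candidate exponents.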
Experiments show that these power law functions are, in general, reasonable
approximations of the value function. For our power law utilities, these
approximations yield very good matches to the true value function. We let Ĵ be
the objective value obtained using our power law approximate suboptimal policy ϕ̂.
θ     0.37   0.90   0.98
J⋆    0.26   0.32   0.34
Ĵ     0.25   0.31   0.32
η     4%     3.2%   6.3%

Table 3.1: Comparing the optimal objective value J⋆ with Ĵ.

(Recall that J⋆ is the optimal objective value associated with ϕ⋆.) To determine how
well our power law policy performs, we run both controllers and compare objective
values. Table 3.1 shows Ĵ and J⋆ for different θ (η is the percent error in objective
value). Clearly ϕ̂ is a good approximation of ϕ⋆.
For other concave utilities, the approximation is not as accurate; but experiments
show that the associated approximate policies still yield nearly optimal performance.
We can derive an explicit expression for the approximate policy (3.9) for our power
function:

ϕ̂(z, g) = max{ κ(z, g) − γz, 0 },   (3.10)

where γ = θ/(1 − θ) and

κ(z, g) = µ(1 − α) W( (µ(1 − α)(1 − θ))^{−1} e^{γz/(µ(1−α))} ( αβ(1 − θ)µg/λ )^{1/(1−α)} ),

and W is the Lambert function, i.e., W(a) is the solution of w e^w = a [CGH+96].
Note that this suboptimal policy is not needed in the single flow case, since we can
obtain the optimal policy numerically. However, we found that the difference between
our power law policy and the optimal policy (see the example value functions below) is
small enough that in practice they are virtually the same. The approximate policy
is needed in the case of multiple flows.
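A sketch of evaluating this explicit policy (Python; the Lambert W function is computed by a simple Newton iteration rather than a library call, and the parameter values used in the check are illustrative):

```python
import math

def lambert_w(a, iters=50):
    # Principal branch of the Lambert W function for a >= 0, via Newton's
    # method on w*e^w = a, starting from w = log(1 + a).
    w = math.log(1.0 + a)
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w*ew - a)/(ew*(1.0 + w))
    return w

def policy(z, g, theta, alpha, beta, lam, mu=1.0):
    # Approximate policy (3.10): transmit kappa(z, g) - gamma*z when positive.
    gamma = theta/(1.0 - theta)
    arg = (math.exp(gamma*z/(mu*(1.0 - alpha)))
           * (alpha*beta*(1.0 - theta)*mu*g/lam)**(1.0/(1.0 - alpha))
           / (mu*(1.0 - alpha)*(1.0 - theta)))
    return max(mu*(1.0 - alpha)*lambert_w(arg) - gamma*z, 0.0)
```

When the returned flow is positive it satisfies the first-order optimality condition αβ(1 − θ)s^{α−1} = λe^{f/µ}/(µg) with s = θz + (1 − θ)f, which is how the expression for κ is obtained; for small channel gains the policy returns zero, i.e., it is in the no-transmit region.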
3.4.3 Example
In this section we give simple numerical examples to illustrate the effect of smoothing
on the resulting flow rate policy in the single flow case. We consider two examples,
with different levels of smoothing. The first flow is lightly smoothed (T = 1; θ = 0.37),
while the second flow is heavily smoothed (T = 50; θ = 0.98). We use utility function
U(s) = s1/2, i.e., α = 1/2, β = 1 in our utility (3.1). The channel gains gt are IID
exponential variables with mean E gt = 1. We use the power function (3.2), with
µ = 1.
We first consider the case λ = 1. The value functions are shown in figure 3.3,
together with the power law approximations, which are 1.7 s^{0.6} (light smoothing)
and 42.7 s^{0.74} (heavy smoothing). Figure 3.4 shows the optimal policies for the lightly
smoothed flow (θ = 0.37), and the heavily smoothed flow (θ = 0.98). We can see
that the optimal policies are quite different. As expected, the lightly smoothed flow
transmits more often, i.e., has a smaller no-transmit region.
Average power vs average utility. Figure 3.5 further illustrates the difference
between the two flow rate policies. Using values of λ ∈ (0, 1], we computed (via sim-
ulation) the average power-average utility trade-off curve for each flow. As expected,
we can see that the heavily smoothed flow achieves more average utility, for a given
average power, than the lightly smoothed flow. (The heavily smoothed flow requires
less average power to achieve a target average utility.)
Comparing average power. We compare the average power required by each flow
to generate a given average utility. Given a target average utility, we can estimate
the average power required roughly from figure 3.5, or more precisely via simulation
as follows: choose a target average utility, and then run each controller, adjusting
λ separately, until we reach the target utility. In our example, we chose U = 0.7
Figure 3.3: Top. Comparing V (blue) with V̂ (red, dashed) for the lightly smoothed flow. Bottom. Comparing V (blue) with V̂ (red, dashed) for the heavily smoothed flow.
Figure 3.4: Top. Optimal policy ϕ⋆(s, g) for smoothing time T = 1 (θ = 0.37). Bottom. Optimal policy for T = 50 (θ = 0.98).
Figure 3.5: Average utility versus average power; heavily smoothed flow (top, dashed), lightly smoothed flow (bottom).
and found λ = 0.29 for the lightly smoothed flow, and λ = 0.35 for the heavily
smoothed flow. Figure 3.6 shows the associated power trajectories for each flow, along
with the corresponding flow and smoothed flow trajectories. The dashed (horizontal)
line indicates the average power, average flow, and averaged smoothed flow for each
trajectory. Clearly the lightly smoothed flow requires more power than the heavily
smoothed flow, by around 25%: the heavily smoothed flow requires P = 0.7, compared
to P = 0.93 for the lightly smoothed flow.
Figure 3.6: Sample power, flow, and smoothed flow trajectories; lightly smoothed flow (left), heavily smoothed flow (right).
α     P1     P2     η
1/10  0.032  0.013  59%
1/3   0.59   0.39   34%
1/2   0.93   0.70   25%
2/3   1.15   0.97   16%
3/4   1.22   1.08   11%

Table 3.2: Average power required for target U = 0.7, lightly smoothed flow (P1), heavily smoothed flow (P2).
Utility curvature. Table 3.2 shows results from similar experiments using different
values of α, where η = (P1 − P2)/P1. We see that for each α value, as expected, the heavily
smoothed flow requires less power. Note also that η decreases as α increases. This is
not surprising as lower curvature (higher α) corresponds to lower risk aversion.
3.5 A suboptimal policy for the multiple flow case
3.5.1 Approximate dynamic programming policy
In this section we describe a suboptimal policy that can be used in the multiple flow
case. Our proposed policy has the same form as the optimal policy, with the true
value function V replaced with an approximation or surrogate V adp:
ϕadp(z, g) = argmax_{w ≥ 0} ( V adp(Θz + (I − Θ)w) − λφ(1Tw, g) ).   (3.11)
A policy obtained by replacing V with an approximation is called an approximate
dynamic programming (ADP) policy [BT96, Pow07, Man60]. (Note that by this
definition (3.9) is an ADP policy for n = 1.)
We construct V adp in a simple way. We let V̂j : R+ → R denote the power law
approximate value function for the associated single flow problem containing only the
jth flow. (This can be obtained numerically as described above.) We then take

V adp(z) = V̂1(z1) + · · · + V̂n(zn).
This approximate value function is separable, i.e., a sum of functions of the individual
flows, whereas the exact value function is (in general) not. The approximate
policy, however, is not separable; the optimization problem solved to assign flow
rates couples the different flow rates.
In the literature on approximate dynamic programming, the V̂j would be considered
basis functions [SS85, TZ97, dFR03]; here, however, we fix the coefficients of the basis
functions at one. (We have found that very little improvement in the policy is obtained
by optimizing over the coefficients.) Evaluating the approximate policy, i.e., solving
(3.11), reduces to solving the optimization problem

maximize   Σ_{j=1}^{n} V̂j(θjzj + (1 − θj)fj) − λφ(F, g)
subject to F = 1T f,  fj ≥ 0,  j = 1, . . . , n,                 (3.12)
with optimization variables fj, F . This is a convex optimization problem; its special
structure allows it to be solved extremely efficiently, via water-filling. We let
Jadp = Uadp − λPadp

denote the objective value obtained using our ADP policy.
3.5.2 Solution via water-filling
In this section we give a solution of (3.12) using the water-filling method described
earlier in §3.3.1. Recall the single flow power law approximate functions
V̂j(zj) = βj zj^{αj}. Substituting into our expression for V adp we get

V adp(z) = Σ_{j=1}^{n} βj zj^{αj}.
The resulting resource allocation problem is

maximize   Σ_{j=1}^{n} βj(θjzj + (1 − θj)fj)^{αj} − λφ(F, g)
subject to F = 1T f,  fj ≥ 0,  j = 1, . . . , n,

with optimization variables fj, F.
At each time t, we are to maximize

Σ_{j=1}^{n} ( βj(θjzj + (1 − θj)fj)^{αj} − νfj ) − (λφ(F, g) − νF),

over variables fj ≥ 0, where as before, ν > 0 is a Lagrange multiplier associated with
the equality constraint. We can express fj in terms of ν (once ν is chosen):

fj = (1/(1 − θj)) [ ( αjβj(1 − θj)/ν )^{1/(1−αj)} − θjzj ]_+ .
We then use bisection on ν to find the value of ν for which
1T f = µ log(νµg/λ).
Since our surrogate value function is only approximate, there is no reason to solve
this to great accuracy; experiments show that around 5–10 bisection iterations are more than enough.
Each iteration of the water-filling algorithm costs O(n) operations, which means that we can solve (3.12) very quickly.
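The bisection just described is easy to implement. Below is a minimal numpy sketch under the assumptions used in this section: power law value functions V_j(z_j) = β_j z_j^{α_j}, total-rate target F(ν) = μ log(νμg/λ), and θ_j = e^{−1/T_j} (consistent with the θ values quoted elsewhere in the chapter, e.g., θ = 0.37 for T = 1). The function name and the numerical inputs are ours.

```python
import numpy as np

def waterfill(z, theta, alpha, beta, lam, g, mu=1.0, iters=50):
    """Evaluate the ADP flow policy (3.12) by bisection on the
    Lagrange multiplier nu (Section 3.5.2).

    For a given nu the per-flow rates are
      f_j = (1/(1-theta_j)) * [ (alpha_j*beta_j*(1-theta_j)/nu)**(1/(1-alpha_j))
                                - theta_j*z_j ]_+
    and the target total rate is F(nu) = mu*log(nu*mu*g/lam);
    we bisect until 1^T f matches F(nu)."""
    def f_of(nu):
        base = (alpha * beta * (1 - theta) / nu) ** (1.0 / (1.0 - alpha))
        return np.maximum(base - theta * z, 0.0) / (1 - theta)

    def gap(nu):  # decreasing in nu: sum f(nu) - F_target(nu)
        return f_of(nu).sum() - mu * np.log(nu * mu * g / lam)

    lo = lam / (mu * g) * (1 + 1e-12)   # F_target(lo) is ~0
    hi = lo
    while gap(hi) > 0:                   # grow hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if gap(nu) > 0:
            lo = nu
        else:
            hi = nu
    nu = 0.5 * (lo + hi)
    return f_of(nu), nu

# Three-flow example with Table 3.3 parameters for T = {1, 10, 50}.
theta = np.exp(-1.0 / np.array([1.0, 10.0, 50.0]))
alpha = np.array([0.60, 0.74, 0.73])
beta = np.array([1.72, 9.78, 42.70])
f, nu = waterfill(z=np.array([0.1, 0.1, 0.1]), theta=theta,
                  alpha=alpha, beta=beta, lam=1.0, g=1.0)
```

Since the left-hand side of the bisection equation is nonincreasing in ν and the right-hand side is increasing, the root is unique, and each bisection step halves the bracket.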
3.5.3 ADP example
In this section we compute ADP controllers in the multi-flow case for a set of λ values, corresponding to different tradeoffs between average utility and average power.
Our multi-flow example consists of three flows on the single link, i.e., s_t ∈ R³₊, each with a different smoothing time. The total utility is

U(s_t) = (s_t)_1^{1/2} + (s_t)_2^{1/2} + (s_t)_3^{1/2},

corresponding to utility parameter α = 1/2 for each flow. We use power function φ(f_t, g_t) = (e^{1^T f_t} − 1)/g_t, where g_t is exponential with E g_t = 1. We take μ = 1 and 200 discrete values of λ ∈ [0.5, 10]. Finally we take the smoothing times to be T = {1, 10, 50}. Our approximate value function is

V^adp(s_t) = 1.7 (s_t)_1^{0.60} + 9.8 (s_t)_2^{0.75} + 42.7 (s_t)_3^{0.74}.
Constructing our ADP value function is trivial provided we have computed the asso-
ciated single flow power law approximate value functions. Table 3.3 gives the single
flow power law function parameters for T = {1, 2, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50}.
(Note that the example can be easily extended to many more flows.) Each λ > 0 yields an ADP controller, i.e., a point (P^adp, U^adp) in the (P, U) plane. Figure 3.7 shows the ADP controllers obtained using our ADP policy (power law approximate functions). As we would expect, higher λ values correspond to lower total average power. In the following section we characterize ADP performance.
T     α      β
1     0.60    1.72
2     0.64    2.96
3     0.66    4.04
5     0.70    5.30
10    0.74    9.78
15    0.74   13.74
20    0.76   18.39
25    0.74   22.93
30    0.74   27.16
35    0.74   31.65
40    0.74   35.72
45    0.74   39.63
50    0.73   42.70

Table 3.3: Power law approximate function parameters for different smoothing times.
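Constructing the separable approximate value function from Table 3.3 amounts to a table lookup and a sum. A small illustrative sketch (the dictionary and function names are ours; the parameters are copied from Table 3.3):

```python
# (alpha, beta) from Table 3.3, indexed by smoothing time T.
POWER_LAW = {1: (0.60, 1.72), 2: (0.64, 2.96), 3: (0.66, 4.04),
             5: (0.70, 5.30), 10: (0.74, 9.78), 15: (0.74, 13.74),
             20: (0.76, 18.39), 25: (0.74, 22.93), 30: (0.74, 27.16),
             35: (0.74, 31.65), 40: (0.74, 35.72), 45: (0.74, 39.63),
             50: (0.73, 42.70)}

def v_adp(z, smoothing_times):
    """Separable ADP value function: the sum of single-flow power law
    approximations V_j(z_j) = beta_j * z_j**alpha_j."""
    return sum(POWER_LAW[T][1] * zj ** POWER_LAW[T][0]
               for zj, T in zip(z, smoothing_times))

# Three-flow example from Section 3.5.3, T = (1, 10, 50).
value = v_adp([1.0, 1.0, 1.0], [1, 10, 50])   # = 1.72 + 9.78 + 42.70
```

With z = (1, 1, 1) the value is just the sum of the β coefficients, matching the coefficients quoted in the text for T = {1, 10, 50}.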
Figure 3.7: ADP controllers for 200 values of λ ∈ [0.5, 10].
3.6 Upper bound policies
In this section we describe two heuristic data flow rate policies that yield upper bounds for the (multi-flow) flow rate control problem (upper bounds on the optimal
objective value J⋆). These upper bounds give us a way to measure the performance
of our suboptimal flow policy ϕadp: If Jadp is close to an upper bound, then we know
that our suboptimal flow policy is nearly optimal.
3.6.1 Steady-state policy
The steady-state policy is given by

ϕ^ss(s, g_t) = argmax_{f_t ≥ 0} ( U(s) − λP_t ),    (3.13)
where g_t is the channel gain at time t, and s is the steady-state flow rate vector (independent of time) obtained by solving the optimization problem

maximize   U(s) − λφ(s, E g)
subject to  s ≥ 0,
with optimization variable s, and λ known. Let J^ste be our steady-state upper bound on J⋆, obtained by using the policy (3.13) to solve (3.3). Note that in the above optimization problem we ignore time (and hence smoothing) and variations in the channel gains, so for each λ, s is the optimal (steady-state) flow vector. (This is sometimes
called the certainty equivalent problem associated with the stochastic programming
problem [BL97, Pre95].)
By Jensen’s inequality (and convexity of the max) it is easy to see that J^ste is an upper bound on J⋆. Note that once s is determined, we can evaluate (3.13) using the
water-filling algorithm described earlier.
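For a single flow with the forms used in our examples (U(s) = s^{1/2} and power (e^s − 1)/g), the certainty equivalent problem is one-dimensional and concave, so even a crude grid search recovers the steady-state flow s. A sketch (the grid resolution and the function name are ours):

```python
import numpy as np

def steady_state_flow(lam, mean_g=1.0, s_max=5.0, n_grid=100001):
    """Solve the certainty equivalent problem
        maximize  sqrt(s) - lam*(exp(s) - 1)/E[g]   over s >= 0
    by grid search; the objective is concave, so this is safe."""
    s = np.linspace(0.0, s_max, n_grid)
    obj = np.sqrt(s) - lam * (np.exp(s) - 1.0) / mean_g
    return s[np.argmax(obj)]

s_star = steady_state_flow(lam=1.0)   # optimal steady-state flow rate
```

At the (interior) maximizer the stationarity condition 0.5/√s = λ e^s / E[g] holds, which gives a quick sanity check on the grid search.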
3.6.2 Prescient policy
To obtain a prescient upper bound on J⋆ we relax the causality requirement imposed
earlier on the flow policy in (3.4), and assume complete knowledge of the channel gains for all t, i.e., including future channel gains. (We refer the reader to [BSS10] for more details on prescient bounds.) The flow rate control
problem becomes one (large) multi-period optimization problem:

maximize   J
subject to  s_{τ+1} = Θs_τ + (I − Θ)f_τ,   τ = 0, 1, . . . , N − 1,
            F_τ = 1^T f_τ,  f_τ ≥ 0,   τ = 0, 1, . . . , N − 1,      (3.14)
where the optimization variables are the flow rates f_0, . . . , f_{N−1} and the smoothed flow rates s_1, . . . , s_N. The problem data are the initial smoothed flow rate vector s_0 and the channel gain sequence g_0, . . . , g_{N−1}. (For large N, the initial smoothed flow rate vector does not affect the resulting upper bound.) This is a convex optimization problem. For each channel realization the optimal J is a random variable (parameterized by λ), so we evaluate the bound via Monte Carlo simulation (with large N): we solve (3.14) for independent realizations of the channel gains and take the mean of the optimal values as the upper bound. We let
J^pre = U^pre − λP^pre

denote our prescient upper bound on J⋆.
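One simple way to solve a single realization of (3.14), for a single flow, is projected gradient ascent, with the gradient obtained from an adjoint (backward) recursion through the smoothing dynamics. This is only an illustrative sketch using the example utility and power forms from this chapter; any convex optimization method for (3.14) would do.

```python
import numpy as np

def prescient_value(g, theta=0.37, lam=1.0, s0=0.0, iters=200):
    """Solve one realization of the prescient problem (single flow)
        maximize (1/N) * sum_t [ sqrt(s_{t+1}) - lam*(exp(f_t) - 1)/g_t ]
        s.t.     s_{t+1} = theta*s_t + (1 - theta)*f_t,  f_t >= 0,
    by projected gradient ascent; the gradient in f comes from the
    adjoint (backward) recursion through the smoothing dynamics."""
    N = len(g)
    f = np.full(N, 0.1)

    def rollout(f):
        s = np.empty(N + 1)
        s[0] = s0
        for t in range(N):
            s[t + 1] = theta * s[t] + (1 - theta) * f[t]
        return s

    def objective(f):
        s = rollout(f)
        return (np.sqrt(s[1:]).sum() - lam * ((np.exp(f) - 1) / g).sum()) / N

    J = objective(f)
    for _ in range(iters):
        s = rollout(f)
        p = np.empty(N + 1)                      # p[t] = dJ/ds_t
        p[N] = 0.5 / np.sqrt(s[N] + 1e-12) / N
        for t in range(N - 1, 0, -1):
            p[t] = 0.5 / np.sqrt(s[t] + 1e-12) / N + theta * p[t + 1]
        grad = (1 - theta) * p[1:] - lam * np.exp(f) / (N * g)
        step = 1.0                               # backtrack until J improves
        while objective(np.maximum(f + step * grad, 0)) < J and step > 1e-12:
            step *= 0.5
        f = np.maximum(f + step * grad, 0)
        J = objective(f)
    return J

J_pre = prescient_value(np.ones(30))   # one (deterministic) realization
```

Averaging the returned optimal values over many random channel realizations gives the Monte Carlo estimate of the prescient bound described above.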
3.6.3 Numerical example
In this section we compare the performance of our simple ADP policies to the prescient
policy described above. We use the problem instance from §3.5.3. For each λ ∈ [0.5, 10], we carried out Monte Carlo simulation (100 realizations, each with N = 1000 time-steps).
Each λ yields a point (P^pre, U^pre) in the U–P plane; the line that passes through (P^pre, U^pre) with slope λ, i.e., U − U^pre = λ(P − P^pre), is a cutting plane. Every feasible controller must lie on or below the cutting plane; the points above the line are infeasible for every value of the tradeoff parameter. Figure 3.8 illustrates the cutting plane (blue) for λ = 0.5. We plot on the same figure the corresponding ADP controller (red). We call the shaded portion of the graph the infeasible controller region.
In figures 3.9 and 3.10 we plot the corresponding bounds and ADP controllers for λ = {0.5, 1} and λ = {0.5, 1, 1.5}. Each additional λ value produces an associated infeasible controller region; the resultant infeasible controller region is the union of the individual infeasible regions, which explains why its size grows. (We show each additional region with a darker shade of gray.) Notice that the ADP controllers are all feasible and appear close to the bounding lines.

Figure 3.8: Prescient bound (blue), ADP controller (red), for λ = 0.5; infeasible controller region (shaded).
Figure 3.11 shows the ADP controllers and the associated upper bounds (and the resultant infeasible controller region) for all 200 values of λ ∈ [0.5, 10]. We can see that all the ADP controllers are feasible and perform well to very well, depending on λ. For example, for λ = 1, J^adp = 0.47 (U^adp = 0.69, P^adp = 0.22) and J^pre = 0.5 (U^pre = 0.74, P^pre = 0.24), so we know that 0.47 ≤ J⋆ ≤ 0.5. So in this example, for λ = 1, J^adp is no more than 0.03-suboptimal.
Figure 3.9: Prescient bounds (blue), ADP controllers (red), for λ = {0.5, 1}; infeasible controller region (shaded).
Figure 3.10: Prescient bounds (blue), ADP controllers (red), for λ = {0.5, 1, 1.5}; infeasible controller region (shaded).
Figure 3.11: Prescient bounds (blue), ADP controllers (red), for 200 values of λ ∈ [0.5, 10]; infeasible controller region (shaded).
3.7 Other concave utilities
In this section we re-solve the single flow rate control problem using the log utility, which is common in the economics and networking literature. We also solve the problem using a saturated linear utility, an example of a utility for which the value function approximation is not as accurate.
3.7.1 Log utility
We let U(z) = log(z) be the utility function, taken from the general family of utility functions defined for x > 0, parameterized by α ∈ [0, 1] and β > 0, having the form

U_{αβ}(x) = log(x/β),            α = 0,
U_{αβ}(x) = ((x/β)^α − 1)/α,     0 < α ≤ 1,        (3.15)

where log is the special case α = 0. When α = 0, the effect of β is additive in the utility, and so has no effect when maximizing utility. We use β = 1.
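The family (3.15) is easy to code directly; note that ((x/β)^α − 1)/α → log(x/β) as α → 0, so the log utility is the continuous limit of the power case. A quick check (the function name is ours):

```python
import math

def U(x, alpha, beta=1.0):
    """Utility family (3.15): log for alpha == 0, power otherwise."""
    if alpha == 0:
        return math.log(x / beta)
    return ((x / beta) ** alpha - 1.0) / alpha

# The power case approaches the log case as alpha -> 0.
diff = abs(U(2.0, 1e-6) - U(2.0, 0.0))
```

The subtraction of 1 in the power case is exactly what makes the α → 0 limit finite and equal to the logarithm.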
We consider two examples. The first flow is lightly smoothed (T = 1; θ = 0.37);
the second flow is heavily smoothed (T = 100; θ = 0.99). The channel gains gt are
IID exponential variables with mean E gt = 1. We use the power function (3.2), with
µ = 1.
We see from figure 3.12, which shows the (optimal) value functions together with their power law approximate value functions (8.9 s^{0.25} for light smoothing and 192.5 s^{0.26} for heavy smoothing), that the approximations are very accurate. The corresponding power law approximate policies are shown in figures 3.13 and 3.14. Again, we see the same characteristics as in the previous examples. We notice an even larger no-transmit region here, due not only to the longer smoothing time (T = 100) but also to the risk-limiting properties of the log utility.
Average power vs average utility. Figure 3.15 shows the average power versus average utility curves for both flows. The bottom plot further illustrates the difference in average power requirements. The average power required to generate U = −1.5 is P = 0.23 (λ = 0.21) for the heavily smoothed flow and P = 2.3 (λ = 0.25) for the lightly smoothed flow. This corresponds to a 10× reduction in power for this example.
Figure 3.16 shows sample power trajectories for both flows.
3.7.2 Saturated linear utility
In our second example we use the saturated linear utility function given by
Uαβ(x) = min(αx, β), (3.16)
Figure 3.12: Top. Comparing V (blue) with its power law approximation (red, dashed) for the lightly smoothed flow. Bottom. Comparing V (blue) with its power law approximation (red, dashed) for the heavily smoothed flow.
Figure 3.13: Top. Optimal policy ϕ⋆(s, g) for smoothing time T = 1 (θ = 0.37). Bottom. Optimal policy for T = 100 (θ = 0.99).
Figure 3.14: Top. Optimal policy level curves for T = 1 (θ = 0.37). Bottom. Optimal policy level curves for T = 100 (θ = 0.99).
Figure 3.15: Top. Log utility average utility versus average power. Bottom. Log utility average utility U = −1.5 versus average power.
Figure 3.16: Top. Log utility sample power trajectory for the lightly smoothed flow. Bottom. Log utility sample power trajectory for the heavily smoothed flow.
Figure 3.17: Saturated linear utility functions: U_{0.5,0.375}, U_{1,1}, U_{2,0.5}.
where α is the slope and β is the flow saturation value. Figure 3.17 above shows the function for several slopes and saturation values. The slope α is the marginal utility for flow values below saturation. Once the saturation value is reached, there is no additional utility for an increase in flow value, i.e., the marginal utility is zero for x > β/α (the maximum utility β is attained at x = β/α).
For simplicity we use U_{1,1} in this example, which corresponds to a utility with slope and flow saturation both equal to 1 (the blue curve in the figure above). All other problem parameters are the same as in §3.4.3.
Figure 3.18 shows the (optimal) value functions as well as their power law approximations, and figure 3.19 shows the resulting power law policies. We can see that our value function approximation is not as accurate as with the previous utility functions. Figure 3.20 shows the average utility versus average power tradeoff. It is interesting to note that smoothing with this utility function does not appear to result in any power savings. Recall that smoothing has no effect on utility when U is linear, because in that case the average utility of the smoothed flow equals the smoothed average utility, i.e.,

U_j = (1/N) ∑_{t=1}^N U(θ_j s_t) = (1/N) ∑_{t=1}^N θ_j U(s_t) = U.
3.8 Alternative evaluation of ADP policy
In this section we describe an interior-point method that exploits the problem structure, yielding another very efficient solution method for the ADP flow rate policy, i.e., for solving (3.12); see, e.g., [MBL10]. Recall the optimization problem

maximize   V^adp(Θs + (I − Θ)f) − λφ(F, g)
subject to  F = 1^T f,  f_j ≥ 0,  j = 1, . . . , n.
We describe a barrier method. Let us first make some simplifying substitutions in our problem definition. We can rewrite (3.12) as

maximize   U(y)
subject to  Ay = 0,  y_j ≥ 0,  j = 1, . . . , n,

where

y = (f, F),
U(y) = V^adp(Θs + (I − Θ)f) − λφ(F, g),
A = (1, . . . , 1, −1)^T.
Suppose y⋆ is the maximizing point of (3.12), which we assume exists. We solve for y⋆ by computing a sequence of points y^(k), k = 1, . . ., where y^(k+1) = y^(k) + t^(k)Δy^(k), terminating when y^(k) is optimal. The critical step is computing the update
Figure 3.18: Top. Value function and power law approximation for U_{1,1}(s) and T = 1. Bottom. Value function and power law approximation for U_{1,1}(s) and T = 50.
Figure 3.19: Top. Power law policy ϕ⋆(s, g) for U_{1,1}(s) and smoothing time T = 1 (θ = 0.37). Bottom. Power law policy for U_{1,1}(s) and T = 50 (θ = 0.98).
Figure 3.20: Top. Saturated linear U_{1,1} tradeoff curves for T = 1, 50. Bottom. Power, flow, and smoothed flow trajectories.
t^(k)Δy^(k), which comprises a scalar step length t^(k) > 0 and a search direction (or step) Δy^(k). We compute the step length using a backtracking line search [BV04, §9] and the search direction using Newton’s method [BV04, §9].
Logarithmic barrier. Barrier methods allow us to approximate an inequality constrained problem by an equality constrained problem, to which we can apply standard iterative algorithms. We use a logarithmic barrier function, i.e., we replace the inequality constraints with the term −(1/t) ∑_{j=1}^n log f_j, where t > 0 is a parameter that sets the accuracy of the approximation. We let ψ(y) = ∑_{j=1}^n log f_j, giving the approximation

minimize   −U(y) − (1/t)ψ(y)
subject to  Ay = 0,        (3.17)

with optimization variables y_j, j = 1, . . . , n+1; the constraints y_j ≥ 0 are implicit in the domain of the barrier. For t > 0 the objective is convex, so (3.17) is a convex optimization problem, and the solution, denoted y⋆(t) (assuming it exists), is unique. (The approximation gets better as t increases.)
Optimality conditions. Using the dual variable ν ∈ R, we can write the optimality conditions for the approximate problem as

Ay⋆ = 0,   f_j ≥ 0,   λ_j ≥ 0,   j = 1, . . . , n,
−∇U(y⋆) + ∑_{j=1}^n λ_j ∇ψ_j(y⋆) + A^T ν⋆ = 0,
λ_j f_j = 1/t,   j = 1, . . . , n.
Newton method. Since the objective in (3.17) is twice differentiable, we calculate the search direction using Newton’s method. (We assume that the necessary technical conditions hold.) The primal Newton step Δy and the associated dual Newton step Δν at (y, ν) are given by the linear equations

[ −∇²U(y) − (1/t)∇²ψ(y)   A^T ] [ Δy ]        [ −∇U(y) − (1/t)∇ψ(y) + A^T ν ]
[ A                        0  ] [ Δν ]  =  −  [ Ay ].                            (3.18)
In the primal-dual interpretation of the equations in (3.18), the right-hand side is called the residual. The residual vector r(y, ν) : R^{n+1} × R → R^{n+1} × R is a function of the primal variable y and the dual variable ν. The optimality conditions can be expressed simply as r(y⋆, ν⋆) = 0, with

r(y, ν) = (r_dual(y, ν), r_primal(y, ν)),

defined explicitly as

r_dual(y, ν) = −∇U(y) − (1/t)∇ψ(y) + A^T ν,   r_primal(y, ν) = Ay.
The Newton method is as follows.

given a starting point y ∈ dom U, ν, tolerance ε > 0, β ∈ (0, 1), α ∈ (0, 1/2).
repeat
  1. Compute primal and dual Newton steps Δy, Δν.
  2. Backtracking line search on ‖r‖₂.
     t := 1.
     while ‖r(y + tΔy, ν + tΔν)‖₂ > (1 − αt)‖r(y, ν)‖₂, t := βt.
  3. Update. y := y + tΔy, ν := ν + tΔν.
until Ay = 0 and ‖r(y, ν)‖₂ ≤ ε.
Note that although the algorithm requires a primal initialization point y ∈ dom U, it need not be feasible. We will see that the point y at which the Newton method is initialized can affect the number of iterations needed to converge. The initial dual value ν, however, does not affect algorithm performance.
Forming and solving the KKT system. In practice, the main cost of the Newton method is forming the Hessian (the top left block) of the KKT matrix in equation (3.18) and solving for the Newton step (Δy, Δν). For our problem, we show that we can exploit the special structure of the KKT system to compute the Newton step in O(n) time, instead of the O(n³) time of a generic solve.
We let d ∈ R^{n+1} be the gradient of the objective in (3.17), i.e.,

d = −∇U(y) − (1/t)∇ψ(y),

with entries

d_j = −K_j α_j (1−θ_j)(θ_j s_j + (1−θ_j) y_j)^{α_j − 1} − 1/(t y_j),   j = 1, . . . , n,
d_{n+1} = exp(y_{n+1})/g.
We let H be the Hessian, i.e.,

H = −∇²U(y) − (1/t)∇²ψ(y) = diag(H_1, . . . , H_n, H_{n+1}),

where

H_j = −K_j (α_j² − α_j)(1−θ_j)²(θ_j s_j + (1−θ_j) y_j)^{α_j − 2} + 1/(t y_j²),   j = 1, . . . , n,

and

H_{n+1} = exp(y_{n+1})/g.
Letting g = d + A^T ν and h = Ay, we can write (3.18) as

[ H   A^T ] [ Δy ]        [ g ]
[ A    0  ] [ Δν ]  =  −  [ h ].

It is easy to see that H is positive definite, i.e., H ≻ 0, and so we can solve the system of equations via block elimination:

Δν = (A H^{−1} A^T)^{−1}(h − A H^{−1} g),   Δy = −H^{−1}(g + A^T Δν).
The cost of the elimination method is summarized as follows.
1. Invert H (n operations, since H is diagonal).
2. Form A H^{−1} A^T (n operations; A^T ∈ R^{n+1}, so A H^{−1} A^T ∈ R).
3. Evaluate Δν (n operations).
4. Evaluate Δy (n operations).
The total cost is therefore O(n).
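The block elimination above is easy to code and to check against a dense solve of the KKT system. A numpy sketch (the diagonal Hessian entries and the right-hand side here are random placeholders, not the actual problem data):

```python
import numpy as np

def kkt_step(Hd, a, g, h):
    """Solve  [[diag(Hd), a], [a^T, 0]] [dy; dnu] = -[g; h]
    in O(n) via block elimination (requires Hd > 0 elementwise)."""
    Hinv_g = g / Hd
    Hinv_a = a / Hd
    dnu = (h - a @ Hinv_g) / (a @ Hinv_a)
    dy = -(Hinv_g + Hinv_a * dnu)
    return dy, dnu

rng = np.random.default_rng(1)
n = 6
Hd = rng.uniform(0.5, 2.0, n + 1)            # diagonal of H (H positive definite)
a = np.concatenate([np.ones(n), [-1.0]])     # A = (1, ..., 1, -1)
g = rng.standard_normal(n + 1)
h = rng.standard_normal()
dy, dnu = kkt_step(Hd, a, g, h)

# Check against a dense solve of the same KKT system.
K = np.zeros((n + 2, n + 2))
K[:n + 1, :n + 1] = np.diag(Hd)
K[:n + 1, -1] = a
K[-1, :n + 1] = a
sol = np.linalg.solve(K, -np.concatenate([g, [h]]))
```

The elimination uses only elementwise divisions and inner products, which is where the O(n) count in the summary comes from.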
3.9 Conclusion
In this chapter we present a variation on a multi-period stochastic network utility
maximization problem as a constrained convex stochastic control problem. We show
that judging flow utilities dynamically, i.e., with a utility function and a smoothing
time scale, is a good way to account for network applications with heterogeneous rate
demands.
For the case of a single flow, our numerically computed value functions yield flow policies that optimally trade off average utility and average power. We show that simple power law functions are reasonable approximations of the optimal value functions, and that these simple functions obtain near optimal performance.
For the case of multiple flows on a single link (where the value function is not practically computable using dynamic programming), we approximate the value function with a combination of the simple one-dimensional power law functions. Simulations, and comparison with upper bounds on the optimal value, show that the resulting ADP policy can obtain very good performance.
Chapter 4
Queue-Length-Based 2-Hop Rate
Control
In this chapter we consider a simple variation of the flow rate control problem from §3.2.3, in which flows traverse a relay node (buffer) on their way from source (transmitter) to destination (receiver). We consider the problem of choosing the
data flow rates on the (two) wireless links with randomly varying channel gains to
optimally trade off average utility of the instantaneous data flow rate at the receiver
and the average system costs. Cost consists of the average transmit power costs on the
wireless links and average costs incurred at the relay node. We pose the problem as
a convex stochastic control problem with linear dynamics. Using stochastic dynamic
programming, we show the effect of different utility and cost functions on flow rates
and buffer occupancy. We illustrate our results via numerical examples.
4.1 Background
Throughput-optimal scheduling based on queue-length, backpressure, and differential backlog algorithms was introduced in [TE92] and has been widely extended; extensions now include fair scheduling algorithms, also based on queue-length, and elastic traffic sources [ES07, Sto05, NML08, GNT06]. Cross-layer network optimization [OGB08a, SS07], decomposition [CLCD07, PC06], and distributed [LR07, PC06] solution methods have also been employed for these problems.
Our setup includes cross-layer optimization via channel-aware utility maximiza-
tion. We augment the (known) queue-length information with the random channel
gains as the system state variable for our causal feedback data flow rate control policy.
4.2 Network description
Figure 4.1: Single-hop network.
A wireless network comprised of one transmitter, one relay, and one receiver sup-
ports n data flows across m wireless channels that vary with time. Network time is
modeled using discrete-time intervals t = 0, 1, . . .. The relay contains a buffer where
a queue is maintained for each data flow in the network. In this simple problem
setup there are m = 2 (by construction), wireless links: the first link supports traffic
between the transmitter and the relay node, and the second link supports traffic be-
tween the relay and the receiver, as shown in figure 4.1. We let F_t ∈ R^{m×n} denote the flow rate matrix, where (f_t)_{ij} is the flow rate on link i, for flow j, at time t. We let (f_t)_i ∈ R^n_+ (the ith row of F_t) be the vector of flow rates on link i, and we let f_t^{(j)} ∈ R^m_+ (the jth column of F_t) be the vector of flow rates associated with flow j.
4.2.1 Average utility
Network utility is associated with the flow rate delivered at the receiver. In the single-
hop configuration therefore, this corresponds to the utility derived on the link from
relay to receiver, i.e., the second wireless link. At each time t the link utility is given
by

U((f_t)_2) = U_1((f_t)_{21}) + · · · + U_n((f_t)_{2n}).

The overall average utility is given by

U = lim_{N→∞} E (1/N) ∑_{t=0}^{N−1} U((f_t)_2),
where we assume that the expectation exists. We take the utility functions Uj, j =
1, . . . , n, to be concave nondecreasing differentiable functions. Network satisfaction is
thus derived only as a result of packets delivered to the (destination) receiver. (There
is no direct utility for the intermediate transmission from source to relay.) Total
average utility is separable across flows.
4.2.2 Average cost
Average power cost. The flow rates will depend on the random channel gains
associated with the time varying wireless channel and so are random variables. We
let gt ∈ Rm++ denote the channel gain vector, where at time t, (gt)i is the channel
gain parameter on link i; we assume that (gt)i are independent identically distributed
(IID) random variables, with known distribution. In general, wireless links in close proximity interfere with each other, and the link gains are therefore correlated; we assume independence here for simplicity of the problem setup. At each time t, let T_t ∈ R^m_+ denote the total flow rate vector, where

(T_t)_i = (f_t)_{i1} + · · · + (f_t)_{in}
is the total flow on link i. The power required to support Tt (and the corresponding
network flow rate matrix) is given by
Pt = ψ(Tt, gt),
where

ψ(T_t, g_t) = ∑_{i=1}^m ψ_i((T_t)_i, (g_t)_i),

and ψ_i is convex and increasing in (T_t)_i. The total average power is

P = lim_{N→∞} E (1/N) ∑_{t=0}^{N−1} P_t.
The total average power is separable in the links (flow rate vectors) but not necessarily
separable in the flows.
Average queue cost. The relay maintains n queues, one for each flow. We let
qt ∈ Rn+ denote the queue length vector where at time t, (qt)j is the value of the jth
queue. The jth queue evolves according to the linear dynamical system
(qt+1)j = (qt)j + Af(j)t , A =
[
1 −1]
, j = 1, . . . , n
where A describes the queue balance equation. We impose a queue maintenance
cost, which encourages transmission when feasible from the relay to the receiver. The
queue cost also discourages the queue lengths from growing unbounded. At time t,
the total queue cost is given by C_t = χ(q_t), where

χ(q_t) = ∑_{j=1}^n χ_j((q_t)_j),

and the χ_j are strictly convex and increasing in queue length. The total average queue maintenance cost is

C = lim_{N→∞} E (1/N) ∑_{t=0}^{N−1} C_t.
The total average queue cost is separable across the flows.
4.2.3 Flow constraints
We require that the flow rates are nonnegative for all t. We also require that the queue lengths are nonnegative: at any time t, the flow rates assigned on link 2 must be less than or equal to the associated queue lengths from the previous time period t − 1, i.e., we cannot assign more rate than was in the buffer at time t − 1. Let C_t denote the set of feasible flow rates at each time t. We get

C_t = {(f_t)_{ij} | (f_t)_{ij} ≥ 0, (q_{t−1})_j − (f_t)_{2j} ≥ 0, i = 1, 2, j = 1, . . . , n}.
Roughly, we can summarize the constraint set as follows. The channel gains on link 1
drive the flow rates on the link and consequently the queue lengths. Queue size and
link 2 channel gains directly determine network utility.
4.2.4 2-hop flow control problem
The overall objective is to maximize a weighted difference between average utility,
and average costs. The objective is given by
J = U − λ(P + C), (4.1)
where λ is a tradeoff parameter between average utility and average cost. At each
time t, we know the previous flow rates, and the current and previous queue lengths
and channel gains. We wish to find a stationary flow rate control policy, i.e., a function

ϕ : R^n_+ × R^n_+ × R^m_{++} → R^{m×n}_+,

with

F_t = ϕ(s_t, q_t, g_t),   t = 0, 1, . . . ,
such that the constraints are satisfied at each time t, i.e., Ft ∈ Ct.
The flow rate control problem is to choose the flow rate policy ϕ to maximize the
overall objective in (4.1) while satisfying the flow rate constraints. This is a standard
convex stochastic control problem, with linear dynamics.
4.3 Optimal policy
The optimal policy is characterized by

ϕ⋆(q, g) = argmax_{W∈C} ( V(q + AW) − λψ(T, g) ),    (4.2)
where V : Rn → R is the Bellman (relative) value function. The value function (and
optimal value) is characterized via the fixed point equation
J⋆ + V = T V,
where for any function S : R^n_+ → R, the Bellman operator T is given by

(T S)(q) = E ( max_{W∈C} ( U(w_2) + S(q + AW) − λψ(T, g) ) ),
where the expectation is over g. The fixed point equation and Bellman operator are
invariant under adding a constant; that is, we have T (S + a) = T S + a, for any
constant (function) a, and similarly, V satisfies the fixed point equation if and only
if V + a does. So without loss of generality we can assume that V (0) = 0. Similar
to §3.3.3 the value function can be found by value iteration [Bla62, Der66, Der70,
Sen96, Ber98, Ber05, Ber07], i.e., repeated application of the Bellman operator until
the value function converges.
Notice that the value function is a function only of the queue length vector q (not
smoothed flow). As a result, the monotonicity property from §3.3.3 no longer applies.
We can see why this is true using a simple proof by contradiction: Infinitely long
queues are undesirable for our problem.
4.4 One flow
Consider a simple scenario with one flow (n = 1). To simplify notation we let q_t = (q_t)_1 and f_t = ((f_t)_1, (f_t)_2), as shown in figure 4.2 (q_t ∈ R and f_t ∈ R²).

Figure 4.2: Single hop, single flow.

The queue length update is given by

q_{t+1} = q_t + A f_t,
with constraint set
C = {f | f1 ≥ 0, f2 ≥ 0, q − f2 ≥ 0}.
The value function is characterized by

V(q) = E ( max_{f∈C} ( U(f_2) + V(q + Af) − λ(χ(q) + ψ(f, g)) ) ),

and the optimal policy is given by

ϕ⋆(q, g) = argmax_{w∈C} ( V(q + Aw) − λψ(w, g) ).
We can evaluate the optimal policy by solving an optimization problem. At each time t, we must solve the problem

maximize   V(q + Af) − λψ(f, g)
subject to  f_1 ≥ 0,  f_2 ≥ 0,  q − f_2 ≥ 0,      (4.3)

where the optimization variable is f = (f_1, f_2). As in the one-dimensional case in the previous chapter, we can compute the value function easily using value iteration and gridding. For concave utility and convex cost, we know that V is a concave function, and so (4.3) is a convex optimization problem.
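The value iteration with gridding can be sketched as follows. The grid sizes, the fixed sample of channel gains standing in for the expectation, and the clipping of the queue state at q_max are all our simplifications; a careful implementation would use finer grids.

```python
import numpy as np

def value_iteration(lam=1.0, K=0.5, q_max=1.0, nq=21, nf=11,
                    n_gain=40, iters=50, seed=0):
    """Relative value iteration on a grid for the single-flow problem:
    state q, actions (f1, f2) with f1, f2 >= 0 and f2 <= q, utility
    sqrt(f2), per-link power (e^f - 1)/g, and queue cost K*q^2.
    The expectation over channel gains uses a fixed sample, and the
    queue state is clipped to [0, q_max]."""
    rng = np.random.default_rng(seed)
    qs = np.linspace(0.0, q_max, nq)
    fgrid = np.linspace(0.0, q_max, nf)
    F1, F2 = np.meshgrid(fgrid, fgrid, indexing="ij")
    gain = rng.exponential(1.0, size=(n_gain, 2))
    # Power cost of each action pair under each gain: (n_gain, nf, nf).
    power = ((np.exp(F1) - 1) / gain[:, :1, None]
             + (np.exp(F2) - 1) / gain[:, 1:, None])
    V = np.zeros(nq)
    for _ in range(iters):
        Vnew = np.empty(nq)
        for i, q in enumerate(qs):
            qn = np.clip(q + F1 - F2, 0.0, q_max)       # next queue state
            Vn = np.interp(qn.ravel(), qs, V).reshape(qn.shape)
            stage = np.sqrt(F2) + Vn - lam * K * q * q
            vals = np.where(F2 <= q + 1e-12, stage - lam * power, -np.inf)
            Vnew[i] = vals.max(axis=(1, 2)).mean()      # E_g max_f [...]
        V = Vnew - Vnew[0]                              # normalize V(0) = 0
    return qs, V

qs, V = value_iteration()
```

Linear interpolation evaluates V between grid points; subtracting V(0) at every sweep is the usual normalization for relative value iteration.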
4.4.1 Approximate policy
Since V is concave and satisfies V(0) = 0, we can find a reasonable analytic approximation, say V̂, using a mixture of power and polynomial (typically degree M < 3) functions. Our approximate function V̂ is given by

V̂(z) = k z^α + ∑_{p=1}^M c_p z^p,

with α ∈ (0, 1). The approximate policy is given by

ϕ̂(q, g) = argmax_{w∈C} ( V̂(q + Aw) − λψ(w, g) ).
4.5 Numerical Examples
In this section we give numerical examples. We take utility U(z) = z^{1/2}, and we use the function ψ_i(f_i, g_i) = (e^{f_i} − 1)/g_i to represent the power required to support flow on link i. We take the channel gains to be exponentially distributed with E g_t = (1, 1), i.e., the channel gains on each link are exponentially distributed random variables with mean 1. (Channel gains are independent across links and time.)
4.5.1 Quadratic queue cost
In this first set of examples we impose a (weighted) quadratic queue maintenance cost

χ(q) = Kq²,   K ∈ {0.1, 0.25, 0.5, 0.75, 1}.
Table 4.1 gives the approximate value function parameters for the different weights.

χ(q)      α     k      c₁     c₂
0.1q²    0.39  0.43   0.60   −0.57
0.25q²   0.40  0.45   0.60   −0.99
0.5q²    0.45  0.56   0.44   −1.49
0.75q²   0.50  0.74   0.14   −1.85
q²       0.55  0.95  −0.19   −2.16

Table 4.1: Approximate value function parameters for K ∈ {0.1, 0.25, 0.5, 0.75, 1}.

We let J^sqr be the overall objective value when χ(q) = Kq². For λ = 1 we get

J^sqr = {0.110, 0.104, 0.0986, 0.0951, 0.0924}.
Figure 4.3 shows the optimal value function for K = 0.5 along with the approximate function (red, dashed) given by V̂(q) = 0.56q^{0.45} + 0.44q − 1.49q². The maximum (positive) value V_max = 0.32 is achieved at q = 0.31 (q_max). Queue lengths greater than q = 0.73 (q_zero) contribute negative value to the overall objective. Figure 4.4 shows sample trajectories for K = 0.5. The top and bottom plots show the trajectories for flow 1 and flow 2. The middle plot shows the queue length trajectory, with the dotted line indicating the maximal value queue length q_max = 0.31. We can see that the queue length never exceeds q_max. We infer a suitable maximum buffer size of 0.31.
4.5.2 Linear queue cost
In this set of examples we impose a (weighted) linear queue maintenance cost

χ(q) = Kq,   K ∈ {0.1, 0.25, 0.5, 0.75, 1}.

The approximate value function parameters are given in table 4.2.

χ(q)     α     k      c₁      c₂
0.1q    0.46  0.61   0.065   −0.28
0.25q   0.51  0.85  −0.57    −0.37
0.5q    0.58  1.27  −1.64    −0.38
0.75q   0.60  1.47  −2.38    −0.39
q       0.61  1.61  −2.98    −0.40

Table 4.2: Approximate value function parameters for K ∈ {0.1, 0.25, 0.5, 0.75, 1}.

We let J^lin be the overall objective value when χ(q) = Kq. For λ = 1 we get

J^lin = {0.0967, 0.0808, 0.0665, 0.0575, 0.0509}.
Figure 4.5 shows the optimal value function for K = 0.5 along with the approximate function (red, dashed) given by V̂(q) = 1.27q^{0.58} − 1.64q − 0.38q². The maximum (positive) value V_max = 0.17 is achieved at q = 0.13 (q_max). Queue lengths greater than q = 0.43 (q_zero) contribute negative value to the overall objective. Figure 4.6 shows sample trajectories for K = 0.5. The top and bottom plots show the trajectories for flow 1 and flow 2. The middle plot shows the queue length trajectory, with the dotted line indicating the maximal value queue length q_max = 0.13. We infer a suitable maximum buffer size of 0.13.
Figure 4.3: Comparing the optimal value function (blue) with the approximate value function (red, dashed); χ(q) = 0.5q². V_max = 0.32, q_max = 0.31, q_zero = 0.73.
Figure 4.4: Top. Sample flow 1 trajectory. Middle. Sample queue length trajectory. Bottom. Sample flow 2 trajectory. χ(q) = 0.5q².
Figure 4.5: Comparing the optimal value function (blue) with the approximate value function (red, dashed); χ(q) = 0.5q. V_max = 0.17, q_max = 0.13, q_zero = 0.43.
Figure 4.6: Top. Sample flow 1 trajectory. Middle. Sample queue length trajectory. Bottom. Sample flow 2 trajectory. χ(q) = 0.5q.
Figure 4.7: Comparing objective versus weighted queue maintenance cost. Quadratic queue maintenance cost (red), linear queue maintenance cost (blue).
Objective versus K. Figure 4.7 shows the overall objective value versus the quadratic (red) and linear (blue) queue maintenance costs. As we expect, the objective value J decreases as the weight K increases. In these examples (0 ≤ q ≤ 1), linear costs result in a higher penalty.
K      J^sqr    J^lin
0.1    0.110    0.0967
0.25   0.104    0.0808
0.5    0.0986   0.0665
0.75   0.0951   0.0575
1      0.0924   0.0509

Table 4.3: J^sqr versus J^lin for K ∈ {0.1, 0.25, 0.5, 0.75, 1}.
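The comparison can be sanity-checked directly from the tabulated values; the snippet below is our own verification, with the objective values copied from Table 4.3.

```python
# Objective values copied from Table 4.3, indexed by K = 0.1, 0.25, 0.5, 0.75, 1.
J_sqr = [0.110, 0.104, 0.0986, 0.0951, 0.0924]   # quadratic cost chi(q) = K*q**2
J_lin = [0.0967, 0.0808, 0.0665, 0.0575, 0.0509] # linear cost chi(q) = K*q

# The objective decreases as the weight K grows, for both cost shapes...
assert all(a > b for a, b in zip(J_sqr, J_sqr[1:]))
assert all(a > b for a, b in zip(J_lin, J_lin[1:]))
# ...and since q >= q**2 on 0 <= q <= 1, the linear cost penalizes more,
# so the linear-cost objective is lower for every K.
assert all(s > l for s, l in zip(J_sqr, J_lin))
print("Table 4.3 consistent with the claim")
```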
4.6 Queue metrics
In this section we define some simple performance metrics for our 2-hop flow rate allocation problem, analogous to the performance metrics of a G/G/n queueing system. We are interested in the average flow rate allocated across links 1 and 2 (the average arrival/service rate), the average queue-length (average buffer size), and the associated system delays.
Average arrival rate A. We let
\[
A = \lim_{N \to \infty} \mathbf{E}\, \frac{1}{N} \sum_{t=0}^{N-1} \sum_{j=1}^{n} (f_t)_{1j},
\]
denote the average arrival rate, i.e., the average flow rate on link 1.
Average service rate S. We let
\[
S = \lim_{N \to \infty} \mathbf{E}\, \frac{1}{N} \sum_{t=0}^{N-1} \sum_{j=1}^{n} (f_t)_{2j},
\]
denote the average service rate, i.e., the average flow rate on link 2.
Average queue-length Q. We let
\[
Q = \lim_{N \to \infty} \mathbf{E}\, \frac{1}{N} \sum_{t=0}^{N-1} \sum_{j=1}^{n} (q_t)_{j},
\]
denote the average queue-length. Using these definitions we can derive other performance measures of interest.
We take the average queue-length Q to be, roughly, the average capacity of link 2.
The average queueing delay D is given by
\[
D = \frac{1}{Q - S}.
\]
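Given sampled trajectories, these metrics are estimated by straightforward time averaging. The helper below sketches that computation; the array shapes and names are our own, not the thesis's code.

```python
import numpy as np

def queue_metrics(f1, f2, q):
    """Estimate (A, S, Q, D) from sampled trajectories.

    f1, f2 : (N, n) arrays of flow rates on links 1 and 2
    q      : (N, n) array of queue lengths
    """
    A = f1.sum(axis=1).mean()    # average total arrival rate (link 1)
    S = f2.sum(axis=1).mean()    # average total service rate (link 2)
    Q = q.sum(axis=1).mean()     # average total queue length
    D = 1.0 / (Q - S)            # average queueing delay, as defined above
    return A, S, Q, D

# Constant trajectories with the first-row values from Table 4.4.
N = 100
A, S, Q, D = queue_metrics(np.full((N, 1), 0.055),
                           np.full((N, 1), 0.052),
                           np.full((N, 1), 0.25))
```

With Q = 0.25 and S = 0.052 this reproduces the tabulated delay D ≈ 5.05.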
Figure 4.8 shows two curves of the average service rate S versus the average queueing delay D, for quadratic and linear costs. It shows that a decrease in the service rate corresponds to an increase in the queueing delay. This is not surprising, because a decreased service rate means that packets are delivered less quickly (on average) to the receiver. Figure 4.9 shows the average service rate versus the objective J. Here, as we expect, an increase in the average service rate corresponds to higher average utility and therefore a higher objective value, as shown for both cost functions. Figures 4.10 and 4.11 show the average queue-lengths for the quadratic (red, dashed) and linear (blue) costs versus the average arrival and service rates, respectively. As we would expect, higher average queue-lengths correspond to higher average arrival rates. Perhaps less intuitively, we find that higher queue-lengths also correspond to higher service rates. We showed using numerical examples in the previous section that (higher) queue-lengths up to qmax are better.
Figure 4.8: Average service rate versus average queueing delay. Quadratic costs (red, dashed), linear costs (blue).
Figure 4.9: Average service rate versus overall objective J. Quadratic costs (red, dashed), linear costs (blue).
Figure 4.10: Average arrival rate versus average queue-length. Quadratic costs (red, dashed), linear costs (blue).
Figure 4.11: Average service rate versus average queue-length. Quadratic costs (red, dashed), linear costs (blue).
χ(q)     J^sqr    Q      A      S      D
0.1q²    0.1103   0.25   0.055  0.052  5.05
0.25q²   0.1042   0.17   0.048  0.045  8.00
0.5q²    0.0986   0.12   0.036  0.036  11.90
0.75q²   0.0951   0.10   0.033  0.033  14.93
q²       0.0924   0.09   0.030  0.030  16.67

Table 4.4: Performance metrics for quadratic queue costs.

χ(q)     J^lin    Q       A      S      D
0.1q     0.0967   0.17    0.050  0.047  8.13
0.25q    0.0808   0.08    0.032  0.030  20.00
0.5q     0.0665   0.047   0.022  0.021  38.46
0.75q    0.0575   0.033   0.016  0.016  58.82
q        0.0509   0.024   0.013  0.013  90.91

Table 4.5: Performance metrics for linear queue costs.
Tables 4.4 and 4.5 summarize these metrics for quadratic and linear queue costs, respectively. We observe that Q < ∞ in each example, which is a necessary condition for system stability.
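The delay columns in both tables are consistent with the definition D = 1/(Q − S). The check below is ours, with the (Q, S, D) triples copied from Tables 4.4 and 4.5; it confirms the relation to within the tabulated rounding.

```python
# (Q, S, D) rows copied from Table 4.4 (quadratic costs, first five)
# and Table 4.5 (linear costs, last five).
rows = [
    (0.25, 0.052, 5.05), (0.17, 0.045, 8.00), (0.12, 0.036, 11.90),
    (0.10, 0.033, 14.93), (0.09, 0.030, 16.67),
    (0.17, 0.047, 8.13), (0.08, 0.030, 20.00), (0.047, 0.021, 38.46),
    (0.033, 0.016, 58.82), (0.024, 0.013, 90.91),
]
for Q, S, D in rows:
    # D = 1/(Q - S), agreeing with the tabulated value to ~0.1% (rounding)
    assert abs(1.0 / (Q - S) - D) / D < 1e-3
print("all rows satisfy D = 1/(Q - S)")
```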
4.7 Multiple flows
Similar to §3.5.1, we can construct an approximate policy for the case of multiple flows as follows:
\[
\varphi^\star(q, g) = \operatorname*{argmax}_{f \in \mathcal{F}(q)} \left( -\sum_{i=1}^{m} \phi_i(f_i, g_i) + U(f_m) + V(q + Af) \right),
\]
where
\[
V = V^s_1 + \cdots + V^s_l,
\]
is an approximate value function. Note that V^s_k : R → R is the optimal value function from the associated single-hop case.
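As a concrete illustration of this policy, the sketch below instantiates a toy version with one queue (l = 1) and a single flow, so f = (f1, f2) with queue update q + f1 − f2. The utility and power-cost shapes are placeholders of our own choosing (not the thesis's models), the value function reuses the χ(q) = 0.5q fit from Table 4.2 purely for illustration, and a brute-force grid search stands in for the convex solver.

```python
import numpy as np

def U(f2):
    # placeholder concave utility of the delivered (link-2) rate
    return np.log(1.0 + f2)

def V(q):
    # approximate value function V = V^s_1 (single queue); parameters are
    # the chi(q) = 0.5q fit from Table 4.2, used here only as an example
    return 1.27 * np.maximum(q, 0.0)**0.58 - 1.64 * q - 0.38 * q**2

def policy(q, g, fmax=1.0, m=201):
    """Grid-search argmax over F(q) of -sum_i phi_i(f_i, g_i) + U(f2) + V(q + A f),
    with A = [1, -1]: arrivals on link 1, departures on link 2."""
    grid = np.linspace(0.0, fmax, m)
    f1, f2 = np.meshgrid(grid, grid, indexing="ij")
    # placeholder convex power cost phi_i(f_i, g_i) = (exp(f_i) - 1) / g_i
    power = (np.exp(f1) - 1.0) / g[0] + (np.exp(f2) - 1.0) / g[1]
    obj = -power + U(f2) + V(q + f1 - f2)
    obj = np.where(f2 <= q + f1, obj, -np.inf)   # F(q): cannot drain an empty queue
    i, j = np.unravel_index(np.argmax(obj), obj.shape)
    return grid[i], grid[j]

f1_star, f2_star = policy(0.1, [1.0, 1.0])       # rates chosen at q = 0.1
```

The returned pair is feasible by construction: both rates lie in [0, fmax] and link 2 never serves more than the queue plus the new arrivals.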
4.8 Conclusion
In this chapter we extended the flow rate control problem to the 2-link case, where we assign flow rates to optimally trade off average utility against average power and queue maintenance costs. In the case of a single flow, we showed how the maximum (positive) utility and the associated average queue-length vary with different utility, transmit cost, and queue cost functions. We claim that this maximal average queue-length is a good way to size flow rate buffers depending on network traffic. We defined and derived performance metrics to distinguish single-flow controllers, and gave the associated approximate controller for the case of multiple flows.