Online Sampling for Markov Decision Processes
Bob Givan
Joint work w/ E. K. P. Chong, H. Chang, G. Wu
Electrical and Computer Engineering
Purdue University
Bob Givan, Electrical and Computer Engineering, Purdue University
November 4-9, 2001
Markov Decision Process (MDP)

Ingredients:
- System state x in state space X
- Control action a in A(x)
- Reward R(x,a)
- State-transition probability P(x,y,a)

Goal: find a control policy maximizing the objective function.
Optimal Policies

- Policy: a mapping from state and time to actions
- Stationary policy: a mapping from state to actions

Goal: a policy maximizing the objective function
  V_H*(x0) = max Obj[R(x0,a0), ..., R(x_{H-1},a_{H-1})]
where the "max" is over all policies u = u0, ..., u_{H-1}.

For large H, a0 is independent of H (with an ergodicity assumption), so the stationary optimal action a0 for H = ∞ can be obtained via receding-horizon control.
Q-Values

Fix a large H and focus on finite-horizon reward. Define
  Q(x,a) = R(x,a) + E[V_{H-1}*(y)]
the "utility" of action a at state x; its name is the Q-value of action a at state x.

Key identities (Bellman's equations):
  V_H*(x) = max_a Q(x,a)
  u_0*(x) = argmax_a Q(x,a)
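To make the backward induction behind these identities concrete, here is a minimal sketch on a tiny finite MDP; the states, rewards, and transition probabilities are invented for illustration and are not from the talk's domains.

```python
# Backward induction for the Bellman identities on a tiny finite MDP:
#   Q_h(x,a) = R(x,a) + E[V_{h-1}(y)],  V_h(x) = max_a Q_h(x,a).
# All numbers below are made up for illustration.

def q_values(X, A, R, P, H):
    """Return (V_H, Q_H) by backward induction.
    R[x][a] is the immediate reward; P[x][a] maps next state y -> prob."""
    V = {x: 0.0 for x in X}                      # V_0 = 0
    Q = {}
    for _ in range(H):
        Q = {x: {a: R[x][a] + sum(p * V[y] for y, p in P[x][a].items())
                 for a in A(x)} for x in X}
        V = {x: max(Q[x].values()) for x in X}   # V_h(x) = max_a Q_h(x,a)
    return V, Q

# Two-state example: action 1 forgoes reward now to reach the better state.
X = ["s0", "s1"]
A = lambda x: [0, 1]
R = {"s0": {0: 1.0, 1: 0.0}, "s1": {0: 2.0, 1: 2.0}}
P = {"s0": {0: {"s0": 1.0}, 1: {"s1": 1.0}},
     "s1": {0: {"s1": 1.0}, 1: {"s1": 1.0}}}
V, Q = q_values(X, A, R, P, H=3)    # at horizon 3, argmax_a Q["s0"][a] is 1
```

Note that the greedy (H=1) choice at s0 would be action 0; only with enough horizon does the Q-value reveal that moving to s1 pays off.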
Solution Methods

Recall:
  u_0*(x) = argmax_a Q(x,a)
  Q(x,a) = R(x,a) + E[V_{H-1}*(y)]

Problems: the Q-value depends on the optimal policy, and the state space is extremely large (often continuous).

Two-pronged solution approach:
- Apply a receding-horizon method
- Estimate Q-values via simulation/sampling
Methods for Q-value Estimation

Previous work by other authors:
- Unbiased sampling (exact Q-value) [Kearns et al., IJCAI-99]
- Policy rollout (lower bound) [Bertsekas & Castanon, 1999]

Our techniques:
- Hindsight optimization (upper bound)
- Parallel rollout (lower bound)
Expectimax Tree for V*

[Figure: expectimax tree alternating Max layers (k = # actions) and Exp layers (n = # states) down to horizon H, giving (kn)^H leaves.]
Unbiased Sampling

[Figure: the same expectimax tree, truncated to sampling depth Hs with sampling width C per action, giving (kC)^{Hs} leaves instead of (kn)^H.]
Unbiased Sampling (Cont'd)

For a given desired accuracy, how large should the sampling width and depth be?
- Answered by Kearns, Mansour, and Ng (1999)
- Requires prohibitive sampling width and depth, e.g. C ≈ 10^8 and Hs > 60 to distinguish the "best" and "worst" policies in our scheduling domain
- We evaluate with smaller width and depth
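As a concrete (toy-scale) illustration of the scheme above, here is a sketch of the recursive width-C, depth-Hs estimator; the helper names (`R`, `step`, `actions`) are our own placeholders, not the authors' code.

```python
# Sketch of sparse sampling a la Kearns-Mansour-Ng: at each of `depth`
# levels, draw C next states per action and recurse, so the cost is on
# the order of (kC)^depth for k actions.

def sampled_q(x, a, depth, C, R, step, actions):
    """Estimate Q(x, a): R(x,a) is the reward, step(x,a) samples a next
    state, actions(y) lists the actions at y (undiscounted rewards)."""
    if depth == 0:
        return R(x, a)
    total = 0.0
    for _ in range(C):
        y = step(x, a)
        total += max(sampled_q(y, b, depth - 1, C, R, step, actions)
                     for b in actions(y))
    return R(x, a) + total / C

# Deterministic toy check: reward equals the chosen action, state loops.
est = sampled_q("s", 1, depth=2, C=1,
                R=lambda x, a: a, step=lambda x, a: x,
                actions=lambda x: [0, 1])        # 1 + 1 + 1 = 3.0
```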
How to Look Deeper?

[Figure: the expectimax tree again, now with a tiny sampling depth Hs and a tiny sampling width C, giving (kC)^{Hs} leaves.]
Policy Roll-out

[Figure: expectimax tree in which, below the top level, all actions except the one selected by base policy u are pruned; the root then takes the action selected by the improved policy PI(u).]
Policy Rollout in Equations

Write V_H^u(y) for the value of following policy u for horizon H. Recall:
  Q(x,a) = R(x,a) + E[V_{H-1}*(y)]
         = R(x,a) + E[max_u V_{H-1}^u(y)]

Given a base policy u, use
  R(x,a) + E[V_{H-1}^u(y)]
as a lower-bound estimate of the Q-value.

The resulting policy is PI(u), given infinite sampling.
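A minimal sketch of this estimator (our own illustration; `R`, `step`, and the sample count are assumed placeholders, not the authors' implementation):

```python
import random

# Policy rollout: take action a, then follow base policy u for the rest
# of the horizon; average over sampled traces to lower-bound Q(x,a).

def rollout_q(x, a, u, R, step, H, n_samples=32, rng=random):
    total = 0.0
    for _ in range(n_samples):
        reward, y = R(x, a), step(x, a, rng)
        for _ in range(H - 1):                 # remaining steps under u
            b = u(y)
            reward, y = reward + R(y, b), step(y, b, rng)
        total += reward
    return total / n_samples       # estimates R(x,a) + E[V^u_{H-1}(y)]

def rollout_policy(x, actions, u, R, step, H):
    """PI(u): act greedily with respect to the rollout estimates."""
    return max(actions(x), key=lambda a: rollout_q(x, a, u, R, step, H))
```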
Policy Roll-out (cont'd)

[Figure: at the root Max over k actions, each action's value under PI(u), V^{PI(u)}(x), is estimated by averaging sampled returns V^u of the base policy, using sampling width C' << C^H rather than expanding all (# states)^H branches.]
Parallel Policy Rollout

A generalization of policy rollout, due to [Chang, Givan, and Chong, 2000].

Given a set U of base policies, use
  R(x,a) + E[max_{u ∊ U} V_{H-1}^u(y)]
as an estimate of the Q-value.

- More accurate estimate than policy rollout
- Still gives a lower bound to the true Q-value
- Still gives a policy no worse than any in U
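Sketched under the same assumed helpers as plain rollout (our illustration, not the authors' code): for each sampled next state, every base policy in U is simulated and the best return is kept.

```python
import random

# Parallel rollout: estimate R(x,a) + E[max_{u in U} V^u_{H-1}(y)] by
# simulating each base policy from every sampled next state.

def simulate(y, u, R, step, H, rng):
    total = 0.0
    for _ in range(H):
        a = u(y)
        total, y = total + R(y, a), step(y, a, rng)
    return total

def parallel_rollout_q(x, a, U, R, step, H, n_samples=32, rng=random):
    total = 0.0
    for _ in range(n_samples):
        y = step(x, a, rng)
        total += max(simulate(y, u, R, step, H - 1, rng) for u in U)
    return R(x, a) + total / n_samples   # still a lower bound on Q(x,a)
```

Taking the max per sampled state (rather than committing to one base policy) is what makes the estimate at least as large as any single rollout while remaining a lower bound.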
Hindsight Optimization – Tree View

[Figure: starting from the expectimax tree, the Exp nodes are pulled out above the Max nodes and the Max nodes are combined, so each sampled trace is optimized deterministically.]
Hindsight Optimization – Equations

Swap Max and Exp in the expectimax tree, and solve each resulting off-line optimization problem: O(kC' · f(H)) time, where f(H) is the off-line problem complexity.

Jensen's inequality implies upper bounds:
  Ṽ_H(x) = E[max_{a0,...,a_{H-1}} Σ_{i=0}^{H-1} R(x_i, a_i)]
  V_H*(x) = max_a {R(x,a) + E_y[V_{H-1}*(y)]}
  Ṽ ≥ V*
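The swap can be sketched on a toy problem (entirely our illustration): draw traces of the exogenous randomness, solve each trace's deterministic off-line problem exactly, and average; by Jensen's inequality the average upper-bounds V*.

```python
import random

def hindsight_value(sample_trace, offline_solve, n_traces=32, rng=random):
    """Average of per-trace off-line optima: upper-bounds V* by Jensen."""
    return sum(offline_solve(sample_trace(rng))
               for _ in range(n_traces)) / n_traces

# Toy problem: at each step one of two arms pays 0 or 1 at random; in
# hindsight the off-line optimum simply takes the better arm each step.
def sample_trace(rng, H=5):
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(H)]

def offline_solve(trace):
    return sum(max(pair) for pair in trace)
```

A non-anticipating policy must commit to an arm before seeing the payoffs, so its value can only be lower; that gap is exactly why the hindsight value is an upper bound.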
Hindsight Optimization (Cont'd)

[Figure: the root Max over k actions with sampling width C' << C^H; below it, each sampled trace of horizon H-1 poses a deterministic/off-line optimization problem of selecting the best action sequence from k^{H-1} choices, rather than expanding all (# states)^H branches.]
Application to Example Problems

Apply unbiased sampling, policy rollout, parallel rollout, and hindsight optimization to:
- Multi-class deadline scheduling
- Random early dropping
- Congestion control
Basic Approach

A traffic model provides a stochastic description of possible future outcomes.

Method:
- Formulate network decision problems as POMDPs by incorporating the traffic model
- Solve the belief-state MDP online using sampling (choose the time-scale to allow for computation time)
Domain 1: Deadline Scheduling

Objective: minimize weighted loss.

[Figure: multiclass traffic with deadlines and per-class weights w1, w2, w3, ..., w7 feeds a scheduler; each packet is either served or dropped.]
Domain 2: Random Early Dropping

Objective: minimize delay without sacrificing throughput.

[Figure: traffic sources 1-4 feed a single server.]
Domain 3: Congestion Control

Objective: optimize delay, throughput, loss, and fairness.

[Figure: network with gateways G1 and G2, sources S0-S3, high-priority cross traffic, fully controlled sources with control delays d1, d2, d3, ..., and a bottleneck node in the paths to G2.]
Traffic Modeling

A Hidden Markov Model (HMM) for each source. Note: the state is hidden, so the model is partially observed.

[Figure: 3-state example model with transition probabilities (.07, .08, .02, .01, .02, .06) between states 0, 1, and 2, and per-state traffic-generation probabilities: in one state 2 packets w.p. .25, 1 packet w.p. .25, 0 packets w.p. .50; in the others 1 packet w.p. .90 / 0 packets w.p. .10 and 1 packet w.p. .98 / 0 packets w.p. .02.]
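Sampling future arrivals from such a model is straightforward; the sketch below is our own illustration with made-up helper names, not the slide's exact chain.

```python
import random

def sample_arrivals(T, trans, emit, state=0, rng=random):
    """Sample T steps of packet arrivals from an HMM traffic source.
    trans[s] = [(next_state, prob), ...]; emit[s] = [(packets, prob), ...]."""
    def draw(pairs):
        r, acc = rng.random(), 0.0
        for value, p in pairs:
            acc += p
            if r < acc:
                return value
        return pairs[-1][0]
    arrivals = []
    for _ in range(T):
        arrivals.append(draw(emit[state]))     # emit packets from hidden state
        state = draw(trans[state])             # then transition
    return arrivals
```

An online controller sees only the arrivals, not the state, which is why the decision problem becomes a POMDP solved over belief states.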
Deadline Scheduling Results

Non-sampling policies:
- EDF: earliest deadline first. Deadline sensitive, class insensitive.
- SP: static priority. Deadline insensitive, class sensitive.
- CM: current minloss [Givan et al., 2000]. Deadline and class sensitive; minimizes weighted loss for the current packets.
Deadline Scheduling Results

Objective: minimize weighted loss.

Comparison:
- Non-sampling policies
- Unbiased sampling (Kearns et al.)
- Hindsight optimization
- Rollout with CM as base policy
- Parallel rollout

Results due to H. S. Chang.
Deadline Scheduling Results

[Figures: result plots over three slides.]
Random Early Dropping Results

Objective: minimize delay subject to a throughput loss-tolerance.

Comparison:
- Candidate policies: RED and "buffer-k"
- KMN-sampling
- Rollout of buffer-k
- Parallel rollout
- Hindsight optimization

Results due to H. S. Chang.
Random Early Dropping Results

[Figures: result plots over two slides.]
Congestion Control Results

MDP objective: minimize a weighted sum of throughput, delay, and loss-rate; fairness is hard-wired.

Comparisons:
- PD-k (proportional-derivative with target queue k)
- Hindsight optimization
- Rollout of PD-k == parallel rollout

Results due to G. Wu, in progress.
Congestion Control Results

[Figures: result plots over four slides.]
Results Summary

- Unbiased sampling cannot cope.
- Parallel rollout wins in 2 domains; not always equal to simple rollout of one base policy.
- Hindsight optimization wins in 1 domain.
- Simple policy rollout, the cheapest method: poor in domain 1; strong in domain 2 with the best base policy (but how to find this policy?); so-so in domain 3 with any base policy.
Talk Summary

- Case study of MDP sampling methods
- New methods offering practical improvements: parallel policy rollout and hindsight optimization
- Systematic methods for using traffic models to help make network control decisions; feasibility of real-time implementation depends on problem timescale
Ongoing Research

Apply to other control problems (different timescales):
- Admission/access control
- QoS routing
- Link bandwidth allotment
- Multiclass connection management
- Problems arising in proxy-services
- Diagnosis and recovery
Ongoing Research (Cont'd)

- Alternative traffic models: multi-timescale models, long-range dependent models, closed-loop traffic, fluid models
- Learning the traffic model online
- Adaptation to changing traffic conditions
[Backup slides: Congestion Control (Cont'd) and Congestion Control Results figures.]
Hindsight Optimization (Cont'd)

[Block diagram: a traffic simulation produces traffic traces from the state estimate; the hindsight optimizer computes hindsight-optimal values for each trace; averaging yields a Q-value estimate for each candidate action; the action evaluator and action selection then produce the selected action.]
Policy Rollout (Cont'd)

[Block diagram: the same pipeline as for hindsight optimization, with a base policy whose simulated policy-performance values feed the averaging step of the Q-value estimate.]
Receding-horizon Control

For a large horizon H, the policy is approximately stationary. At each time, if the state is x, apply action
  u*(x) = argmax_a Q(x,a)
        = argmax_a R(x,a) + E[V_{H-1}*(y)]

Compute an estimate of the Q-value at each time step.
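The loop above can be sketched in a few lines; `q_estimate` stands in for any of the sampling estimators in this talk, and all names here are assumptions for illustration.

```python
# Receding-horizon control: at every step, re-estimate Q from the
# current state and apply the greedy action u*(x) = argmax_a Q(x,a).

def receding_horizon(x0, actions, q_estimate, step, T):
    x, trajectory = x0, []
    for _ in range(T):
        a = max(actions(x), key=lambda b: q_estimate(x, b))
        trajectory.append((x, a))
        x = step(x, a)
    return trajectory
```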
Congestion Control (Cont'd)

[Figure slide.]
Domain 3: Congestion Control

- High-priority traffic: open-loop controlled
- Low-priority traffic: closed-loop controlled
- Resources: bandwidth and buffer
- Objective: optimize throughput, delay, loss, and fairness

[Figure: high-priority and best-effort traffic sharing a bottleneck node.]
Congestion Control Results

[Figures: result plots over four slides.]