reinforcement learning for traffic signal controlprashla/prashanth-itsc-plenary.pdfreinforcement...
TRANSCRIPT
Reinforcement Learning for Traffic Signal Control
Prashanth L.A.
Postdoctoral Researcher, INRIA Lille – Team SequeL
work done as a PhD student at Department of Computer Science and Automation, IndianInstitute of Science
October 2014
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 1 / 14
Traffic signal control using RL
On a good day, the traffic is . . .
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 2 / 14
Traffic signal control using RL
And on a bad day, it can be . . .
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 3 / 14
Traffic signal control using RL
Traffic Light Control (TLC)
Aim: Maximize traffic flow(long-term performance criterion)
Input:Coarse congestion estimates
Output:Policy for switching traffic lights
Low Medium High
L1 L2
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 4 / 14
Traffic signal control using RL
Traffic Light Control (TLC)
Aim: Maximize traffic flow(long-term performance criterion)
Input:Coarse congestion estimates
Output:Policy for switching traffic lights
Low Medium High
L1 L2
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 4 / 14
Traffic signal control using RL
Traffic Light Control (TLC)
Aim: Maximize traffic flow(long-term performance criterion)
Input:Coarse congestion estimates
Output:Policy for switching traffic lights
Low Medium High
L1 L2
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 4 / 14
Traffic signal control using RL
Traffic Light Control (TLC)
Aim: Maximize traffic flow(long-term performance criterion)
Input:Coarse congestion estimates
Output:Policy for switching traffic lights
Low Medium High
L1 L2
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 4 / 14
Traffic signal control using RL
Desirable attributes of TLC algorithm
Dynamic:Adapts to traffic conditions
Model free:Do not assume a system model
Scalable:Easily implementable on large
road networks
Solution:Reinforcement Learning
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 5 / 14
Traffic signal control using RL
Desirable attributes of TLC algorithm
Dynamic:Adapts to traffic conditions
Model free:Do not assume a system model
Scalable:Easily implementable on large
road networks
Solution:Reinforcement Learning
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 5 / 14
Traffic signal control using RL
Desirable attributes of TLC algorithm
Dynamic:Adapts to traffic conditions
Model free:Do not assume a system model
Scalable:Easily implementable on large
road networks
Solution:Reinforcement Learning
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 5 / 14
Traffic signal control using RL
Desirable attributes of TLC algorithm
Dynamic:Adapts to traffic conditions
Model free:Do not assume a system model
Scalable:Easily implementable on large
road networks
Solution:Reinforcement Learning
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 5 / 14
Traffic signal control using RL
Desirable attributes of TLC algorithm
Dynamic:Adapts to traffic conditions
Model free:Do not assume a system model
Scalable:Easily implementable on large
road networks
Solution:Reinforcement Learning
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 5 / 14
Traffic signal control using RL
Traffic Signal Control MDP
State. sn = (q1, · · · , qN , t1, · · · , tN)
Actions. an = {feasible sign configurations in state sn}
Cost.
k(sn, an) = r1 ∗(∑
i∈Ip
qi(n) + ti(n)
)+ s1 ∗
(∑i/∈Ip
qi(n) + ti(n)
)
more weightage to main road traffic
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 6 / 14
Traffic signal control using RL
Qlearning based TLC algorithm
Q-learning
Q(sn+1, an+1) = Q(sn, an) + α(n)(
k(sn, an) + γ mina
Q(sn+1, a)− Q(sn, an)).
Why function approximation?
need look-up table to store Q-value for every (s, a)Computationally expensive
two-junction corridor: 10 signalled lanes, 20 vehicles oneach lane|S× A(S)| ∼ 1014
Situation aggravated when we consider larger road networks
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 7 / 14
Traffic signal control using RL
Q-learning with Function Approximation
Approximation.
Q(s, a) ≈ θTσs,a
Parameter θ ∈ Rd Feature σs,a ∈ Rd
Note: d << |S× A(S)|
Feature-based analog of Q-learning.
θn+1 = θn + α(n)σsn,an(k(sn, an) + γ minv∈A(sn+1)
θTnσsn+1,v − θT
nσsn,an)
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 8 / 14
Traffic signal control using RL
Feature Selection
State (sn) Action (an) Feature (σsn,an )
qi(n) < L1 and ti(n) < T1RED 0
GREEN 1
qi(n) < L1 and ti(n) ≥ T1RED 0.2
GREEN 0.8
L1 ≤ qi(n) < L2 and ti(n) < T1RED 0.4
GREEN 0.6
L1 ≤ qi(n) < L2 and ti(n) ≥ T1RED 0.6
GREEN 0.4
qi(n) ≥ L2 and ti(n) < T1RED 0.8
GREEN 0.2
qi(n) ≥ L2 and ti(n) ≥ T1RED 1
GREEN 0
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 9 / 14
Traffic signal control using RL
Threshold tuning using SPSA
Problem: hard to obtain exact queue lengths in practice
Solution: Use broad congestion estimates based on thresholds
Low Medium High
L1 L2
How to optimize Li’s? Use Simultaneous Perturbation Stochastic Approximation
Combine the optimization procedure with TLC algorithms:
Full state Q-learning algorithm with state aggregationFunction approximation Q-learning TLCPriority based (naive?) scheme
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 10 / 14
Traffic signal control using RL
Feature Adaptation
Recall the approximation.
Q(s, a) ≈ θTσs,a
Parameter θ ∈ Rd Feature σs,a ∈ Rd
Is to possible to adapt features online to make them optimal?We propose an online feature adaptation algorithm
to find the “optimal” features
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 11 / 14
For Further Reading
Publications I
Prashanth L. A. and S. Bhatnagar,Reinforcement learning with function approximation for traffic signal controlIEEE Transactions on Intelligent Transportation Systems, 2011.
Prashanth L. A. and S. Bhatnagar,Threshold Tuning using Stochastic Optimization for Graded Signal Control,IEEE Transactions on Vehicular Technology, 2012.
Prashanth L.A. and S.Bhatnagar,Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections,IEEE Conference on Intelligent Transportation Systems, 2011.
S. Bhatnagar, V. Borkar and Prashanth.L.A.,Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning,Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, by F. Lewisand D. Liu (eds.), IEEE Press Computational Intelligence Series.
S.Bhatnagar, H.L.Prasad and Prashanth.L.A.,Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods,Lecture Notes in Control and Information Sciences Series, Springer (Accepted), 2012.
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 12 / 14
For Further Reading
Publications II
S. Bhatnagar and Prashanth L. A.,Simultaneous Perturbation Newton Algorithms for Simulation Optimization,Journal of Optimization Theory and Applications, 2013.
Prashanth L. A. and Mohammad Ghavamzadeh,Actor-Critic Algorithms for Risk-Sensitive MDPsAdvances in Neural Information Processing Systems (NIPS), 2013 (Full oral presentation).
Prashanth L.A. (INRIA) Reinforcement Learning for Traffic Signal Control October 2014 13 / 14