GTC 2017, May 11
DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning
Lex Fridman
Americans spend 8 billion hours stuck in traffic every year.
Goal:
Deep Learning for Everyone
Accessible and fun: seconds to start, eternity* to master
http://cars.mit.edu or search for: “DeepTraffic”
* estimated time to discover globally optimal solution
To Play: To Win:
Goal:
Deep Learning for Everyone
Machine Learning from Human and Machine
Memorization
Understanding
http://cars.mit.edu/deeptesla
Naturalistic Driving Data
Teslas instrumented: 18
Hours of data: 6,000+ hours
Distance traveled: 140,000+ miles
Video frames: 2+ billion
Autopilot: ~12%
http://cars.mit.edu/deeptesla
• Localization and Mapping: Where am I?
• Scene Understanding: Where/who/what/why of everyone else?
• Movement Planning: How do I get from A to B?
• Driver State: What’s the driver up to?
• Communicate: How do I convey intent to the driver and to the world?
Autonomous Driving: A Hierarchical View
Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. "A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.
Applying Deep Reinforcement Learning to Micro-Traffic Simulation
Reference: http://www.traffic-simulation.de
Formulate Driving as a Reinforcement Learning Problem
How to formalize and learn driving?
Philosophical Motivation for Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
(Deep) Reinforcement Learning
• Pros:
  • Cheap: Very little human annotation is needed.
  • Robust: Can learn to act under uncertainty.
  • General: Can (seemingly) deal with (huge) raw sensory input.
  • Promising: Our current best framework for achieving “intelligence”.
• Cons:
  • Constrained by Formalism: Have to formally define the state space, the action space, the reward, and the simulated environment.
  • Huge Data: Have to be able to simulate (in software or hardware) or have a lot of real-world examples.
Agent and Environment
• At each step the agent:
  • Executes action
  • Receives observation (new state)
  • Receives reward
• The environment:
  • Receives action
  • Emits observation (new state)
  • Emits reward
References: [80]
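The agent and environment roles listed above can be sketched as a short interaction loop. This is a minimal illustration, not the DeepTraffic code: `CoinFlipEnv`, `run_episode`, and the `reset`/`step` method names are hypothetical and chosen only to show who sends what to whom.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a coin flip each step."""

    def __init__(self, seed=0, horizon=10):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation: the step counter

    def step(self, action):
        # The environment receives an action, then emits a new observation and a reward.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and return the sum of rewards collected by the agent."""
    state = env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(state)                  # agent executes an action
        state, reward, done = env.step(action)  # agent receives observation and reward
        total += reward
    return total


env = CoinFlipEnv(seed=0, horizon=10)
total = run_episode(env, policy=lambda state: 0)  # trivial policy: always guess 0
```

The policy here is a plain function from state to action; a learning agent would also update itself inside the loop using the reward.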
Markov Decision Process
$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$
where $s_t$ is a state, $a_t$ an action, $r_t$ a reward, and $s_n$ the terminal state.
References: [84]
Major Components of an RL Agent
An RL agent may include one or more of these components:
• Policy: agent’s behavior function
• Value function: how good is each state and/or action
• Model: agent’s representation of the environment
$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$
where $s_t$ is a state, $a_t$ an action, $r_t$ a reward, and $s_n$ the terminal state.
Robot in a Room
[Figure: grid world with a START cell, a +1 cell, and a -1 cell]
actions: UP, DOWN, LEFT, RIGHT
Outcome of choosing UP:
• 80% move UP
• 10% move LEFT
• 10% move RIGHT
• reward +1 at [4,3], -1 at [4,2]
• reward -0.04 for each step
• what’s the strategy to achieve max reward?
• what if the actions were deterministic?
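The stochastic action outcome in this example can be sketched as a sampling function. The slide only gives the probabilities for UP; the table `PERPENDICULAR`, which extends the same 80/10/10 split to the other three actions, is an assumption made here for illustration, as is the function name `sample_move`.

```python
import random
from collections import Counter

# Assumption: a slip always goes to one of the two directions perpendicular
# to the intended one. The slide states this only for UP.
PERPENDICULAR = {
    "UP": ("LEFT", "RIGHT"),
    "DOWN": ("LEFT", "RIGHT"),
    "LEFT": ("UP", "DOWN"),
    "RIGHT": ("UP", "DOWN"),
}


def sample_move(intended, rng):
    """Return the direction actually moved: 80% intended, 10% each side."""
    side_a, side_b = PERPENDICULAR[intended]
    u = rng.random()
    if u < 0.8:
        return intended
    if u < 0.9:
        return side_a
    return side_b


rng = random.Random(0)
counts = Counter(sample_move("UP", rng) for _ in range(10_000))
up_fraction = counts["UP"] / 10_000
```

Setting the slip probabilities to zero gives the deterministic case asked about in the last bullet, where a fixed sequence of actions is enough.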
Is this a solution?
[Figure: candidate solution drawn on the grid with the +1 and -1 cells]
• only if actions deterministic
  • not in this case (actions are stochastic)
• solution/policy
  • mapping from each state to an action
Optimal policy
[Figures: the grid with the +1 and -1 cells, one per step reward below]
Reward for each step: -2
Reward for each step: -0.1
Reward for each step: -0.04
Reward for each step: -0.01
Reward for each step: +0.01
Value Function
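• Future reward:
$R = r_1 + r_2 + r_3 + \cdots + r_n$
$R_t = r_t + r_{t+1} + r_{t+2} + \cdots + r_n$
• Discounted future reward (environment is stochastic):
$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \cdots)) = r_t + \gamma R_{t+1}$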
• A good strategy for an agent would be to always choose an action that maximizes the (discounted) future reward
References: [84]
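The recursion $R_t = r_t + \gamma R_{t+1}$ can be evaluated by walking the reward list backwards. The sketch below is illustrative; the function name `discounted_return` and the sample reward lists are assumptions, not taken from the slides.

```python
def discounted_return(rewards, gamma):
    """Compute R_t for t = first index, using R_t = r_t + gamma * R_{t+1}."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
halved = discounted_return([1.0, 1.0, 1.0], gamma=0.5)

# With gamma = 1 the discounted return reduces to the plain sum of rewards.
undiscounted = discounted_return([-0.04, -0.04, -0.04, 1.0], gamma=1.0)
```

A smaller `gamma` makes the agent value near-term rewards more than distant ones.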
Q-Learning
• State-action value function: Q(s,a)
  • Expected return when starting in s, performing a, and following policy π
• Q-Learning: Use any policy to estimate Q that maximizes future reward:
  • Q directly approximates Q* (Bellman optimality equation)
  • Independent of the policy being followed
  • Only requirement: keep updating each (s,a) pair
[Figure: transition from state s via action a to state s’ with reward r, and the Q-learning update rule annotated with New State, Old State, Reward, Learning Rate, and Discount Factor]
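The update equation itself did not survive extraction; only its annotations did. The sketch below uses the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α [r + γ max_a’ Q(s’,a’) − Q(s,a)], which matches those annotations (learning rate α, discount factor γ). The function name `q_update`, the state names, and the `terminal` flag are assumptions for illustration.

```python
from collections import defaultdict

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update for the transition (s, a, r, s_next)."""
    # Value of the best action in the new state (0 if the episode ended there).
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in ACTIONS)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)  # every unseen (state, action) pair starts at 0
first = q_update(Q, "s0", "UP", r=1.0, s_next="s1", alpha=0.5, gamma=0.9)
second = q_update(Q, "s1", "LEFT", r=0.0, s_next="s0", alpha=0.5, gamma=0.9)
```

The second update already uses the value learned by the first one, which is how reward information propagates backwards through the state space.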
Exploration vs Exploitation
• Key ingredient of Reinforcement Learning
• Deterministic/greedy policy won’t explore all actions
  • Don’t know anything about the environment at the beginning
  • Need to try all actions to find the optimal one
• Maintain exploration
  • Use soft policies instead: π(s,a) > 0 (for all s,a)
• ε-greedy policy
  • With probability 1-ε perform the optimal/greedy action
  • With probability ε perform a random action
  • Will keep exploring the environment
  • Slowly move it towards greedy policy: ε -> 0
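The ε-greedy rule above can be sketched in a few lines. The linear annealing schedule in `decayed_epsilon` is only one possible way to move ε towards 0 and is an assumption here, as are both function names and the default values.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decayed_epsilon(step, start=1.0, end=0.0, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


rng = random.Random(0)
greedy_choice = epsilon_greedy([0.0, 5.0, 1.0], epsilon=0.0, rng=rng)
explore_choice = epsilon_greedy([0.0, 5.0, 1.0], epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the choice is always the highest-valued action; with `epsilon=1.0` it is uniformly random.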
Q-Learning: Value Iteration
References: [84]

| | A1 | A2 | A3 | A4 |
|---|---|---|---|---|
| S1 | +1 | +2 | -1 | 0 |
| S2 | +2 | 0 | +1 | -2 |
| S3 | -1 | +1 | 0 | -2 |
| S4 | -2 | 0 | +1 | +1 |
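Reading a greedy policy off a Q-table like the one above is a per-row argmax. The sketch below copies the table values from the slide; the names `Q_TABLE` and `greedy_action` are illustrative, and breaking ties in favour of the first action is an arbitrary choice.

```python
ACTIONS = ["A1", "A2", "A3", "A4"]

# Rows are states, columns are actions, values as shown in the table above.
Q_TABLE = {
    "S1": [1, 2, -1, 0],
    "S2": [2, 0, 1, -2],
    "S3": [-1, 1, 0, -2],
    "S4": [-2, 0, 1, 1],
}


def greedy_action(state):
    """Return the action with the highest Q-value (first one wins on ties)."""
    values = Q_TABLE[state]
    return ACTIONS[values.index(max(values))]


policy = {state: greedy_action(state) for state in Q_TABLE}
```

In state S4 the actions A3 and A4 are tied, so either would be an equally good greedy choice.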
Q-Learning: Representation Matters
• In practice, Value Iteration is impractical
  • Very limited states/actions
  • Cannot generalize to unobserved states
• Think about the Breakout game
  • State: screen pixels
  • Image size: 84 × 84 (resized)
  • Consecutive 4 images
  • Grayscale with 256 gray levels
$256^{84 \times 84 \times 4}$ rows in the Q-table!
References: [83, 84]
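To get a feel for the size of that number, the arithmetic can be checked directly. This is a small sketch; the variable names are illustrative. The digit count is computed with a logarithm rather than by printing the integer, since recent Python versions limit integer-to-string conversion for very large values.

```python
import math

pixels = 84 * 84 * 4        # values per state: 4 stacked 84 x 84 frames
bits = pixels * 8           # 256 gray levels = 8 bits per pixel
num_states = 256 ** pixels  # exact count of distinct states, equal to 2 ** bits

# Number of decimal digits of 2 ** bits, without converting it to a string.
decimal_digits = math.floor(bits * math.log10(2)) + 1
```

A table with one row per state is out of reach by any measure, since the row count has tens of thousands of decimal digits. This is why the Q-function has to be approximated instead of stored.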
Philosophical Motivation for Deep Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
Hope for Deep Learning + Reinforcement Learning:
General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge).
Deep Q-Learning
Use a function (with parameters) to approximate the Q-function
• Linear
• Non-linear: Q-Network
References: [83]
Deep Q-Network: Atari
Mnih et al. "Playing Atari with Deep Reinforcement Learning." 2013.
References: [83]
Atari Breakout
References: [85]
After 10 Minutes of Training
After 120 Minutes of Training
After 240 Minutes of Training
DQN Results in Atari
References: [83]
Deep Q-Network: DeepTraffic
Deep Q-Network Training
Given a transition < s, a, r, s’ >, the Q-table update rule in the previous algorithm must be replaced with the following:
1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
2. Do a feedforward pass for the next state s’ and calculate the maximum over all network outputs, max_a’ Q(s’, a’).
3. Set the Q-value target for action a to r + γ max_a’ Q(s’, a’) (use the max calculated in step 2).
4. For all other actions, set the Q-value target to the same value originally returned in step 1, making the error 0 for those outputs.
5. Update the weights using backpropagation.
References: [83]
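The target construction in the steps above can be sketched without any deep learning library. The two feedforward passes are represented here by plain lists of Q-values that a network would return, and the backpropagation step is left out. The function name `dqn_targets` and the sample numbers are assumptions for illustration.

```python
def dqn_targets(q_s, q_next, action, reward, gamma):
    """Build the training target vector for one transition (s, a, r, s').

    q_s:    predicted Q-values for state s, one per action
    q_next: predicted Q-values for state s', one per action
    """
    targets = list(q_s)  # other actions keep their prediction, so their error is 0
    targets[action] = reward + gamma * max(q_next)
    return targets


q_s = [0.2, 0.5, -0.1]    # hypothetical network output for s
q_next = [1.0, 0.3, 0.0]  # hypothetical network output for s'
targets = dqn_targets(q_s, q_next, action=1, reward=1.0, gamma=0.9)
errors = [t - q for t, q in zip(targets, q_s)]
```

Only the entry for the action actually taken carries a non-zero error, so the weight update is driven by that single output.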
Philosophical Motivation for Deep Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
Hope for Deep Learning + Reinforcement Learning:
General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge in size).
Driving may need more than SLAM, Perception, and Control
References: (Karaman RRT*)
Moravec’s Paradox: The “Easy” Problems are Hard
Soccer is harder than Chess
References: [8, 9]
Formulate Driving as a Reinforcement Learning Problem
http://cars.mit.edu/deeptrafficjs
The Road, The Car, The Speed
• Solo motion planning subtasks
  • Longitudinal: speed
  • Lateral: lane choice
• Vehicular interaction subtasks
  • Longitudinal: car-following
  • Lateral: lane changing
“Safety System”: Motion and Control are Given
Learning the “Behavioral Layer” Task
Action Space
Driving / Learning
Learning Input
Deep RL: Q-Function Learning Parameters
Deep RL: Layers
Deep RL: Output (Actions)
ConvNetJS: Options
Formulate Driving as a Reinforcement Learning Problem
http://cars.mit.edu/deeptrafficjs
Slides available at http://cars.mit.edu/gtc
OpenAI Gym: From JS to TensorFlow
1. Formulate DeepTraffic as a reinforcement learning task.
2. Use TensorFlow, Keras, or PyTorch to train an agent.
![Page 60: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/60.jpg)
Formulate DeepTraffic as a Reinforcement Learning Task
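A minimal sketch of what such a formulation could look like, assuming a toy lane-changing world rather than the actual DeepTraffic simulator. The class name `ToyTrafficEnv`, the five action names, the speed limits, and the reward (speed as a fraction of the maximum) are all hypothetical choices made for illustration. Only the `reset`/`step` shape follows the classic OpenAI Gym convention of returning `(observation, reward, done, info)`.

```python
import random


class ToyTrafficEnv:
    """Hypothetical lane-changing task with a Gym-style reset/step interface."""

    # Assumed discrete action set; the real simulator may differ.
    ACTIONS = ("stay", "accelerate", "decelerate", "left", "right")

    def __init__(self, lanes=3, max_speed=80, horizon=50, seed=0):
        self.lanes = lanes
        self.max_speed = max_speed
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        """Start in the middle lane at a moderate speed."""
        self.lane = self.lanes // 2
        self.speed = 40
        self.steps = 0
        return self._observe()

    def _observe(self):
        # State: (lane index, current speed in mph).
        return (self.lane, self.speed)

    def step(self, action):
        """Apply one action and return (observation, reward, done, info)."""
        name = self.ACTIONS[action]
        if name == "accelerate":
            self.speed = min(self.max_speed, self.speed + 5)
        elif name == "decelerate":
            self.speed = max(0, self.speed - 5)
        elif name == "left":
            self.lane = max(0, self.lane - 1)
        elif name == "right":
            self.lane = min(self.lanes - 1, self.lane + 1)

        # Toy stand-in for dense traffic: sometimes a slower car forces braking.
        if self.rng.random() < 0.2:
            self.speed = max(0, self.speed - 10)

        self.steps += 1
        reward = self.speed / self.max_speed  # faster driving earns more reward
        done = self.steps >= self.horizon
        return self._observe(), reward, done, {}


def run_random_episode(env, seed=0):
    """Roll out one episode with uniformly random actions; return total reward."""
    rng = random.Random(seed)
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.randrange(len(env.ACTIONS)))
        total += reward
    return total
```

Calling `run_random_episode(ToyTrafficEnv())` gives a random-policy baseline; a learned agent would replace the random action choice with one driven by the observation.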
![Page 61: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/61.jpg)
Adding a Deep Q-Network (with Keras)
Example: https://github.com/matthiasplappert/keras-rl
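As a rough illustration of what a Deep Q-Network learns, the sketch below shows the Q-learning target and epsilon-greedy action selection in plain Python. It does not use Keras or the keras-rl API: a real DQN would replace the list of Q-values with a neural network's output and add a replay memory. The function names and the default `gamma` and `lr` values are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_values, action, target, lr=0.1):
    """Move Q(s, action) a fraction lr toward the target; return a new list."""
    updated = list(q_values)
    updated[action] += lr * (target - updated[action])
    return updated


# Usage: one learning step for a state with three actions.
rng = random.Random(0)
q_state = [0.0, 0.0, 0.0]
action = epsilon_greedy(q_state, epsilon=0.1, rng=rng)
target = td_target(reward=1.0, next_q_values=[0.2, 0.5, 0.1], done=False)
q_state = q_update(q_state, action, target)
```

In a DQN the same target is used as the regression label for the network's output at the chosen action, so the update step becomes a gradient step instead of a table write.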
![Page 62: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/62.jpg)
Adding a Deep Q-Network (with Keras)
Example: https://github.com/matthiasplappert/keras-rl
![Page 63: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/63.jpg)
http://cars.mit.edu
DeepTraffic
v1.0: In MIT
v1.1: Outside MIT
![Page 64: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/64.jpg)
http://cars.mit.edu
DeepTraffic 2.0
1st place: Titan XP
2nd place: GeForce GTX 1080 Ti
3rd place: Jetson TX2
![Page 65: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/65.jpg)
http://cars.mit.edu
DeepTraffic
Challenge to GTC Attendees:
• Create an account on the site and put “GTC” as how you heard about us.
• Make a neural network that travels 70+ mph.
![Page 66: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/66.jpg)
Have fun with Deep RL and DeepTraffic!
![Page 67: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable](https://reader034.vdocuments.net/reader034/viewer/2022050408/5f858d9b8b670130a900f6ac/html5/thumbnails/67.jpg)
Have fun with Deep RL and DeepTraffic!
But not too much fun...
Slides available at http://cars.mit.edu/gtc