from learning and control to

From Learning and Control to Deep Reinforcement Learning

Benjamin RechtUniversity of California, Berkeley

I took four courses in graduate school that completely revolutionized my thinking:


6.432: Detection and Estimation with Wornell9.520 Statistical Learning Theory with Poggio

6.24x: Complex systems with Megretski 6.253: Convex optimization with Berteskas


4 of these were LIDS courses!




4 of these were LIDS courses!

All are prerequisites for modern RL, but I never took an RL course…



trustable, scalable, predictable

Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Control Theory 🤔

What is ML?

What is ML?using past data to learn about and/or act upon

the world


the world

Environments too complex


the world


Sensing too complex


the world


Sensing too complex

Models too complex

What is Control?

What is Control?using feedback to mitigate the effects of

dynamic uncertainty


dynamic uncertainty

Environments are uncertain


dynamic uncertainty


Sensing/componentsare uncertain


dynamic uncertainty


Sensing/componentsare uncertain

Models are uncertain

How do we get the best of both?


Dynamics& ControlDetailed Models



Machine Learning

& Big Data

Optimization



Machine Learning

& Big Data

RL Methods Gxt

ux

e

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

How to solve optimal control when the model f is unknown?

RL Methods Gxt

ux

e

model-based

• Model-based: fit model from data (aka, standard engineering practice)

minimize Ee

hPTt=1 Ct(xt, ut)

i




RL Methods Gxt

ux

e

model-based

• Model-based: fit model from data (aka, standard engineering practice)• Model-free

minimize Ee

hPTt=1 Ct(xt, ut)

i




RL Methods Gxt

ux

e

approximate dynamic programming

model-based


- Approximate dynamic programming: estimate cost from data

minimize Ee

hPTt=1 Ct(xt, ut)

i




RL Methods Gxt

ux

e

approximate dynamic programming

model-based


- Approximate dynamic programming: estimate cost from data- Direct policy search: search for actions from data

direct policy search

minimize Ee

hPTt=1 Ct(xt, ut)

i




Deep Reinforcement Learning


• Simply parameterize value function or policy as a deep net


• Simply parameterize value function or policy as a deep net• All of the ideas have been here since NDP!


• Simply parameterize value function or policy as a deep net• All of the ideas have been here since NDP!• Most of these algorithms don’t really “work.”

What is ML good for in control?

What is ML good for in control?• Fundamentally, almost all machine learning successes are

in nonparametric prediction (mostly classification).



Perceptual sensors in the loop



Perceptual sensors in the loop Forecasting in MPC



Perceptual sensors in the loop Forecasting in MPC

How to incorporate uncertain predictive perception in trustable, scalable, predictable autonomy?

managing uncertainty and learning in optimal control

Gxt

ux

e

changing costs

uncertain dynamics.

safety constraints

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)xt 2 X , ut 2 U

<latexit sha1_base64="QggUaKjlGoVnnNY5ZQ07WPvBgtU=">AAADUXicbVJNb9NAEF0nfJQUaApHLisiqlREUVyQ4FKpakFw6KGIpo2UNdZ6M05WXa/N7hglWP6VnJD4I9xYJ64gCWNZGr8383bGb6NMSYuDwU+v0bxz9979nQet3YePHu+1959c2TQ3AoYiVakZRdyCkhqGKFHBKDPAk0jBdXRzVvHX38BYmepLXGQQJHyqZSwFRweF7V8sgqnUBTeGL8pCqbLFkiidF4nUMpHfoaQHlCUcZ1FUvC9DYApiHDObJ2GBx3755ZKehdidh9jLQzxkRk5nGDBWy9g+9iuJuat+6Zf0mMZ/q3tQdbDWAXUfjmKZdCRDnt/irpAyqVcTCK6KUcl6PVY/y7Y1eujGBz2p1wnbnUF/sAy6nfh10iF1XIT7XsAmqcgT0CgUt3bsDzIMnBxKocCJ5xYyLm74FMYu1TwBGxRLG0r6wiETGqfGvRrpEv23o+CJtYskcpXVvHaTq8D/ceMc47dBIXWWI2ixOijOFcWUVp7SiTQgUC1cwoWRblYqZtxwgc75tVOW2hmItU2Kea6lSCewgSqco+EOtIAJl7raqvgglaKfubb0vHL6lnWyFd19J6cSbe/cXS99uFXsDPE3f/92cnXU91/1jz697pyc1tbskGfkOekSn7whJ+QjuSBDIrxTb+Z99UzjR+N3kzQbq9KGV/c8JWvR3P0DA+oP8Q==</latexit>

zt = g(xt)<latexit sha1_base64="madLnyVNkMOmsccuzY6D2fH57kg=">AAACf3icbVFdaxNBFJ2stdZoa6qPvgwGMYESdqvQ9kEIKtSHPKRo0kKyhLuTm82Q2dll5m5JXPI3fNW/5b/pbBLBJF64cDjnft8oU9KS7/+peI8OHh8+OXpaffb8+ORF7fRl36a5EdgTqUrNXQQWldTYI0kK7zKDkEQKb6PZ51K/vUdjZaq/0yLDMIFYy4kUQI4a/hgR/8jjxnxEzVGt7rf8lfF9EGxAnW2sOzqthMNxKvIENQkF1g4CP6OwAENSKFxWh7nFDMQMYhw4qCFBGxaroZf8rWPGfJIa55r4iv03o4DE2kUSucgEaGp3tZL8nzbIaXIZFlJnOaEW60aTXHFKeXkBPpYGBamFAyCMdLNyMQUDgtydtrqsamcotjYp5rmWIh3jDqtoTgYcaZESkLrcqriWSvFvoC3vyHhKf1VXtpQbX2QsyZ513DN0cy/YPSTYPf8+6J+3gvet85sP9fanzWuO2Gv2hjVYwC5Ym31lXdZjgmXsJ/vFfnsV753X8vx1qFfZ5LxiW+ZdPQC16MQ+</latexit>

perceptual sensing

How to incorporate uncertain predictive perception in trustable, scalable, predictable autonomy?

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

As soon as a machine learning system is unleashed in feedback with humans, that system is an actionable intelligence system, not a machine learning system.

Actionable Intelligence trustable, scalable, predictable

Mark your calendars!Deadline: November 15th, 2019

6-page papers

Formal call for papers will be out shortly!

Local organizers: Ben Recht, Claire TomlinL4DC.org

L4DC 2020 – Learning for Dynamics and ControlUC Berkeley, June 10-11, 2020

from learning and control to

Documents